php system, python and utf-8

Developer https://www.devze.com 2023-04-04 06:46 Source: Internet

I have a Python program that runs very well. It connects to several websites and outputs the desired information. Since not all websites are encoded in UTF-8, I read the charset from the headers and use unicode(string, encoding) to decode the content (I am not sure whether this is the appropriate way to do it, but it works pretty well). When I run the Python program directly, I get no ??? marks and it works fine. But when I run the program using PHP's system function, I receive this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 41: ordinal not in range(128)

This is a Python-specific error, but what confuses me is that I don't get it when I run the program from the terminal. I only get it when I call the program from PHP using the system function. What might be causing this?

Here is a sample code:

PHP code that calls the Python program:

system("python somefile.py $search"); // where $search is the variable coming from an input

Python code:

encoding = "iso-8859-9"
l = "some string here with latin characters"
print unicode("<div class='line'>%s</div>" % l, encoding)
# when I run this code from the terminal it works perfectly and I get no ??? marks
# when I run this code from PHP, I get the error above


From the PrintFails wiki:

When Python finds its output attached to a terminal, it sets the sys.stdout.encoding attribute to the terminal's encoding. The print statement's handler will automatically encode unicode arguments into str output.

This is why your program works when called from the terminal.

When Python does not detect the desired character set of the output, it sets sys.stdout.encoding to None, and print will invoke the "ascii" codec.

This is why your program fails when called from PHP. To make it work there, you need to state explicitly which encoding print should use. For example, to have the output encoded in UTF-8 when stdout is not attached to a terminal:

import sys
# Fall back to UTF-8 when stdout reports no encoding (e.g. when called from PHP)
ENCODING = sys.stdout.encoding if sys.stdout.encoding else 'utf-8'
print unicode("<div class='line'>%s</div>" % l, encoding).encode(ENCODING)
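To see the failure and the fix in isolation, here is a minimal sketch that applies the ascii codec to the character from the error message and then encodes it explicitly instead. The helper name encode_for_output is hypothetical (it is not part of the original program), and the sketch is written so that it should run on both Python 2 and Python 3:

```python
# The character named in the error message: U+0131, dotless i.
text = u"\u0131"


def encode_for_output(s, stream_encoding):
    """Hypothetical helper: encode with the stream's encoding,
    falling back to UTF-8 when the stream reports none."""
    return s.encode(stream_encoding or "utf-8")


# This mirrors what print does when it falls back to the ascii codec.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# With no stream encoding (the piped case), the helper picks UTF-8.
piped_bytes = encode_for_output(text, None)

# With a known encoding (the terminal case), that encoding is used.
turkish_bytes = encode_for_output(text, "iso-8859-9")
```

The point of the fallback is that the same call works in both situations: on a terminal the terminal's encoding wins, and when the output is piped the bytes are still well-defined.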

Alternatively, you could set the PYTHONIOENCODING environment variable. Then your code should work without changes (both from the terminal and when called from PHP).
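The effect of PYTHONIOENCODING can be checked without PHP. The sketch below is only an approximation of the PHP setup: it uses subprocess with a pipe as a stand-in for PHP's system call, so that the child's stdout is not a terminal. The function name run_child and the one-line child script are assumptions made for illustration:

```python
import os
import subprocess
import sys


def run_child(extra_env):
    """Run a tiny Python child whose stdout is a pipe, not a terminal.

    Returns (exit code, captured stdout bytes, captured stderr bytes).
    """
    env = dict(os.environ)
    env.update(extra_env)
    # The child prints the character from the error message.
    child_code = "print(u'\\u0131')"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


# With the variable set, the child encodes its output as UTF-8.
code, out, err = run_child({"PYTHONIOENCODING": "utf-8"})
```

From PHP on a POSIX shell, the same variable could presumably be set by prefixing the command, for example PYTHONIOENCODING=utf-8 python somefile.py, though that depends on how the server runs the command.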


When you run the Python script in your terminal, the terminal is most likely using UTF-8 (especially if you are on Linux or Mac).

When you assign "some string with latin characters" to the variable l, that string is stored in the default encoding; if you are using a UTF-8 terminal, l will be UTF-8 and the script won't crash.

A little tip: if you have a string encoded in latin1 and you want it as unicode, you can do:

variable.decode('latin1')
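As a small illustration of the tip, the sketch below decodes the same bytes with two codecs. The sample bytes are made up for the example, and it is written to run on both Python 2 and Python 3. It also shows why the codec name has to match the page: iso-8859-9 (the encoding from the question) and latin1 disagree on exactly these byte values.

```python
# Example bytes as they might arrive from an iso-8859-9 (Turkish) page.
raw = b"\xfd\xfe\xf0"

# Decoding turns bytes into a unicode string; the codec must match the source.
as_turkish = raw.decode("iso-8859-9")  # dotless i, s-cedilla, g-breve
as_latin1 = raw.decode("latin1")       # same bytes, different characters

# Once the text is unicode, it can be encoded to whatever the output needs.
utf8_bytes = as_turkish.encode("utf-8")
```

Decoding with the wrong codec does not raise an error here, it silently produces different characters, so it is worth taking the charset from the response headers as the question describes.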
