Python encoding

开发者 (Developer) https://www.devze.com 2023-04-07 05:56 Source: Internet

Using mechanize, I retrieved the source of a web page that contains non-ASCII characters, such as Chinese characters.

Here is the code:

# using Python 2.6
from mechanize import Browser

br = Browser()
br.open("http://www.example.html")

src = br.response().read()  # retrieve the source of the page

print src  # print the source

Questions:

1. The page source declares charset=gb2312, yet when I print src all the contents display correctly, with no gibberish. Why? Does print know the encoding of src?

2. Should I explicitly decode or encode src?


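If src is a unicode object, it has no encoding of its own; print (or, more correctly, sys.stdout.write()) works out which encoding to use when writing it out, normally sys.stdout.encoding. Note, however, that in Python 2 read() on a mechanize response normally returns a byte string (str), not unicode. In that case print writes the bytes unchanged, and they display correctly only because the terminal happens to use an encoding compatible with gb2312. To process the text rather than just echo it, decode it explicitly, for example src.decode('gb2312').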
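The difference between encoded bytes and text can be sketched as follows. This is only an illustration, not the asker's code. It assumes Python 3, where the two types are called bytes and str (in Python 2.6 they are str and unicode). The sample markup and the helper name decode_page are hypothetical, and the raw bytes are built locally rather than fetched with mechanize.

```python
import sys

# Hypothetical sample standing in for a GB2312 page; in the question the
# raw bytes would come from br.response().read() instead.
page_text = u"<title>\u4e2d\u6587</title>"
raw = page_text.encode("gb2312")  # what travels over the network: bytes


def decode_page(raw_bytes, charset="gb2312"):
    """Turn raw page bytes into text using the charset the page declares."""
    return raw_bytes.decode(charset)


# Decode once, at the input boundary.
text = decode_page(raw)

# Text has no encoding of its own; one is chosen only when it is written out.
utf8_bytes = text.encode("utf-8")

# The same two Chinese characters take 2 bytes each in GB2312, 3 in UTF-8.
print(len(text), len(raw), len(utf8_bytes))

# The encoding print would pick for this stream (can be None in some setups).
print(getattr(sys.stdout, "encoding", None))
```

Decoding at the point where the bytes arrive and encoding only on output keeps the rest of the program independent of the page's charset, which is one way to approach the second question.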
