write()-ing an encoded string in Python 3.x

Developer https://www.devze.com 2023-04-04 04:06 Source: Internet
I've got a unicode string (s) which I want to write into a file.

In Python 2 I could write:

open('filename', 'w').write(s.encode('utf-8'))

But this fails for Python 3. Apparently, s.encode() returns something of type 'bytes', which the write() function does not accept:

TypeError: must be str, not bytes

Does anyone know how to port the above code to Python 3?

Edit:

Thanks to all of you who proposed using binary mode! Unfortunately, this causes a problem with the \n characters. Is there any way to achieve the same result I had with Python 2 (namely to encode non-ANSI characters in UTF-8 while keeping the OS-specific rendition of \n)?

Thanks!


You do not want to muck around with manually encoding each and every piece of data like that! Simply pass the encoding as an argument to open, like this:

#!/usr/bin/env python3.2

slist = [
    "Ca\N{LATIN SMALL LETTER N WITH TILDE}on City",
    "na\N{LATIN SMALL LETTER I WITH DIAERESIS}vet\N{LATIN SMALL LETTER E WITH ACUTE}",
    "fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade",
    "\N{GREEK SMALL LETTER BETA}-globulin"
]

with open("/tmp/sample.utf8", mode="w", encoding="utf8") as f:
    for s in slist:
        print(s, file=f)

Now if you cat the file you made, you’ll see that it says:

$ cat /tmp/sample.utf8
Cañon City
naïveté
façade
β-globulin

And you can see that those are the right code points this way:

$ uniquote -x /tmp/sample.utf8
Ca\x{F1}on City
na\x{EF}vet\x{E9}
fa\x{E7}ade
\x{3B2}-globulin

See how much easier that is? Let the stream object handle any low-level encoding or decoding for you.

Summary: Don't call encode or decode yourself when all you are doing is processing a homogeneous stream that is in a single encoding throughout. That's way too much bother for zero gain. Pass the encoding argument to open once and be done with it.
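
The same applies when reading the file back. Here is a minimal round-trip sketch, assuming a scratch file in a temporary directory (the path and the name round_tripped are made up for illustration); the stream encodes on the way out and decodes on the way in, so the strings should come back unchanged:

```python
import os
import tempfile

slist = [
    "Ca\N{LATIN SMALL LETTER N WITH TILDE}on City",
    "\N{GREEK SMALL LETTER BETA}-globulin",
]

# Hypothetical scratch path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "sample.utf8")

# Text mode with an explicit encoding: the stream encodes each str to UTF-8.
with open(path, mode="w", encoding="utf-8") as f:
    for s in slist:
        print(s, file=f)

# Reading with the same encoding decodes the bytes back to str.
# Text mode also turns the platform's line ending back into "\n".
with open(path, mode="r", encoding="utf-8") as f:
    round_tripped = [line.rstrip("\n") for line in f]

print(round_tripped == slist)
```

Note that there is no encode or decode call anywhere in the sketch; only the encoding argument to open changes what lands on disk.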


Open the file in binary mode; that's the least invasive change.

On the other hand, you could set the output file encoding with open() and avoid explicit string encoding altogether.

You might want to read the manual of the open() function.
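
For the newline concern in the question, the relevant part of that manual is the newline argument of open(). The sketch below is only an illustration (the helper written_bytes and the scratch directory are made up): it writes the same string with three newline settings and keeps the raw bytes so you can compare them.

```python
import os
import tempfile

text = "fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade\n"
tmpdir = tempfile.mkdtemp()  # hypothetical scratch directory


def written_bytes(newline):
    """Write `text` in text mode with the given newline setting, return the raw bytes."""
    path = os.path.join(tmpdir, "out.txt")
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


default = written_bytes(None)        # "\n" is translated to os.linesep
untouched = written_bytes("")        # "\n" is written as-is
forced_crlf = written_bytes("\r\n")  # "\n" is always written as "\r\n"

print(default, untouched, forced_crlf)
```

With the default (newline=None) you get the OS-specific rendition of \n together with UTF-8 output, which is what the Python 2 code produced.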


Open the file in binary mode:

open('filename', 'wb').write(s.encode('utf-8'))
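
Binary mode does no newline translation, which is the problem described in the question's edit. If you want to stay in binary mode anyway, here is a rough sketch, assuming a scratch path instead of the real 'filename', that replaces \n with os.linesep by hand before encoding:

```python
import os
import tempfile

s = "\N{GREEK SMALL LETTER BETA}-globulin\n"

# Hypothetical scratch path standing in for 'filename'.
path = os.path.join(tempfile.mkdtemp(), "filename")

# Translate newlines manually, then encode, because 'wb' writes bytes verbatim.
data = s.replace("\n", os.linesep).encode("utf-8")

with open(path, "wb") as f:
    f.write(data)

# Read the raw bytes back to confirm what was written.
with open(path, "rb") as f:
    on_disk = f.read()

print(on_disk == data)
```

This is more work than passing encoding to open() in text mode, so it is probably only worth it if you need full control over the bytes.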