Python convert and save unicode string to a list

开发者 https://www.devze.com 2023-04-12 21:18 Source: Internet
I need to insert a series of names (like 'Alam\xc3\xa9') into a list, and then I have to save them into a SQLite database.

I know that I can render these names correctly by typing:

print eval(repr(NAME)).decode("utf-8")

But I have to insert them into a list, so I can't use print.

Is there another way to do this without print?


Lots and lots of misconceptions here.

The string you quote is not Unicode. It is a byte string, encoded in UTF-8.

You can convert it to Unicode by decoding it:

unicode_name = name.decode('utf-8')

When you print the value of unicode_name to the console, you will see one of two things:

>>> unicode_name
u'Alam\xe9'
>>> print unicode_name
Alamé

Here, you can see that just typing the name and pressing Enter shows a representation of the Unicode code points. This is the same as typing print repr(unicode_name). However, print unicode_name prints the actual characters: behind the scenes, it encodes the string to the correct encoding for your terminal and prints the result.
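For anyone reading this on Python 3, here is a rough equivalent of the session above. It is only a sketch and assumes Python 3, where byte strings need a b prefix and str is already Unicode. ascii() stands in for Python 2's repr(), since Python 3's repr() no longer escapes non-ASCII characters.

```python
# Python 3 sketch (assumption: the original session above is Python 2).
name = b'Alam\xc3\xa9'               # UTF-8 bytes; Python 3 needs the b prefix
unicode_name = name.decode('utf-8')  # bytes -> text (str in Python 3)

# ascii() escapes non-ASCII characters, much like Python 2's repr() did.
print(ascii(unicode_name))           # 'Alam\xe9'
print(len(name), len(unicode_name))  # 6 5  (six bytes, five code points)

# U+00E9 is the single code point for the accented e.
assert unicode_name == 'Alam\u00e9'
```

The length difference is the easiest way to tell the two apart: the accented letter takes two bytes in UTF-8 but is one character once decoded.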

But this is all irrelevant, because Unicode strings can only be represented internally. As soon as you want to store it in a database, or a file, or anywhere, you need to encode it. And the most likely encoding to choose is UTF-8 - which is what it was in originally.
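To make the round trip concrete, here is a small sketch, again assuming Python 3 syntax. Encoding the in-memory text as UTF-8 gives back exactly the bytes from the question, while a different encoding (Latin-1 is used here purely as a contrast) gives different bytes for the same text.

```python
# Python 3 sketch: text lives in memory as str and must be encoded to leave it.
unicode_name = 'Alam\u00e9'                # the decoded name

encoded = unicode_name.encode('utf-8')     # bytes suitable for a file or socket
assert encoded == b'Alam\xc3\xa9'          # same bytes as in the question
assert encoded.decode('utf-8') == unicode_name  # decoding reverses the encoding

# Another encoding yields different bytes for the same character.
latin1 = unicode_name.encode('latin-1')
assert latin1 == b'Alam\xe9'
```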

>>> name
'Alam\xc3\xa9'
>>> print name
Alamé

As you can see, with the original non-decoded version of the name, repr and print once again show the codes and the characters respectively. So converting it to Unicode does not make it any more "really" the correct character.

So, what should you do if you want to store it in a database? Nothing. Nothing at all. SQLite accepts UTF-8 input and, by default, stores its text in UTF-8 format on disk. So, as far as SQLite itself is concerned, no conversion is needed to store the original value of name in the database.
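If you are on Python 3, the picture is slightly different, so here is a hedged sketch of the whole flow. It assumes Python 3's sqlite3 module, which takes str for TEXT columns and does the UTF-8 encoding itself, so there the names are decoded before they go into the list. The table name people and the second name are made up for illustration.

```python
import sqlite3

# Raw UTF-8 byte strings; the second name is a made-up example.
raw_names = [b'Alam\xc3\xa9', b'Jos\xc3\xa9']

# Decode once and keep text in the list.
names = [raw.decode('utf-8') for raw in raw_names]

conn = sqlite3.connect(':memory:')               # throwaway in-memory database
conn.execute('CREATE TABLE people (name TEXT)')  # hypothetical table
conn.executemany('INSERT INTO people (name) VALUES (?)',
                 [(n,) for n in names])
conn.commit()

# Reading back returns str objects equal to what was inserted.
stored = [row[0] for row in
          conn.execute('SELECT name FROM people ORDER BY rowid')]

# CAST(... AS BLOB) exposes the bytes SQLite holds for the text value.
blob = conn.execute(
    'SELECT CAST(name AS BLOB) FROM people ORDER BY rowid LIMIT 1'
).fetchone()[0]
conn.close()

assert stored == names
assert blob == b'Alam\xc3\xa9'   # UTF-8, the default database encoding
```

The last assertion is the point made above: what ends up in the database is the same UTF-8 byte sequence the question started with.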


Are you looking for something like this?

[n.decode("utf-8") for n in ['Alam\xc3\xa9', 'Alam\xc3\xa9', 'Alam\xc3\xa9']]