I have code:
encoding = guess_encoding()
text = unicode(text, encoding)
When a byte sequence that is invalid for the guessed encoding appears in text, a UnicodeDecodeError is raised. How can I silently skip the exception and replace the bad symbol with '?'?
Try
text = unicode(text, encoding, "replace")
From the documentation:
'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded.
If you want to use "?" instead of the official Unicode replacement character, you can do
text = text.replace(u"\uFFFD", "?")
after converting to unicode.
In Python 3, you can decode a bytes object into a string using the decode method. It accepts two parameters: encoding, which is "utf-8" by default, and errors, which defines what to do on illegal byte sequences. The default value is "strict", which raises a UnicodeDecodeError; other alternatives include "ignore" and "replace"; the latter replaces illegal sequences with the Unicode replacement character "\uFFFD".
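A minimal sketch of how the three error modes behave, assuming Python 3 and a made-up byte string (the variable names here are only for illustration):

```python
# 0xff can never appear in valid UTF-8, so this input is guaranteed to be malformed.
raw = b"caf\xff ok"

# errors="strict" is the default and raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# errors="ignore" silently drops the bad byte.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_failed, repr(replaced), repr(ignored))
```

Note that "ignore" loses the information that something was wrong at all, which is usually why "replace" is preferred.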
Therefore, you'd need to do this to decode-and-replace:
encoding = guess_encoding()
text = text_bytes.decode(encoding, errors='replace').replace('\uFFFD', '?')
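One caveat with the one-liner above: it also turns any genuine U+FFFD that was already present in the input into '?'. If that matters, a possible alternative is a custom error handler registered through codecs.register_error, which emits '?' only where decoding actually failed. This is a sketch, not the approach from the answer; the handler name "qmark" and the sample bytes are made up for illustration.

```python
import codecs


def question_mark_handler(exc):
    """Replace each undecodable byte range with '?' and resume after it."""
    if isinstance(exc, UnicodeDecodeError):
        # Return the replacement text and the position to continue decoding from.
        return ("?", exc.end)
    # This handler is only meant for decoding; re-raise anything else.
    raise exc


# "qmark" is a hypothetical name chosen for this example.
codecs.register_error("qmark", question_mark_handler)

# One invalid byte (0xff) followed later by a valid, pre-existing U+FFFD (ef bf bd).
raw = b"price: \xff10 \xef\xbf\xbd"

with_handler = raw.decode("utf-8", errors="qmark")
with_replace = raw.decode("utf-8", errors="replace").replace("\ufffd", "?")

print(repr(with_handler))
print(repr(with_replace))
```

With the handler, the legitimate replacement character at the end survives; with decode-then-replace, both positions become '?'.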
As Sven Marnach pointed out in a comment, when the text comes from a file you can supply the errors argument directly to open; otherwise the decode errors are raised while reading the file, whenever a byte falls outside the character map of the chosen encoding.
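A short sketch of the open variant, assuming Python 3. The helper name read_text_lossy and the temporary file are hypothetical and exist only to make the example self-contained:

```python
import os
import tempfile


def read_text_lossy(path, encoding="utf-8"):
    """Read a text file, turning undecodable bytes into '?' instead of raising."""
    # errors="replace" makes the file object substitute U+FFFD while reading.
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read().replace("\ufffd", "?")


# Write a file containing one byte that is invalid in UTF-8, then read it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\xffdef")
    tmp_path = tmp.name

try:
    result = read_text_lossy(tmp_path)
finally:
    os.remove(tmp_path)

print(result)
```

In real code the encoding argument would come from guess_encoding() as in the question.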