Strange character rendered correctly in notepad, but as a control character elsewhere

Developer https://www.devze.com 2023-04-09 15:26 Source: Internet
I have a .csv list of businesses. The file has some strange characters in it. For example, in this field: Stocktonon-Tees, the first hyphen, between Stockton and on, seems to be a character with the value 6 rather than a hyphen, with the value 45. Stack Overflow will probably sanitize this so you can't see it, so here is a pastebin:

http://pastebin.com/NuyyaQy9

Can anyone explain why this could be? Is it some encoding issue that I have missed? Or a corruption in the dataset?


Yes, it's almost certainly an encoding issue. A file just consists of binary data - it's how you interpret that binary data that matters. It sounds like Notepad is guessing at the originally-intended encoding, but whatever else you're using isn't.
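To make the "same bytes, different interpretation" point concrete, here is a minimal Python sketch. The byte 0x96 is a hypothetical stand-in for the odd hyphen and is not taken from the asker's file; `describe` is a helper invented for this illustration.

```python
import unicodedata

# Hypothetical bytes: 0x96 stands in for the odd "hyphen".
# It is an assumption for illustration, not the asker's actual data.
raw = b"Stockton\x96on-Tees"

def describe(text):
    """Return 'U+XXXX category' for every non-ASCII character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.category(ch)}"
            for ch in text if ord(ch) > 127]

as_cp1252 = raw.decode("cp1252")   # Windows-1252: 0x96 is an en dash
as_latin1 = raw.decode("latin-1")  # ISO-8859-1: 0x96 is a C1 control character

print(describe(as_cp1252))  # ['U+2013 Pd']  (punctuation, dash)
print(describe(as_latin1))  # ['U+0096 Cc']  (control character)
```

The bytes on disk never change; only the decoder's choice of encoding decides whether a program shows a dash or a control character.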

Unfortunately you haven't said anything about what software is trying to read the file or what wrote it in the first place - but you should look at what encoding Notepad thinks it is, and work from there.
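If it helps to see what is really in the file before guessing at an encoding, one option is to read it as raw bytes and list anything in the control range. This is only a sketch: `find_control_bytes` is a hypothetical helper, and the sample bytes assume the value 6 reported in the question.

```python
def find_control_bytes(data):
    """Return (offset, byte value) pairs for control bytes in data.

    Tab, line feed and carriage return are skipped because they are
    normal in a CSV file.
    """
    allowed = {0x09, 0x0A, 0x0D}
    return [(i, b) for i, b in enumerate(data) if b < 0x20 and b not in allowed]

# Assumed sample: byte value 6 where the first hyphen should be.
sample = b"Stockton\x06on-Tees\r\n"
print(find_control_bytes(sample))  # [(8, 6)]
```

For a real file, the same function can be given the result of `open(path, "rb").read()`, where `path` is whatever the .csv is called.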

If it's your code that wrote the file out, and you get to decide the encoding, I'd recommend UTF-8 as a good general-purpose, platform-portable encoding.
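As a rough sketch of what writing with an explicit encoding could look like in Python (the function names and the sample row are made up for illustration; nothing here is known about the asker's actual tooling):

```python
import csv

def write_rows(path, rows):
    """Write rows to a CSV file, stating the encoding explicitly."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

def read_rows(path):
    """Read the CSV back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))
```

Calling `write_rows("out.csv", [["Acme Ltd", "Stockton\u2013on-Tees"]])` and then `read_rows("out.csv")` should give back the same row, en dash included, as long as both sides agree on UTF-8.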
