开发者

Reading characters from an unknown character encoding

开发者 https://www.devze.com 2023-02-11 07:13 出处:网络
I have a string that came from an old database of unknown character encoding.I am having trouble encoding/filtering the string to show the correct text.

I have a string that came from an old database of unknown character encoding. I am having trouble encoding/filtering the string to show the correct text.

What the data looks like in the database: Marronnière à quatre pans

What we need the string to show up as: Marronnière à quatre pans

开发者_如何学Go

Specifically, I am having trouble parsing the string so I can display the character à (à)

This is an asp.Net 2.0 site written in VB using a Sql Server 2005 Database. Not sure if it matters, but data comes from a column with this collation: SQL_Latin1_General_CP1_CI_AS

I've tried encoding the string to various encodings in the code to no avail. I've also passed the string (encoded different ways) into a byte array to find a unique byte pattern for the bad characters without success.

Any ideas or leads would be greatly appreciated, thanks.


It sounds like the collation in the SQL Server database doesn't match the character encoding that was actually used :( It's a fairly common mistake for careless developers.

That's why the SQL Server administration tools are showing weird characters rather than the strings that you are expecting.

Possibly it is UTF-8? In UTF-8 à is represented by the bytes 0xC3 0xA8, which would be interpreted under the Windows code page Latin-1 as è. I know nothing about SQL Server collations, but it seems likely that SQL_Latin1_CP1_CI_AS is similar to Windows "Latin-1".

You either need to

  • fix up the encoding when reading from the database. Ugly and confusing for the next poor victim who has to deal with this database and code.
  • or, better, correct the data in the database so that it matches the collation. You might like to change the collation to UTF-8 or UTF-16: you will also need to change the data though.
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号