I'm developing a website using Python to preprocess requests and a MySQL database to store information.
All my tables are utf8, and I also use utf8 as the charset in the Content-Type header.
I have this code to establish connection to the db:
database_connection = MySQLdb.connect(host = database_host, user = database_username, passwd = database_password, db = database_name, use_unicode = True)
cursor = database_connection.cursor()
cursor.execute("""SET NAMES utf8;""");
cursor.execute("""SET CHARACTER SET utf8;""");
cursor.execute("""SET character_set_connection=utf8;""");
A simple test on my GoDaddy hosting prints the result of a simple SELECT query like this:
print results.encode("utf-8")
The output is a double-encoded string: every non-ASCII character is turned into two different special characters. But if I leave out the encode call, I get an encoding error for each non-ASCII letter.
It sounds as though results contains a Unicode string that was incorrectly decoded from a byte string coming from the database. That is, when the data was read from the database, the byte string was decoded as Latin-1 rather than as the UTF-8 it really is.
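To make the symptom concrete, here is a minimal sketch that reproduces the double encoding in pure Python, with no database involved. The sample word is my own choice and Latin-1 is only my guess at the wrong codec; the variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce "double encoding" without a database.
# Assumption: the wrong codec is Latin-1; the sample text is arbitrary.

original = u"caf\u00e9"                       # the text that was stored

utf8_bytes = original.encode("utf-8")         # what the database actually holds
mis_decoded = utf8_bytes.decode("latin-1")    # wrong decode: each byte becomes one character

# Encoding the mis-decoded text as UTF-8 again yields the doubled bytes.
double_encoded = mis_decoded.encode("utf-8")

# Reversing the wrong decode recovers the original text.
repaired = mis_decoded.encode("latin-1").decode("utf-8")
```

Here the single character U+00E9 ends up as two characters (U+00C3 and U+00A9) after the wrong decode, which matches the "two different specials" described in the question.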
So if you fix the decoding of the database contents, then you should be in business.
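One way to fix the decoding at the source, as far as I know, is to tell MySQLdb the connection character set directly through the charset keyword of MySQLdb.connect instead of relying on SET NAMES afterwards. The helper below is a hypothetical name of mine; it only builds the keyword arguments, so it runs without a database, and whether this cures the problem on your particular hosting is an assumption you would need to verify.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect (hypothetical helper).

    charset="utf8" asks the driver to use UTF-8 for the connection and to
    decode text columns with it; use_unicode=True returns unicode objects.
    """
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",
        "use_unicode": True,
    }

# Intended usage (requires MySQLdb and a reachable server, so left as a comment):
#   import MySQLdb
#   database_connection = MySQLdb.connect(**utf8_connect_kwargs(
#       database_host, database_username, database_password, database_name))
```

With the charset set on the connection, the results should already be correct unicode objects, so a single encode("utf-8") before printing should be enough.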
I use something like this, which I found on the internet during one of my own encoding hunts. You can keep chaining more encodings until you find one that fits.
Also, as others said, try fixing the source first. This hack is only meant to figure out which encoding is actually being returned. Hope this helps.
# This function is a simple recursive hack that tries to find a compatible
# encoding for the problematic field.
# It does not guarantee a successful match. If no encoding fits, the
# error code "ENC_ERR" is returned instead.
def findencoding(field, level):
    print("level: " + str(level))
    try:
        if level == 0:
            field = field.encode('cp1252')
        elif level == 1:
            field = field.encode('cp1254')
        else:
            return "ENC_ERR"
    except Exception:
        # This encoding failed, so try the next one in the chain.
        field = findencoding(field, level + 1)
    return field
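The same idea can also be written as a loop that additionally checks whether the re-encoded bytes are valid UTF-8, which is a stronger hint that the right codec was found. This is only a sketch under my own assumptions: the function name, the candidate list, and its order are all made up for illustration, and plain ASCII text will match the first candidate trivially.

```python
def guess_misdecoding(text, candidates=("cp1252", "cp1254", "latin-1")):
    """Return (codec, repaired_text) for the first codec that undoes a wrong decode.

    Hypothetical helper: re-encode the text with each candidate codec and
    check whether the resulting bytes decode cleanly as UTF-8.
    Returns None when no candidate fits.
    """
    for codec in candidates:
        try:
            repaired = text.encode(codec).decode("utf-8")
        except UnicodeError:
            # Covers both UnicodeEncodeError and UnicodeDecodeError.
            continue
        return codec, repaired
    return None
```

A None result means the text was probably not mis-decoded with any of the listed codecs, so it is either already correct or the candidate list needs extending.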