
How to solve this double encoding?

Developer https://www.devze.com 2023-03-30 11:37 Source: Internet

I'm developing a website using Python to preprocess requests and a MySQL database to store information.

All my tables are utf8, and I also use utf8 in the Content-Type header.

I use this code to establish the connection to the database:

import MySQLdb

database_connection = MySQLdb.connect(host=database_host, user=database_username,
                                      passwd=database_password, db=database_name,
                                      use_unicode=True)
cursor = database_connection.cursor()
cursor.execute("SET NAMES utf8")
cursor.execute("SET CHARACTER SET utf8")
cursor.execute("SET character_set_connection=utf8")

Running a simple test on my GoDaddy hosting, I print the results of a simple SELECT query like this:

print results.encode("utf-8")

This shows a double-encoded string: every non-ASCII character is turned into two different special characters. But if I leave out the encode() call, I get an encoding error for each non-ASCII letter.


It sounds as though results contains a Unicode string that was incorrectly decoded from the byte string coming from the database. That is, when the data was read from the database, the byte string was decoded as Latin-1 rather than as the UTF-8 it really is.
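To make that mechanism concrete, here is a small sketch that simulates the suspected mis-decoding and then undoes it. It is written for Python 3 (in the question's Python 2 code, unicode plays the role of str), assumes the driver used Latin-1, and the helper names mis_decode and repair are hypothetical.

```python
# Sketch (Python 3): simulate UTF-8 bytes being decoded as Latin-1, then repair.
# Assumption: the database driver decoded with Latin-1; names are illustrative.

def mis_decode(text):
    """Simulate the suspected bug: UTF-8 bytes from MySQL decoded as Latin-1."""
    raw = text.encode("utf-8")      # the bytes the database actually sends
    return raw.decode("latin-1")    # the wrong decoding step

def repair(mojibake):
    """Undo the bug: recover the original bytes, then decode them as UTF-8."""
    return mojibake.encode("latin-1").decode("utf-8")

original = "café"
broken = mis_decode(original)
print(broken)            # cafÃ©
print(repair(broken))    # café
```

Encoding broken with UTF-8 again, as print results.encode("utf-8") does, turns the single é into the two characters Ã©, which matches the symptom described in the question.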

So if you fix the decoding of the database contents, then you should be in business.
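One way to fix the decoding at the source, assuming the MySQLdb driver (MySQL-python / mysqlclient), is to pass the charset argument to MySQLdb.connect. The driver appears to choose the codec it decodes results with when the connection is opened, so SET NAMES statements issued afterwards may change what the server sends without changing how Python decodes it. The helper name connect_utf8 is hypothetical; the arguments are the same values the question already uses.

```python
# Sketch assuming the MySQLdb driver (MySQL-python / mysqlclient).
# connect_utf8 is a hypothetical helper, not part of the driver.

def connect_utf8(host, user, passwd, db):
    """Open a MySQLdb connection that decodes result bytes as UTF-8."""
    import MySQLdb  # imported lazily so this module loads without the driver

    # charset="utf8" sets the connection character set and makes the driver
    # decode results with UTF-8; use_unicode=True returns text strings.
    return MySQLdb.connect(
        host=host,
        user=user,
        passwd=passwd,
        db=db,
        charset="utf8",
        use_unicode=True,
    )
```

Usage would be database_connection = connect_utf8(database_host, database_username, database_password, database_name); with charset set, the separate SET NAMES / SET CHARACTER SET statements are likely redundant.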


I use something like this, which I found on the internet during one of my own encoding hunts. You can keep chaining more encodings until one fits.

Also, as others said, try fixing the source first. This hack is just for figuring out what encoding is actually being returned. Hope this helps.

# A simple recursive hack that tries to find an encoding compatible with the
# problematic field. It does not guarantee a match: if none of the candidate
# encodings works, the error code "ENC_ERR" is returned instead.

CANDIDATE_ENCODINGS = ['cp1252', 'cp1254']  # append more encodings to chain them

def findencoding(field, level=0):
    print("level: " + str(level))
    if level >= len(CANDIDATE_ENCODINGS):
        return "ENC_ERR"
    try:
        return field.encode(CANDIDATE_ENCODINGS[level])
    except UnicodeError:
        # This encoding cannot represent the field; try the next candidate.
        return findencoding(field, level + 1)
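The findencoding hack checks which encoding can represent the field. For the double-encoding in the question, a closely related check (an adaptation, not part of the original answer) is to ask which candidate encoding, if it had been used to mis-decode UTF-8 bytes, would round-trip back to valid UTF-8. The function name and candidate list below are hypothetical, and the result is only a heuristic, since some legitimate text can also round-trip.

```python
# Hypothetical helper (Python 3): guess which encoding was used to
# mis-decode UTF-8 bytes, and return the repaired text if one fits.

CANDIDATES = ("latin-1", "cp1252", "cp1254")

def guess_misdecoding(field, candidates=CANDIDATES):
    """Return (encoding, repaired_text) for the first candidate that fits, else None."""
    for encoding in candidates:
        try:
            repaired = field.encode(encoding).decode("utf-8")
        except UnicodeError:
            continue  # this encoding cannot explain the garbled text
        if repaired != field:  # plain ASCII round-trips unchanged; skip it
            return encoding, repaired
    return None

print(guess_misdecoding("cafÃ©"))   # ('latin-1', 'café')
print(guess_misdecoding("café"))    # None
```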