
Problem due to double quote while parsing csv.

开发者 https://www.devze.com 2023-04-03 10:45 Source: Internet

I have a CSV file in the following format:

"1";"A";"A:"61 B &amp; BA";"C"

The following is my code to read the CSV file:

import csv

# path is the location of the CSV file shown above
with open(path, newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='"')
    for row in reader:
        print(row)

The problem is that it splits the row into 5 fields:

['1', 'A', 'A:61 B &amp', ' BA', 'C']

Whereas I was expecting my output to be:

['1', 'A', 'A:61 B &amp; BA', 'C']

When I remove the double quote before 61 B in the CSV file, I get this output:

['1', 'A', 'A:61 B &amp; BA', 'C']

which is perfectly fine. But why does a double quote in the middle of the field cause a problem even though the delimiter and quotechar have been defined?


Your csv file is invalid. If a quote occurs inside a (quoted) string, it must be escaped by doubling it.

"1";"A";"A:""61 B & BA";"C"

would result in

['1', 'A', 'A:"61 B & BA', 'C']

How should the CSV module guess the difference between quotes that delimit an item and quotes within the item?
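
A minimal sketch of the doubling rule, using an in-memory string instead of a file (the variable names are only for illustration). It parses the corrected line, shows that csv.writer produces the same doubled quote when writing, and shows that passing strict=True makes the reader reject the original line instead of guessing:

```python
import csv
import io

# Corrected line: the quote inside the quoted field is doubled.
valid_line = '"1";"A";"A:""61 B & BA";"C"\n'
parsed = next(csv.reader(io.StringIO(valid_line), delimiter=';', quotechar='"'))
print(parsed)  # ['1', 'A', 'A:"61 B & BA', 'C']

# csv.writer doubles embedded quotes by default (doublequote=True).
out = io.StringIO()
writer = csv.writer(out, delimiter=';', quotechar='"',
                    quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(['1', 'A', 'A:"61 B & BA', 'C'])
written = out.getvalue()
print(written)

# With strict=True the reader raises csv.Error on the unescaped quote.
invalid_line = '"1";"A";"A:"61 B & BA";"C"\n'
try:
    next(csv.reader(io.StringIO(invalid_line), delimiter=';',
                    quotechar='"', strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc
print(strict_error)
```

If the file comes from another program, fixing the export so that it doubles embedded quotes is usually more reliable than patching the file after the fact.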


I suspect the double quote should be replaced by &quot;.
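
A small sketch of that idea, assuming the text in the file is HTML-escaped and the inner quote has been written as the &quot; entity (the sample line is made up for illustration). After parsing, html.unescape from the standard library can turn the entities back into plain characters:

```python
import csv
import html
import io

# Assumed sample: the quote inside the field is stored as the &quot; entity.
data = '"1";"A";"A:&quot;61 B &amp; BA";"C"\n'

reader = csv.reader(io.StringIO(data), delimiter=';', quotechar='"')
# Decode HTML entities in every field once the row has been split.
rows = [[html.unescape(field) for field in row] for row in reader]
print(rows[0])  # ['1', 'A', 'A:"61 B & BA', 'C']
```

The semicolons inside the entities are harmless here because they sit inside a properly quoted field.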


You defined a delimiter that is also used in your text: the ampersand entity contains a semicolon. I'd recommend changing your delimiter to something that will not show up in the text, such as a pipe character.
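
As a rough illustration of the pipe suggestion, the sketch below writes the row with '|' as the delimiter and reads it back. The field values are assumed from the question; letting csv.writer produce the file also takes care of the embedded quote, which it escapes by doubling:

```python
import csv
import io

# Assumed field values, including the embedded quote and the &amp; entity.
fields = ['1', 'A', 'A:"61 B &amp; BA', 'C']

# Write with a pipe delimiter; only fields that need quoting get quoted.
buf = io.StringIO()
csv.writer(buf, delimiter='|', quotechar='"', lineterminator='\n').writerow(fields)
line = buf.getvalue()
print(line)

# Reading back with the same settings returns the original fields.
round_trip = next(csv.reader(io.StringIO(line), delimiter='|', quotechar='"'))
print(round_trip)
```

Changing the delimiter only removes the clash with the semicolon; an unescaped quote inside a quoted field still has to be doubled.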
