Reading CSV File - invalid byte sequence in UTF-8

Developer https://www.devze.com 2023-03-31 12:32 Source: Internet
I have been using a rake file for a number of months to read in data from a CSV file. I recently tried to read in a new CSV file but keep getting the error "invalid byte sequence in UTF-8". I have tried to work out manually where the problem is, but with little success. The CSV file is just text and URLs. There were a few unusual characters initially (where the original text had fancy bullet points), but I have removed those and cannot find any other anomalies.

Is there a way to get round this problem automatically and identify and remove the problem characters?


I've found a solution to discard all invalid UTF-8 bytes from a string:

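require 'iconv'  # part of the standard library in Ruby 1.9

# Converting UTF-8 to UTF-8 with //IGNORE drops bytes that are not valid UTF-8.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]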

(taken from this blog post)

Hope this helps.
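Note that Iconv was removed from the standard library in Ruby 2.0, so the snippet above only runs on older versions (or with the separate iconv gem). On Ruby 2.1 or later, String#scrub does a similar job. A minimal sketch, where the sample string is made up purely for illustration:

```ruby
# Hypothetical input: a string holding one byte (0xE9) that is not valid UTF-8.
untrusted_string = "caf\xE9 menu".dup.force_encoding(Encoding::UTF_8)

# String#scrub replaces each invalid byte sequence with its argument;
# passing an empty string simply discards the bad bytes.
valid_string = untrusted_string.scrub('')

puts untrusted_string.valid_encoding?
puts valid_string.valid_encoding?
puts valid_string
```

Calling scrub with no argument inserts the Unicode replacement character instead of dropping the bytes, which can make it easier to spot where the bad data was.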


Whereabouts do you put these? I have something like this:

CSV.foreach("/Users/CarlBourne/Customers/Lloyds/small-test2.csv", options) do |row|
  name, workgroup, address, actual, output = row
  next if name == "NBName"  # skip the header row
  @ssl_info[name] = workgroup, address, actual, output

  # Try to strip invalid UTF-8 bytes from the last column
  ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
  clean = ic.iconv(output + ' ')[0..-2]

  puts clean
end

However, it doesn't seem to work.
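A likely reason the loop above doesn't help is that the error is probably raised while CSV is still parsing the line, before the block ever sees the row, so the cleaning inside the block comes too late. One way round this is to clean the raw file contents first and parse afterwards. A rough sketch, assuming Ruby 2.1+ for String#scrub; the helper name read_clean_rows and the temporary sample file are invented for the example:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: read the file as raw bytes, label them as UTF-8,
# discard any invalid byte sequences, and only then hand the text to CSV.
def read_clean_rows(path)
  raw = File.binread(path).force_encoding(Encoding::UTF_8)
  CSV.parse(raw.scrub(''))
end

# Made-up sample file containing one stray byte (0xE9) in the second row.
sample = Tempfile.new(['sample', '.csv'])
sample.binmode
sample.write("NBName,Workgroup\nhost\xE9,LAN\n".b)
sample.close

rows = read_clean_rows(sample.path)
rows.each do |name, workgroup|
  next if name == 'NBName'  # skip the header row
  puts "#{name} #{workgroup}"
end

sample.unlink
```

This reads the whole file into memory, which is fine for small files. If the stray bytes are really text in another encoding (Windows-1252 is a common source of fancy bullet points), converting from that encoding would preserve the characters rather than dropping them.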
