Simple perl split() and Regular Expression question [duplicate]

开发者 https://www.devze.com 2023-03-14 08:34 Source: web
This question already has answers here: Closed 11 years ago.

Possible Duplicate:

How can I parse quoted CSV in Perl with a regex?

I am attempting to take a CSV file and import each row into an array (where each element represents a column). The format of the CSV file is very simple:

item1,item2,item3
nextrowitem1,item2,item3
"items,with,commas","are,in,quotes"

I imported the CSV file using:

open(my $fh, '<', 'test.csv') or die "Cannot open test.csv: $!";
my @lines = <$fh>;    # read every line of the file into @lines
close($fh);

Then I looped through it using:

foreach(@lines){
    @items = split(/regular expression/);
    # Do stuff with @items array
}

(Note that you do not need to use split(/regular expression/, $string); because split() assumes $_ if no string is supplied)

Earlier I tested this with a CSV file in which none of the items contained commas, using the simple regular expression split(/,/). This worked just fine, so there is nothing wrong with the file, with reading it, or with the loop code that follows the split. However, when I hit items that contained a comma, they were understandably divided like so:

1 => "items
2 => with
3 => commas"
4 => "are
5 => in
6 => quotes"

Instead of the desired:

1 => items,with,commas
2 => are,in,quotes

Can anyone help me develop a regular expression to split each line correctly? Basically, if an item starts with a quote ("), the split needs to wait until the "," sequence. If the item does not start with a quote, it needs to split at the next comma.


Try Text::CSV, which already does this. The problem with parsing CSV using a regular expression is that you have to look for separators like "," (which you indicated) as well as a plain , separator.


Just use Text::CSV_XS instead...
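
For reference, here is a minimal sketch of what the module-based approach might look like. It assumes Text::CSV is installed from CPAN (it is not a core module; it uses Text::CSV_XS as its backend when that is available). The helper name parse_csv_line is made up for this example, and the sample line is taken from the question.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# binary => 1 lets quoted fields contain arbitrary bytes, including newlines.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create CSV parser: " . Text::CSV->error_diag();

# parse_csv_line is a hypothetical helper name used only in this sketch.
sub parse_csv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip the line ending left by <FILE>
    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_diag();
    return $csv->fields();   # the columns, with surrounding quotes removed
}

my @items = parse_csv_line(qq{"items,with,commas","are,in,quotes"\n});
print "$_\n" for @items;
```

Inside the question's foreach loop, the split() call could then be replaced with something like @items = parse_csv_line($_);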


See my post that solves this problem for more detail.

^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$

This will match the whole line; you can then use the matched captures to get your data out (without the quotes).
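
One caveat when using this in Perl: a capture group inside a repeated group keeps only its last match, so a single whole-line match does not hand back every field. A rough sketch of one way around that is to apply the inner field pattern one field at a time with \G and /g. The helper name split_csv_line is made up for this example, and the quoted branch uses * rather than + so that an empty quoted field ("") is accepted; treat it as an illustration under those assumptions, not a complete CSV parser (it does not handle quoted fields that span lines).

```perl
use strict;
use warnings;

# split_csv_line is a hypothetical helper name used only in this sketch.
sub split_csv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip the line ending
    my @fields;
    # \G anchors each match where the previous one ended, so every
    # iteration consumes exactly one field plus its separator.
    while ($line =~ /\G(?:"((?:""|[^"])*)"|([^,]*))(,|\z)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # a doubled quote stands for one literal quote
            push @fields, $quoted;
        }
        else {
            push @fields, $plain;
        }
        last if $sep eq '';         # separator was end of line, so stop
    }
    return @fields;
}

my @items = split_csv_line(qq{"items,with,commas","are,in,quotes"\n});
print "$_\n" for @items;
```

The explicit last matters: without it, the loop would attempt one more zero-length match at the end of the string and append a spurious empty field.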
