Need help splitting this string of names (first name and last name pairs delimited by commas and "and")

Developer https://www.devze.com 2023-03-31 11:38 Source: Internet
I'm using Perl and need to split strings of author names delimited by commas as well as a final "and". Each name is a first name followed by a last name, like this:

$string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";
$string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";
$string3 = "Jane Doe and Joe Smith";
# Next line doesn't work because there is no comma between last two names
@data = split(/,/, $string1);

I would just like to split the full names into elements of an array, like what split() would do, so that the @data array would contain, for example:

$data[0]: "Joe Smith"
$data[1]: "Jason Jones"
$data[2]: "Jane Doe"
$data[3]: "Jack Jones"

However, the problem is that there is no comma between the last two names in the lists. Any help would be appreciated.


You could use a simple alternation in your regular expression for split:

my @parts = split(/\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*|\s+and\s+/, $string1)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $string2 = "Jane Doe and Joe Smith";print join("\n",split(/\s*,\s*|\s+and\s+/, $string2)),"\n"'
Jane Doe
Joe Smith

If you also have to deal with the Oxford Comma (i.e. "this, that, and the other thing"), then you could use

my @parts = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jack Jones

Thanks to stackoverflowuser2010 for noting this case.

You'll want the \s*,\s*and\s+ branch at the beginning so that the other branches of the alternation don't split on the comma or the "and" first. This order appears to be guaranteed as well:

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen.
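To see why the branch order matters, here is a small sketch that runs the Oxford-comma string through the same three branches twice: once with \s*,\s*and\s+ first, once with it last. The variable names ($good, $bad) are made up for illustration.

```perl
use strict;
use warnings;

my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Combined ", and" branch first: it consumes the comma and the "and" together.
my $good = qr/\s*,\s*and\s+|\s*,\s*|\s+and\s+/;

# Same branches, but the combined one comes last (illustration only).
# The plain comma branch wins at ", and" and eats the space before "and",
# so \s+and\s+ can no longer match there.
my $bad = qr/\s*,\s*|\s+and\s+|\s*,\s*and\s+/;

my @good_parts = split $good, $s;
my @bad_parts  = split $bad,  $s;

print join("|", @good_parts), "\n";   # Joe Smith|Jason Jones|Jane Doe|Jack Jones
print join("|", @bad_parts),  "\n";   # Joe Smith|Jason Jones|Jane Doe|and Jack Jones
```

With the wrong order the last element keeps a stray leading "and", which is exactly what putting the combined branch first avoids.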


Before calling split, replace each "and" with a comma:

$string1 =~ s{\s+and\s+}{,}g;
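A fuller sketch of this substitute-then-split approach, wrapped in a hypothetical helper called split_authors (the name is not from the answer). One assumption is labeled in the code: the substitution uses ,?\s+and\s+ rather than the bare \s+and\s+, because with an Oxford comma the bare version would turn ", and " into ",," and leave an empty element after splitting.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize every "and" separator to a comma, then split.
sub split_authors {
    my ($string) = @_;    # works on a copy; the caller's string is untouched

    # Assumption: the optional ",?" also swallows an Oxford comma,
    # so both " and " and ", and " collapse to a single ",".
    $string =~ s{,?\s+and\s+}{,}g;

    # Split on commas, dropping any whitespace around them.
    return split /\s*,\s*/, $string;
}

for my $s ("Joe Smith, Jason Jones, Jane Doe and Jack Jones",
           "Joe Smith, Jason Jones, Jane Doe, and Jack Jones",
           "Jane Doe and Joe Smith") {
    print join(" | ", split_authors($s)), "\n";
}
# Joe Smith | Jason Jones | Jane Doe | Jack Jones
# Joe Smith | Jason Jones | Jane Doe | Jack Jones
# Jane Doe | Joe Smith
```

Like the split-only answers, this treats any " and " surrounded by whitespace as a separator, so it assumes no author name itself contains the word "and".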