
egrep regular expression works within PHP, but doesn't work at unix shell - escaping issues?

Developer https://www.devze.com 2023-04-11 03:17 Source: web
I think my problem has to do with differences in escaping between using a regex in PHP and using it on the Bash command line.

Here is my regex that is working in PHP:

$emailregex = '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$';

So I tried the following on the command line, and it doesn't seem to match anything (emails.txt is a long plain-text file with thousands of possibly malformed email addresses, one per line).

 [root@host dir]# egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt

I have tried surrounding the regex with double quotes instead of single quotes, but it made no difference. Do I need to add some backslashes to the regex?

SOLVED! Thank you! My file was created in Windows, and the extra CR in the end-of-line markers did not agree with the dollar sign in the regex.


Single quotes should work with bash...

It works for me with this simple case:

echo test@test.com | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
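As a quick check that quoting is not the culprit, here is a small sketch that stores the pattern once in single quotes and once in double quotes and compares the two. The variable names `single` and `double` are made up for illustration. For this particular regex the shell should hand both forms to egrep unchanged, because the backslashes are followed by dots and the dollar sign sits right before the closing quote.

```shell
#!/bin/sh
# The regex from the question, quoted two ways (variable names are illustrative).
single='^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
double="^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$"

# Print what the shell will actually pass to egrep.
printf '%s\n' "$single"

# Compare the two forms after the shell has processed the quotes.
if [ "$single" = "$double" ]; then
    echo "identical"
else
    echo "different"
fi
```

If this prints "identical", the shell is not rewriting the pattern, and the cause is more likely in the input file than in the quoting.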

In your text file, each line must contain only the email address. Any extra whitespace on the line will prevent a match. For example, this doesn't print anything:

echo " test@test.com" | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'

Your problem might be that you have a DOS-formatted file. In that case, the extra \r at the end of each line keeps the regex from matching, because the $ anchor no longer lines up with the last character of the address. You can run dos2unix on the file, or make your regex less restrictive by removing the beginning and end anchors (at the cost of also matching lines that merely contain an address):

egrep '[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})'
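If dos2unix is not installed, a third option is to keep the anchors and strip the carriage returns on the fly with tr. The sketch below reproduces the failure on a throwaway file and then shows the fix. The sample addresses and the variable names (`pattern`, `sample`, `crlf_matches`, `clean_matches`) are assumptions for illustration, and `grep -E` is used as the equivalent of egrep.

```shell
#!/bin/sh
# The anchored regex from the question.
pattern='^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'

# Build a small sample file with Windows (CRLF) line endings.
sample=$(mktemp)
printf 'aa@bb.com\r\nnot an email\r\ncc@dd.ee.ff\r\n' > "$sample"

# With the CRs in place, the anchored pattern matches no lines.
crlf_matches=$(grep -E -c "$pattern" "$sample")

# Delete every CR before matching; the file on disk is left untouched.
clean_matches=$(tr -d '\r' < "$sample" | grep -E -c "$pattern")

echo "with CR: $crlf_matches, without CR: $clean_matches"
# prints: with CR: 0, without CR: 2

rm -f "$sample"
```

Applied to the real file, the same idea would be `tr -d '\r' < emails.txt | egrep '...'` with the original pattern.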


Works for me:

JPP-MacBookPro-4:tmp jpp$ cat emails.txt
aa@bb.com
bb@cc.com
not an email
cc@dd.ee.ff

JPP-MacBookPro-4:tmp jpp$ egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt
aa@bb.com
bb@cc.com
cc@dd.ee.ff
JPP-MacBookPro-4:tmp jpp$

Beware trailing whitespace, tabs, and carriage returns: they have a way of biting regexes.
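One way to spot such characters is to make them visible. This is a minimal sketch using sed's `l` command, which writes each line unambiguously: a `$` marks the true end of the line and a carriage return shows up as `\r`. The three sample lines and the variable names `visible` and `suspect` are made up for illustration.

```shell
#!/bin/sh
# Three sample lines: clean, trailing space, trailing carriage return.
visible=$(printf 'aa@bb.com\nbb@cc.com \ncc@dd.ee.ff\r\n' | sed -n 'l')
printf '%s\n' "$visible"
# aa@bb.com$
# bb@cc.com $
# cc@dd.ee.ff\r$

# Count lines that end in any whitespace character (space, tab, CR, ...).
suspect=$(printf 'aa@bb.com\nbb@cc.com \ncc@dd.ee.ff\r\n' | grep -c '[[:space:]]$')
echo "lines with trailing whitespace: $suspect"
# prints: lines with trailing whitespace: 2
```

On the real file, `sed -n 'l' emails.txt | head` should show quickly whether each line ends in `\r$` rather than a bare `$`.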

There is a great reference on shell quoting here: http://www.mpi-inf.mpg.de/~uwe/lehre/unixffb/quoting-guide.html
