Why does this regex match incorrect characters?

Developer https://www.devze.com 2023-04-11 10:23 Source: Internet
I need to match these characters. This quote is from an API documentation (external to our company):

Valid characters: 0-9 A-Z a-z & # - . , ( ) / : ; ' @ "

I used this regex to match the characters:

^[0-9a-z&#-\.,()/:;'""@]*$

However, this wrongly matches characters such as %, $, and many others. What's wrong?

You can test this regular expression online using http://regexhero.net/tester/, and this regular expression is meant to work in both .NET and JavaScript.


You are not escaping the dash (-), which has a special meaning inside a character class. If you replace the dash with \-, the regex no longer matches the characters in the range between # and . (the \. that follows the dash is read as an escaped dot).


Move the literal - to the front of the character set:

^[-0-9a-z&#\.,()/:;'""@]*$

Otherwise it is taken as specifying a range, just as it does in 0-9.


The - sign, when not escaped, has a special meaning inside square brackets. #-\. is interpreted as #-. (by the way, the backslash before the dot is not necessary inside square brackets), which means "any character between # (ASCII 0x23) and . (ASCII 0x2E)". The correct notation is

^[0-9a-z&#\-.,()/:;'"@]*$
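To make the accidental range concrete, here is a minimal JavaScript sketch that lists every character between # and . and then compares the original pattern with the escaped one. The variable names and the sample string "%$*+" are mine, chosen only for illustration.

```javascript
// Enumerate the characters covered by the accidental range #-. (0x23 to 0x2E).
const start = "#".charCodeAt(0);
const end = ".".charCodeAt(0);
const inRange = [];
for (let code = start; code <= end; code++) {
  inRange.push(String.fromCharCode(code));
}
console.log(inRange.join(" ")); // # $ % & ' ( ) * + , - .

// Original pattern from the question: the unescaped - creates the range above.
const original = /^[0-9a-z&#-\.,()/:;'""@]*$/;
// Pattern with the hyphen escaped: - is now just a literal character.
const fixed = /^[0-9a-z&#\-.,()/:;'"@]*$/;

console.log(original.test("%$*+")); // true  (all four fall inside #-.)
console.log(fixed.test("%$*+"));    // false (none of them is in the set)
```

The range adds $, %, * and + to the set; the other characters in it were already on the list of valid characters.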


The special characters in a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).

As such, you should either escape them with a backslash (\), or put them in a position where there is no ambiguity and they do not need escaping. In the case of a hyphen, this would be the first or last position.

You also do not need to escape the dot (.).

Your regex thus becomes:

^[-0-9a-z&#.,()/:;'"@]*$
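As a rough check of the placement rule above, the following JavaScript sketch compares three spellings of the same class: hyphen first, hyphen last, and hyphen escaped. The sample strings are hypothetical, and the patterns are kept lowercase-only as in the question (add A-Z or a case-insensitive flag if uppercase input must pass).

```javascript
// Three ways to include a literal hyphen in the character class.
const hyphenFirst = /^[-0-9a-z&#.,()/:;'"@]*$/;
const hyphenLast = /^[0-9a-z&#.,()/:;'"@-]*$/;
const hyphenEscaped = /^[0-9a-z&#\-.,()/:;'"@]*$/;

// Hypothetical inputs: the first three use only valid characters,
// the last three each contain one character that must be rejected.
const samples = ["a-b", "x@y.z", "(1,2);#3", "50%", "a$b", "a+b"];

const results = samples.map((s) => ({
  input: s,
  first: hyphenFirst.test(s),
  last: hyphenLast.test(s),
  escaped: hyphenEscaped.test(s),
}));

// Each row should show the same verdict in all three columns.
console.table(results);
```

All three variants should agree on every input, so which one to use is mostly a matter of readability.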


As a side note, many regex evaluators provide code hinting: you hover your mouse over part of the regular expression and it is explained in plain English. One free option is RegExr.

Typing your original regex in it and hovering over the hyphen shows:
Matches characters in the range '#-\'


Try this:

^[0-9a-zA-Z\&\#\-\.\,\(\)\/\:\;\'\"\@]*$
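Escaping every symbol works, but of all those backslashes only the one before the hyphen changes the meaning. A small JavaScript sketch to check this, assuming a non-Unicode-mode regex (no u flag), where the extra backslashes are simply treated as literal escapes; the variable names are mine:

```javascript
// Fully escaped pattern from the answer above.
const escaped = /^[0-9a-zA-Z\&\#\-\.\,\(\)\/\:\;\'\"\@]*$/;
// Same set with no escapes: the hyphen is placed first instead.
const minimal = /^[-0-9a-zA-Z&#.,()/:;'"@]*$/;

// Test every single ASCII character against both patterns
// and collect any character on which they disagree.
const disagreements = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (escaped.test(ch) !== minimal.test(ch)) {
    disagreements.push(ch);
  }
}

console.log(disagreements.length); // 0: both patterns accept the same characters
```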