Regex for finding "real" 3-digit sequences (ignoring those embedded in 4-digit sequences)

Source: https://www.devze.com, 2023-04-12 12:00 (reposted from the web)
I'd like a regex (using Java) that captures three digits such as "876", but not if they are buried inside a 4-digit sequence.

To capture "876" within "876" and "foo876" and " 876 " and "876" and "food876" and "4foo876".

But NOT within "88foo9876" or "9876" or "a8876" or "a8876foo".

How do I do this?

I want to say something like X(\d\d\d)X, but in place of the first X in that to say "\D or ^ (start-string)" and in place of the second X in that to say "\D or $ (end-string)".

Edit:

For answers, see Xanatos, also Code Jockey, and Tim Pietzcker.


well, then! for X(\d\d\d)X as you asked for, use

(?<=\D|^)(\d\d\d)(?=\D|$)

which is

(?<=\D|^)      # lookbehind for «\D or ^ (start-string)»
(\d\d\d)       # then match «three digits such as "876"»
(?=\D|$)       # lookahead for «\D or $ (end-string)»

and will

...capture "876" within "876" and "foo876" and " 876 " and "876" and "food876".

But NOT within "88foo9876" or "9876" or "a8876" or "a8876foo".

as you specified :D
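
Since the question asks about Java, here is a minimal sketch of how that pattern might be tried against the question's own sample strings. The class name ThreeDigitLookbehind and the helper firstMatch are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreeDigitLookbehind {
    // The pattern from the answer above: a non-digit or start-of-string before,
    // three digits, then a non-digit or end-of-string after.
    static final Pattern P = Pattern.compile("(?<=\\D|^)(\\d\\d\\d)(?=\\D|$)");

    // Returns the first standalone three-digit run, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"876", "foo876", " 876 ", "food876", "4foo876",
                            "88foo9876", "9876", "a8876", "a8876foo"};
        for (String s : samples) {
            // Prints the captured digits, or "null" when nothing qualifies.
            System.out.println("\"" + s + "\" -> " + firstMatch(s));
        }
    }
}
```

The first five samples should print 876 and the last four should print null.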

Here it is shown in RegexBuddy: [screenshot not preserved]

If you're using a regex flavor without lookbehind (such as JavaScript before ES2018), you'll have to use either

(\D|^)(\d\d\d)(?=\D|$)     # and use the second capturing group -or-
                           # use
(?:\D|^)(\d\d\d)(?=\D|$)   # and use the first capturing group


EDIT: Updated according to clarified specs:

(?<!\d)\d{3}(?!\d)

Explanation:

(?<!\d) # Assert that there is no digit before the current position
\d{3}   # Match exactly three digits
(?!\d)  # Assert that there is no digit after the current position
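
As a rough illustration in Java, the negative lookarounds can be used with Matcher.find() to collect every standalone three-digit run in a longer string. The class name StandaloneTriples and the method findAll are hypothetical, chosen only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StandaloneTriples {
    // Three digits with no digit immediately before or after.
    static final Pattern P = Pattern.compile("(?<!\\d)\\d{3}(?!\\d)");

    // Collects every standalone three-digit run, in order of appearance.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // 4567 and 6789 are four digits long and 89 is only two, so they are skipped.
        System.out.println(findAll("a123 4567 89 012,345x6789")); // [123, 012, 345]
    }
}
```

Because the lookarounds consume no characters, neighbouring runs separated by a single character are all found.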

(initial version preserved for archival purposes :))

^\D*\d{3}$

if I understand you correctly.

Explanation:

^     # start of string
\D*   # zero or more non-digits
\d{3} # exactly 3 digits
$     # end of string
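
To see why this initial version does not fit the clarified requirements, here is a small Java check. It is a sketch only, and the names InitialVersionCheck and accepts are invented for the example. The pattern requires the three digits to sit at the very end and allows only non-digits before them.

```java
import java.util.regex.Pattern;

final class InitialVersionCheck {
    // Start of string, any non-digits, exactly three digits, end of string.
    static final Pattern P = Pattern.compile("^\\D*\\d{3}$");

    // True when the whole input has the shape "non-digits followed by three digits".
    static boolean accepts(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts("876"));     // true
        System.out.println(accepts("foo876"));  // true
        System.out.println(accepts("9876"));    // false
        System.out.println(accepts("4foo876")); // false, although the question wants 876 here
        System.out.println(accepts(" 876 "));   // false, the trailing space blocks $
    }
}
```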


^\D*\d{3}$

The above works, but your requirements are a little vague. Non-digit literally means any character that is not a digit, so everything else is allowed, even spaces.


(?<!\d)(\d{3})(?!\d)

Test here: http://gskinner.com/RegExr/?2utct

This uses zero-width lookaround assertions. It means 3 digits not preceded by a digit and not followed by a digit. The only thing captured is the 3 digits.

Note that if you are using .NET, you should use [0-9] instead of \d so that you do not match things like U+09E6 ০ BENGALI DIGIT ZERO (the ০ is your digit :-) )
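
For Java, which the question targets, \d matches only ASCII digits by default and becomes Unicode-aware when Pattern.UNICODE_CHARACTER_CLASS is set. The sketch below compares the two modes on the Bengali zero; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class DigitClasses {
    // U+09E6 BENGALI DIGIT ZERO, written as an escape to keep the source ASCII.
    static final String BENGALI_ZERO = "\u09E6";

    // Default Java behaviour: \d is equivalent to [0-9].
    static boolean asciiDigit(String s) {
        return Pattern.compile("\\d").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS, \d follows the Unicode digit property.
    static boolean unicodeDigit(String s) {
        return Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiDigit("7"));            // true
        System.out.println(asciiDigit(BENGALI_ZERO));   // false
        System.out.println(unicodeDigit(BENGALI_ZERO)); // true
    }
}
```

So in Java the plain \d version already behaves like [0-9] unless that flag (or the embedded (?U) form) is switched on.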


I’m assuming that what you actually want is a regexp that matches legal variable names as defined by many programming languages. Let’s say you’re after strings with at least one non-digit at the beginning, then anything: that would be /^\D+.*/ (your mileage may vary, depending on the programming language). Of course, if I’m right in my assumption, \D is actually not at all what you want at the beginning; you’d rather want a list of characters that can legally start a variable (roughly, alphabetical characters, plus the underscore, and possibly a few other characters). Hence that would be more like /^[A-Za-z_]+.*/

But you really need to be more specific, as has already been said.


This is a regex that will match a sequence of 3 digits not immediately preceded or followed by another digit.

[^\d](\d{3})[^\d]

The caret (^) negates the character class, which means it matches anything except digits. The {3} specifies how many digits need to be in the digit sequence in the middle.

Edit: sorry, I didn't test it on strings in which the wanted sequence sits at the very start and/or end of the string. This should fix that. It originally produced a few extra captures that you could ignore, but I fixed that too, since I'm too much of a perfectionist.

(?:^|[^\d])(\d{3})(?:[^\d]|$)

Some more explanation, in parts.

(?:^|[^\d]) ; the ?: makes the group (everything, between the () brackets) non-capturing. ^|[^\d] means either the start of the string, or anything that isn't a digit.

(\d{3}) ; capture group of exactly 3 digits

(?:[^\d]|$) ; basically does the same as the beginning but then with the end of a string or anything that is not a digit...
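
One thing worth checking with this variant: the non-capturing groups still consume the neighbouring non-digit, so two runs separated by a single character compete for it. The Java sketch below compares it with the lookaround form on such input; ConsumingVsLookaround and groups are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConsumingVsLookaround {
    // Variant from this answer: the delimiters are part of the match.
    static final Pattern CONSUMING =
            Pattern.compile("(?:^|[^\\d])(\\d{3})(?:[^\\d]|$)");
    // Lookaround variant: the delimiters are only inspected, never consumed.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    // Collects capture group 1 of every match of p in input.
    static List<String> groups(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "123 456 789";
        // The space after 123 is consumed, so 456 has no leading delimiter left.
        System.out.println(groups(CONSUMING, text));  // [123, 789]
        System.out.println(groups(LOOKAROUND, text)); // [123, 456, 789]
    }
}
```

For a single match per string the two behave the same; the difference only shows up when scanning for multiple matches.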
