How can I handle multiple parenthesis in a regex?

Developer (开发者) https://www.devze.com 2022-12-29 04:26 Source: Internet

I have strings of this type:

text (more text)

What I would like to do is to have a regular expression that extracts the "more text" segment of the string. So far I have been using this regular expression:

"^.*\\((.*)\\)$"

Although it works in many cases, it seems to fail if I have something of this sort:

text (more text (even more text))

What I get is: even more text)

What I would like to get instead is: more text (even more text) (basically the content of the outermost pair of brackets.)


Besides lazy quantification, another way is:

"^[^(]*\\((.*)\\)$"

In both regexes, there is an explicitly specified left parenthesis ("\\(", with Java String escaping) immediately before the capturing group. In the original, there was a .* before that, allowing anything (including other left parentheses). In mine, left parentheses are not allowed there (it is a negated character class), so the explicitly specified left parenthesis is the outermost one.


I recommend this (double escaping of the backslash removed, since this is not part of the regex):

^[^(]*\((.*)\)

Matching with your version (^.*\((.*)\)$) occurs like this:

  1. The star matches greedily, so your first .* goes right to the end of the string.
  2. Then it backtracks just as much as necessary so the \( can match - that would be the last opening paren in the string.
  3. Then the next .* goes right to the end of the string again.
  4. Then it backtracks just as much so the \) can match, i.e. to the last closing paren.

When you use [^(]* instead of .*, it can't go past the first opening paren, so the first opening paren (the correct one) in the string will delimit your sub-match.
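To see the difference in practice, here is a minimal Java sketch that runs both patterns against the nested input from the question. The class name OuterParenDemo and the helper extract are made up for illustration; only java.util.regex comes from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OuterParenDemo {
    // The question's original pattern: the leading .* is greedy.
    static final Pattern GREEDY = Pattern.compile("^.*\\((.*)\\)$");
    // The suggested fix: the prefix may not contain '('.
    static final Pattern NEGATED = Pattern.compile("^[^(]*\\((.*)\\)$");

    // Hypothetical helper: returns capture group 1, or null if there is no match.
    static String extract(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "text (more text (even more text))";
        System.out.println(extract(GREEDY, input));  // even more text)
        System.out.println(extract(NEGATED, input)); // more text (even more text)
    }
}
```

The greedy prefix backtracks only as far as the last opening paren, while the negated class stops at the first one.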


Try:

"^.*?\\((.*)\\)$"

That should make the first matching less greedy. Greedy means it swallows everything it possibly can while still getting an overall pattern match.
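As a rough illustration of the lazy version, the following Java sketch applies it to both inputs from the question. LazyPrefixDemo and capture are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyPrefixDemo {
    // .*? is reluctant: it takes as few characters as possible,
    // so the literal \( ends up on the first opening paren.
    static final Pattern LAZY = Pattern.compile("^.*?\\((.*)\\)$");

    // Hypothetical helper: returns capture group 1, or null if there is no match.
    static String capture(String input) {
        Matcher m = LAZY.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("text (more text)"));
        // more text
        System.out.println(capture("text (more text (even more text))"));
        // more text (even more text)
    }
}
```

Note that the capturing group itself stays greedy, which is what lets it run to the last closing paren.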

The other suggestion:

"^[^(]*\\((.*)\\)$"

It might be more along the lines of what you're looking for, though. For this simple example it doesn't matter much, but it could if you wanted to extend the regex, for example by making the part inside the parentheses optional.


Try this:

"^.*?\\((.*)\\)$"


True regular expressions can't count parentheses; this requires a pushdown automaton. Some regex libraries have extensions to support this, but I don't think Java's does (could be wrong; Java isn't my forte).

BTW, the other answers I've seen so far will work with the example given, but will break with, e.g., text (more text (even more text)) (another bit of text). Changing greediness doesn't make up for the inability to count.
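If counting is what is needed, one option is to skip the regex and track nesting depth by hand. The sketch below is only one way to do it, not something proposed in the answers; the class OutermostParens and the method firstGroup are invented for this example, and it assumes you want the first top-level group.

```java
class OutermostParens {
    // Returns the content of the first top-level (...) group, or null if
    // the string has no '(' or the parentheses never balance.
    static String firstGroup(String s) {
        int start = s.indexOf('(');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;                 // one level deeper
            } else if (c == ')' && --depth == 0) {
                // Back at depth 0: this ')' closes the first '('.
                return s.substring(start + 1, i);
            }
        }
        return null;                     // unbalanced input
    }

    public static void main(String[] args) {
        String input = "text (more text (even more text)) (another bit of text)";
        System.out.println(firstGroup(input)); // more text (even more text)
    }
}
```

For comparison, "^[^(]*\\((.*)\\)$" on the same input captures more text (even more text)) (another bit of text, because the greedy group runs to the last closing paren.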


$str =~ /^.*?\((.*)\)/


I think the reason is that your second wildcard is picking up the closing parenthesis. You'll need to exclude it.
