Regex: Match brackets both greedy and non greedy

https://www.devze.com 2023-03-07 01:57 Source: web

I'm using Python's regular expression module, re.

I need to match anything inside '(' ')' in these two phrases, but "not so greedy". Like this:

show the (name) of the (person)

calc the sqrt of (+ (* (2 4) 3))

The result for phrase 1 should be:

name
person

The result for phrase 2 should be:

+ (* (2 4) 3)

The problem is that, to fit the first phrase, I used '\(.*?\)'

On the second phrase, this matches only + (* (2 4)

And using '\(.*\)', which fits the second phrase correctly, matches (name) of the (person) on the first phrase.

Which regex works correctly on both phrases?

Pyparsing makes it easy to write simple one-off parsers for stuff like this:

>>> text = """show the (name) of the (person)
...
... calc the sqrt of (+ (* (2 4) 3))"""
>>> import pyparsing
>>> for match in pyparsing.nestedExpr('(',')').searchString(text):
...   print(match[0])
...
['name']
['person']
['+', ['*', ['2', '4'], '3']]

Note that the nesting parens have been discarded, and the nested text returned as a nested structure.

If you want the original text for each parenthetical bit, then use the originalTextFor modifier:

>>> for match in pyparsing.originalTextFor(pyparsing.nestedExpr('(',')')).searchString(text):
...   print(match[0])
...
(name)
(person)
(+ (* (2 4) 3))

What you're trying to do looks like a job for the shunting-yard algorithm (actually it looks like LISP, so maybe you should check out PyLisp). There is no need to use regexps to parse these kinds of expressions.

See the shunting-yard algorithm article on Wikipedia and its Python implementation.

This matches all of the required info:

(?:\()(.*?\){2})|(?:\()(.*?)(?:\))

Group 1 = + (* (2 4) 3)

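The raw capture ends with one extra ")". Remove it by slicing off the last character (for example group[:-1]); .strip(')') would remove both trailing parentheses.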

Group 2 = name, person
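For anyone who wants to try it, here is a minimal sketch of how the pattern above might be applied with re.finditer. The helper name extract is made up for illustration, and the slice assumes group 1 always ends with exactly one surplus ")".

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'(?:\()(.*?\){2})|(?:\()(.*?)(?:\))')

def extract(text):
    """Hypothetical helper: return the parenthesised contents PATTERN finds."""
    results = []
    for m in PATTERN.finditer(text):
        nested, flat = m.group(1), m.group(2)
        if nested is not None:
            # Group 1 includes the outermost ")", so drop exactly one character.
            results.append(nested[:-1])
        else:
            results.append(flat)
    return results

print(extract("show the (name) of the (person)"))   # ['name', 'person']
print(extract("calc the sqrt of (+ (* (2 4) 3))"))  # ['+ (* (2 4) 3)']
```

Note that the first alternative only fires when the text contains two consecutive closing parentheses, so an input such as (a (b) c) falls through to the lazy alternative and is cut at the first ")".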

As long as the brackets are not nested, you can use a lazy regex:

\(.*?\)
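A quick sketch of what that looks like on the two phrases from the question. The capture group and the function name lazy_contents are additions for illustration, so that only the text between the parentheses is returned.

```python
import re

# Lazy pattern from above, with a capture group added to drop the outer parentheses.
LAZY = re.compile(r'\((.*?)\)')

def lazy_contents(text):
    """Hypothetical helper: return what sits between each '(' and the next ')'."""
    return LAZY.findall(text)

print(lazy_contents("show the (name) of the (person)"))   # ['name', 'person']
print(lazy_contents("calc the sqrt of (+ (* (2 4) 3))"))  # ['+ (* (2 4']
```

The second call shows why this only works without nesting: the match stops at the first ")" it sees.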

While you can theoretically parse a limited amount of nesting with a regex, it's very hard and not worth the effort. It's much easier to do that with a custom Python function. See this answer for a good explanation.
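One possible shape for such a function, as a rough sketch rather than the code from the linked answer: walk the string, keep a depth counter, and record the text between each outermost "(" and its matching ")". The name top_level_groups is hypothetical, and this version silently ignores unbalanced parentheses.

```python
def top_level_groups(text):
    """Return the contents of each outermost (...) group in text."""
    groups = []
    depth = 0      # how many '(' are currently open
    start = None   # index just after the current outermost '('
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                # Closed an outermost group: keep everything inside it.
                groups.append(text[start:i])
    return groups

print(top_level_groups("show the (name) of the (person)"))   # ['name', 'person']
print(top_level_groups("calc the sqrt of (+ (* (2 4) 3))"))  # ['+ (* (2 4) 3)']
```

Unlike the regex approaches, this handles any nesting depth, because the counter only returns to zero at the parenthesis that closes the outermost group.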