Python Regex Capture Only Certain Text [duplicate]

开发者 https://www.devze.com 2023-04-06 01:27 Source: Internet
This question already has answers here: Python extract pattern matches (10 answers) Closed 4 years ago.

I am trying to find functionality in Python similar to Ruby's scan method. My goal is to collect all the text between pairs of curly braces into a list. If there are multiple pairs of curly braces in the string, I want multiple entries in the list.

When I run this code:

 match = re.search(r'\{(.+)\}', request.params['upsell'])
 print(match.group())

I match the right text. However, what is captured includes the curly braces. I want everything in between, but not the curly braces themselves. Thanks!


Use group(1), or lookbehinds/lookaheads. (Also, be sure to take the advice of F.J. and J.F. and use either .+? or [^{}]* instead of .+.)

import re
match = re.search(r'\{(.+)\}', "asdfasd {asdf}asdfasdf")
# group(1) is the first capture group: the text between the braces.
print(match.group(1))

or with lookbehinds/aheads:

import re
match = re.search(r'(?<=\{)(.+)(?=\})', "asdfasd {asdf}asdfasdf")
# The braces are only asserted, not consumed, so group() excludes them.
print(match.group())


re.findall(r'\{(.+?)\}', request.params['upsell'])

This will return a list where each entry is the contents of a different group of curly braces. Note that this will not work for nested braces.

The ? after the .+ will make it a lazy match (as opposed to greedy). This means that the match will stop at the first "}", instead of continuing to match as many characters as possible and ending on the last closing brace.
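To illustrate the difference, here is a small sketch comparing the greedy and lazy patterns. The sample string `text` is made up for illustration and is not from the question:

```python
import re

# Hypothetical sample input containing two pairs of braces.
text = "a {first} b {second} c"

# Greedy: .+ runs on to the last closing brace, so both pairs merge into one match.
greedy = re.findall(r'\{(.+)\}', text)

# Lazy: .+? stops at the first closing brace after each opening brace.
lazy = re.findall(r'\{(.+?)\}', text)

print(greedy)  # ['first} b {second']
print(lazy)    # ['first', 'second']
```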

re.findall() searches through your string for all non-overlapping matches and, because the pattern contains a single capture group, returns a list of that group's text. Alternatively, you could use re.finditer(), which iterates over Match objects, but then you would need match.group(1) to get only what is inside the braces. This is also what you would need to change in your example: match.group() returns the entire match, not the captured group. To get a captured group, pass the number of the group you want.
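As a rough sketch of the re.finditer() alternative, the following shows how group() and group(1) differ on each Match object. The sample string `text` is again an assumption for illustration:

```python
import re

# Hypothetical sample input containing two pairs of braces.
text = "a {first} b {second} c"

# group() (equivalent to group(0)) is the whole match, braces included.
whole = [m.group() for m in re.finditer(r'\{(.+?)\}', text)]

# group(1) is only the first capture group, i.e. the text between the braces.
inner = [m.group(1) for m in re.finditer(r'\{(.+?)\}', text)]

print(whole)  # ['{first}', '{second}']
print(inner)  # ['first', 'second']
```

re.finditer() is mostly worth the extra step when you also need match positions, for example m.start() and m.end(); for just the captured text, re.findall() is shorter.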


>>> import re
>>> re.findall(r'{([^{}]*)}', '{a} { {b} c { {d} } }')
['a', 'b', 'd']