Two very close regexes with lookahead assertions in Python - why does re.split() behave differently?

Developer https://www.devze.com 2023-03-21 02:52 Source: Internet

I was trying to answer this question where the OP has the following string:

"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"

and wants to split it to obtain the following list:

['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I tried to solve it by using a simple lookahead assertion in a regex, (?=path:). Well, it did not work:

>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']

However, in this answer, the answerer got it working by preceding the lookahead assertion with a whitespace:

>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
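To see the difference between the two patterns directly, here is a minimal sketch that uses re.finditer() (not part of the original posts) to list the (start, end) span of every match each pattern finds. A span whose start equals its end is a zero-length match. The variable names are only for illustration.

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

spans = {}
for pattern in ("(?=path:)", " (?=path:)"):
    # m.span() returns the (start, end) indices of a match;
    # start == end means the match consumed no characters.
    spans[pattern] = [m.span() for m in re.finditer(pattern, s)]
    lengths = [end - start for start, end in spans[pattern]]
    print(repr(pattern), spans[pattern], lengths)
```

The lookahead alone matches twice, at the start of the string and just before the second "path:", but both matches have length 0. The pattern with the space matches once, and that match is exactly one character long: the space itself.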

Why did the regex work with the whitespace? Why did it not work without the whitespace?

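Python's re.split() had a documented limitation: before Python 3.7, it could not split on zero-length matches. A lookahead such as (?=path:) consumes no characters, so every match it finds is empty. Older versions of re.split() ignored such matches and returned the string unchanged, as in the first transcript (Python 3.5 and 3.6 raise a ValueError instead for a pattern that can only match the empty string). The pattern ' (?=path:)' matches the space itself, so each match is one character long and the split works; as a side effect, the consumed space is dropped from the results. Python 3.7 lifted the restriction, so re.split('(?=path:)', s) does split there, but it returns a leading empty string and leaves a trailing space on the first element.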
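The following is a minimal sketch comparing the approaches. It assumes Python 3.7 or later for the lookahead-only split, which is guarded by a version check. The re.findall() variant is an alternative technique that neither post uses: it collects each record directly instead of splitting, so it does not depend on how re.split() treats empty matches. The variable names are hypothetical.

```python
import re
import sys

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

# The separator consumes at least one whitespace character, so every match
# is non-empty; this works on old and new Python versions alike and also
# removes the space between the records.
by_space = re.split(r"\s+(?=path:)", s)

# Alternative (not from the original answer): match each record instead of
# splitting. ".*?" grows lazily until the lookahead sees whitespace followed
# by "path:", or the end of the string.
by_findall = re.findall(r"path:.*?(?=\s+path:|$)", s)

if sys.version_info >= (3, 7):
    # Python 3.7+ splits on empty matches: the match at index 0 yields a
    # leading '' and the space before the second "path:" stays attached
    # to the first piece.
    by_lookahead = re.split(r"(?=path:)", s)
    # Drop empty pieces and trim the leftover whitespace.
    cleaned = [part.strip() for part in by_lookahead if part]
    print(by_lookahead)
    print(cleaned)

print(by_space)
print(by_findall)
```

Using \s+ instead of a single literal space in the split pattern is a small design choice: it also tolerates several spaces or a tab between records, while still guaranteeing a non-empty match.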