
Matching everything but words, numbers and spaces

Developer https://www.devze.com 2023-04-02 03:23 Source: Internet
This code will replace everything except for words, but how do I get it to also leave the numbers and spaces untouched? e.g. "I didn't see him until 1." -> "I didnt see him until 1"

text = regex.sub("\P{alpha}+","",text)


Don’t use Python’s re library on Unicode. It works very poorly. Use Matthew Barnett’s regex library instead. It works much, much better.

It also runs on both Python 2 and 3 and on both narrow and wide builds, but for reasons largely unrelated to that particular library I strongly recommend that you run only a wide build of Python 3 and eschew all other combinations.
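
Applied to the question, that advice might look like the sketch below. It assumes the third-party regex package is installed (pip install regex); the function name strip_punctuation is only illustrative and not part of any library.

```python
import regex  # third-party package, assumed installed: pip install regex


def strip_punctuation(text):
    # Hypothetical helper name. Deletes every run of characters that is
    # not a Unicode letter (\p{L}), a Unicode number (\p{N}), or whitespace (\s).
    return regex.sub(r"[^\p{L}\p{N}\s]+", "", text)


print(strip_punctuation("I didn't see him until 1."))  # I didnt see him until 1
```

Negating a single character class keeps the three "allowed" groups in one place, so adding another group later only means adding one more item inside the brackets.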


Python's built-in re module doesn't support Unicode properties such as \p{Alpha}. If ASCII text is all you need, you can try an explicit character class:

text = re.sub("[^a-zA-Z0-9 ]+","",text)

If you do have a regex engine with Unicode property support installed, something like Ponyguruma, you can instead use:

text = re.sub(r"[^\p{Alnum}\p{Z}]+","",text) # keeps alphanumerics and separators; \p{Z} is shorthand for \p{Separator}
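
If the ASCII-only class is too narrow and no third-party engine is available, one possible middle ground is the standard re module's \w and \s classes. This is a minimal sketch, assuming Python 3 and a str (not bytes) input; the function name keep_words_numbers_spaces is hypothetical.

```python
import re


def keep_words_numbers_spaces(text):
    # Hypothetical helper name. On Python 3 str patterns, \w matches Unicode
    # letters and digits plus "_", and \s matches whitespace. The "_+"
    # alternative removes the underscores that \w would otherwise keep.
    return re.sub(r"[^\w\s]+|_+", "", text)


print(keep_words_numbers_spaces("I didn't see him until 1."))  # I didnt see him until 1
print(keep_words_numbers_spaces("café_au_lait: 42!"))
```

Note that \s keeps tabs and newlines as well as plain spaces, which is slightly broader than the single space in the ASCII class above.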