How can I compress repetitive characters to a single character using RE in Python?

Developers https://www.devze.com 2023-01-31 14:11 Source: Internet
I want to be able to replace any consecutive occurrences of punctuation characters in a string with a single occurrence. For example:

The first thing that came to mind was to:

import re
import string

for char in string.punctuation:
    text = re.sub(re.escape(char) + "+", lambda m: char, text)  # lambda keeps "\" literal

However, since this is going to run in a repetitive process, I was wondering if there is a way to achieve this in a single RE, in order to make it run faster. What do you think?


You could try:

text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)

This uses re.escape() so that punctuation characters with a special meaning inside a character class, such as ], ^, - and \, are escaped properly. The \1 backreference, in both the pattern and the replacement, refers to the parenthesized group, which holds the first punctuation character matched. The substitution therefore replaces a run of two or more occurrences of the same punctuation character with a single occurrence of that character.
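
Since the substitution is going to run repeatedly, one option is to compile the pattern once and reuse it. The following is only a minimal sketch of that idea, not part of the original answer; the names PUNCT_RUN and squeeze_punctuation and the sample strings are made up for illustration.

```python
import re
import string

# Compile once, reuse on every call. Hypothetical names, for illustration only.
PUNCT_RUN = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")


def squeeze_punctuation(text):
    """Collapse runs of the same punctuation character into one occurrence."""
    return PUNCT_RUN.sub(r"\1", text)


print(squeeze_punctuation("Wait... what??? No way!!"))  # Wait. what? No way!
print(squeeze_punctuation("Really?!?!"))                # Really?!?! (unchanged)
```

Note that the backreference only matches a repeat of the same character, so a mixed run such as "?!?!" is left as it is.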


text = re.sub(r'([!?.])\1+', r'\1', text)
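
This variant collapses only the three characters listed in the character class. A small sketch of the difference, using a made-up sample string:

```python
import re

sample = "Hmm... really??!! a--b"
# Only runs of "!", "?" or "." are collapsed; other repeated
# characters such as "-" are left alone.
result = re.sub(r"([!?.])\1+", r"\1", sample)
print(result)  # Hmm. really?! a--b
```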
