
String.replaceAll is considerably slower than doing the job yourself

Source: https://www.devze.com, 2023-03-11 18:12 (reposted from the web)

I have an old piece of code that performs find and replace of tokens within a string.

It receives a map of from and to pairs, iterates over them and for each of those pairs, iterates over the target string, looks for the from using indexOf(), and replaces it with the value of to. It does all the work on a StringBuffer and eventually returns a String.
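The old code itself is not shown, so the following is only a rough sketch of what such an indexOf()-based replacer might look like. The class name TokenReplacer, the method name replaceTokens and the sample tokens are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction; the original code is not shown in the question.
final class TokenReplacer {

    // Replaces every occurrence of each key in `tokens` with its value.
    static String replaceTokens(String input, Map<String, String> tokens) {
        StringBuffer buf = new StringBuffer(input);
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            String from = e.getKey();
            String to = e.getValue();
            if (from.isEmpty()) {
                continue; // an empty token would match at every position
            }
            int idx = buf.indexOf(from);
            while (idx != -1) {
                buf.replace(idx, idx + from.length(), to);
                // Resume after the inserted text so replacements are not rescanned.
                idx = buf.indexOf(from, idx + to.length());
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put(",", "");
        tokens.put(".", "");
        tokens.put(" ", "");
        System.out.println(replaceTokens("a, b. c", tokens)); // prints "abc"
    }
}
```

No regular expression is involved here: each token is found with a plain substring search and the buffer is modified in place.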

I replaced that code with this line: replaceAll("[,. ]*", "");

And I ran some comparative performance tests.

When comparing for 1,000,000 iterations, I got this:

Old Code: 1287ms

New Code: 4605ms

3 times longer!

I then tried replacing it with 3 calls to replace:

replace(",", "");

replace(".", "");

replace(" ", "");

This produced the following results:

Old Code: 1295ms

New Code: 3524ms

2 times longer!

Any idea why replace and replaceAll are so inefficient? Can I do something to make it faster?


Edit: Thanks for all the answers. The main problem was indeed that [,. ]* did not do what I wanted it to do. Changing it to [,. ]+ brought the performance almost level with the non-regex solution. Using a pre-compiled regex helped, but only marginally. (It is a solution very applicable to my problem.)

Test code:

Replace string with Regex: [,. ]*

Replace string with Regex: [,. ]+

Replace string with Regex: [,. ]+ and Pre-Compiled Pattern
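The linked test code is not reproduced here. A minimal harness for comparing the three variants could look something like the sketch below; the class name ReplaceBenchmark, the sample input and the iteration count are all assumptions, and a naive timing loop like this is sensitive to JIT warm-up, so a dedicated tool such as JMH would give more trustworthy numbers.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical harness; names, input and iteration count are assumptions.
final class ReplaceBenchmark {
    private static final Pattern PRECOMPILED = Pattern.compile("[,. ]+");

    static String withStar(String s) { return s.replaceAll("[,. ]*", ""); }
    static String withPlus(String s) { return s.replaceAll("[,. ]+", ""); }
    static String withPrecompiled(String s) { return PRECOMPILED.matcher(s).replaceAll(""); }

    // Applies op to input `iterations` times and returns elapsed wall-clock milliseconds.
    static long timeMillis(UnaryOperator<String> op, String input, int iterations) {
        long checksum = 0; // consumed below so the work cannot be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += op.apply(input).length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (checksum < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        String input = "Hello, World. This is a test."; // made-up sample input
        int iterations = 1_000_000;
        System.out.println("[,. ]*           : " + timeMillis(ReplaceBenchmark::withStar, input, iterations) + "ms");
        System.out.println("[,. ]+           : " + timeMillis(ReplaceBenchmark::withPlus, input, iterations) + "ms");
        System.out.println("[,. ]+ compiled  : " + timeMillis(ReplaceBenchmark::withPrecompiled, input, iterations) + "ms");
    }
}
```

All three variants return the same string for the same input; only the amount of work the regex engine does differs.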


While using regular expressions does have some performance cost, it should not be this bad.

Note that using String.replaceAll() will compile the regular expression each time you call it.

You can avoid that by explicitly using a Pattern object:

Pattern p = Pattern.compile("[,. ]+");

// repeat only the following part:
String output = p.matcher(input).replaceAll("");

Note also that using + instead of * avoids replacing empty strings and therefore might also speed up the process.
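To get a feel for how much extra work * causes, one can simply count how many matches each pattern finds. This is only an illustrative sketch; the class name MatchCounter and the sample input are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class MatchCounter {

    // Counts how many times find() succeeds for regex on input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a, b. c";
        // With *, the empty string also matches before 'a', 'b', 'c' and at the end.
        System.out.println(countMatches("[,. ]*", input)); // prints 6
        // With +, only the two real separator runs ", " and ". " match.
        System.out.println(countMatches("[,. ]+", input)); // prints 2
    }
}
```

Every one of those extra empty matches still goes through the replacement machinery, even though it replaces "" with "".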


replace and replaceAll use regex internally, which in most cases carries a serious performance cost compared to, e.g., StringUtils.replace(..).

String.replaceAll():

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

In the JDK source quoted here, String.replace() also uses Pattern.compile underneath (newer JDK releases may implement it differently):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Also see Replace all occurrences of substring in a string - which is more efficient in Java?


As I said in a comment, [,. ]* matches the empty String "". So every position between two characters matches the pattern. The cost shows up in performance because you are replacing a lot of "" with "".

Try doing this:

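Pattern p = Pattern.compile("[,. ]*");
// quoteReplacement is needed because '$' denotes a group reference in replacement strings
System.out.println(p.matcher("Hello World!").replaceAll(Matcher.quoteReplacement("$$$")));

It returns:

$$$H$$$e$$$l$$$l$$$o$$$$$$W$$$o$$$r$$$l$$$d$$$!$$$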

No wonder it is slower than doing it "by hand"! You should try with [,. ]+


When it comes to replaceAll("[,. ]*", "") it's not that big of a surprise since it relies on regular expressions. The regex engine creates an automaton which it runs over the input. Some overhead is expected.

The second approach (replace(",", "")...) also uses regular expressions internally. Here, however, the given pattern is compiled using Pattern.LITERAL, so the regular expression overhead should be negligible. In this case the slowdown is probably due to the fact that Strings are immutable (however small the change, you create a new string), which makes them less efficient than StringBuffers, which manipulate the string in place.
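If the set of characters to remove is fixed, one way to avoid both the regex engine and the intermediate strings is a single pass over the input. The following is just a sketch under that assumption; the class name CharStripper and method name stripSeparators are made up, and it handles only the three characters from the question.

```java
// Hypothetical single-pass alternative; not the asker's original code.
final class CharStripper {

    // Returns input with every ',', '.' and ' ' removed, building the result once.
    static String stripSeparators(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != ',' && c != '.' && c != ' ') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripSeparators("Hello, World. ")); // prints "HelloWorld"
    }
}
```

Three chained replace() calls scan the text three times and may allocate up to three new strings; this version scans once and allocates one builder plus the final string.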
