开发者

In C++, what's the fastest way to replace all occurrences of a substring within a string with another string?

开发者 https://www.devze.com 2023-04-12 17:32 出处:网络
I\'m looking for the most efficient (in terms of \"fastest\") way to replace all occurrences of a substring within a string with another string. All I\'ve came up with so far is:

I'm looking for the most efficient (in terms of "fastest") way to replace all occurrences of a substring within a string with another string. All I've came up with so far is:

std::string StringReplaceAll(const std::string &cstSearch, const std::string &cstReplace, const std::string &cstSubject)
{
    if(cstSearch.length() > cstSubject.length() || cstSearch == cstReplace || cstSubject.empty() || cstSearch.empty() || cstSubject.find(cstSearch) == std::string::npos)
    {
        return cstSubject;
    }

    std::ostringstream                                  ossReturn;
    std::string::const_iterator                         ci(cstSubject.cbegin());
    const std::string::const_iterator::difference_type  ciFindSize(std::distance(cstSearch.cbegin(), cstSearch.cend()));

    for(std::string::const_iterator ciNow; (ciNow = std::search(ci, cstSubject.cend(), cstSearch.cbegin(), cstSearch.cend())) != cstSubject.cend(); ci = ciNow)
    {
        std::copy(ci, ciNow, std::ostreambuf_iterator<char> (ossReturn));
        std::copy(cstReplace.cbegin(), cstReplace.cend(), std::ostreambuf_iterator<char> (ossReturn));
        std::advance(c开发者_如何转开发iNow, ciFindSize);
    }

    std::copy(ci, cstSubject.cend(), std::ostreambuf_iterator<char> (ossReturn));

    return ossReturn.str();
}

... and this one is way(!!!) too slow for my needs :-(

Looking forward to learn from you guys!


First, I'd use std::string, rather than std::ostringstream, to build up the results; std::ostringstream is for formatting, and there's no formatting to be done here. Other than that, you've got basically the correct algorithm; using std::search to find where the next replacement should be done. I'd use a while loop to make things a bit more readable, which gives:

std::string
replaceAll( std::string const& original,
            std::string const& before,
            std::string const& after )
{
    std::string retval;
    std::string::const_iterator end     = original.end();
    std::string::const_iterator current = original.begin();
    std::string::const_iterator next    =
            std::search( current, end, before.begin(), before.end() );
    while ( next != end ) {
        retval.append( current, next );
        retval.append( after );
        current = next + before.size();
        next = std::search( current, end, before.begin(), before.end() );
    }
    retval.append( current, next );
    return retval;
}

(Note that using std::string::append will be faster than using std::copy; the string knows how many it must add, and can resize the string accordingly.)

Afterwards, it would be trivial to catch the special case where there is nothing to replace, and return the initial string immediately; there might be some improvements to be had using std::string::reserve as well. (If before and after have the same length, retval.reserve( original.size() ) is a clear win. Even if they don't, it could be a win. As for first counting the number of substitutions, then exactly calculating the final size, I don't know. You'll have to measure with your actual use cases to find out.)


I asked about this same thing at http://hardforum.com/showthread.php?t=979477 years ago.

I don't remember it all that well, but the following code was in comment #31 and I think it was faster than my other attempts (but not faster than MikeBlas' metered_string example):

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>

using namespace std;

inline string replaceAll(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size() || s.find(f) == string::npos) {
        return s;
    }
    ostringstream build_it;
    typedef string::const_iterator iter;
    iter i(s.begin());
    const iter::difference_type f_size(distance(f.begin(), f.end()));
    for (iter pos; (pos = search(i , s.end(), f.begin(), f.end())) != s.end(); ) {
        copy(i, pos,  ostreambuf_iterator<char>(build_it));
        copy(r.begin(), r.end(), ostreambuf_iterator<char>(build_it));
        advance(pos, f_size);
        i = pos;
    }
    copy(i, s.end(), ostreambuf_iterator<char>(build_it));
    return build_it.str();
}

int main() {
    const string source(20971520, 'a');
    const string test(replaceAll(source, "a", "4"));
}

See the thread for more examples and lots of discussion.

If I remember correctly, it was really easy to make things faster than boost's replace_all.

Here's a clearer c++0x version:

#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <ostream>
using namespace std;

string replace_all_copy(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size()) {
        return s;
    }
    ostringstream buffer;
    auto start = s.cbegin();
    while (true) {
        const auto end = search(start , s.cend(), f.cbegin(), f.cend());
        copy(start, end,  ostreambuf_iterator<char>(buffer));
        if (end == s.cend()) {
            break;
        }
        copy(r.cbegin(), r.cend(), ostreambuf_iterator<char>(buffer));
        start = end + f.size();
    }
    return buffer.str();
}

int main() {
    const string s(20971520, 'a');
    const string result = replace_all_copy(s, "a", "4");
}

// g++ -Wall -Wextra replace_all_copy.cc -o replace_all_copy -O3 -s -std=c++0x


I think std::search uses a trivial algorithm, at least the reference states the complexity of a naiive string matching algorithm. If you replace it by an implementation of Boyer-Moore, you should be able to see significant performances increases.

Apart from that you are heavily dependent on good compiler optimizations. By returning the string instead of passing a string* result, you cause unnecessary copying of the result. However, compilers may help you there. But just to be sure you can alos try passing a pointer to the result as argument and appending to that string. However, swapping std::search for an implementation of a non trivial algorithm (boyer-moore as mentioned above or knuth-morris-pratt) should have an impact that is larger by several orders of magnitude compared to fine-tuning the return mechanism.


Try this one.

template<class T> inline void Replace(T& str, const T& str1, const T& str2)
{
    const T::size_type str2Size(str2.size());
    const T::size_type str1Size(str1.size());

    T::size_type n = 0;
    while (T::npos != (n = str.find(str1, n))) {
        str.replace(n, str1Size, str2);
        n += str2Size;
    }
}

std::string val(L"abcabcabc");
Replace(val, L"abc", L"d");


I found this one to be faster:

typedef std::string String;

String replaceStringAll(String str, const String& old, const String& new_s) {
    if(!old.empty()){
        size_t pos = str.find(old);
        while ((pos = str.find(old, pos)) != String::npos) {
             str=str.replace(pos, old.length(), new_s);
             pos += new_s.length();
        }
    }
    return str;
}

A comparison with James kanzes replaceAll function:

replaceAll      : 28552
replaceStringAll: 33518
replaceAll      : 64541
replaceStringAll: 14158
replaceAll      : 20164
replaceStringAll: 13651
replaceAll      : 11099
replaceStringAll: 5650
replaceAll      : 23775
replaceStringAll: 10821
replaceAll      : 10261
replaceStringAll: 5125
replaceAll      : 10283
replaceStringAll: 5374
replaceAll      : 9993
replaceStringAll: 5664
replaceAll      : 10035
replaceStringAll: 5246
replaceAll      : 8570
replaceStringAll: 4381

Times are calculated in nanoseconds with std::chrono::high_resolution_clock


I would propose an optimization step: make a preliminary pass to check if there is anything to replace at all. If there is nothing to replace, just return and avoid allocating memory and copying.


i tested 3 ways to replace substring

  1. boost_replace_all
  2. regex_replace_all
  3. basic handmade string replace algorithm
void regex_replace_test(std::string &str) {

    std::string res = std::regex_replace(str, std::regex(SUBSTRING), REPLACEABLE);

}

void boost_replace_all_test(std::string &str) {
    boost::replace_all(str, SUBSTRING, REPLACEABLE);
}

void handmade_replace_test(std::string &str) {
    size_t pos = 0;
    while ((pos = str.find(SUBSTRING, pos)) != std::string::npos) {
        str.replace(pos, SUBSTRING.length(), REPLACEABLE);
        pos += REPLACEABLE.length();
    }
}

input data info

test to replace SUBSTRING  in BASE string.

BASE string length = 48004

SUBSTRING = fdkfgkd

SUBSTRING length = 7

SUBSTRING's quantity in BASE string= 4364

REPLACEABLE = ddddddd

REPLACEABLE length = 7

i got the following results:

regex_replace_test = 14 ms , 14921715 ns

boost_replace_all_test = 4 ms , 4167104 ns

handmade_replace_test = 0 ms , 372201 ns

fastest way is basic handmade substring replace method.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号