开发者

Algorithm: Find all common substrings between two strings where order is preserved

开发者 https://www.devze.com 2023-04-08 21:19 出处:网络
Would like to discuss algorithms, no code. Problem: Let S and T be two sequences of elements. Find the common subsequences between them where the order of the elements is preserved.

Would like to discuss algorithms, no code.

Problem: Let S and T be two sequences of elements. Find the common subsequences between them where the order of the elements is preserved.

It should have O(n + m) running time where n is the length of S, and m is the length of T. I would also like to make the assumption that for the most part the two sequences will be similar.

An optimal solution?: After some research, one solution that appears to be optimal is to first build a generalised suffix tree for the two sequences. Then find the longest common substring and consider this subsequence to be part of the solution. Then either remove this subsequence from the tree or build a new suffix tree with this subsequence removed from the two original sequences to form S' and T'. Then find the longest common substring between S' and T', and so on.

To analyze the running time, building the tree takes O(n) and you can find the leng开发者_高级运维ths and starting positions of the longest common substrings of S and T in O(n + m).

Are there other (more) practical solutions that someone knows of or can link to? Any published papers considering the same or related problem you all know about? Input and constructive criticism about the above solution? Thanks for all your time!


My first thought was the use of a suffix tree, and relating it to the LCS problem. But I am not sure what a better solution would be off the top of my head. I did a quick search and came across a few papers and projects that might be useful, but no guarantees.

http://dl.acm.org/citation.cfm?id=1625377 (direct link here I believe: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-101.pdf)

http://code.google.com/p/all-common-subsequences/

Sorry, it has been a long day and I am not quite awake enough to attempt a better solution myself.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号