Any way to compare/match sentences with only a different word order?

开发者 https://www.devze.com 2023-03-24 14:25 Source: Internet
I have 2 MySQL tables, each with address data of companies in it. One table is more recent, but has no telephone and no website data. Now I want to unite these tables into 1 recent and complete table.

But for some companies the order of the words is different, like this:

'Bakery Johnson' in table 1 and 'Johnson Bakery' in table 2.

Now I need to find a way to compare these values, as they're obviously the same company.

I think I will somehow have to split those names first, and then order the different parts alphabetically.
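The split-and-sort idea can be illustrated outside MySQL as well. Below is a minimal Python sketch, not the asker's code: the function name `word_order_key` is hypothetical, and lowercasing and splitting on whitespace are assumptions about what should count as "the same word".

```python
def word_order_key(name: str) -> str:
    """Return a key that is identical for names differing only in word order."""
    # Lowercase, split on whitespace, sort the words, and rejoin them.
    words = name.lower().split()
    return " ".join(sorted(words))


# Both spellings of the company name reduce to the same key.
print(word_order_key("Bakery Johnson") == word_order_key("Johnson Bakery"))
```

Two names match when their keys are equal, so the comparison itself becomes a plain string equality test.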

Any chance anybody has done something like this before, and is willing to share some code or a function?

UPDATE: I found a function that sorts words inside a string. I can use this to detect name swaps as described above. It's quite SLOW though...

See: MySQL: how to sort the words in a string using a stored function?
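One possible way around the speed problem is to compute the sorted-words key once per row and look matches up in a dictionary, instead of re-sorting names for every comparison. The sketch below assumes the rows of both tables have been exported as dictionaries with a `name` field; the function names and the sample data are made up for illustration.

```python
from collections import defaultdict


def word_order_key(name):
    # Same normalisation as splitting the name and sorting the words alphabetically.
    return " ".join(sorted(name.lower().split()))


def match_by_word_order(recent_rows, old_rows):
    """Pair rows whose names contain the same words in any order."""
    # Index the old table once, so each recent row costs one dictionary lookup.
    index = defaultdict(list)
    for row in old_rows:
        index[word_order_key(row["name"])].append(row)
    # Collect every (recent, old) pair that shares a key.
    return [
        (recent, old)
        for recent in recent_rows
        for old in index.get(word_order_key(recent["name"]), [])
    ]


# Made-up sample rows, only to show the shape of the data.
recent = [{"name": "Bakery Johnson"}, {"name": "Smith Plumbing"}]
old = [{"name": "Johnson Bakery", "phone": "555-0100"}]
pairs = match_by_word_order(recent, old)
```

The same idea could be applied inside MySQL by storing the key in an extra indexed column and joining on it, though that variant is not shown here.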


If your table is MyISAM, you can run this query:

SELECT  *
FROM    mytable
WHERE   MATCH(name) AGAINST ('+bakery +johnson' IN BOOLEAN MODE)

This will find all records containing both of the words bakery and johnson, in any order (and possibly some other words too).

Creating a FULLTEXT index on the table:

CREATE FULLTEXT INDEX
        fx_mytable_name
ON      mytable (name)

will speed up this query.
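A search string such as '+bakery +johnson' has to be built for each company name. In MySQL's boolean full-text mode a leading + marks a word as required, so one option is to prefix every word with it. The helper below is a hypothetical sketch; keeping only runs of word characters is an assumption made so that punctuation in a name is not interpreted as a search operator.

```python
import re


def required_words_query(name: str) -> str:
    """Build a search string that requires every word of `name`."""
    # Keep only runs of letters, digits, and underscores, so characters such as
    # + - ( ) " in a company name are not passed through as operators.
    words = re.findall(r"\w+", name.lower())
    return " ".join("+" + word for word in words)


print(required_words_query("Bakery Johnson"))  # +bakery +johnson
```

The resulting string would then be passed to AGAINST as a bound parameter rather than pasted into the SQL text.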


Stepping back from your solution for a moment, you could handle this the way modern phones resolve duplicate contact names.

When a user finds something suspicious, you present them with this option:

Is this a duplicate? Use our [ Merge ] option

You are merging Bakery Johnson, please select the source/original item:

[ Johnson Bakery v ] (my amazing dropdown!)

Everything not already in Johnson Bakery gets ported to Bakery Johnson (orders, for example). You may also show an intermediate screen displaying what will be merged, or let the user pick: for example, they might want the address info from Johnson Bakery and the orders from both.
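The "port over whatever is missing" step might look roughly like the following sketch. It assumes each record is a dictionary and treats `None` and the empty string as "missing"; the function name, field names, and sample values are all hypothetical.

```python
def merge_records(target: dict, source: dict) -> dict:
    """Return a copy of `target` with its empty fields filled from `source`."""
    merged = dict(target)
    for field, value in source.items():
        # Only copy a value when the target has nothing for that field.
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


# Made-up sample records: the recent one lacks telephone and website data.
recent = {"name": "Bakery Johnson", "phone": None, "website": ""}
old = {"name": "Johnson Bakery", "phone": "555-0100", "website": "example.com"}
merged = merge_records(recent, old)
```

Fields that already have a value in the target, such as the name here, are left untouched, which matches the idea of letting the user choose which record is the original.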

It is not self-correcting as you asked, but collaboration from the users may be more accurate than AI here. I also love low-tech solutions like this, so let us know what you ended up doing.
