MySQL function for checking similarity percentage between two texts

Developer https://www.devze.com 2023-04-12 04:44 Source: Internet
I need MySQL code for checking similarity percentage between text submitted via form against a number of texts stored in MySQL database.

I am looking for a MySQL stored procedure that works like PHP's similar_text() function. A MySQL Levenshtein distance procedure already exists, but it is not sufficient.

When the user submits the text, the algorithm should return every entry in the database with a given percentage of similarity to the submitted text (it will compare only one column in the database), e.g. return all entries that have similarity > 40% with the text submitted by the user.

Example table:

TABLE - Articles
id, article_body, article_title

The code should return all rows whose similarity percentage with the text (article_body) the user has submitted is > 40% (or another given value).


I'd do it in the application.
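
As a rough sketch of the application-side approach, assuming the candidate rows have already been fetched from the Articles table into memory: Python's difflib.SequenceMatcher computes a ratio in the same spirit as PHP's similar_text() (both build on repeated longest-common-substring matching), though the two are not guaranteed to return identical percentages. The function names and the sample rows below are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Return the similarity of two strings as a percentage (0-100)."""
    # autojunk=False turns off the heuristic that ignores very frequent
    # characters in long inputs, which could skew results for article bodies.
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def similar_rows(rows, submitted, threshold=40.0):
    """Return (id, percent) pairs for rows more than `threshold` % similar."""
    matches = []
    for row_id, body in rows:
        pct = similarity_percent(submitted, body)
        if pct > threshold:
            matches.append((row_id, pct))
    # Most similar entries first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical rows, standing in for: SELECT id, article_body FROM Articles
rows = [
    (1, "MySQL stored procedures and functions"),
    (2, "Cooking pasta at home"),
    (3, "MySQL stored functions and procedures explained"),
]
result = similar_rows(rows, "MySQL stored procedures and functions")
```

Comparing the submitted text against every row is linear in the table size, so for a large table it may be worth narrowing the candidates first (for example with a FULLTEXT search) before scoring them this way.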

Maybe the result of the SOUNDEX function will help you:

SELECT SOUNDEX('Hello'), SOUNDEX('Hello world'), SOUNDEX('hellboy');
+------------------+------------------------+--------------------+
| SOUNDEX('Hello') | SOUNDEX('Hello world') | SOUNDEX('hellboy') |
+------------------+------------------------+--------------------+
| H400             | H4643                  | H410               |
+------------------+------------------------+--------------------+


I think the algorithm should work like this:

  • First, calculate the length of the given word (using LENGTH).
  • Then search for that word in the specific column (using INSTR or a similar function).
  • Finally, calculate the length of each matched value and use simple arithmetic.

For example: I want to search for 'Hell' with a match of more than 50%, and my database contains 2 matching values, 'Hello World' and 'Hellboy'.

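length('Hell') = 4
length('Hello World') = 11
length('Hellboy') = 7

The matched share is the length of the search term divided by the length of the stored value:

for Hello World: 4/11 = 36.36%
for Hellboy: 4/7 = 57.14%

(The figures (11-4)/11 = 63.64% and (7-4)/7 = 42.86% are the unmatched shares.) With a 50% threshold, only Hellboy will be retrieved based on the above calculation.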

Hope it works.
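
A small sketch of this length-ratio idea, written in Python for illustration rather than as MySQL code. It assumes the matched share is the length of the search term divided by the length of the stored value, so 'Hell' covers 4/11 of 'Hello World' and 4/7 of 'Hellboy'. It also assumes a case-insensitive substring test, as INSTR gives with non-binary strings. The function names are hypothetical.

```python
def matched_share(term: str, value: str) -> float:
    """Percentage of `value` covered by `term`, or 0.0 if `term` does not occur."""
    # Case-insensitive substring test (assumption: mirrors INSTR on
    # non-binary strings with a case-insensitive collation).
    if not term or term.lower() not in value.lower():
        return 0.0
    return 100.0 * len(term) / len(value)


def search(term, values, threshold=50.0):
    """Return the values whose matched share exceeds `threshold` percent."""
    return [v for v in values if matched_share(term, v) > threshold]


hits = search("Hell", ["Hello World", "Hellboy"])
```

This measure only counts one contiguous occurrence of the search term, so it suits short keywords better than whole article bodies.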
