similarity
Lucene numDocs and docFreq on custom similarity class
I'm building an application with Lucene (I'm a noob with it) and I'm facing some problems. My application uses the Lucene 2.4.0 library with a custom Similarity implementation (the jar is imported) [Details]
2022-12-26 06:05 Category: Q&A
How to detect if two news articles have the same topic? (Python semantic similarity)
I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News. [Details]
2022-12-25 22:55 Category: Q&A
Is there some algorithm to compare the DOM similarity of different pages?
Does anyone have experience with this? You could first get all DOM elements and then remove their content and attributes. After the content has been removed you could convert all tags to [Details]
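The snippet above is cut off before it says what the tags get converted to, so the following is only a rough sketch of one way to finish the idea: keep the tag names in document order, drop text and attributes, and compare the two sequences. The names `TagSequenceParser`, `tag_sequence` and `dom_similarity` are made up for illustration, and using `difflib.SequenceMatcher` as the comparison is an assumption, not something the original answer states.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects tag names in document order; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately discarded.
        self.tags.append(tag)

    def handle_endtag(self, tag):
        # Closing tags are kept so nesting influences the comparison.
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of opening/closing tag names found in the markup."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def dom_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means identical tag sequences."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()
```

Two pages that differ only in text and attributes score 1.0 under this measure, so it captures structure only, not content.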
2022-12-24 14:48 Category: Q&A
Detecting similar words among n text documents
I have n documents and want to find common words that are included in these documents. For example I want to say (n-3) documents include the word "web". [Details]
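One possible reading of the question above is a document-frequency count: for each word, how many of the n documents contain it at least once. A minimal sketch under that assumption follows; the function names and the simple lowercase regex tokenizer are illustrative choices, not part of the original question.

```python
import re
from collections import Counter


def document_frequency(documents):
    """Map each word to the number of documents that contain it."""
    df = Counter()
    for text in documents:
        # A set ensures a word counts once per document, however often it repeats.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        df.update(words)
    return df


def words_in_at_least(documents, min_docs):
    """Return the words that occur in at least min_docs documents."""
    df = document_frequency(documents)
    return {word for word, count in df.items() if count >= min_docs}
```

With n documents, `words_in_at_least(documents, n - 3)` would list the words that appear in at least n-3 of them.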
2022-12-23 13:54 Category: Q&A
Percentage Similarity Analysis (Java)
I have the following situation: String a = "A Web crawler is a computer program that browses the World Wide Web internet automatically"; [Details]
2022-12-22 15:28 Category: Q&A
Collaborative Filtering: Non-Personalized item-to-item similarity
I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are fo [Details]
2022-12-22 05:53 Category: Q&A
Comparing strings with tolerance
I'm looking for a way to compare a string with an array of strings. Doing an exact search is quite easy of course, but I want my program to tolerate spelling mistakes, missing parts [Details]
2022-12-21 21:20 Category: Q&A
Text similarity function for strict document similarity
I'm writing a piece of Java software that has to make the final judgement on the similarity of two documents encoded in UTF-8. [Details]
2022-12-21 18:24 Category: Q&A
How do you efficiently implement a document similarity search system?
How do you implement a "similar items" system for items described by a set of tags? In my database, I have three tables, Article, ArticleTag and Tag. Each [Details]
2022-12-19 04:47 Category: Q&A
Appropriate similarity metrics for multiple sets of 2D coordinates
I have a collection of 2D coordinate sets (on the scale of 100K-500K points in each set) and I am looking for the most efficient way to measure the similarity of one set to another. I know of the us [Details]
2022-12-17 18:41 Category: Q&A