
Solr facet counts are incorrect: how to deduplicate?

Developer https://www.devze.com 2023-03-28 23:42 Source: Internet

We are using two Solr instances to index the files. Because of the way we do updates, the same article is sometimes indexed in both instances. As a result, the facet counts are wrong, since each duplicated article is counted twice. How can I de-duplicate the counts?


My advice would be not to keep duplicated articles at all. You need a way to identify the duplicate articles and delete them from one of the two Solr instances.
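As a rough sketch of that identification step, assuming both indexes share the same unique key field (called id here) and that you have already exported the keys from each instance, the duplicates are simply the intersection of the two key sets. The delete body uses Solr's JSON update format; which instance should lose its copy (SOLR1 in this sketch) and the core URL mentioned in the comment are assumptions, not part of the original answer.

```python
import json
from typing import Iterable


def find_duplicate_ids(solr1_ids: Iterable[str], solr2_ids: Iterable[str]) -> list[str]:
    """Return the unique keys present in both indexes, sorted for stable output."""
    return sorted(set(solr1_ids) & set(solr2_ids))


def build_delete_payload(duplicate_ids: list[str]) -> str:
    """Build a Solr JSON update body that deletes the given documents by id.

    The body would be POSTed to the instance that should lose its copy,
    e.g. http://solr1:8983/solr/<core>/update?commit=true (hypothetical URL).
    """
    return json.dumps({"delete": duplicate_ids})


if __name__ == "__main__":
    solr1 = ["a1", "a2", "a3"]
    solr2 = ["a3", "a4"]
    dups = find_duplicate_ids(solr1, solr2)
    print(dups)                        # ['a3']
    print(build_delete_payload(dups))  # {"delete": ["a3"]}
```

For large indexes you would page through the keys (for example with fl=id) rather than loading everything at once, but the set intersection stays the same.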

If you don't want to delete the duplicate articles, you still need to keep track of them. Knowing which articles in SOLR1 are also present in SOLR2 lets you de-duplicate the counts like this:

  • create an extra boolean field in SOLR1 named:

    IsDuplicateField = true, if article is duplicated in SOLR2
                     = false, otherwise
    
  • when you query SOLR1, add a facet on IsDuplicateField=true (for example, the request parameter facet.query=IsDuplicateField:true).

  • when retrieving the results, subtract the IsDuplicateField=true count returned by SOLR1 from the combined count of both Solr instances.
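A minimal Python sketch of the steps above, assuming SOLR1 is queried with facet.query=IsDuplicateField:true and both responses are parsed from Solr's JSON response format (wt=json). The helper names and the DUP_QUERY constant are hypothetical. The subtraction is only exact if each duplicated article's copy in SOLR2 also matches the query, which may not hold if the two copies differ after an update.

```python
from urllib.parse import urlencode

# Facet query that counts matching articles flagged as duplicates in SOLR1.
DUP_QUERY = "IsDuplicateField:true"


def solr1_query_string(user_query: str, facet_field: str) -> str:
    """Build the SOLR1 query string that also counts matching duplicates."""
    params = [
        ("q", user_query),
        ("rows", "0"),               # only counts are needed in this sketch
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.query", DUP_QUERY),
    ]
    return urlencode(params)


def deduplicated_total(solr1_resp: dict, solr2_resp: dict) -> int:
    """Combined hit count with each duplicated article counted only once."""
    total = solr1_resp["response"]["numFound"] + solr2_resp["response"]["numFound"]
    duplicates = solr1_resp["facet_counts"]["facet_queries"].get(DUP_QUERY, 0)
    return total - duplicates


if __name__ == "__main__":
    print(solr1_query_string("title:solr", "category"))
    solr1 = {"response": {"numFound": 120},
             "facet_counts": {"facet_queries": {DUP_QUERY: 15}}}
    solr2 = {"response": {"numFound": 40}}
    print(deduplicated_total(solr1, solr2))  # 145
```

With these sample numbers, 120 + 40 hits minus the 15 articles flagged as duplicates in SOLR1 gives 145.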

In this setup, the IsDuplicateField facet returns the number of duplicated articles that match your query.
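The single facet.query count corrects the overall total. If you also need each facet value corrected, one possible extension (not part of the original answer) is a second SOLR1 request with the same q plus fq=IsDuplicateField:true and facet.field set to your facet field; its facet_fields counts are then the per-value duplicates to subtract. The sketch assumes Solr's default flat list layout (json.nl=flat), a hypothetical facet field named category, and a facet.limit large enough that no values are truncated.

```python
from collections import Counter


def flat_to_counter(flat: list) -> Counter:
    """Convert Solr's flat facet list ["a", 10, "b", 5] into a Counter."""
    return Counter(dict(zip(flat[0::2], flat[1::2])))


def deduplicated_facet_counts(solr1_resp: dict, solr2_resp: dict,
                              solr1_dup_resp: dict, field: str) -> dict:
    """Per-value facet counts with duplicated articles counted once.

    solr1_dup_resp is the extra SOLR1 response filtered with
    fq=IsDuplicateField:true (an assumption of this sketch).
    """
    def counts(resp: dict) -> Counter:
        return flat_to_counter(resp["facet_counts"]["facet_fields"][field])

    merged = counts(solr1_resp) + counts(solr2_resp)
    merged.subtract(counts(solr1_dup_resp))
    # Keep only values that still have a positive count.
    return {value: n for value, n in merged.items() if n > 0}


if __name__ == "__main__":
    def resp(flat):
        return {"facet_counts": {"facet_fields": {"category": flat}}}

    print(deduplicated_facet_counts(resp(["news", 10, "sports", 4]),
                                    resp(["news", 3, "sports", 2]),
                                    resp(["news", 2, "sports", 0]),
                                    "category"))  # {'news': 11, 'sports': 6}
```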

Good luck!
