How to use NGramTokenizerFactory or NGramFilterFactory?

devze.com https://www.devze.com 2023-02-04 07:13 Source: web

Tags: lucene, solr, tokenize

Recently, I have been studying how to store and index data with Solr. I want to do a facet.prefix search. With a whitespace tokenizer, "Where are you" is split into three words, and each word is indexed separately. If I search with facet.prefix="where are", no results are returned.
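To make the failure concrete, here is a minimal Java sketch that simulates (it does not call Solr) what facet.prefix compares against: the indexed terms of the field. It assumes the tokenized field splits on whitespace and lowercases each token; the class and method names are hypothetical. A whitespace-tokenized field stores "where", "are" and "you" as separate terms, so no single term starts with "where are", while an untokenized field stores the whole value as one term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical simulation of the terms facet.prefix sees; no Solr code involved.
class FacetPrefixDemo {

    // Assumed analysis: split on whitespace, then lowercase each token.
    static TreeSet<String> tokenizedTerms(String value) {
        TreeSet<String> terms = new TreeSet<>();
        for (String token : value.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // An untokenized (string) field indexes the whole value verbatim as one term.
    static TreeSet<String> untokenizedTerms(String value) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add(value);
        return terms;
    }

    // Keep only the indexed terms that start with the given prefix.
    static List<String> matchPrefix(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String value = "Where are you";
        System.out.println(matchPrefix(tokenizedTerms(value), "where are"));
        System.out.println(matchPrefix(untokenizedTerms(value), "Where are"));
    }
}
```

Running `main` prints `[]` for the tokenized field and `[Where are you]` for the untokenized one. Because the untokenized value is kept verbatim, the prefix also has to match its case.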

I googled and found that NGramFilterFactory might help. But when I applied this filter factory, the result was "w, h, e, ..., wh, ...": it splits the text into characters, not into words.

I set the parameters minGramSize and maxGramSize to 1 and 3. Is NGramFilterFactory working correctly? Should I add some other parameters? Are there other filter factories that can help me?
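The output described above is what a character n-gram filter is meant to produce. The Java sketch below is a hypothetical re-implementation of the idea, not Lucene code: after whitespace tokenization, each token is cut into every substring of length minGram to maxGram (here 1 to 3). Because each token is processed on its own, no gram can span the space in "where are". Lucene's own NGramTokenFilter may emit the grams in a different order than this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-grams per token; not Lucene's implementation.
class CharNGramDemo {

    // All substrings of length minGram..maxGram, grouped by size (1-grams first).
    // Works on UTF-16 chars, which is fine for plain ASCII input like this.
    static List<String> charNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= token.length(); start++) {
                grams.add(token.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Whitespace tokenization first, then n-grams of each token separately.
        for (String token : "where are you".split("\\s+")) {
            System.out.println(token + " -> " + charNGrams(token, 1, 3));
        }
    }
}
```

For example, `charNGrams("are", 1, 3)` returns `[a, r, e, ar, re, are]`; none of the grams produced for the sentence contains a space.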

Thanks!

Facets should only be applied to non-tokenized fields such as strings. If you want results to be returned for "where are", don't use a tokenizer at all for that field (or copy the text into such a field with a copyField directive). I guess you want to use facet.prefix for autocompletion. You can do this; look here.
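The autocompletion idea can be sketched as follows: facet on an untokenized field, keep only the values that start with the typed prefix, and order them by how many documents contain them. This Java sketch simulates that in memory rather than calling Solr; the class name and sample values are hypothetical, and the values are assumed to be stored in lowercase so that a lowercase prefix matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of prefix faceting on an untokenized field; not Solr's implementation.
class PrefixFacetAutocomplete {

    // Counts whole field values that start with the prefix, most frequent first.
    static List<Map.Entry<String, Integer>> suggest(List<String> fieldValues, String prefix, int limit) {
        // TreeMap keeps values alphabetical, so equal counts stay alphabetical after the stable sort.
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            if (value.startsWith(prefix)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        // Hypothetical field values, one per document, already lowercased.
        List<String> values = List.of(
                "where are you", "where are you", "where are we", "what is this", "where is it");
        for (Map.Entry<String, Integer> e : suggest(values, "where are", 10)) {
            System.out.println(e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

With this sample data, the prefix "where are" yields `where are you (2)` followed by `where are we (1)`; "what is this" and "where is it" are filtered out because they do not start with the prefix.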

For NGramTokenizer, check this out.