Lucene Analyzer for Indexing and Searching

Developer https://www.devze.com 2023-04-12 08:57 Source: Internet
I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.

When these values are indexed using the StandardAnalyzer, the only terms that end up in the index are hungry and slightly, because the analyzer splits on punctuation (the underscore), lowercases each token, and drops "not" as a stop word.

If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has a single "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies fields and values (e.g. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results since the query parser tokenizes the search string and makes it lowercase.
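The mismatch described above can be reproduced with a toy model in plain Java. This is only a sketch and not Lucene code: ToyAnalysis, standardLike, keywordLike, and the stop-word list are hypothetical names, and matching is simplified to "every query term must be among the indexed terms" rather than real phrase-query semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: imitates the behaviour described in the question, NOT Lucene itself.
class ToyAnalysis {
    // Hypothetical stop-word list; the only entry that matters here is "not".
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("not", "a", "an", "the"));

    // Rough stand-in for StandardAnalyzer: split on non-alphanumerics,
    // lowercase each piece, and drop stop words.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            String term = piece.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Stand-in for UN_TOKENIZED / KeywordAnalyzer: the whole value is one term.
    static List<String> keywordLike(String text) {
        return Arrays.asList(text);
    }

    // Simplified matching: every query term must appear among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY");
        List<String> query = standardLike("NOT_HUNGRY"); // the query side is always analyzed
        for (String value : values) {
            System.out.println(value
                    + " tokenized match: " + matches(standardLike(value), query)
                    + ", untokenized match: " + matches(keywordLike(value), query));
        }
    }
}
```

In this model a query for NOT_HUNGRY collapses to the single term hungry, which every tokenized value contains, while none of the untokenized, upper-case values contain it. That is the same pattern as the two symptoms described above.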

I've even tried specifying an Analyzer for indexing like KeywordAnalyzer, but it pretty much has no effect since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!


You're using a StandardAnalyzer for your query parser, so yes, your query will be analyzed with a StandardAnalyzer. Just switch to using a KeywordAnalyzer:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), 
          new KeywordAnalyzer());

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
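The per-field idea can be sketched in plain Java as follows. This is a toy model and deliberately not Lucene's API: ToyPerFieldAnalyzer, register, and the "description" field are hypothetical names, and the two lambdas only approximate what StandardAnalyzer and KeywordAnalyzer do. In real code the same role would presumably be played by a PerFieldAnalyzerWrapper built around a StandardAnalyzer, with a KeywordAnalyzer assigned to hungerState.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model only: mirrors the idea of per-field analysis, not Lucene's actual API.
class ToyPerFieldAnalyzer {
    // Stand-in for StandardAnalyzer: lowercase and turn punctuation into spaces.
    static final Function<String, String> STANDARD_LIKE =
            text -> text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();

    // Stand-in for KeywordAnalyzer: leave the value untouched as a single term.
    static final Function<String, String> KEYWORD_LIKE = text -> text;

    private final Function<String, String> defaultAnalyzer;
    private final Map<String, Function<String, String>> byField = new HashMap<>();

    ToyPerFieldAnalyzer(Function<String, String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Hypothetical method name: registers a field-specific analyzer.
    void register(String field, Function<String, String> analyzer) {
        byField.put(field, analyzer);
    }

    // Use the field-specific analyzer when one is registered, else the default.
    String analyze(String field, String text) {
        return byField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        ToyPerFieldAnalyzer analyzer = new ToyPerFieldAnalyzer(STANDARD_LIKE);
        analyzer.register("hungerState", KEYWORD_LIKE);
        System.out.println(analyzer.analyze("hungerState", "NOT_HUNGRY"));
        System.out.println(analyzer.analyze("description", "Very_Hungry cat"));
    }
}
```

Note that a keyword-style analyzer on the query side only helps if hungerState is also indexed as a single term (UN_TOKENIZED, or a KeywordAnalyzer at index time), so that both sides produce the same term.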
