How to configure SOLR so users can make prefix search by default?

Developer https://www.devze.com 2023-04-06 03:25 Source: web
I am using SOLR 3.2. My application issues search queries against a SOLR instance, on a field of the text field type. How can I make SOLR return results like "book", "bookshelf", "bookasd", and so on when the user issues a query like "book"? Should I append "*" characters to the query string manually, or is there a setting in SOLR that makes it do prefix searches on the field by default?

This is the schema.xml section for text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>


There are several ways to do this, but performance-wise you might want to use EdgeNGramFilterFactory.
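
To make the idea concrete, here is a rough Python sketch of what an edge n-gram filter emits for a single token. It is only an illustration of the concept, not Solr code; the function name `edge_ngrams` is made up for this example, and its `min_gram` / `max_gram` arguments are meant to mirror the filter's `minGramSize` / `maxGramSize` attributes.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading-edge n-grams of `token`, shortest first.

    Hypothetical helper for illustration only: it mimics the prefixes
    an edge n-gram filter would add to the index for one token.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Every prefix of the indexed word becomes its own term, so a plain
# term query for "book" can match a document containing "bookshelf".
grams = edge_ngrams("bookshelf", min_gram=2, max_gram=6)
print(grams)
```

Because the prefixes are stored at index time, the query side needs no wildcard. The trade-off is a larger index, and the filter would normally go in the index analyzer only, so that the query term itself is not split into prefixes.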


I had the same requirement on a project, where I had to implement suggestions. What I did was define this suggester fieldType:

<fieldType class="solr.TextField" name="suggester">
    <analyzer  type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    </analyzer>
    <analyzer  type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I used ShingleFilterFactory because I needed to get suggestions composed of one or more words.

Then I used faceting queries to get suggestions, with these request parameters:

facet.limit=10

facet.prefix=book

facet.field=Suggester (this is the field with fieldType="suggester" in which I saved the data)

I know it uses facet results, but maybe it solves your problem.
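
As a minimal sketch, the faceting request above could be assembled like this in Python. The base URL, the `build_suggest_url` name, and the `q=*:*`, `rows=0` and `wt=json` parameters are assumptions for the example and not part of the original setup; only the three facet parameters and the `Suggester` field come from the answer.

```python
from urllib.parse import urlencode


def build_suggest_url(prefix,
                      base_url="http://localhost:8983/solr/select",  # assumed endpoint
                      field="Suggester",
                      limit=10):
    """Build a facet-only request that returns terms starting with `prefix`."""
    params = [
        ("q", "*:*"),          # assumed: match all documents
        ("rows", 0),           # assumed: only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        # facet.prefix is compared against indexed terms without analysis,
        # so lowercase it to match the LowerCaseFilterFactory output.
        ("facet.prefix", prefix.lower()),
        ("facet.limit", limit),
        ("wt", "json"),        # assumed response format
    ]
    return base_url + "?" + urlencode(params)


print(build_suggest_url("Book"))
```

The suggestions then come back as the facet values for the `Suggester` field rather than as documents.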

If neither my answer nor Jayendra Patil's gives you a solution, you can also take a look at EdgeNGramFilterFactory.


One option is to handle this on the client side by appending a wildcard character to the end of the search terms.

The impact:

  1. Wildcard queries have a performance impact.
  2. Wildcard queries do not undergo analysis, so the query-time analysis won't be applied to your search terms.

The other option is to implement a custom query parser with the handling you need.


I'm sure you figured this out by now, but just so there's an answer here:

I handled this by taking the last term and OR-ing it with the same term plus a wildcard, e.g. "my favorite book" becomes "my+favorite+(book OR book*)", which would also return "my favorite bookshelf". You probably want to do some processing on the input anyway (escaping, etc.).
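
A rough Python sketch of that rewrite, under a few assumptions: the helper names are invented for the example, the escaped character set is the usual list of Lucene query-syntax characters, and the wildcard copy of the term is lowercased by hand because wildcard terms skip analysis while the indexed terms went through LowerCaseFilterFactory.

```python
# Characters with special meaning in the Lucene query syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')


def escape_term(term):
    """Backslash-escape query-syntax characters in one user-typed term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def add_prefix_to_last_term(user_input):
    """Rewrite 'a b c' into 'a b (c OR c*)' so the last term also prefix-matches."""
    terms = [escape_term(t) for t in user_input.split()]
    if not terms:
        return ""
    last = terms[-1]
    # The plain term still goes through query analysis; the wildcard
    # copy does not, so it is lowercased here.
    terms[-1] = "({0} OR {1}*)".format(last, last.lower())
    return " ".join(terms)


print(add_prefix_to_last_term("my favorite book"))
```

The result is the unencoded query string; URL-encoding it when sending the request gives the "+"-separated form shown above. The sketch does not treat user-typed keywords such as AND or OR specially.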

If you are specifically looking for the text typed to match the beginning of the result, then edge n-grams are the way to go, but from reading your question it didn't seem you were really asking for that.
