Solr: full sentence spellcheck

Developer https://www.devze.com 2023-03-28 20:36 Source: Internet
I'm trying to configure a spellchecker to autocomplete full sentences from my query.

I've already been able to get these results:

"american israel" :

-> "american something"

-> "israel something"

But I want:

"american israel" :

-> "american israel something"

This is my solrconfig.xml :

<searchComponent name="suggest_full" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">suggestTextFull</str>
 <lst name="spellchecker">
  <str name="name">suggest_full</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">text_suggest_full</str>
  <str name="fieldType">suggestTextFull</str>
 </lst>
</searchComponent>

<requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
 <str name="echoParams">explicit</str>
 <str name="spellcheck">true</str>
 <str name="spellcheck.dictionary">suggest_full</str>
 <str name="spellcheck.count">10</str>
 <str name="spellcheck.onlyMorePopular">true</str>
</lst>
<arr name="last-components">
 <str>suggest_full</str>
</arr>
</requestHandler>

And this is my schema.xml:

<fieldType name="suggestTextFull" class="solr.TextField">
  <analyzer type="index">  
    <tokenizer class="solr.KeywordTokenizerFactory"/>  
    <filter class="solr.LowerCaseFilterFactory"/>  
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">  
    <tokenizer class="solr.KeywordTokenizerFactory"/>  
    <filter class="solr.LowerCaseFilterFactory"/>  
  </analyzer>
</fieldType>

...

<field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>

I've read somewhere that I have to use spellcheck.q because q uses the WhitespaceAnalyzer, but when I use spellcheck.q I get a java.lang.NullPointerException.

Any ideas ?


If your spellcheck field (text_suggest_full) contains "american something" and "israel something", make sure that there is also a document/entry with the value "american israel something".

Solr will not merge "american something" and "israel something" into one term, and it will not apply such a merged result to your spellchecking for "american israel".
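
To illustrate: because the field type uses KeywordTokenizerFactory, every value of text_suggest_full is indexed as a single token, so the full phrase has to exist as a value of its own. A minimal sketch of such an entry in Solr's XML update format, assuming a hypothetical unique key field called id (the question does not show one) and made-up values:

```xml
<!-- Sketch only: the "id" field and its value are assumptions, not taken from the question's schema. -->
<add>
  <doc>
    <field name="id">1</field>
    <!-- The whole phrase is kept as one keyword token, so it is a candidate for the prefix "american israel". -->
    <field name="text_suggest_full">american israel something</field>
  </doc>
</add>
```

After indexing and committing an entry like this, the suggester dictionary still has to be rebuilt (for example with spellcheck.build=true) before the new phrase can show up.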


Wouldn't an autocomplete approach be more suitable? See this article, for example.


You can use the Suggester, a flexible "autocomplete" component; you need Solr version 3.x.

solrconfig.xml:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

schema.xml:

<field name="name_autocomplete" type="text" indexed="true" stored="true" multiValued="false" />

Add a copyField:

<copyField source="name" dest="name_autocomplete" />

Reload Solr, reindex everything, and test: http://localhost:8983/solr/suggest?q=ameri&spellcheck=true&spellcheck.collate=true&spellcheck.build=true

You should get something like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="ameri">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">2</int>
        <arr name="suggestion">
          <str>american morocco</str>
          <str>american morocco something</str>
        </arr>
      </lst>
      <str name="collation">american morocco something</str>
    </lst>
  </lst>
</response>

Hope that helps.

Cheers


IMHO, a problem with the spellcheck component is that each word is spell checked against the full index. The "collation" of the spell checked words does not necessarily match a single document within the index; the individual words might come from separate indexed documents.
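
One way to reduce this, in Solr versions that support it, is to let the spellcheck component test each candidate collation against the index before returning it, using the spellcheck.maxCollationTries parameter. The fragment below is only a sketch: the values are arbitrary, and whether it behaves as hoped with a Suggester-based dictionary like the one in the question is an assumption I have not verified.

```xml
<!-- Sketch only: values are illustrative; behavior with the Suggester dictionary is an untested assumption. -->
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.collate">true</str>
  <!-- Try up to 5 candidate collations as real queries and keep only those that return hits. -->
  <str name="spellcheck.maxCollationTries">5</str>
  <!-- Return at most 3 verified collations. -->
  <str name="spellcheck.maxCollations">3</str>
</lst>
```

With the default of 0 tries, collations are not checked against the index at all, which is the behavior described above.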
