Solr 4.2.1 Spellcheck Component
org.apache.solr.spelling.IndexBasedSpellChecker
org.apache.solr.spelling.FileBasedSpellChecker
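IndexBasedSpellChecker builds its dictionary from a field in the Solr or Lucene index, while FileBasedSpellChecker builds it from a dictionary file, which is useful when you want to rank terms by search popularity.
Solr 4.0 introduced solr.DirectSolrSpellChecker, an experimental spellchecker that provides suggestions directly from the main index and does not need to be rebuilt on every commit.
Configuration for 4.x: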
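schema.xml (field types are defined here; the search component and request handlers further down go in solrconfig.xml):
<!-- Custom field type for single-word autocompletion -->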
<fieldType class="solr.TextField" name="text_auto_s" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
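<!-- Custom field type for phrase autocompletion. To work with multi-word terms you need to plug in your own tokenizer (for Chinese, e.g. Paoding or IK) -->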
<fieldType class="solr.TextField" name="text_auto">
<analyzer>
<!-- Treat the whole field value as a single token; no tokenization -->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
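To make the difference between the two field types concrete, here is a rough plain-Java approximation of what each analyzer chain emits. It is only a sketch: the class and method names are made up for illustration, and WordDelimiterFilterFactory is deliberately left out, so the output matches Solr only for input without punctuation or internal case changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative approximation of the two analyzer chains; not Solr code. */
final class AutoFieldAnalysisSketch {

    /** Roughly text_auto_s: split on whitespace, then lowercase each token. */
    static List<String> singleWordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Roughly text_auto: the whole value becomes one lowercased token. */
    static List<String> phraseToken(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(singleWordTokens("Apache Solr")); // two tokens
        System.out.println(phraseToken("Apache Solr"));      // one token
    }
}
```

With text_auto_s each word can be suggested on its own, while text_auto keeps the phrase intact as one suggestion.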
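To make the spellcheck component available to the /select query handler, first declare it in solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<!-- Query analyzer. If it is not specified, the analyzer of the field type of "field" is used for spellchecking. To improve correction accuracy, the term being corrected should generally not be tokenized, so string is enough. Spellcheck is configured mainly in solrconfig.xml. -->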
<str name="queryAnalyzerFieldType">string</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">text_spell</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>
-->
</lst>
</searchComponent>
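The maxEdits and minPrefix settings above are easier to reason about with a small example. The following is a minimal sketch, assuming a plain Levenshtein distance and a tiny in-memory dictionary; it is not how DirectSolrSpellChecker is implemented (the real checker works against the index's term dictionary, and accuracy, maxInspections and the frequency thresholds are not modeled here). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a plain Levenshtein filter, not Lucene's internal implementation. */
final class EditDistanceSketch {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Keeps terms that share the first minPrefix characters with word and are within maxEdits. */
    static List<String> candidates(String word, List<String> dictionary, int maxEdits, int minPrefix) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary) {
            if (term.equals(word) || term.length() < minPrefix || word.length() < minPrefix) {
                continue;
            }
            if (!term.regionMatches(0, word, 0, minPrefix)) {
                continue; // the required common prefix differs
            }
            if (distance(word, term) <= maxEdits) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("solr", "solar", "color", "lucene");
        System.out.println(candidates("solrr", dictionary, 2, 1));
    }
}
```

With minPrefix set to 1, "color" is never considered for "solrr" because the first character differs, no matter how small the edit distance is.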
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text_spell</str><!--The default field for spell checking. -->
<str name="spellcheck.dictionary">default</str>
<!--<str name="spellcheck.dictionary">wordbreak</str>-->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
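The values in the handler's defaults list only apply when the request does not set the same parameter, so a client can override any of them per request. As a rough sketch, the helper below assembles such a request URL with the JDK only. The host, port and core name (http://localhost:8983/solr/collection1) are assumptions for illustration, and the class name is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper that assembles a Solr request URL; not part of SolrJ. */
final class SpellRequestUrlSketch {

    /** Appends the URL-encoded parameters to baseUrl + handler in insertion order. */
    static String build(String baseUrl, String handler, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append(handler);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported by the JVM
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solrr serch");
        params.put("spellcheck.count", "3"); // overrides the default of 10 configured above
        params.put("wt", "json");
        // Assumed base URL; adjust host, port and core name to your installation.
        System.out.println(build("http://localhost:8983/solr/collection1", "/spell", params));
    }
}
```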
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="q">abcdefghik</str><!-- fallback query so that a request without a q parameter does not fail -->
<int name="rows">10</int>
</lst>
<!-- The last-components block below is essential: without it, spellcheck does not run together with the regular search -->
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
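// SolrJ client code; query, condition, rsp and result are defined elsewhere in the application
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

// Request spelling suggestions
query.getSolrQuery().set("spellcheck", "true");
query.getSolrQuery().set("spellcheck.q", condition.getSearchWord());
query.getSolrQuery().set("spellcheck.count", 5);
// ....
// When the search returns no results, show the suggested terms
SpellCheckResponse spellCheckResponse = rsp.getSpellCheckResponse();
if (spellCheckResponse != null) {
    if (!spellCheckResponse.isCorrectlySpelled()) {
        List<String> wordList = new ArrayList<String>();
        // Collect the alternatives suggested for every misspelled token
        for (Suggestion s : spellCheckResponse.getSuggestions()) {
            wordList.addAll(s.getAlternatives());
        }
        result.setSuggestions(wordList);
    }
}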
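Because the loop above concatenates the alternatives of every misspelled token, the same word can show up more than once when the query has several tokens. One possible way to tidy the list before displaying it is sketched below; the class and method names are hypothetical, and the input is a plain list of lists standing in for the per-token results of getAlternatives().

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative helper: merges per-token alternatives, dropping duplicates and capping the size. */
final class SuggestionMergeSketch {

    /** Returns at most limit distinct alternatives, in the order they were first seen. */
    static List<String> merge(List<List<String>> perTokenAlternatives, int limit) {
        Set<String> seen = new LinkedHashSet<>(); // keeps insertion order, ignores repeats
        for (List<String> alternatives : perTokenAlternatives) {
            for (String alternative : alternatives) {
                if (seen.size() >= limit) {
                    return new ArrayList<>(seen);
                }
                seen.add(alternative);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<String>> perToken = List.of(
                List.of("solr", "solar"),
                List.of("solar", "search"));
        System.out.println(merge(perToken, 3));
    }
}
```

In the snippet above, this would replace the plain addAll accumulation: collect each s.getAlternatives() into a list of lists and pass it to merge before calling result.setSuggestions.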