
N Days and Nights with Solr (Part 2): Adding a Chinese Tokenizer

2016-03-17 14:59

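Solr's built-in tokenization of Chinese text is poor, so a third-party Chinese tokenizer needs to be integrated. Several are available for Solr; the two most commonly used are mmseg4j and ik-analyzer. I chose mmseg4j here.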

1: Add the required jar files:

Download the two jar files mmseg4j-solr-2.3.1-SNAPSHOT.jar and mmseg4j-core-1.10.1-SNAPSHOT.jar, then copy both into the lib directory of the Solr project.

2: Configure schema.xml

<!-- mmseg4j Chinese tokenizer configuration: define the fieldType names -->
<!-- complex mode, dictionary loaded from the "dic" directory -->
<fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
  </analyzer>
</fieldType>
<!-- max-word mode, no dicPath set -->
<fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
  </analyzer>
</fieldType>

<!-- simple mode, dictionary loaded from a custom path -->
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="n:/custom/path/to/my_dic"/>
  </analyzer>
</fieldType>

mmseg4j provides three tokenizer modes (simple, complex, and max-word); see the mmseg4j documentation for the details of each.
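Solr also lets a single field type declare separate index-time and query-time analyzers, so the modes do not have to be used in isolation. The fragment below is only a sketch under that assumption: the name textMixed and the pairing of max-word for indexing with complex for querying are illustrative choices, not part of the original setup. As with textMaxWord, no dicPath is set.

```xml
<!-- Hypothetical field type: name and mode pairing are assumptions for illustration -->
<fieldType name="textMixed" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
  </analyzer>
  <!-- Applied when the query string is parsed -->
  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex"/>
  </analyzer>
</fieldType>
```

Whether this pairing gives better results than a single mode depends on the dictionary and the data, so it is worth comparing both on real queries.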

Once the fieldType is defined, the Chinese fields to be searched can be declared in schema.xml as field entries that use it.

<!-- fields of the iamge_info table -->
<field name="src" type="string" indexed="true" stored="true"/>
<!-- previous definition, replaced by the tokenized version below -->
<!--<field name="key_info" type="string" indexed="true" stored="true"/>-->
<field name="key_info" type="textMaxWord" indexed="true" stored="true"/>
<field name="update_date" type="date" indexed="true" stored="true"/>

3: Restart the Tomcat server and test
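One possible way to test is to index a small document and then search the tokenized field. The update message below is a sketch in Solr's XML update format; every field value is made up for illustration, and if your schema declares a required uniqueKey field (not shown in this article) it has to be added to the document as well.

```xml
<!-- Hypothetical test document; all field values are invented for illustration -->
<add>
  <doc>
    <field name="src">http://example.com/images/1.jpg</field>
    <!-- Chinese text to be split by the textMaxWord field type -->
    <field name="key_info">中文分词器测试</field>
    <!-- Solr date fields expect ISO 8601 in UTC -->
    <field name="update_date">2016-03-17T14:59:00Z</field>
  </doc>
</add>
```

After posting this to the core's update handler and committing, a query on key_info for a word contained in the text should find the document if the tokenizer is active. The Analysis page of the Solr admin UI shows how the field type actually splits a given string.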
