
Adding extension dictionaries and custom words to IKAnalyzer

2015-12-01 15:52


The IKAnalyzer tokenizer

IK tokenizer source code: http://git.oschina.net/wltea/IK-Analyzer-2012FF

Basic configuration in the IKAnalyzer source

The figure shows the path from which IKAnalyzer loads its default configuration.

Configuring extension dictionaries in a project

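As shown in the figure, once the IKAnalyzer jar has been imported, using an extension dictionary only requires creating an IKAnalyzer.cfg.xml file in the project's src root directory. The file sets the paths of the extension dictionaries and the stop-word dictionary. The configuration looks like this: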

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
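<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionaries here -->
<entry key="ext_dict">com/zhaochao/ikconf/ext.dic;com/zhaochao/ikconf/mine.dic;</entry>
<!-- Users can configure their own extension stop-word dictionaries here -->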
<entry key="ext_stopwords">com/zhaochao/ikconf/stopword.dic</entry>
</properties>
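The file above uses the standard java.util.Properties XML format, so the way its entries break down into individual dictionary paths can be checked with the JDK alone. The following is a minimal sketch, not IKAnalyzer's own loading code: the class name ExtDictConfigSketch and the helper readPaths are hypothetical, and splitting the value on ";" is an assumption based on how the ext_dict entry is written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
public class ExtDictConfigSketch {

    // Parses a Properties-style XML document and returns the
    // semicolon-separated paths stored under the given key.
    public static List<String> readPaths(String xml, String key) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> paths = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null) {
            return paths; // key not configured
        }
        // Assumption: paths are separated by ';' and empty segments are ignored.
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        String xml =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "<comment>IK Analyzer extension configuration</comment>\n"
            + "<entry key=\"ext_dict\">com/zhaochao/ikconf/ext.dic;com/zhaochao/ikconf/mine.dic;</entry>\n"
            + "<entry key=\"ext_stopwords\">com/zhaochao/ikconf/stopword.dic</entry>\n"
            + "</properties>\n";

        System.out.println("ext_dict      = " + readPaths(xml, "ext_dict"));
        System.out.println("ext_stopwords = " + readPaths(xml, "ext_stopwords"));
    }
}
```

Running a check like this before starting the application is a quick way to spot a misspelled key or a stray separator in the configuration.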

Test results

Without any custom words, the segmentation result is as follows:

java|是|一个|好|语言|从到|2015年|12月|1日|它|已经有|20|年的历史|了|

After we add custom words to the extension dictionary, the segmentation result becomes:

java|是|一个好语言|从到|2015年12月1日|它|已经有|20年的历史了|
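An IK extension dictionary is a plain text file with one word per line, saved as UTF-8. The sketch below writes such a file. The three words are an assumption inferred from the difference between the two results above, not a list taken from the original post. The class name ExtDictFileSketch and the target path src/com/zhaochao/ikconf/ext.dic are likewise illustrative, chosen to match the ext_dict entry in the configuration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ExtDictFileSketch {

    // Writes the words to the given file, one word per line, encoded as UTF-8.
    public static void writeDictionary(Path file, List<String> words) throws IOException {
        Path parent = file.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // create the package directories if missing
        }
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed custom words, inferred from the tokens that appear only in the second result.
        List<String> words = Arrays.asList("一个好语言", "2015年12月1日", "20年的历史了");
        // Assumed location: under the src root, matching the ext_dict path in IKAnalyzer.cfg.xml.
        writeDictionary(Paths.get("src/com/zhaochao/ikconf/ext.dic"), words);
    }
}
```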

Test code:

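import java.io.IOException;
import java.io.StringReader;

import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.cfg.DefaultConfig;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class IKAnalyzerDemo { // class name is illustrative

    public static void main(String[] args) throws IOException {
        String s = "JAVA是一个好语言，从到2015年12月1日它已经有20年的历史了";
        queryWords(s);
    }

    public static void queryWords(String query) throws IOException {
        Configuration cfg = DefaultConfig.getInstance();
        System.out.println(cfg.getMainDictionary());      // built-in main dictionary
        System.out.println(cfg.getQuantifierDicionary()); // built-in quantifier dictionary (method name as spelled in IK)
        StringReader input = new StringReader(query.trim());
        // true = smart segmentation, false = finest-grained segmentation
        IKSegmenter ikSeg = new IKSegmenter(input, true);
        // Print each token followed by "|"
        for (Lexeme lexeme = ikSeg.next(); lexeme != null; lexeme = ikSeg.next()) {
            System.out.print(lexeme.getLexemeText() + "|");
        }
    }
}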
