Comparing Lucene's English Analyzers
2016-04-06 13:55
Below is a comparison of several commonly used English analyzers; the differences between them are noted in the comments in the program.
SimpleAnalyzer
StandardAnalyzer
WhitespaceAnalyzer
StopAnalyzer
package analyzer;

import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TestAnalyzer {

    private static String testString1 = "The quick brown fox jumped over the lazy dogs";
    private static String testString2 = "xy&z mail is - xyz@sohu.com";

    public static void testWhitespace(String testString) throws Exception {
        Analyzer analyzer = new WhitespaceAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Whitespace analyzer====");
        System.err.println("Strategy: split on whitespace only");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    public static void testSimple(String testString) throws Exception {
        Analyzer analyzer = new SimpleAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Simple analyzer====");
        System.err.println("Strategy: split on whitespace and all non-letter symbols");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    public static void testStop(String testString) throws Exception {
        Analyzer analyzer = new StopAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Stop analyzer====");
        // Stop words are words with no real search value, e.g. is, are, in, on, the
        System.err.println("Strategy: split on whitespace and symbols, then remove stop words such as is, are, in, on, the");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    public static void testStandard(String testString) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Standard analyzer====");
        System.err.println("Strategy: mixed splitting, including stop-word removal; also supports Chinese");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    public static void main(String[] args) throws Exception {
        // String testString = testString1;
        String testString = testString2;
        System.out.println(testString);
        testWhitespace(testString);
        testSimple(testString);
        testStop(testString);
        testStandard(testString);
    }
}
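To see why the analyzers produce different tokens, the two simplest strategies can be approximated in plain Java with no Lucene dependency. This is only an illustrative sketch, not Lucene's actual implementation: `whitespaceTokens` mimics WhitespaceAnalyzer (split on whitespace, keep everything else), while `simpleTokens` mimics SimpleAnalyzer (split on any non-letter character and lowercase). The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch approximating two of the tokenization strategies above.
public class AnalyzerSketch {

    // Like WhitespaceAnalyzer: split on runs of whitespace only,
    // leaving punctuation attached to the tokens.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Like SimpleAnalyzer: lowercase, then split on any non-letter
    // character, dropping empty fragments.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "xy&z mail is - xyz@sohu.com";
        System.out.println(whitespaceTokens(s)); // [xy&z, mail, is, -, xyz@sohu.com]
        System.out.println(simpleTokens(s));     // [xy, z, mail, is, xyz, sohu, com]
    }
}
```

Running this against testString2 shows the key difference: whitespace splitting keeps `xy&z` and `xyz@sohu.com` intact, while letter-only splitting breaks them into `xy`, `z` and `xyz`, `sohu`, `com`.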