How to Get Tokens from a TokenStream in Lucene 3.5.0
2015-03-15 15:06
While studying the Lucene 3.5.0 documentation, I analyzed the API changes across release versions and finally found the relevant change note:
LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent
of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence
is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
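The CharSequence point in this note can be illustrated with plain JDK types. StringBuilder, like CharTermAttribute, implements both CharSequence and Appendable, so it can be handed straight to the regex engine without an intermediate String. A minimal sketch (class and method names are my own, for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Extract the first lowercase run from any CharSequence -- no toString() copy needed.
    static String firstWord(CharSequence cs) {
        Matcher m = Pattern.compile("[a-z]+").matcher(cs); // matcher() accepts CharSequence
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // StringBuilder stands in here for CharTermAttribute: both implement
        // CharSequence and Appendable, which is exactly what the note describes.
        StringBuilder term = new StringBuilder().append("lucene350");
        System.out.println(firstWord(term)); // prints "lucene"
    }
}
```

The same applies to a CharTermAttribute obtained from a TokenStream: it can be matched against a Pattern directly, with no toString() conversion.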
From the note above we can see that the original approach no longer extracts the corresponding tokens:
    StringReader reader = new StringReader(s);
    TokenStream ts = analyzer.tokenStream(s, reader);
    TermAttribute ta = ts.getAttribute(TermAttribute.class);
Analysis of the API documentation shows that CharTermAttribute is the interface that replaces TermAttribute. I therefore wrote an example that properly extracts tokens from a TokenStream:
package com.segment;

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {

    public static String show(Analyzer a, String s) throws Exception {
        StringReader reader = new StringReader(s);
        // The first argument is a field name, not the text itself.
        TokenStream ts = a.tokenStream("content", reader);
        // Register the attribute once, before consuming the stream;
        // the same instance is updated by every incrementToken() call.
        CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        ts.reset(); // required before the first incrementToken()
        while (ts.incrementToken()) {
            sb.append(ta.toString()).append(' ');
        }
        ts.end();
        ts.close();
        return sb.toString();
    }

    public String segment(String s) throws Exception {
        Analyzer a = new IKAnalyzer();
        return show(a, s);
    }

    public static void main(String[] args) {
        String name = "我是俊杰,我爱编程,我的测试用例";
        Segment s = new Segment();
        try {
            System.out.println(s.segment(name));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
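The consumption loop above follows the standard pull pattern: advance the stream, read the current token from the attribute, repeat until exhausted. The JDK's own StreamTokenizer works the same way, so the pattern can be sketched without any Lucene dependency (class and method names here are my own, for illustration):

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class PullLoopDemo {
    // Mirrors the TokenStream pull pattern: advance, read the current token, repeat.
    static String tokens(String s) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        StringBuilder sb = new StringBuilder();
        while (st.nextToken() != StreamTokenizer.TT_EOF) { // like ts.incrementToken()
            if (st.ttype == StreamTokenizer.TT_WORD) {
                sb.append(st.sval).append(' '); // like ta.toString()
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("hello lucene world")); // prints "hello lucene world"
    }
}
```

The key difference in Lucene 3.x is that the token text lives in a reusable attribute object rather than in the return value, which is why CharTermAttribute must be registered before the loop starts.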