How to get Tokens from a TokenStream in Lucene 3.5.0
2012-06-15 18:17
Original article: /article/2365400.html
By studying the Lucene 3.5.0 documentation and comparing the API changes between Lucene releases, I eventually found the relevant change note:
LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute.
The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
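The note says CharTermAttribute implements CharSequence and Appendable. The JDK-only sketch below illustrates what that buys you. `MiniTermBuffer` and `CharSequenceDemo` are hypothetical names used only for illustration, not Lucene classes: like the real attribute, the stand-in keeps the term in a reusable char[] buffer, can be built up with append() the way a StringBuilder is, and can be passed straight to java.util.regex because regex matching accepts any CharSequence.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for a CharTermAttribute-like object (NOT Lucene's class).
// The term is stored as a reusable char[] buffer plus a length.
final class MiniTermBuffer implements CharSequence, Appendable {
    private char[] buffer = new char[8];
    private int length = 0;

    // Empties the term but keeps the allocated buffer for reuse.
    MiniTermBuffer setEmpty() { length = 0; return this; }

    // Direct access to the backing array; only the first length() chars are valid.
    char[] buffer() { return buffer; }

    private void ensureCapacity(int needed) {
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    @Override
    public MiniTermBuffer append(CharSequence csq) {
        CharSequence s = (csq == null) ? "null" : csq; // Appendable's null convention
        return append(s, 0, s.length());
    }

    @Override
    public MiniTermBuffer append(CharSequence csq, int start, int end) {
        CharSequence s = (csq == null) ? "null" : csq;
        ensureCapacity(length + (end - start));
        for (int i = start; i < end; i++) {
            buffer[length++] = s.charAt(i);
        }
        return this;
    }

    @Override
    public MiniTermBuffer append(char c) {
        ensureCapacity(length + 1);
        buffer[length++] = c;
        return this;
    }

    @Override public int length() { return length; }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    @Override public String toString() { return new String(buffer, 0, length); }
}

final class CharSequenceDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Pattern.matcher() takes any CharSequence, so the term needs no conversion.
    static boolean isNumeric(CharSequence term) {
        return DIGITS.matcher(term).matches();
    }

    public static void main(String[] args) {
        MiniTermBuffer term = new MiniTermBuffer();
        term.append("Lucene").append('-').append("3.5.0", 0, 3);
        System.out.println(term);            // Lucene-3.5
        System.out.println(isNumeric(term)); // false
        term.setEmpty().append("2012");
        System.out.println(isNumeric(term)); // true
    }
}
```

The real CharTermAttribute offers more than this (for example copyBuffer and resizeBuffer); the sketch keeps only the parts the note mentions.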
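From this note we can tell that the old way of extracting tokens, shown below, is deprecated: it reads the term through TermAttribute, which CharTermAttribute now replaces.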
```java
// Old approach: TermAttribute is deprecated by LUCENE-2302 in favor of CharTermAttribute
StringReader reader = new StringReader(s);
TokenStream ts = analyzer.tokenStream(s, reader);
TermAttribute ta = ts.getAttribute(TermAttribute.class);
```
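The change note explains why old and new code can coexist: mixed TokenStreams work on the same char[] buffer whichever interface they use. One way to get that behavior is for a single implementation class to serve both interfaces. The sketch below assumes exactly that design; `LegacyTermView`, `ModernTermView` and `SharedTermImpl` are hypothetical names, and this is not Lucene's actual implementation.

```java
import java.util.Arrays;

// Hypothetical old-style view with TermAttribute-like long method names (illustration only).
interface LegacyTermView {
    String term();
    void setTermBuffer(String text);
}

// Hypothetical new-style view: shorter names, and usable as a CharSequence.
interface ModernTermView extends CharSequence {
    ModernTermView setEmpty();
    ModernTermView append(CharSequence csq);
}

// A single implementation serves both views, so they read and write one char[] buffer.
final class SharedTermImpl implements LegacyTermView, ModernTermView {
    private char[] buffer = new char[4];
    private int length;

    @Override public String term() { return toString(); }

    @Override public void setTermBuffer(String text) { setEmpty().append(text); }

    @Override public SharedTermImpl setEmpty() { length = 0; return this; }

    @Override
    public SharedTermImpl append(CharSequence csq) {
        int n = csq.length();
        if (length + n > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(length + n, buffer.length * 2));
        }
        for (int i = 0; i < n; i++) {
            buffer[length++] = csq.charAt(i);
        }
        return this;
    }

    @Override public int length() { return length; }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    @Override public String toString() { return new String(buffer, 0, length); }
}

final class SharedBufferDemo {
    public static void main(String[] args) {
        SharedTermImpl impl = new SharedTermImpl();
        LegacyTermView oldFilterView = impl; // what an old-style filter would hold
        ModernTermView newFilterView = impl; // what a new-style filter would hold
        oldFilterView.setTermBuffer("lucene");
        newFilterView.append("3");
        System.out.println(oldFilterView.term()); // lucene3
    }
}
```

A write made through the old-style view is immediately visible through the new-style one, because there is only one buffer behind both.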
The API documentation shows that CharTermAttribute is the interface that replaces TermAttribute, so I wrote an example that uses it to extract the tokens from a TokenStream:
```java
package com.segment;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {

    // Returns the tokens that analyzer a produces for text s, each followed by a space.
    public static String show(Analyzer a, String s) throws IOException {
        // The first argument is the field name ("content" is an arbitrary example), not the text.
        TokenStream ts = a.tokenStream("content", new StringReader(s));
        // Get the attribute once, before consuming; the same instance is updated on every incrementToken().
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        try {
            // TokenStream workflow: reset(), incrementToken() until false, end(), close().
            ts.reset();
            while (ts.incrementToken()) {
                // CharTermAttribute is a CharSequence, so it can be appended directly.
                sb.append(term).append(' ');
            }
            ts.end();
        } finally {
            ts.close();
        }
        return sb.toString();
    }

    public String segment(String s) throws IOException {
        Analyzer a = new IKAnalyzer();
        return show(a, s);
    }

    public static void main(String[] args) {
        String name = "我是俊杰,我爱编程,我的测试用例";
        try {
            System.out.println(new Segment().segment(name));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
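The corrected show() depends on two properties of the attribute-based API: the attribute is looked up once, and the same attribute object is overwritten on every incrementToken() call, so each token has to be copied (for example with toString(), or by appending it to a StringBuilder) before advancing. The JDK-only sketch below imitates that contract so it can run without Lucene on the classpath. `MiniWhitespaceStream` and `ConsumerDemo` are hypothetical names, a StringBuilder stands in for CharTermAttribute, and tokens are split on whitespace rather than by a real analyzer such as IKAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of the TokenStream consumer contract (NOT Lucene code).
// One StringBuilder plays the shared term attribute; incrementToken() overwrites it in place.
final class MiniWhitespaceStream implements AutoCloseable {
    private final String text;
    private final StringBuilder term = new StringBuilder(); // the reused "attribute"
    private int pos;

    MiniWhitespaceStream(String text) { this.text = text; }

    // Analogous to addAttribute(CharTermAttribute.class): always returns the same object.
    StringBuilder termAttribute() { return term; }

    void reset() { pos = 0; term.setLength(0); }

    // Moves to the next whitespace-separated token; returns false when the text is exhausted.
    boolean incrementToken() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false;
        term.setLength(0);
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            term.append(text.charAt(pos++));
        }
        return true;
    }

    void end() { /* nothing to finalize in this sketch */ }

    @Override public void close() { /* no resources to release in this sketch */ }
}

final class ConsumerDemo {
    // Same shape as show(): get the attribute once, reset, loop, end, close.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (MiniWhitespaceStream ts = new MiniWhitespaceStream(text)) {
            StringBuilder termAtt = ts.termAttribute();
            ts.reset();
            while (ts.incrementToken()) {
                out.add(termAtt.toString()); // copy now; the buffer is overwritten next call
            }
            ts.end();
        }
        return out;
    }

    // Pitfall: storing the attribute object itself gives N references to one buffer.
    static List<CharSequence> wrongTokens(String text) {
        List<CharSequence> out = new ArrayList<>();
        try (MiniWhitespaceStream ts = new MiniWhitespaceStream(text)) {
            StringBuilder termAtt = ts.termAttribute();
            ts.reset();
            while (ts.incrementToken()) {
                out.add(termAtt);
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  how to   get tokens ")); // [how, to, get, tokens]
    }
}
```

In this sketch wrongTokens("a b c") returns three references to the same StringBuilder, all reading "c", which is why the loop copies each term before calling incrementToken() again.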