
How to get Tokens from a TokenStream in Lucene 3.5.0

2012-06-15 18:17
Original article: /article/2365400.html

By studying the Lucene 3.5.0 documentation and comparing the API changes between Lucene releases, I eventually found the relevant change note:

LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute.
The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
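The CharSequence and Appendable behavior described in this note can be tried without running an analyzer. The sketch below assumes Lucene 3.5.0 on the classpath and creates a CharTermAttributeImpl directly purely for illustration (in real code the attribute instance comes from a TokenStream); the class name CharTermDemo and the isNumber helper are hypothetical.

```java
import java.util.regex.Pattern;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;

public class CharTermDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Accepts any CharSequence, so a CharTermAttribute can be passed without toString().
    public static boolean isNumber(CharSequence term) {
        return DIGITS.matcher(term).matches();
    }

    public static void main(String[] args) {
        // Normally obtained from a TokenStream; created directly here only for illustration.
        CharTermAttribute term = new CharTermAttributeImpl();

        // Appendable-style building, similar to StringBuilder.
        term.append("lucene").append('-').append("35");
        System.out.println(term + " length=" + term.length()); // lucene-35 length=9

        // setEmpty() resets the length to zero so the buffer can be refilled.
        term.setEmpty();
        term.append("2012");
        System.out.println(isNumber(term)); // true

        // Direct char[] access: only the first length() chars are valid.
        char[] buf = term.buffer();
        System.out.println(new String(buf, 0, term.length())); // 2012
    }
}
```

Passing the attribute straight to Pattern.matcher avoids allocating a String per token, which is the point the change note makes about CharSequence.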

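This note shows that the old way of extracting tokens, through TermAttribute, is deprecated (TermAttribute was removed entirely in Lucene 4.0):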

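```java
StringReader reader = new StringReader(s);
// The first argument is a field name ("content" is a placeholder), not the text itself
TokenStream ts = analyzer.tokenStream("content", reader);
// Deprecated since Lucene 3.1: TermAttribute is replaced by CharTermAttribute
TermAttribute ta = ts.getAttribute(TermAttribute.class);
```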
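Because the change is backwards compatible, a Lucene 3.5 stream can expose the deprecated TermAttribute and the new CharTermAttribute at the same time, and both views read the same term. A minimal sketch, assuming Lucene 3.5.0 and its bundled WhitespaceAnalyzer; the class MixedAttributes, the method bothViews and the field name "content" are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class MixedAttributes {
    // Returns "old=new" for each token, reading it through both attribute interfaces.
    @SuppressWarnings("deprecation") // TermAttribute is deprecated since Lucene 3.1
    public static List<String> bothViews(String text) throws IOException {
        TokenStream ts = new WhitespaceAnalyzer(Version.LUCENE_35)
                .tokenStream("content", new StringReader(text));
        TermAttribute oldAtt = ts.addAttribute(TermAttribute.class);         // pre-3.1 interface
        CharTermAttribute newAtt = ts.addAttribute(CharTermAttribute.class); // its replacement
        List<String> out = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                // Both views are backed by the same char[] buffer, so the texts match.
                out.add(oldAtt.term() + "=" + newAtt.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bothViews("old new"));
    }
}
```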

The API documentation confirms that CharTermAttribute is the interface that replaces TermAttribute, so I wrote the following example that uses it to extract tokens from a TokenStream:

```java
package com.segment;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {

    // Runs analyzer a over s and returns all tokens separated by spaces.
    public static String show(Analyzer a, String s) throws IOException {
        // The first argument of tokenStream() is a field name ("content" is a placeholder), not the text.
        TokenStream ts = a.tokenStream("content", new StringReader(s));
        // Get the attribute once: the stream refills this same instance on every incrementToken().
        CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                sb.append(ta.toString()).append(' ');
            }
            ts.end();
        } finally {
            ts.close();
        }
        return sb.toString();
    }

    public String segment(String s) throws IOException {
        Analyzer a = new IKAnalyzer();
        return show(a, s);
    }

    public static void main(String[] args) {
        String name = "我是俊杰,我爱编程,我的测试用例";
        Segment s = new Segment();
        try {
            System.out.println(s.segment(name));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
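To try the same consuming loop without the third-party IKAnalyzer, the sketch below swaps in Lucene's bundled WhitespaceAnalyzer (an assumption for illustration; it only splits on whitespace and cannot segment Chinese text) and collects the terms into a List. TokenList, tokens() and the field name "content" are hypothetical names, and Lucene 3.5.0 is assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenList {
    // Returns the terms the analyzer produces for text, in order.
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                // Copy the text now: the buffer is overwritten by the next incrementToken().
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_35);
        System.out.println(tokens(analyzer, "Lucene 3.5 CharTermAttribute demo"));
        // [Lucene, 3.5, CharTermAttribute, demo]
    }
}
```

Calling term.toString() inside the loop matters: the stream reuses a single attribute instance, so storing the attribute itself instead of its text would leave every list entry pointing at the last token.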