Part-of-speech tagging of English words with stanford-postagger
2014-03-10 10:03
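1. The models folder
This release of the part-of-speech tagger includes a models folder containing two types of files: .tagger and .props. A .tagger file is a trained tagging model, and the .props file is its corresponding properties file. The figure below shows all files in the models folder.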
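To inspect the folder without the screenshot, the pairing of model and properties files can be checked with a few lines of standard Java. This is a minimal sketch: the class name `ModelLister` and the default directory `models` are illustrative, and the naming scheme `<model>.tagger.props` is an assumption about how the properties files are named, so adjust it if your copy differs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ModelLister {
    /** Returns the names of all *.tagger files in dir, sorted alphabetically. */
    static List<String> listTaggerModels(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.tagger")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Assumed location of the properties file: the model file name plus ".props". */
    static Path propsFor(Path model) {
        return model.resolveSibling(model.getFileName().toString() + ".props");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real models directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "models");
        for (String name : listTaggerModels(dir)) {
            Path props = propsFor(dir.resolve(name));
            System.out.println(name + (Files.exists(props) ? "  (props found)" : "  (no props file)"));
        }
    }
}
```

Running it against the models directory prints one line per model, which makes a missing properties file easy to spot.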
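2. Program and notes
This open-source part-of-speech tagger ships with three English tagger models: english-bidirectional-distsim.tagger, english-left3words-distsim.tagger, and wsj-0-18-bidirectional-nodistsim.tagger. According to its documentation, tagging accuracy is about 97.01%. The tool can also tag other languages, such as Chinese and German. The tagging program and its output follow.
2.1. Tagging program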
import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class Tagger {
    public static void main(String[] args) throws Exception {
        String str = "The list of prisoners who may be released in coming days includes militants" +
                " who threw firebombs, in one case at a bus carrying children; stabbed and shot" +
                " civilians, including women, elderly Jews and suspected Palestinian collaborators; " +
                "and ambushed and killed border guards, police officers, security agents and soldiers. " +
                "All of them have been in prison for at least two decades; some were serving life sentences.";
        // Load the trained model; adjust the path to where the .tagger file is stored.
        MaxentTagger tagger = new MaxentTagger("c:/wsj-0-18-bidirectional-nodistsim.tagger");
        // Note: this timer covers only tokenization and sentence splitting, not the tagging loop below.
        long start = System.currentTimeMillis();
        List<List<HasWord>> sentences = MaxentTagger.tokenizeText(new StringReader(str));
        System.out.println("Tagging took " + (System.currentTimeMillis() - start) + " ms");
        for (List<HasWord> sentence : sentences) {
            // Tag one sentence and print it as word/TAG pairs.
            List<TaggedWord> tSentence = tagger.tagSentence(sentence);
            System.out.println(Sentence.listToString(tSentence, false));
        }
    }
}
2.2. Tagging output
Tagging took 84 ms
The/DT list/NN of/IN prisoners/NNS who/WP may/MD be/VB released/VBN in/IN coming/VBG days/NNS includes/VBZ militants/NNS who/WP threw/VBD firebombs/NNS ,/,
in/IN one/CD case/NN at/IN a/DT bus/NN carrying/VBG children/NNS ;/: stabbed/VBN and/CC shot/VBN civilians/NNS ,/, including/VBG women/NNS ,/, elderly/JJ
Jews/NNS and/CC suspected/JJ Palestinian/JJ collaborators/NNS ;/: and/CC ambushed/VBN and/CC killed/VBN border/NN guards/NNS ,/, police/NN officers/NNS ,/,
security/NN agents/NNS and/CC soldiers/NNS ./.
All/DT of/IN them/PRP have/VBP been/VBN in/IN prison/NN for/IN at/IN least/JJS two/CD decades/NNS ;/: some/DT
were/VBD serving/VBG life/NN sentences/NNS ./.
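Each token in the output above has the form word/TAG. If the result needs further processing, the line can be split back into pairs with plain Java. The sketch below is only an illustration: the class `TaggedOutputParser` is a hypothetical helper, not part of the tagger, and it splits on the last slash so that a word containing a slash keeps its text intact.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedOutputParser {
    /** Splits a line of word/TAG tokens into {word, tag} pairs. */
    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip empty tokens and tokens without a word part
            }
            pairs.add(new String[] { token.substring(0, slash), token.substring(slash + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Last line of the tagging output shown above.
        String line = "were/VBD serving/VBG life/NN sentences/NNS ./.";
        for (String[] pair : parse(line)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Punctuation tokens such as ,/, and ./. pass through the same rule, since the word and the tag are both the punctuation mark.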
The table below lists the part-of-speech tags for English words.
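For quick reference in code, the tags that occur in the output of section 2.2 can be kept in a small lookup map. This is a partial sketch covering only those tags, with descriptions following the Penn Treebank tag set; the class name `PennTagGlossary` is illustrative and the map is not a complete tag table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PennTagGlossary {
    // Only the tags seen in the sample output; not the full Penn Treebank set.
    static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("CC", "coordinating conjunction");
        TAGS.put("CD", "cardinal number");
        TAGS.put("DT", "determiner");
        TAGS.put("IN", "preposition or subordinating conjunction");
        TAGS.put("JJ", "adjective");
        TAGS.put("JJS", "adjective, superlative");
        TAGS.put("MD", "modal");
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("PRP", "personal pronoun");
        TAGS.put("VB", "verb, base form");
        TAGS.put("VBD", "verb, past tense");
        TAGS.put("VBG", "verb, gerund or present participle");
        TAGS.put("VBN", "verb, past participle");
        TAGS.put("VBP", "verb, non-3rd person singular present");
        TAGS.put("VBZ", "verb, 3rd person singular present");
        TAGS.put("WP", "wh-pronoun");
        TAGS.put(",", "comma");
        TAGS.put(":", "colon, semicolon, or other mid-sentence punctuation");
        TAGS.put(".", "sentence-final punctuation");
    }

    /** Returns the description of a tag, or "unknown tag" if it is not in the map. */
    static String describe(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : TAGS.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```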
Judging from the table above and the program's output, the tagging is quite accurate.
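That judgment is made by eye. To put a number on it, one would compare the predicted tags against a hand-labeled reference and count matches. The sketch below assumes accuracy is measured per token; the class `TagAccuracy` is a hypothetical helper and the tag sequences in `main` are made up for illustration, not taken from the tagger's evaluation data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TagAccuracy {
    /** Fraction of positions where the predicted tag equals the gold tag. */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.isEmpty() || gold.size() != predicted.size()) {
            throw new IllegalArgumentException("tag sequences must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        // Illustrative data only: one of four tags differs.
        List<String> gold = Arrays.asList("DT", "NN", "VBZ", "JJ");
        List<String> predicted = Arrays.asList("DT", "NN", "VBZ", "NN");
        System.out.println(String.format(Locale.ROOT, "accuracy = %.2f%%", 100 * accuracy(gold, predicted)));
    }
}
```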