1. Getting Started with Stanford CoreNLP
2017-04-10 19:50
The core package of CoreNLP is built around two classes: Annotation and Annotator.
An Annotation is the data structure that holds the results produced by annotators; it is essentially a map. Annotators are more like functions, except that they operate on Annotations rather than on plain Objects. Annotators can tokenize, parse, do NER, and POS-tag. Annotators and Annotations are combined in an AnnotationPipeline; StanfordCoreNLP extends the AnnotationPipeline class and configures it with the NLP annotators. Annotator output is retrieved through CoreMap and CoreLabel.
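Because an Annotation is a typesafe map (a CoreMap), values are read with class objects as keys, and the key determines the type of the value. A minimal sketch of this access pattern, using the real CoreAnnotations key classes (the class name and sample text are illustrative):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;

public class MapDemo {
    public static void main(String[] args) {
        // Constructing an Annotation from a String stores the raw text
        // under the TextAnnotation key.
        Annotation ann = new Annotation("Shanghai is a city.");
        // Reading it back: the TextAnnotation key is declared to map to
        // a String, so no cast is needed.
        String text = ann.get(CoreAnnotations.TextAnnotation.class);
        System.out.println(text);
    }
}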
Basic usage takes two steps:
1. Create a StanfordCoreNLP object with the StanfordCoreNLP(Properties props) constructor.
2. Parse arbitrary text by calling annotate(Annotation document); a minimal sketch of both steps follows this list.
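The sketch below enables only the lightweight tokenize and ssplit annotators, so it runs without the large model files; the class name is illustrative:

import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class MinimalPipeline {
    public static void main(String[] args) {
        // Step 1: build the pipeline from a Properties object.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Step 2: wrap raw text in an Annotation and run the pipeline on it.
        Annotation document = new Annotation("Stanford CoreNLP is easy to use.");
        pipeline.annotate(document);
        System.out.println(document);
    }
}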
A full example that runs the whole pipeline and prints the intermediate results:

import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class WordSeg {
    public static void main(String[] args) {
        // Set up the pipeline properties: POS tagging, lemmatization,
        // NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
        // Create a StanfordCoreNLP object
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village. However, in 1914, Shanghai had 200 banks dealing with 80% of its foreign investments in China.";
        // Create an empty Annotation over the text above
        Annotation document = new Annotation(text);
        System.out.println("Empty Annotation: " + document);
        // Run all the annotators defined above on this text
        pipeline.annotate(document);
        // All the sentences in the text;
        // a CoreMap maps class-object keys to values of custom types
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("sentence:" + sentence);
            // A CoreLabel is a CoreMap with extra token-specific methods
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                System.out.println("token:" + token);
                // The text of the token (the word)
                String word = token.get(TextAnnotation.class);
                System.out.println("word:" + word);
                // The POS tag of the token
                String pos = token.get(PartOfSpeechAnnotation.class);
                System.out.println("pos:" + pos);
                // The NER label of the token
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println("ne:" + ne);
            }
            // The parse tree of the sentence
            Tree tree = sentence.get(TreeAnnotation.class);
            System.out.println(tree);
            // The dependency graph of the sentence
            SemanticGraph dependencies = sentence.get(CollapsedDependenciesAnnotation.class);
            System.out.println(dependencies);
        }
        // The coreference chain graph of the document
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        System.out.println(graph);
    }
}
Part of the output is shown below:
sentence:Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village.
token:Until-1
word:Until
pos:IN
ne:O
token:the-2
word:the
pos:DT
ne:DATE
token:19th-3
word:19th
pos:JJ
ne:DATE
token:century-4
word:century
pos:NN
ne:DATE
token:and-5
word:and
pos:CC
ne:O
// Parse tree
(ROOT (S (PP (IN Until) (NP (NP (DT the) (JJ 19th) (NN century)) (CC and) (NP (DT the) (JJ first) (NN opium) (NN war)))) (, ,) (NP (NNP Shanghai)) (VP (VBD was) (VP (VBN considered) (S (VP (TO to) (VP (VB be) (NP (RB essentially) (DT a) (NN fishing) (NN village))))))) (. .)))
// Dependency graph
-> considered/VBN (root)
  -> century/NN (nmod:until)
    -> Until/IN (case)
    -> the/DT (det)
    -> 19th/JJ (amod)
    -> and/CC (cc)
    -> war/NN (conj:and)
      -> the/DT (det)
      -> first/JJ (amod)
      -> opium/NN (compound)
  -> ,/, (punct)
  -> Shanghai/NNP (nsubjpass)
  -> was/VBD (auxpass)
  -> village/NN (xcomp)
    -> to/TO (mark)
    -> be/VB (cop)
    -> essentially/RB (advmod)
    -> a/DT (det)
    -> fishing/NN (compound)
  -> ./. (punct)
// Coreference chains
{1=CHAIN1-["first" in sentence 1], 2=CHAIN2-["Shanghai" in sentence 1, "Shanghai" in sentence 2], 3=CHAIN3-["the 19th century and the first opium war" in sentence 1], 4=CHAIN4-["the 19th century" in sentence 1], 5=CHAIN5-["the first opium war" in sentence 1], 6=CHAIN6-["essentially a fishing village" in sentence 1], 8=CHAIN8-["200" in sentence 2], 9=CHAIN9-["China" in sentence 2], 10=CHAIN10-["1914" in sentence 2, "its" in sentence 2], 11=CHAIN11-["200 banks dealing with 80 % of its foreign investments in China" in sentence 2], 12=CHAIN12-["its foreign investments" in sentence 2]}
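Beyond printing toString() dumps, the dependency graph and the coreference chains can also be walked programmatically. A sketch using the SemanticGraph and CorefChain APIs shipped with CoreNLP (the helper class name is illustrative):

import java.util.Map;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;

public class OutputInspector {
    // Print each dependency edge as "relation(governor, dependent)".
    static void printEdges(SemanticGraph dependencies) {
        for (SemanticGraphEdge edge : dependencies.edgeIterable()) {
            System.out.println(edge.getRelation() + "("
                    + edge.getGovernor().word() + ", "
                    + edge.getDependent().word() + ")");
        }
    }

    // Print each coref chain: its representative mention, then every
    // mention in textual order with its 1-based sentence number.
    static void printChains(Map<Integer, CorefChain> graph) {
        for (CorefChain chain : graph.values()) {
            System.out.println("representative: "
                    + chain.getRepresentativeMention().mentionSpan);
            for (CorefChain.CorefMention m : chain.getMentionsInTextualOrder()) {
                System.out.println("  mention: " + m.mentionSpan
                        + " (sentence " + m.sentNum + ")");
            }
        }
    }
}

Calling printEdges(dependencies) and printChains(graph) at the corresponding points in the example above would, for instance, print nsubjpass(considered, Shanghai) and the two "Shanghai" mentions of CHAIN2.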
If you see the following error when running the program:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
CoreNLP logs through the SLF4J API; when no SLF4J binding is on the classpath, SLF4J falls back to a no-op logger and prints this warning. The fix is to add a binding such as slf4j-simple to pom.xml:
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.12</version>
</dependency>
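For reference, the CoreNLP library itself is declared in the same pom.xml. Assuming a 3.6.0-era release (adjust the version to match your setup), the entries typically look like this, where the models classifier pulls in the English model files:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
</dependency>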