1. Getting Started with Stanford CoreNLP
2017-04-10 19:50
The core package of CoreNLP is built around two classes: Annotation and Annotator.
An Annotation is the data structure that holds the results produced by annotators; it is essentially a map. Annotators are more like functions, except that they operate on Annotations rather than on plain Objects. Annotators can tokenize, parse, do NER, and POS-tag. Annotators and Annotations are combined in an AnnotationPipeline; StanfordCoreNLP extends the AnnotationPipeline class and configures it with the NLP annotators. Annotator output is retrieved through CoreMap and CoreLabel.
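Because an Annotation is a typesafe map (a CoreMap), values are read with class objects as keys, and the key determines the type of the value. A minimal sketch of this access pattern, using the real CoreAnnotations key classes (the class name and sample text are illustrative):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;

public class MapDemo {
    public static void main(String[] args) {
        // Constructing an Annotation from a String stores the raw text
        // under the TextAnnotation key.
        Annotation ann = new Annotation("Shanghai is a city.");
        // Reading it back: the TextAnnotation key is declared to map to
        // a String, so no cast is needed.
        String text = ann.get(CoreAnnotations.TextAnnotation.class);
        System.out.println(text);
    }
}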
Basic usage takes two steps:
1. Create a StanfordCoreNLP object with the StanfordCoreNLP(Properties props) constructor.
2. Parse arbitrary text by calling annotate(Annotation document); a minimal sketch of both steps follows this list.
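The sketch below enables only the lightweight tokenize and ssplit annotators, so it runs without the large model files; the class name is illustrative:

import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class MinimalPipeline {
    public static void main(String[] args) {
        // Step 1: build the pipeline from a Properties object.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Step 2: wrap raw text in an Annotation and run the pipeline on it.
        Annotation document = new Annotation("Stanford CoreNLP is easy to use.");
        pipeline.annotate(document);
        System.out.println(document);
    }
}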
A full example that runs the whole pipeline and prints the intermediate results:

import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class WordSeg {
    public static void main(String[] args) {
        // Set up the pipeline properties: POS tagging, lemmatization,
        // NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
        // Create a StanfordCoreNLP object
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village. However, in 1914, Shanghai had 200 banks dealing with 80% of its foreign investments in China.";
        // Create an empty Annotation over the text above
        Annotation document = new Annotation(text);
        System.out.println("Empty Annotation: " + document);
        // Run all the annotators defined above on this text
        pipeline.annotate(document);
        // All the sentences in the text;
        // a CoreMap maps class-object keys to values of custom types
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("sentence:" + sentence);
            // A CoreLabel is a CoreMap with extra token-specific methods
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                System.out.println("token:" + token);
                // The text of the token (the word)
                String word = token.get(TextAnnotation.class);
                System.out.println("word:" + word);
                // The POS tag of the token
                String pos = token.get(PartOfSpeechAnnotation.class);
                System.out.println("pos:" + pos);
                // The NER label of the token
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println("ne:" + ne);
            }
            // The parse tree of the sentence
            Tree tree = sentence.get(TreeAnnotation.class);
            System.out.println(tree);
            // The dependency graph of the sentence
            SemanticGraph dependencies = sentence.get(CollapsedDependenciesAnnotation.class);
            System.out.println(dependencies);
        }
        // The coreference chain graph of the document
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        System.out.println(graph);
    }
}
Part of the output is shown below:
sentence:Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village.
token:Until-1
word:Until
pos:IN
ne:O
token:the-2
word:the
pos:DT
ne:DATE
token:19th-3
word:19th
pos:JJ
ne:DATE
token:century-4
word:century
pos:NN
ne:DATE
token:and-5
word:and
pos:CC
ne:O
// Parse tree
(ROOT (S (PP (IN Until) (NP (NP (DT the) (JJ 19th) (NN century)) (CC and) (NP (DT the) (JJ first) (NN opium) (NN war)))) (, ,) (NP (NNP Shanghai)) (VP (VBD was) (VP (VBN considered) (S (VP (TO to) (VP (VB be) (NP (RB essentially) (DT a) (NN fishing) (NN village))))))) (. .)))
// Dependency graph
-> considered/VBN (root)
  -> century/NN (nmod:until)
    -> Until/IN (case)
    -> the/DT (det)
    -> 19th/JJ (amod)
    -> and/CC (cc)
    -> war/NN (conj:and)
      -> the/DT (det)
      -> first/JJ (amod)
      -> opium/NN (compound)
  -> ,/, (punct)
  -> Shanghai/NNP (nsubjpass)
  -> was/VBD (auxpass)
  -> village/NN (xcomp)
    -> to/TO (mark)
    -> be/VB (cop)
    -> essentially/RB (advmod)
    -> a/DT (det)
    -> fishing/NN (compound)
  -> ./. (punct)
// Coreference chains
{1=CHAIN1-["first" in sentence 1], 2=CHAIN2-["Shanghai" in sentence 1, "Shanghai" in sentence 2], 3=CHAIN3-["the 19th century and the first opium war" in sentence 1], 4=CHAIN4-["the 19th century" in sentence 1], 5=CHAIN5-["the first opium war" in sentence 1], 6=CHAIN6-["essentially a fishing village" in sentence 1], 8=CHAIN8-["200" in sentence 2], 9=CHAIN9-["China" in sentence 2], 10=CHAIN10-["1914" in sentence 2, "its" in sentence 2], 11=CHAIN11-["200 banks dealing with 80 % of its foreign investments in China" in sentence 2], 12=CHAIN12-["its foreign investments" in sentence 2]}
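Beyond printing toString() dumps, the dependency graph and the coreference chains can also be walked programmatically. A sketch using the SemanticGraph and CorefChain APIs shipped with CoreNLP (the helper class name is illustrative):

import java.util.Map;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;

public class OutputInspector {
    // Print each dependency edge as "relation(governor, dependent)".
    static void printEdges(SemanticGraph dependencies) {
        for (SemanticGraphEdge edge : dependencies.edgeIterable()) {
            System.out.println(edge.getRelation() + "("
                    + edge.getGovernor().word() + ", "
                    + edge.getDependent().word() + ")");
        }
    }

    // Print each coref chain: its representative mention, then every
    // mention in textual order with its 1-based sentence number.
    static void printChains(Map<Integer, CorefChain> graph) {
        for (CorefChain chain : graph.values()) {
            System.out.println("representative: "
                    + chain.getRepresentativeMention().mentionSpan);
            for (CorefChain.CorefMention m : chain.getMentionsInTextualOrder()) {
                System.out.println("  mention: " + m.mentionSpan
                        + " (sentence " + m.sentNum + ")");
            }
        }
    }
}

Calling printEdges(dependencies) and printChains(graph) at the corresponding points in the example above would, for instance, print nsubjpass(considered, Shanghai) and the two "Shanghai" mentions of CHAIN2.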
If you see the following error when running the program:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
CoreNLP logs through the SLF4J API; when no SLF4J binding is on the classpath, SLF4J falls back to a no-op logger and prints this warning. The fix is to add a binding such as slf4j-simple to pom.xml:
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.12</version>
</dependency>
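For reference, the CoreNLP library itself is declared in the same pom.xml. Assuming a 3.6.0-era release (adjust the version to match your setup), the entries typically look like this, where the models classifier pulls in the English model files:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
</dependency>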