Dependency Syntax: the CoNLL-U Format (CoNLL2014)

2015-07-30 14:17

Dependency-Based Word Embeddings need parses in CoNLL format as input; see:
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
The following is excerpted from:
http://universaldependencies.github.io/docs/format.html
Sentences consist of one or more word lines, and word lines contain the following fields:

ID: Word index, integer starting at 1 for each new sentence; may be a range for tokens with multiple words.
FORM: Word form or punctuation symbol.
LEMMA: Lemma or stem of word form.
CPOSTAG: Universal part-of-speech tag drawn from our revised version of the Google universal POS tags.
POSTAG: Language-specific part-of-speech tag; underscore if not available.
FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
HEAD: Head of the current token, which is either a value of ID or zero (0).
DEPREL: Universal Stanford dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
DEPS: List of secondary dependencies (head-deprel pairs).
MISC: Any other annotation.

ID	FORM	LEMMA	CPOSTAG	POSTAG	FEATS	HEAD	DEPREL	DEPS	MISC
1	They	they	PRON	PRN	Case=Nom|Number=Plur	2	nsubj	4:nsubj	_
2	buy	buy	VERB	VBP	Number=Plur|Person=3|Tense=Pres	0	root	_	_
3	and	and	CONJ	CC	_	2	cc	_	_
4	sell	sell	VERB	VBP	Number=Plur|Person=3|Tense=Pres	2	conj	0:root	_
5	books	book	NOUN	NNS	Number=Plur	2	dobj	4:dobj	_
6	.	.	PUNCT	.	_	2	punct	_	_
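
The field list above maps directly onto a small parser. What follows is only a minimal sketch in Python, not the official tooling: the names `Word`, `parse_feats` and `parse_word_line` are made up for illustration. The sample rows are the sentence above, with DEPS written as head:deprel pairs as the field description says and MISC left as an underscore.

```python
from typing import Dict, NamedTuple, Optional


class Word(NamedTuple):
    """One word line; the names follow the ten fields listed above."""
    id: str               # kept as a string because it may be a range
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: Dict[str, str]
    head: Optional[int]   # None when the column holds an underscore
    deprel: str
    deps: str             # left unparsed in this sketch
    misc: str


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Plur' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_word_line(line: str) -> Word:
    """Split one tab-separated word line into its ten fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 columns, got %d" % len(cols))
    head = None if cols[6] == "_" else int(cols[6])
    return Word(cols[0], cols[1], cols[2], cols[3], cols[4],
                parse_feats(cols[5]), head, cols[7], cols[8], cols[9])


# The example sentence, one string per word line.
SAMPLE = [
    "1\tThey\tthey\tPRON\tPRN\tCase=Nom|Number=Plur\t2\tnsubj\t4:nsubj\t_",
    "2\tbuy\tbuy\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t0\troot\t_\t_",
    "3\tand\tand\tCONJ\tCC\t_\t2\tcc\t_\t_",
    "4\tsell\tsell\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t2\tconj\t0:root\t_",
    "5\tbooks\tbook\tNOUN\tNNS\tNumber=Plur\t2\tdobj\t4:dobj\t_",
    "6\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
]

words = [parse_word_line(line) for line in SAMPLE]
# The root is the word whose HEAD is 0.
root = next(w for w in words if w.head == 0)
print(root.form, root.feats)
```

ID stays a string here because the field description allows ranges for multi-word tokens, and HEAD falls back to `None` for any line that carries an underscore in that column.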

The following is excerpted from:
http://hanlp.linrunsoft.com/doc/_build/html/dependency_parser.html

The CoNLL annotation format has 10 columns:
———————————————————————————
ID   FORM    LEMMA   CPOSTAG POSTAG  FEATS   HEAD    DEPREL  PHEAD   PDEPREL
———————————————————————————

Only the first 8 columns are used; their meanings are:

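1    ID      Index of the current word in the sentence, starting at 1.
2    FORM    The current word or punctuation mark.
3    LEMMA   Base form or stem of the current word (or punctuation mark); in Chinese this column is identical to FORM.
4    CPOSTAG Part-of-speech tag of the current word (coarse-grained).
5    POSTAG  Part-of-speech tag of the current word (fine-grained).
6    FEATS   Syntactic features; unused in this evaluation, so every value is an underscore.
7    HEAD    Head word of the current word.
8    DEPREL  Dependency relation between the current word and its head.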

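In the CoNLL format each word occupies one line, a column with no value holds an underscore '_', columns are separated by the tab character '\t', and lines by the newline character '\n'; sentences are separated from each other by a blank line.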
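
These layout rules are enough to read a file sentence by sentence. The sketch below is one way to do it in Python, under stated assumptions: `read_sentences` and `dependency_triples` are hypothetical helper names, the two sample sentences are made up for illustration, and only the first 8 columns are kept, as described above.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = List[Optional[str]]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield one sentence (a list of 8-column rows) per blank-line-separated block."""
    sentence: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():          # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")[:8]   # keep ID .. DEPREL only
        # '_' marks a missing value; FORM (index 1) is left alone so that a
        # literal underscore token is not lost.
        sentence.append([c if i == 1 or c != "_" else None
                         for i, c in enumerate(cols)])
    if sentence:                      # the file may not end with a blank line
        yield sentence


def dependency_triples(sentence: List[Row]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) for every word with a head."""
    forms = {row[0]: row[1] for row in sentence}
    triples = []
    for row in sentence:
        head_id, deprel = row[6], row[7]
        if head_id is None:
            continue
        head_form = "ROOT" if head_id == "0" else forms[head_id]
        triples.append((head_form, deprel, row[1]))
    return triples


# Illustrative input: two sentences, FEATS and the last two columns unused.
TEXT = "\n".join([
    "1\tThey\tthey\tPRON\tPRN\t_\t2\tnsubj\t_\t_",
    "2\tbuy\tbuy\tVERB\tVBP\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\tNNS\t_\t2\tdobj\t_\t_",
    "",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tread\tread\tVERB\tVBP\t_\t0\troot\t_\t_",
])

sentences = list(read_sentences(TEXT.split("\n")))
for sent in sentences:
    print(dependency_triples(sent))
```

Head and dependent pairs of this kind are presumably what a dependency-based context extractor consumes, though the exact input expected by any particular tool should be checked against its own documentation.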
