Dependency Parsing: the CoNLL-U Format (CoNLL2014)
2015-07-30 14:17
Dependency-Based Word Embeddings takes dependency parses in CoNLL format as input:
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
The following is excerpted from:
http://universaldependencies.github.io/docs/format.html
Sentences consist of one or more word lines, and word lines contain the following fields:
ID: Word index, integer starting at 1 for each new sentence; may be a range for tokens with multiple words.
FORM: Word form or punctuation symbol.
LEMMA: Lemma or stem of word form.
CPOSTAG: Universal part-of-speech tag drawn from our revised version of the Google universal POS tags.
POSTAG: Language-specific part-of-speech tag; underscore if not available.
FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
HEAD: Head of the current token, which is either a value of ID or zero (0).
DEPREL: Universal Stanford dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
DEPS: List of secondary dependencies (head-deprel pairs).
MISC: Any other annotation.
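The HEAD and DEPREL constraints in this field list can be checked mechanically. Here is a minimal sketch, assuming each token has already been reduced to an (ID, HEAD, DEPREL) triple with plain integer IDs (range IDs for multiword tokens are not handled). `check_heads` is a hypothetical helper name, not part of any library:

```python
def check_heads(tokens):
    """Return a list of error strings for (id, head, deprel) triples.

    Assumes plain integer IDs; range IDs for multiword tokens are not handled.
    """
    ids = {tok_id for tok_id, _, _ in tokens}
    errors = []
    for tok_id, head, deprel in tokens:
        # HEAD must be 0 or the ID of some token in the same sentence.
        if head != 0 and head not in ids:
            errors.append(f"token {tok_id}: HEAD {head} is not a valid ID")
        # DEPREL is root if and only if HEAD is 0.
        if (head == 0) != (deprel == "root"):
            errors.append(f"token {tok_id}: root/HEAD mismatch")
    return errors


# (ID, HEAD, DEPREL) triples for "They buy and sell books ."
sentence = [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "cc"),
            (4, 2, "conj"), (5, 2, "dobj"), (6, 2, "punct")]
print(check_heads(sentence))
```

An empty list means both constraints hold for every token. The sketch does not check that the heads form a single connected tree.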
ID | FORM | LEMMA | CPOSTAG | POSTAG | FEATS | HEAD | DEPREL | DEPS | MISC |
1 | They | they | PRON | PRP | Case=Nom|Number=Plur | 2 | nsubj | 4:nsubj | _ |
2 | buy | buy | VERB | VBP | Number=Plur|Person=3|Tense=Pres | 0 | root | _ | _ |
3 | and | and | CONJ | CC | _ | 2 | cc | _ | _ |
4 | sell | sell | VERB | VBP | Number=Plur|Person=3|Tense=Pres | 2 | conj | 0:root | _ |
5 | books | book | NOUN | NNS | Number=Plur | 2 | dobj | 4:dobj | _ |
6 | . | . | PUNCT | . | _ | 2 | punct | _ | _ |
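In an actual CoNLL-U file the ten fields are separated by tabs rather than laid out with pipes. Here is a minimal sketch of turning one such line into a record, assuming the `|`-separated `Name=Value` encoding for FEATS and the `head:deprel` encoding for DEPS. It handles regular word lines only (a multiword range line with `_` in HEAD would fail). `Word` and `parse_word_line` are illustrative names, not part of any library:

```python
from typing import Dict, List, NamedTuple, Tuple


class Word(NamedTuple):
    id: str  # kept as a string because ID may be a range
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: Dict[str, str]
    head: int
    deprel: str
    deps: List[Tuple[int, str]]
    misc: str


def parse_word_line(line: str) -> Word:
    """Parse one tab-separated word line into a Word record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 fields, got {len(cols)}")
    wid, form, lemma, cpos, pos, feats, head, deprel, deps, misc = cols
    # An underscore marks an empty FEATS or DEPS field.
    feat_map = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    dep_list = []
    if deps != "_":
        for pair in deps.split("|"):
            h, rel = pair.split(":", 1)
            dep_list.append((int(h), rel))
    return Word(wid, form, lemma, cpos, pos, feat_map, int(head), deprel, dep_list, misc)


# Token 4 ("sell") from the example sentence, written with tabs.
line = "4\tsell\tsell\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t2\tconj\t0:root\t_"
word = parse_word_line(line)
print(word.form, word.feats, word.head, word.deps)
```

Keeping FEATS as a dictionary and DEPS as a list of pairs makes lookups such as `word.feats.get("Tense")` straightforward.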
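The following is excerpted from:
http://hanlp.linrunsoft.com/doc/_build/html/dependency_parser.html
The CoNLL annotation format has 10 columns:
ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
Only the first 8 columns are used. Their meanings are:
1 ID: Index of the current word in the sentence, starting at 1.
2 FORM: The current word or punctuation mark.
3 LEMMA: Lemma or stem of the current word (or punctuation mark); in Chinese this column is identical to FORM.
4 CPOSTAG: Part-of-speech tag of the current word (coarse-grained).
5 POSTAG: Part-of-speech tag of the current word (fine-grained).
6 FEATS: Syntactic features; this column was not used in this evaluation and is filled with underscores throughout.
7 HEAD: Head word of the current word.
8 DEPREL: Dependency relation between the current word and its head.
In the CoNLL format each word occupies one line, a column with no value is filled with an underscore '_', columns are separated by the tab character '\t', and lines by the newline character '\n'. Sentences are separated from each other by a blank line.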
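The layout described above (one word per line, tab-separated columns, a blank line between sentences) can be read with a few lines of code. Here is a minimal sketch that keeps only the first 8 columns. `read_sentences` is a hypothetical helper, and the two-sentence sample is made up for illustration:

```python
import io


def read_sentences(stream):
    """Yield sentences as lists of 8-column rows from a CoNLL-formatted stream."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if rows:
                yield rows
                rows = []
            continue
        rows.append(line.split("\t")[:8])
    if rows:  # last sentence when the input lacks a trailing blank line
        yield rows


# Made-up sample: two sentences separated by a blank line.
sample = (
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tread\tread\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tStop\tstop\tVERB\tVB\t_\t0\troot\t_\t_\n"
)
for sent in read_sentences(io.StringIO(sample)):
    # Print FORM, HEAD and DEPREL for each word.
    print([(row[1], row[6], row[7]) for row in sent])
```

Passing an open file object instead of `io.StringIO` works the same way, since the function only iterates over lines.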