[Literature Reading] A Study of Translation Edit Rate with Targeted Human Annotation
2014-04-17 15:23
A Study of Translation Edit Rate with Targeted Human Annotation
Matthew Snover and Bonnie Dorr
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
{snover,bonnie}@umiacs.umd.edu
Summary of key points from this paper:
1、Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation.
2、Automatic evaluation metrics for machine translation include BLEU, METEOR, NIST, and TER, among others.
3、We define a new, more intuitive measure of “goodness” of MT output—specifically, the number of edits needed to fix the output so that it semantically matches a correct translation.
4、Recently the GALE (Olive, 2005) (Global Autonomous Language Exploitation) research program introduced a new error measure called Translation Edit Rate (TER) that was originally designed to count the number of edits (including phrasal shifts) performed by a human to change a hypothesis so that it is both fluent and has the correct meaning. This was then decomposed into two steps: defining a new reference and finding the minimum number of edits so that the hypothesis exactly matches one of the references. This measure was defined such that all edits, including shifts, would have a cost of one. Finding only the minimum number of edits, without generating a new reference, is the measure defined as TER; finding the minimum number of edits to a new targeted reference is defined as human-targeted TER (or HTER).
5、BLEU (Papineni et al., 2002) calculates the score of a translation by measuring the number of n-grams, of varying length, of the system output that occur within the set of references.
6、METEOR (Banerjee and Lavie, 2005) is an evaluation measure that counts the number of exact word matches between the system output and reference. Unmatched words are then stemmed and matched. Additional penalties are assessed for reordering the words between the hypothesis and reference. This method has been shown to correlate very well with human judgments.
7、TER is defined as the minimum number of edits needed to change a hypothesis so that it exactly matches one of the references, normalized by the average length of the references.
8、Possible edits include the insertion, deletion, and substitution of single words as well as shifts of word sequences.
9、
10、The number of insertions, deletions, and substitutions is calculated using dynamic programming. A greedy search is used to find the set of shifts, by repeatedly selecting the shift that most reduces the number of insertions, deletions and substitutions, until no more beneficial shifts remain.
11、
12、In both TER and HTER, the majority of the edits were substitutions and deletions.
13、In an analysis of shift size and distance, we found that most shifts are short in length (1 word) and move the shifted words by fewer than 7 words.
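
Points 7, 8 and 10 above can be illustrated with a small sketch. This is not the authors' implementation: it assumes whitespace tokenization, at least one non-empty reference, and a brute-force shift search that tries every possible move of a contiguous word span, whereas the paper's search is more constrained. The function names (edit_distance, all_shifts, count_edits, ter) are made up for this illustration.

```python
from typing import Iterator, List, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level edit distance via dynamic programming.

    Counts insertions, deletions and substitutions, each with cost 1.
    """
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def all_shifts(words: List[str]) -> Iterator[List[str]]:
    """Yield every sequence obtained by moving one contiguous span.

    Assumption: brute-force enumeration, a simplification of the paper.
    """
    n = len(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            span = words[start:end]
            rest = words[:start] + words[end:]
            for pos in range(len(rest) + 1):
                if pos != start:  # pos == start would recreate the input
                    yield rest[:pos] + span + rest[pos:]


def count_edits(hyp: List[str], ref: List[str]) -> int:
    """Greedy shift search followed by edit distance; every edit costs 1."""
    current = list(hyp)
    dist = edit_distance(current, ref)
    shifts = 0
    while dist > 0:
        best_seq, best_dist = None, dist
        for cand in all_shifts(current):
            d = edit_distance(cand, ref)
            if d < best_dist:  # keep the shift that reduces the distance most
                best_seq, best_dist = cand, d
        if best_seq is None:   # no beneficial shift remains
            break
        current, dist = best_seq, best_dist
        shifts += 1
    return shifts + dist


def ter(hyp: str, refs: List[str]) -> float:
    """Minimum edits over all references, divided by average reference length."""
    hyp_words = hyp.split()
    ref_words = [r.split() for r in refs]
    edits = min(count_edits(hyp_words, r) for r in ref_words)
    avg_len = sum(len(r) for r in ref_words) / len(ref_words)
    return edits / avg_len
```

For example, ter("b c d a", ["a b c d"]) gives 0.25: a single shift of "a" to the front makes the hypothesis match the 4-word reference, while without shifts the same pair would need one deletion and one insertion. The brute-force search recomputes the edit distance for every candidate shift, so it is only practical for short sentences.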