Machine translation metrics automatically assess the quality of machine translation output. There are two broad types of metrics: quality evaluation and quality estimation.
Unlike quality evaluation metrics, quality estimation metrics do not rely on a human (reference) translation.
Quality evaluation
Similarity-based metrics
These metrics evaluate the similarity between a machine translation and a reference translation. This similarity can be measured in two ways:
n-gram matching-based similarity
embedding-based similarity
n-gram matching metrics
These metrics evaluate similarity based on hand-crafted features and rules. For example, a metric can count the number and fraction of n-grams that appear in both the machine translation and the human translation.
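As a rough illustration of the counting described above, the sketch below computes, for a given n, how many n-grams of a machine translation also occur in a reference translation, and what fraction of the machine translation's n-grams that represents. It assumes whitespace tokenization and uses clipped counts (each n-gram is credited at most as many times as it appears in the reference), which is one common choice, used for example in BLEU's modified precision; the function names `ngrams` and `ngram_overlap` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(hypothesis, reference, n):
    """Count n-grams of the hypothesis that also occur in the reference.

    Assumption: each hypothesis n-gram is credited at most as many times
    as it appears in the reference (clipped counts).
    Returns (matched_count, fraction_of_hypothesis_ngrams_matched).
    """
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    # Counter returns 0 for n-grams missing from the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    fraction = matched / total if total else 0.0
    return matched, fraction


# Toy example with whitespace tokenization (an assumption for illustration).
hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
for n in (1, 2):
    print(n, ngram_overlap(hyp, ref, n))
```

For this pair, 5 of the 6 unigrams and 3 of the 5 bigrams in the machine translation also appear in the reference. Real metrics typically combine such counts across several values of n and add further adjustments, which this sketch omits.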