Evaluation metric based on n-gram precision

BLEU (BiLingual Evaluation Understudy) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using n-gram precision.

Its basic unit of evaluation is the sentence, though in practice scores are usually aggregated over a whole test corpus.

The closer a machine translation is to a professional human translation, the better it is.
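The core idea above can be sketched in a few lines of Python. This is a minimal, illustrative implementation of sentence-level BLEU (clipped n-gram precisions combined by a geometric mean, times a brevity penalty), not a reference implementation; real toolkits add smoothing and corpus-level aggregation.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions for n = 1..max_n, multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        # ("modified precision"), so repeating a correct word is not rewarded.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        log_prec_sum += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)
```

A perfect match scores 1.0; a candidate sharing no n-grams with the reference scores 0.0; everything else falls in between.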

BLEU was introduced in the 2002 paper *BLEU: a Method for Automatic Evaluation of Machine Translation*.

BLEU is the standard metric for machine translation evaluation.

BLEU compares machine translation output to one or more reference translations.

BLEU treats all n-grams equally, regardless of their importance to the meaning of the sentence, and gives no credit for synonyms or paraphrases.

Because BLEU leaves key details unspecified, such as tokenization and smoothing, scores produced by different implementations are not directly comparable. So researchers have created variants that are more concretely defined.


  • BLEURT
  • M-BLEU

Note: The list is incomplete.

sacreBLEU is a well-known implementation of BLEU that standardizes tokenization and reports the settings used, so that scores are comparable across papers.


Licensed under CC-BY-SA-4.0.