Evaluation metric based on n-gram precision
BLEU (BiLingual Evaluation Understudy) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using n-gram precision.
Scores are computed for individual segments, usually sentences, and then aggregated over the whole test corpus.
The underlying idea is that the closer a machine translation is to a professional human translation, the better it is.
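The core computation can be sketched in plain Python: modified n-gram precision (candidate n-gram counts clipped against the references), combined by a geometric mean and scaled by a brevity penalty. This is an illustrative single-sentence sketch, not the code of any particular library, and the function and variable names are made up for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Single-sentence BLEU sketch with uniform n-gram weights (illustrative)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Modified precision: clip each candidate n-gram count by the
        # maximum count of that n-gram in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: compare candidate length c to the closest
    # reference length r; penalize candidates shorter than r.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

A perfect match scores 1.0, and a candidate sharing no n-grams with any reference scores 0.0; real implementations add smoothing and corpus-level aggregation on top of this.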
BLEU has long been the de facto standard metric for machine translation evaluation.
BLEU compares machine translation output against one or more reference translations; using multiple references generally yields higher scores, since more acceptable phrasings can be matched.
BLEU treats all n-grams equally, so a mismatched function word costs as much as a mismatched content word.
Because BLEU leaves details such as tokenization and smoothing unspecified, scores from different implementations are not directly comparable. Researchers have therefore created variants that are defined more precisely.
sacreBLEU is a well-known implementation of BLEU that standardizes tokenization and reports the exact configuration used, making scores reproducible and comparable across papers.