A short sequence of word types

An n-gram is a short sequence of word types. Here n is the number of words in the sequence; for example, a 2-gram contains two word types. N-grams have many applications in machine translation.

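| Number of words | Common name | Sequence notation | N-gram language model notation |
| --- | --- | --- | --- |
| 1 | Unigram | $w_i$ | $P(w_i)$ |
| 2 | Bigram | $w_{i-1}, w_i$ | $P(w_i \mid w_{i-1})$ |
| 3 | Trigram | $w_{i-2}, w_{i-1}, w_i$ | $P(w_i \mid w_{i-2}, w_{i-1})$ |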


String in English: "The car has two doors."

Tokens: "The", "car", "has", "two", "doors", "."

Unigrams: "The", "car", "has", "two", "doors", "."

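Bigrams: ("The", "car"), ("car", "has"), ("has", "two"), ("two", "doors"), ("doors", ".")

Trigrams: ("The", "car", "has"), ("car", "has", "two"), ("has", "two", "doors"), ("two", "doors", ".")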
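
The extraction shown in the example can be sketched in a few lines of Python. This is a minimal illustration, assuming the string has already been split into tokens. The function name `ngrams` is a hypothetical helper written for this example, not part of any particular library.

```python
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous run of n tokens, from left to right."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence of k tokens contains k - n + 1 n-grams (none if k < n).
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


# Tokens from the example sentence "The car has two doors."
tokens = ["The", "car", "has", "two", "doors", "."]

unigrams = list(ngrams(tokens, 1))
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))
```

With the six tokens above, this sliding window yields six unigrams, five bigrams and four trigrams. Asking for an n larger than the number of tokens yields nothing.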


Licensed under CC-BY-SA-4.0.
