N-gram

A short sequence of word types


An n-gram is a short sequence of word types. Here n is the number of words in the sequence; for example, a 2-gram has two word types. N-grams have many applications in machine translation, such as n-gram language models. The most common values of n have their own names:

Number of words | Common name | Sequence notation | N-gram language model notation
1 | Unigram | |
2 | Bigram | |
3 | Trigram | |

Example

String in English: "The car has two doors."

Tokens: "The", "car", "has", "two", "doors", "."

Unigrams: "The", "car", "has", "two", "doors", "."

Bigrams: ("The", "car"), ("car", "has"), ("has", "two"), ("two", "doors"), ("doors", ".")
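The example above can be reproduced with a sliding window over the token list. The following is a minimal sketch, not a reference implementation: the function name `ngrams` is an illustrative choice, and the tokens are assumed to be already split as shown above (tokenization itself is not covered here).

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams in `tokens` as a list of tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the token list.
    # If n is larger than the number of tokens, the result is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tokens from the example sentence, assumed to be pre-tokenized.
tokens = ["The", "car", "has", "two", "doors", "."]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, so the six tokens here give five bigrams and four trigrams, the first of which is ("The", "car", "has").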



Machine Translate is created and edited by contributors like you!

Licensed under CC-BY-SA-4.0.
