Neural machine translation

Deep learning approaches to machine translation

Neural machine translation (NMT) is an approach to machine translation, based on machine learning, that uses large neural networks to estimate how likely a candidate translation is to be correct. Like statistical machine translation, neural machine translation is data-driven: it learns from examples of translated text.

Neural networks

Neural networks use training data to learn a vector for every word, called a word embedding, that captures the word's relations to other words. Words with similar meanings cluster together, and a word with more than one meaning, such as "bank", can appear in several clusters at once.

Cluster₁:

Bank
Lake
River
Stream
Terrain

Cluster₂:

Money
Finance
Credit
Bank
Banking

Neural networks use cluster information to disambiguate the meaning of input words and generate the most relevant translations.
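
The clustering idea can be illustrated with a minimal sketch. The two-dimensional vectors below are hand-picked for illustration and are not learned from data; real embeddings have hundreds of dimensions. The names EMBEDDINGS, cosine and nearest are hypothetical helpers, and cosine similarity is only one common way to measure closeness.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration only.
# Axis 0 loosely stands for "water/terrain", axis 1 for "money/finance".
EMBEDDINGS = {
    "river":   (0.9, 0.1),
    "lake":    (0.8, 0.2),
    "stream":  (0.95, 0.05),
    "money":   (0.1, 0.9),
    "finance": (0.05, 0.95),
    "credit":  (0.2, 0.8),
    "bank":    (0.6, 0.6),  # ambiguous word: placed close to both groups
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, k=3):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = EMBEDDINGS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


print(nearest("river", 2))
print(nearest("money", 2))
```

In this toy setup "bank" is equally similar to "river" and to "money", so the embedding alone does not settle its meaning; the surrounding words have to tip the balance.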

Sequence model

In general, neural machine translation can be seen as a sequence-to-sequence task. Given an input sequence, the system predicts and generates an output sequence. The sequence model assigns a probability to each candidate sequence of words and uses these probabilities to choose the most likely words and word order for the output sentence.
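
A minimal sketch of scoring word order by sequence probability follows. It assumes a tiny hand-written table of word-pair probabilities (BIGRAM) as a stand-in for the neural model; the numbers are invented, and a real system would condition each word on the source sentence and on all previously generated words, not just the previous one. The helper names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical probabilities P(next word | previous word), invented for
# illustration. "<s>" and "</s>" mark the start and end of a sentence.
BIGRAM = {
    ("<s>", "the"): 0.6, ("<s>", "cat"): 0.1,
    ("the", "cat"): 0.5, ("the", "sleeps"): 0.01, ("the", "</s>"): 0.01,
    ("cat", "sleeps"): 0.4, ("cat", "the"): 0.05, ("cat", "</s>"): 0.1,
    ("sleeps", "</s>"): 0.7, ("sleeps", "the"): 0.1, ("sleeps", "cat"): 0.05,
}
UNSEEN = 1e-6  # assumed floor probability for pairs missing from the table


def sequence_log_prob(words):
    """Log-probability of a word sequence as a sum of per-step log-probabilities."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, UNSEEN))
               for pair in zip(tokens, tokens[1:]))


def best_order(candidates):
    """Pick the candidate sequence with the highest probability."""
    return max(candidates, key=sequence_log_prob)


candidates = list(permutations(["cat", "the", "sleeps"]))
print(best_order(candidates))  # ('the', 'cat', 'sleeps')
```

Summing log-probabilities instead of multiplying raw probabilities is a common design choice, because products of many small numbers quickly underflow.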

Encoder/decoder framework

A neural machine translation architecture typically consists of an encoder and a decoder.

The encoder analyses the words of the input sequence and their relations. The result is a representation of the sentence called the context vector, which summarizes the entire input sequence in a single fixed-length vector.

The decoder takes that sequence representation and produces the translation.
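
The data flow through an encoder and a decoder can be sketched as below. This is an illustration under strong assumptions, not a working translator: it uses a simple recurrent update with random, untrained weights, a context vector of length 4, greedy word selection, and made-up vocabularies. All class and function names are hypothetical. The point is only that inputs of any length are folded into a vector of the same fixed length, from which the decoder generates words one at a time.

```python
import math
import random

HIDDEN = 4  # context vector length; a tiny, assumed value for readability


def random_matrix(rows, cols, rng):
    """Untrained stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_h, w_x, hidden, inp):
    """One simple recurrent update: h' = tanh(W_h h + W_x x)."""
    return [math.tanh(a + b)
            for a, b in zip(matvec(w_h, hidden), matvec(w_x, inp))]


class Encoder:
    def __init__(self, vocab, rng):
        self.embed = {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for w in vocab}
        self.w_h = random_matrix(HIDDEN, HIDDEN, rng)
        self.w_x = random_matrix(HIDDEN, HIDDEN, rng)

    def encode(self, tokens):
        hidden = [0.0] * HIDDEN
        for token in tokens:
            hidden = rnn_step(self.w_h, self.w_x, hidden, self.embed[token])
        return hidden  # the fixed-length context vector


class Decoder:
    def __init__(self, vocab, rng):
        self.vocab = list(vocab)  # output words, including the end marker "</s>"
        self.embed = {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)]
                      for w in self.vocab + ["<s>"]}
        self.w_h = random_matrix(HIDDEN, HIDDEN, rng)
        self.w_x = random_matrix(HIDDEN, HIDDEN, rng)
        self.w_out = random_matrix(len(self.vocab), HIDDEN, rng)

    def decode(self, context, max_len=10):
        hidden, previous, output = context, "<s>", []
        for _ in range(max_len):
            hidden = rnn_step(self.w_h, self.w_x, hidden, self.embed[previous])
            scores = matvec(self.w_out, hidden)
            previous = self.vocab[scores.index(max(scores))]  # greedy choice
            if previous == "</s>":
                break
            output.append(previous)
        return output


rng = random.Random(0)
encoder = Encoder(["the", "cat", "sleeps", "on", "mat"], rng)
decoder = Decoder(["</s>", "le", "chat", "dort"], rng)

short = encoder.encode(["cat", "sleeps"])
long = encoder.encode(["the", "cat", "sleeps", "on", "the", "mat"])
print(len(short), len(long))  # both equal HIDDEN, whatever the input length
print(decoder.decode(long))   # untrained weights, so the words are arbitrary
```

In a trained system the weights and embeddings would be learned jointly from parallel text, so that the decoder's most probable words form a translation of the encoded sentence.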

Attention mechanism

A single fixed-length vector is often too limited to hold all the information in a long sentence.

A solution to this problem is to employ an attention mechanism. At each step of the translation, an attention mechanism focuses on the parts of the input sentence that are relevant to the word being generated, instead of relying on a single summary of the complete input sentence. The attention mechanism also learns the alignment between input words and the output words they correspond to.
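
A minimal sketch of one attention step is shown below. It assumes dot-product scoring, which is one common choice; the text above does not say which scoring function is used. The encoder states and the decoder query are hand-picked numbers, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention over per-word encoder states.

    Returns the attention weights (a soft alignment over input words)
    and the weighted sum of the encoder states.
    """
    scores = [sum(q * s for q, s in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# One state per input word (assumed values), and the decoder's current query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]

weights, context = attend(query, encoder_states)
print(weights)  # the largest weight falls on the second input word
print(context)
```

Because the weights are recomputed for every output word, the decoder receives a different context at each step instead of one fixed summary, and the weights themselves can be read as a soft alignment between input and output words.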