Attention
A mechanism for improving the accuracy of encoder-decoder models in machine translation
The attention mechanism is a way for a neural machine translation model to focus on the most relevant parts of the input sequence at each step of generating the output; self-attention is a related variant in which a sequence attends over its own positions.
Motivation
In an encoder-decoder architecture, the encoder takes the input sequence and creates a fixed-length context vector, also called a thought vector. The context vector represents a summary of the entire input sequence, and the decoder uses it to generate the output. One limitation of this design is that a single fixed-length vector struggles to retain information from longer input sequences, so translation quality degrades as inputs grow.
The attention mechanism addresses this issue. It weights the different parts of the input sequence by their relevance to the current output step, so the decoder can focus on the relevant subset instead of treating the entire input equally.
Process
- Score each encoder hidden state against the current decoder state to measure how relevant that input position is to the current decoding step
- Produce a fixed-size context vector as the weighted sum of all encoder hidden states, using the normalized (softmax) scores as weights
- Feed the context vector into the decoder at each time step to predict the next output token (see the sketch after this list)
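The following is a minimal sketch of one such decoding step, using dot-product scoring in the style of Luong et al.; the function name, the NumPy implementation, and the toy dimensions are illustrative assumptions rather than part of a specific system.

```python
import numpy as np

def attention_step(decoder_state, encoder_states):
    # 1. Score every encoder hidden state against the current decoder state
    #    (dot-product scoring here; other scoring functions exist).
    scores = encoder_states @ decoder_state              # shape (T,)

    # 2. Normalize the scores into attention weights with a softmax.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # shape (T,), sums to 1

    # 3. Context vector = weighted sum of all encoder hidden states.
    context = weights @ encoder_states                   # shape (d,)

    # 4. The caller feeds `context` into the decoder (often concatenated
    #    with the decoder state) to predict the next output token.
    return context, weights

# Toy usage: input sequence of length 5, hidden size 4.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=(4,))
context, weights = attention_step(decoder_state, encoder_states)
```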
Approaches
- Global attention
- Local or window-based attention
While global attention considers all encoder hidden states when computing the context vector, local attention considers only a subset, typically a window of positions around a predicted alignment point, as in the sketch below.
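As a rough sketch of the difference, the function below restricts the same score / softmax / weighted-sum computation to a window around an assumed alignment position; the parameter names `center` and `window` are illustrative, and refinements such as Gaussian weighting of the window are omitted.

```python
import numpy as np

def local_attention_step(decoder_state, encoder_states, center, window=2):
    # Keep only the hidden states inside a window around `center`,
    # the assumed alignment position for this decoding step.
    T = encoder_states.shape[0]
    lo, hi = max(0, center - window), min(T, center + window + 1)
    subset = encoder_states[lo:hi]

    # Same steps as global attention, restricted to the window.
    scores = subset @ decoder_state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ subset                              # context from the window only
```

Because only a handful of positions enter the softmax, local attention is cheaper for long inputs, at the cost of having to choose the window position.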