Training data

Training data for machine translation

Training data is the data used to train a model. In other words, the training algorithm adjusts the model's parameters so that the model fits the training data.

The most common type of training data in machine translation is parallel text data: sentences in the source language paired with their translations in the target language. Monolingual text data is also often used, for instance to train language models, or to generate synthetic parallel data by back-translating monolingual text in the target language.
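As a rough illustration of how monolingual data can be turned into synthetic parallel data, the Python sketch below implements back-translation under simplifying assumptions. A target-to-source model translates target-language sentences back into the source language, and each output is paired with the original sentence. The names back_translate, toy_reverse_translate and TOY_LEXICON are hypothetical, and the word-by-word dictionary lookup only stands in for a real trained target-to-source translation system.

```python
from typing import Callable, List, Tuple

# A parallel corpus: each item pairs a source sentence with its translation.
ParallelCorpus = List[Tuple[str, str]]


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> ParallelCorpus:
    """Create synthetic (source, target) pairs from target-language text.

    reverse_translate is assumed to be a target-to-source translation model.
    The synthetic source side may be noisy, but the target side is the
    original monolingual text.
    """
    return [(reverse_translate(sentence), sentence) for sentence in monolingual_target]


# Hypothetical toy reverse model (English -> German) using word-by-word lookup,
# standing in for a real trained target-to-source system.
TOY_LEXICON = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}


def toy_reverse_translate(sentence: str) -> str:
    # Unknown words are copied through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


# Genuine parallel data for a German -> English system (source, target).
parallel: ParallelCorpus = [("das Haus ist klein", "the house is small")]

# Monolingual data in the target language (English).
monolingual = ["the house is small", "the house"]

# Synthetic pairs: machine-translated German source, original English target.
synthetic = back_translate(monolingual, toy_reverse_translate)

# The forward German -> English model would then be trained on both.
training_data = parallel + synthetic
```

Here synthetic contains ("das Haus ist klein", "the house is small") and ("das Haus", "the house"). In practice the reverse model is itself a trained translation model, and the synthetic pairs are typically mixed with genuine parallel data rather than used on their own.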