Quality estimation
Machine translation quality estimation
Machine translation quality estimation (QE or MTQE) or machine translation quality prediction (QP or MTQP) is the task of automatically predicting the quality of machine translation output.
Original | Translation | Quality prediction |
---|---|---|
English July 30th, 2021 | French 30 juillet 2021 | Good |
English This is my home. | Spanish Este es mi inicio. | Bad |
Quality estimation does not require human reference translations, so it is useful for new content. It uses machine learning to provide predictions that are relatively accurate at the segment level.
Quality estimation is fundamentally different from quality evaluation. Quality evaluation metrics like BLEU require human reference translations against which to compare the machine translation output, and no such references exist for new content.
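The difference is visible in the inputs each kind of function needs. The following is a minimal sketch with hypothetical function names. The unigram precision is a simplified stand-in for a reference-based metric, not BLEU itself, and the length-ratio heuristic is only a placeholder for a trained quality estimation model.

```python
from collections import Counter


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Reference-based evaluation: requires a human reference translation."""
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(reference.lower().split())
    # Clip each hypothesis token count by its count in the reference.
    matched = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    return matched / len(hyp_tokens)


def estimate_quality(source: str, hypothesis: str) -> float:
    """Quality estimation: sees only the source and the machine translation.

    Placeholder heuristic (an assumption for illustration): the ratio of the
    shorter token count to the longer one. A real system would use a trained model.
    """
    src_len = len(source.split())
    hyp_len = len(hypothesis.split())
    if src_len == 0 or hyp_len == 0:
        return 0.0
    return min(src_len, hyp_len) / max(src_len, hyp_len)
```

Note that only `unigram_precision` takes a `reference` argument. A heuristic as crude as the placeholder cannot detect a mistranslation such as "Este es mi inicio.", which is why practical systems rely on machine learning.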
Use cases
Quality estimation has offline and production use cases:
- Hybrid translation
- Estimating post-editing effort
- Validating final human translations
- Comparing machine translation systems or translation models
- Filtering training data for machine translation
The main use case is hybrid translation.
Granularity
Quality estimation is most commonly performed at the segment level, for example on a sentence. The possible levels are:
- Word-level
- Phrase-level
- Sentence-level
- Document-level
Sentence-level scores can be aggregated into paragraph-level scores or document-level scores. Word-, phrase- and sentence-level scores can indicate if a machine translation output needs to be post-edited. Document-level scores indicate if a machine translation output can be used without human post-editing.
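One way to aggregate sentence-level scores is a length-weighted mean. This is a sketch under stated assumptions, not a standard method: scores are assumed to lie in [0, 1] with higher meaning better, weights are whitespace token counts, and the threshold of 0.8 and both function names are hypothetical.

```python
def aggregate_scores(segments: list[tuple[str, float]]) -> float:
    """Combine (translation, score) pairs into one document-level score.

    Each sentence is weighted by its whitespace token count, so longer
    sentences contribute more to the result.
    """
    total_weight = sum(len(text.split()) for text, _ in segments)
    if total_weight == 0:
        raise ValueError("no tokens to aggregate over")
    weighted_sum = sum(len(text.split()) * score for text, score in segments)
    return weighted_sum / total_weight


def needs_post_editing(score: float, threshold: float = 0.8) -> bool:
    """Flag output whose score falls below the (assumed) threshold."""
    return score < threshold
```

In practice the threshold would be tuned on held-out data for the content type and the acceptable level of risk.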
In 2012, Lucia Specia and Google researcher Radu Soricut organized the first Shared Task on Quality Estimation.
In 2018, Lucia Specia, Carolina Scarton and Gustavo Henrique Paetzold published the book Quality Estimation for Machine Translation. There was research on word-level quality estimation and paragraph-level quality estimation.
In 2020, ModelFront launched a multilingual quality estimation API. Tharindu Ranasinghe released pretrained models. Facebook Research launched unsupervised quality estimation internally.
A growing set of frameworks, models, and systems is generally available.
Frameworks
Frameworks from academia and industry are available as open-source code and models.
The first framework, QuEst, was released in 2013.
Name | Owner | Approach |
---|---|---|
QuEst | University of Sheffield | Feature engineering |
QuEst++ | University of Sheffield | Feature engineering |
DeepQuest | University of Sheffield | Deep learning |
OpenKiwi | Unbabel | Deep learning |
TransQuest | Tharindu Ranasinghe, University of Wolverhampton | Deep learning |
TransQuest also includes pretrained models. The models were pretrained with WMT data.
Providers
ModelFront launched a standalone production system for quality estimation. By 2020, it was generally available and supported more than 10,000 language pairs. It is provided as an API, so it can be integrated into other systems and products.
Provider | Product |
---|---|
ModelFront | ModelFront quality prediction |
TAUS | DeMT Estimate API |
There are also providers that offer a quality estimation feature or integration within another product.
Features and integrations
A few machine translation providers have launched generally available features for quality estimation.
Product | Feature | Provider |
---|---|---|
KantanMT | KantanQES | KantanAI |
Omniscien | Translation Confidence Scoring and Quality Estimates | Omniscien |
Integrations
There is a quality estimation integration or connector available for most translation management systems.
Product | Feature | Provider |
---|---|---|
translate5 | ModelFront quality prediction plug-in | ModelFront |
memoQ | Quality estimates (AIQE) | ModelFront, DeMT Estimate |
Crowdin | ModelFront quality prediction | ModelFront |
XTM | ModelFront XTM connector | ModelFront |
KantanStream | KantanQES | KantanAI |
PhraseTMS | MT quality estimation, ModelFront Phrase connector | Phrase QE |
GlobalDoc LangXpert | Effort estimation | ModelFront |
Google Cloud Translation Hub | Machine translation quality prediction | Google Cloud Translation Hub - MTQP |
translate5 is open-source.
Internal systems
Other companies have researched or launched quality estimation internally, without offering it to others.
- Amazon
- Microsoft
- VMware
- Facebook AI Research
- eBay
- SAP
- MusixMatch
- Wayfair
- Unbabel
- TransPerfect
- CrossLang
- Fair Trade Translation
Note: This list is incomplete.
Types
Quality estimation is typically implemented as classification or regression.
Supervised
Supervised quality estimation trains on parallel data that includes human labels or human post-edits.
Unsupervised
Unsupervised quality estimation trains only on monolingual data or parallel data, without human labels or human post-edits.
Glassbox
Glassbox approaches are tied to the machine translation system itself. A glassbox system makes a prediction based on the internal variables of the machine translation model. It is like a confidence score.
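As an illustration, one simple glassbox signal is the average token probability that the translation model assigned to its own output. This is a minimal sketch, assuming the machine translation system exposes per-token log-probabilities. The function name is hypothetical, and this is not necessarily the method used by any system named in this article.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Return the geometric mean of the token probabilities.

    Computed as exp(mean(log p)), which normalizes for length so that
    long translations are not penalized just for having more tokens.
    """
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

The result lies in (0, 1]. A low value means the model was uncertain about at least some of its output tokens.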
Blackbox
Blackbox approaches are independent of the machine translation system. They are not necessarily trained on the same data, and can be used with any machine translation system.
Approaches
Feature engineering
Early quality estimation approaches used machine learning with feature engineering.
Examples of specific features are the number of noun or prepositional phrases in the source and target, the number of named entities, etc. Based on these features, a quality estimation model is built using machine learning techniques.
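A feature extractor along these lines can be sketched as follows. Counting noun phrases or named entities requires a parser or tagger, so this sketch substitutes simple surface features (lengths, numbers, punctuation) as stand-ins. The function name and the feature set are assumptions for illustration, not the features of any particular framework.

```python
import re
import string


def count_punctuation(text: str) -> int:
    """Count ASCII punctuation characters in the text."""
    return sum(1 for ch in text if ch in string.punctuation)


def extract_features(source: str, target: str) -> dict[str, float]:
    """Turn a (source, translation) pair into numeric features."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    src_numbers = set(re.findall(r"\d+", source))
    tgt_numbers = set(re.findall(r"\d+", target))
    return {
        "source_length": float(len(src_tokens)),
        "target_length": float(len(tgt_tokens)),
        # Guard against division by zero for an empty source.
        "length_ratio": len(tgt_tokens) / max(len(src_tokens), 1),
        # Numbers present in the source but absent from the translation.
        "missing_numbers": float(len(src_numbers - tgt_numbers)),
        "punctuation_diff": float(
            abs(count_punctuation(source) - count_punctuation(target))
        ),
    }
```

The resulting feature dictionaries, paired with human quality labels, would then serve as training data for a classifier or regressor.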
Deep learning
With the rise of deep learning, quality estimation approaches moved to architectures based on artificial neural networks.
Single-language-pair
Early quality estimation approaches created one model or system per language pair, similar to most machine translation systems at the time.
Multilingual
Multilingual quality estimation uses one model or system for many language pairs, similar to multilingual machine translation systems.