CoCo4MT 2023
Workshop on Corpus Generation and Corpus Augmentation for Machine Translation
The second workshop on Corpus Generation and Corpus Augmentation for Machine Translation (CoCo4MT) will be co-located with the MT Summit 2023 conference on 4 and 5 September 2023.
CoCo4MT is a workshop centered on research into corpus creation, cleansing, and augmentation techniques specifically for machine translation. We hope that submissions will provide high-quality corpora that are publicly available for download and can be used to improve machine translation performance. Such contributions should encourage the creation of new datasets for multiple languages and, in turn, make the workshop a general resource to consult for future corpus needs.
Topics include, but are not limited to:
- Difficulties with using existing corpora (for example, political considerations or domain limitations), and their effects on final machine translation systems
- Strategies for collecting new machine translation datasets (for example, via crowdsourcing)
- Data augmentation techniques
- Data cleansing and denoising techniques
- Quality control strategies for machine translation data
- Exploration of datasets for pretraining or auxiliary tasks for training machine translation systems
sites.google.com/view/coco4mt (CoCo4MT)
Important dates
| Date | Event |
|---|---|
| 18 May 2023 | Call for papers released |
| 19 May 2023 | Shared task release of train, development and test data |
| 25 May 2023 | Shared task release of baselines |
| 05 June 2023 | Second call for papers |
| 20 June 2023 | Third and final call for papers |
| 05 July 2023 | Paper submissions due |
| 05 July 2023 | Shared task deadline to submit results |
| 20 July 2023 | Notification of acceptance |
| 20 July 2023 | Shared task system description papers due |
| 31 July 2023 | Camera-ready due |
| 4-5 September 2023 | CoCo4MT workshop |