CoCo4MT 2022

Workshop on Corpus Generation and Corpus Augmentation for Machine Translation

The first workshop on Corpus Generation and Corpus Augmentation for Machine Translation (CoCo4MT) was co-located with AMTA 2022 on 16 September 2022.

It was the first workshop dedicated to research on corpus creation, cleansing, and augmentation techniques specifically for machine translation.

Topics included, but were not limited to:

  • Difficulties with using existing corpora (e.g., political considerations or domain limitations) and their effects on final MT systems,
  • Strategies for collecting new MT datasets (e.g., via crowdsourcing),
  • Data augmentation techniques,
  • Data cleansing and denoising techniques,
  • Quality control strategies for MT data,
  • Exploration of datasets for pretraining or auxiliary tasks for training MT systems.

Keynote speakers

  • Ankur Parikh
  • Graham Neubig
  • Jörg Tiedemann
  • Julia Kreutzer
  • Maria Nadejde


  • Kenneth Church
  • Marine Carpuat

Important dates

01 June 2022 Call for papers released
15 June 2022 Second call for papers
29 June 2022 Third and final call for papers
13 July 2022 Paper submissions due
20 July 2022 Paper submissions due (extended deadline)
27 July 2022 Notification of acceptance
07 August 2022 Camera-ready due
31 August 2022 Video recordings due
16 September 2022 CoCo4MT workshop

Licensed under CC-BY-SA-4.0.
