CoCo4MT 2022

Workshop on Corpus Generation and Corpus Augmentation for Machine Translation

It was the first workshop dedicated to research on corpus creation, cleansing, and augmentation techniques specifically for machine translation.


  • Orlando, Florida


Important Dates

Call for papers released 01 June
Second call for papers 15 June
Third and final call for papers 29 June
Paper submissions due 13 July
Paper submissions due (extended deadline) 20 July
Notification of acceptance 27 July
Camera-ready due 07 August
Video recordings due 31 August
CoCo4MT workshop 16 September

Keynote speakers

  • Ankur Parikh
  • Graham Neubig
  • Jörg Tiedemann
  • Julia Kreutzer
  • Maria Nadejde


  • Kenneth Church
  • Marine Carpuat

Call for papers

Topics include, but are not limited to:

  • Difficulties with using existing corpora (e.g., political considerations or domain limitations) and their effects on final MT systems
  • Strategies for collecting new MT datasets (e.g., via crowdsourcing)
  • Data augmentation techniques
  • Data cleansing and denoising techniques
  • Quality control strategies for MT data
  • Exploration of datasets for pretraining or auxiliary tasks for training MT systems

Licensed under CC-BY-SA-4.0.