CoCo4MT 2023

Workshop on Corpus Generation and Corpus Augmentation for Machine Translation

The second workshop on Corpus Generation and Corpus Augmentation for Machine Translation (CoCo4MT) will be co-located with the MT Summit 2023 conference on 4-5 September 2023.

CoCo4MT is a workshop centered on research into corpus creation, cleansing, and augmentation techniques specifically for machine translation. We hope that submissions will provide high-quality corpora that are publicly available for download and can be used to improve machine translation performance. This should encourage the creation of new datasets for multiple languages and, in turn, make the workshop a general venue to consult for future corpus needs.

Topics include, but are not limited to:

  • Difficulties with using existing corpora (for example, political considerations or domain limitations), and their effects on final machine translation systems
  • Strategies for collecting new machine translation datasets (for example, via crowdsourcing)
  • Data augmentation techniques
  • Data cleansing and denoising techniques
  • Quality control strategies for machine translation data
  • Exploration of datasets for pretraining or auxiliary tasks for training machine translation systems

Important dates

18 May 2023 Call for papers released
19 May 2023 Shared task release of train, development and test data
25 May 2023 Shared task release of baselines
05 June 2023 Second call for papers
20 June 2023 Third and final call for papers
05 July 2023 Paper submissions due
05 July 2023 Shared task deadline to submit results
20 July 2023 Notification of acceptance
20 July 2023 Shared task system description papers due
31 July 2023 Camera-ready due
4-5 September 2023 CoCo4MT workshop

Licensed under CC-BY-SA-4.0.