CoCo4MT 2023

Name: CoCo4MT 2023
Start: 2023-09-04
End: 2023-09-05
Location: Macau Special Administrative Region, CN

Workshop on Corpus Generation and Corpus Augmentation for Machine Translation

Location

Macau Special Administrative Region, CN

Important Dates


Call for papers released	18 May
Shared task release of train, development and test data	19 May
Shared task release of baselines	25 May
Second call for papers	05 June
Third and final call for papers	20 June
Paper submissions due	16 July
Shared task deadline to submit results	16 July
Notification of acceptance	20 July
Shared task system description papers due	20 July
Camera-ready due	31 July
CoCo4MT workshop	04 September

Keynote speakers

Manuel Mager, Applied Scientist at AWS AI Labs
Jack Halpern, CEO at The CJK Dictionary Institute
Marta Costa-jussà, Research Scientist at Meta AI

Panel

Silvio Amir, Northeastern University

Schedule


9:00	Opening remarks
9:15	Panel: Responsible Low-Resource MT (Silvio Amir, Northeastern University; Manuel Mager, AWS AI Labs)
10:00	☕️
10:30	Invited talk: Morphological Segmentation of Polysynthetic Languages (Manuel Mager, AWS AI Labs)
11:00	Shared task: Introduction and Findings (Ananya Ganesh, University of Colorado Boulder)
11:15	Shared task: Williams College's Submission for the Coco4MT 2023 Shared Task (Alex Root, Mark Hopkins)
11:35	Shared task: The AST Submission for the CoCo4MT 2023 Shared Task on Corpus Construction for Low-Resource Machine Translation (Steinþór Steingrímsson)
12:00	🍴
14:00	Invited talk: Introducing Large-Scale Synthetic Corpora (Jack Halpern, The CJK Dictionary Institute)
15:00	Paper 2: Do Not Discard – Extracting Useful Fragments from Low-Quality Parallel Data to Improve Machine Translation (Steinþór Steingrímsson, Pintu Lohar, Hrafn Loftsson, Andy Way)
15:30	☕️
16:00	Paper 3: Development of Urdu-English Religious Domain Parallel Corpus (Noor e Hira, Sadaf Abdul Rauf)
17:00	Invited talk: Beyond Semantic Evaluation in SeamlessM4T - Massively Multilingual & Multimodal Machine Translation (Marta Costa-jussà, Meta AI)
17:45	Closing remarks

Call for papers

CoCo4MT is a workshop centered on research into corpus creation, cleansing, and augmentation techniques specifically for machine translation. We hope that submissions will provide high-quality corpora that are publicly available for download and can be used to improve machine translation performance. This should encourage the creation of new datasets for multiple languages and, in turn, make the workshop a general venue to consult for corpus needs in the future.

Topics include, but are not limited to:

Difficulties with using existing corpora (for example, political considerations or domain limitations), and their effects on final machine translation systems
Strategies for collecting new machine translation datasets (for example, via crowdsourcing)
Data augmentation techniques
Data cleansing and denoising techniques
Quality control strategies for machine translation data
Exploration of datasets for pretraining or auxiliary tasks for training machine translation systems
