Tsinghua Natural Language Processing Group (THUNLP) has published a great reading list on GitHub for any budding AI researchers whose New Year's resolution is to study machine translation. The list compiles the most influential machine translation papers from the past 30 years, spotlighting the 10 most important contributions to the field's development.

The reading list is smartly organized, with detailed categorizations including statistical machine translation, neural machine translation, multilingual language translation, low-resource language translation, and others. The prominence of neural machine translation papers reflects NMT's dominance in the field during the years surveyed.

Below is the full THUNLP Machine Translation Reading List:

10 MUST READS:

Statistical Machine Translation

Tutorials

Word-based Models

Phrase-based Models

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of NAACL 2003.

Michel Galley and Christopher D. Manning. 2008. A Simple and Effective Hierarchical Phrase Reordering Model. In Proceedings of EMNLP 2008.

Syntax-based Models

Discriminative Training

System Combination

Evaluation

Neural Machine Translation

Tutorials

Model Architecture

Attention Mechanism

Open Vocabulary and Character-based NMT

Training Objectives and Frameworks

Decoding

Low-resource Language Translation

Semi-supervised Methods

Unsupervised Methods

Pivot-based Methods

Data Augmentation Methods

Data Selection Methods

Marlies van der Wees, Arianna Bisazza and Christof Monz. 2017. Dynamic Data Selection for Neural Machine Translation. In Proceedings of EMNLP 2017.

Transfer Learning & Multi-Task Learning Methods

Meta Learning Methods

Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, and Victor O.K. Li. 2018. Meta-Learning for Low-Resource Neural Machine Translation. In Proceedings of EMNLP 2018.

Multilingual Language Translation

Prior Knowledge Integration

Word/Phrase Constraints

Syntactic/Semantic Constraints

Coverage Constraints

Document-level Translation

Robustness

Visualization and Interpretability

Linguistic Interpretation

Fairness and Diversity

Efficiency

Pre-Training

Speech Translation and Simultaneous Translation

Multi-modality

Domain Adaptation

Quality Estimation

Automatic Post-Editing

Word Translation and Bilingual Lexicon Induction

Poetry Translation

Marjan Ghazvininejad, Yejin Choi, and Kevin Knight. 2018. Neural Poetry Translation. In Proceedings of NAACL 2018.

To view other papers and resources, please visit THUNLP on GitHub.