Facebook researchers have introduced two new methods for pretraining cross-lingual language models (XLMs). The unsupervised method relies only on monolingual data, while the supervised method leverages parallel data with a new cross-lingual language model objective. The research aims to build an efficient cross-lingual encoder that maps sentences in different languages into the same embedding space. This shared encoding space benefits tasks such as machine translation.

The results show strong performance on a range of cross-lingual understanding tasks, with state-of-the-art results on cross-lingual classification and on both unsupervised and supervised machine translation.

The Facebook XLM project contains code for:

Language model pretraining:

Causal Language Model (CLM) — monolingual

Masked Language Model (MLM) — monolingual

Translation Language Model (TLM) — cross-lingual

Supervised / Unsupervised MT training:

Denoising auto-encoder

Parallel data training

Online back-translation

XNLI fine-tuning

GLUE fine-tuning

XLM also supports multi-GPU and multi-node training.
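To make the pretraining objectives listed above more concrete, here is a minimal sketch of how an MLM corruption step and a TLM input stream could be built. It is an illustration only, not the project's code: the function names, the `[MASK]` symbol, and the 15% selection rate with an 80/10/10 split between mask, random token, and unchanged token are assumptions in the BERT style, and sentence separators are omitted for brevity.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption); the real vocabulary entry may differ


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Corrupt a token sequence for a masked-language-model step.

    Returns (corrupted, targets). targets[i] holds the original token at
    every selected position and None everywhere else.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)
            roll = rng.random()
            if roll < 0.8:            # most selected tokens become the mask symbol
                corrupted.append(MASK)
            elif roll < 0.9:          # some are swapped for a random vocabulary token
                corrupted.append(rng.choice(vocab))
            else:                     # the rest are left unchanged
                corrupted.append(tok)
        else:
            targets.append(None)
            corrupted.append(tok)
    return corrupted, targets


def build_tlm_stream(src_tokens, tgt_tokens, src_lang, tgt_lang):
    """Concatenate a translation pair into one stream for TLM.

    Positions restart at 0 for the target sentence, and every token
    carries the language label of the sentence it came from.
    """
    tokens = src_tokens + tgt_tokens
    positions = list(range(len(src_tokens))) + list(range(len(tgt_tokens)))
    langs = [src_lang] * len(src_tokens) + [tgt_lang] * len(tgt_tokens)
    return tokens, positions, langs


rng = random.Random(0)
en = "the cat sat on the mat".split()
fr = "le chat est assis sur le tapis".split()
tokens, positions, langs = build_tlm_stream(en, fr, "en", "fr")
corrupted, targets = mask_tokens(tokens, vocab=sorted(set(tokens)), rng=rng)
```

Because masking is applied to the concatenated stream, a masked English word can be predicted from its French context and vice versa, which is what makes TLM cross-lingual while MLM and CLM remain monolingual.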

Generating cross-lingual sentence representations

The project provides sample code for quickly obtaining cross-lingual sentence representations from pretrained models. These representations are useful for machine translation, computing sentence similarity, or building cross-lingual classifiers. The examples are written mainly in Python 3 and require NumPy, PyTorch, fastBPE, and Moses.

To generate cross-lingual sentence representations, the first step is to import the project's code and the required libraries and load the pretrained model.

Next, build the dictionary, update the model parameters, and build the model.

The input is a list of example sentences in BPE format (segmented with the fastBPE library), from which sentence representations are extracted using the pretrained model.
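For readers unfamiliar with BPE-formatted text, the sketch below shows what such input might look like and how the subword pieces map back to words. It assumes the `@@` continuation marker commonly used by BPE tools; the sentences and their segmentations are hypothetical and were not produced by fastBPE, and `bpe_to_words` is a helper written for this illustration.

```python
def bpe_to_words(bpe_sentence, marker="@@"):
    """Merge BPE pieces back into whole words.

    A piece that ends with the marker continues into the next piece.
    """
    words, current = [], ""
    for piece in bpe_sentence.split():
        if piece.endswith(marker):
            current += piece[:-len(marker)]
        else:
            words.append(current + piece)
            current = ""
    if current:  # keep a dangling trailing piece rather than dropping it
        words.append(current)
    return " ".join(words)


# Hypothetical (BPE-segmented sentence, language) pairs
bpe_sentences = [
    ("once he had worn tren@@ dy italian leather shoes", "en"),
    ("le chat est ass@@ is sur le tap@@ is", "fr"),
]
restored = [bpe_to_words(text) for text, _ in bpe_sentences]
```

Pairing each sentence with a language code reflects the fact that a cross-lingual model needs to know which language each input belongs to.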

The last step is to create a batch and run a forward pass to produce the final sentence embeddings.
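The batching step can be sketched without any deep learning library. The example below pads tokenized sentences into a matrix laid out as (sequence_length, batch_size), the same time-major layout as the output shape the article describes. The special symbols, the toy vocabulary, and the helper names are assumptions made for illustration; in practice the ids come from the pretrained model's dictionary and the matrix would be a PyTorch tensor passed to the model.

```python
PAD, SEP, UNK = "<pad>", "</s>", "<unk>"  # placeholder special symbols (assumption)


def build_vocab(tokenized_sentences):
    """Map each token to an integer id, reserving the first ids for special symbols."""
    vocab = {PAD: 0, SEP: 1, UNK: 2}
    for sent in tokenized_sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_batch(tokenized_sentences, vocab):
    """Return (word_ids, lengths), with word_ids indexed as [position][sentence]."""
    # Wrap each sentence in separator tokens (assumed convention).
    wrapped = [[SEP] + sent + [SEP] for sent in tokenized_sentences]
    lengths = [len(sent) for sent in wrapped]
    slen, bs = max(lengths), len(wrapped)
    # Start from an all-padding matrix, then copy each sentence into its column.
    word_ids = [[vocab[PAD]] * bs for _ in range(slen)]
    for col, sent in enumerate(wrapped):
        for row, tok in enumerate(sent):
            word_ids[row][col] = vocab.get(tok, vocab[UNK])
    return word_ids, lengths


batch_sentences = [
    "le chat est ass@@ is".split(),
    "hello world".split(),
]
toy_vocab = build_vocab(batch_sentences)
word_ids, lengths = make_batch(batch_sentences, toy_vocab)
```

The `lengths` list lets the model ignore padded positions in the shorter sentences during the forward pass.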

The final output is a tensor of shape (sequence_length, batch_size, model_dimension), which can serve as the starting point for fine-tuning on the GLUE tasks or on XNLI.
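As a rough illustration of how an output of that shape could be turned into one vector per sentence, the sketch below takes the hidden state at the first position of each sentence and compares sentences with cosine similarity. Using the first position as the sentence vector is an assumption made here, as are the helper names; the toy values stand in for real model output, and nested lists stand in for tensors.

```python
import math


def first_position_embeddings(output):
    """Take the position-0 vector of every sentence.

    `output` is indexed as [position][sentence][feature], so output[0]
    has shape (batch_size, model_dimension).
    """
    return [list(vec) for vec in output[0]]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy output with sequence_length=2, batch_size=3, model_dimension=2
toy_output = [
    [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]],
    [[0.5, 0.5], [0.1, 0.9], [0.7, 0.2]],
]
embeddings = first_position_embeddings(toy_output)
sim_01 = cosine_similarity(embeddings[0], embeddings[1])  # parallel vectors
sim_02 = cosine_similarity(embeddings[0], embeddings[2])  # orthogonal vectors
```

With a real cross-lingual encoder, a sentence and its translation would be expected to score higher than unrelated sentences, which is the basis for the sentence-similarity use case mentioned earlier.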

The researchers report that their unsupervised method achieved 34.3 BLEU on WMT’16 German-English, improving on the previous best approach by more than 9 BLEU. On supervised machine translation, performance rose by more than 4 BLEU on WMT’16 Romanian-English, establishing a new state-of-the-art score of 38.5 BLEU.

The paper Cross-lingual Language Model Pretraining is on arXiv.