The General Language Understanding Evaluation (GLUE) benchmark is widely used to evaluate Natural Language Processing (NLP) models. GLUE covers a range of English-language tasks, including sentence-pairing and word prediction, but it cannot evaluate the performance of Chinese NLP models.

Now, a group of NLP researchers and enthusiasts, including graduates from Tsinghua University, Peking University, and Zhejiang University, has introduced ChineseGLUE, a benchmark designed to encourage the development and assessment of Chinese language models.

GLUE was introduced in 2018 by researchers from New York University, the University of Washington and DeepMind. Since then, new pretrained language models such as Google’s BERT have rapidly improved performance in Natural Language Understanding (NLU), an NLP research area focused on machine comprehension of text through tasks such as sentiment analysis and grammaticality judgment.

This April the team behind GLUE updated the benchmark to SuperGLUE, which retained two GLUE tasks and added five more challenging language understanding tasks, improved resources, and a new public leaderboard. While that was welcome news for English NLU researchers, it did not help those working on Chinese.

The new ChineseGLUE benchmark fills that gap with:

Benchmarks for multiple Chinese-language tasks of varying difficulty, such as sentence-pairing.

An open public leaderboard for tracking performance.

Baseline models, including starter code and pretrained models available in TensorFlow, PyTorch, Keras and PaddlePaddle.

A raw corpus for language modeling, pretraining or generative tasks. The corpus contains around 10GB of data and is expected to grow to 100GB by the end of 2020.

ChineseGLUE currently covers the following datasets and tasks: LCQMC Colloquial Description of Semantic Similarity Task; XNLI Language Inference Task; TNEWS Toutiao Chinese News (short text) Classification; INEWS Internet Sentiment Analysis Task; DRCD Traditional Chinese Reading Comprehension Task; CMRC2018 Simplified Chinese Reading Comprehension Task; BQ Intelligent Customer Service Question Matching; MSRANER Named Entity Recognition; and THUCNEWS Long Text Classification.

ChineseGLUE is available on GitHub.