Microsoft Research Asia (MSRA) has been dubbed the “Whampoa Academy for AI” in reference to the elite Chinese military school. MSRA is a bootcamp for NLP research and has trained more than 500 interns, 20 PhDs and 20 postdocs over the past two decades.

MSRA Assistant Managing Director and President of the Association for Computational Linguistics (ACL) Dr. Zhou Ming shared his thoughts with Synced regarding recent research progress and future development trends.

SYNCED: According to Marek Rei’s count, in 2018 you co-authored the most papers in the NLP+ML field. Could you tell us about your team’s work?

Dr. Ming Zhou: Last year was a big harvest for our team with several breakthroughs:

In machine reading comprehension (MRC), we won first place in both SQuAD 1.1 and SQuAD 2.0. Our submission surpassed human-level performance for the first time on the SQuAD 1.1 test set in January 2018, and just recently we took first place on SQuAD 2.0. In addition, we did the same on CoQA, an interactive, multi-round MRC challenge.

For neural machine translation (NMT), we reached a level comparable to human translation on the Chinese-English general news report test set. We also proposed joint training and dual learning methods to make full use of monolingual corpora, introduced consistency specifications, and improved the network’s decoding capabilities.

For grammar checking, we used neural network encoder-decoder technology, adopting a technique similar to neural machine translation. This brought solid improvements to grammar checking, and we now automatically generate a training corpus and decode it one by one. Our results ranked first on the three current public grammar checking evaluation sets. The related ACL papers we published have attracted a lot of attention.

There’s also neural network-based text-to-speech synthesis (TTS). Our team worked with Microsoft’s Voice Products Division to apply neural machine translation techniques to TTS for the first time, greatly improving TTS performance. Our technology performs best on the relevant evaluation sets.

We also continued to work with Microsoft Xiaoice, adding creative engines to the original chatbot that enable it to write poems, compose music, write news, and so on.

SYNCED: You have been involved in the organization and management of ACL conferences and are ACL president this year. What breakthroughs have you seen in NLP?

Dr. Ming Zhou: First, neural networks have penetrated NLP, and neural modeling, learning, and reasoning methods are now widely used across NLP tasks. Second, pre-trained models like BERT are popular, reflecting the universal power of large-scale linguistic data. Third, low-resource NLP tasks have been further developed.

Also, I think advances in China’s NLP research have attracted worldwide attention. Thanks to the efforts of the Chinese Computer Society and the Chinese Information Society, coupled with R&D at many schools and companies, the number of papers from China submitted to ACL, EMNLP, COLING, and similar venues has ranked second in the world for the last five years.

The ACL also set up the Asian ACL Branch (AACL). I am grateful for the support of the ACL Executive Committee and of NLP colleagues across the countries and regions of Asia-Pacific. The establishment of the AACL means that Asia can stand alongside North America and Europe in NLP development. With the AACL in place, events like the ACL conference can be organized in Asia to raise the level of NLP development there.

SYNCED: What lies ahead for NLP research?

Dr. Ming Zhou: First, pre-trained models. Everyone has talked about pre-trained models in the past year, and once BERT was released, every task seemed to use it. So I expect pre-trained models to keep heating up in the coming year, in areas such as training better pre-trained models and building pre-trained models for specific tasks.

Second, the study of low-resource NLP tasks. How should we do learning, modeling, and reasoning with no corpus or only a small one? I foresee further development of semi-supervised and unsupervised learning methods, along with the use of transfer learning, multi-task learning, and so forth to skillfully graft or borrow models from other languages, tasks, or open domains into new languages, tasks, or domains. The results should show up in specific tasks such as machine translation, reading comprehension, and Q&A.

Third, applications based on knowledge and common sense. This is about accumulating knowledge and common sense, skillfully integrating them into an AI model, and then evaluating the end results.

SYNCED: Your team has published many papers on multimodal fusion. What is the current research progress in this field?

Dr. Ming Zhou: Multimodal fusion is very interesting. Thanks to the development of neural networks, the encoding and decoding of multimodal content (language, text, image, video) can be unified under the same framework. In the past, because of the intrinsic differences in semantics, it was really not clear how the results of language and image analyses could be combined. Now a single model can handle modeling, encoding, and decoding, which makes unimpeded end-to-end learning possible.

There are also interesting applications such as captioning, which means understanding an image or video and then describing it in a paragraph of text. Our group is doing some common sense knowledge research for CQA, where the machine asks questions or provides answers about video or images.

A third application uses the result of image recognition as the input to a natural language system that writes poetry or lyrics or composes music. Microsoft Xiaoice writes poetry this way. The user uploads a picture, and Xiaoice interprets the picture and expresses it as a few keywords. It then elaborates on them to generate a lyric or a poem.

SYNCED: MSRA celebrated its 20th birthday last year, and you have also been with Microsoft for 20 years. Can you share some particularly memorable events from this time?

Dr. Ming Zhou: I am honored to have been here through every stage, from the first Dean, Kai-Fu Lee, to the current President, Xiaowen Hong. I have been a witness, a beneficiary, and a learner over the past 20 years. Joining Microsoft from Tsinghua, I found that Microsoft has a strong product and marketing team, and a strong R&D atmosphere at Microsoft Research and MSRA. As a researcher, I learned about research methods, teamwork, and product awareness.

As for projects, we started with Microsoft’s keyboard input methods for Chinese and Japanese. In 2004, I started work on the Microsoft New Year Couplet. From 2008 to 2012, we did the Bing Dictionary. In 2012, Microsoft Research founder Rick Rashid demonstrated a real-time speech translation system at the “21st Century Computing Conference”. In the past few years we have worked extensively on Microsoft Xiaoice, neural machine translation, machine reading comprehension, and so forth. Each project has its own characteristics.

However, I want to shift the perspective from individual research projects to the broader world of NLP development. For the past 20 years, Microsoft Research has played a unique role in promoting NLP globally, and especially in China. A responsible company should not only think of itself, but also ask whether it can help its field develop and help the countries and regions where it operates advance in that field. That is, it should become a meaningful contributor. After all, when Microsoft Research China (later renamed Microsoft Research Asia) was founded, there had been only one ACL paper from China, written by Tsinghua University Professor Changning Huang’s team.