Voice-based computing is a particularly good fit for China. Typing Chinese on a standard QWERTY keyboard today relies on a system called “pinyin,” which spells out characters’ pronunciation, but because Mandarin has four tones and the same syllable means something different in each, the user must painstakingly select the right character from a drop-down menu after typing the pronunciation. A common syllable like “yi” can correspond to 60 or more frequently used Chinese characters. Some input methods prioritize the most likely character according to context, but they are not always accurate. Unsurprisingly, users of mobile technologies like the popular WeChat communication app tend to leave voice messages for one another rather than the typed texts typical in the U.S.
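The candidate-selection step can be sketched in a few lines. The tiny candidate table and the relative frequencies below are invented for illustration; a real input method draws on a far larger dictionary and a context model.

```python
# Minimal sketch of pinyin candidate lookup: the user types a toneless
# syllable and picks from a list of characters ranked by frequency.
# The candidate table and frequencies here are illustrative only.
CANDIDATES = {
    # syllable -> [(character, relative frequency)]
    "yi": [("一", 0.30), ("以", 0.15), ("已", 0.10), ("意", 0.08), ("易", 0.05)],
    "ma": [("吗", 0.25), ("妈", 0.20), ("马", 0.15), ("骂", 0.05)],
}

def candidates(syllable: str) -> list[str]:
    """Return characters for a pinyin syllable, most frequent first."""
    pairs = CANDIDATES.get(syllable, [])
    return [ch for ch, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]

print(candidates("yi"))  # most likely character listed first
```

Even with good ranking, the user still scans and picks, which is why voice input is so attractive.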

In China today, voice assistant technology works by turning a user’s voice commands into text and generating a response based on the meaning of the text. That process works pretty well for task-based commands—check the weather or look for the English translation of a particular Chinese word—but it cannot sustain a back-and-forth conversation about multiple subjects.
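The pipeline described above can be sketched schematically. The transcription placeholder, handler names, and keywords below are all invented; a real assistant would run trained speech and language models at each stage.

```python
# Schematic sketch of the command pipeline: speech is transcribed to
# text, a task keyword is matched, and a handler produces the response.
# No conversation state is kept, which is why such systems cannot
# sustain a back-and-forth exchange across subjects.
def transcribe(audio: bytes) -> str:
    # Placeholder: a real system calls a speech-recognition model here.
    return audio.decode("utf-8")

HANDLERS = {
    "weather": lambda text: "Sunny, 25°C",
    "translate": lambda text: "translation result",
}

def respond(audio: bytes) -> str:
    text = transcribe(audio)
    for keyword, handler in HANDLERS.items():
        if keyword in text:
            return handler(text)
    return "Sorry, I didn't understand that."

print(respond(b"what is the weather today"))
```

Each request is handled in isolation, so a follow-up question like "and tomorrow?" has nothing to refer back to.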

Solving conversational computing will require overcoming some of the challenging complexities of the Chinese language. In Chinese, for example, the same characters arranged in a different order mean different things, and even when arranged in the same order, they can have different meanings depending on what comes before or after them. In addition, written Chinese does not have spaces naturally dividing words as English does, so Chinese natural-language-processing researchers must teach their algorithms where to insert word boundaries in order to establish the proper meaning of a particular combination of characters. Chinese verbs are also not inflected for tense—there are no distinct forms for past, present, or future—which makes it challenging for machines to decipher the timeline of a sequence of events.
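The segmentation problem can be illustrated with the simplest classical approach, forward maximum matching against a dictionary. The toy dictionary below is invented, and real segmenters use statistical or neural models, but the example shows why the task is hard: the greedy match carves up a famous ambiguous string the wrong way.

```python
# Sketch of word segmentation by forward maximum matching: at each
# position, greedily take the longest dictionary word. The dictionary
# is a toy example; the point is the ambiguity it exposes.
DICTIONARY = {"北京", "大学", "北京大学", "大学生", "生"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def segment(text: str) -> list[str]:
    """Greedily match the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in DICTIONARY or length == 1:
                words.append(chunk)
                i += length
                break
    return words

# 北京大学生 should read 北京 / 大学生 ("Beijing college students"),
# but greedy matching grabs 北京大学 ("Peking University") first.
print(segment("北京大学生"))
```

Getting such boundaries right requires weighing context, which is exactly what makes the problem a research topic rather than a lookup.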

Chinese natural-language-processing researchers are tackling other challenges, too: Numerous dialects exist, some of which are mutually incomprehensible, and the same expression can mean different things in different contexts.

Zhiyong Wu, an associate professor at Tsinghua University who studies natural-language understanding, notes that for computers to truly understand the intent of a human speaker and communicate appropriately, they will have to pick up subtle clues such as intonation and stress. They will also have to understand emotions, since humans’ decision making is not based solely on logic, notes Jia Jia, an associate professor at Tsinghua University who studies social affective computing.

To make its system smarter, Baidu introduced a “trainer” mode to its platform this year, allowing software developers to contribute language data in real time through a built-in annotator bot. The bot receives developer feedback (such as the explanation of a query the system didn’t understand the first time), learns from it, and then corrects the system.

One advantage Chinese researchers have as they try to solve these problems is a large quantity of data. The neural networks that underpin the language understanding of today’s computers require large amounts of data to train. The more data a company has, the smarter its neural networks will become, and companies like Baidu and Alibaba have the benefit of vast user bases. As of the end of 2016, Baidu claimed 665 million monthly active mobile users, and as of March this year, Alibaba had 507 million monthly active mobile users.

But Gang Wang, a scientist at Alibaba’s A.I. Lab, says researchers will have to design neural networks that can learn language efficiently from less data. In the real world, people express the same meaning in different ways, and it’s impossible to teach the computer every possible expression, he notes. In his previous role as an academic researcher, he and his colleagues came up with a method for teaching computers to understand a subject when very little data is available: use data from related subjects. For example, to train a neural network to understand texts in sports medicine, you could draw upon data from sports and data from medicine. The approach is not as good as using organic data, Wang notes, but when that’s lacking, it does make it possible to train neural networks on a topic.
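The related-subjects idea can be sketched in miniature: with no sports-medicine corpus available, pool what separate sports and medicine corpora teach you and apply the combined knowledge to target-domain text. The corpora and scoring function below are tiny invented examples, far simpler than the neural-network methods Wang describes.

```python
# Sketch of borrowing from related subjects: pool word statistics from
# a sports corpus and a medicine corpus, then use the combined counts
# to score text from the data-poor target domain (sports medicine).
# The corpora here are toy examples.
from collections import Counter

sports_corpus = ["the runner trains for the marathon", "the team won the match"]
medicine_corpus = ["the doctor treats the injury", "rest helps the muscle heal"]

def word_counts(corpus: list[str]) -> Counter:
    counts = Counter()
    for doc in corpus:
        counts.update(doc.split())
    return counts

# Pool the knowledge learned from both related domains.
pooled = word_counts(sports_corpus) + word_counts(medicine_corpus)

def relevance(text: str) -> int:
    """Score how much of a target-domain text the pooled model recognizes."""
    return sum(pooled[w] for w in text.split())

print(relevance("the runner rests the injury"))
```

Neither source corpus covers sports medicine on its own, but together they recognize most of the target text, which is the intuition behind the approach.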

Ultimately, what will make a voice assistant succeed in China is its content and services, says Chenfeng Song, founder of Ainemo, a startup whose voice-activated home-assistant robot, Little Fish, went on sale in June. Little Fish runs on the DuerOS conversational platform, and Song plans to gradually build educational and health-care programs into it. Voice, he notes, is a way to deliver content to people who cannot easily access the Internet through desktop computers and smartphones, especially the elderly and young children.