Like a phone call: XiaoIce, Microsoft’s social chatbot in China, makes breakthrough in natural conversation

A user tries out the new functionality in XiaoIce, Microsoft’s social chatbot in China. Photo courtesy of Microsoft.

When people interact with most personal digital assistants or chatbots today, the experience is a lot like speaking into a walkie-talkie or texting: First one party says or writes something, and then the other party digests that information and responds.

It’s effective, but Li Zhou, engineer lead for XiaoIce, Microsoft’s wildly popular artificial intelligence-powered social chatbot in China, notes that it has one big drawback.

“People don’t actually talk that way,” Zhou said.

Instead, he notes, when most people are on the phone or chatting in person, they are both talking and listening at the same time – often predicting how the other person might finish a sentence, and maybe interrupting someone when appropriate or breaking an awkward silence to offer a new thought based on the information they are gathering.

Now, Microsoft believes it has achieved the first technological breakthrough that lets people converse with an AI-powered chatbot in a way that is closer to the natural experience of talking on the phone with a friend.

The company recently incorporated these advances into XiaoIce, a social chatbot that has more than 200 million users in Asia, and it is working to apply the same breakthroughs to other social chatbots including Microsoft’s Zo in the United States.

In telecommunications parlance, the breakthrough allows XiaoIce to operate in “full duplex,” a term for the ability to communicate in both directions simultaneously, as on a telephone call. It differs from “half duplex,” which is more like the walkie-talkie experience, in which only one person can talk at a time.
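
For readers who think in code, the difference between the two modes can be sketched as a toy model in Python. The class and method names here are hypothetical and chosen only for illustration; this is not a description of how XiaoIce or any Microsoft system is built.

```python
class HalfDuplexChannel:
    """Walkie-talkie style: one shared line, one talker at a time."""

    def __init__(self):
        self.holder = None  # the party currently talking, if any

    def start_talking(self, party):
        # A second party is refused while someone else holds the line.
        if self.holder is not None and self.holder != party:
            return False
        self.holder = party
        return True

    def stop_talking(self, party):
        # Only the current talker can release the line ("over").
        if self.holder == party:
            self.holder = None


class FullDuplexChannel:
    """Telephone style: each direction is independent of the other."""

    def __init__(self):
        self.talking = set()  # every party talking right now

    def start_talking(self, party):
        # Nobody has to wait; both parties may talk at once.
        self.talking.add(party)
        return True

    def stop_talking(self, party):
        self.talking.discard(party)


half = HalfDuplexChannel()
full = FullDuplexChannel()

half_user = half.start_talking("user")   # user takes the line
half_bot = half.start_talking("bot")     # bot is refused until the user stops
full_user = full.start_talking("user")
full_bot = full.start_talking("bot")     # bot can speak while the user speaks
```

In the half-duplex model the chatbot cannot say anything until the user releases the line, which is the turn-taking the article compares to a walkie-talkie. In the full-duplex model both parties can be active at once.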

Zhou said the new update, which Microsoft calls “full duplex voice sense,” also expands XiaoIce’s ability to predict what the person she is talking with will say next. That helps her make decisions about both how and when to respond to someone who is chatting with her, a skill set that is very natural to people but not yet common in chatbots.

“This is the art of conversation that people use in their daily life,” Zhou said.

Taken together, these capabilities reduce the unnatural lag time that can sometimes make interactions with chatbots feel awkward or forced.

“This really speeds up her responses to be much more natural,” said Ying Wang, a Microsoft director who oversees Zo.

In addition, the new technology means that users don’t have to use a “wake word” – usually, the chatbot’s name – every time they respond during conversations.

The advance builds on some other skills XiaoIce has developed, such as the ability to pause one thing she’s doing – telling you a story, for example – so she can do something else, like turn on a light. She can then remember to go back to telling the story – again, much like a person can switch topics in a conversation for a bit but then return to the original topic.
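
One simple way to picture this pause-and-resume behavior is a stack of tasks, where a new request sits on top of the interrupted one and the earlier task comes back when the new one is done. The sketch below is only an illustration under that assumption; the `TaskStack` class and the task names are hypothetical, and the article does not say how XiaoIce actually tracks interrupted tasks.

```python
class TaskStack:
    """Tracks the active task and any tasks it interrupted."""

    def __init__(self):
        self._tasks = []

    def start(self, name):
        # Starting a task pauses whatever was active before it.
        self._tasks.append(name)

    def current(self):
        # The active task is the most recently started one.
        return self._tasks[-1] if self._tasks else None

    def finish(self):
        # Finish the active task and return the one that resumes, if any.
        if self._tasks:
            self._tasks.pop()
        return self.current()


tasks = TaskStack()
tasks.start("tell a story")
tasks.start("turn on a light")   # the story is paused
active = tasks.current()
resumed = tasks.finish()         # the light is on; the story resumes
```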

Di Li, Microsoft’s general manager for XiaoIce, said all these improvements are part of Microsoft’s effort to build AI-powered social chatbots that understand people’s emotional as well as intellectual needs. That’s core to the overall goals for XiaoIce, Zo and Microsoft’s other social chatbots throughout the world, including Ruuh in India and Rinna in Japan and Indonesia.

Unlike productivity-focused assistants such as Cortana, Microsoft’s social chatbots are designed to have longer, more conversational sessions with users. They have a sense of humor, can chitchat, play games, remember personal details and engage in interesting banter with people, much like you would with a friend.

Li noted that full duplex voice sense is the kind of advance that helps make those conversations successful.

“Because it’s very natural, it makes the user feel very relaxed,” he said.

Allison Linn is a senior writer at Microsoft. Follow her on Twitter.