Natural language: The de-facto interface convention for social robotics

Social Robotics: The UX of Natural Language

A very smart and snotty journalist told me that he had spoken with Aldebaran’s Pepper system when he was in Boston. Much to his chagrin, the robot didn’t reply when he spoke to it. “It didn’t understand a word I was saying,” said the journalist. “It simply doesn’t work.” The only word it understood, he said, was “sushi,” at which point I realized that he had been speaking to the system in English, but the robot was programmed to understand Japanese.

This is a problem of user experience, or UX.

Natural language interfaces are turning into a de-facto interface convention. Just as the GUI overlapped and largely replaced the command line, NLP is now being used by robots, the Internet of Things, wearables, and especially conversational systems like Apple’s Siri, Google Now, Microsoft’s Cortana, Nuance’s Nina, Amazon’s Echo and others. These interfaces are designed to simplify, speed up, and improve task completion. Natural language interaction with robots, if anything, is an interface. It’s a form of UX that requires design.

The next phase of UX is psychological: Natural Language personalities.

We all use language on a daily basis as our key interface with other people, so expectations of how to talk are as deep as they are diverse. We all seem to have, from time to time, problems talking with one another, so designing talking interfaces is a hard UX job. Topic management, dialogue turn-taking, segue management, association with iconic gestures, and thousands of other aspects of communication sit right in the middle of this emerging design discipline.

Traditionally, design, even software design, follows the famous architectural adage “Form follows function.” But when it comes to natural language interfaces, and robotics in general, we enter a new kind of design in which function follows form. The form of human interaction becomes the function in making something metaphoric, simple, fun and useful.

When it comes to human-robot interaction – and social robots in particular – the form defines the function. There are at least three reasons for this I’ll list here. All of them are psychological.

Social robots should look like us.

First, a user should identify with the conversational system. Any school-age child will tell you that there’s no pragmatic reason to make a computer look anything like a person. The android seems a fool’s design. It’s off-balance and clunky, the sensors are too high, walking is hard to calculate, and when the robot falls, it takes a serious digger, damaging the sensors perched on the part of the robot that gets the hardest knock. And what is a head for, anyway?

It turns out that babies (and adults, and school-age children too) are wired to recognize even an abstracted human face surprisingly quickly and at a surprisingly early age. Babies, when they’re still wee things, are drawn to facial imagery, even if it comprises just two dots and a curvy line. I tried this on my own piglet when he was 5 months old. I gave him a drawing of a circle with two dots and a curved line, and another of a square with a single dot and two lines. He went back and forth between the two drawings, but spent at least six times longer staring at the face drawing.

What is even more curious is that 6-month-olds are more likely to spend time looking at what you or I might call an attractive face (see e.g. here and here).

So it stands to reason that a social robot should have an attractive face, simply because humans are wired to pay attention to it. And this is part of the metaphor that is driving android design.

But beauty is only skin deep. While we wrestle to escape the uncanny valley of visual appearance, we are learning that there are many uncanny valleys of not just appearance, but sound, color, timing and even psychology. Maybe psychology, most especially.

Social robots should speak like us.

Second, now that NLP technology is generally working well enough to complete end-user tasks and provide contextual training, it’s time to start addressing personality in the same way that we’ve addressed appearance. This means that, just as we need to make sure that we design robots that look like us, we need to also design robots that speak like us.

This simply means that the user experience of natural language must consider psychology and human-like interaction as the core metaphor for the design of social robotics. It’s like making a glove that is shaped like a hand – the tool has to be designed to work with how we’re built. So psychology is key.

‘Birds of a feather flock together’ for some very deeply ingrained reasons associated with trust. We’re accustomed to trusting people who look like us, and when we trust someone, we actually perceive them to look more like us.

So talking and looking like a human is pretty important for a social robot.

This linking of behavior and appearance is key to user experience. Systems used in healthcare, finance, and other high-stakes domains will need to be designed so that they’re reliable, and this means they have to act like the user. Few users will be willing to share their financial data, or discuss private healthcare problems, with some fictional character that looks like a Furby, or a spinning ball of metal. And, to make matters more challenging, the user experience around natural language interfaces will need to address localization issues such as accent, argot, and other manners of speech that are linked to particular regions of a country.

We’ve found in our work at Geppetto Avatars that people are more willing to overlook uncanny elements of design if there are slight cultural variations thrown in. For example, we’ve noticed that synthesized voices sound less artificial if they have a slight accent. A synthesized voice with a British accent doesn’t sound quite so synthetic to an American listener. But if you give that same listener a synthesized voice that uses an American accent, the listener is far more likely to notice the synthetic elements. This is simply a UX technique to off-load technical challenges. So UX can serve the role of simplifying not only the interaction, but the technology as well.

Social robots should be polite.

Third, robots – or other connected systems – that use natural language interfaces prompt spontaneous social behavior in the user. Most users actually try to be polite or cooperative, and even start attributing personality traits to the system (such as humor, aggressiveness, and gender). We all talk to our cars, many of us have named them, and that’s how we’re wired. This is something people automatically do, like it or not, with not just natural language systems but almost all forms of media.

In 1996, Reeves and Nass, two famous researchers from Stanford (and authors of The Media Equation), showed that computers elicit social interactions in users. Reeves and Nass designed an experiment in which 22 people came into a lab to work with a computer via a natural language text interface. At the end of the session, Reeves and Nass asked them to evaluate the computer they had used and how well it performed. Reeves and Nass found that users had automatic social reactions during the test, and opinions about the personality of the machine they were using.

They ran the test again, this time via a natural language voice interface, to make the human-social theme more evident. The test results were the same. They concluded that people are polite to computers, both in speech and in writing, and the experiment showed that social rules can apply to media and that computers can be social initiators. Participants denied it, but the results said otherwise.

Try it yourself – watch people talk with Siri and they use the word “Please” or “Thank you” more often than you might guess. Then go buy The Media Equation. It’s a great read.

When we design systems that are able to build identification and trust with a user, these systems are able to give advice about healthcare and how to use our bodies, or finance and how to use our money. These are powerful systems because they are actually modifying what the user does. The conversational system has power over the user simply by giving advice that is amplified by how we are wired. If ethics is all about power, then being polite is the logical interface to ethics.

Interface, whether command line, GUI, or natural language, has always been the most potent and valuable element of a computer design. Interface will continue to be an important and valuable element of robotic design as well. The future of UX is psychological as much as graphical and ethical as much as functional. Ask your car.

UX practitioners and robotics designers need to consider the form and the function, especially when it comes to natural language interfaces. Otherwise, 私達はちょうど悪い寿司の貧しい翻訳があるでしょう。*

[“we’ll just have poor translations of bad sushi.”]