Conversational UI or the Voice User Interface (V.U.I.)

A few years ago, Facebook basically forced most of its users to adopt the Messenger application (although not people who use the Facebook Paper app 😎). After announcing its intention to build Chatbots into the messaging platform, Zuckerberg and company seem to be making good on the promise that Messenger would provide a new platform that users will find useful.

Much of Facebook’s F8 and Google’s I/O conferences focused on how AI inside of a conversation might reshape how we interact with our devices. This trend has been called conversational UI, cognitive learning, and smart chatbots. I prefer the term coined on Relay.FM's Presentable podcast, the V.U.I. or voice user interface.

Amazon’s Echo has demonstrated a market demand for a somewhat sapient assistant that is always listening, and Google’s Home affirms that the age of “no interface” as an interface is upon us, fad or not. While Google, Facebook, and Amazon have recently embraced the trend, IBM’s Watson has used this conversational UI for years, albeit in places not visible to the average consumer (beyond Jeopardy, of course).

Apple’s Siri seems to be a promise of this type of interaction, but the reality of the voice assistant is far from ideal. Apple is said to be unveiling plans for both Siri and a conversational UI product later this month at WWDC. It’s safe to say that technology companies are flirting with conversational UI as a new way to interact with our devices and the connected world. This has led to doomsaying about the G.U.I. (Graphical User Interface).

While conversational UI has a bright future, and applying a VUI surely has its merits, it cannot supplant or even really compete with the GUI in terms of ideal user experience.

The Spatial Problem with Conversational UI

Jeff Veen made the point on his Presentable podcast this week that most interaction designers already think of UX as a conversation. While this is certainly a high-minded idea, it’s true that interfaces are supposed to represent the clearest way for people to understand how to interact with and use a service or application. Jeff goes on in the episode to point out the discoverability problem with a conversational UI. His argument is that users won’t know what they can and cannot do within an application or service if the VUI is the primary way to interact with it.

This discoverability problem is simply a symptom of a larger problem with a VUI: its lack of spatial metaphor. A clear mental model of “where something is” drives a lot of computer-human interaction, and has deep roots in the way humans understand and interact with the world.

John Siracusa has famously bemoaned the Mac’s lack of a spatial Finder, while pointing to iOS’s app configuration as the pinnacle of deconstructing computer interaction for “normal people”. Indeed, iOS has always signaled to the user where “home” is, and every app is a different path away from that home state. This metaphor is complete with animations indicating that apps transport you away from the home screen, and the home button shrinks the app back to an icon on your home screen.

Over time, different metaphors for in-app navigation have been experimented with, to varying degrees of success. But most are a play on the theme of transporting a user between places, giving users a physical connection to “where” in the interface they are.

Don’t believe me? Try this thought experiment:

Think of your music library. Can you remember where in an album your favorite song sits?

Think of Snapchat. Which way should you swipe to get to Snapchat Stories?

Think of turning on the flashlight on your phone, of selecting the Messages app, of how you get back to the home screen.

All of these actions invoke actual spatial relationships which help us remember and understand them. They probably have you thinking about where your fingers would touch and what you’d see on the screen when you do. This type of interaction is functionally muscle memory. If you had to state each step of these processes in order to access them, the cognitive load to do so would be higher than simply using the app as is.

Use the VUI Where it Makes Sense

Conversational UI is popular for a reason, though, and that’s because VUIs are better at some things than GUIs are. The most obvious times VUIs will be more convenient for users are a) when the phone is not an immediate option, and b) when the user is doing something besides interacting with a device.

The first and most obvious place for VUIs to succeed is in the car. The ‘connected car’ trend is never brighter than when conversational UI is applied inside the vehicle for a driver. A phone should not be an immediate option for someone driving, and a driver should not be focused on operating anything besides the vehicle. There are real and important benefits to be realized here, as distracted driving due to our devices seems to be an epidemic in modern society. I hope to see tech companies refocus their VUI technologies inside the car, because this could save lives.

The next obvious place for a VUI is in the home, specifically when we are cooking, cleaning, having conversations with family, and doing other things at home besides using our smartphones. This is where Amazon has succeeded with Echo, and where Google aims to win with its ‘Home’ product.

Lastly, I believe VUI will be superior for users who are smartphone averse. This is an increasingly rare population, and it correlates with age. VUI is much easier to operate and appreciate for those who would rather interact with a device without having to understand the spatial metaphors of a smartphone. These users want to simply express their intent and have the computer respond in kind. Some call this humanist computing, and better versions of Alexa and Siri will be popular with older age groups in this category.

After these use-cases the VUI starts to become “as good as” the GUI. You can search the web with a vocal “Ok, Google” or just type a query into a search bar. Part of which you choose will be based on your context, part will be based on preference, but neither involves particularly complex interaction. The preference on which way you’ll want to interact with your device will also likely shift as VUIs improve, but only for certain tasks. A good VUI might be able to read a text from my wife about a movie and then buy tickets at a local theater for us, but might not do as well with a request to make a PowerPoint for a presentation or edit a movie in iMovie. In those instances, the GUI will always win.

Part of utilizing a more powerful VUI will be deploying it when it removes friction between the user and the app or service, and knowing when the GUI will better serve the user’s need. In the car, in the kitchen, for a quick question or request, VUI might be a go-to; for everything else, there’s the GUI.

The Incentives of Conversational Interface

There is a great article about how conversational UI won’t replace apps, it will make better apps. This is largely true, and this kind of thinking helps put a stop to “Catastrophizing the End of the GUI”.

The VUI represents a coming competitive advantage more for the platform makers that integrate it into their platforms than for the app makers that integrate it into their apps. Facebook wants you to put a Chatbot from your service into their app, not into your own. Google’s platform will allow your app to build into it, but notably won’t send the user to your app, completing the interaction at the OS level.

Some apps will want to pay attention to how to integrate with Facebook and Google’s VUI, and how to integrate good aspects of VUI into their services. This is particularly true of transactional applications with a physical counterpart, because automating transactions will be the easiest task a VUI can help with.

The implicit promise of conversational UI is that a computer can understand a user’s intentions. The two biggest obstacles that stand between a user and a successful conversational UI are 1) the user likely has to know what their intentions are, and 2) the user’s intentions, not their realization, are the product that Google and Facebook want.

When you open your smartphone, do you always know your intention? Even if you do, could you always express it in an understandable way? If the majority of your interactions with your smartphone begin with a clear goal and end once you have achieved that goal, I’ll eat my hat. People are bad at verbalizing what they intend to do, and even if VUIs get really good at guessing (creepy if they do), people might want to spend time browsing a feed more than they want to achieve a task.

The companies who are so bullish on conversational UI, Facebook and Google, are incentivized in such a way that when they begin understanding your intentions, they have already got what they needed from the interaction. Their products are machines for helping companies advertise to you. Even if their VUIs can achieve the lofty goal of understanding your intentions, these companies are not as incentivized to successfully help you realize that intention. Hopefully, conversion rates will increase for companies who successfully build VUI and where VUIs successfully determine intent. But as of now, the incentives feel unbalanced away from the user and towards data mining.

Conclusion

Can VUIs guide the user without some connection to a spatial world? Maybe. But it comes back to the reality that a user might not prefer the interaction. The cognitive load that conversational UI eliminates is marginal enough that any distrust or misunderstanding in the VUI’s ability to understand and follow through on a user’s intention would default the user back to the GUI.

Until VUIs can anticipate intentions, I believe the GUI is safe. And once the VUI is that capable, I think society will have much bigger issues to worry about than how we interact with consumer electronics companies.