So flattening navigation isn’t really breaking news in today’s conversational revolution. What is more interesting, however, is how conversational interfaces are doing it: through flexible entry points.

A Solution to the Discoverability Problem: Multiple Entry Points

Conversational interfaces are often opaque, providing little or no indication of an app’s functionality or architecture the way traditional GUIs do. Consequently, one of the biggest problems with conversational interfaces today is that many of them devolve into fancy command lines.

Users struggle when they are required to remember specific words or phrases to use a conversational interface. At Sabre Labs, we refer to this phenomenon as the discoverability problem.

Again, this is not a new problem. Macs have Spotlight, and Windows now has Cortana, where you can access programs directly via a search bar. Theoretically, this is more efficient than navigating to a program’s location and clicking the icon. But most people still access applications through navigation, and it’s not hard to see why.

When I type “internet” into Spotlight, I don’t get Safari or Chrome or Firefox. I see this:

Esc. Esc. Esc. Oh wait, I don’t have that key anymore.

Many VUIs fail for the same reasons that most users prefer not to open up the command line: People don’t remember details — they remember concepts, the big picture. And they’re definitely not going to remember the three commands required to use your app.

The discoverability problem could refer to a couple of scenarios:

Scenario 1. The user is unable to even recall how to invoke the voice app/skill at all, similar to how you forget apps you’ve downloaded on your phone, but this time you have to remember exactly what the name of the app is.

Scenario 2. The user is unable to use a conversational interface effectively, or at all, because there are no clear ways to discover the app’s functionalities and/or the app’s functionalities are difficult to remember.

The solution to both scenarios is to have flexible entry points, whether to launch an app or to reach a functionality within one. Unfortunately, Scenario 1 is outside of our control, as the experience of invoking apps is tied to the platform. All we can do is give our apps catchy, easy-to-say names.

I’ve mused about smart utterances for app launch. For example, if a user has only one health app installed, they should be able to say anything health-related and get that app. But that would require some fancy programming and possibly AI... which leads us to Scenario 2.

Ideally, Scenario 2 would be solved by an NLP (natural language processing) engine advanced enough to organically parse and understand all user requests as a human would. We’re not there yet.

So for now, we have to mitigate this problem through other methods. On the Amazon Alexa platform, this is done through something called intents and utterances, a model that lets us build flexible entry points into our app even with current NLP technology.

A Primer on Intents and Utterances

For readers who are not familiar with building third-party voice apps, i.e. “skills” on Alexa, here’s a quick explanation.

Let’s say that you want to find flights that connect in a certain city. This is an intent, which I named “FlightsByConnection” as shown below. We can express this intent in many different ways.

We could say, “What flights connect in New York City?”, “Find a flight with a layover in NYC,” or even, “I need to connect in New York.” These commands, dubbed utterances by the Alexa platform, are the multiple entry points. As you can see, the more utterances there are, the easier it is to access the intent.
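To make the model concrete, here’s a minimal sketch of utterance-to-intent resolution. The intent name “FlightsByConnection” comes from the example above; everything else (the `INTENTS` registry, the `resolve` function, the regex matching) is a hypothetical stand-in for the machine-learned utterance resolution a real platform like Alexa performs.

```python
import re

# Toy intent registry: each intent lists sample utterance patterns.
# On the real Alexa platform this mapping lives in the skill's
# interaction model; the regex matching below is only a simplified
# illustration of how multiple utterances reach one intent.
INTENTS = {
    "FlightsByConnection": [
        r"what flights connect in (?P<city>.+)",
        r"find a flight with a layover in (?P<city>.+)",
        r"i need to connect in (?P<city>.+)",
    ],
}

def resolve(utterance):
    """Return (intent_name, slots) for the first matching pattern."""
    text = utterance.lower().rstrip("?.!")
    for intent, patterns in INTENTS.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, text)
            if match:
                return intent, match.groupdict()
    return None, {}
```

All three phrasings land on the same intent, which is exactly what “multiple entry points” means in practice: adding an utterance pattern adds a door into the same room.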

This is how most conversational interfaces are structured today. Voice inputs aren’t understood the way a human would understand them, but this model still facilitates navigation and mitigates the discoverability problem by providing multiple, flexible entry points to an intent.

Context-keeping is the secret sauce, but it’s difficult to get right.

What’s more difficult—but incredibly powerful—is maintaining context during all this.

Imagine, in the previous scenario, that you actually need to find flights that only connect in JFK. Ideally, you should be able to follow up the previous request with something short and sweet like, “How about just JFK?” and be understood. But without context, this simple query doesn’t mean anything.
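One simple way to approximate this is to keep the last resolved intent and its slots in session state, and let a terse follow-up inherit whatever it doesn’t restate. This is a hypothetical sketch (the `Session` class and its `handle` method are invented here), not how Alexa or Google actually implement context:

```python
# Toy session store: remembers the last resolved intent and its slots,
# so a short follow-up like "How about just JFK?" can inherit the
# missing context instead of meaning nothing on its own.
class Session:
    def __init__(self):
        self.last_intent = None
        self.last_slots = {}

    def handle(self, intent, slots):
        if intent is None and self.last_intent is not None:
            # No intent recognized in the follow-up: treat the new
            # slot values as a refinement of the previous request.
            intent = self.last_intent
            slots = {**self.last_slots, **slots}
        self.last_intent, self.last_slots = intent, slots
        return intent, slots

session = Session()
session.handle("FlightsByConnection", {"city": "New York"})
# The follow-up carries only a new slot value; the intent is inherited.
print(session.handle(None, {"city": "JFK"}))
# → ('FlightsByConnection', {'city': 'JFK'})
```

Even this toy version exposes the hard questions in the next paragraph: it never expires context, never detects a topic switch, and can’t resolve pronouns.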

Dealing with context isn’t easy. It brings up tough questions like, “How do you even program context?”, “When do you know that a context has switched?”, “How long should this piece of info be maintained?”, “How can we resolve pronouns?”, and other concerns that are being worked on by really smart people.

Context-keeping is not a solved problem, but it’s happening—slowly but surely. Google Home and Alexa are both contextually aware to a degree, as CNET reports:

Amazon has improved Alexa’s contextual awareness to an extent. Ask about the weather on Thursday, and Alexa will respond accurately. Say, “How about Friday” and Alexa will understand you’re still talking about the weather. Google takes contextual awareness one step further. If you ask who plays Katniss Everdeen, then ask “what else is she in?” Google will get both questions right, knowing you mean Jennifer Lawrence when you say “she.”

Context-Independent vs. Context-Dependent Commands

Beyond technological constraints, context-keeping itself comes with a host of considerations, one of which is the concept of context-independent versus context-dependent commands.

Context-independent, top-level commands are easy. “Find flights connecting in JFK” as a context-independent command should return all the flights connecting in JFK. But what if this command were context-dependent?

With context as a variable, two identical commands could mean something wildly different: