Voice experiences are still in their infancy. Even though we’ve been dreaming them up and playing them out in canned ways for sci-fi entertainment for years, honest-to-goodness voice interfaces have taken only baby steps in their evolution…until now. As discussed in my previous post, the democratization of the voice experience through the introduction of the Alexa Skills Kit and Amazon Echo has given the masses the opportunity to flex their voice muscles for the first time.

This evolution event has tremendous potential to speed the development of rich voice experiences as brands, institutions, organizations, students, and individuals test the waters and help hone the process.

The Promise & Potential…and Pain Points

As with anything new and adventurous, a voice-driven world is an exciting thing. It turns our day-to-day interactions with a machine into a human-formed conversation. Voice is faster, less cumbersome, universal and can improve effectiveness by removing noise. There is much promise wrapped up in the ease of voice experiences. Allowing humans to converse naturally with machines is more efficient than typing every request. It speeds delivery of an answer, helps identify choices and allows us to take more on as the mechanics of decision making and commands are delegated.

The problem is that all of those great things can become a wash if the experience is bad. The user can simply walk away and stick with what’s comfortable, thus negating progress and pushing us backward. If the wall comes down and the wizard is revealed to be a poorly constructed mass of wires and emptiness, then we’ve lost the thrill, and likely lost the user. It’s easy for the user to just say, “great idea, we’re just not ready for this yet.”

Why is it so easy to throw one’s hands up that quickly? Simply put, with voice:

it’s easier for the user to get lost

it’s simple to walk away, to go back to what we’re familiar with if it’s too difficult

it’s possible to try to push the user into cumbersome activities or processes

Natural Language Understanding and Automatic Speech Recognition are getting better every day, but there is a big difference between deciphering what’s being asked and providing the proper answer. Apple’s Siri and Google, through Google Now, have been at the forefront of this experimentation with voice. They have also experienced their fair share of challenges in honing the process of giving users what they ask for: a study in late 2014 found that Siri was able to fully answer queries only 53% of the time (compared with 88% for Google Now). Learning, however, comes fast in voice. Just a year later, in late 2015, other studies pointed to rising accuracy ratings for Google Now and Siri, with Cortana falling flat in comparison.

While those providers look to raise their efficacy results, a new player has emerged and with it, a new set of voice experience creators has entered the world of voice. As the proverbial “lid” is removed and developers, brands and individuals take advantage of Amazon’s democratization of the voice experience, it becomes more important that these creators focus on how to get users to the answers they seek faster and with clarity and guidance.

The Plan

So, how can we, as designers and developers of voice experiences, deliver something of value that is also easy to use? As we’ve begun our work crafting voice experiences at RAIN (www.rain.agency), we’ve established the following structure:

Establish a strategy and ensure that we’re building an experience with purpose

Construct a path and account for all possible areas where the user can get lost

Provide tools that guide the user and help them way-find as they navigate

Push the user back on track when they deviate from our planned path

Keep it simple by building a foundation that is scalable and easy for the user to adapt to as we scale
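As one illustration of guiding users and pushing them back on track, the sketch below escalates its guidance each time a request goes unrecognized, then exits gracefully rather than looping. It is plain Python, not Alexa Skills Kit code; the prompts, the `Session` class, and the two-reprompt limit are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical prompts, ordered from a light nudge to explicit guidance.
REPROMPTS = [
    "Sorry, I didn't catch that. What would you like to do?",
    "You can say 'store hours' or 'help'. Which would you like?",
]
EXIT_MESSAGE = "I'm having trouble understanding. Let's try again later."


@dataclass
class Session:
    """Tracks how many times in a row the user has gone off the path."""
    misses: int = 0


def recover(session: Session) -> Tuple[str, bool]:
    """Return (speech, keep_session_open) for an unrecognized request."""
    if session.misses < len(REPROMPTS):
        speech = REPROMPTS[session.misses]
        session.misses += 1
        return speech, True
    # Out of guidance: end gracefully instead of looping forever.
    return EXIT_MESSAGE, False


def on_recognized(session: Session) -> None:
    """Reset the miss counter once the user is back on the planned path."""
    session.misses = 0
```

The design choice here is that each miss gives the user more explicit way-finding than the last, so a lost user hears their options before the experience gives up.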

We’ve also created a process for our teams to ensure that we avoid getting caught up in the excitement and instead approach skill planning with a strategic lens. Our process front-loads the project with 65–70% of the work being allocated to Voice Experience Planning & Production (VX). This VX work includes:

Establish the initial strategy for the skill, including definition of an MVP as well as a long-term roadmap for the skill’s evolution. We prefer the Crawl > Walk > Run approach championed by Amazon, which allows for a natural evolution that is good for the brand/creator as they get started, as well as for the user as they learn the environment.

Develop User Stories that detail all possible interactions (what does it do? what doesn’t it do?) from the main skill flow to the help menu and associated Alexa-app card, as well as any other unique areas of user engagement.

Design a User Flow that facilitates the path from launch to completion and everywhere in between.

Devise Use Cases that identify paths and edge cases.

Write Scripts for each use case so that a full set of responses can be developed and to help identify any flaws in the previous planning.

Take the Scripts into VX Production, identifying Intents (what is the user trying to do?) and associated Utterances (what is the user saying to trigger an intent?), while fleshing out responses in brand tone and voice.

Finalize any other aspects of the interaction, such as content for cards delivered to the user via the Alexa app, emails triggered by the interaction, or other unique digital media conveyed to the user.
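To make the Intents and Utterances step concrete, here is a minimal Python sketch of how a script might be broken down during VX Production. It is not the Alexa Skills Kit format or API; the intent names, utterances, and responses (`GetHoursIntent`, `HelpIntent`, and so on) are hypothetical, and the exact-match lookup only stands in for the platform's real language understanding.

```python
import string

# Hypothetical intents for an imaginary store-hours skill. Each intent pairs
# the utterances that should trigger it with a response in brand voice.
INTENTS = {
    "GetHoursIntent": {
        "utterances": ["what are your hours", "when are you open"],
        "response": "We're open nine to five, Monday through Friday.",
    },
    "HelpIntent": {
        "utterances": ["help", "what can i ask"],
        "response": "You can ask me when we're open.",
    },
}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def match_intent(spoken: str):
    """Return the name of the intent whose utterance matches, or None."""
    phrase = normalize(spoken)
    for name, intent in INTENTS.items():
        if phrase in intent["utterances"]:
            return name
    return None
```

Writing the table out this way tends to expose gaps early: any phrase in a script that returns `None` is either a missing utterance or an edge case the use cases have not yet covered.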

While a relatively new process, instituting this structure has allowed us to set standards and scale our teams quickly. We’re honing some of the details as we go, but these tentpoles have proven strong and work for everything from smaller personal skills to larger enterprise solutions. The process has also ensured that we draw a line in the sand for the quality of our voice experiences, one that matches the high standards we place on all of our work, regardless of medium.

Coming Up Next…

Thank you for continuing to follow our journey into the world of voice experiences. If you haven’t had a chance to read the intro articles, you can find them here. In the next article we’ll take a step back and explore the evolution of interactions that have brought us from the hunt-and-peck days of information gathering, through the evolution event that was the introduction of search, and now to a place where getting what we want is as close as the tip of our tongue. Till then, take care, and as always, if you have a question, feel free to ask…
