Pedro Lima | Mar 21, 2017 by:| Mar 21, 2017

As a developer, I’m always excited to dive deep into the code on new technologies. And we at ArcTouch will often take time outside of client projects to explore emerging platforms with a simple purpose: to learn. But it’s always better when we can build something that solves a real problem.

That’s what happened for me recently, when I built my first Alexa skill for Amazon Echo. “Skills” are like apps for the Amazon Echo, and developers can create custom skills, which Echo owners can download and install on their device from the Alexa Skills Store. The difference is, whereas apps primarily have a visual interface, skills have a voice-based interface, where you speak with the device to interact — and hence we call them “conversational apps” or voice apps.

My personal goal for this project was to learn the ins and outs of the platform. But the idea for the voice app I built was born out of a problem our account team members had experienced when traveling to meet with our clients — the annoyance and unpredictability of airport security lines.

The truth is, everyone hates the unknowns of security lines. A survey by SITA found that the negative emotion of travelers peaked when going through security — a statistic that probably surprises no one but affirms my own hatred for the process.

Turns out, this is a problem I could help chip away at by building an Alexa skill that allows people to check security lines before they head to the airport. I’m happy to share that this application, the Airport Security Line Wait Times skill, is now available in the Amazon Skills Store. This skill provides estimated wait time information at security checkpoints in more than 450 U.S. airports.

If you have an Amazon Echo, try it out and let us know what you think — I hope you find it useful!

Besides building the skill, I also wanted to share what I learned about Alexa in the process. We at ArcTouch are big believers in the future of voice applications (make sure to check out our recent post on Google Home). So, here’s a recap of how I went about building the Airport Security Line Wait Time skill.

Basic setup with AWS Lambda

To get started, I needed to figure out how (and where) the logic of my application would run. I chose AWS Lambda, a serverless computing platform that’s part of Amazon Web Services. I chose this because it was the simplest way to get a skill up and running. Doing it through AWS Lambda meant that Alexa would send JSON requests to the server and it would process those requests on demand — rather than requiring a more complicated (and expensive) solution that was continuously running. The alternative is to set up your own server, and write all the server code, but that was a much bigger project than I was looking for. A detailed step-by-step guide for using Lambda can be found here.

At the AWS console, you create the Lambda function, which can be written in C#, Java, JavaScript (Node.JS) and Python. For our skill I decided to go with JavaScript. The main reason I chose JavaScript was because most of the examples I found in my research used the same language. Because time was short (I had to get back to client work eventually!), I grabbed an open source code sample here (thank you, @deegels!) and built my application on top of it.

It is possible to use C#, Java or Python — but another benefit to choosing JavaScript is that it works very well with JSON and XML data sources, which made it very easy for me to use the API.

Mapping the MyTSA Web Service to the Alexa skill

With the basic code foundation for my Alexa skill in place, the next step was to create the connection to the data source.

All the data for the Airport Security Line Wait Times skill comes from a web service written by the U.S. Department of Homeland Security called MyTSA Web Service API. It publishes last reported wait time information from security lines at more than 450 airports.

The API itself includes three primary pieces of data: airport names (by airport code), terminals and wait times. My job was to write the code that could fetch these inputs and then present that information through Alexa’s voice app architecture.

Voice app user experience: Defining intents and utterances

The real magic for building a great voice app comes in defining how human (user) words trigger different responses. The programming model for creating Alexa skills offers quite a bit of flexibility. To start, you need to define an invocation word — the term that a user says to an Amazon Echo (or other Alexa device) that first triggers a skill.

For our skill, we defined the invocation word as security line.

If you say, for example, “Alexa, ask security line what’s the wait time at SFO”, Alexa will parse the words and recognize that security line is the invocation word for our skill.

The next step is to script utterances and intents. Utterances are recognized phrases that might be part of a sentence that a user might say. Alexa tries to match a whole sentence a user says with one utterance, and each utterance maps to an intent. An intent represents what information the user is trying to get from the skill. For example, a food delivery skill might have “check menu” and “place order” intents.

For this skill, our sample utterances were defined as:

Alexa, ask security line what’s the wait time at [airport] [terminal value]?

Alexa, ask security line how long is the security line at [airport] [terminal value]?

Alexa, ask security line how long is the wait at [airport code]?

Once Alexa determines the intent from what the user has said, it will look for keywords in the sentence, or slot values. Those are parameters that an intent expects in order to process the user request. When you define utterances, you add slots to those sentences, so when a user sentence is matched to an utterance, the slot values will also be matched. Then, we will access those values in our server-side code to do the computations, resulting in the appropriate response.

For our security line skill, we have two slot values: airports and terminal value (e.g. “terminal 1” or “terminal C”). Those need to be configured by the developer — in my case, I took the API and parsed all the data from airports and terminals (which included cleaning up some duplicates), then added those data bits as the slot values. Below you can see the main function of the skill and how it uses the slot values in our code:

With that tedious task completed, Alexa did most of the work from that point forward. After interpreting these values, making one remote API call and doing some computation on the results, Alexa was able to return the expected response to the user.

There are two basic types of responses: terminal and open session. If the user, for example, asks “Alexa, ask security line how long is the wait at SFO?” then Alexa’s response would continue the open session. That’s because she would need to ask for the other slot value: the terminal value. The response would be, “For which terminal?” But if the user offered the terminal value in the original question, Alexa would be able to answer with an estimated wait time — and terminate the session.

With everything defined, we’ve now got a working app that sounds and responds like this:

All that’s left now is to do some testing.

Testing and debugging my Alexa skill

When you’re ready to test your work, you compress the code in a zipped file and upload it through the AWS console. This is also the only simple way to test your skill. Unlike on a mobile phone, where you can push software via USB to a device, you cannot upload a file directly to Alexa. You could set up your local machine to execute and debug your code from your local development environment — which I’d recommend doing for more complex projects. Here’s a good post from Amazon on how to do that.

Otherwise the process is: edit your local code, zip, upload, test, rinse and repeat. If you have Node.JS locally installed, you can also run your code directly from the command line, but not all Alexa SDK functions will work. I found this testing process to be a bit cumbersome — but once you’re ready to publish, this process makes it simple to launch an app.

Publishing an Alexa skill

The process of publishing an Alexa skill was very straightforward, largely due to the fact that all of my code was already in the right place. Pushing the skill to Amazon for approval was simply a matter of checking a box. That’s very different from other platforms, such as iTunes Connect, where once a project is complete, there’s a lot of work to prepare the files and upload them for review.

First take on developing for Alexa

The best way I can describe my first hands-on work as an Alexa developer is that it was frictionless. I found the programming concepts to be fairly straightforward to grasp. Amazon does a nice job of reducing friction with Lambda and AWS. Of course, the simplicity comes at a cost of capabilities. Building a more complex application would require an investment in building a back-end server. And Alexa, on its own, doesn’t provide much depth to its AI other than parsing intents and slot values. So, building a more intelligent application will require additional skill and expertise.

But our industry is just getting started in exploring the vast potential for conversational voice applications. As our developer community becomes more acclimated to platforms like Alexa and Google Home — and with the investment our industry is making in AI — it’s easy to envision a not-too-distant world full of rich experiences with personal assistants.

Want your own voice app for your brand or business? Contact us and let’s have a conversation.