Developing for Alexa using Elixir (Part One)

Written by Eugen Minciu on Apr 13 2017

Alexa is an intelligent personal assistant launched by Amazon in late 2014. It’s capable of voice interaction, music playback, todo lists, alarms, and a whole lot more. You can control Alexa through devices such as, Echo, Dot, and other integrated hardware. But most importantly, Alexa is extensible by any and all developers.

Mad Skills with Alexa

Almost anything you can control through spoken word can become a skill on Alexa. A skill is an encapsulated piece of functionality written by a developer. A skill can do things like provide stock prices, switch on lighting, or order a pizza. If you can ask for something, chances are you can write a skill for it.

A LessEverything Skill, So Near

At LessEverything there are projects in development where Alexa will be part of the UI. Since some of the tooling around Alexa apps was incomplete, we recently developed a new skill outside client work. This gave us an opportunity to further our understanding of how the service works and contribute some source back to the community.

We came up with a skill which allows you to record where you have left an item and recall its location at a later time. To use the skill you might say,

“Alexa, tell So Near I’m putting my car keys in the kitchen drawer.”

When you need reminding where your car keys are you can ask,

“Alexa, ask So Near where my car keys are.”

And the skill will respond with,

“You last put your car keys in the kitchen drawer.”

Pretty sweet, right?

How Alexa Works (A 30,000 Foot View)

When developing for Alexa you deal with two main areas. There is the Skill Interface and the Skill Service. The interface receives the voice command from the caller and passes it to the service. In some ways, they’re the equivalents of the front-end (interface) and back-end (service) in regular web projects.

When someone speaks to their device the Skill Interface is Alexa’s first port of call. The Skill Interface aims to work out:

what the user wants to do, which, in Alexa’s parlance is called an intent, and

whether the user’s utterance contains any words that should be passed over to your service as they were received. (for us, those would be “keys” or “kitchen counter”); Amazon calls these slots

Configuration of the Skill Interface happens in the Alexa Control Panel. There are three key elements needed.

Intents

In our skill there are two main intents. RecordLocationIntent and FindObjectIntent . There are others, but we’ll gloss over them for now.

Slots

Each intent can have optional slots that go with it. We have one set up for the Object they are placing and one for the Location it’s placed in.

Slots are optional as some intents do not need them, e.g., “Alexa, tell me a joke”.

There are pre-built slots such as “Room” (AMAZON.Room) which will auto populate the common names of rooms. Others include dates, countries etc.

For our intents we needed more than the pre-builts supplied. To accommodate this we have to provide Custom slots. To do so we give a custom slot name and a list of possible values. Looking at a sample, our LOCATIONS slot shows,

hallway sofa storage unit park car pantry hallway cupboard coat rack office ...

You’re allowed up to 50,000 entries per slot. They should, to your best effort, be representative of real world use. For example, if you expect two word locations 20% of the time, your list should have ~20% of it as two word entries.

You provide the Intents and their Slots in the JSON format. Here’s an excerpt from ours,

{ "intents": [ { "intent": "RecordLocationIntent", "slots": [ { "name": "Location", "type": "LOCATIONS" }, { "name": "Object", "type": "OBJECTS" } ] },

Utterances

Utterances are how Intents get recognized and Slots populated. You provide utterances for each of the intents and Alexa uses them to build an interaction model.

Using the RecordLocationIntent as an example, our sample utterances might be,

RecordLocationIntent leaving my {Object} in my {Location} RecordLocationIntent I left my {Object} in the {Location} RecordLocationIntent I put my {Object} in the {Location} RecordLocationIntent I have my {Object} in the {Location} RecordLocationIntent my {Object} are in the {Location} RecordLocationIntent my {Object} is in the {Location}

Each entry names the Intent, then gives a way in which a caller might phrase the placement of an object in a location. You can see the named slots, giving Alexa some idea of where to expect the variables we need.

The list is not exhaustive. People may well phrase this intent another way, but Alexa tries to plug this potential gap for you. When you submit the Intents, Slots, and Utterances the service churns its gears for a time; it’s filling in the potential gaps it suspects you’ve left.

The Request

Once Intents, Slots, and Utterances are in place you can test (from a text based form in the control panel) what Alexa is going to submit to the Skill Service.

Using an utterance of “I’m leaving my car keys in the kitchen drawer”, the call to our Skill Service might be,

{ "session": { "sessionId": "SessionId.<sessionId>", "application": { "applicationId": "amzn1.ask.skill.<reference>" }, "attributes": {}, "user": { "userId": "amzn1.ask.account.<userId>" }, "new": true }, "request": { "type": "IntentRequest", "requestId": "EdwRequestId.<requestId>", "locale": "en-US", "timestamp": "2017-01-30T18:09:08Z", "intent": { "name": "RecordLocationIntent", "slots": { "Object": { "name": "Object", "value": "car keys" }, "Location": { "name": "Location", "value": "kitchen drawer" } } } }, "version": "1.0" }

This is the JSON we’ll need to respond to in our Skill Service. They key elements we’ll look at are the type and intent .

We’ll cover how you use these fields in Part Two.