Using Spacy to build Conversational Interfaces for Alexa

Over the last year I’ve been working on an Alexa Skill to automate some of the devices in my home. Working with the Amazon APIs and dashboard for the provider-side parts of the setup has been good. However, there’s an assumption built into the tools for Alexa and Google Home that the developer will use tools like Dialogflow or the skills console to create the conversational interfaces. The reason for this assumption is that creating conversational interfaces has traditionally been non-trivial.

Take the simple case of an app which tells the user the time. The user may say “what time is it?”. If you were to write a regex or other matcher for this string, many other variations like “what’s the time” would carry the same intent but would not match. Techniques that are not normally part of the developer’s toolbox - like machine learning - have to be employed to determine the user’s intent across the many variations of an utterance.
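To see why plain pattern matching falls short, here is a toy regex matcher (the pattern is my own illustration, not from the example skill):

```python
import re

# A naive matcher for the "what time is it" intent.
pattern = re.compile(r"^what time is it\??$")

print(bool(pattern.match("what time is it")))   # -> True
print(bool(pattern.match("what's the time")))   # -> False
print(bool(pattern.match("tell me the time")))  # -> False
```

The second and third utterances mean the same thing as the first, but the matcher rejects them; enumerating every phrasing by hand quickly becomes unmanageable.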

In this day and age, can a developer use off-the-shelf tools to handle these variations in utterances directly, rather than depend on a service like Dialogflow? The answer is yes. Fantastic NLP libraries such as Spacy can be used to perform this function. A developer can use Spacy to handle the natural language tasks and retain better control over language processing than they would with a third party.

We’ll install Spacy below and demonstrate how to use it to handle the natural language part of a conversational interface, using the raw user utterances coming directly from the Alexa service. We will do this with a simple skill which can tell the user the current time in any major city around the world.

The environment

This post assumes you’ve worked with Python and flask-ask before. The example code is at https://github.com/gregretkowski/alexa-spacy. My dev environment is an Ubuntu Xenial host, and I use ngrok as a proxy during Alexa skills development.

Installing Spacy

Spacy is easily installed; you’ll need both the Spacy package and a pretrained model for English. We also install numpy to do some math when categorizing utterances.

pip install flask-ask
pip install spacy
wget https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
pip install file://`pwd`/en_core_web_sm-2.0.0.tar.gz#en_core_web_sm
pip install numpy

Skill setup in the console (for passing through raw text)

Amazon has a data type called AMAZON.LITERAL which gives you the raw text of the user’s utterance. We can use this type to pass the full raw string to our application and do all of the utterance processing ourselves. To do this, in the Alexa Skills console under the new Interaction Model interface, we set up our intent:

"intents": [ ... { "name": "Spacy", "samples": [ "{how are you|rawtext}", "{help|rawtext}", "{what time is it|rawtext}" ], "slots": [ { "name": "rawtext", "type": "AMAZON.LITERAL" } ] }

Classifying the utterance

Our application will have three main routes a user can take. First, the user can ask for help and get back some guidance. Second, the user can ask something like ‘how do you feel’. Finally, the user can ask what time it is, optionally naming a city.

We will use Spacy to determine whether the utterance is a member of one of these three CLASSES (or of none of them), and use that information to decide which route the user is taking through the application.

First we set up a dictionary of CLASSES with example utterances:

CLASSES = {
    "help": ["help", "i am lost", "help me"],
    "time": ["what time is it", "what time is it in london",
             "when is it now", "what is the time"],
    "mood": ["how are you doing", "what mood are you in",
             "what do you think"]
}

# If Spacy is less confident than 75%, it's an unknown utterance.
THRESHOLD = 0.75

Next we write the classification function. It takes the user’s utterance and has Spacy calculate its semantic similarity to each of the example utterances in our CLASSES. If it matches one of the CLASSES with high confidence we return the class name; if it doesn’t match any of them we return ‘unknown’.

import numpy as np
import en_core_web_sm

nlp = en_core_web_sm.load()

def nlp_classify(voice_string):
    # Convert the utterance to a Spacy 'doc'
    utter = nlp(unicode(voice_string))
    scores = []
    cats = CLASSES
    cats_keys = cats.keys()
    # Iterate through each of the sample utterances...
    for idx, key in enumerate(cats_keys):
        for i in cats[key]:
            # Spacy calculates the semantic similarity between
            # the user's utterance and the example utterance -
            # and gives a similarity score
            sample = nlp(unicode(i))
            sim_score = utter.similarity(sample)
            scores.append([idx, sim_score])
    # Find the example utterance with the highest similarity
    # score, and determine its CLASS
    a = np.array(scores)
    my_cat = cats_keys[int(a[np.argmax(a[:, 1])][0])]
    my_val = np.max(a[:, 1])
    # If the similarity score was below THRESHOLD, the user
    # uttered something that was not close to any of our
    # example utterances, so treat it as an 'unknown' utterance.
    if my_val < THRESHOLD:
        my_cat = 'unknown'
    return my_cat, my_val
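The argmax-and-threshold step can be illustrated without Spacy or numpy. With some hypothetical similarity scores (the numbers below are made up for the example), picking the class of the single best-scoring example utterance looks like this:

```python
# Hypothetical [class_index, score] pairs, shaped like the
# 'scores' list built by nlp_classify above.
scores = [[0, 0.62], [0, 0.58], [1, 0.91], [1, 0.87], [2, 0.44]]
cats_keys = ['help', 'time', 'mood']
THRESHOLD = 0.75

# Take the class of the best-scoring example; fall back to
# 'unknown' when even the best score is below the threshold.
best = max(scores, key=lambda pair: pair[1])
my_cat = cats_keys[best[0]] if best[1] >= THRESHOLD else 'unknown'
print(my_cat, best[1])  # -> time 0.91
```

Note that only the single best match matters here; averaging the scores per class would be a different (and sometimes more robust) design choice.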

And then in our main routing code - we get the label/class, and take the appropriate action based on that class:

@ask.launch
@ask.intent('Spacy', default={'rawtext': ''})
def mainroute(rawtext=None):
    nlp_label, nlp_score = nlp_classify(rawtext)
    if nlp_label == "mood":
        r_st = "Good afternoon. Everything is going extremely well."
        return statement(r_st).simple_card('My Mood', r_st)
    elif nlp_label == 'time':
        return get_time(rawtext)
    else:
        # 'help' and 'unknown' both get the help response
        return r_help()
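As the number of classes grows, the if/elif chain can be swapped for a dictionary that maps labels to handler functions. A sketch of that design, with stand-in handlers in place of the skill's real flask-ask responses:

```python
# Stand-in handlers; in the skill these would return
# flask-ask statement() responses.
def r_help():
    return "Here is some help."

def r_mood():
    return "Everything is going extremely well."

def get_time(rawtext):
    return "The time is 3 45 PM"

# Map each class label to its handler; every handler takes
# the raw utterance so the table stays uniform.
ROUTES = {
    "mood": lambda rawtext: r_mood(),
    "time": get_time,
    "help": lambda rawtext: r_help(),
}

def route(nlp_label, rawtext):
    # Unknown labels fall back to the help handler.
    handler = ROUTES.get(nlp_label, lambda rawtext: r_help())
    return handler(rawtext)

print(route("mood", "how are you"))
```

This keeps adding a new class down to one new entry in CLASSES and one new entry in ROUTES.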

Using Spacy to identify Entities

Spacy can also detect and identify entities (generally, proper nouns). We can use this feature to pull information from the user’s utterance; for example, we can give the user the current time in a particular city instead of just the local time. To do that we have Spacy identify any entities. If an entity such as a city is present, we geolocate the city, get its timezone, and then return the local time in that city.

# GETTING A TIME AT A CITY
from datetime import datetime
from geopy import geocoders
from tzwhere import tzwhere
from pytz import timezone

tz = tzwhere.tzwhere()
TIME_FORMAT = '%-I %M %p'

def get_time(voice_string):
    # We use 'title' to capitalize all the words; entities,
    # being proper nouns, are capitalized.
    doc = nlp(voice_string.title())
    print "DOC ENTS"
    print len(doc.ents)
    print "STRING"
    print voice_string
    # If an entity is detected, assume it's a specific city!
    if len(doc.ents) > 0:
        # Get the city name, geocode it with Google, find the
        # timezone from the lat/lon, then get the time in that zone
        city = doc.ents[0]
        g = geocoders.GoogleV3()
        place, (lat, lng) = g.geocode(city)
        tz_str = tz.tzNameAt(lat, lng)
        c_tz = timezone(tz_str)
        now_time = datetime.now(c_tz)
        my_loc = "in %s" % city
    # Otherwise, it's just the 'local' time
    else:
        now_time = datetime.now()
        my_loc = ""
    nice_time = datetime.strftime(now_time, TIME_FORMAT)
    response = "The time %s is %s" % (my_loc, nice_time)
    return statement(response).simple_card('The Time', response)
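On Python 3.9+ the standard-library zoneinfo module can stand in for pytz once you have a zone name. A minimal sketch of the formatting step with a fixed datetime so the output is deterministic (note that %-I is a glibc extension, so this format string works on Linux and macOS but not Windows):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

TIME_FORMAT = '%-I %M %p'

# A fixed, timezone-aware datetime; in the skill you would use
# datetime.now(ZoneInfo(tz_str)) with the looked-up zone name.
london_time = datetime(2020, 1, 1, 15, 45, tzinfo=ZoneInfo("Europe/London"))
print(london_time.strftime(TIME_FORMAT))  # -> 3 45 PM
```

The space between hour and minutes (rather than a colon) is deliberate in the skill: the text is spoken aloud by Alexa, so "3 45 PM" reads naturally.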

Summary

This just scratches the surface of Spacy’s capabilities for processing utterances for digital assistants such as Alexa and Google Home. You no longer have to depend on third parties to create the conversational interfaces for your Alexa skills!