How does Amazon’s Alexa assistant field complex commands like “Alexa, add peanut butter and milk to the shopping list and play music”? With a few sophisticated algorithmic techniques, as it turns out. In a newly published paper (“Practical Semantic Parsing for Spoken Language Understanding”) and accompanying blog post, scientists at Amazon’s Alexa AI research division detail an AI system capable of extracting both the structure and meaning of a sentence, even when that meaning and structure are complex or somewhat ambiguous.

As paper coauthor Rahul Goel explains, the model’s design was informed by two machine learning techniques: transfer learning, which transfers knowledge from an existing AI system to reduce the amount of data required to train a new model, and a copying mechanism, which enables models to deal with data they haven’t seen before.
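The copying idea can be illustrated with a pointer-generator-style mixture (a simplified sketch for intuition, not Amazon's implementation): at each decoding step the model blends a distribution over a fixed output lexicon with an attention distribution over the input tokens, so a word the model has never seen can still be emitted by "pointing" at the input.

```python
# Simplified pointer-generator-style copy mechanism (illustrative only;
# in a real model the gate p_gen and the attention weights come from a
# trained neural network rather than being supplied by hand).

def copy_mixture(vocab_probs, attention, input_tokens, p_gen):
    """Blend a lexicon distribution with a copy distribution.

    vocab_probs:  {word: prob} over a fixed output lexicon
    attention:    per-input-token attention weights (sum to 1)
    input_tokens: the source words the model may copy
    p_gen:        probability of generating from the lexicon (vs. copying)
    """
    mixed = {w: p_gen * p for w, p in vocab_probs.items()}
    for tok, weight in zip(input_tokens, attention):
        mixed[tok] = mixed.get(tok, 0.0) + (1 - p_gen) * weight
    return mixed

# "peanut" is not in the lexicon, but copying still makes it available:
lexicon = {"addToListIntent": 0.7, "ItemName": 0.3}
tokens = ["add", "peanut", "butter"]
attn = [0.1, 0.5, 0.4]
probs = copy_mixture(lexicon, attn, tokens, p_gen=0.6)
# probs["peanut"] == (1 - 0.6) * 0.5 == 0.2
```

Because both input distributions sum to one, the blended output is itself a valid probability distribution over the union of the lexicon and the input words.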

Traditionally, Alexa parses requests by their intents (e.g., PlayMusic) and slots (e.g., SongName and ArtistName, with values such as Marvin Gaye’s “What’s Going On?”). But this approach necessitates a lot of error-prone manual annotation. For instance, the request “Add apples and oranges to shopping list and play music” consists of two main clauses (“add apples and oranges to shopping list” and “play music”) joined by the conjunction “and,” which is encoded in a data set as “(and(addToListIntent(add(ItemName(Apples))(ItemName(Oranges))))(PlayMusicIntent(Mediatype(Music)))).”
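To make the nested encoding above concrete, here is a small recursive-descent reader that turns such a parenthesized string into a tree of (label, children) tuples. This is a sketch for illustration; the exact data format Amazon uses may differ.

```python
# Recursive-descent reader for the "(label(child)(child)...)" encoding
# shown above (illustrative; the actual Alexa data format may differ).

def parse(s, i=0):
    """Parse s starting at index i; return ((label, [children]), next_index)."""
    assert s[i] == "(", "each node must open with '('"
    i += 1
    label_end = i
    while s[label_end] not in "()":
        label_end += 1
    label = s[i:label_end]
    i = label_end
    children = []
    while s[i] == "(":          # zero or more child subtrees
        child, i = parse(s, i)
        children.append(child)
    assert s[i] == ")", "each node must close with ')'"
    return (label, children), i + 1

encoded = ("(and(addToListIntent(add(ItemName(Apples))(ItemName(Oranges))))"
           "(PlayMusicIntent(Mediatype(Music))))")
tree, _ = parse(encoded)
# The root is "and" with two children: addToListIntent and PlayMusicIntent.
```

The top-level “and” node mirrors the conjunction joining the two clauses, which is exactly why flat intent/slot labels alone struggle with such requests.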

Image Credit: Amazon

The researchers chose instead to automatically convert data labeled according to its intents and slots into parse trees, tree structures that depict requests’ grammatical structure. The team’s semantic parser constructed trees through a series of shift and reduce operations, where a “shift” advanced to the next word in the input and a “reduce” assigned a word its final position in the tree. All the while, an attention mechanism tracked the data examined by the parser and determined whether to generate words from a lexicon or copy words over from the input stream.
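The shift-reduce idea can be sketched with a stack: “shift” pushes the next input word, and “reduce” pops the top items and wraps them in a labeled subtree. In the real parser a neural model predicts the action sequence; here it is hard-coded as a hypothetical oracle for a toy request.

```python
# Toy shift-reduce tree builder (illustrative; the real parser predicts the
# action sequence with a neural model rather than following a fixed script).

def run_actions(tokens, actions):
    """SHIFT pushes the next input token onto the stack; ("REDUCE", label, n)
    pops the top n items and replaces them with a labeled subtree."""
    stack, i = [], 0
    for act in actions:
        if act == "SHIFT":
            stack.append(tokens[i])
            i += 1
        else:
            _, label, n = act
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
    return stack

tokens = ["play", "music"]
actions = ["SHIFT", "SHIFT",
           ("REDUCE", "Mediatype", 1),        # wrap "music" in a slot node
           ("REDUCE", "PlayMusicIntent", 2)]  # wrap everything in the intent
(tree,) = run_actions(tokens, actions)
# tree == ("PlayMusicIntent", ["play", ("Mediatype", ["music"])])
```

When the parser instead needs an out-of-vocabulary word at a shift step, the copy mechanism described above lets it take the word directly from the input rather than from the lexicon.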

In tests with natural-language understanding (NLU) data from Alexa interactions, the copy mechanism alone increased accuracy by an average of 61%, the researchers report, while transfer learning further improved it by 6.4%. And in a separate set of question-answering tests that drew on two public data sets (with questions like “What restaurant can you eat outside at?” or “How many steals did Kobe Bryant have in 2004?”), transfer learning boosted performance by 10.8%.

“The fact that our semantic parser improves performance on both natural-language-understanding and question-answering tasks indicates its promise as a general-purpose technique for representing meaning, which could have other applications, as well,” wrote Goel.

The work is scheduled to be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL) in June.