“I saw the man with the binoculars.”

Is this sentence describing a man seen through binoculars or a man holding binoculars?

“Police help dog bite victim.”

Are the police assisting the victim or the dog?

Language is complex. When a message is expressed in natural language, spoken or written, each word counts. Often the message is clear. Other times, the message is ambiguous or dependent on context and subtext. Occasionally, the meaning is lost completely.

The ambiguity of language is especially apparent in public polling. Political pollster Frank Luntz even wrote a book about it called “It’s Not What You Say, It’s What People Hear.” Luntz found that small word choices dramatically change respondents’ understanding of questions and the interpretation of results.

Recently, we posted a simple political survey example to our hosted scripts interface. The script calls a list of phone numbers and conducts a simple election poll. It’s an example that we hope invites users to experience natural language understanding for themselves.

In the example, the script calls a potential respondent and requests permission to continue with an automated survey. Below you can see the snippet used to ask a question and gather a response.

results['party'] = conn.getMultipleChoice(
  ['republican', 'democrat', 'independent',
   'no party', 'libertarian', 'green'],
  {
    promptUrl: CDN + 'survey_party.wav'
  }
);

The getMultipleChoice method plays the question from an audio file (in this case, a question about political party affiliation), and then restricts expected responses to the six choices listed.

When restricted to a set of valid responses, the transcriber is instructed to look out for (or ‘boost’) those words and phrases to reduce error in transcription. However, that is only helpful if the caller responds with one of those six exact choices.
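To make that limitation concrete, here is a minimal sketch in plain JavaScript. The matchChoice helper is hypothetical and is not part of the hosted scripts API. It assumes the simplest possible rule: a transcript counts only if it equals one of the listed choices after trimming and lowercasing.

```javascript
// Hypothetical helper for illustration only; not part of the scripts API.
// Returns the matching choice, or null when the transcript is off script.
function matchChoice(transcript, choices) {
  // Normalize case and surrounding whitespace before comparing.
  var said = transcript.trim().toLowerCase();
  return choices.indexOf(said) !== -1 ? said : null;
}

var choices = ['republican', 'democrat', 'independent',
               'no party', 'libertarian', 'green'];

console.log(matchChoice('Democrat', choices));                           // democrat
console.log(matchChoice('well i have always been a democrat', choices)); // null
```

The second caller clearly named a party, yet a strict comparison against the list discards the answer.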

But what happens if the caller goes off script? Consider again the challenges natural language presents: First, by limiting the set of valid responses, we may be unwittingly biasing results. Second, there may be multiple valid ways to express the same intended answer. What should we do?

One way to address this problem is to simply transcribe whatever the user says. We replace getMultipleChoice with getFreeResponse.

results['party'] = conn.getFreeResponse({
  promptUrl: CDN + 'survey_party.wav'
});

Now, if the respondent says “Constitution Party” or “don’t have one” or “I didn’t hear the question the washing machine is loud” we capture the response. Sample responses from the raw transcript look like this:

“oh i don’t know if i have a party”

“well i have always been a democrat”

“i voted republican in the last election”

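Capturing everything does not tell us what each respondent meant, though. Someone still has to read and categorize the answers. If the poll has a large sample size (thousands or tens of thousands of respondents), we’d just be pushing our interpretation problem further downstream.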

Instead, Sift offers a solution called scanners. The Sift scanner is designed to interpret spoken language and extract information from it in a way that is both natural and expressive.

When you ask the scanner to search for ‘independent’, it searches for the exact word. However, let’s say you want to look for something that means independent. The scanner allows you to express this by adding tildes: ~‘independent’. We can also group together options using expressions like and, or, and then.

Using the method getScannedResponse, we now can provide a list of fuzzy options, with room for rephrasing.

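var partyScan = "~~~'{democrat:party}' or ~~~'{independent:party}'";
var party = null;

// Keep asking until the scanner extracts a party from the response.
while (!party) {
  var result = conn.getScannedResponse(partyScan, {
    promptUrl: CDN + "survey_party.wav",
    timeoutSeconds: 30
  });
  // Guard against responses where nothing was extracted.
  if (result && result.extractions && result.extractions['party']) {
    party = result.extractions['party'].value;
  }
}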

Now, we are using the scanner to look for responses like ‘democrat’ or like ‘independent’. By adding three tildes (the maximum), we’re telling Sift to look for anything that carries a remotely similar meaning. The curly braces say to extract the result, and the ‘:party’ tag says to store it in a result called party.
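The anatomy of one of these query terms can be sketched with a small toy parser. This is not Sift’s parser, and the real scanner grammar may well differ. The describeTerm function and its regular expression are assumptions built only from the description here: leading tildes set the fuzziness, a quoted phrase is the sample to match, and braces with a ‘:tag’ mark an extraction.

```javascript
// Toy parser for illustration only; the real scanner grammar may differ.
function describeTerm(term) {
  // Leading tildes, then a quoted phrase, optionally wrapped as {phrase:tag}.
  var m = /^(~*)'(?:\{([^:}]+):([^}]+)\}|([^']+))'$/.exec(term.trim());
  if (!m) return null;
  return {
    fuzziness: m[1].length, // number of tildes in front of the quote
    phrase: m[2] || m[4],   // sample text to match
    tag: m[3] || null       // extraction name, if braces are present
  };
}

var partyScan = "~~~'{democrat:party}' or ~~~'{independent:party}'";

// Split on the "or" operator and describe each term on its own.
var terms = partyScan.split(/\s+or\s+/).map(describeTerm);
console.log(terms);
```

Each term in the party query comes back with a fuzziness of 3, its sample phrase, and the tag party. A plain term such as 'independent' would have a fuzziness of 0 and no tag.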

When the scanner extracts from speech, both the query and the speech are converted into specialized mathematical structures that encode meaning and allow for rapid comparison.

The scanner query slides over the transcript, looking for a good match.
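The sliding comparison can be illustrated with a toy model. The scanner’s actual internal representation is not described here, so everything below is an assumption: the two-dimensional word vectors are invented by hand, and averaging plus cosine similarity stands in for whatever structures and comparison Sift really uses. The sketch only shows the general shape of the idea, which is to turn the query and each window of the transcript into vectors and keep the window that scores best.

```javascript
// Toy word vectors, invented for this example; a real system would learn them.
var VECTORS = {
  democrat:   [1.0, 0.0],
  democratic: [0.9, 0.1],
  republican: [0.0, 1.0],
  gop:        [0.1, 0.9]
};

// Average the vectors of the known words; null if no word is known.
function embed(words) {
  var sum = [0, 0], n = 0;
  words.forEach(function (w) {
    var v = VECTORS[w];
    if (v) { sum[0] += v[0]; sum[1] += v[1]; n++; }
  });
  return n ? [sum[0] / n, sum[1] / n] : null;
}

// Cosine similarity between two 2-D vectors.
function cosine(a, b) {
  var dot = a[0] * b[0] + a[1] * b[1];
  var norm = Math.sqrt(a[0] * a[0] + a[1] * a[1]) *
             Math.sqrt(b[0] * b[0] + b[1] * b[1]);
  return norm ? dot / norm : 0;
}

// Slide a query-sized window across the transcript and keep the best score.
function bestMatch(query, transcript) {
  var q = query.split(/\s+/), t = transcript.split(/\s+/);
  var qv = embed(q), best = { score: 0, text: null };
  if (!qv) return best;
  for (var i = 0; i + q.length <= t.length; i++) {
    var win = t.slice(i, i + q.length);
    var wv = embed(win);
    var score = wv ? cosine(qv, wv) : 0;
    if (score > best.score) best = { score: score, text: win.join(' ') };
  }
  return best;
}

console.log(bestMatch('democrat', 'i usually vote for the democratic candidate'));
```

With these made-up vectors, the query ‘democrat’ finds its best match at the word ‘democratic’, even though the two strings are not identical. A transcript with no known words returns no match at all.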

This technique combines speed with linguistic flexibility. But most importantly, it’s often easier to use the scanner with a query that contains a sample match rather than trying to exhaustively think of ways that a person might respond to a prompt.