* * *

The first step to acquiring a voicebot like this was to figure out what the people selling it might call it. Certainly they would not refer to their services as "robot telemarketing."

I started looking for the right jargon to Google. As it turns out, there are two key phrases: "interactive voice response" and "outbound." Interactive voice response refers to telephone systems that can process what you're saying and respond appropriately (even intelligently at times). Outbound call centers make calls; the inverse, inbound, refers to systems that receive calls from customers.

So, put them together and you have "Outbound IVR," which Datamonitor projected should be a half-billion-dollar market by now.

Outbound IVR, though, is not generally supposed to be used for telemarketing. It's supposed to be used to deliver automated messages and provide just a smidgen of interactivity. So, a common use case might be to call a debtor and ask them to pay a bill; the machine can then take that payment without transferring them to a human. Or automated scheduling: a doctor's office could have the voicebot call a patient to confirm an appointment.

Why isn't outbound IVR used for telemarketing?

Well, primarily because IVR is really, really hard. It is widely recognized that voice-to-text on your phone (e.g., Siri) is far from reliable. And Siri actually has much better data to work with. An IVR bot has to work with the low-quality audio that's transmitted through the public switched telephone network (PSTN). Quality, in this case, is a quantitative measure of how much data is in the audio.
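To put rough numbers on "how much data is in the audio," here is a minimal sketch comparing raw, uncompressed data rates. The figures are assumptions on my part rather than anything a source supplied: standard narrowband telephony is taken as 8,000 samples per second, 8 bits per sample, one channel, and CD audio as 44,100 samples per second, 16 bits per sample, two channels. The function name `raw_bitrate` is made up for illustration.

```python
# Rough comparison of raw (uncompressed) audio data rates.
# Assumed figures, not from the article:
#   phone line: 8,000 samples/s, 8 bits per sample, mono
#   CD audio:   44,100 samples/s, 16 bits per sample, stereo

def raw_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

phone_bps = raw_bitrate(8_000, 8, 1)
cd_bps = raw_bitrate(44_100, 16, 2)

print(f"Phone line: {phone_bps} bits per second")
print(f"CD audio:   {cd_bps} bits per second")
print(f"CD audio carries roughly {cd_bps // phone_bps} times as much raw data")
```

Under those assumptions, the recognizer on a phone call is working from a small fraction of the data a decent recording would carry, which is the gap described above.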

That's why the voice recognition on company telephone systems is a target for mockery. ("I said three. No, no, I said THREE. THREE!") And when someone is calling into a company, the company severely restricts the scenarios that the IVR bot has to work within. The bot knows what it's listening for. And it's still just OK.

Now, Samantha West actually uses a bunch of different responses as it tries to pose as a general-purpose salesperson. The queries that the editors launch at Samantha are pretty complex, and yet she comes back with an appropriate (if limited) response.

When I contacted outbound marketing companies and showed them the story with the clips, they all said they don't or can't do this sort of interactive voice response.

One source, who agreed to explain the problem on the condition that they would not be associated with this marketing bot, gave a fascinating explanation of why the telemarketing robot probably was not possible.

"Getting this to work so quickly would be very difficult to achieve automatically as the audio on PSTN calls is 8000 Hz mono. For reference that is one less channel and 120,000 less hertz than the low quality mp3s in your music collection. This is why voice recognition is so aggravating over the phone - there is very little signal upon which to perform feature recognition. Even the fastest in the business (Nuance) doesn't respond this quickly. Even provided you could do the recognition under 50 milliseconds, the answers the gentleman is giving on the call are very fuzzy. These aren't boolean 'yes' and 'no' - they are simple and complex sentences wildly divergent from the prompt of the robot. So some [natural language processing] would need to be performed on the human's response to translate what was being said then fuzzy match the result against what an appropriate response would be. Doing all that in a delay for natural conversation doesn't sound possible to me. The only product that might have a shot at it are Nuance, but even then I don't think they are fast enough."
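The "fuzzy match" step the source describes can be illustrated with a toy sketch. This is not how any real IVR product works, and it skips the hard parts (speech recognition on phone audio, and doing it all within a conversational pause). It assumes the caller's words have already been transcribed to text, and it uses Python's standard `difflib` module to pick the closest of a few canned replies. The phrases, clip names, and the `pick_reply` function are all hypothetical.

```python
import difflib

# Hypothetical mapping from an expected caller remark to a pre-recorded clip.
CANNED_REPLIES = {
    "are you a robot": "clip_identity.wav",
    "how much does it cost": "clip_pricing.wav",
    "i am not interested": "clip_goodbye.wav",
}

def pick_reply(transcript: str, cutoff: float = 0.5):
    """Return the clip whose trigger phrase best matches the transcript.

    Returns None when nothing is similar enough (similarity below cutoff).
    """
    text = transcript.lower().strip()
    matches = difflib.get_close_matches(
        text, list(CANNED_REPLIES), n=1, cutoff=cutoff
    )
    return CANNED_REPLIES[matches[0]] if matches else None

print(pick_reply("Are you a robot?"))
print(pick_reply("zzzz"))
```

Plain string similarity like this only copes with remarks that closely resemble an expected phrase. The "wildly divergent" sentences the source mentions are exactly the case it would fail on, which is why the source says natural language processing would be needed first.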

Other sources also suggested that Nuance might be the only company whose technology could do it. But when I contacted Nuance, a representative told me that they were not involved directly in the design of the software, nor did they know of anyone who was doing such a thing. (They did admit that there are people who could have gotten their hands on the software through resellers, but to their knowledge, this had not happened.)

So, if it's not a robot, what gives then? Because clearly someone is giving canned responses.

The theory I heard — and keep in mind it is just a hypothesis to explain a perplexing situation — goes like this:

Samantha West is a human being who understands English but who is responding with a soundboard of different pre-recorded messages. So a human parses the English being spoken and plays a message from Samantha West. It is IVR, but the semantic intelligence is being provided by a human. You could call it a cyborg system. Or perhaps an automaton in that 18th-century sense.

If you're reading this, you must be wondering: WHY?!?!