Not unlike J.A.R.V.I.S. from Iron Man, Josh is a voice activated personal assistant (focused on the smart home). One of the things we aspire to is a natural flow of communication between users and the system. In a perfect world, you should never have to stress about whether or not Josh understands you. However, let’s stop for a minute and think about what that actually means. What does it mean to have a truly conversational system? What does it mean to understand language?

Hollywood is all smoke and mirrors

First things first, let’s go back to the classic example of HAL 9000. In the famous scene where Dave tries to open the pod bay doors, HAL responds with “I’m sorry Dave, I’m afraid I can’t do that.” Movies have a habit of making things seem far simpler than they actually are, so let’s break this down a little bit. Forming a sentence like that should be pretty simple, right? Not exactly.

The famous scene from the 1968 film “2001: A Space Odyssey”

In this situation, HAL needs to understand two things: the concept of open, that is, to split the doors apart, and which doors Dave is referring to.

Simple is always complicated

If you’re familiar with any type of software development, you can imagine this as:

open(podbay_doors)

Where open() is a function that takes an instance of a door and opens it, probably by calling an .open() method on the door object. This means HAL knew that the word “open” had to map to the function open. Okay, cool, that’s not bad. You could maybe just look for words that are known to be actions, then use some cute reflection to call a method with the same name as the action uttered. Then HAL had to know that the pod bay doors were the thing to open. Given the context of a spaceship, the pod bay doors might seem obvious. In truth they aren’t, because there are multiple doors Dave could have meant, but let’s just say it really is that simple. Hold on though, there’s more. HAL had to respond to Dave. This is where the magic happens.
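That “cute reflection” trick can be sketched in a few lines of Python. This is purely illustrative; the `Door` class and `execute` helper are invented for the example, not part of any real assistant’s API:

```python
# A hypothetical sketch of mapping a spoken action word to code via
# reflection: the action word becomes a method name looked up on the
# target object. All names here are illustrative, not a real API.

class Door:
    def __init__(self, name):
        self.name = name
        self.is_open = False

    def open(self):
        self.is_open = True
        return f"{self.name} opened"

    def close(self):
        self.is_open = False
        return f"{self.name} closed"

def execute(action, target):
    """Call the method on `target` whose name matches the action word."""
    method = getattr(target, action, None)
    if callable(method):
        return method()
    # No matching method: the system genuinely can't do that.
    return f"I'm sorry, I'm afraid I can't {action} that."

pod_bay_doors = Door("pod bay doors")
print(execute("open", pod_bay_doors))      # the happy path
print(execute("jettison", pod_bay_doors))  # an unknown action
```

Of course, this only works when the action word happens to match a method name exactly; real language understanding has no such luck.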

Firstly, we humans are notoriously bad at forming grammatically correct sentences. We drop words, we assume context, we backtrack. Building a coherent sentence is no easy task, but HAL doesn’t just stop at that. HAL starts by saying “I’m sorry, Dave.” HAL actually apologizes ahead of time, because HAL knows that Dave won’t like what follows. This means that HAL had to understand the emotions Dave would go through after hearing a sentence HAL hasn’t even said yet. Not only is that impressive, but HAL was actually polite enough to think to apologize ahead of time.

Then he delivers the rest of the famous quote: “I’m afraid I can’t do that.” Again, HAL doesn’t just respond with a simple “no.” HAL makes it explicit that he’s “afraid.” I mean, he’s completely mocking Dave, but he knows to hide his mockery in a polite phrase by saying “I’m afraid.” Lastly, HAL actually has the gall to say he “can’t” do that. Not that he won’t do it, but that he can’t. Most of us use “can’t” improperly when we really mean “won’t.” Being an advanced intelligence, HAL knows that “won’t” would be the proper word. He chooses “can’t” as if someone else were withholding permission, but at this point Dave knows, you know, I know, we all know HAL is the one in charge. It’s a double mock of Dave, and HAL is just being snide.

For many people, HAL 9000 was their first real exposure to the idea of an artificial intelligence. Even today, it’s still a fantastic example of one. There are more recent ones, like J.A.R.V.I.S. in the modern Iron Man movies. There’s also the female AI from the movie Her, named Samantha. Maybe one of my favorite AIs is GLaDOS from the Portal franchise of video games. All of these systems show that they can sympathize, which is just a mind-blowing concept.

GLaDOS from Portal

More than just words

So, what does it mean to understand language? Well, simply put, it means that you can understand the relationship between words and the objects they refer to. Then the next question is, what does it take to understand these relationships? It requires what we call “world knowledge”. The words only represent concepts, but a system that can understand language needs to understand what these concepts mean.

A system developed by Google called Word2Vec represents words as vectors, which lets you do math with them to fairly good success. For example:

king - man + woman = ?

Well, a king can be considered a male in a seat of power. What do we call a female in a seat of power? Word2Vec will tell you the answer is “queen”, and in my opinion that’s a pretty good answer.
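The idea behind that analogy can be sketched with a toy version of the arithmetic. The two-dimensional vectors below are hand-made for the demonstration (loosely “royalty” and “gender” axes); real Word2Vec embeddings are learned from huge text corpora and have hundreds of dimensions:

```python
# Toy illustration of Word2Vec-style vector arithmetic. The vectors are
# invented for this example; real embeddings are learned, not hand-made.
import math

vectors = {
    "king":   [1.0,  1.0],
    "queen":  [1.0, -1.0],
    "prince": [0.9,  1.0],
    "man":    [0.0,  1.0],
    "woman":  [0.0, -1.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Solve a - b + c = ?, returning the nearest word (excluding the inputs)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("king", "man", "woman"))  # -> queen
```

Subtracting “man” strips the gender component from “king”, adding “woman” puts the opposite one back, and the nearest remaining vector is “queen”. That is essentially what Word2Vec does, just in a far richer space.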

Obtaining world knowledge means being able to identify and segment the information we receive so that we can properly bin it into categories in our brains. There are many ways for us to obtain this world knowledge. We take in data through all of our senses, and computers don’t have access to that type of data the same way we do.

A great quote on the subject comes from one of the world leaders in deep learning, Yann LeCun. In an AMA on Reddit, he was asked the following question:

“How would you rank the real challenges/bottlenecks in engineering an intelligent ‘OS’ like the one demonstrated in the movie ‘Her’ … given current challenges in audio processing, NLP, cognitive computing, machine learning, transfer learning, conversational AI, affective computing .. etc. (i don’t even know if the bottlenecks are in these fields or something else completely). What are your thoughts?”

His answer in full can be found here, but here is a condensed version:

Yann LeCun, Director of Research at Facebook