This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding artificial intelligence.

“Alexa, are you ready to have a body?”

Steady advances in artificial intelligence and natural language processing have made digital assistants such as Amazon’s Alexa increasingly capable of performing complicated voice commands under different circumstances.

But does that mean our digital assistants are ready to escape the confines of smartphones, smart speakers and computers (and a bunch of weird gadgets)?

“The only way to make smart assistants really smart is to give it eyes and let it explore the world,” Rohit Prasad, head scientist of the Alexa artificial intelligence group at Amazon, recently said at the MIT Technology Review’s EmTech Digital conference.

Although Prasad didn’t explicitly say what it means to “give [Alexa] eyes and let it explore the world,” the statement strongly hints at an Alexa-powered robot (at least that’s how MIT Tech Review has interpreted his words). While the idea of putting a face on the voices of Alexa, Siri and Cortana sounds appealing, the truth is that with today’s AI technology, such an idea is doomed to fail.

The failure of robot projects

Jibo, the “first social robot for the home,” recently shut down. Mayfield Robotics, the manufacturer of the Kuri home robot, shut down in August. In October, Boston-based Rethink Robotics had to close shop because it couldn’t find a working business model for its famous Baxter and Sawyer robots.

Boston Dynamics, the company that became famous for the YouTube videos of its robots performing incredible feats, rarely shows the human operators who control and guide those robots. Google acquired Boston Dynamics in 2013, but then sold it to Japanese tech giant SoftBank in 2017 because the robotics company didn’t fit into Google’s strategy. Boston Dynamics is still struggling to find real-world problems to solve with its robots.

These are just a few of a string of failed robot projects, with probably more to come. To be clear, Alexa is backed by one of the largest and richest tech companies in the world. Amazon sits on a wealth of data, money and experience in creating tech products. But will Amazon’s virtually limitless resources be enough to overcome the challenges of creating an Alexa-backed robot?

The navigation challenges of robots

Teaching robots to navigate open environments is very difficult, even when equipped with the most advanced AI technologies. Any number of things can happen, and unless the AI powering the robot has an abstract and high-level knowledge of the world, it won’t be able to carry out its tasks without the help of humans.

That is exactly what contemporary AI lacks.

Robots and self-driving cars use computer vision to analyze their surroundings and navigate the world. Computer vision is the science that tries to replicate the workings of the human vision system and helps software make sense of the content of images and video.

At the moment, the most popular AI technique used in computer vision is deep learning. Deep learning algorithms ingest a huge number of examples to develop their behavior. For instance, a deep learning model that wants to help a robot navigate homes will have to see videos and pictures of different room types, different decorations, furniture, tables, carpets… to know how to find its way around different obstacles.

Even when trained with millions of samples, a deep learning model will not have a general understanding of what a room is, why there’s a table in the kitchen, why there are chairs around tables, etc. It will just have a statistical knowledge of the type of images it should see around a house, which ones it can go over, which ones it needs to avoid, and so on.

If the robot faces a new setting, or a new object or a new color composition it has never seen before, its AI will not know what to do and will act in an erratic manner. A short-term fix is to just throw more data at the problem and continue to train the AI models with all sorts of new kinds of samples.

Amazon sits on a vast sea of data that might be able to help train the Alexa robot’s AI algorithms. It can also tap into the vast resources of its Mechanical Turk platform to crowdsource some of the training work. But that will not solve the problem of giving AI a general understanding of the world, objects and relations between them.

Without that general understanding, even the most sophisticated AI models run into “edge cases,” scenarios that the AI has not been trained for. This is why it’s so hard to design robots and self-driving cars that can navigate open environments.

Some companies use complementary technologies such as sensors, radars and lidars to enable robots to map their surroundings. These hardware additions reduce error rates (and raise the costs). But even a perfect 3D map of the surroundings can lead to errors if the AI doesn’t have a logical understanding of its environment.

Alexa will face an even bigger problem if it wants to handle objects as well as navigate environments. Robots have historically been bad at handling objects, except in tightly controlled environments. In recent years, companies have used advanced AI techniques such as reinforcement learning to train robot hands to carry out different tasks by themselves. But such methods require massive amounts of data and compute resources (again something that Amazon has in abundance) and have yet to fulfill real-world use cases.

The challenges of interacting with AI assistants

Now let’s say Amazon manages to create an Alexa robot that can “explore the world” and has an AI that can navigate different environments with acceptable accuracy most of the time, and only makes stupid mistakes every now and then.

The next question will be, what should this robot do?

Right now, Alexa has tens of thousands of skills, but most of them are simple tasks such as playing music, answering queries, and interacting with smart home devices. These are the kind of things you could expect from an inanimate object sitting on your table.

But our expectations will certainly change when Alexa escapes the shell of the Echo smart speaker and finds its own body. We will expect our AI assistants to manifest human-like behavior and intelligence. We will expect them to have many of the cognitive skills that we take for granted.

To be clear, AI assistants are already struggling to perform tasks that require multiple steps. Some of those problems are due to the limits of a voice-only interface. For example, smart speakers are very limited in helping users browse and choose between different options. They’re also not very good at going back and forth between multiple steps. That’s why tasks like playing music and setting timers remain the more popular use cases for smart speakers.

But the bigger problem for digital assistants is the limits of contemporary AI in understanding and processing human language. Advances in deep learning and neural networks have created breakthroughs in automated speech recognition and natural language processing. AI is now better than ever at transforming speech to text and mapping text to commands.

But AI is still struggling to understand the context and meaning of words. At the heart of the most complicated language processing AI algorithms is still statistics. Your smart speaker will be able to respond to different variations of “What is the weather tomorrow?” “How’s the weather on Monday?” and “Will it rain next week?” But that is only because it has seen thousands of similar sentences and the corresponding function they must perform. It has no understanding of the concepts of weather, rain and weekday.

That’s why if you suddenly become distracted in the middle of a voice command to your AI assistant and say, “Alexa, how’s the weather on… umm… let me see… Monday—no wait, Tuesday?” your smart speaker will not be able to respond. But for a human, it would be a no-brainer.
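The statistical matching described above can be illustrated with a deliberately tiny sketch. It is not how Alexa or any commercial assistant is actually built; the training sentences, the intent names (`get_weather`, `play_music`, `set_timer`) and the word-count scoring are all assumptions made up for illustration. The point is only that the program maps sentences to functions by word overlap, with no notion of what weather or rain is.

```python
import math
from collections import Counter

# Hypothetical training data: example sentences paired with the
# function ("intent") they should trigger. All names are illustrative.
TRAINING = [
    ("what is the weather tomorrow", "get_weather"),
    ("how's the weather on monday", "get_weather"),
    ("will it rain next week", "get_weather"),
    ("play some jazz music", "play_music"),
    ("play my workout playlist", "play_music"),
    ("set a timer for ten minutes", "set_timer"),
    ("start a five minute timer", "set_timer"),
]

def tokenize(text):
    """Lowercase, drop question marks, and split on whitespace."""
    return text.lower().replace("?", "").split()

def train(examples):
    """Count how often each word appears under each intent."""
    counts = {}
    for text, intent in examples:
        counts.setdefault(intent, Counter()).update(tokenize(text))
    return counts

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(model, text):
    """Return the intent whose word counts best match the text,
    or None when the text shares no words with any intent."""
    query = Counter(tokenize(text))
    best_intent, best_score = None, 0.0
    for intent, counts in model.items():
        score = cosine(query, counts)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

model = train(TRAINING)
print(classify(model, "What's the weather on Tuesday?"))  # get_weather
print(classify(model, "Do I need an umbrella?"))          # None
```

The first query works because it shares words with sentences the model has already seen. The second asks about the same thing, but since none of its words appear in the training data, the model has nothing to match. A person would connect umbrellas to rain without a second thought.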

Give Alexa a body, limbs and eyes to “experience” the world, and maybe it’ll be able to remove some of the confusion from the user experience. But the language understanding problem will not go away. Meanwhile, we have a tendency to anthropomorphize anything that even remotely behaves or looks like a human. That means our expectations of AI assistants will only increase when they enter their robot shells, especially since we’ll be forking over a larger sum to purchase them.

But what’s clear is that there’s a stark difference between AI and human intelligence, and no matter how human-like Alexa becomes, it will not be able to fulfill our expectations.

What’s the optimal use for AI assistants?

Maybe someday, scientists will be able to crack the code of artificial general intelligence (AGI), the kind of AI that will be able to think like humans, without requiring huge amounts of examples and a ton of computing power (not everyone is a fan of AGI). Deep learning, machine learning and other AI technologies we currently have are considered narrow artificial intelligence, which means they can perform one specific task very well, but aren’t very good at general problem-solving or carrying their knowledge to other domains.

Until humankind manages to create general AI (if that time ever comes), we’ll have to find ways to put our digital assistants to efficient use. And key to that will be to recognize the limits of artificial intelligence and focus on putting narrow AI to good use.

What does this mean for digital assistants like Alexa, Siri and Cortana? Here are two scenarios that work best with current AI technology.

The narrow AI approach

An Alexa robot would test the limits of AI. It would present itself as a single AI-powered device that can perform thousands of tasks, yet the owner of the robot would have no way of knowing what the device can and can’t do. That leaves a lot of room for confusion and errors.

The narrow AI approach is to have multiple Alexa-powered devices that can perform specific tasks. This is something that Amazon has already tested successfully. When speaking to a light bulb or a microwave oven, you have a pretty clear idea of what you can and can’t say to it. The idea would be to see more AI-powered gadgets in homes, offices and cars.

Instead of a physically present robot, Alexa would be an omnipresent AI assistant that would be incorporated into all of your devices and would be able to take and execute commands for each specific device.

From a functional standpoint, this approach would work within the boundaries of current AI technology. But it isn’t a perfect solution. At the very least, AI-powered devices would entail privacy concerns, especially since tech giants don’t have a brilliant record when it comes to making responsible use of customer data.

The augmented intelligence approach

An alternative way to think about AI, which has become popular in the past few years, is to consider it a complement to, and not a replacement for, human intelligence and cognitive effort. Known as augmented intelligence, this approach looks for ways AI can help humans better perform tasks by automating some of the steps, not the entire process.

One of the areas where AI assistants can provide augmented intelligence is AR headsets. When using augmented reality headsets, users don’t have access to rich user interfaces to interact with applications. This is where a voice-enabled AI assistant can help a lot by easing the cognitive burden on the user. For instance, users can query for information while using the headset.

AR headsets also enable better cooperation between humans and AI. Instead of exploring the world for itself, the AI assistant would be able to view it through the eyes of the user and better interact with the surrounding world and respond to commands.

Magic Leap, the company behind the famous mixed reality headset of the same name, is contemplating creating AI assistants to go with its devices.

The robots are not coming—yet

We humans like to take cues from nature when we want to invent new things. But experience and history show that we usually end up taking a different course: Planes fly, but they don’t flap their wings, and cars look nothing like horses.

Thinking about human-like robots is nice, but we must also acknowledge that replicating all the functionalities of the human brain, which is perhaps the most complex creation of nature, is all but impossible. So Alexa and other digital assistants will find new ways to make our lives easier, but they may never have their own human-like bodies.