In the last issue of Embodied AI, we argued in favor of transforming audio-based virtual assistants, such as Alexa, into AI-powered avatars for ease of skill discovery and more humanlike interactivity. In short, start by equipping Alexa and Siri with eyes on a screen.

Therefore, we are delighted to find out that both Boris Katz, a principal researcher at MIT who helped invent virtual assistants, and Rohit Prasad, head scientist of Alexa, share similar opinions about the current limitations to virtual assistants, i.e. common sense, situational awareness, and the important role of eyes for virtual assistants.

“Incredible progress…incredibly stupid”

That is quite harsh, but it is how Katz thinks of Alexa, Siri, and other virtual assistants in his interview with Technology Review’s Will Knight: a conflicted feeling of pride and embarrassment. On the one hand, Katz is proud of the progress on and the adoption of virtual assistants. But on the other hand, he thinks these programs are “incredibly stupid.”

To be fair, Alexa and her ilk are not stupid: they are a feat of software engineering with tremendous potential for improvement. But Katz’s candid opinions yield three important takeaways. First, Katz doubts that training models on huge amounts of data will solve language understanding. Second, language understanding should not be isolated from other modalities such as visual, tactile, and other sensory inputs. Third, common sense and intuitive physics are essential for virtual assistants.

Alexa Needs Eyes

But while Alexa can quickly access an encyclopedia-like knowledge base to respond to simple commands, that hack can only go so far. Prasad’s opinion is that “[the] only way to make smart assistants really smart is to give it eyes and let it explore the world.”