Researching cutting-edge AI is very satisfying and rewarding, but we’re seeing this great awakening, a great moment in history. For me it’s very important to think about AI’s impact in the world, and one of the most important missions is to democratize this technology. The cloud is this gigantic computing vehicle that delivers computing services to every single industry.

What have you learned so far?

We need to be much more human-centered. If you look at where we are in AI, I would say it’s the great triumph of pattern recognition. It is very task-focused, it lacks contextual awareness, and it lacks the kind of flexible learning that humans have. We also want to make technology that makes humans’ lives better, our world safer, and our lives more productive. All this requires a layer of human-level communication and collaboration.

How can we make AI more human-centered?

There’s a great phrase, written in the ’70s: “The definition of today’s AI is a machine that can make a perfect chess move while the room is on fire.” It really speaks to the limitations of AI. In the next wave of AI research, if we want to make more helpful and useful machines, we’ve got to bring back the contextual understanding. We’ve got to bring knowledge abstraction and reasoning. These are all the most important steps.

At Stanford you created Visual Genome, a database of images that are extensively labeled so they can be used for AI systems. Is this interplay of vision and language necessary for the next leap forward?

Absolutely. Vision is a cornerstone of intelligence, and language understanding is a cornerstone of intelligence. What makes humans unique is that evolution gave us the most incredible and sophisticated vision system, motor system, and language system, and they all work together. Visual Genome is exactly the kind of project that’s pushing the boundaries of language understanding and visual understanding. And eventually we’re going to connect with the world of robotics as well.