In 2012 the world learned of a surprising research project inside Google’s secretive X lab. A giant simulation of three million neurons learned to recognize cats and people in pictures, without human help, just by looking at images taken from YouTube.

The people behind the project founded a new research group known as Google Brain inside the company’s search division. They and researchers elsewhere soon proved to the world that artificial neural networks, a decades-old invention, could understand images and speech with unprecedented accuracy (see “Google Puts Its Virtual Brain to Work”). The success of deep learning, as the technique is also known, prompted Google and others to invest heavily in artificial intelligence and has even led some experts to claim we should prepare for software that’s smarter than humans (see “What Will It Take to Build a Virtuous AI?”).

Yet Google’s cat detector was in some ways a dead end. The recent successes of deep learning are built on software that needs human help to learn—something that limits how far artificial intelligence can go.

Google’s experiment used an approach known as unsupervised learning, in which software is fed raw data and must figure things out for itself without human help. But although it learned to recognize cats, faces, and other objects, it wasn’t accurate enough to be useful. The boom in research into deep learning and products built on it rests on supervised learning, where the software is provided with data labeled by humans—for example, images tagged with the names of the objects they depict (see “Teaching Machines to Understand Us”).
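To make the distinction concrete, here is a minimal Python sketch, not anything Google used. It runs on a handful of made-up numbers standing in for image features. The function names (`fit_supervised`, `fit_unsupervised`, `predict`) and the "cat"/"dog" labels are assumptions for illustration only. The supervised version averages points by their human-supplied label; the unsupervised version is a bare-bones two-cluster k-means that never sees a label.

```python
# Toy illustration: the same 1-D points, learned with and without labels.
# All data and function names here are hypothetical.

def fit_supervised(points, labels):
    """Supervised: use human-provided labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def fit_unsupervised(points, iterations=10):
    """Unsupervised: 2-means clustering finds two groups with no labels."""
    centers = [min(points), max(points)]  # simple deterministic start
    for _ in range(iterations):
        groups = [[], []]
        for x in points:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]  # the human help

centroids = fit_supervised(points, labels)  # one named mean per label
clusters = fit_unsupervised(points)         # two unnamed group centers
print(predict(centroids, 1.1), clusters)
```

Both versions find roughly the same two groups in this toy data, but only the supervised one can attach a name to them. The unsupervised one discovers that two kinds of thing exist without knowing what either is called.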

That has proved incredibly effective for many problems, such as identifying objects in images, filtering spam e-mail, and even suggesting short replies to your messages, a feature introduced by Google last year. But unsupervised learning is probably needed if software is to keep getting better at understanding the world, says Jeff Dean, who leads the Google Brain group today and also worked on the cat detector project inside Google X.

“I’m pretty sure we need it,” says Dean. “Supervised learning works so well when you have the right data set, but ultimately unsupervised learning is going to be a really important component in building really intelligent systems—if you look at how humans learn, it’s almost entirely unsupervised.”

One example of that is the way we learn as infants, establishing the foundations of adult intelligence. We figure out that objects still exist while out of sight and fall if unsupported, and we learn these things just by observing the world, without explicit instruction. That kind of common sense is needed if robots are to navigate the world as well as animals do. It also underpins seemingly more abstract tasks, such as understanding language.

Figuring out how software can do what comes so easily to human babies is crucial if grander ambitions for artificial intelligence are to be fulfilled, says Yann LeCun, director of Facebook’s Artificial Intelligence Research Group. “We all know that unsupervised learning is the ultimate answer,” he says. “Solving unsupervised learning will take us to the next level.”

Although they don’t have that ultimate answer yet, researchers at companies such as Facebook and Google, and in academia, are experimenting with limited forms of unsupervised learning.

One strand of research aims to create artificial neural networks that ingest video and images and then generate new imagery using the knowledge they have gained about the world, an indication that they have formed some internal representation of how it works. Making accurate predictions about the world is a fundamental feature of human intelligence.

[Image: The “optimal” human face, according to a network of three million simulated neurons that Google fed images from YouTube.]

Facebook’s researchers have made software called EyeScream that can generate recognizable images given prompts such as “church” or “airplane,” and they are working on creating software that predicts what will happen in a video. Researchers at Google’s DeepMind subsidiary have made software that looks at a photo with some parts blacked out and tries to fill them in with realistic imagery.

DeepMind is also testing an alternative to fully unsupervised learning called reinforcement learning, in which software is trained by receiving automatic feedback on its performance—for example, from the scoring system of a computer game (see “Google’s Intelligence Designer”). And researchers not using deep learning have demonstrated software that can learn how to recognize a handwritten character on the basis of a single example (see “This AI Algorithm Learns Tasks as Fast as We Do”).
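The reinforcement learning idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not DeepMind's system: it uses tabular Q-learning, a standard textbook method, on a hypothetical five-square corridor "game" whose only score is one point for reaching the last square. The `train` function and all its parameter values are invented for illustration.

```python
import random

def train(episodes=200, n_states=5, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor; the goal is the last square."""
    rng = random.Random(seed)
    # q[state][action] estimates future score; action 0 = left, 1 = right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Explore at random sometimes, and whenever the estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            # The only feedback is the game's score: 1 point at the goal.
            reward = 1.0 if next_state == n_states - 1 else 0.0
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# Read off the learned behavior for every non-goal square.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

Nobody tells the program which direction is correct. After enough plays, the score alone should lead it to prefer moving right from every square.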

Yet none of these explorations have so far revealed a path that seems guaranteed to lead to unsupervised learning at close to the human level, or software that can learn complex things about the real world just by experiencing or experimenting with it. “Right now we seem to be missing a key idea,” says Adam Coates, director of the Silicon Valley AI Lab at the Chinese search company Baidu.

Supervised learning still has a lot to offer while the search goes on, says Coates: Internet companies have access to a wealth of data on the things people do and care about, feedstock that can be used to build things like voice interfaces and personal assistants much more capable than those we have today. “In the near term there’s a lot you can do with labeled data,” he says. Large companies spend millions on getting contractors to label data to feed into their machine-learning systems.

LeCun of Facebook believes that researchers won’t be forced to subsist on labeled data forever. But he declines to guess how much longer the engine of human intelligence will remain out of reach to software. “We kind of know the ingredients; we just don’t know the recipe,” he says. “It might take a while.”