Yoshua Bengio recently had a vision – a vision of how to build computers that learn like people do.

It happened at an academic conference in May, and he was filled with excitement – perhaps more so than he’d ever been during his decades-long career in "deep learning," an emerging field of computer science that seeks to engineer machines that mimic how the human brain processes information. Or, rather, how we assume the brain processes information.

In his hotel room, Bengio started furiously scribbling mathematical equations that captured his new ideas. Soon he was bouncing these ideas off various colleagues, including deep learning pioneer Yann LeCun of New York University. Judging from their response, Bengio knew he was onto something big.

When he made it back to his laboratory at the University of Montreal – home to one of the biggest concentrations of deep-learning researchers – Bengio and his team went to work turning his equations into functional, intelligent algorithms. About a month later, that hotel-room vision morphed into what he believes is one of the most important breakthroughs of his career, one that could accelerate the quest for artificial intelligence.

In short, Bengio has developed new ways for computers to learn without much input from us humans. Typically, machine learning requires "labeled data" – information that's been categorized by real people. If you want a computer to learn what a cat looks like, you must first show it images that people have already tagged as cats. Bengio seeks to eliminate this step.
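
To make the distinction concrete, here is a minimal sketch in plain numpy, using made-up toy data and not anything from Bengio's work, of what each kind of learner consumes: the supervised path needs labels a person produced, while the unsupervised path has to discover structure from the raw examples alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for images: 1,000 examples, 64 features each.
x = rng.standard_normal((1000, 64))

# Supervised learning consumes (example, label) pairs; the labels are the
# expensive, human-made part ("cat" = 1, "not cat" = 0).
y = rng.integers(0, 2, size=1000).astype(float)
w = np.linalg.lstsq(x, y, rcond=None)[0]  # fit a linear predictor to (x, y)

# Unsupervised learning consumes the raw examples alone. Here, principal
# component analysis finds the directions of greatest variation in x
# without ever seeing a label.
x_centered = x - x.mean(axis=0)
_, _, components = np.linalg.svd(x_centered, full_matrices=False)
top_components = components[:10]  # learned structure, no labels required
```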

Yoshua Bengio. Image: Courtesy Yoshua Bengio

"Today’s models can be trained on huge quantities of data, but that’s not enough," says Bengio, who together with LeCun and Google’s Geoffrey Hinton is one of the original musketeers of deep learning. "We need to discover learning algorithms that can take better advantage of all this unlabeled data that’s sitting out there."

Currently, the most widely used deep-learning models – so-called artificial neural networks, harnessed by the likes of search giants Google and Baidu – use a combination of labeled and unlabeled data to make sense of the world. But unlabeled information far outweighs the amount people have been able to manually label, and if deep learning is to turn the corner, it must tackle areas where labeled data is scarce, including language translation and image recognition.

Bengio's new models – which he’s tested only on small data sets – can teach themselves to capture what he calls the statistical structure of the data. Basically, when a machine learns to recognize faces, it can spew out new images that look like faces too, without human intervention. It can also fill in gaps: shown only part of an image, it can guess the rest, and shown only some of the words in a sentence, it can guess the missing ones.
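
One way to get this behavior, closely related to earlier work from Bengio's group, is a denoising autoencoder: a network trained to reconstruct clean data from deliberately corrupted copies. The sketch below is a deliberately tiny one-hidden-layer illustration in plain numpy, not his actual model or code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled data only: 500 fake 28x28 images, flattened to 784-dim vectors.
data = rng.random((500, 784))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer: encode 784 pixels down to 128 features and decode back.
n_in, n_hid = 784, 128
w_enc = rng.standard_normal((n_in, n_hid)) * 0.01
w_dec = rng.standard_normal((n_hid, n_in)) * 0.01

lr = 0.1
for _ in range(50):  # a few passes of plain batch gradient descent
    noisy = data * (rng.random(data.shape) > 0.3)  # corrupt: zero ~30% of pixels
    h = sigmoid(noisy @ w_enc)   # encode the corrupted input
    recon = sigmoid(h @ w_dec)   # decode back to pixel space
    err = recon - data           # the target is the *clean* data
    # Backpropagate the squared reconstruction error through both layers.
    d_out = err * recon * (1 - recon)
    grad_dec = h.T @ d_out
    d_hid = (d_out @ w_dec.T) * h * (1 - h)
    grad_enc = noisy.T @ d_hid
    w_dec -= lr * grad_dec / len(data)
    w_enc -= lr * grad_enc / len(data)
```

A network that learns to undo corruption has implicitly captured the statistical structure of its inputs, and that is what makes the sampling and fill-in behavior described above possible.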

Right now, the models don't have a direct commercial application, but if he and his team can perfect them, Bengio says, then "we can answer arbitrary questions about the variables modeled. Understanding the world means just that: We can have a good guess about any aspect of reality that is hidden to us, given those elements that we observe. That's why this is an important piece."

On the surface, these algorithms look very much like the neural nets built by Hinton for Google’s image search and photo-tagging systems, he says, but they’re much better at exploring data that's thrown at them. In other words, they’re much more intuitive.

“Intuition is just the part of the computation going on in our brain for which we don’t have conscious access. It’s really hard to decompose it into little pieces we can explain,” he says. “This is the reason why the traditional AI of the ’70s and ’80s failed – because it tried to build machines that could explain every single step through reasoning. It turns out it was impossible to do that. It’s much easier to train machines to develop intuitions to make the right decisions.”

An illustration of how the learned generative model fills in the missing left half of a picture when given the right half. Each row shows a sequence that begins with random pixels on the left; the model then repeatedly resamples those pixels until the overall configuration is plausible. Image: Courtesy Yoshua Bengio
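
The procedure the figure describes can be sketched as a continuation of the toy autoencoder above: clamp the observed pixels, start the missing ones from noise, and repeatedly let the model re-guess the hidden region. This is an illustration of the idea, not Bengio's exact sampling procedure:

```python
# Continuing the toy sketch above: in a flattened vector, the "first half of
# the pixels" stands in for the missing region of the figure.
img = data[0].copy()
missing = np.zeros(784, dtype=bool)
missing[: 784 // 2] = True                  # True = pixels the model must invent

guess = img.copy()
guess[missing] = rng.random(missing.sum())  # start the hidden region from noise
for _ in range(20):
    recon = sigmoid(sigmoid(guess @ w_enc) @ w_dec)
    guess[missing] = recon[missing]         # accept the model's guess where hidden
    guess[~missing] = img[~missing]         # keep the observed pixels clamped
```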

In the world of machine learning, that’s a big deal. If Bengio’s initial findings hold up on larger data sets, they could lead to algorithms with better transfer, meaning they are more easily applied across problems such as natural language processing, voice recognition, and image recognition. Think of it as drawing on past experience to intuit what action to take in a new situation. In engineering terms, the potential time saved on coding task-specific algorithms could be substantial.
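
As a rough illustration of what transfer means in code, again continuing the toy sketch rather than describing Bengio's actual systems, features learned without labels can be reused as the input representation for a brand-new task that has only a handful of labeled examples:

```python
# Continuing the same sketch: the pretrained encoder (w_enc) becomes the input
# representation for a new task; only a cheap task-specific head is fit on top.
new_x = rng.random((50, 784))                 # just 50 labeled examples
new_y = rng.integers(0, 2, size=50).astype(float)

feats = sigmoid(new_x @ w_enc)                # reuse the unsupervised features
w_head = np.linalg.lstsq(feats, new_y, rcond=None)[0]  # fit only the small head

preds = feats @ w_head                        # predictions for the new task
```

Only the small task-specific head is fit from scratch; everything the encoder learned from unlabeled data carries over.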

Unlike other machine-learning methods, deep learning is already endowed with some transfer, or intuitive, qualities, but Bengio and his team have been working for years to improve them. Recently, they won two international competitions focused on transfer learning.

This resolve to iterate on and improve existing technologies speaks to Bengio’s outlook on AI and, more broadly, on science. An academic through and through, he’s made it his life’s mission to find a fix for what’s holding back his and his colleagues’ dreams of building intelligent machines.

“We do experiments whose goal is to figure out why…not necessarily to build something that we can sell tomorrow,” says Bengio. “Once you have that understanding, you can answer questions – you can do all sorts of useful things that are economically valuable.”

That conviction, fueled by his own intuition that deep learning was the way to move machine learning forward even when the approach was considered a dirty word, keeps him motivated and working with new students, post-docs and young professors to keep the AI dream alive. He draws inspiration from the myriad exchanges he's had with colleagues like LeCun, Hinton, and Jeff Dean of Google Brain fame. His career, he says, has really been a social endeavor. In that spirit, Bengio has put the code for his new algorithms on GitHub for other developers to tweak and improve, and details of the findings have been published in a series of papers on the preprint site arXiv.org.

"My vision is of algorithms that can make sense of all the kinds of data that we see, that can extract the kind of information in the world around us that humans have," Bengio says. "I’m fairly confident that we’ll be able to train machines not just to perform tasks but to understand the world around us."