In a ceremony on Saturday, University of Montreal Professor and Head of the Montreal Institute for Learning Algorithms (MILA) Dr. Yoshua Bengio was honoured as a 2018 ACM Turing Award Laureate. Dr. Bengio shared the “Nobel Prize of computing” with Dr. Geoffrey Hinton from Google and Dr. Yann LeCun from Facebook. Recognized as one of the pioneers in deep learning, Dr. Bengio has made tremendous contributions in probabilistic models of sequences, high-dimensional word embeddings and attention, and generative adversarial networks (GANs).

Dr. Bengio, a 55-year-old Canadian scientist with grey hair and a close-cropped beard, has recently focused his research on building human-level artificial intelligence systems. He is exploring algorithms that can learn better data representations by not only recognizing patterns but also discovering more complex relationships and high-level concepts. Dr. Bengio has also repeatedly expressed his concerns about the potential abuse of machine learning and about model bias.

On the occasion of the 2019 International Conference on Machine Learning (ICML) last week in Long Beach, California, Synced spoke with Dr. Bengio about the Turing Award, ICML, global AI trends and more.

You, Dr. Geoffrey Hinton and Dr. Yann LeCun winning the Turing Award is a big deal and exciting news for the AI community. What was the first thing you did when you heard the news?

It’s funny that you ask, because the first thing they (ACM) told me on the phone was you’re not supposed to tell anyone! (laughs) Of course my first impulse was to tell a lot of people. They said I could talk to my family a little bit. It was hard to inhibit myself from talking about it until three weeks later when it was announced publicly.

After I got the news, it’s kind of amazing, right? But in a way it’s a very short-lived feeling. It’s like you win the lottery, but then one day later, you’re still the same person.

The next thing that came to me, emotionally speaking, is, well, this prize really shouldn’t be just a prize for three researchers. There’s a lot of people that have made this possible starting with our collaborators, our students, and in fact, a whole research community that has created all of the science around deep learning. I feel like this prize is a prize for that community.

In the past, a lot of Turing Awards have been given to very theoretical contributions to computer science, but deep learning is quite different. It’s something that’s changing the world. Of course there’s theory, but a lot of it is very intuitive and concrete. For many years, this branch of research was not considered as valuable as theoretical computer science. So the fact that we’re getting this award is sending a signal that machine learning, AI, and in particular deep learning, is now established as an important contribution to science not just by companies but also by the academic community.

(left to right) Yoshua Bengio, Susan Dumais, Eric Horvitz, Geoffrey Hinton, at the ACM Award banquet in San Francisco on June 15. Courtesy Yann LeCun.

What are some of the most exciting or innovative machine learning trends you have seen in 2019?

I see a move from passive machine learning, where the learner gets a big dataset and trains, to active machine learning, where the learner interacts with its environment. It’s not just reinforcement learning. It is active learning, things like dialogue systems where the interaction allows the learner to improve and to seek information.

That’s very different from classical machine learning. It is connecting to game theory, to new questions that we’re not used to like multi-agents, to questions of causality and lots of exciting directions. The whole research field is not stuck in a minimum, but instead many interesting doors are opening up for research and for companies.

Many machine learning talents from universities are going to industry. How is this talent migration affecting fundamental research?

I would say it was even worse a couple of years ago, because a large proportion of the faculty teaching machine learning had been snapped up by companies. So right now, there are very few senior researchers of my generation that are still working for universities and doing deep learning research. Even among the generation of my graduate students twenty years ago or ten years ago, there are not that many people.

The good news is in the last three or four years there has been an influx of graduate students from all around the world at MILA and many universities around the world, studying machine learning and deep learning and making the science advance. But now these people are on the market, so I think we’re starting to see an influx of new faculty, young faculty that are coming to teach deep learning. As an example in Montreal, at MILA, we had about five professors six years ago, and today we have twenty regular professors. Just this academic year we were trying to fill almost ten faculty positions between the different universities that are affiliated with me so there’s a lot of new professors entering the universities. I think it’s still an issue that we lost a lot of people this way, but it’s getting better.

The other thing that’s helping is we have been able to make deals with companies so that some professors will be part time in industry and part time in academia. You know, the proportion varies from extremes like eighty-twenty or twenty-eighty to fifty-fifty. The advantage of this kind of formula is that those professors can continue supervising graduate students. So they will probably have less teaching or no teaching at all, but they will still help train the next generation, and of course they usually have a better salary as well. So everybody is kind of happy. At least we maintain the ability to train the next generation, which is the most important thing.

One of the ICML best papers this year, “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations,” reflected some interesting aspects of disentangling, which is a research topic you have been working on for a while. What do you think of this paper?

I know this paper. I haven’t read it yet, but I think it may reflect a misunderstanding of what disentangling should be. A lot of people right now understand disentangling as making the top-level variables independent of each other, but that’s only a very rough approximation. Think about language: words are not independent, because you form connections between them when you make a sentence. There is a whole series of papers trying to discover these independent factors, and I don’t think that is the right thing to do. That’s my opinion.

What do you think of recent media reports that training deep learning algorithms is contributing to climate change?

This is almost fake news. Some of my colleagues are about to write a rebuttal of this thing. For example, these models are trained by Google. Google has set up a system to make their energy consumption from GPUs and so on carbon neutral. For all the electricity they’re using for their computers, they buy electricity from carbon negative sources like hydro to put that same amount of energy in the grid. And so in a way they are canceling the potential carbon-producing energy that might have come through the consumption. One issue to keep in mind: responsible companies today that are burning a lot of energy and computation are being responsible about it.

Talking to a few people, the actual amount of energy that is being consumed in the kind of research we do — I think the numbers have to be checked carefully — is not that much energy. The reason is it depends how you do the calculation. It depends on the size of the model. Some of the models are so big that even in MILA we can’t run them because we don’t have the infrastructure for that. Only a few companies can run these very big models they’re talking about.

But the good part of it is maybe it’s going to help companies become more self-conscious about climate change effects of their energy consumption due to computing. Companies are building chips that are going to be maybe ten times or a hundred times more energy efficient. That was a good aspect, but I think it was a bit exaggerated.

MILA recently announced a partnership with Chinese ride-share giant Didi Chuxing Technology. Could you specify what sort of research will be conducted between DiDi and MILA?

In MILA, we’re interested in building systems that can in a way understand their environment, that is, build a model of it and capture the underlying explanations for what is being observed.

One thing that I’m personally interested in understanding is the causal relationships between variables. This is actually something very much missing currently in machine learning, but it’s very important for industries, because in industry, you’re not just interested in capturing the correlations between variables, you’re interested in taking decisions and in those decisions what you want to know is what’s going to be the effect of such and such actions.

Let me give you an example. The fact that it rains is correlated with the fact that I opened up my umbrella, but I cannot make the rain happen by opening up my umbrella. However, if it starts raining, it will have an effect on me: opening up the umbrella.
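
The difference in the umbrella example, between observing a correlation and acting on a variable, can be sketched with a small simulation. This is a minimal illustration and not something from the interview: the probabilities (a 30% chance of rain, umbrella use of 90% when it rains and 5% otherwise) and the names `simulate` and `rain_rate` are assumptions made up for the example.

```python
import random


def simulate(n, force_umbrella=None, seed=0):
    """Draw (rain, umbrella) pairs from a toy world where rain causes umbrella use.

    If force_umbrella is True or False, the umbrella state is set by
    intervention and no longer depends on the weather.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        rain = rng.random() < 0.3            # assumed 30% chance of rain
        draw = rng.random()                  # always drawn, so rain is identical across runs
        if force_umbrella is None:
            # Observational world: the umbrella reacts to the rain.
            umbrella = draw < (0.9 if rain else 0.05)
        else:
            # Intervention: we open (or close) the umbrella ourselves.
            umbrella = force_umbrella
        samples.append((rain, umbrella))
    return samples


def rain_rate(samples, umbrella_value=None):
    """Fraction of rainy samples, optionally only where umbrella == umbrella_value."""
    chosen = [rain for rain, umbrella in samples
              if umbrella_value is None or umbrella == umbrella_value]
    return sum(chosen) / len(chosen)


observed = simulate(20_000)
forced = simulate(20_000, force_umbrella=True)

p_rain = rain_rate(observed)
p_rain_given_umbrella = rain_rate(observed, umbrella_value=True)
p_rain_do_umbrella = rain_rate(forced)

print(f"P(rain)                   = {p_rain:.3f}")
print(f"P(rain | umbrella seen)   = {p_rain_given_umbrella:.3f}")
print(f"P(rain | umbrella forced) = {p_rain_do_umbrella:.3f}")
```

In this toy world, seeing an open umbrella makes rain far more likely, while forcing the umbrella open leaves the rain rate unchanged. A model that only captures the correlation cannot tell these two cases apart, which is why the effect of an action has to be modelled separately.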

Also part of the mission of MILA is to produce research that can have a positive impact on the world. In the past, we’ve been involved in healthcare applications. More recently, we started working on environment-related issues like climate change. We’re also working, for example, with companies that are using machine learning for education. I think there are a lot of discussions that need to happen between governments, companies, researchers, scholars of humanities, philosophers and people, to decide together how we want to handle the new power that AI is giving us.

Canada and the United States are two different countries, but it is still amazing to see MILA partner with a Chinese technology company under the current circumstances. Can you comment on that?

I feel that what is currently happening between the US and China, and with many Western countries and China, is very unfortunate. It’s not going to help to solve the problems that need to be solved at the international level to create a better planet. We are at a time, I think, in the history of humanity, where more than ever we need international coordination that’s strong and collaborative. Otherwise, we cannot solve problems like climate change. We cannot even properly deal with issues brought by AI — like one issue I am very concerned about is the use of machine learning in killer drones. So all of these things need international treaties. International cooperation is very strong.

When you have the kind of aggressive clustering and political tension that is happening this year, it just goes in the wrong direction. It’s really not going to help. At MILA we’re making a bet that things will go in a better direction in the future. But also I think these kinds of moves can be a signal to governments, I hope. It’s only one small gesture, but if enough organizations follow this kind of example, then it might have an impact.