In 1983 the newly established Canadian Institute for Advanced Research (CIFAR) launched its “Artificial Intelligence, Robotics & Society” project, one of the world’s first broad AI initiatives. Fresh in from CMU, University of Toronto Professor Geoffrey Hinton joined the project a few years later. Today, Hinton is a Turing Award laureate globally recognized as the “Godfather of Deep Learning.” He is also Chief Scientific Advisor at the Vector Institute for Artificial Intelligence in Toronto. Like Hinton, Canadian AI research has come a long way.

In 2017 the Canadian government and CIFAR announced the Pan-Canadian AI Strategy. The Vector Institute, along with Amii (Alberta Machine Intelligence Institute) in Edmonton and Mila (Montréal Institute for Learning Algorithms) in Montréal, are the core institutions in the initiative, which aims to attract and retain the world’s leading AI researchers.

At the NeurIPS 2019 conference in Vancouver last week, CIFAR introduced its new AI Chairs and a new grant program, and hosted the panel discussion Foundations in Machine Learning, where researchers from Mila, Amii and the Vector Institute discussed their work, examined machine learning trends, and offered advice to students.

Panelists:

Christian Gagné, Mila, also Full Professor at Université Laval

Pascal Germain, Mila, also Assistant Professor at Université Laval

Chris Maddison, Vector Institute, also Assistant Professor at University of Toronto

Courtney Paquette, Mila, also Assistant Professor at McGill University

Gennady Pekhimenko, Vector Institute, also Assistant Professor at University of Toronto

Siamak Ravanbakhsh, Mila, also Assistant Professor at McGill University

Nathan Sturtevant, Amii, also Professor at University of Alberta

Moderator: Simon Lacoste-Julien, Mila, also Associate Professor at Université de Montréal

Synced has edited the panel discussion for brevity and clarity and added links to research papers of interest.

Q: Why is Canada, and your AI institute in particular, the right place for you to be pursuing your research?

Nathan Sturtevant: I’m one of the recruited chairs, coming from the University of Denver, which was interested in building an AI program. But if you compare what they could have built there over several years with what already existed at the University of Alberta, Alberta was starting three tiers above what would have been possible in Denver. So the quality of my colleagues and the quality of the students to work with was just an amazing opportunity. And having experienced both the US and the Canadian funding systems, I have to say I much prefer the Canadian one.

Chris Maddison: I’d like to follow up on that. This is related to the funding system, but also to the culture of the Canadian research environment, which tends to support curiosity-driven or high-risk, high-reward research. I don’t think it’s any accident that two of the major subfields of artificial intelligence generating so much buzz today, deep learning and reinforcement learning, had two or three of their strongholds in Canada, which sheltered those fields when they were not the hot fields of machine learning. I think that’s happening with work going on today as well, and that sort of kindness and willingness to take risks is something Canada should be proud of.

Christian Gagné: I think the environment is interesting in the sense that you have access to these great groups in Montreal and can develop collaborations. At the same time, at Laval we also have a tradition of multidisciplinary collaboration, with interesting interactions with people from other fields. We tend to have a good dialogue, and that cross-feeding of ideas is really interesting; it helps students connect with other domains.

(left to right) Courtney Paquette, Nathan Sturtevant, and Gennady Pekhimenko

Q: Is there any interesting research in your field that you found particularly exciting in the last year, or perhaps will be talked about at NeurIPS, that you would like to share?

Siamak Ravanbakhsh: There is a paper I’m excited about from Max Welling’s group on gauge equivariant neural networks. So far, in models that generalized convolution to other domains, you needed to define a global symmetry: look at the graph, look at the set, look at their symmetries. This new paper says you don’t need that global symmetry; you can apply the same sort of ideas on general manifolds, and it builds a theoretical foundation for doing that.

Note from Synced:

Gauge Equivariant Convolutional Networks and the Icosahedral CNN (arXiv)
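The symmetry idea being generalized here can be seen in the classical case: ordinary convolution is equivariant to translations, meaning shifting the input and then convolving gives the same result as convolving and then shifting the output. A minimal NumPy check of that property (a toy illustration of equivariance in general, not code from the paper):

```python
import numpy as np

def conv1d_circular(x, k):
    """Circular 1-D correlation: a translation-equivariant map."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
k = np.array([0.5, -1.0, 0.25])

shift = lambda v, s: np.roll(v, s)

# Equivariance: conv(shift(x)) equals shift(conv(x))
lhs = conv1d_circular(shift(x, 2), k)
rhs = shift(conv1d_circular(x, k), 2)
```

Gauge equivariance drops the requirement that such a symmetry act globally on the whole domain, which is what lets the construction work on general manifolds.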

Chris Maddison: This is maybe not new work in the sense of a new idea, but retrospective work that tries to carefully evaluate the current situation. I’m thinking in particular of some recent empirical work coming out of Google Brain, led by Christopher Shallue and George E. Dahl, on evaluating the effect of batch size for very classical optimization algorithms in deep learning. And actually, as Roger Grosse pointed out, the surprising thing about that work was that there was no surprise: it carefully clarified some confusions in the literature and laid out a very compelling empirical picture of what’s really happening on the ground.

The current culture in machine learning is not quite as supportive of that kind of work as I would like. And I think it’s really important, because it can give guidance to theorists about what kinds of questions they should be asking and what we might expect when we do theory. That matters as we develop simpler models for which we have theoretical traction.

Note from Synced:

Measuring the Effects of Data Parallelism on Neural Network Training (arXiv)

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model (arXiv)
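The kind of measurement Shallue and Dahl ran at scale can be sketched in miniature: fix a target loss, vary the batch size, and count how many SGD steps it takes to get there. The toy logistic-regression version below is our own illustration of the methodology; the problem sizes, learning rate, and loss target are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression task: labels from a noisy linear teacher
n, d = 2000, 10
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    # Clip to avoid overflow in exp for large |z|
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def steps_to_target(batch_size, lr=0.5, target=0.4, max_steps=20000):
    """Count minibatch SGD steps until full-dataset logistic loss < target."""
    w = np.zeros(d)
    for step in range(1, max_steps + 1):
        idx = rng.integers(0, n, size=batch_size)
        p = sigmoid(X[idx] @ w)
        w -= lr * X[idx].T @ (p - y[idx]) / batch_size   # mean-loss gradient
        if step % 50 == 0:                               # check periodically
            q = sigmoid(X @ w)
            loss = -np.mean(y * np.log(q + 1e-12)
                            + (1 - y) * np.log(1 - q + 1e-12))
            if loss < target:
                return step
    return max_steps

for b in [1, 8, 64, 512]:
    print(b, steps_to_target(b))
```

At full scale, the paper's finding is that steps-to-target falls roughly linearly with batch size up to a workload-dependent point, then flattens into diminishing returns.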

Gennady Pekhimenko: I want to talk more about big research directions. If you look at the list of sponsors of NeurIPS, you’re going to see the usual suspects, but you’re also going to see a lot of new companies that many of you have probably not heard of, such as Graphcore, Cerebras, and many others. They are all here because they are trying to make their names known. They’re all building machine learning hardware, they’ve got tons of funding, and there is a lot of competition.

You can ask why this is important for us as researchers. Well, it’s important because we need to evaluate and properly compare which of this hardware is actually worth it and which is not. There are a lot of media claims about one piece of hardware being 10x or 100x faster than another, and you really need to be able to fairly compare what’s actually performing better for your workload. With that in mind, a consortium called MLPerf was created, with both academic and industry participants. It has already had several releases for training and inference, where you can find results to compare against your own models and see numbers from all the companies signed on, including the startups I mentioned and big companies like Google and NVIDIA.

Nathan Sturtevant: NeurIPS isn’t a conference I typically come to, because I like to go to broader AI venues. But the field of search is over 50 years old, and it’s very easy to say all the questions have been answered. Yet in the last four to five years there have been fundamental changes in our understanding of algorithms and the way things work in that field. So to this audience in particular, I would encourage you, if you’re thinking about search, to make sure you’re looking at the state of the art.

Pascal Germain: Maybe I will point to an interesting line of work that I think was started by Mikhail Belkin in a paper that appeared on arXiv approximately one year ago, on what is now known as “double descent curves.” The common wisdom was that we should not use models with too much capacity, to avoid overfitting. He showed, empirically and theoretically, that under some circumstances such models can in fact generalize better. And it’s really fun, because we can see the same behavior we were observing in neural networks appearing in much simpler models, so from a statistical learning perspective we can study it.

Note from Synced:

Reconciling modern machine learning practice and the bias-variance trade-off (arXiv)
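A minimal way to probe the phenomenon Belkin describes is to fit minimum-norm least squares on random features and sweep the feature count past the number of training points. The setup below is our own toy illustration, not code from the paper; the interesting region is around p = n_train, where the interpolation threshold sits and test error classically peaks before descending again.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_features(X, W):
    """Random ReLU features: project inputs with a fixed random matrix W."""
    return np.maximum(X @ W, 0.0)

# A linear teacher observed through noise (hypothetical toy setup)
d, n_train, n_test = 5, 20, 200
w_star = rng.normal(size=d)
X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr = X_tr @ w_star + 0.1 * rng.normal(size=n_train)
y_te = X_te @ w_star

test_errors = {}
for p in [2, 5, 10, 20, 40, 200]:        # sweep past n_train = 20
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    # lstsq returns the minimum-norm solution when p > n_train
    coef, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    test_errors[p] = np.mean((Phi_te @ coef - y_te) ** 2)
```

Plotting test_errors against p in setups like this typically shows the double descent shape: error rises toward p = n_train, then falls again in the heavily overparameterized regime.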

Courtney Paquette: Maybe I should start off with what I’m not excited about: in optimization there’s this push to get the best algorithm out there without really understanding what the algorithm is actually doing. This is something I want the ML community, and optimization within the ML community, to move away from, toward more understanding of what is actually happening instead of trying to produce the latest, greatest algorithm. If we understand what’s going on, then we can help design better algorithms.

Q: A lot of you do optimization research. I get a lot of deep learning students saying ‘I just use Adam. What’s the use of optimization, especially when what we care about is not optimization but actually generalization error?’ What are your thoughts on this?

Courtney Paquette: Yeah, I think it’s interesting. I’m at Mila and I hear this too, that students don’t care and just use Adam. Again, they don’t really understand, or we don’t really understand, why these algorithms are working. They work for very specific reasons. Optimization is not just about producing a complexity result or a training error, but about actually understanding why the algorithm is doing what it’s doing. There’s usually something else involved: the landscape plays a role, or the objective function plays a role. We forget this. But it’s totally fine to change your loss function to make it easier to optimize. That’s a completely legitimate thing to do. I think there is a lot more to optimization, and machine learning sometimes glosses over it because there are a lot of other things we want students to know, so we don’t really have time to go into the theory side of optimization, which has been around for quite a while now.

Chris Maddison: I just want to add a sort of meta point on top of Courtney’s, which is that it’s very hard to know ahead of time whether or not you can make progress. And that goes both ways. You can try to prejudge ideas and decide ‘I don’t see how this is going to work,’ but of course that was true before AdaGrad and Adam were invented, and obviously they’ve had a huge impact on our field. So there’s significant uncertainty over the future. And in the face of that, what you should be doing is working on what you’re interested in, not what you can tell today is going to work.

Gennady Pekhimenko: In addition, all these moving targets become a nightmare. We go back and forth on how to make things fair while not limiting innovation, and the best thing we’ve come up with so far is the ability to steal the hyperparameters from others. Essentially, when the submissions happen, everyone gets their best results. But if it turns out some company X just innovated at the algorithm level, which doesn’t necessarily mean better hardware or a better software stack, then the others can look at it and reuse it in their own submissions to make things more fair. That’s the best strategy we’ve come up with. Overall, for us, the fewer algorithms the better.

(left to right) Gennady Pekhimenko, Chris Maddison, and Christian Gagné

Q: One of the goals of the CIFAR AI Chairs program is to train the next generation of talent. What is the best advice you have for students interested in your area of machine learning? What can grad students do now to prepare for future careers in academia and industry?

Gennady Pekhimenko: I think we are in academia to take risks, so I encourage my students not to be afraid to take on aggressive research directions. For example, a student who started with me about a year ago came to me and said, ‘Okay, I think backpropagation is just the wrong strategy for optimization. It’s fundamentally bad…’ He came up with a very nice idea on the theoretical end, but the problem was that the whole NVIDIA stack was simply not ready for it, so to even try it you need to be ready for months and months of engineering and optimization. Eventually we were able to get something interesting out of it, beating the state-of-the-art backpropagation baseline. But you need to have courage, and to convince the student to be brave enough to go into risky directions with high reward and high promise.

Courtney Paquette: Advice I would give to a graduate student is to take a class or get some experience in communication. In the sciences, we don’t actually teach people how to create slides, give a talk, or present a poster. It’s ad hoc. In other fields, like business, this is what they do, and they know how to explain it. I think it’s extremely important now, especially in AI, to be able to communicate results to other people. We should be trying to make things understandable to the majority of people out there. So my advice for a student is to take a class in communication and learn some of those skills.

Christian Gagné: Sometimes when my students are starting a PhD, they look at the trends and try to follow them. I always advise them not to go into things that are too trendy, because they will be lost in the crowd and it will be really difficult to do something meaningful when you are competing with big teams and many people. What I suggest to my students is to look at different things, to be open and curious about different domains, and to try to make connections and find new topics, or topics on the edge of the current trends, that can be interesting and original, proposing something new and relevant that is still not in the spotlight of what everyone is trying to do right now.

Nathan Sturtevant: To add to the last two, I would advise students to be aware of the history of AI: what got us to where we are today, and what broader questions are being asked and answered. That does two things. It gives you perspective, and it helps you understand how people are thinking and whether there are different questions we should be asking that are not being asked. That’s where you can make strong inroads or even open up new areas. Understanding what’s been done before also means you do a better job of not repeating things, of understanding what the mistakes were and how we can improve on them in the future.

(left to right) Siamak Ravanbakhsh and Courtney Paquette

Q: You all have fairly broad backgrounds. Do you have examples of tasks or applications of machine learning that we didn’t really think about in the past?

Siamak Ravanbakhsh: Applications of machine learning in astronomy and cosmology are an area I’ve been involved with in some projects. It comes with its own challenges; for example, you get huge datasets. But these are not new tasks or new settings in which you define new problems; it’s more like new applications.

Chris Maddison: I’m not sure this has been unrecognized or is so novel that no one’s seen it, but I think there’s a big source of untapped data that some people are looking at, but we could look at much more: the data we produce ourselves. Our source code, the stack traces of algorithms that are running. All of this data occupies a very small part of the space of all possible programs or all possible stack traces, so it represents something about what humans care about in the world, and it has specific distributions, and we generate tons of it. So doing machine learning to inform software engineering or basic algorithms, I think, is a rich area that is really blowing up right now.

Christian Gagné: A field we are looking at, but maybe not as closely as we could, is systems that involve humans in the loop. We have a lot of data about people clicking, but maybe we should look more at cases where there is an interaction between a system and a person: customized systems that collaborate with humans and adapt to the specific behaviors of the person using the system. That’s something I think we need to look at closely.

Gennady Pekhimenko: I’m personally interested in the privacy aspects and the ability to do training at the edge. This is something that five or ten years ago was unimaginable because the compute power wasn’t there. Right now people are actively talking about it, trying to come up with creative protocols for federated learning or incremental learning at the edge. I think that’s an interesting direction and an application with huge potential.