January 25, 2016

Deep Learning is Easy - Learn Something Harder

Caveat: This post is meant to address people who are completely new to deep learning and are planning an entry into this field. The intention is to help them think critically about the complexity of the field, and to help them tell apart things that are trivial from things that are really hard. As I wrote and published this article, I realised it ended up overly provocative, and I'm not a good enough writer to write a thought-provoking post without, well, provoking some people. So please read the article through this lens.

These days I come across many people who want to get into machine learning/AI, particularly deep learning. Some ask me what the best way is to get started and learn. Clearly, at the speed things are evolving, there seems to be no time for a PhD. Universities are sometimes a bit behind the curve on applications, technology and infrastructure, so is a master's worth doing? A couple of companies now offer residency programmes, extended internships, which supposedly allow you to kickstart a successful career in machine learning without a PhD. What your best option is depends largely on your circumstances, but also on what you want to achieve.

Some things are actually very easy

The general advice I increasingly find myself giving is this: deep learning is too easy. Pick something harder to learn; learning deep neural networks should not be the goal but a side effect.

Deep learning is powerful exactly because it makes hard things easy.

The reason deep learning made such a splash is the very fact that it allows us to phrase several previously impossible learning problems as empirical loss minimisation via gradient descent, a conceptually super simple thing. Deep networks deal with natural signals we previously had no easy ways of dealing with: images, video, human language, speech, sound. But almost whatever you do in deep learning, at the end of the day it becomes super simple: you combine a couple of basic building blocks and ideas (convolution, pooling, recurrence), you can do it without overthinking it, and if you have enough data the network will figure it out. Increasingly high-level, declarative frameworks like TensorFlow, Theano, Lasagne, Blocks, Keras, etc. simplify this to the level of building Lego towers.
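To make "conceptually super simple" concrete, here is a minimal sketch of empirical loss minimisation via gradient descent. It is not a deep network: it fits a straight line to five points by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the learning rate, the step count and the toy data are all assumptions made for illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Minimise the mean squared error of y ~ w*x + b by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on the training set.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step downhill on the empirical loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

A deep learning framework replaces the hand-written gradients with automatic differentiation and the line with a stack of layers, but the loop itself stays essentially this shape.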

Pick something harder

This is not to say there are no genuinely novel ideas coming out of deep learning, or using deep learning in more innovative ways, far from it. Generative Adversarial Networks and Variational Autoencoders are brilliant examples that sparked new interest in probabilistic/generative modelling. Understanding why/how those work, and how to generalise/build on them is really hard - the deep learning bit is easy. Similarly, there is a lot of exciting research on understanding why and how these deep neural networks really work.

There is also a feeling in the field that the low-hanging fruit for deep learning is disappearing. Building deep neural networks for supervised learning - while still being improved - is now considered boring or solved by many (this is a bold statement and of course far from the truth). The next frontier, unsupervised learning, will certainly benefit from the deep learning toolkit, but it also requires a very different kind of thinking and familiarity with information theory/probabilities/geometry. Insights into how to make these methods actually work are unlikely to come in the form of improvements to neural network architectures alone.

What I'm saying is that by learning deep learning, most people mean learning to use a relatively simple toolbox. But in six months' time, many, many more people will have those skills. Don't spend time working on/learning about stuff that retrospectively turns out to be too easy. You might miss your chance to make a real impact with your work and differentiate your career in the long term. Think about what you really want to be able to learn, pick something harder, and then go work with people who can help you with that.

Back to basics

What are examples of harder things to learn? Consider what knowledge authors like Ian Goodfellow, Durk Kingma, etc. used when they came up with the algorithms mentioned before. Much of the relevant stuff that is now being rediscovered was actively researched in the early 2000s. Learn classic things like the EM algorithm, variational inference, unsupervised learning with linear Gaussian systems: PCA, factor analysis, Kalman filtering, slow feature analysis. I can also recommend Aapo Hyvarinen's work on ICA and pseudolikelihood. You should try to read (and understand) this seminal deep belief network paper.
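As a taste of the classics listed above, here is a minimal sketch of the EM algorithm for a two-component, one-dimensional Gaussian mixture. The function names, the initialisation scheme, the iteration count and the synthetic data are assumptions made for illustration, and the code is not numerically hardened (a production version would work in log space).

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iters=100):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation."""
    n = len(data)
    # Crude initialisation: means at the extremes, shared variance, equal weights.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Synthetic data: two well-separated clusters (an illustrative assumption).
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, means, variances = em_two_gaussians(data)
```

The code is short; the harder part is understanding why alternating these two steps never decreases the likelihood, and how the same argument leads to variational inference.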

Shortcut to the next frontiers

While deep learning is where most interesting breakthroughs happened recently, it's worth trying to bet on areas that might gain relevance going forward:

probabilistic programming and black-box probabilistic inference (with or without deep neural networks). Take a look at Picture for example, or Josh Tenenbaum's work on inverse graphics networks. Or stuff at this NIPS workshop on black-box inference. To quote a friend of mine:

probabilistic programming could do for Bayesian ML what Theano has done for neural networks

better/scalable MCMC and variational inference methods, again, with or without the use of deep neural networks. There is a lot of recent work on things like this. Again, if we made MCMC as reliable as stochastic gradient descent now is for deep networks, that could mean a resurgence of more explicit Bayesian/probabilistic models and hierarchical graphical models, of which RBMs are just one example.
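For readers who have never seen MCMC, the simplest instance is a random-walk Metropolis sampler. The sketch below is only a toy: the target (a Gaussian with mean 3 and unit variance, known up to a constant), the step size, the chain length and the burn-in are all assumptions chosen for illustration, and the function name `metropolis` is hypothetical.

```python
import math
import random


def metropolis(log_density, start, steps, step_size, rng):
    """Random-walk Metropolis: sample from a density known up to a constant."""
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(steps):
        # Propose a symmetric Gaussian perturbation of the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        # On rejection the current state is recorded again.
        samples.append(x)
    return samples


def log_target(x):
    """Unnormalised log density of a Gaussian with mean 3 and unit variance."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(0)
chain = metropolis(log_target, start=0.0, steps=20000, step_size=2.0, rng=rng)
kept = chain[2000:]  # discard burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / len(kept)
```

Even in this toy, the result depends on hand-picked choices such as the step size and burn-in length, which is a small illustration of why MCMC is not yet as push-button as stochastic gradient descent.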

Have I seen this before?

Roughly the same thing happened around the data scientist buzzword some years ago. Initially, using Hadoop, Hive, etc. was a big deal, and several early adopters made a very successful career out of - well - being early adopters. Early on, all you really needed to do was count stuff on smallish distributed clusters, and you quickly accumulated tens of thousands of followers who worshipped you for being a big data pioneer.

What people did back then seemed magic at the time, but looking back from just a couple of years later it's trivial: lots of people use Hadoop and Spark now, and tools like Amazon's Redshift made stuff even simpler. Back in the day, your startup could get funded on the premise that your team could use Hive, but unless you used it in some interesting way, that technological advantage evaporated very quickly. At the top of the hype cycle, there were data science internships, residential training programs, bootcamps, etc. By the time people graduated from these programs, these skills were rendered somewhat irrelevant and trivial. What is happening now with deep learning looks very similar.

In summary, if you are about to get into deep learning, just think about what that means, and try to be more specific. Think about how many other people are in your position right now, and how you are going to make sure the things you learn aren't the ones that will appear super-boring in a year's time.

Summary

The research field of deep learning touches on a lot of interesting, very complex topics from machine learning, statistics, optimisation, geometry and so on. The slice of deep learning most people are likely to come across - the Lego block building aspect - is, however, relatively simple and straightforward. If you are completely new to the field, it is important to see beyond this simple surface, and pick some of the harder concepts to master.