I’ve just wrapped up my trip to NIPS 2015 in Montreal and thought I’d jot down a few things that struck me this year:

Saddle Points vs Local Minima I heard this point repeated in a talk almost every day. In low-dimensional spaces (i.e. the ones we can visualize), local minima are the major impediment to optimizers reaching the global minimum. But this doesn’t generalize. In high-dimensional spaces, local minima are almost non-existent. Instead, there are saddle points: points which are a minimum in some directions but a maximum in others. Intuitively, this makes sense: if you treat the curvature in each of N dimensions as an independent coin flip, the odds that a critical point curves upward in every one of them are (1/2)^N. As Yoshua Bengio said, “it’s hard to build an n-dimensional wall.” This gives an intuition for why procedures like gradient descent are effective at optimizing the thousands of weights in a neural net: they are unlikely to get stuck in a local optimum. And it gives an intuition for why momentum is helpful: it helps gradient descent escape from saddle points.
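To make the intuition concrete, here is a minimal sketch rather than anything shown at the conference. It assumes a toy surface f(x, y) = x^2 - y^2 with a saddle at the origin, a start point almost on the ridge, and arbitrary values for the learning rate and momentum coefficient. The function names are my own.

```python
def prob_all_curving_up(n):
    """Coin-flip heuristic: chance that all n independent curvatures are positive."""
    return 0.5 ** n


def steps_to_escape(lr=0.01, beta=0.0, start=(1.0, 1e-6), max_steps=10_000):
    """Count updates until |y| > 1 on f(x, y) = x**2 - y**2.

    The origin is a saddle: a minimum along x, a maximum along y.
    beta=0 gives plain gradient descent; beta>0 adds heavy-ball momentum.
    """
    x, y = start
    vx = vy = 0.0
    for step in range(1, max_steps + 1):
        gx, gy = 2.0 * x, -2.0 * y   # gradient of f at (x, y)
        vx = beta * vx - lr * gx     # velocity accumulates past gradients
        vy = beta * vy - lr * gy
        x, y = x + vx, y + vy
        if abs(y) > 1.0:             # we have slid off the saddle
            return step
    return max_steps


plain_steps = steps_to_escape(beta=0.0)
momentum_steps = steps_to_escape(beta=0.9)

print("P(all 10 curvatures positive):", prob_all_curving_up(10))
print("plain gradient descent steps:", plain_steps)
print("with momentum steps:", momentum_steps)
```

With these settings the momentum run leaves the saddle in noticeably fewer steps, because the small gradient along y keeps adding to the velocity instead of being applied once and forgotten.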

Model Compression The tutorial on Hardware for Deep Learning was less about new hardware and more about how to make your software get the most out of existing hardware. Due to the high cost of uncached, off-chip memory reads, reducing the memory footprint of your models can be a huge performance win. Bill Dally presented a result on model pruning that I found interesting: by iteratively removing small weights from a model and retraining, they were able to remove 90+% of the weights with no loss of accuracy. This parallels an observation from transfer learning, that small networks are most effectively trained using the output of larger networks. It would be nice if we could train these smaller networks directly. See the Deep Compression paper.
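The prune-and-retrain loop described above can be sketched roughly as follows. This is an illustration of the general idea, not the procedure from the Deep Compression paper: the flat weight list, the per-round pruning fraction, and the `retrain` callback are all assumptions of mine, and the placeholder `retrain` used here does no real training.

```python
import random


def prune_smallest(weights, mask, fraction):
    """Zero out `fraction` of the still-active weights with the smallest magnitude."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(active) * fraction)
    for i in sorted(active, key=lambda i: abs(weights[i]))[:n_prune]:
        mask[i] = False
        weights[i] = 0.0
    return weights, mask


def iterative_prune(weights, retrain, rounds=5, fraction=0.4):
    """Alternate pruning and retraining; `retrain` must leave masked weights at zero."""
    weights = list(weights)
    mask = [True] * len(weights)
    for _ in range(rounds):
        weights, mask = prune_smallest(weights, mask, fraction)
        weights = retrain(weights, mask)
    return weights, mask


def no_op_retrain(weights, mask):
    """Hypothetical stand-in: a real version would fine-tune the surviving weights."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


rng = random.Random(0)
original = [rng.gauss(0.0, 1.0) for _ in range(100)]
pruned, mask = iterative_prune(original, no_op_retrain)

print("weights kept:", sum(mask), "of", len(original))
```

Pruning a moderate fraction per round, instead of everything at once, gives the retraining step a chance to compensate for each batch of removed weights before the next batch goes.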

The importance of canonical data sets / problems Over and over, talks and posters referenced the same canonical data sets: the MNIST set of handwritten digits, the CIFAR and ImageNet images, the TIMIT speech corpus and the Atari/Arcade Learning Environment (ALE). These have given researchers in their fields a shared problem on which to experiment, compete, collaborate and measure their progress. If you want to push a field forward, build a good challenge problem.

One-shot Learning There was much high-level talk about how the human brain is very good at learning to perform new tasks quickly. Contrast this with neural nets, which require thousands or millions of training examples to reach human performance. This comparison is somewhat unfair because adult humans have years of experience interacting with the real world to draw on. There seems to be a great deal of interest in getting machines to do a better job of transferring general knowledge to specific tasks.