The Future of Deep Learning

In a presentation given by Andrew Ng about this very topic, he talked about two trends in the deep learning community: scale and end-to-end deep learning.

For most people even mildly interested in deep learning, scale wouldn't come as a surprise. Over the last 10 to 20 years we have acquired a lot more data; we have reached a point where nearly everything is somehow logged and stored in a database somewhere. Likewise, computing power has increased over the same period. One way of visualizing this is to look at graphics in games: 20 years ago, Doom was all the rage, whereas now it's actually hard to tell reality from CGI.

Much of the theory surrounding deep learning has been around for a long time, but there wasn't enough data, or powerful enough computers, for it to be viable. In fact, until recently neural networks and deep learning weren't thought to be efficient or practical, and the field didn't have the same sexiness that it has today, largely because the methods simply weren't practical at the time.

But an increase in computing power alone doesn't account for why neural networks have scaled so well, both horizontally (how many neurons are in each layer) and vertically (how many layers there are).

It turns out there's a correlation between how large a neural network is and how well it scales. Small neural networks tend to improve only slightly when more computing power is added and training continues for more epochs (one epoch is one full pass over all the training data), whereas medium-sized neural networks benefit more from additional computing power.
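To make the idea of an epoch concrete, here is a minimal pure-Python sketch: a toy linear model trained with plain stochastic gradient descent, where each epoch is exactly one loop over the full training set. All names and numbers here are illustrative, not from any particular framework:

```python
import random

# Toy training setup: 100 samples of a noiseless linear task.
random.seed(0)
true_w = [1.0, -2.0, 0.5]
data = []
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = sum(wi * xi for wi, xi in zip(true_w, x))
    data.append((x, y))

w = [0.0, 0.0, 0.0]   # model parameters, learned from scratch
lr = 0.1              # learning rate


def run_epoch(data, w, lr):
    """One epoch: a single loop over every training example (plain SGD)."""
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of the squared error with respect to each weight.
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


for epoch in range(20):   # 20 epochs = 20 full passes over the same data
    w = run_epoch(data, w, lr)
```

After a handful of epochs the learned weights approach `true_w`; the point is simply that "more epochs" means "more loops over the same data", not more data.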

This trend seems to continue no matter how large the neural networks become, and can be described by the following graph:

(Graph is only for illustrative purposes, and is not to scale)

This relationship has to do with how much information can be encoded into a neural network. It turns out it's actually not that much, which is why they tend to generalize well on large amounts of data, but that's another story.

Furthermore, there's a loose relationship between how deep you want your neural network to be and how wide it needs to be. As a general rule of thumb, you want to be able to draw a diagonal line from the topmost input neurons to one of the output neurons.

Given the massive benefits of scale, as evidenced by the numerous consumer products now using deep learning, this trend is likely to continue in the near future.

The other major trend is end-to-end deep learning.

Traditionally, a machine learning model would have a simple output, such as a binary label (is the review positive or negative?) or, in object recognition, an integer class (is this a dog, a cat, or a person?).

With end-to-end deep learning you can output more complex things: for example, going straight from an image to a string of text describing what's in the image, or from audio straight to a text transcript. Traditionally, the audio would first be split into phonemes (the basic units of sound), which were then mapped to text.
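To make the contrast concrete, here is a toy sketch. Everything in it is hypothetical: the "audio" is just a list of pre-labelled frames and both models are hard-wired stubs, so it only illustrates the difference in structure between a staged pipeline and a single end-to-end mapping, not a real transcription system:

```python
# Stand-in "audio": each frame is already tagged with a phoneme id.
PHONEME_TO_CHAR = {0: "c", 1: "a", 2: "t"}


def audio_to_phonemes(frames):
    """Stage 1 of the traditional pipeline: a hand-engineered acoustic
    model maps audio to phonemes (stub: identity)."""
    return [f for f in frames]


def phonemes_to_text(phonemes):
    """Stage 2: map the intermediate phoneme representation to text."""
    return "".join(PHONEME_TO_CHAR[p] for p in phonemes)


def pipeline_transcribe(frames):
    """Two separately built stages, glued via phonemes."""
    return phonemes_to_text(audio_to_phonemes(frames))


def end_to_end_transcribe(frames):
    """An end-to-end model would learn the whole audio-to-text map from
    data; this stub just hard-wires the same answer to show the interface."""
    return "".join(PHONEME_TO_CHAR[f] for f in frames)


print(pipeline_transcribe([0, 1, 2]))     # staged route
print(end_to_end_transcribe([0, 1, 2]))   # direct route
```

The outputs agree; the difference is that the pipeline commits to a human-designed intermediate representation (phonemes), while the end-to-end model is free to learn whatever internal representation fits the data.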

Another place where end-to-end deep learning is useful is machine translation: going straight from one language to another.

However, end-to-end deep learning is not the solution to everything. For example, if you wanted to make a model that predicted a person's age from X-ray images of their hand bones, it would be difficult using an end-to-end strategy; there's simply not enough data.



The Achilles' heel of deep learning: you need a lot of labeled data.

While it may not be widespread yet, end-to-end deep learning is likely to become more common as we gather more labeled data.



Another place where end-to-end deep learning can potentially have a lot of impact is self-driving cars. The traditional approach takes an image as input, locates the objects in the image, finds a trajectory, and, finally, decides how to steer. The end-to-end approach takes an image as input and outputs the steering directly.
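In code, that contrast might be sketched as follows. Everything below is a hypothetical, heavily simplified stand-in: a tiny 3×3 "image", stub stages in place of real perception and planning, and a single linear layer in place of a deep network. It only illustrates the difference in interface between the staged pipeline and the single learned map:

```python
def detect_objects(image):
    """Stage 1: locate objects (stub: treat bright pixels as obstacles)."""
    return [(r, c) for r, row in enumerate(image)
            for c, v in enumerate(row) if v > 0.5]


def plan_trajectory(objects, width):
    """Stage 2: pick a lateral target column that avoids obstacles (stub)."""
    blocked = {c for _, c in objects}
    free = [c for c in range(width) if c not in blocked]
    return free[len(free) // 2] if free else width // 2


def steering_from_trajectory(target, width):
    """Stage 3: convert the target column to a steering angle in [-1, 1]."""
    return (target - (width - 1) / 2) / ((width - 1) / 2)


def traditional_pipeline(image):
    """Image -> objects -> trajectory -> steering, in separate stages."""
    width = len(image[0])
    objects = detect_objects(image)
    target = plan_trajectory(objects, width)
    return steering_from_trajectory(target, width)


def end_to_end(image, weights):
    """One learned map from raw pixels straight to steering, with no
    intermediate stages; a single linear layer stands in for a deep net."""
    pixels = [v for row in image for v in row]
    return sum(w * p for w, p in zip(weights, pixels))


image = [[0.0, 0.9, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]          # one "obstacle" in the middle column
angle = traditional_pipeline(image)
```

The pipeline exposes (and depends on) every intermediate step, which makes it debuggable but locks in design decisions; the end-to-end version has one opaque map whose behaviour comes entirely from the training data.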

However, as of yet there still isn't enough data to make end-to-end deep learning a viable technique for powering self-driving cars.



Cover photo by Shan Sheehan

