Today we continue the NeuroNuggets series with a new installment. This is the first time when a post written by one of our deep learning researchers was so long that we had to break it up into two parts. In the first part, we discussed the notion of a computational graph and what functionality should a deep learning framework have; we found out that they are basically automated differentiation libraries and understood the distinction between static and dynamic computational graphs. Today, meet again Oktai Tatanov, our junior researcher in St. Petersburg, who will be presenting a brief survey of different deep learning frameworks, highlighting their differences and explaining our choice:

Comparative popularity

Last time, we finished with this graph published by the famous deep learning researcher Andrej Karpathy; it shows comparative popularity of deep learning frameworks in the academic community (mentions in research papers):

Unique mentions of deep learning frameworks in arXiv papers (full text) over time, based on 43K ML papers over last 6 years. Source

We see that the top 4 general-purpose deep learning frameworks right now are TensorFlow, Caffe, Keras, and PyTorch. Today, we will discuss the similarities and differences between them and help you make the right choice of a framework.

Tensorflow

TensorFlow is probably the most famous deep learning framework; it is being developed and maintained by Google. It is written in C++/Python and provides Python, Java, Go and JavaScript API. TensorFlow uses static computational graphs, although a recently released TensorFlow Fold library has added support for dynamic graphs as well. Also, since version 1.7 TensorFlow took a different step towards dynamic execution and implemented eager execution that can evaluate Python code immediately, without building graphs.

At present, TensorFlow has gathered the largest deep learning community around it, so there are a lot of videos, online courses, tutorials, and so on. It offers support for running models on multiple GPUs and can even split a single computational graph over multiple machines in a computational cluster.

Apart from purely computational features, TensorFlow provides an awesome extension called TensorBoard that can visualize the computational graph, plot quantitative metrics about the execution of model training or inference, and basically provide all sorts of information necessary to debug and fine-tune a deep neural network in an easier way.

Plenty of data scientists consider TensorFlow to be the primary software tool of deep learning, but there are also some problems. Despite the big community, learning is still difficult for beginners, and many experts agree that other mainstream frameworks are faster than TensorFlow.

As an example of implementing а simple neural network, look at the following:

It’s not so elementary for beginners, but it shows the main concepts in TensorFlow, so let us try to focus on the code structure only first. We begin by defining the computational graph: placeholders, variables, operations (maximum, matmul) and the loss function at the end. Then we assign an optimizer that defines what and how we want to optimize. And finally, we train our graph over and over in a special execution environment called a session.

Unfortunately, if you want to improve the network’s architecture with conditionals or loops (this is especially useful, even essential for recurrent neural networks), you cannot simply use python keywords. As you already know, a static graph is constructed and compiled once, so to add nodes to the graph you should use special control flow or higher order operations.

For instance, to add a simple conditional to our previous example, we have to modify the previous code like this:

Caffe

The Caffe library was originally developed at UC Berkeley; it was written in C++ with a Python interface. An important distinctive feature of Caffe is that one can train and deploy models without writing any code! To define a model, you just edit configuration files or use pre-trained models from the Caffe Model Zoo, where you can find most established state-of-the-art architectures. Then, to train a model you just run a simple script. Easy!

To show how it works (at least approximately), check out the following code:

We define the neural network as a set of blocks that correspond to layers. At first, we see a data layer where we specify the input shape, then two fully connected layers with ReLU activations. At the end, we have a softmax layer where we get the probability for every class in the data, e.g., 10 classes for the MNIST dataset of handwritten digits.

In reality, Caffe is rarely used for research but is quite often used in production. However, its popularity is waning because there is a new great alternative, Caffe2 which we will touch upon a little when we talk about PyTorch.

Keras

Keras is a high-level neural network library written in Python by Francois Chollet, currently a member of the Google Brain team. It works as a wrapper over one of the low-level libraries such as TensorFlow, Microsoft Cognitive Toolkit, Theano or MXNet. Actually, for quite some time Keras has been shipped as a part of TensorFlow.

Keras is pretty simple, easy to learn and to use. Thanks to brilliant documentation, its community is big and very active, so beginners in deep learning like it. If you do not plan to do complicated research and develop new extravagant neural networks that Keras might not cover, then we heartily advise to consider Keras as your primary tool.

However, you should understand that Keras is being developed with an eye towards fast prototyping. It is not flexible enough for complicated models, and sometimes error messages are not easy to debug. We implemented on Keras the same neural network which we did on TensorFlow. Look:

What immediately jumps out in this example is that our code has been reduced a lot! No placeholders, no sessions, we only write concise informative constructions, but, of course, we lose some extensibility due to extra layers of abstraction.

PyTorch

PyTorch was released by Facebook’s artificial-intelligence research group for Python, based on Torch (previous Facebook’s framework for Lua). It is the main representative of dynamic graph.

PyTorch is pythonic and very developer-friendly. The memory usage in PyTorch is extremely efficient for any neural networks. It is also said to be a bit faster than TensorFlow.

It has a responsive forum where you can ask any question and extensive documentation with a lot of official tutorials and examples, however, the community is still quite smaller as opposed to TensorFlow. Sometimes you can’t find implementation of contemporary model on PyTorch, but easy to see two or three on TensorFlow. Anyway, this framework is considered as a best choice to research.

Quite surprisingly, since May of 2018, PyTorch project was merged with Caffe2, successor of Caffe which actively developed by Facebook for production exactly. It means for supporters these frameworks that bottleneck between researchers and developers will be vanished.

Now look at this code below that shows simple way to “touch” PyTorch:

Here we initialize randomly our trial data and target, then assign model and optimizer. The last block executes training: every time calculates answer from model and change weights with SGD. It looks like Keras: easy read, but we don’t lost ability to write complicated neural networks.

Thanks for dynamic graph, PyTorch are integrated in Python more than TensorFlow. So you can write conditionals and loops like as in ordinary python program.

You can see it when try to realize, for example, simple recurrent block that we represent as hi=hi-1·xi:

The Neuromation choice

Our Research Lab at St. Petersburg mostly prefers PyTorch. For instance, we have used it for computer vision models that we applied to price tag segmentation. Here is a sample result:

But sometimes, especially in cases when PyTorch does not have a ready solution for something yet, we create our models in TensorFlow. The main idea of Neuromation idea is to train neural networks on synthetic data. We are convinced that a great result on real data can be obtained with transfer learning from perfectly labeled synthetic datasets. Have a look at some of our results for the segmentation of retail items based on synthetic data:

Conclusion

There are several deep learning frameworks, and we could go into a lot more detail about which to prefer. But, of course, frameworks are just tools to help you develop neural networks, and while the differences are important they are, of course, secondary. The primary tool in developing modern machine learning solutions is the neural network in your brain: the more you know, the more you think about machine learning solutions from different angles, the better you get. Knowing several deep learning frameworks can also help broaden your horizons, especially when the top contenders are as different as Theano and PyTorch. So it pays to learn them all even if your primary tool has already been chosen for you (e.g., your team uses a specific library). Good luck with your networks!

Oktai Tatanov

Junior Researcher, Neuromation

Sergey Nikolenko

Chief Research Officer, Neuromation