Summary: Performance comparison for the popular Deep Learning frameworks supported by Keras – TensorFlow, CNTK, MXNet and Theano

If there are any doubts about the popularity of Keras among data scientists and engineers, or about the mindshare it commands, you just need to look at the support it has been receiving from all the major AI and cloud players. The official Keras release already supports Google's TensorFlow and Microsoft's CNTK deep learning libraries, in addition to other popular libraries like Theano. Last year Amazon Web Services announced its support for Apache MXNet, another powerful deep learning library, and a few weeks ago support for Keras was added to MXNet's next release candidate. As of now, MXNet only seems to support Keras v1.2.2 and not the current Keras release 2.0.5.

Although it is possible to deploy Keras models in production with any of the supported backends, developers and solution architects should keep in mind that Keras, being a high-level API over the different DL frameworks, doesn't yet expose every underlying parameter offered by the individual libraries. In use cases where you want to fine-tune all the parameters offered by a backend framework, you might be better off using that framework directly instead of using Keras as the top layer. This of course might change in the future as additional functionality is added to both Keras and the backend libraries. Having said that, Keras remains an excellent tool for the earlier stages of most deep learning projects, as it enables data scientists and engineers to quickly build and test complex deep learning models.

Keras also lets developers quickly compare relative performance across the supported deep learning frameworks. A single parameter in the Keras configuration file determines which framework is used as the backend, so you can build one model and run it on TensorFlow, CNTK and Theano without changing any code. MXNet currently supports only Keras v1.2.2, so some minimal code changes are required, though this might change soon. Each framework can of course be tuned further using features of the individual library, but Keras still provides an excellent opportunity to compare baseline performance between them.
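As a rough illustration of the configuration switch described above: Keras reads its backend from a JSON file, normally ~/.keras/keras.json, under the "backend" key, and the KERAS_BACKEND environment variable takes precedence over the file. The helper name set_keras_backend and the temporary demo path are made up for this sketch, which is not taken from the test notebooks.

```python
import json
import os
import tempfile


def set_keras_backend(config_path, backend):
    """Set the "backend" key in a Keras JSON config, keeping any other keys."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demo against a throwaway file rather than the real ~/.keras/keras.json.
demo_path = os.path.join(tempfile.mkdtemp(), "keras.json")
for name in ("tensorflow", "cntk", "theano"):
    updated = set_keras_backend(demo_path, name)
    print("backend is now:", updated["backend"])
```

For a one-off run, setting the environment variable (for example KERAS_BACKEND=cntk before launching the script) avoids editing the file at all.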

There have already been a few articles comparing the relative performance of the backends supported by Keras, but with each new release of Keras or of the individual deep learning libraries we are seeing drastic improvements in performance.

So let’s see how the recent releases of the different deep learning frameworks performed in this matchup.

Let’s first go over the setup used for the tests.

All performance tests were executed on an Azure NC6 VM with an Nvidia Tesla K80 GPU. The VM image used was the Azure DSVM (Data Science Virtual Machine) on Ubuntu. The image comes pre-installed with Keras, TensorFlow, Theano and MXNet, along with other data science tools. For the tests, all packages were updated to recent releases. To use MXNet, the older Keras package 1.2.2 was used.

Configuration:

Due to dependencies for each framework, I had to run the tests in three configurations, as shown below:

Software configuration by DL framework:

TensorFlow and CNTK - Keras: Version 2.0.8, TensorFlow: Version 1.3.0, CNTK: Version 2.1, NVIDIA-CUDA Driver: v8.0.61, CUDNN: v6.0.21

MXNet - Keras: Version 1.2.2, MXNet: Version 0.11.0, NVIDIA-CUDA Driver: v8.0.61, CUDNN: v6.0.21

Theano - Keras: Version 2.0.8, Theano: Version 0.9.0, NVIDIA-CUDA Driver: v8.0.61, CUDNN: v5.1.10

VM configuration (same for all three):

Azure NC6 VM

GPU - Nvidia Tesla K80

vCPU - 6

Memory - 56GB

HDD - 380 GB

For all frameworks, the latest available stable releases were used for testing. All of them also have beta versions available that claim to improve performance and are probably fine for research purposes, but for production applications the preference is to use stable releases. Hence these beta versions are not included in the performance tests.

Performance Tests:

To compare the performance of the DL frameworks, I used 5 different test models described below. To ensure no particular framework got any special treatment, all models were sourced from Keras/examples repository maintained on GitHub.

The test code/notebooks are available in my GitHub repo - https://github.com/jasmeetsb/deep-learning-keras-projects

Note: In two of the tests, MXNet was left out. This is because MXNet doesn’t yet support newer Keras functions and scripts would have needed significant changes before running on MXNet. This would have defeated the purpose of this exercise. Even the 3 tests that were run on MXNet needed some minimal changes to the script, mostly due to renaming of some of the Keras functions in their recent releases.
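The post does not say how the seconds-per-epoch figures were captured, so the following is only a sketch of one way to record them. EpochTimer is a hypothetical name; it mirrors the on_epoch_begin/on_epoch_end hooks of a Keras callback, and to be passed to model.fit it would need to subclass keras.callbacks.Callback. Here a short sleep stands in for the training work so the example runs without any framework installed.

```python
import time


class EpochTimer:
    """Records wall-clock seconds per epoch using Keras-style epoch hooks."""

    def __init__(self):
        self.epoch_seconds = []
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_seconds.append(time.perf_counter() - self._start)

    def mean_seconds(self):
        return sum(self.epoch_seconds) / len(self.epoch_seconds)


# Stand-in training loop: the sleep is a placeholder for one epoch of work.
timer = EpochTimer()
for epoch in range(3):
    timer.on_epoch_begin(epoch)
    time.sleep(0.01)
    timer.on_epoch_end(epoch)

print("mean s/epoch: %.4f" % timer.mean_seconds())
```

Averaging over several epochs, and possibly discarding the first one, helps smooth out one-time costs such as graph compilation that some backends pay on the first pass.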

1. Test - CIFAR10 CNN

Learning Model Type: Convolutional Neural Network (CNN)

Datasets/Tasks: CIFAR10 small images dataset

Objective: Classify images into 10 classes

In terms of training speed per epoch, TensorFlow wins by a slight margin over MXNet.

In terms of accuracy/convergence speed, CNTK seems to have a slight edge until the 25th iteration, but by the 50th iteration all frameworks display similar accuracy.

2. Test - MNIST CNN

Learning Model Type: CNN

Datasets/Tasks: MNIST handwritten digit dataset

Objective: Classify images into 10 classes/digits

In this test TensorFlow is the clear winner in terms of training speed, but in terms of accuracy/convergence speed all frameworks behave similarly.

3. Test - MNIST MLP

Learning Model Type: Multilayer Perceptron/Deep NN

Datasets/Tasks: MNIST handwritten digit dataset

Objective: Classify images into 10 classes/digits

In a standard deep neural network test using the MNIST dataset, CNTK, TensorFlow and Theano achieve similar training speeds (2.5 – 2.7 s/epoch), but MXNet blows them out of the water at 1.4 s/epoch. MXNet also shows a slight edge in accuracy/convergence speed.
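To put the MLP timings quoted above into relative terms, a small calculation helps. The function name speedup is just for illustration; the inputs are the s/epoch figures from this test.

```python
def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is, given seconds per epoch."""
    return baseline_s / candidate_s


mxnet_s = 1.4
for other_s in (2.5, 2.7):
    ratio = speedup(other_s, mxnet_s)
    print(f"{other_s} s/epoch vs {mxnet_s} s/epoch: {ratio:.2f}x")
```

On these numbers MXNet comes out roughly 1.8x to 1.9x faster per epoch than the other three backends.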

4. Test - MNIST RNN

Learning Model Type: Hierarchical Recurrent Neural Network (HRNN)

Datasets/Tasks: MNIST handwritten digit dataset

Objective: Classify images into 10 classes/digits

CNTK and MXNet have similar performance (162 – 164 s/epoch) in terms of training speed followed by TensorFlow at 179s/epoch. Theano seems to lag significantly for RNN models.

5. Test - BABI RNN

Learning Model Type: Recurrent Neural Network (RNN)

Datasets/Tasks: bAbI Project (https://research.fb.com/downloads/babi/)

Objective: Train two recurrent neural networks based upon a story and a question. The resulting merged vector is then queried to answer a range of bAbI tasks.

Results: MXNet was left out as the sample script from the Keras repo required changes. TensorFlow and Theano have comparable performance, with CNTK being 50% faster at 9.5 s/epoch.

Summary:

TensorFlow performed best in both CNN tests but lagged behind in the RNN tests.

CNTK performed significantly better than TensorFlow and Theano in the bAbI RNN and MNIST RNN tests but was slower than TensorFlow in the CNN tests.

MXNet seemed to perform slightly better than CNTK and TensorFlow in the MNIST RNN test and significantly better than all other frameworks in the MLP test, but the fact that it does not support Keras v2 functions makes a straight-up comparison difficult without modifying the test code. This will hopefully be resolved soon.

Theano had a marginal edge over TensorFlow and CNTK in the deep neural network (MLP) test.

Conclusion: As the results above show, every framework has its areas of strength, but at this moment no single framework outperforms all others across the board. CNTK shines in RNN use cases and TensorFlow in CNN use cases, while MXNet, although showing very promising performance, still has some ground to cover before it supports all Keras functionality. As is often the case in the open source world, all these frameworks are constantly being enhanced, which improves their performance and makes them easier to use and deploy in production. When considering these deep learning frameworks for production, performance is a prime consideration, but in most cases you also need to consider ease of deployment and the auxiliary tools that come with each framework to help you manage your production machine learning models. That discussion probably requires a post of its own, but I hope the analysis above at least provides some additional insight.