So you are exploring the intricate world of RNNs and their applications to NLP or stock-price prediction, when you see the training times some of these models require (even with a GPU). Here I will show you a little trick to improve training times without (much) changing your initial model or setup.

LSTMs

So, as always, let's start at the beginning: if you are using RNNs, 99% of the time you will be using LSTM or GRU layers (we will use LSTMs as examples here, but everything transfers directly to GRUs).

A visual representation of what an LSTM looks like.

Implementing these in Keras is relatively trivial:

import keras

length, features = 80, 1  # placeholder values: timesteps and features per step

model = keras.models.Sequential()
model.add(keras.layers.LSTM(10, input_shape=(length, features)))



This code adds an LSTM layer with 10 units to our model.

Compile the model and you are set to go…. Or are you?
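For completeness, a minimal compile step might look like this (the loss and optimizer are illustrative choices, not ones the article prescribes):

# Illustrative loss/optimizer; any standard combination works here.
model.compile(loss='binary_crossentropy', optimizer='adam')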

CPU vs GPU

So let's run some tests of training times on a standard dataset (IMDB) to see just how slow these layers are.



A reminder that the point here is not to build an efficient model, just a “big” one.

Original test code
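The embedded script is not reproduced here, but a minimal sketch of this kind of benchmark could look like the following (the layer sizes, sequence length and hyperparameters are my assumptions, not the exact values of the original test):

import time
import keras
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 20000  # vocabulary size (assumed value)
maxlen = 80           # pad/truncate reviews to this length (assumed value)

(x_train, y_train), _ = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)

model = keras.models.Sequential()
model.add(keras.layers.Embedding(max_features, 128))
model.add(keras.layers.LSTM(128))  # the layer being benchmarked
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

start = time.time()
model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)
print('Train time per epoch: %.1f seconds' % (time.time() - start))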

Switching between environments with the CPU and GPU TensorFlow builds, we can see a clear speedup:



Train time per epoch on CPU: 43600 seconds.

Train time per epoch on GPU: 130 seconds.



That is an impressive speed-up, so remember: if you are training recurrent models, be sure to use a GPU…



But what if that is not enough? Is there anything else you can do?



GPU + CuDNN

So if you have installed the GPU version of TensorFlow, you must have encountered the “cublas.so.9.0 cannot be found” or “CuDNN not installed” errors. These libraries are required by the GPU version of TensorFlow, but the cuDNN library (NVIDIA’s own optimized library for deep neural networks) is not actually always used by default.
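Before swapping any layers it is worth checking that TensorFlow actually sees your GPU; in the TF 1.x API that these layers belong to, a quick sanity check looks like this:

import tensorflow as tf

# Prints True only if the CUDA/cuDNN stack is installed and a GPU is visible.
print(tf.test.is_gpu_available())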



Enter keras.layers.CuDNNLSTM, a version of the LSTM layer that uses the cuDNN library.

Unfortunately there are some limitations to these layers (like not being able to choose the activation function), but if all you want is standard layers, they are essentially the same.
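As a sketch, converting the earlier model is a one-line change (length and features are the same placeholder shapes as before):

import keras

length, features = 80, 1  # placeholder input shape, as in the first snippet

model = keras.models.Sequential()
# Drop-in replacement for keras.layers.LSTM. Note there is no activation
# argument: the cuDNN kernel is hard-wired to tanh.
model.add(keras.layers.CuDNNLSTM(10, input_shape=(length, features)))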



So… how fast are they? Well, after changing every keras.layers.LSTM to keras.layers.CuDNNLSTM and rerunning the script, we find it takes only 100 seconds!

Test code with CuDNNLSTM


So let's try different batch sizes and plot the results!
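A sketch of how such a sweep might be scripted (the batch sizes and model are my assumptions; timing is plain wall-clock measurement):

import time
import keras
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features, maxlen = 20000, 80  # assumed values, as in the earlier sketch
(x_train, y_train), _ = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)

def time_one_epoch(layer_cls, batch_size):
    # Build a small IMDB model around layer_cls and time a single epoch.
    model = keras.models.Sequential()
    model.add(keras.layers.Embedding(max_features, 128))
    model.add(layer_cls(128))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    start = time.time()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    return time.time() - start

# CuDNNLSTM needs a GPU build of TensorFlow to run at all.
for batch_size in [32, 64, 128]:
    for layer_cls in [keras.layers.LSTM, keras.layers.CuDNNLSTM]:
        t = time_one_epoch(layer_cls, batch_size)
        print('%s, batch size %d: %.1f s' % (layer_cls.__name__, batch_size, t))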



Training time of LSTM (CPU), LSTM (GPU) and CuDNNLSTM (GPU) on IMDB.

As you can see, you get an impressive speedup relative to the CPU time. But if we look at the times in relative terms, we get the following graph:

Comparison of GPU and CuDNN training times relative to the CPU times.

One comment: perhaps a batch size of 128 is too large for my CPU, which would explain the increased training time there.

Conclusions

We can clearly see the effect that the hardware choice has on the training times of our models, but also that implementation changes can have a deep impact.

CuDNNLSTMs show impressive speedups even compared to the plain GPU training times, and are a really easy way to speed up your models.

See the Keras web documentation (https://keras.io) or the Keras source code (https://github.com/keras-team/keras) for more details.