GPU + Deep Learning = ❤️ (but why?)

Deep Learning (DL) is part of the field of Machine Learning (ML). DL works by approximating a solution to a problem using neural networks. One of the nice properties of neural networks is that they find patterns in the data (features) by themselves, as opposed to you having to tell your algorithm what to look for, as in the old days. However, this often means the model starts with a blank slate (unless we are transfer learning). To capture the nature of the data from scratch, the neural net needs to process a lot of information. There are two ways to do so: with a CPU or a GPU.

The main computational module in a computer is the Central Processing Unit (better known as the CPU). It is designed to do computation rapidly on a small amount of data. For example, multiplying a few numbers on a CPU is blazingly fast. But it struggles when operating on large amounts of data, e.g., multiplying matrices of tens or hundreds of thousands of numbers. Behind the scenes, DL mostly consists of operations like matrix multiplication.
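To see why large matrix products overwhelm a CPU, consider that multiplying two n-by-n matrices takes on the order of n³ multiply-adds. Here is a deliberately naive pure-Python sketch (real frameworks use highly optimized BLAS kernels, so the absolute times are not representative, but the cubic growth is):

```python
import time

def matmul(a, b):
    """Naive matrix multiply: roughly n^3 multiply-adds for n-by-n inputs."""
    n, p, m = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

def time_matmul(n):
    """Time a single n-by-n matrix product."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    return time.perf_counter() - start

# Quadrupling the matrix size multiplies the work by about 64 (4^3):
# small products are instant, large ones dominate the runtime.
small, large = time_matmul(20), time_matmul(80)
```

A GPU attacks this by computing thousands of those independent multiply-adds at the same time, instead of a handful at a time on CPU cores.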

Amusingly, 3D computer games rely on these same operations to render that beautiful landscape you see in Rise of the Tomb Raider. Thus, GPUs were developed to handle lots of parallel computations using thousands of cores. Also, they have a large memory bandwidth to deal with the data for these computations. This makes them the ideal commodity hardware to do DL on. Or at least, until ASICs for Machine Learning like Google’s TPU make their way to market.

All in all, while it is technically possible to do Deep Learning with a CPU, for any real results you should be using a GPU.

For me, the most important reason for picking a powerful graphics processor is saving time while prototyping models. If the networks train faster, the feedback cycle is shorter, and it is easier for my brain to connect the dots between the assumptions I had about the model and its results.

See Tim Dettmers’ answer to “Why are GPUs well-suited to deep learning?” on Quora for a better explanation. Also, for an in-depth, albeit slightly outdated, GPU comparison, see his article “Which GPU(s) to Get for Deep Learning”.

What to look for in a GPU?

The main characteristics of a GPU related to DL are:

Memory bandwidth: as discussed above, the ability of the GPU to handle large amounts of data. The most important performance metric.

Processing power: indicates how fast your GPU can crunch data. We will compute this as the number of CUDA cores multiplied by the clock speed of each core.

Video RAM size: the amount of data you can have on the video card at once. If you are going to work with Computer Vision models, you want this to be as large as affordable, especially if you want to enter some CV Kaggle competitions. The amount of VRAM is not so crucial for Natural Language Processing (NLP) or for working with categorical data.
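The “cores multiplied by clock speed” metric from the list above is easy to compute yourself. A small sketch, using made-up spec numbers purely for illustration (check the vendor spec sheets for real values):

```python
# Hypothetical card specs for illustration only, not real benchmark data.
CARDS = {
    "card_a": {"cuda_cores": 3584, "clock_mhz": 1480, "bandwidth_gbs": 484, "vram_gb": 11},
    "card_b": {"cuda_cores": 1920, "clock_mhz": 1683, "bandwidth_gbs": 256, "vram_gb": 8},
}

def processing_power(card):
    """Crude throughput proxy: CUDA cores x clock speed (converted to GHz)."""
    return card["cuda_cores"] * card["clock_mhz"] / 1000.0

# Rank cards by this proxy; memory bandwidth should still be weighed separately.
ranked = sorted(CARDS, key=lambda name: processing_power(CARDS[name]), reverse=True)
```

Note that this is only a rough proxy: two cards with similar cores-times-clock numbers can still differ in real training speed because of architecture generation and memory bandwidth.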

Potential Pitfalls

Multiple GPUs

There are two reasons for having multiple GPUs: you want to train several models at once, or you want to do distributed training of a single model. We’ll go over each one.

Training several models at once is a great technique to test different prototypes and hyperparameters. It also shortens your feedback cycle and lets you try out many things at once.

Distributed training, or training a single network on several video cards, is slowly but surely gaining traction. Nowadays, there are easy-to-use approaches for Tensorflow and Keras (via Horovod), CNTK, and PyTorch. The distributed training libraries offer almost linear speed-ups with the number of cards. For example, with 2 GPUs you get around 1.8x faster training.
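The “almost linear” claim can be written down as a back-of-the-envelope model. The 0.8 efficiency factor below is an assumption chosen to match the 1.8x-for-2-GPUs figure, not a measured number; real scaling depends on the model, the interconnect, and the library:

```python
def estimated_speedup(num_gpus, efficiency=0.8):
    """Simple linear scaling model (an assumption, not a benchmark):
    the first GPU runs at full speed, each extra card adds a fraction of it."""
    return 1.0 + (num_gpus - 1) * efficiency

# With the assumed 0.8 efficiency: 2 GPUs -> ~1.8x, 4 GPUs -> ~3.4x.
two_gpu = estimated_speedup(2)
four_gpu = estimated_speedup(4)
```

The takeaway is that each additional card buys you less than a full card's worth of speed, so the per-GPU return diminishes as you scale up.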

PCIe Lanes (Updated): The caveat to using multiple video cards is that you need to be able to feed them with data. For this purpose, each GPU should have 16 PCIe lanes available for data transfer. Tim Dettmers points out that having 8 PCIe lanes per card should only decrease performance by “0–10%” for two GPUs.
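To get a feel for what halving the lanes costs, you can estimate host-to-GPU transfer time from the per-lane bandwidth. PCIe 3.0 delivers roughly 0.985 GB/s per lane after encoding overhead; the batch size below is a hypothetical example:

```python
# Approximate PCIe 3.0 throughput per lane (8 GT/s with 128b/130b encoding).
GBPS_PER_LANE = 0.985

def transfer_time_ms(batch_mb, lanes):
    """Approximate time to push one batch from host RAM to the GPU."""
    return batch_mb / 1000.0 / (lanes * GBPS_PER_LANE) * 1000.0

# A hypothetical 256 MB batch: 16 lanes vs 8 lanes per card.
t16 = transfer_time_ms(256, 16)
t8 = transfer_time_ms(256, 8)
```

Halving the lanes doubles the raw transfer time, but since transfers overlap with computation in practice, the end-to-end training slowdown is much smaller, which matches the “0–10%” figure quoted above.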

For a single card, any desktop processor and chipset like Intel i5 7500 and Asus TUF Z270 will use 16 lanes.

However, for two GPUs, you can either run them at 8x/8x lanes or get a processor AND a motherboard that support 32 PCIe lanes. 32 lanes are outside the realm of desktop CPUs; an Intel Xeon with an MSI X99A SLI PLUS will do the job.

For 3 or 4 GPUs, go with 8x lanes per card and a Xeon with 24 to 32 PCIe lanes.

To have 16 PCIe lanes available for 3 or 4 GPUs, you need a monstrous processor. Something in the class of an AMD ThreadRipper (64 lanes) with a corresponding motherboard.

Also, for more GPUs you need a faster processor and hard disk to be able to feed them data quickly enough, so they don’t sit idle.

Note on Nvidia or AMD

Nvidia has been focusing on Deep Learning for a while now, and the head start is paying off. Their CUDA toolkit is deeply entrenched. It works with all major DL frameworks — Tensorflow, Pytorch, Caffe, CNTK, etc. As of now, none of these work out of the box with OpenCL (the CUDA alternative), which runs on AMD GPUs. I hope support for OpenCL comes soon, as there are great inexpensive GPUs from AMD on the market. Also, some AMD cards support half-precision computation, which effectively doubles their performance and usable VRAM size.

Currently, if you want to do DL and want to avoid major headaches, choose Nvidia.

Additional Hardware

Your GPU needs a computer around it:

Hard Disk: First, you need to read the data off the disk. An SSD is recommended here, but an HDD can work as well.

CPU: That data might have to be decoded by the CPU (e.g. jpegs). Fortunately, any mid-range modern processor will do just fine.

Motherboard: The data passes via the motherboard to reach the GPU. For a single video card, almost any chipset will work. If you are planning on working with multiple graphic cards, read this section.

RAM: It is recommended to have 2 gigabytes of memory for every gigabyte of video card RAM. Having more certainly helps in some situations, like when you want to keep an entire dataset in memory.

Power supply: It should provide enough power for the CPU and the GPUs, plus 100 watts extra.
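The power supply rule of thumb above is simple enough to capture in a one-liner. The TDP numbers in the usage example are hypothetical placeholders; look up the real figures for your parts:

```python
def psu_watts(cpu_tdp, gpu_tdps, headroom=100):
    """Rule of thumb from the text: CPU power draw plus all GPUs' power draw
    plus ~100 W of extra headroom for the rest of the system."""
    return cpu_tdp + sum(gpu_tdps) + headroom

# Example: a hypothetical 65 W CPU with one 250 W GPU -> a 415 W minimum,
# so a quality 500-550 W unit would leave comfortable margin.
single_gpu_build = psu_watts(65, [250])
dual_gpu_build = psu_watts(65, [250, 250])
```

Rounding up to the next common PSU size also leaves room to add a second card later.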

You can get all of this for $500 to $1000. Or even less if you buy a used workstation.

GPUs Comparison

Here is a performance comparison between all the cards. Check the individual card profiles below. Notably, the performance of the Titan XP and the GTX 1080 Ti is very close despite the huge price gap between them.