Key factors in machine learning research are the speed of computation and the repeatability of results. Faster computation boosts research efficiency, while repeatability is important for controlling and debugging experiments. For a broad class of machine learning models, including neural networks, training can be sped up significantly by running on GPUs. However, this speedup can come at the expense of repeatability. For example, in TensorFlow, a popular framework for training machine learning models, repeatability is not guaranteed for a common set of operations on the GPU. Two Sigma researcher Ben Rossi demonstrates this problem for a neural network trained to recognize MNIST digits, and presents a workaround that enables training the same network on GPUs with repeatable results.

Note that these results were produced on the Amazon cloud with the Deep Learning AMI and TensorFlow 1.0, running on a single NVIDIA Grid K520 GPU.

Caveat Emptor. The lack of repeatability is often acceptable, since the final result (say, in terms of out-of-sample accuracy) is not significantly affected by GPU non-determinism. However, we have found that in some applications the final result can be quite different due to this non-determinism. This is because small differences introduced by GPU non-determinism can accumulate over the course of training, leading to rather different final models. In such cases, the method outlined in this article can be used to ensure repeatability of experiments. That said, repeatability achieved with this method is not a panacea, since it does not address the fundamental instability of training. A fuller discussion of that topic is outside the scope of this article, but in our view the chief benefit of repeatability is to ensure that we have accounted for all sources of variation. Once we have that level of control, we can go back and explore more carefully the stability of training as a function of each source of variation. In particular, even GPU non-determinism may be explored in this deterministic setting by controlling how mini-batches are constructed. A second benefit of repeatability is that it makes it easier to write regression tests for the training framework.
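The accumulation of small differences has a simple root cause: floating-point addition is not associative, and non-deterministic GPU reductions may sum the same values in a different order on every run. The following pure-Python sketch (not part of the training code) illustrates the underlying arithmetic effect:

```python
# Floating-point addition is not associative: summing the same values in a
# different order can yield a slightly different result. Non-deterministic
# GPU reductions reorder such sums between runs, and over many training
# steps these tiny discrepancies can compound into different final models.
left = (0.1 + 0.2) + 0.3   # rounds to 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # rounds to 0.6
print(left == right)       # False
print(left - right)        # a one-ulp-scale discrepancy
```

During training, a discrepancy like this in a single gradient reduction feeds into the next parameter update, so the two runs' trajectories gradually diverge.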

Non-Determinism with GPU Training

Let us demonstrate the problem using the code that trains a simple MNIST network (shown in the Appendix) on the GPU. When we run the training code twice, we obtain the following output.