I am putting together another developer box and just got 2 Titan Vs. While waiting for the rest of the components to arrive, I swapped them in place of the 2x 1080 Ti in an existing machine.

I was surprised to see lower performance on my own workloads, so I went to reproduce it using NGC containers and standard examples. I'm using the latest nvidia-docker and the tensorflow:18.08-py3 pull. The system is running Ubuntu 18.04 with the 396.54 drivers.

In particular, the /workspace/nvidia-example/biglstm example shows it clearly.

2x Titan V: 10,878 wps (averaged over several hundred iterations).

2x GTX 1080 Ti: 12,475 wps (same averaging).

Which didn’t really make sense to me, but it is quite repeatable in my case.

I noticed some other anomalous performance too, such as in the plain cifar10 tutorial in TensorFlow. The 1080 Tis scale as expected: consistent with 1 GPU, and significantly faster with 2 GPUs (about 1.37 per 100 iterations with two cards).

However, the Titan V is significantly faster as a single GPU (about 0.68 per 100 iterations) and significantly slower with 2x Titan V (about 1.27 per 100 iterations). So 2x Titan V is faster than 2x 1080 Ti, but slower than a single Titan V, which is odd.

I monitored all of this using nvidia-smi dmon, and I even touched the cards to feel which ones were warming up, just to be certain the right GPUs were doing the work.
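For anyone who wants to double-check the same thing from inside the container, here is a minimal sketch (assuming the TF 1.x API that ships in the 18.08 container) that confirms TensorFlow actually sees both cards and logs which device each op lands on:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; both GPUs should show up here
# with their full names (TITAN V / GeForce GTX 1080 Ti).
for d in device_lib.list_local_devices():
    print(d.name, d.physical_device_desc)

# Run a trivial op with device-placement logging turned on, so the log
# shows which GPU the op actually runs on.
with tf.device('/gpu:1'):
    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(y)
```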

The machine is a little bit older: a Skylake i7-6700K on an Asus Z170A motherboard. With two cards installed, each GPU runs at PCIe x8, but that has never been an issue with the 1080 Tis.
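Since the multi-GPU runs have to move gradients across those x8 links, a very rough GPU0 -> GPU1 copy timing might say whether that path is the bottleneck. This is only a sketch (again assuming the TF 1.x API), and the number includes session overhead, so treat it as a ballpark:

```python
import time
import tensorflow as tf

# Very rough timing of a large GPU0 -> GPU1 tensor copy, i.e. the path the
# multi-GPU runs lean on for gradient exchange across the x8 PCIe links.
N = 64 * 1024 * 1024  # 64M floats, roughly 256 MB

with tf.device('/gpu:0'):
    src = tf.Variable(tf.random_normal([N]))
with tf.device('/gpu:1'):
    copied = tf.identity(src)        # forces a device-to-device transfer
    out = tf.reduce_sum(copied)      # keeps the copy from being skipped

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(out)                    # warm-up
    start = time.time()
    for _ in range(10):
        sess.run(out)
    gbytes = 10 * N * 4 / 1e9
    print('~%.1f GB/s GPU0 -> GPU1 (ballpark, includes session overhead)'
          % (gbytes / (time.time() - start)))
```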

The 1080 Tis are EVGA SC models with hybrid coolers (liquid-cooled GPU, with a fan over the card's voltage-regulator section). They stay quite cool, and EVGA ships them with a mild factory overclock.

The Titan Vs are right out of the box, no tweaks or anything.

I would have expected different results, with the Titan Vs besting, or at least equaling, the 1080 Ti cards.

I would also have expected the simple cifar10 model to scale better with 2 GPUs, ending up about the same or perhaps a little faster. Although with such small data and batch sizes, that part could be explained by the CPU overhead of managing data between the GPUs.
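One way to poke at that theory without the full tutorial is to time a toy model at a few batch sizes on a single card. If per-image throughput keeps climbing with batch size, the small cifar10 batches are probably leaving the Titan V starved and the numbers are dominated by per-step CPU/launch overhead rather than GPU compute. A minimal sketch (a throwaway two-layer convnet of my own, not the tutorial model):

```python
import time
import tensorflow as tf

# Time a toy convnet at several batch sizes and report rough images/sec.
# If throughput rises sharply with batch size, per-step overhead dominates
# at the tutorial's small batches.
def throughput(batch):
    with tf.Graph().as_default():
        images = tf.random_normal([batch, 32, 32, 3])
        net = tf.layers.conv2d(images, 64, 3, activation=tf.nn.relu)
        net = tf.layers.conv2d(net, 64, 3, activation=tf.nn.relu)
        loss = tf.reduce_mean(net)
        train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            sess.run(train)                      # warm-up
            start = time.time()
            for _ in range(50):
                sess.run(train)
            return 50 * batch / (time.time() - start)

for b in [128, 512, 2048]:
    print('batch %4d: ~%.0f images/sec' % (b, throughput(b)))
```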

I have no idea how to dig into this further; I thought it might be an issue with the 396.54 drivers or CUDA 9.2.