Nvidia has unveiled the Tesla V100, its first GPU based on the new Volta architecture. Like the Pascal-based P100 before it, the V100 is designed for high-performance computing rather than consumer use, but it still provides a tantalising glimpse at what the future might hold for Nvidia's consumer graphics cards.

Volta, which has been on Nvidia's public roadmap since 2013, is a dramatically different architecture to Pascal rather than a simple die shrink. The V100 chip is made on TSMC's 12nm FinFET manufacturing process and packs a whopping 21.1 billion transistors onto an 815mm² die. By contrast, the P100 manages just 15.3 billion transistors on a 610mm² die, and the latest Titan Xp sports a mere 12 billion transistors on 471mm².

Suffice it to say, V100 is a giant GPU and one of the largest silicon chips ever produced, period.

The combination of die size and process shrink has enabled Nvidia to push the number of streaming multiprocessors (SMs) to 84. Each SM features 64 CUDA cores for a total of 5,376, far more than any of its predecessors. That said, V100 isn't a fully enabled part: only 80 SMs are switched on (most likely for yield reasons), resulting in 5,120 CUDA cores.
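The core counts above are simple SM arithmetic. As a quick sanity check, a minimal Python sketch reproduces both figures; the helper name `cuda_cores` is purely illustrative and not taken from any Nvidia tool.

```python
# Illustrative helper; the name is hypothetical, not from Nvidia's documentation.
CORES_PER_SM = 64  # CUDA cores per streaming multiprocessor, as quoted above


def cuda_cores(sm_count: int, cores_per_sm: int = CORES_PER_SM) -> int:
    """Total CUDA cores for a given number of SMs."""
    return sm_count * cores_per_sm


full_die = cuda_cores(84)  # every SM physically present on the die
shipping = cuda_cores(80)  # SMs actually enabled on V100
print(full_die, shipping)  # 5376 5120
```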

V100 also features tensor cores (TCs), a new type of core explicitly designed for machine learning operations. The full chip carries 672 of them, eight per SM, which leaves 640 active across the 80 enabled SMs. In tasks that can take advantage of them, Nvidia claims that the new tensor cores offer a 4x performance boost versus Pascal, which in theory makes the V100 a better performer than Google's dedicated tensor processing unit (TPU).

High-level performance of V100 is impressive: 15 teraflops of FP32, 30 teraflops of FP16, 7.5 teraflops of FP64, and a huge 120 teraflops for dedicated tensor operations. Should Nvidia hand the die space currently reserved for FP64 and tensor cores over to FP32 in a future consumer product (Titan XV anyone?), the gaming potential would be massive. Feeding the V100 GPU is 16GB of HBM2 memory clocked at 1.75GHz on a 4096-bit bus for 900GB/s of bandwidth.
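The memory bandwidth figure can be checked with a short calculation. The sketch below assumes that the quoted 1.75GHz is the effective per-pin data rate (1.75Gbit/s per pin), which is an interpretation rather than something the announcement spells out; the function name is illustrative.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus pins x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# Assumption: 1.75 is treated as Gbit/s per pin across the 4096-bit HBM2 bus.
hbm2 = memory_bandwidth_gb_s(4096, 1.75)
print(hbm2)  # 896.0
```

Under that assumption the result is 896GB/s, which rounds to the 900GB/s quoted.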

Despite the large die, the V100 GPU still runs at a peak 1455MHz. TDP is rated at 300W, and like its predecessor, V100 features Nvidia's proprietary NVLink connector that allows multiple GPUs to connect directly to each other with more bandwidth than the PCI Express 3.0 bus. The difference is that V100 features NVLink 2, which raises link bandwidth to 25GB/s in each direction and provides six NVLinks per GPU versus four on the P100.
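The peak clock also explains the headline throughput numbers. A rough sketch, assuming each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock, and that FP64 runs at half and FP16 at twice the FP32 rate, as the quoted 7.5/15/30 teraflops ratios suggest. The function name is illustrative.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: float = 2.0) -> float:
    """Theoretical peak in teraflops: cores x clock x FLOPs per core per clock."""
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


# Assumption: one FMA (2 FLOPs) per CUDA core per clock at the 1455MHz peak.
fp32 = peak_tflops(5120, 1455)
fp64 = fp32 / 2  # assumed half-rate double precision
fp16 = fp32 * 2  # assumed double-rate half precision
print(round(fp32, 1), round(fp64, 1), round(fp16, 1))
```

The results land at roughly 14.9, 7.4, and 29.8 teraflops, in line with the rounded 15, 7.5, and 30 teraflops figures.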

The V100 will first appear inside Nvidia's bespoke compute servers. Eight of them will come packed inside the $150,000 (~£150,000) DGX-1 rack-mounted server, which ships in the third quarter of 2017. A 250W PCIe slot version of the V100 is also in the works (probably priced at around £10,000), as well as a half-height 150W card that's likely to feature a lower clock speed and disabled cores.