This post is a very short summary of the NVIDIA Turing Architecture whitepaper (available on September 14th).

Key Features of Turing

INT32 Cores (Concurrent execution of floating point and integer instructions)

The Turing architecture adds a new execution unit (INT32). This unit enables Turing GPUs to execute floating point and integer instructions in parallel. NVIDIA claims that this should theoretically provide 36% additional throughput for floating point operations.
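The 36% figure can be reproduced with a toy issue-slot model. This is only a sketch: it assumes the instruction mix quoted in the whitepaper (roughly 36 integer instructions for every 100 floating point ones, a number not stated in this post) and that each pipe issues one instruction per slot. The function name is made up for illustration.

```python
def fp_throughput_gain(int_per_100_fp: float) -> float:
    """Relative FP throughput gain when integer work moves to its own pipe.

    Toy model: each pipe issues one instruction per slot.
    """
    fp = 100.0
    shared_slots = fp + int_per_100_fp     # one pipe handles FP and INT in turn
    split_slots = max(fp, int_per_100_fp)  # two pipes run concurrently
    return shared_slots / split_slots - 1.0


gain = fp_throughput_gain(36)  # assumed mix: 36 INT per 100 FP instructions
print(f"Additional FP throughput: {gain:.0%}")
```

Real workloads have different instruction mixes and dependencies, so this is an upper bound for that particular mix, not a benchmark.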

The parallel execution is made possible by a new unified architecture for shared L1 memory and texture caching. NVIDIA claims that the INT32/FP32 core design and other changes to the new streaming multiprocessor provide “50% improvement in delivered performance per CUDA core”.

New Shading Advancements

Mesh Shading: a new shader model for vertex, tessellation and geometry shading (more objects per scene)

Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide visual benefit)

Texture-Space Shading: storing shading results in memory (no need to duplicate shading work)

Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multiple views in a single pass

Turing Memory Compression

The Turing architecture brings new lossless compression techniques. NVIDIA claims that further improvements to the ‘state of the art’ Pascal algorithms have provided a ‘50% increase in effective bandwidth on Turing compared to Pascal’.

Video and Display Engine

The new display engine supports DisplayPort 1.4a (8K at 60 Hz). Turing graphics cards can drive two 8K displays at 60 Hz (either through DP or USB-C). The new video engine features an enhanced NVENC encoder (it can encode an H.265 stream at 8K/30 FPS) and a new NVDEC decoder with HEVC YUV444 10/12b HDR, H.264 8K and VP9 10/12b HDR support.
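To get a feel for what "8K at 60 Hz" means in bandwidth terms, here is a rough back-of-the-envelope calculation. It assumes 24 bits per pixel and ignores blanking intervals, and the DisplayPort HBR3 payload figure (4 lanes at 8.1 Gbit/s with 8b/10b encoding) is my own assumption, not something taken from this post.

```python
def raw_video_gbps(width: int, height: int, refresh_hz: int,
                   bits_per_pixel: int = 24) -> float:
    """Uncompressed pixel data rate in Gbit/s, ignoring blanking intervals."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


rate_8k60 = raw_video_gbps(7680, 4320, 60)

# Assumption: HBR3 runs 4 lanes at 8.1 Gbit/s and loses 20% to 8b/10b encoding.
HBR3_PAYLOAD_GBPS = 4 * 8.1 * 8 / 10

print(f"8K60 raw pixel rate: {rate_8k60:.1f} Gbit/s")
print(f"HBR3 payload:        {HBR3_PAYLOAD_GBPS:.2f} Gbit/s")
```

Under these assumptions the raw stream is larger than what a single link can carry, which suggests that 8K at 60 Hz over one cable has to rely on stream compression or chroma subsampling.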

NVLINK (only 2-way)

The TU102 GPU features two x8 2nd Gen NVLINK links, while TU104 is equipped with a single x8 link. The TU106 does not support NVLINK. Unfortunately, NVIDIA decided to end 3-way and 4-way SLI support with Turing.
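The link counts translate into peak bandwidth roughly as sketched below. The per-link figure of 25 GB/s in each direction is an assumption on my part (it is not stated in this post), so treat the results as illustrative.

```python
# Assumed peak bandwidth of one x8 NVLINK link, per direction (GB/s).
NVLINK_X8_GBS_PER_DIRECTION = 25

# Link counts as described in this post.
nvlink_links = {"TU102": 2, "TU104": 1, "TU106": 0}


def nvlink_bidirectional_gbs(gpu: str) -> int:
    """Peak bidirectional NVLINK bandwidth between two GPUs, in GB/s."""
    return nvlink_links[gpu] * NVLINK_X8_GBS_PER_DIRECTION * 2


for gpu in nvlink_links:
    print(f"{gpu}: {nvlink_bidirectional_gbs(gpu)} GB/s")
```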

NVIDIA TU102 vs TU104 vs TU106

The NVIDIA GeForce RTX 2070 is the only graphics card from the new series to utilize the full silicon. It is not, as previously speculated, based on a cut-down TU104. NVIDIA confirmed that their new xx70 model will, in fact, feature the TU106 GPU.

Specs-wise, Turing TU102 essentially doubles the specs of TU106. The TU104 is the only Turing chip to feature four TPCs per GPC (unlike TU102 and TU106, which have six per GPC).
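Both statements can be checked against the numbers in the spec table further down. The snippet is just a quick sanity check using a subset of that table; the dictionary layout and key names are my own.

```python
# Subset of the spec table in this post.
specs = {
    "TU102": {"GPCs": 6, "TPCs": 36, "SMs": 72, "ROPs": 96, "bus_bits": 384},
    "TU104": {"GPCs": 6, "TPCs": 24, "SMs": 48, "ROPs": 64, "bus_bits": 256},
    "TU106": {"GPCs": 3, "TPCs": 18, "SMs": 36, "ROPs": 64, "bus_bits": 256},
}

# TPCs per GPC for each chip.
tpcs_per_gpc = {chip: s["TPCs"] // s["GPCs"] for chip, s in specs.items()}

# How many times larger TU102 is than TU106, per spec.
tu102_vs_tu106 = {
    key: specs["TU102"][key] / specs["TU106"][key] for key in specs["TU102"]
}

print(tpcs_per_gpc)
print(tu102_vs_tu106)
```

The compute side (GPCs, TPCs, SMs) is exactly doubled, while ROPs and the memory bus only grow by 1.5x, hence "essentially" doubles.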

Is TU106 a mid-range chip?

According to NVIDIA’s own naming convention, the TU106 should be a mid-range chip. What is worth noting, however, is that the TU106 GPU is 131 mm² bigger than GP104 (Pascal). The theory is that NVIDIA shifted TU100 to TU102 and TU102 to TU104 respectively. As far as die size is concerned, the TU106 could’ve easily been a high-end chip.
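The die-size comparison works out as follows. The TU106 size comes from the spec table in this post; the GP104 size is not quoted directly, so it is derived here from the 131 mm² difference.

```python
TU106_MM2 = 445        # from the spec table in this post
DIFFERENCE_MM2 = 131   # TU106 minus GP104, as stated above

gp104_mm2 = TU106_MM2 - DIFFERENCE_MM2  # implied GP104 die size
growth = TU106_MM2 / gp104_mm2          # relative size of TU106 vs GP104

print(f"Implied GP104 die size: {gp104_mm2} mm2")
print(f"TU106 is {growth - 1:.0%} larger than GP104")
```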

NVIDIA TURING GPUs

| VideoCardz.com | TU102 | TU104 | TU106 |
|---|---|---|---|
| Fabrication Node | 12nm FFN | 12nm FFN | 12nm FFN |
| Die Size | 754 mm² | 545 mm² | 445 mm² |
| Transistors | 18.6 Billion | 13.6 Billion | 10.6 Billion |
| NVIDIA SKU w/ full chip | Quadro RTX 6000 | Quadro RTX 5000 | GeForce RTX 2070 |
| GPCs | 6 | 6 | 3 |
| TPCs | 36 | 24 | 18 |
| SMs | 72 (12 per GPC) | 48 (8 per GPC) | 36 (12 per GPC) |
| Tensor Cores | 576 | 384 | 288 |
| RT Cores | 72 | 48 | 36 |
| FP32 Cores (CUDAs) | 4,608 | 3,072 | 2,304 |
| INT32 Cores | 4,608 | 3,072 | 2,304 |
| ROPs | 96 | 64 | 64 |
| TMUs | 288 | 192 | 144 |
| Memory Interface | 384-bit | 256-bit | 256-bit |
| L2 Cache | 6144 KB | 4096 KB | 4096 KB |
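Most of the unit counts above follow from the TPC count alone. The sketch below assumes a Turing TPC holds 2 SMs and that each SM carries 64 FP32 cores, 64 INT32 cores, 8 Tensor Cores, 1 RT Core and 4 TMUs; these per-SM figures are inferred from the ratios in the table rather than quoted from it.

```python
# Assumed building blocks (inferred from the ratios in the spec table).
SMS_PER_TPC = 2
PER_SM = {"FP32": 64, "INT32": 64, "Tensor": 8, "RT": 1, "TMU": 4}

# Values from the spec table in this post.
table = {
    "TU102": {"TPCs": 36, "SMs": 72, "FP32": 4608, "INT32": 4608,
              "Tensor": 576, "RT": 72, "TMU": 288},
    "TU104": {"TPCs": 24, "SMs": 48, "FP32": 3072, "INT32": 3072,
              "Tensor": 384, "RT": 48, "TMU": 192},
    "TU106": {"TPCs": 18, "SMs": 36, "FP32": 2304, "INT32": 2304,
              "Tensor": 288, "RT": 36, "TMU": 144},
}


def derive_units(tpcs: int) -> dict:
    """Derive SM and per-SM unit totals from a TPC count."""
    sms = tpcs * SMS_PER_TPC
    units = {name: count * sms for name, count in PER_SM.items()}
    units["SMs"] = sms
    return units


# Compare the derived totals with the table, chip by chip.
matches = {
    chip: all(derive_units(row["TPCs"])[key] == row[key]
              for key in derive_units(row["TPCs"]))
    for chip, row in table.items()
}
print(matches)
```

ROPs, memory interface width and L2 cache do not scale with the SM count, which is why they are left out of the derivation.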

Turing GPUs block diagrams

These are simplified versions of NVIDIA’s original block diagrams of Turing GPUs (they are basically 99% the same, except mine are a lot sexier).