NVIDIA GeForce GTX 980 Ti SLI Benchmark & Review vs. GTX 980 SLI, Titan X P2: GTX 980 Ti in SLI Results

Multi-GPU configurations have grown more reliable over the past few generations. Today's benchmark tests the new GeForce GTX 980 Ti in two-way SLI, pitting the card against the GTX 980 in SLI, the Titan X, and other options on the bench. At the time of writing, an AMD R9 295X2 is not present for testing, though it is something we hope to test once provided. SLI and CrossFire have both seen redoubled efforts to improve compatibility and performance in modern games. There are still times when multi-GPU configurations won't execute properly, something we discovered when testing the Titan X against 2x GTX 980s in SLI, but behavior has improved tremendously with each driver update.

Previous GTX 980 Ti Review Content

NVIDIA GeForce GTX 980 Ti Specs

| | GTX 980 Ti | GTX Titan X | GTX 980 | GTX 780 Ti |
|---|---|---|---|---|
| GPU | GM200 | GM200 | GM204 | GK-110 |
| Fab Process | 28nm | 28nm | 28nm | 28nm |
| Texture Filter Rate (Bilinear) | 176GT/s | 192GT/s | 144.1GT/s | 210GT/s |
| TjMax | 92C | 91C | 95C | 95C |
| Transistor Count | 8B | 8B | 5.2B | 7.1B |
| ROPs | 96 | 96 | 64 | 48 |
| TMUs | 176 | 192 | 128 | 240 |
| CUDA Cores | 2816 | 3072 | 2048 | 2880 |
| Base Clock (GPU) | 1000MHz | 1000MHz | 1126MHz | 875MHz |
| Boost Clock (GPU) | 1075MHz | 1075MHz | 1216MHz | 928MHz |
| GDDR5 Memory / Memory Interface | 6GB / 384-bit | 12GB / 384-bit | 4GB / 256-bit | 3GB / 384-bit |
| Memory Bandwidth (GPU) | 336.5GB/s | 336.5GB/s | 224GB/s | 336GB/s |
| Mem Speed | 7Gbps | 7Gbps | 7Gbps (9Gbps effective - read below) | 7Gbps |
| Power | 1x8-pin, 1x6-pin | 1x8-pin, 1x6-pin | 2x6-pin | 1x6-pin, 1x8-pin |
| TDP | 250W | 250W | 165W | 250W |
| Output | 3xDisplayPort, 1xHDMI 2.0, DVI | 3xDisplayPort, 1xHDMI 2.0, 1xDual-Link DVI | DL-DVI, HDMI 2.0, 3xDisplayPort 1.2 | 1xDVI-D, 1xDVI-I, 1xDisplayPort, 1xHDMI |
| MSRP | $650 | $1000 | $550 (now $500) | $600 |

A Word About How SLI Works

None of what's in this section is news. SLI has been around for a long time now – Wikipedia says 2004 – and the functionality has remained largely the same. As a quick refresher, we'll go over how SLI works at a top level to prep for the next section.

Scalable Link Interface (SLI) allows the bridging of two or more same-model NVIDIA GPUs in a system. By using an SLI bridge, multiple video cards can be connected to share workload in the form of pixel processing and graphics computations, while drawing on just one pool of VRAM, that of the primary card. AMD's version of this is called "CrossFire."

Using SLI inherently consumes more PCI-e lanes from the PCH or CPU. Each video card uses a number of PCI-e lanes made available to it through the motherboard's PCI Express slots and the platform's lane allocation, generally in the form of x16/x16 or x8/x8, depending on the CPU and board. Looking at the pins present within the PCI-e socket (or flipping the motherboard over, above) reveals the maximum lane count wired to that slot at a hardware level. If traces run to only eight physical, metal pins, filling half of the slot's pin-out, it's an x8 slot; the same logic applies to any other count. That stated, just because a slot offers a full set of sixteen lanes' pins does not guarantee x16 operation for a video card.
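On Linux, the lane count a card has actually negotiated can also be checked in software rather than by inspecting pins. Below is a minimal Python sketch, assuming the standard sysfs `current_link_width` attribute exposed for PCI-e devices; the helper names are our own, not part of any API:

```python
import glob
import os

def parse_link_width(raw: str) -> int:
    """Convert a sysfs link-width string such as '8' or 'x8' to an int."""
    return int(raw.strip().lstrip("x"))

def pcie_link_widths(sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Map each PCI device address to its currently negotiated lane count.

    Devices without a PCI-e link (e.g. legacy PCI) simply lack the
    attribute and are skipped.
    """
    widths = {}
    for path in glob.glob(os.path.join(sysfs_root, "*", "current_link_width")):
        device = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                widths[device] = parse_link_width(f.read())
        except (OSError, ValueError):
            pass  # attribute unreadable; skip this device
    return widths

if __name__ == "__main__":
    for dev, width in sorted(pcie_link_widths().items()):
        print(f"{dev}: x{width}")
```

Note that the negotiated width can be lower than the slot's physical pin count, which is exactly the caveat above.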

Even if a magical Z97 motherboard existed that offered four real PCI-e x16 slots – and there isn't one – Haswell is limited to just 16 PCI-e lanes on the CPU and 8 PCI-e lanes on the Z97 & H97 chipsets. Add 'em up, and that's just 24 total lanes – enough for an x8/x8 or x16/x8 setup, but not x16/x16. How, then, do some boards offer "x16/x16" SLI compatibility on LGA115X socket boards?
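The lane arithmetic above can be sketched as a quick sanity check. This is a toy illustration using the Haswell/Z97 figures from the text, not a representation of how the platform actually routes lanes:

```python
# Lane budget on Haswell + Z97, per the figures above.
CPU_LANES = 16      # PCI-e lanes provided by the Haswell CPU
CHIPSET_LANES = 8   # PCI-e lanes provided by the Z97/H97 PCH
TOTAL_LANES = CPU_LANES + CHIPSET_LANES  # 24 total

def fits(*slot_widths: int) -> bool:
    """True if the requested slot widths fit within the total lane budget."""
    return sum(slot_widths) <= TOTAL_LANES

print(fits(8, 8))     # x8/x8   -> True
print(fits(16, 8))    # x16/x8  -> True
print(fits(16, 16))   # x16/x16 -> False: needs 32 lanes, only 24 exist
```

The x16/x16 case fails the check, which is why the "x16/x16" marketing claim requires the switching trick described next.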

Some motherboards, like our test board from Gigabyte, will multiplex lanes to produce optimized (switching) lane availability through aftermarket PLX/PEX chips. Signal multiplexing merges multiple input signals into a single output, which is de-muxed on the receiving side to retrieve the original data. It's not a perfect solution, but it can effectively simulate a higher lane count by diverting lane availability to straining devices as demand fluctuates. If all present devices are pushing maximum throughput, multiplexing doesn't resolve the underlying shortage of lanes; normally, though, gaming workloads place greater load on one device and don't distribute work evenly between all present GPUs. In these instances, multiplexing can accelerate throughput by modulating the effective lane count delivered to each PCI-e slot.
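As a toy illustration of that demand-driven behavior (emphatically not how a PLX switch is implemented, which operates at the packet level in hardware), consider splitting a fixed lane budget across slots in proportion to each GPU's current demand:

```python
def allocate_lanes(demands, total_lanes=16, max_per_slot=16):
    """Split a fixed lane budget across slots in proportion to demand.

    demands: per-GPU demand weights (arbitrary units).
    Returns integer lane grants summing to at most total_lanes.
    """
    total_demand = sum(demands) or 1
    grants = [min(max_per_slot, int(total_lanes * d / total_demand))
              for d in demands]
    # Hand leftover lanes (lost to rounding) to the busiest slot.
    leftover = total_lanes - sum(grants)
    if leftover > 0:
        busiest = demands.index(max(demands))
        grants[busiest] = min(max_per_slot, grants[busiest] + leftover)
    return grants

# One GPU doing most of the work gets most of the lanes:
print(allocate_lanes([9, 1]))    # -> [15, 1]
# Both GPUs saturated: the budget still caps out, as the text notes.
print(allocate_lanes([10, 10]))  # -> [8, 8]
```

The second case shows the limitation described above: when every device demands maximum throughput, switching cannot manufacture lanes that don't exist.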

DirectX 12 Changes the Memory Game

The new DirectX 12 API has already fallen upon our test bench, primarily in an API overhead test between Dx11, Dx12, and Mantle (soon Vulkan). DirectX 12 aims to resolve a number of CPU-bound issues, namely draw calls heavily loading the CPU, but it's also changing the ways GPUs and IGPs interact.

Dx12's Multiadapter feature will, when deployed by developers, enable communication between the IGP hosted on the CPU die and the dGPU hosted on the video card. This puts Intel IGPs to use in dGPU systems and will hopefully make more scalable use of APU graphics than AMD's own Dual Graphics solution.

Dx12 also looks like it will allow VRAM pooling in multi-GPU configurations, bypassing the long-standing limitation that only the primary card's VRAM is utilized. Moving forward, this means multiple VGAs could be installed in an array to increase available VRAM – a desirable feature for users relegated to 2GB or 3GB devices. We're not yet entirely sure how this will work in games – obviously, they've got to support Dx12 first – but it's a promising direction.
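The difference can be put in simple arithmetic. This is an illustration of the behavior described above, not an API call; under traditional SLI the usable pool is effectively one card's VRAM, while a pooled Dx12 array could sum them:

```python
def effective_vram_sli(cards_gb):
    """Traditional SLI: frame data is mirrored, so the usable pool
    is limited to a single card's VRAM (the smallest, to be safe)."""
    return min(cards_gb)

def effective_vram_pooled(cards_gb):
    """Hypothetical Dx12 pooling: each card's VRAM adds to the total."""
    return sum(cards_gb)

two_980s = [4, 4]  # 2x GTX 980 at 4GB each
print(effective_vram_sli(two_980s))     # -> 4
print(effective_vram_pooled(two_980s))  # -> 8
```

For the 2GB and 3GB cards mentioned above, doubling the usable pool is the difference between choking on high-resolution textures and handling them.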

Continue to page 2 for benchmark results & charts.