NVIDIA GeForce GTX 960 GPU Benchmark vs. 760, 970, R9 285 – A $200 Juggernaut

It's official: The price gap between the GTX 960 and GTX 970 is large enough to drive a Ti through. NVidia's new GeForce GTX 960 2GB graphics card ships at $200, pricing it a full $50 cheaper than the GTX 760's launch price. The immediate competition would be AMD's R9 285, priced almost equivalently. NVidia's GTX 960 is intended to target the market seeking the best video card for the money – a segment that both AMD and nVidia call the “sweet spot” – and is advertised as capable of playing most modern games on high settings or better.

The GTX 960 uses a new Maxwell GPU, called the GM206, for which the groundwork was laid by the GTX 980's GM204 GPU. In our GTX 980 review, we mentioned that per-core performance and per-watt performance had increased substantially, resulting in a specs listing that exhibits a lower core count and smaller memory interface. AMD has leveraged these lower numbers in its recent marketing outreach, something we'll discuss in the conclusion.

This GeForce GTX 960 review tests the new ASUS Strix 960 video card against the 970, 760, R9 285, & others. The benchmark analyzes GTX 960 FPS performance in Far Cry, Assassin's Creed, EVOLVE, and other modern titles. The GTX 960 is firmly designed for 1080p gaming, the resolution at which the vast majority of monitors currently run.

ASUS GTX 960 Strix Review – FPS, Temperatures, & More

NVIDIA GeForce GTX 960, 970, & 980 Video Card Specs

| Spec | GTX 980 | GTX 970 | GTX 960 |
|---|---|---|---|
| GPU | GM204 | GM204 | GM206 |
| Fab Process | 28nm | 28nm | 28nm |
| Texture Filter Rate (Bilinear) | 144.1GT/s | 109.2GT/s | 72.1GT/s |
| TjMax | 95C | 95C | 95C |
| Transistor Count | 5.2B | 5.2B | 2.94B |
| ROPs | 64 | 64 | 32 |
| TMUs | 128 | 104 | 64 |
| CUDA Cores | 2048 | 1664 | 1024 |
| Base Clock (GPU) | 1126MHz | 1050MHz | 1126MHz |
| Boost CLK | 1216MHz | 1178MHz | 1178MHz |
| Single Precision | 5TFLOPs | 4TFLOPs | 2.3TFLOPs |
| Mem Config | 4GB / 256-bit | 4GB / 256-bit | 2GB / 128-bit |
| Mem Bandwidth | 224GB/s | 224GB/s | 112.16GB/s |
| Mem Speed | 7Gbps (9Gbps effective - read below) | 7Gbps (9Gbps effective) | 7Gbps (9Gbps effective) |
| Power | 2x6-pin | 2x6-pin | 1x6-pin |
| TDP | 165W | 145W | 120W |
| Output | DL-DVI, HDMI 2.0, 3xDisplayPort 1.2 | DL-DVI, HDMI 2.0, 3xDisplayPort 1.2 | 3xDisplayPort 1.2, 1xHDMI, 1xDL-DVI |
| MSRP | $550 | $330 | $200 |
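For readers who want to sanity-check the throughput rows, the listed texture rates and single-precision figures can be approximated from unit counts and clocks. This is a rough sketch under two assumptions that the spec listing does not state: one bilinear texel per TMU per clock, and one fused multiply-add (2 FLOPs) per CUDA core per clock. The listing also does not say which clock the TFLOPs row is based on, so both are printed.

```python
# Specs copied from the listing above: (TMUs, CUDA cores, base MHz, boost MHz).
CARDS = {
    "GTX 980": (128, 2048, 1126, 1216),
    "GTX 970": (104, 1664, 1050, 1178),
    "GTX 960": (64, 1024, 1126, 1178),
}

def texture_rate_gt_s(tmus, clock_mhz):
    """Bilinear texel rate in GT/s, assuming one texel per TMU per clock."""
    return tmus * clock_mhz / 1000.0

def fp32_tflops(cores, clock_mhz):
    """Single-precision TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return 2 * cores * clock_mhz / 1_000_000.0

for name, (tmus, cores, base, boost) in CARDS.items():
    print(
        f"{name}: {texture_rate_gt_s(tmus, base):.1f} GT/s, "
        f"{fp32_tflops(cores, base):.2f} TFLOPs at base, "
        f"{fp32_tflops(cores, boost):.2f} TFLOPs at boost"
    )
```

Under these assumptions the texture rates line up with the listing when computed at base clock.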

As with both preceding Maxwell processors, the count-for-count specifications of the GTX 960 are deceptively low. The GTX 960 hosts a single GM206 GPU, equipped with 8 Streaming Multiprocessors to net a total of 1024 CUDA Cores. It's important to note that cross-architecture comparisons of core count aren't linear, as the Maxwell CUDA cores are roughly 1.4x more powerful than Kepler cores. We'll get into this momentarily.

Looking at the rest of the specs, the GTX 960 ships strictly in a 2GB model using a 128-bit bus, which makes use of heavy color compression to reduce saturation of the memory bandwidth (which is 112.16GB/s). The memory operates at a native 7010MHz; because of these efficiency gains, nVidia rates its effective speed at 9300MHz.
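The 112.16GB/s figure follows directly from the bus width and the native memory data rate. A minimal sketch of that arithmetic (the function name is ours, for illustration only):

```python
def memory_bandwidth_gb_s(data_rate_mhz, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mhz * bytes_per_transfer / 1000.0

# Figures quoted in this article: 7010MHz on a 128-bit bus for the GTX 960,
# 7Gbps on a 256-bit bus for the GTX 980 and 970.
print(f"GTX 960: {memory_bandwidth_gb_s(7010, 128):.2f} GB/s")
print(f"GTX 980: {memory_bandwidth_gb_s(7000, 256):.2f} GB/s")
```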

Continuing the joint trend by all semiconductor manufacturers, the GTX 960 has dropped its TDP to just 120W, a marked decrease from the GTX 760's 170W power requirement. As a result of the low TDP, the GTX 960 can be powered by a single 6-pin PCI-e connector and the PCI-e slot alone. Notably, some board partners will be manufacturing overclocking-targeted GTX 960s with additional power headers for increased OC overhead.
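To see why one 6-pin connector is enough, it helps to add up the power budget. The sketch below assumes the commonly cited PCI-e limits of 75W from the slot and 75W per 6-pin connector; those two numbers come from the PCI-e specification rather than from nVidia's spec sheet, and the helper name is hypothetical.

```python
PCIE_SLOT_W = 75  # assumption: x16 slot limit per the PCI-e spec, not from this article
SIX_PIN_W = 75    # assumption: 6-pin auxiliary connector limit, likewise

def power_headroom_w(tdp_w, six_pin_count):
    """Watts left over after TDP when drawing from the slot plus 6-pin connectors."""
    available = PCIE_SLOT_W + six_pin_count * SIX_PIN_W
    return available - tdp_w

print("GTX 960 headroom:", power_headroom_w(120, 1), "W")
print("GTX 980 headroom:", power_headroom_w(165, 2), "W")
```

The margin left on the reference design is small, which is consistent with board partners adding power headers on overclocking-focused cards.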

NVidia still rests on the 28nm fabrication process and likely will not change that for the remainder of this GeForce generation.

A Quick Refresher on Maxwell - CUDA Core Performance 1.4x Over Kepler

We'd recommend reading our GTX 980 review & architecture drill-down for full information on all major gaming-relevant Maxwell features.

The GM206 hosts almost all of the same architectural features introduced in the GM204 GTX 980 GPU, including delta color compression and per-core efficiency gains, but makes a few key changes.

As with all Maxwell GPUs, the SMM is broken into four blocks. Each block possesses 32 CUDA cores, making for a total of 128 cores per streaming multiprocessor (there are eight in the 960). Each SMM hosts 96KB of shared memory, then combines the L1 texture caching functionality into a 24KB memory pool (one pool shared between two CUDA blocks, or 64 cores). For comparison, Kepler's GPUs hosted a 64KB pool that was also shared with L1 cache.

Among other changes that will be discussed shortly, these key modifications to Maxwell over Kepler contributed to a 40% performance-per-core increase, ultimately netting a 2x performance-per-watt gain.
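The core-count arithmetic above is simple enough to write out. The "Kepler-equivalent" number in this sketch is only an illustrative way of applying the 1.4x per-core figure; it is our own back-of-envelope heuristic, not an nVidia metric.

```python
CORES_PER_BLOCK = 32   # CUDA cores in each SMM block
BLOCKS_PER_SMM = 4     # blocks per Maxwell streaming multiprocessor
SMM_COUNT = 8          # SMMs in the GTX 960's GM206
PER_CORE_GAIN = 1.4    # Maxwell vs. Kepler per-core performance, as quoted above

cores_per_smm = CORES_PER_BLOCK * BLOCKS_PER_SMM
total_cores = cores_per_smm * SMM_COUNT

# Illustrative only: how many Kepler cores would do roughly the same work.
kepler_equivalent_cores = total_cores * PER_CORE_GAIN

print(f"{cores_per_smm} cores per SMM, {total_cores} cores total")
print(f"~{kepler_equivalent_cores:.0f} Kepler-equivalent cores")
```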

We said there were new features in GM206 that were not found in GM204, though. The most noteworthy addition to GM206 and the GTX 960 is a new video engine, now capable of full H.265 HEVC (encoding & decoding), whereas the GTX 980 could only encode. We previously wrote that H.265 had finally been approved by the ITU. H.265 encoding (all our videos are encoded in H.264) will substantially decrease bandwidth requirements, letting content creators pack more data into a smaller file and making for less abusive data streams. This becomes critical as the world moves toward higher resolution output, which further increases demand for high bit-rate media (the bit-rate must increase in step with resolution to prevent a drop in overall quality).

The GTX 960's H.265 decoding support means better support for high-quality media going forward, although we think this is still somewhat far off; 4K & 5K are still buzzwords at this point, and media produced for the standards is rare. Buying a GPU strictly on the expectation that media (and then H.265-encoded media, still effectively non-existent) will soon be produced en masse seems a bit of a stretch, though the GTX 960 has plenty of other merits.

Aside from this change, it's important to note that the GM206 uses the same memory subsystem as the GM204 and GTX 980. As discussed on page two of our 980 review, third-generation delta color compression looks at individual object color delta temporally (from one frame to the next, or over time). Once analyzed, the delta (difference between the previous color and the new one) is calculated, rather than performing absolute calculations for every color on the screen; this also prevents colors that remained the same from being reacquired. Overall, this memory subsystem results in roughly 25% fewer bytes per frame vs. Kepler. The key note here is that a 128-bit memory interface – when looked at in real-world gaming applications – is capable of outperforming the GK106's 192-bit memory interface.
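One plausible way to reconcile the "25% fewer bytes" claim with the roughly 9300MHz effective memory speed quoted earlier is to treat the savings as a multiplier on the native data rate. This is our own back-of-envelope reading, not a formula published by nVidia, and the function name is made up for the example.

```python
def effective_data_rate_mhz(native_mhz, bytes_saved_fraction):
    """Data rate an uncompressed design would need to move the same frame data."""
    remaining_fraction = 1.0 - bytes_saved_fraction
    return native_mhz / remaining_fraction

# 7010MHz native, roughly 25% fewer bytes per frame vs. Kepler (both quoted above).
effective = effective_data_rate_mhz(7010, 0.25)
print(f"Effective data rate: ~{effective:.0f} MHz")
```

The result lands close to the quoted effective speed, which suggests the marketing figure is derived in roughly this way.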

MFAA Finally Gets Introduced

Although it was written about in our GTX 980 review and at Game24, nVidia's more bandwidth-efficient MFAA (multi-frame sampled anti-aliasing) hasn't actually made it into many games. The launch of the GTX 960 comes shortly after MFAA support expanded to more games, including Dragon Age: Inquisition, ARMA III, Far Cry 4, and others. In testing, MFAA substantially improves performance on mid-range hardware (like the GTX 960) without producing a noticeable quality decrease.

MFAA operates temporally, taking pixel samples from the previous frame and the impending frame, then combining them to determine the best pixel color (improving smoothness, as all anti-aliasing technologies do). This means the workload is split over a period of two frames, rather than loading the GPU heavily on a single frame basis.
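The idea of splitting samples across frames can be illustrated with a toy model. The sketch below is not nVidia's implementation: the sample offsets, the one-dimensional "edge" coverage test, and the simple 50/50 blend with the previous frame are all assumptions chosen to keep the example small. Each frame takes only two samples per pixel, alternating between two patterns, and the blend of two consecutive frames matches what four samples in a single frame would have produced.

```python
# Hypothetical sub-pixel sample offsets (0..1 across the pixel), two per frame.
PATTERN_A = (0.125, 0.625)
PATTERN_B = (0.375, 0.875)

def coverage(edge, offsets):
    """Fraction of samples that fall on the covered side of an edge at x = edge."""
    return sum(1 for x in offsets if x < edge) / len(offsets)

def mfaa_like_frames(edge, frame_count):
    """Blended coverage per frame, alternating patterns and averaging with the last frame."""
    previous = None
    results = []
    for i in range(frame_count):
        pattern = PATTERN_A if i % 2 == 0 else PATTERN_B
        current = coverage(edge, pattern)
        results.append(current if previous is None else 0.5 * (current + previous))
        previous = current
    return results

edge = 0.3  # static edge covering 30% of the pixel
print("2 samples/frame, blended:", mfaa_like_frames(edge, 3))
print("4 samples in one frame:  ", coverage(edge, PATTERN_A + PATTERN_B))
```

For a static edge, the blended result settles on the four-sample answer after the first frame while only ever paying for two samples per frame, which is the workload split described above.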

Let's get to the GTX 960 benchmarks. Continue on to Page 2 for the charts.