For a PC hobbyist who’s into building high-end systems with elaborate water-cooling setups and multiple GPUs, it doesn’t get any better than Intel’s Core i7 Extreme processors. They’re pricey, sure, but they’re clearly the fastest, most capable CPUs on the planet.

Except, you know, when they aren’t.

The last generation of Intel’s Extreme CPUs lost much of its luster earlier this year when the Devil’s Canyon chips arrived in mid-range desktops with higher clock speeds and sometimes superior performance. It didn’t help that the Core i7-4960X and friends were saddled with the older X79 chipset, whose selection of USB and SATA ports left much to be desired.

Happily, Intel has been cooking up a new high-end platform that should remove all doubt about who’s top dog. The CPU is known as Haswell-E, and it brings with it an updated companion chipset, the X99. Together, this dynamic duo offers more of absolutely everything you’d want in a high-end rig: more cores, larger caches, and a huge increase in high-speed I/O ports. Haswell-E is also the first desktop CPU to support DDR4 memory, which promises faster transfer rates than DDR3.

We’ve been waiting impatiently for Haswell-E’s arrival for most of the year. At last, it’s here. We’ve had the top CPU in the lineup, the Core i7-5960X, up and running in Damage Labs for a while now, and we’ve tested it more ways than is probably healthy. Read on for our in-depth assessment.

The E is for Extreme

Compared to the prior-gen Ivy Bridge-E chips, the new Haswell-E silicon is an upgrade on just about every front—except maybe one. Both chips are built using Intel’s 22-nm fabrication process with tri-gate transistors. Intel is on the cusp of releasing 14-nm chips for use in tablets and laptops, but these big chips probably won’t move to the new process for another year.

The most notable change in Haswell-E is embedded in its name: the transition to newer CPU cores based on the Haswell microarchitecture. Compared to Ivy Bridge, Haswell cores can execute about 5-10% more instructions in each clock cycle—and possibly more if programs make use of AVX2 instructions for fast parallel processing. Haswell also brings its voltage regulation circuitry onto the CPU die, which can allow for faster, finer-grained control over the delivery of power around the chip.

A look at the Haswell-E die. Source: Intel.

Those improvements are welcome, but Intel hasn’t left anything to chance. The Core i7-5960X packs eight cores, and its L3 cache capacity is a beefy 20MB. That’s two more cores and 5MB more cache than the prior-gen Core i7-4960X, which should be enough to ensure the new chip’s performance superiority in multithreaded workloads.

To feed all of those cores, Haswell-E can transfer tremendous, almost unreasonable amounts of data. One of the key enablers here is DDR4 memory, which offers transfer rates of 2133 MT/s on these first products (up from DDR3 at 1866 MT/s in Ivy-E) and promises to scale up from there. Haswell-E has four memory channels, so it’s starting with 68 GB/s of peak memory bandwidth. In theory, that’s roughly 8.5 GB/s more than the last gen’s 60 GB/s. That’s also, coincidentally, the same amount of memory throughput the Xbox One has dedicated to both its CPU cores and graphics.
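The peak-bandwidth figures follow from simple multiplication: transfer rate × bytes per transfer × number of channels. Here’s a minimal sketch of that arithmetic, assuming the standard 64-bit (8-byte) data path per DDR3/DDR4 channel. The helper name `peak_bandwidth_gbs` is hypothetical, and these are theoretical peaks that real workloads won’t reach.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal, 10**9 bytes).

    mt_per_s:  transfer rate in megatransfers per second (2133 for DDR4-2133)
    channels:  number of memory channels
    bus_bytes: bytes moved per transfer per channel (a 64-bit DIMM moves 8)
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


haswell_e = peak_bandwidth_gbs(2133, channels=4)  # DDR4-2133, quad channel
ivy_e = peak_bandwidth_gbs(1866, channels=4)      # DDR3-1866, quad channel

print(f"Haswell-E:    {haswell_e:.1f} GB/s")
print(f"Ivy Bridge-E: {ivy_e:.1f} GB/s")
print(f"Difference:   {haswell_e - ivy_e:.1f} GB/s")
```

The same function also shows why faster DIMMs matter: plugging in 2800 MT/s gives about 89.6 GB/s across four channels.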

Speaking of graphics, one of the big selling points for these Extreme platforms is PCI Express bandwidth for use with multiple graphics cards. Haswell-E doesn’t disappoint on that front, with 40 lanes of PCIe 3.0 connectivity coming directly off the CPU die. The CPU can host multi-GPU configs with 16 lanes dedicated to two different graphics cards—or up to four graphics cards with eight lanes each. That’s the same basic config as in the last gen, with a few tweaks. One change is the ability to host a 5×8 setup, if the motherboard is built to support it. Indeed, the Asus X99 Deluxe board in our test system has five PCIe x16 slots onboard. I’m not quite sure what you’d do with five graphics cards at once, but it is apparently a possibility now.

Code name | Key products | Cores/modules | Threads | Last-level cache size | Process node (nm) | Estimated transistors (millions) | Die area (mm²)
--- | --- | --- | --- | --- | --- | --- | ---
Gulftown | Core i7-9xx | 6 | 12 | 12 MB | 32 | 1168 | 248
Sandy Bridge-E | Core i7-39xx | 8 | 16 | 20 MB | 32 | 2270 | 435
Ivy Bridge-E | Core i7-49xx | 6 | 12 | 15 MB | 22 | 1860 | 257
Haswell-E | Core i7-59xx | 8 | 16 | 20 MB | 22 | 2600 | 356
Vishera | FX | 4 | 8 | 8 MB | 32 | 1200 | 315

All of this beefy hardware makes for a complex chip. Haswell-E is certainly that, at roughly 2.6 billion transistors and 356 mm². The quad-core Haswell chip is only 177 mm², or about half the size, and that’s with integrated graphics. You can see the difference in the dimensions of the packages used for the socketed processors below.

The quad-core Haswell Core i7-4790K (left) versus the Core i7-5960X (right)

Yeah, this is big and substantial hardware. Here’s a look at the three new Haswell-E-based CPU models alongside their quad-core Haswell cousins.

Model | Cores/threads | Base clock (GHz) | Max Turbo clock (GHz) | L3 cache (MB) | PCIe 3.0 lanes | Memory channels | Memory type & max speed | TDP (W) | Price
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Core i7-5960X | 8/16 | 3.0 | 3.5 | 20 | 40 | 4 | DDR4-2133 | 140 | $999
Core i7-5930K | 6/12 | 3.5 | 3.7 | 15 | 40 | 4 | DDR4-2133 | 140 | $583
Core i7-5820K | 6/12 | 3.3 | 3.6 | 15 | 28 | 4 | DDR4-2133 | 140 | $389
Core i7-4790K | 4/8 | 4.0 | 4.4 | 8 | 16 | 2 | DDR3-1600 | 88 | $339
Core i7-4690K | 4/4 | 3.5 | 3.9 | 6 | 16 | 2 | DDR3-1600 | 88 | $242

The Core i7-5960X gives up some clock frequency to cram eight cores into its 140W power envelope. Those base and boost clocks of 3.0 and 3.5GHz are down quite a bit from the 3.6/4.0GHz speeds of the Core i7-4960X. Even with Haswell’s per-clock performance improvements, those lower frequencies will have consequences in workloads that don’t scale up to 16 threads perfectly.
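To see why the clock deficit matters, a rough back-of-the-envelope model treats single-thread performance as IPC × clock. That’s a simplifying assumption (real results depend on the workload, Turbo behavior, and memory), and the IPC gains plugged in below are just the ends of the 5-10% range cited earlier. `relative_single_thread` is a hypothetical helper.

```python
def relative_single_thread(ipc_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Estimate new/old single-thread speed under a naive IPC x clock model."""
    return (1.0 + ipc_gain) * new_clock_ghz / old_clock_ghz


# Max Turbo clocks: 3.5 GHz (Core i7-5960X) vs. 4.0 GHz (Core i7-4960X)
for gain in (0.05, 0.10):
    ratio = relative_single_thread(gain, 3.5, 4.0)
    print(f"IPC gain {gain:.0%}: 5960X runs one thread at {ratio:.2f}x the 4960X's speed")
```

Even at the optimistic end of the range, the model leaves the 5960X a few percent behind per thread, which is consistent with the results in lightly threaded tests later on.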

As usual, Intel charges a big premium for its top-end processor. You’re probably better off buying the Core i7-5930K for over 400 bucks less, as long as you can live with “only” six cores (and 12 threads via Hyper-Threading). The 5930K has the added advantage of slightly higher clock speeds, too. Then again, I’m not sure how much stock clocks matter, since all of the X- and K-series parts shown above come with unlocked multipliers for dead-simple overclocking.

One product you’ll probably want to avoid is the Core i7-5820K, which Intel has ruined by disabling a bunch of the PCI Express lanes. I swear, if there’s a way to tune a knob or dial in order to gimp a CPU for the sake of product segmentation, Intel’s product people will find that knob and turn it, no matter what. In this case, the Core i7-5820K loses the ability to host a dual-graphics setup with 16 lanes to each PCIe slot. Have fun explaining that one to your friend who popped $389 for a CPU and about the same for a fancy X99 motherboard, only to find that it’s no better—not even in theory—than a 4790K for dual-GPU setups. This issue is more pressing now that AMD relies on PCI Express bandwidth for transferring CrossFire frames between GPUs.
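The lane arithmetic behind that complaint is easy to sketch. This is a simplification that assumes any split adding up to the CPU’s lane count is possible. Real boards wire their slots (and any PCIe switches) in fixed ways, so the motherboard manual has the final word. Lane counts come from the model table above; the layout names are just labels for illustration.

```python
CPU_LANES = {
    "Core i7-5960X": 40,
    "Core i7-5930K": 40,
    "Core i7-5820K": 28,
    "Core i7-4790K": 16,
}

# Multi-GPU layouts, expressed as PCIe lanes per graphics card
LAYOUTS = {
    "2 x8": [8, 8],
    "2 x16": [16, 16],
    "4 x8": [8, 8, 8, 8],
    "5 x8": [8, 8, 8, 8, 8],
}


def fits(cpu_lanes: int, layout: list) -> bool:
    """True if the CPU has enough lanes to feed every card at its full width."""
    return sum(layout) <= cpu_lanes


for cpu, lanes in CPU_LANES.items():
    supported = [name for name, layout in LAYOUTS.items() if fits(lanes, layout)]
    print(f"{cpu} ({lanes} lanes): {', '.join(supported) or 'none'}")
```

Under this simple model, the 40-lane parts can handle every layout listed, while the 28-lane 5820K, like the 16-lane 4790K, tops out below a full dual-x16 configuration.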

We have in the past considered CPUs like the Core i7-3820 to be a nice entry point into Intel’s higher-end platforms. That ends here. The 5820K’s hobbled PCIe removes a major rationale for the X99 platform’s adoption among PC gamers. Unless you really know what you’re doing, stay away from it.

A new socket: LGA2011-v3

As you might expect given the move of the voltage regulators onto the CPU die and the shift to DDR4, Haswell-E adopts a new socket type that isn’t compatible with prior chips. Intel calls it Socket 2011-v3. Although the pin config is different, the new socket looks a lot like the one it replaces. That’s good news, since we’re fans of the robust retention mechanism and physical design of LGA2011. Coolers made for LGA2011 sockets should work just fine with LGA2011-v3, too.

This socket is tightly flanked by DIMM slots. Cooler clearance around it will be an issue, which is one reason folks tend to choose water cooling for Core i7 Extreme systems. As you can see above, Corsair sensibly chose to equip its Vengeance LPX DIMMs with low-profile heat spreaders, which is the right way to do it, in my view.

The deal with DDR4 memory

This new platform requires DDR4 memory, of course. The modules have 288 pins and are notched differently along the bottom, so they’re completely incompatible with DDR3. DDR3 has been with us for a long, long time, and the switch to a new memory type promises big benefits, at least eventually.

A DDR4 module (top) versus DDR3 (bottom)

One major plus is lower-power operation. Samsung says DDR4 modules require about 30-40% less power than even DDR3L DIMMs. Some of that gain comes from a lower 1.2V standard operating voltage, and the rest comes from a collection of design features expressly intended to improve power efficiency. For instance, DDR4’s smaller-sized pages require less power to activate. All told, the savings should add up to about 2W per module. That’s not really a big deal in the context of a high-end desktop, but it would be in a tablet or in a server crammed full of DIMMs.

Speaking of which, DDR4 is also primed to achieve higher bit densities than DDR3, and the spec includes native support for chip stacking. Samsung is already using through-silicon vias to stack four DDR4 chips on top of one another.

Another big perk of DDR4 is, of course, additional bandwidth. This new standard has been designed to reach higher transfer rates than DDR3. As with most new memory types, its potential may not be realized right away. DDR3 currently tops out at 2133 MT/s, more or less, and that’s where DDR4 starts with Haswell-E. Thing is, memory makers are already working on DDR4 chips capable of 3200 MT/s operation.

Although the Core i7-5960X doesn’t officially support RAM speeds above 2133 MT/s, the firmware on our Asus X99 Deluxe offers options as high as 4000 MT/s. Heck, the Corsair Vengeance LPX DIMMs we used for testing are rated for 2800 MT/s at 1.2V. Intel has even blessed an XMP (for eXtreme Memory Profile) 2.0 spec that will allow DDR4 DIMMs to auto-configure themselves at higher clocks on X99 motherboards.

So DDR4 looks to have plenty of headroom right out of the gate. The more difficult question is whether any common consumer applications will actually benefit from the additional bandwidth.

The X99 platform

Block diagram of the X99 chipset and platform. Source: Intel.

Few folks will question the wisdom of giving the X99 chipset more oomph. This new companion I/O chip is loaded to the gills, with 10 SATA 6Gbps ports and six USB 3.0 ports, which is enough to spare it the embarrassment of the X79’s meager port selection. The X99 chip can also support M.2 and SATA Express-based storage, although you’re surely better off hanging fast SSDs directly off of the CPU. What happens there will depend on the motherboard makers.

Mobo manufacturers also have the option of implementing Thunderbolt 2 on the X99 platform if they wish. Doing so will add some costs, as Thunderbolt tends to do, but the return will be an external I/O connection that’s capable of 20 Gbps transfers. That’s twice the rate of the original Thunderbolt and four times what USB 3.0 can sustain.

The possibilities for I/O configurations on X99 boards are incredibly complex given the number of ports, lanes, and slots available between the CPU and the X99 chipset. Geoff will be covering the particulars of various motherboards in his reviews, including today’s look at the Asus X99 Deluxe. I’ll leave most of the detail to him, but there is one caveat about the X99’s setup I should note.

In the block diagram above, you can see the “DMI 2.0 x4” connection between the CPU and the X99. That’s essentially a dedicated PCIe 2.0-style link from chip to chip, which means it has only 20 Gbps of raw bandwidth in each direction, or about 2 GB/s per direction once 8b/10b encoding overhead is subtracted. Behind this not-especially-fast interconnect are six USB 3.0 ports, eight USB 2.0 ports, eight PCIe 2.0 lanes, 10 SATA 6Gbps ports, and more. Do the math, and it’s pretty abysmal. The X99 just can’t support nearly the amount of concurrent I/O that its port payload suggests, not if those transfers are going to the CPU or memory. For most desktop users, this bottleneck probably won’t become a problem too often, but it’s still pretty far from ideal.
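Here’s that math as a rough sketch. It sums the nominal line rates of the chipset’s fastest ports and compares them with the DMI uplink. Assumptions: DMI 2.0 runs at 5 GT/s per lane with 8b/10b encoding, like PCIe 2.0; port figures are raw signaling rates, not real-world throughput; USB 2.0 and the other smaller ports are left out, so the true ratio is, if anything, worse.

```python
# Nominal line rates in Gbps per port or lane (raw signaling, before encoding overhead)
USB3_GBPS = 5.0
SATA3_GBPS = 6.0
PCIE2_LANE_GBPS = 5.0


def dmi2_usable_gbps(lanes: int = 4) -> float:
    """Usable DMI 2.0 bandwidth per direction: 5 GT/s per lane, 8 data bits per 10 line bits."""
    return lanes * 5.0 * 8 / 10


downstream = {
    "USB 3.0 (x6)": 6 * USB3_GBPS,
    "SATA 6Gbps (x10)": 10 * SATA3_GBPS,
    "PCIe 2.0 lanes (x8)": 8 * PCIE2_LANE_GBPS,
}

total = sum(downstream.values())
uplink = dmi2_usable_gbps()
print(f"Ports behind the chipset: {total:.0f} Gbps of nominal line rate")
print(f"DMI 2.0 x4 uplink:        {uplink:.0f} Gbps per direction after encoding")
print(f"Oversubscription:         {total / uplink:.1f}:1")
```

An oversubscription ratio above 8:1 is only a problem when many of those devices talk to the CPU or memory at once, which is why most desktop users won’t notice it often.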

Our testing methods

The Cooler Master Nepton 140XL kept our Haswell-E frosty

As usual, we ran each test at least three times and have reported the median result. Our test systems were configured like so:

Processor | AMD FX-8350 | AMD A6-7400K, AMD A10-7800 | Pentium G3258, Core i3-4360, Core i5-4590, Core i7-4790K
--- | --- | --- | ---
Motherboard | Asus Crosshair V Formula | Asus A88X-PRO | Asus Z97-A
North bridge | 990FX | A88X FCH | Z97 Express
South bridge | SB950 | – | –
Memory size | 16 GB (2 DIMMs) | 16 GB (4 DIMMs) | 16 GB (2 DIMMs)
Memory type | AMD Performance Series DDR3 SDRAM | AMD Radeon Memory Gamer Series DDR3 SDRAM | Corsair Vengeance Pro DDR3 SDRAM
Memory speed | 1866 MT/s | 1866 MT/s, 2133 MT/s | 1333 MT/s, 1600 MT/s
Memory timings | 9-10-9-27 1T | 10-11-11-30 1T | 8-8-8-20 1T, 9-9-9-24 1T
Chipset drivers | AMD chipset 13.12 | AMD chipset 13.12 | INF update 10.0.14, iRST 13.0.3.1001
Audio | Integrated SB950/ALC889 with Realtek 6.0.1.7233 drivers | Integrated A85/ALC892 with Realtek 6.0.1.7233 drivers | Integrated Z97/ALC892 with Realtek 6.0.1.7233 drivers
OpenCL ICD | AMD APP 1526.3 | AMD APP 1526.3 | AMD APP 1526.3
IGP drivers | – | Catalyst 14.6 beta | 10.18.10.3652

Processor | Core i5-2500K | Core i7-4960X | Core i7-5960X
--- | --- | --- | ---
Motherboard | Asus P8Z77-V Pro | Asus P9X79 Deluxe | Asus X99 Deluxe
North bridge | Z77 Express | X79 Express | X99
South bridge | – | – | –
Memory size | 16 GB (2 DIMMs) | 16 GB (4 DIMMs) | 16 GB (4 DIMMs)
Memory type | Corsair Vengeance Pro DDR3 SDRAM | Corsair Vengeance DDR3 SDRAM | Corsair Vengeance LPX DDR4 SDRAM
Memory speed | 1333 MT/s | 1866 MT/s | 2133 MT/s
Memory timings | 8-8-8-20 1T | 9-10-9-27 1T | 15-15-15-36 1T
Chipset drivers | INF update 10.0.14, iRST 13.0.3.1001 | INF update 10.0.14, iRST 13.0.3.1001 | INF update 10.0.17, iRST 13.1.0.1058
Audio | Integrated Z77/ALC892 with Realtek 6.0.1.7233 drivers | Integrated X79/ALC898 with Realtek 6.0.1.7233 drivers | Integrated X99/ALC1150 with Realtek 6.0.1.7233 drivers
OpenCL ICD | AMD APP 1526.3 | AMD APP 1526.3 | AMD APP 1526.3
IGP drivers | – | – | –

They all shared the following common elements:

Component | Configuration
--- | ---
Hard drive | Kingston HyperX SH103S3 240GB SSD
Discrete graphics | XFX Radeon HD 7950 Double Dissipation 3GB with Catalyst 14.6 beta drivers
OS | Windows 8.1 Pro
Power supply | Corsair AX650

Thanks to Corsair, XFX, Kingston, MSI, Asus, Gigabyte, Cooler Master, Intel, and AMD for helping to outfit our test rigs with some of the finest hardware available. Thanks to Intel and AMD for providing the processors, as well, of course.

Some further notes on our testing methods:

The test systems’ Windows desktops were set at 1920×1080 in 32-bit color. Vertical refresh sync (vsync) was disabled in the graphics driver control panel.

We used a Yokogawa WT210 digital power meter to capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (The monitor was plugged into a separate outlet.) We measured how each of our test systems used power across a set time period, during which time we encoded a video with x264.

After consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for the bulk of our desktop processor tests, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled.

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

Since we have a new chip architecture and a new memory type on the bench, let’s take a look at some directed memory tests before moving on to real-world applications.

The fancy plot above mainly looks at cache bandwidth. This test is multithreaded, so the numbers you see show the combined bandwidth from all of the L1 and L2 caches on each CPU. Since Haswell-E has eight 32KB L1 caches, we’re still in the L1 cache at the 256KB block size above. The next three points, up to 2MB, are hitting the L2 caches, and beyond that, up to 16MB, we’re into the L3.
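The mapping from block size to cache level is just aggregate-capacity arithmetic. Here’s a simplified sketch, assuming Haswell-E’s per-core 32KB L1 data cache, per-core 256KB L2, and shared 20MB L3. Real measurements blur near the boundaries because code, other data, and replacement behavior also occupy the caches. `cache_level` is a hypothetical helper.

```python
KB = 1024
MB = 1024 * KB


def cache_level(block_bytes: int, cores: int = 8,
                l1_per_core: int = 32 * KB, l2_per_core: int = 256 * KB,
                l3_total: int = 20 * MB) -> str:
    """Which cache a multithreaded test's working set fits in, judged by total capacity."""
    if block_bytes <= cores * l1_per_core:   # 8 x 32KB = 256KB of L1 data cache
        return "L1"
    if block_bytes <= cores * l2_per_core:   # 8 x 256KB = 2MB of L2
        return "L2"
    if block_bytes <= l3_total:              # one shared 20MB L3
        return "L3"
    return "DRAM"


for size in (256 * KB, 512 * KB, 2 * MB, 16 * MB, 64 * MB):
    print(f"{size // KB:>6} KB -> {cache_level(size)}")
```

Swapping in a different core count (for example, `cores=6` for a six-core part) shifts the L1 and L2 boundaries accordingly.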

Intel’s architects essentially doubled the bandwidth in Haswell’s L1 and L2 caches compared to Ivy Bridge in order to make them fast enough to support AVX2’s higher throughput. We don’t see quite a doubling of performance in our measurements when comparing the Core i7-4960X to the 5960X, but there are a lot of moving parts here. The 5960X has more cores but a lower frequency, for instance. Regardless, the 5960X’s caches can sustain vastly more throughput than any other CPU we’ve tested.

Stream offers a look at main memory bandwidth, and the results are scandalous. I tried a number of different thread counts and affinity configs, but I just couldn’t extract any more throughput from the 5960X with this version of Stream. In fact, I even tried raising the DDR4 speed from 2133 to 2800 MT/s, and throughput didn’t improve. Frustrated, I decided to try a different bandwidth test from AIDA64.

That’s more like it. The Haswell-E/DDR4 combo can achieve higher throughput in the right situation. The memory read results don’t tell the whole story, though.

Looks like DDR4 writes are slower than reads, at least with AIDA64’s access pattern. In the end, the memory copy test shows a reasonably good overall result for the 5960X and DDR4. At 2133 MT/s, it’s not much faster than an i7-4960X with DDR3-1866, but at 2800 MT/s, the new CPU and memory type move ahead.

Next up, let’s look at access latencies.

SiSoft has a nice write-up of this latency testing tool, for those who are interested. We used the “in-page random” access pattern to reduce the impact of prefetchers on our measurements. This test isn’t multithreaded, so it’s a little easier to track which cache is being measured. If the block size is 32KB, you’re in the L1 cache. If it’s 64KB, you’re into the L2, and so on.

The bottom line: Haswell-E achieves roughly double the cache bandwidth without any real increase in the number of clock cycles of access latency. That’s excellent.

Accessing main memory is a bit of a different story. The Haswell-E-and-DDR4 combo has higher memory access latencies at 2133 MT/s than most of the DDR3-based setups. Fortunately, that slowdown pretty much evaporates once we crank the DDR4 up to 2800 MT/s.

Honestly, I’m not sure the added memory latency matters much, anyhow, given the fact that the 5960X has a massive 20MB L3 cache. We conducted most of our testing at the CPU’s officially supported memory spec of DDR4-2133. We may have to do some additional testing at 2800 MT/s to see what it nets us in real applications.

Some quick synthetic math tests

The folks at FinalWire have built some interesting micro-benchmarks into their AIDA64 system analysis software. They’ve tweaked several of these tests to make use of new instructions on the latest processors, including Haswell-E. Of the results shown below, PhotoWorxx uses AVX2 (and falls back to AVX on Ivy Bridge, et al.), CPU Hash uses AVX (and XOP on Bulldozer/Piledriver), and FPU Julia and Mandel use AVX2 with FMA.

Good grief. The Core i7-5960X is off to one heck of a start. Many of the big generational performance gains you’re seeing above come from the use of AVX2 and the FMA (or fused multiply-add) instruction. Haswell supports both, and Ivy Bridge supports neither. Notice how the quad-core 4790K nearly matches or even beats the six-core 4960X? That’s Haswell magic at work. Now, give the Haswell chip eight cores, and you have the i7-5960X.

Power consumption and efficiency

The workload for this test is encoding a video with x264, based on a command ripped straight from the x264 benchmark you’ll see later.

The 5960X’s impressive reductions in idle power consumption versus the i7-4960X come mostly courtesy of the Asus X79 motherboard on the 4960X system; it’s a great board, but we’ve seen similarly configured X79 systems idle at around 64W. In fact, I’m a little disappointed that our 5960X system doesn’t go lower at idle. Yes, it has twice as many DIMMs as our quad-core Haswell systems onboard, but I’d hoped the power savings from DDR4 might move the needle a bit more.

Now that’s more like it. This drop in peak power use is a genuine improvement over the 4960X, since that Asus X79 mobo draws more power than other boards only at idle, not under load. At first, the fact that the 5960X system pulls less power than the 4960X one surprised me. After all, the 5960X has a 140W TDP, and the 4960X’s peak power rating is 130W. However, the 5960X’s TDP encompasses the CPU’s integrated voltage regulators. On the 4960X, the VRMs are external and don’t count toward the TDP, so it makes sense that the 5960X’s total system power draw would be lower.

That said, this workload doesn’t really engage all eight cores and 16 threads on the 5960X throughout its execution. We saw transient peaks of 130W or more during the test run, and this same system will draw as much as 163W during a 3D rendering workload.

We can quantify efficiency by looking at the amount of energy used, in kilojoules, during the entirety of our test period, when the chips are busy and at idle.

Perhaps our best measure of CPU power efficiency is task energy: the amount of energy used while encoding our video. This measure rewards CPUs for finishing the job sooner, but it doesn’t account for power draw at idle.
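Both energy figures come from integrating power readings over time (one watt sustained for one second is one joule). Here’s a minimal sketch, assuming evenly spaced wall-power samples like those a logging power meter produces. The sample values and the one-second interval are made up for illustration; they are not our measurements.

```python
def energy_kj(watts: list, interval_s: float) -> float:
    """Integrate evenly spaced power samples (W) with the trapezoidal rule; returns kJ."""
    if len(watts) < 2:
        return 0.0
    joules = sum((a + b) / 2 * interval_s for a, b in zip(watts, watts[1:]))
    return joules / 1000


# Hypothetical one-sample-per-second log: idle, a short encode burst, then idle again
samples = [60.0, 60.0, 150.0, 155.0, 152.0, 150.0, 60.0, 60.0]
task = samples[1:7]  # only the samples spanning the encode itself

print(f"Total energy over the whole period: {energy_kj(samples, 1.0):.3f} kJ")
print(f"Task energy (encode only):          {energy_kj(task, 1.0):.3f} kJ")
```

Because task energy only covers the encode window, a chip that draws more power but finishes sooner can still come out ahead, while its idle behavior affects only the total.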

The 5960X looks pretty darned efficient overall, even though this isn’t the ideal workload for a 16-threaded CPU. Only a couple of quad-core Haswells use less energy to complete the task, and the 5960X brings a vast improvement in efficiency over the Core i7-4960X.

Pour one out for my homies at AMD.

Crysis 3

For the most part, Crysis 3 isn’t nearly as demanding as past Crysis games were, relatively speaking. Any reasonably modern system can run it well at the right settings. There is, however, one level that’s particularly difficult for slower CPUs—and it seems to benefit from having more hardware threads and cores on hand. Some of you all suggested that we test there, so we did. You can see the crazy amounts of grass and other vegetation in the video below.

As usual, we recorded the time needed to render each and every frame the game produced. The plots show the frame times from one of our three test runs for each CPU.

What you’ll see, I think, is that frame times vary much more widely on the slower processors. The faster the CPU, the smoother and more consistent the frame rendering times become.

Average FPS is often kind of meaningless for reasons I’ve explained in the past. The slowest CPUs here, though, are clearly struggling. AMD’s A10-7800 is a more affordable CPU that competes with the Core i3-4360, and its 22 FPS average just isn’t getting it done. Its 99th percentile frame time of 68 milliseconds ain’t great, either; that means the slowest 1% of frames are being rendered at a rate equivalent to less than 15 FPS. Interestingly enough, the FX-8350 performs much better; it’s an eight-core variant of more or less the same basic architecture.
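The percentile math is easy to reproduce. Here’s a minimal sketch using the nearest-rank definition of a percentile (an assumption; definitions that interpolate between samples give slightly different answers), plus the frame-time-to-FPS conversion used above. The frame times are invented for illustration.

```python
import math


def percentile_nearest_rank(values: list, pct: float) -> float:
    """Smallest value such that at least pct% of the samples are <= it (nearest-rank method)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def ms_to_fps(frame_time_ms: float) -> float:
    """Frame rate equivalent to a single frame time."""
    return 1000.0 / frame_time_ms


# Hypothetical 100-frame run: mostly smooth, with a handful of slow frames
frame_times = [16.7] * 95 + [25.0, 30.0, 45.0, 68.0, 80.0]

p99 = percentile_nearest_rank(frame_times, 99)
print(f"99th percentile frame time: {p99} ms (~{ms_to_fps(p99):.1f} FPS)")
```

Here, 99% of the frames finish in 68 ms or less, so only the slowest 1% (one 80-ms frame) falls below the roughly 14.7 FPS that 68 ms works out to.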

The Core i7-5960X essentially ties for the lead with its predecessor, the 4960X, and its little brother, the Haswell-based 4790K. Although the 4790K only has four cores, it has eight hardware threads and runs at substantially higher clock speeds than the 5960X. As a result, it comes out ahead of the 5960X by just a smidgen.

We can sort the frame times from best to worst and look at the tail end of the results, where the slowest frames reside, to see a clear separation between the faster and slower CPUs. The 4960X, 4790K, and 5960X are all packed together almost identically. Just above them is the eight-core FX-8350, whose performance here is commendable, followed by the Core i5-4590 and the rest of the pack.

Our final frame-time-based metric looks at “badness,” those cases where frames take an especially long time to produce and the user is most likely to notice. We have several thresholds of “badness,” starting at 50 ms, which is the equivalent of 20 FPS or three full refresh cycles on a 60Hz display. Any time you’re waiting 50 ms or more for a frame, you’re likely to notice that in-game animation isn’t as smooth as it should be. Few of these processors spend any time at all beyond our 50-ms threshold in this test session, and none of the high-end ones do.
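One way to compute the “badness” figures is sketched here under a stated assumption: each frame contributes only the portion of its render time past the threshold, so a 70-ms frame adds 20 ms to the 50-ms total. The frame times are invented. The thresholds are the ones discussed in this review, with 33.3 and 16.7 ms standing in for two refreshes and one refresh at 60Hz.

```python
THRESHOLDS_MS = (50.0, 33.3, 16.7)


def time_beyond(frame_times_ms: list, threshold_ms: float) -> float:
    """Total time (ms) that frames spend past the threshold, counting only the excess."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


def refresh_intervals(ms: float, refresh_hz: float = 60.0) -> float:
    """How many display refresh intervals a given wait spans."""
    return ms * refresh_hz / 1000


frame_times = [15.0, 16.0, 20.0, 35.0, 70.0, 16.5]  # hypothetical
for limit in THRESHOLDS_MS:
    print(f"Beyond {limit} ms (~{1000 / limit:.0f} FPS, "
          f"{refresh_intervals(limit):.1f} refreshes at 60Hz): "
          f"{time_beyond(frame_times, limit):.1f} ms")
```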

The 5960X essentially ties the 4960X and 4790K at the other two thresholds, so it’s among the fastest CPUs we’ve tested. Unfortunately, though, this CPU doesn’t break any new performance ground in this test scenario compared to last year’s model.

Watch_Dogs

Here’s another game with a reputation for making life hard on CPUs. We tested by taking a quick stroll around the block and then blowing up an electrical junction box with our phone. Kind of like real life around here.

Well, this game is strenuous—but only for the A10-7800 among the CPUs we tested. Even the Core i3-4360 aces this test, rendering 99% of the frames in 19 ms or less. Notably, the Core i7-5960X takes the slightest of leads over the 4790K and 4960X.

Yeah, perhaps we need to find a different area to test, but this game doesn’t look to be as tough on the CPU as we expected. Even the A10-7800 stays below our 50-ms “badness” threshold throughout the test session.

Arkham Origins

This game is based on Unreal Engine 3, which has gotta be the most widely used game engine right now. UE3 doesn’t use truly robust multithreading, so Arkham Origins doesn’t respond as well as the last two games did to the addition of more cores and hardware threads. The CPUs do encounter some challenges, but the processor that performs the best is the one with the highest clock speed: the quad-core i7-4790K.

Not only that, but for whatever reason, the 5960X runs into a bit of trouble. Some of its frame times are higher than what we see out of even the Core i3-4360. That may simply be because this game is especially sensitive to clock speeds. The Core i3-4360 is Haswell-based and runs at 3.7GHz, while the 5960X starts at 3GHz and maxes out at 3.5GHz with Turbo.

Whatever the cause, the 5960X places near the back of the pack in the 99th percentile frame time scores.

You can see how the 5960X’s latency curve shoots upward during the last few percentage points worth of frames. That’s probably because, in those toughest instants, the game’s performance is gated by a single thread’s execution speed. With its relatively modest clocks, the 5960X can’t power through those situations as easily as the other Intel CPUs we tested.

Remember that we’re talking about minute differences here. None of these processors push beyond our 50-ms “badness” threshold, and the 5960X barely spends any time beyond 33 ms, either. The deltas between CPUs are only apparent at the 16.7-ms threshold. For what it’s worth, though, this game felt “smoother” during play-testing on the other Intel CPUs than it did on the 5960X.

Battlefield 4 with Mantle

Test Battlefield 4 with Mantle, they said. It’ll be interesting, they said.

Ack.

Turns out the work that DICE and AMD have done with the low-overhead Mantle API in BF4 has really come together nicely over time. Even the A10-7800, which has struggled mightily in the other games, turns in a near-perfect performance.

Then again, you may recall that BF3 ran almost flawlessly on just about any CPU, too, and it used Direct3D.

Thief

Our final gaming test also explores Mantle performance. We had to use Thief’s built-in benchmark, since our usual tool for frame-time testing, Fraps, doesn’t yet work with Mantle.

By reducing CPU overhead, Mantle really does allow the lower-end CPUs to perform better in Thief. Unfortunately, the game wouldn’t start up properly in Mantle mode with the two lowest-end CPUs we tested, so they didn’t benefit from the new API.

Beyond that, the 5960X continues to perform well, but it’s clearly not the fastest CPU for gaming. That’s really no surprise, given its clock speeds.

Productivity

Compiling code in GCC

Our resident developer, Bruno Ferreira, helped put together this code compiling test. Qtbench tests the time required to compile the Qt SDK using the GCC compiler. The number of jobs dispatched by the Qtbench script is configurable, and we set the number of threads to match the hardware thread count for each CPU.
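Matching the job count to the hardware thread count is the familiar `make -jN` idiom. This is a hypothetical sketch of that idea only, since Qtbench’s actual script isn’t reproduced here. Python’s `os.cpu_count()` reports logical CPUs, including Hyper-Threading siblings, so on a Core i7-5960X with Hyper-Threading enabled it would typically report 16.

```python
import os
from typing import List, Optional


def make_command(jobs: Optional[int] = None) -> List[str]:
    """Build a parallel make invocation, defaulting to one job per hardware thread."""
    if jobs is None:
        # os.cpu_count() counts logical CPUs and may return None if undetermined
        jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


print(make_command())    # job count depends on the machine running this
print(make_command(16))  # -> ['make', '-j16'], as for a 16-thread CPU
```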

TrueCrypt disk encryption

TrueCrypt supports acceleration via Intel’s AES-NI instructions, so AES encryption, in particular, should be very fast on the CPUs that support those instructions. We’ve also included results for another algorithm, Twofish, that isn’t accelerated via dedicated instructions.

7-Zip file compression and decompression

The 5960X simply dominates the first chunk of our non-gaming application tests. These programs are all pretty widely multithreaded, so the 5960X is able to put all eight of its cores to good use.

JavaScript performance

These two JavaScript benchmarks tend to prize per-thread performance, so the 5960X finishes behind its higher-clocked Haswell stablemates. It does beat the Ivy-based 4960X, though.

Video encoding

x264 HD video encoding

Our x264 test involves one of the latest builds of the encoder with AVX2 and FMA support. To test, we encoded a one-minute, 1080p .m2ts video using the following options:

--profile high --preset medium --crf 18 --video-filter resize:1280,720 --force-cfr

The source video was obtained from a repository of stock videos on this website. We used the Samsung Earth from Above clip.
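For anyone who wants to script the same encode, here’s a sketch that assembles the options above into an argument list for Python’s `subprocess.run`. The input and output file names are placeholders. It assumes an x264 binary on the PATH that can read .m2ts input, which typically requires a build with a demuxing library such as lavf or ffms.

```python
import os
import shutil
import subprocess

# Encoder settings from the benchmark command line above
X264_OPTIONS = [
    "--profile", "high",
    "--preset", "medium",
    "--crf", "18",
    "--video-filter", "resize:1280,720",
    "--force-cfr",
]


def x264_command(source: str, output: str) -> list:
    """Assemble an x264 command line using the benchmark's settings."""
    return ["x264", *X264_OPTIONS, "--output", output, source]


src, out = "earth_from_above.m2ts", "out.264"  # placeholder file names
cmd = x264_command(src, out)
print(" ".join(cmd))

# Only attempt the encode if x264 and the source clip are actually present
if shutil.which("x264") and os.path.exists(src):
    subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.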

As we noted earlier in the power efficiency tests, the x264 encoder doesn’t seem to use all 16 of the 5960X’s threads especially well—at least, not with the settings we used.

Handbrake HD video encoding

Our Handbrake test transcodes a two-and-a-half-minute 1080p H.264 source video into a smaller format defined by the program’s “iPhone & iPod Touch” preset.

The 5960X turns things around in Handbrake with an overall victory. That’s what one would hope to see, since one of the big target workloads for a CPU like this is video encoding.

Image processing

The Panorama Factory photo stitching

The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.

Chalk up one more win for the 5960X in our photo-stitching app, but it’s a close one.

3D rendering

LuxMark

Because LuxMark uses OpenCL, we can use it to test both GPU and CPU performance, and even to compare performance across different processor types. OpenCL code is by nature parallelized and is compiled at runtime, so it should adapt well to new instructions. For instance, Intel and AMD offer installable client drivers (ICDs) for OpenCL on x86 processors, and they both support AVX. The AMD APP driver even supports Bulldozer’s and Piledriver’s distinctive instructions, FMA4 and XOP. We’ve used the AMD APP ICD on all of the CPUs, since it’s currently the fastest ICD in every case.

We’ll start with CPU-only results.

Eight cores are nice to have for parallel workloads like this one. The 5960X easily takes the top spot. Unfortunately, we’ve not yet seen the sort of speed-ups from AVX2 with FMA that we’d hoped—certainly nothing like what we saw in some of those synthetic AIDA64 tests, for instance.

We can try combining CPU and GPU computing power by asking both processor types to work on the same problem at once.

The 5960X’s rendering performance more than doubles when it gets help from AMD’s “Tahiti” GPU.

Cinebench rendering

The Cinebench benchmark is based on Maxon's Cinema 4D rendering engine. This test runs first with a single thread and then with as many threads as the CPU has hardware threads: one per core, or more on CPUs that run multiple hardware threads per core.
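The thread-count rule above, and the related question of how well a program actually scales across those threads (the x264 caveat earlier in this section), comes down to simple arithmetic. Below is a minimal Python sketch of that arithmetic, not anything from Maxon; the function names and the sample scores are hypothetical, chosen only to illustrate the calculation.

```python
import os

def cinebench_style_thread_counts(logical_cpus=None):
    """Return the two thread counts a Cinebench-style test uses:
    a single thread, then one thread per hardware thread."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs (hardware threads), or None if unknown
        logical_cpus = os.cpu_count() or 1
    return (1, logical_cpus)

def scaling_efficiency(single_score, multi_score, threads):
    """Fraction of ideal linear scaling achieved. 1.0 means the
    multithreaded score is exactly `threads` times the single-threaded one."""
    if single_score <= 0 or threads <= 0:
        raise ValueError("scores and thread count must be positive")
    speedup = multi_score / single_score
    return speedup / threads

# Core i7-5960X: 8 cores x 2 hardware threads per core
print(cinebench_style_thread_counts(8 * 2))  # (1, 16)

# Hypothetical scores: a 12x speedup on 16 threads is 75% of ideal scaling
print(scaling_efficiency(100, 1200, 16))  # 0.75
```

An efficiency well below 1.0 is the signature of a workload that, like x264 with our settings, can't keep all 16 threads busy.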

POV-Ray rendering

The 5960X erases any questions about which desktop CPU is the best at 3D rendering. Yeesh.

Scientific computing

MyriMatch proteomics

MyriMatch is intended for use in proteomics, or the large-scale study of proteins. You can read more about it here.

STARS Euler3d computational fluid dynamics

Euler3D tackles the difficult problem of simulating fluid dynamics. Like MyriMatch, it tends to be very memory-bandwidth intensive. You can read more about it right here.

Haswell-E with DDR4 looks to be especially well-suited to these sorts of workloads, which is why this same chip will also be sold in multi-socket Xeon workstations and servers.

Legacy comparisons

Many of you have asked for broader comparisons with older CPUs, so you can understand what sort of improvements to expect when upgrading from an older system. We can’t always re-test every CPU from one iteration of our test suite to the next, but there are some commonalities that carry over from generation to generation. We might as well try some inter-generational mash-ups.

Now, these comparisons won’t be as exact and pristine as our other scores. Our new test systems run Windows 8.1 instead of Windows 7, for instance, and have higher-density RAM and larger SSDs. We’re using some slightly different versions of POV-Ray, too. Still, scores in the benchmarks we selected shouldn’t vary too much based on those factors, so… let’s do this.

Our first set of mash-up results comes from our last two generations of CPU test suites, as embodied in our FX-8350 review from the fall of 2012 and our original desktop Haswell review from last year. This set will take us back at least four generations for both Intel and AMD, spanning a price range from under $100 to $1K.

Productivity

Image processing

3D rendering

Scientific computing

I think the short answer is: yes, it’s time to upgrade.

Legacy comparisons, continued

That was a nice start on the last page, but we can go broader than that. This next set of results includes fewer common benchmarks, but it takes us as far back as the Core 2 Duo and, yes, a chip derived from the Pentium 4: the Pentium Extreme Edition 840. Also present: dual-core versions of low-power CPUs from both Intel and AMD, the Atom D525 and the E-350 APU. We retired this original test suite after the 3960X review in the fall of 2011. We’ve now mashed it up with results from our first desktop Haswell review and from today.

Image processing

3D rendering

Still not old-school enough for you? In April of 2001, the Pentium III 800 rendered this same “chess2” POV-Ray scene in just under 24 minutes.

Overclocking

Since the 5960X's core multiplier is unlocked, overclocking this CPU only requires changing a few settings in the system BIOS. Naturally, we took a shot at it.
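Raising the multiplier is the whole trick: the core clock is the multiplier times the base clock (BCLK). The sketch below assumes the usual 100MHz BCLK, which is our assumption rather than a setting reported in this test, so the 4.4, 4.5, and 4.6GHz targets mentioned below map to multipliers of 44, 45, and 46.

```python
import math

BCLK_MHZ = 100.0  # assumed stock base clock; not a value we changed or measured

def core_clock_ghz(multiplier, bclk_mhz=BCLK_MHZ):
    """Core frequency in GHz = multiplier x base clock."""
    return multiplier * bclk_mhz / 1000.0

def multiplier_for(target_ghz, bclk_mhz=BCLK_MHZ):
    """Smallest whole multiplier that reaches at least target_ghz.
    The tiny epsilon guards against floating-point round-up (e.g. 45.0000001)."""
    return math.ceil(target_ghz * 1000.0 / bclk_mhz - 1e-9)

for target in (4.4, 4.5, 4.6):
    m = multiplier_for(target)
    print(f"{target} GHz -> {m}x -> {core_clock_ghz(m)} GHz")
```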

With some fiddling, we were able to get our 5960X running reasonably stably with all eight cores synced at 4.5GHz. To do so, we set the CPU voltage on our Asus X99 Deluxe motherboard to 1.325V. With a big water cooler attached, the CPU's temperatures reached the low-to-mid 70s Celsius, which isn't too bad.

We tried to push further, to 4.6GHz, but cranking up the multiplier produced BSODs in Windows almost instantly. We raised the voltage to 1.35V and then 1.375V, but it didn’t help. We then tried bumping up the cache voltage and turning up the fan speed on our water cooler, but the BSODs persisted. We had to settle for 4.5GHz.

Or so we thought. We then kicked off Cinebench to see what the clock speed increases had won us, but the system crashed during the benchmark with a BSOD. Ultimately, we had to drop down to 4.4GHz at 1.3V in order to run Cinebench without crashing.
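The lesson of that back-and-forth is that "stable" has to mean stable under the heaviest load you care about, not just at the Windows desktop. Here is a minimal sketch of the resulting step-down procedure. The stress-test callback is a hypothetical stand-in for the real boot-and-benchmark cycle, and the sketch ignores voltage, which we also varied.

```python
from typing import Callable, Optional

def highest_stable_multiplier(
    start: int,
    floor: int,
    passes_stress_test: Callable[[int], bool],
) -> Optional[int]:
    """Step the multiplier down from `start` until a stress test passes.

    `passes_stress_test` is hypothetical: in practice it means booting
    Windows and then running a heavy load such as Cinebench.
    Returns None if no multiplier at or above `floor` passes.
    """
    for multiplier in range(start, floor - 1, -1):
        if passes_stress_test(multiplier):
            return multiplier
    return None

# Hypothetical model of the outcome described above: 46x crashed almost
# at once, 45x survived light use but crashed in Cinebench, 44x held up.
def cinebench_survives(multiplier: int) -> bool:
    return multiplier <= 44

print(highest_stable_multiplier(46, 40, cinebench_survives))  # 44
```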

That’s still, uh, crazy fast, especially in multithreaded applications.

Based on our experience and what we’ve heard from other folks with Haswell-E overclocking experience, I don’t think you can expect to see these chips getting into the 4.7-4.8GHz range that’s more common with Haswell dual- and quad-core parts—not very often, at least.