
As widely expected, Intel finally officially announced the Ivy Bridge 22 nm refresh of its Xeon E5 family, the dual socket Xeon E5-26xx v2, this past week at the San Francisco IDF. While it took them longer than expected to get the new doritos out, Intel’s engineers used the extra time pretty well.



First, this is the first Xeon family that comes out with three distinct dies right upfront, helping optimize the core and cache subsystems, including the famed ring buses and their latencies, for specific configurations. All of them share the LGA2011 socket, plug compatible with the one from Sandy Bridge.



So, firstly, there is the large 12-core die with 30 MB L3, where three 4-core columns share two sections of L3, 20 + 10 MB, across three ring buses. This huge die may have the longest inter-core latencies from one end to the other, since the third ring around the die is the longest one, but it avoids needing a hop between two ring buses. Yet it provides a very innovative cache proximity layout to minimize the latencies from each core to its nearest L3 bank – yes, it’s still 2.5 MB of cache per core, just like before (Haswell EP Xeons will most likely stay with the same number). In our first review, the Xeon E5-2697v2 2.7 GHz is the top bin part representing this die.

You’d think this monster drinks power like a sweaty runner after a marathon? Well, guess what: measured at the power supply, the whole dual-CPU 24-core system with 128 GB DDR3-1866 RAM and an enterprise SSD, plus a high end AMD Radeon HD 7990 GPU, consumed less power than the brand new single socket AMD FX-9590 system (four dual-core modules) with 8x less memory and similar graphics, for a roughly 4x system level performance gain!

Then, we have the native 10-core die with 25 MB L3, with two 5-core columns sharing one large L3 array, and two ring buses, each linking all the cores, so no multiple long rings are needed here. This die allows quite a bit higher per-core clocks, so the Xeon E5-2690v2 is a 10 core 3.0 GHz chip, vs the old 8 core 2.9 GHz E5-2690, the direct predecessor that is being replaced. The 8 core workstation part we test, the 3.4 GHz E5-2687Wv2, which replaces the old 8 core 3.1 GHz E5-2687W, is this 10 core die with 2 cores and their accompanying cache banks disabled. This allows it to fit within the prescribed TDP, although I feel Intel could have easily pushed all these guys a bit higher clock wise… there seems to be plenty of TDP margin.
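To put the clock and core count changes above in perspective, here is a rough back-of-the-envelope sketch. It simply multiplies cores by base clock, which is a crude proxy only: it ignores turbo, cache, memory and the per-clock differences between generations. The `PARTS` table and the helper names are made up for illustration; the core counts and clocks are the ones quoted in the text.

```python
# (cores, base clock in GHz), as quoted in the article text.
PARTS = {
    "E5-2690": (8, 2.9),
    "E5-2690v2": (10, 3.0),
    "E5-2687W": (8, 3.1),
    "E5-2687Wv2": (8, 3.4),
}


def aggregate_ghz(part):
    """Cores multiplied by base clock: a crude throughput proxy, not a benchmark."""
    cores, clock_ghz = PARTS[part]
    return cores * clock_ghz


def uplift(old, new):
    """Fractional gain in aggregate core-GHz going from the old part to the new one."""
    return aggregate_ghz(new) / aggregate_ghz(old) - 1.0


for old, new in [("E5-2690", "E5-2690v2"), ("E5-2687W", "E5-2687Wv2")]:
    print(f"{old} -> {new}: {uplift(old, new):+.1%} aggregate core-GHz")
```

By this naive measure the 10 core part gains about 29% over its predecessor, while the 8 core workstation part gains just under 10%, all of it from the clock bump.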

Finally, the smallest die, barely larger than the desktop Ivy Bridge, is the native 6-core 15 MB L3 cache option, with obviously the shortest cache and ring bus latencies of all. This baby is the same one that, on the desktop, became the Core i7 4960X – ECC and dual CPU QPI are turned off there, but the clocks are, as we know, unlocked. In my mind, a pair of those makes excellent sense as a workstation, and Intel offers them in the Xeon E5 family as 3.5 GHz parts, same clock as the desktop ones. I personally feel they could have offered those even at 3.8 or 4 GHz, since there is so much TDP headroom there. How it fares, we’ll tell you in the next test round together with the 10 core sibling.
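As a quick sanity check on the 2.5 MB per core figure, the three die configurations described above can be tabulated in a few lines. This is only a sketch: the `DIES` table and the function name are illustrative, and the numbers are the core counts and L3 sizes given in the text.

```python
# Die configurations as described in the text: (cores, total L3 in MB).
DIES = {
    "12-core": (12, 30.0),
    "10-core": (10, 25.0),
    "6-core": (6, 15.0),
}


def l3_per_core_mb(cores, l3_mb):
    """Return the share of L3 cache per core, in MB."""
    return l3_mb / cores


for name, (cores, l3_mb) in DIES.items():
    print(f"{name} die: {l3_per_core_mb(cores, l3_mb):.1f} MB L3 per core")
```

All three native dies come out at the same 2.5 MB of L3 per core. Salvaged parts keep the ratio as well, as long as the cache banks are disabled along with their cores, as on the 8 core part cut down from the 10 core die.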

We plugged the processors into our old trusty Asus Z9PE-D8 WS dual socket workstation mainboard platform, whose 4 full bandwidth x16 PCIe v3 slots actually make it darn good for serious multi GPU tests as well. We used the Micron DDR3-1866 ECC memory that Intel supplied, but also tested the older (but still faster) Samsung DDR3-1600 ECC modules, which run at both 1600 and 1866 speeds at lower latency than the Micron ones (CL 8 and 9 vs CL 10 and 11 after tuning). The operating system was Windows Server 2008 R2 SP1 running off an Intel DC S3700 800 GB enterprise SSD, and the usual battery of tests comprised Sandra, Aida, Cinebench, a few more highly parallel ray trace tests, as well as Linpack.

So, how do the new E5-2697v2 and E5-2687Wv2 compare against the old E5-2690? Here are the benchmark shots:

Hmm, if you don’t use the random number generators and such, the new processors’ per-core, per-clock performance is about the same. The per-clock latencies are just a bit longer, as you can see. Otherwise it is a measurable net performance gain, consistent with the increase in core count and cache size. The memory speed increase from 1600 to 1866 didn’t have much, if any, impact on the tests run here, simply because the 8-channel total memory subsystem across two sockets provides plenty of bandwidth either way.
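For a feel of how much bandwidth "plenty" means, the theoretical peak can be worked out from the memory speed and channel count. The sketch below assumes a standard 64-bit (8 byte) DDR3 channel and four channels per socket, matching the 8-channel total mentioned above. These are peak figures only; measured bandwidth will be lower.

```python
BYTES_PER_TRANSFER = 8   # assumption: 64-bit wide DDR3 channel
CHANNELS_PER_SOCKET = 4  # 8 channels total across two sockets, per the text
SOCKETS = 2


def peak_bandwidth_gbs(mega_transfers_per_s):
    """Theoretical peak memory bandwidth of the whole system in GB/s (10^9 bytes/s)."""
    channels = CHANNELS_PER_SOCKET * SOCKETS
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000.0


for speed in (1600, 1866):
    print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak across both sockets")
```

That works out to roughly 102 GB/s at DDR3-1600 against roughly 119 GB/s at DDR3-1866, so either setting leaves a lot of headroom for tests that are not memory bound.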

The results are, anyway, the top of the hill by far, as there is simply no competition for those in the single or dual socket x86 space. Even with this minimal hop from the Sandy Bridge generation almost 18 months ago, Intel just increased the performance and efficiency distance between themselves and AMD. On the other hand, the power consumption did go down noticeably, and, as mentioned before, it is kind of a shame that Intel’s top notch dual socket system with 24 real cores and 8x the memory takes less juice than AMD’s single socket one with 4 core pair blocks (they call it 8 cores, mind you).

Do you upgrade? For owners of Sandy Bridge E5 Xeons, I see no point. The first reasonable upgrade to contemplate will be the Xeon E5v3 Haswell EP a year later, or, more likely, the E5v4 Broadwell EP parts in 2015. However, for Nehalem generation users, I’d recommend this as a suitable ‘major performance & power benefit’ generation to upgrade to.

For serious multi core gamers using well threaded games, I feel a dualie E5v2 with either two 8-core 3.4 GHz or two 6-core 3.5 GHz processors (the latter with lower internal latencies too), with 8-channel memory feeding four full bandwidth GPUs and still leaving one more x8 PCIe slot for a superfast SSD, may make good sense. A good 1200 – 1500W PSU can nicely feed the whole shebang with juice to spare.
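The slot layout above can be checked against the lane budget with a tiny sketch. It assumes 40 PCIe 3.0 lanes per Xeon E5 socket, a figure that is not stated in this article, and it ignores how a particular board actually routes lanes to slots.

```python
LANES_PER_CPU = 40  # assumption: PCIe 3.0 lanes provided by each Xeon E5 socket
SOCKETS = 2

# Slot widths for the build described above: four x16 GPUs plus one x8 SSD slot.
SLOT_WIDTHS = [16, 16, 16, 16, 8]

lanes_needed = sum(SLOT_WIDTHS)
lanes_available = LANES_PER_CPU * SOCKETS

print(f"Lanes needed: {lanes_needed}, lanes available: {lanes_available}")
print("Fits at full width" if lanes_needed <= lanes_available else "Slots must share lanes")
```

Under that assumption, the four x16 slots plus the x8 slot need 72 lanes out of the 80 available from two sockets.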