Yes, folks, today Intel is introducing the long-anticipated new CPU code-named Ivy Bridge. The release of any new microprocessor comes with a tremendous amount of complex information, and Ivy is certainly no exception. Intel has handed over vast amounts of detail about its new chip, the products based on it, and their dizzying arrays of features. To that, we’ve added a boatload of test results comparing Ivy Bridge to her contemporaries. We’re practically bursting with info to share with you.

However, I’ve been reviewing CPUs for quite a while now, and I’ll let you in on a little secret. Sometimes, beneath all of the complexity, the scuttlebutt on a new chip is pretty simple. And, truth be told, that’s pretty much the case with Ivy Bridge. This new CPU is an incremental refinement of Sandy Bridge; its benefits are a slight jump in performance and a somewhat larger reduction in power consumption.

I’m oversimplifying, of course. The changes to Ivy Bridge’s integrated graphics are sweeping, for example, and there are a multitude of other tweaks worthy of note. Still, I see no reason not to give you the simple answer up front. Much of what follows is our attempt to distill a vast amount of information about Ivy Bridge down to the most relevant details and then to poke and prod this new chip to see how it compares. Because, you know, that’s what we do. We’ve had Ivy Bridge on the test bench here in Damage Labs for some time now, and we’ve focused most of our efforts on devising—and executing—new methods of testing CPUs. Ivy has been an intriguing test subject, and we hope to show you some things about her that you won’t learn anywhere else.

Into Ivy

Ivy Bridge comes into the world with every advantage, because it is derived from Intel’s excellent Sandy Bridge processor. Ivy is a “tick” in Intel’s vaunted tick-tock development cadence, a familiar architecture ported to a new, smaller chip fabrication process. Thus, Ivy looks very much like her older sister in terms of overall layout; they both have quad CPU cores, 8MB of last-level cache, integrated graphics, and built-in PCI Express connectivity, all tied together by a high-speed communications ring. Ivy Bridge processors will drop into the same LGA1155 socket as Sandy Bridge CPUs, in fact.

The biggest change here is the transition from a 32-nm fab process with Sandy to a 22-nm process for Ivy. Don’t zone out when you hear those words, folks. This conversion is not at all trivial, even though Intel has made regular work of transitioning to new fabrication processes every couple of years. The drum-beat of Moore’s Law has continued apace only because Intel and others have sunk billions into the development of new chipmaking techniques, and these transitions are getting harder to achieve each time. Those pesky laws of physics are becoming ever more difficult to navigate at the nanometer level, which is why companies like GlobalFoundries (which makes AMD CPUs) and TSMC (which makes GPUs for both AMD and Nvidia) are still struggling to produce enough chips with the right characteristics at the 32/28-nm level. Meanwhile, Intel is at least one full generation ahead by shipping 22-nm Ivy Bridge chips in volume today.

In order to make that happen, the firm has fundamentally rebuilt the transistor using a three-dimensional structure that it calls the tri-gate transistor. Intel has claimed these new transistors offer “up to 37 percent performance increase at low voltage versus Intel’s 32nm planar transistors,” a property the firm has said will prove especially useful for “small handheld devices” like smartphones, whose low-power chips should be able to operate at considerably higher clock speeds. There’s another way to capitalize on the process improvements, too: the new transistors can deliver even larger power savings at the same operating speed as 32-nm chips, and the firm has claimed power reductions of over 50% in that case.

These things are well and good, of course, but the trick is how they’ll translate into desktop processors like the Core i7-3770K we’re reviewing today. The claims cited above were expressly made about operation at relatively low voltages and clock speeds compared to those of current desktop CPUs. At clock speeds approaching 4GHz and their accompanying voltage levels, the advantages offered by Intel’s 22-nm process are more modest. Desktop CPUs are probably approaching the hairy end of the frequency-voltage curve for 22-nm chips, where exponential growth in power consumption really begins to ramp up. That reality, perhaps combined with the changing dynamics of the PC market, appears to have driven Intel to make an unusual decision with Ivy Bridge: to realize 22-nm process tech improvements in the form of power reductions, not speed increases, for its desktop processors. Rather than rolling out a bunch of new CPU models with higher clock speeds in the traditional power bands, Intel has elected to reduce desktop power envelopes and hold clock speeds more or less steady.

In fact, the Core i7-3770K’s basic CPU clocks and specs are very close to those of the Core i7-2600K introduced in January 2011. The 3770K’s base and Turbo clocks are each 100MHz higher, the same bump the Core i7-2700K got when it was released last October. Prices have largely held steady, too. The 2600K’s introductory price was $317, and it hasn’t dropped over the course of the past 16 months. The 3770K supplants it for $5 less. The only truly dramatic change is the reduction in TDP, from 95W for the top Sandy Bridge chips to 77W for their Ivy-based replacements.

This move will have some positive impacts, of course, but they’re not exactly the sort of price-performance gains that have made PC enthusiasts swoon in the past. Will folks be excited by claims like “reduced cubic volumes for desktop enclosures” or “easier integration into all-in-one systems?” I’m having a hard time imagining the banner ads. Of course, everything Intel is doing with Ivy Bridge makes a tremendous amount of sense for laptops and other types of mobile devices, which is where much of the PC market is headed.

Code name | Key products | Cores | Threads | Last-level cache | Process node (nm) | Est. transistors (millions) | Die area (mm²)
--- | --- | --- | --- | --- | --- | --- | ---
Lynnfield | Core i5, i7 | 4 | 8 | 8 MB | 45 | 774 | 296
Gulftown | Core i7-970, 990X | 6 | 12 | 12 MB | 32 | 1168 | 248
Sandy Bridge | Core i5, i7 | 4 | 8 | 8 MB | 32 | 995 | 216
Sandy Bridge-E | Core i7-39xx | 8 | 16 | 20 MB | 32 | 2270 | 435
Ivy Bridge | Core i5, i7 | 4 | 8 | 8 MB | 22 | 1400 | 160
Deneb | Phenom II | 4 | 4 | 6 MB | 45 | 758 | 258
Thuban | Phenom II X6 | 6 | 6 | 6 MB | 45 | 904 | 346
Llano | A8, A6, A4 | 4 | 4 | 1 MB x 4 | 32 | 1450 | 228
Orochi/Zambezi | FX | 8 | 8 | 8 MB | 32 | 1200 | 315

One bit of good news for somebody, whether it’s Intel shareholders or eventually consumers, is that Ivy Bridge should be very affordable to produce once Intel’s 22-nm process matures. At 160 mm² for the quad-core variant with the beefiest HD 4000 graphics, Ivy Bridge is easily one of the smallest desktop processors in recent years.

A closer look at the numbers above will give you a sense of how far ahead of the competition Intel truly is. It’s no secret that you can expect to see Ivy Bridge outperforming the FX-8150 processor, yet Ivy occupies roughly half the die area of Zambezi, and Zambezi lacks integrated graphics and PCIe connectivity. The gap in TDPs between the two would be laughable if it weren’t kind of dire. AMD’s true competitor in Ivy’s weight class is Llano, which has four cores and almost the same transistor budget, yet Llano is a larger chip because it’s fabbed on a 32-nm process. Llano’s prospects for matching Ivy in CPU performance are similar to my hometown Royals’ prospects for winning the A.L. Central.
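To put rough numbers on that gap, the short Python sketch below derives transistor density and a die-area ratio from the estimates in the table. Treat it as a crude comparison only: the transistor counts are estimates, and density also depends on how much of each die is dense cache versus logic.

```python
# Estimated transistor counts (millions) and die areas (mm²) from the table above.
CHIPS = {
    "Sandy Bridge": (995, 216),
    "Ivy Bridge": (1400, 160),
    "Llano": (1450, 228),
    "Orochi/Zambezi": (1200, 315),
}

def density(name):
    """Millions of transistors per square millimeter."""
    transistors, area_mm2 = CHIPS[name]
    return transistors / area_mm2

for name in CHIPS:
    print(f"{name:<15} {density(name):.2f} M/mm²")

ivy_vs_zambezi = CHIPS["Ivy Bridge"][1] / CHIPS["Orochi/Zambezi"][1]
print(f"Ivy die area as a fraction of Zambezi's: {ivy_vs_zambezi:.2f}")  # 0.51
```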

Call it a tick-plus?

In spite of holding such a commanding lead, Intel hasn’t simply shrunk Sandy Bridge and left it at that. The roughly 40% increase in Ivy Bridge’s transistor count (from an estimated 995 million to 1.4 billion) should be a clue that there’s much more going on here.

As often happens with “ticks,” Ivy’s CPU core microarchitecture has been tweaked in a host of small ways in order to improve per-clock performance. Intel architect Stephen Fischer told us he estimates the cumulative effect of those improvements to be a 4-6% gain in IPC, or instructions per clock. Among the changes is deeper pipelining in the divider unit, which should result in double the throughput for both integer and floating-point math. The cache prefetcher has gotten smarter and is able to cross page boundaries, allowing it to better track and anticipate complex access patterns. The prefetcher also has an adaptive mechanism to avoid hogging memory bandwidth; when queues grow too deep, it will throttle back its activity. There are other tweaks to improve Hyper-Threading (a few queues are now partitioned dynamically between two threads, rather than split statically 50-50) and AVX performance (more registers to help deal with memory accesses that cross cache lines).

The neatest trick is probably the virtualization of move operations; rather than moving data through the ALU, such operations can be accomplished via register renaming, so long as the source and destination datatypes are the same. Fischer told us this feature alone results in an IPC gain of roughly 1.5%.
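The idea behind move elimination is easier to see in a toy model. The Python sketch below is hypothetical (the `RenameTable` class is our invention, not Intel’s design): a same-type register move just points the destination’s rename-table entry at the source’s physical register, so no execution unit is needed, while a move that fails that condition executes like any other op.

```python
class RenameTable:
    """Toy model of register renaming with move elimination (hypothetical,
    not Intel's actual implementation)."""

    def __init__(self, arch_regs):
        # Each architectural register starts out mapped to its own physical register.
        self.mapping = {reg: i for i, reg in enumerate(arch_regs)}
        self.next_phys = len(arch_regs)
        self.executed_ops = 0  # ops that had to occupy an execution unit

    def write(self, dst):
        """A result-producing op (an add, say) executes and gets a fresh physical register."""
        self.executed_ops += 1
        self.mapping[dst] = self.next_phys
        self.next_phys += 1

    def move(self, dst, src, same_type=True):
        """Register-to-register move."""
        if same_type:
            # Eliminated: dst simply aliases src's physical register; nothing executes.
            self.mapping[dst] = self.mapping[src]
        else:
            # Conventional path: the copy executes like any other op.
            self.write(dst)


rt = RenameTable(["rax", "rbx", "rcx"])
rt.write("rax")        # e.g. add rax, 1
rt.move("rbx", "rax")  # mov rbx, rax, handled entirely at rename
print(rt.executed_ops)                         # 1
print(rt.mapping["rbx"] == rt.mapping["rax"])  # True
```

A real design also has to track how many architectural registers share a physical register before freeing it, which this sketch ignores.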

Ivy has even added several new instructions. Some are related to a new feature intended to prevent privilege-escalation exploits. Another accesses a new on-chip digital random number generator, which will act as a high-quality entropy source for encryption algorithms of all types, not just the AES algorithm that’s already accelerated explicitly. Ivy also adds AVX instructions to convert quickly between 32-bit and 16-bit floating-point datatypes, allowing high-precision 32-bit computation to be combined with more compact 16-bit storage.
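The compute-wide, store-narrow pattern those conversion instructions target can be sketched with Python’s standard `struct` module, whose `e` format code packs IEEE 754 half-precision values. This only illustrates the storage trade-off: Python does its arithmetic in 64-bit doubles rather than 32-bit floats, and nothing here uses Ivy’s instructions directly.

```python
import struct

def pack_fp16(values):
    """Store floats as IEEE 754 binary16: 2 bytes per value."""
    return struct.pack(f"<{len(values)}e", *values)

def unpack_fp16(data):
    """Widen binary16 values back to Python floats for arithmetic."""
    return list(struct.unpack(f"<{len(data) // 2}e", data))

samples = [0.1, 1.5, 3.14159, 1000.0]
half = pack_fp16(samples)
single = struct.pack(f"<{len(samples)}f", *samples)
print(len(half), len(single))  # 8 16

restored = unpack_fp16(half)
# Values are rounded to the nearest binary16 number: 1.5 and 1000.0 survive
# exactly, while 0.1 and 3.14159 pick up small rounding errors.
total = sum(restored)  # arithmetic happens at full precision after widening
```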

All in all, the microarchitectural changes are fairly extensive for a “tick,” but they are just the tip of the iceberg. This chip has a number of new power-saving features, too numerous to recount in any detail here. One of the big ones is the power-gating of DDR memory at idle, which should help notebook battery life quite a bit. Also, interestingly enough, Intel now tests the optimal voltage for each chip at multiple frequencies and stores that information on the die, where it can be used by the power management controller. Previously, only two frequency points were tested, and the power controller would interpolate between them. Products with Turbo Boost enabled should presumably operate more efficiently, and Intel has some related tricks up its sleeve, such as products with configurable TDPs. A laptop chip could, for instance, operate at one TDP while on battery power and switch to a higher TDP when snapped into a docking station.
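Why would more test points help? If the real voltage/frequency curve bends upward, a straight line drawn between two endpoints overestimates the voltage needed in between, and excess voltage means wasted power. The Python sketch below compares two-point and multi-point piecewise-linear lookups; every (MHz, volts) pair in it is invented for illustration and is not Intel data.

```python
from bisect import bisect_right

# Hypothetical (MHz, volts) characterization points, invented for illustration.
TWO_POINT = [(1600, 0.85), (3900, 1.20)]
MULTI_POINT = [(1600, 0.85), (2400, 0.90), (3200, 1.00), (3900, 1.20)]

def voltage_for(points, freq_mhz):
    """Piecewise-linear voltage estimate from sorted (MHz, volts) points."""
    freqs = [f for f, _ in points]
    if not freqs[0] <= freq_mhz <= freqs[-1]:
        raise ValueError("frequency outside the characterized range")
    i = bisect_right(freqs, freq_mhz)
    if i == len(freqs):  # exactly at the highest characterized point
        return points[-1][1]
    (f0, v0), (f1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (freq_mhz - f0) / (f1 - f0)

for f in (2400, 3200):
    print(f, round(voltage_for(TWO_POINT, f), 3), voltage_for(MULTI_POINT, f))
```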

Meanwhile, Intel graphics architect Tom Piazza isn’t content to call Ivy Bridge a “tick” at all. He calls it a “tick+” because the graphics architecture has been extensively overhauled, more along the lines of what happens with a “tock” refresh on the CPU side. At IDF last fall, Piazza acknowledged some risk in introducing an “unknown” new graphics core in concert with a process shrink, especially because “the last thing you want to do at Intel is hold up a factory,” but the move was apparently a success. In fact, he said he saw no reason not to continue with major graphics architectural improvements like this one, particularly since “graphics move fast.”

Ivy’s new graphics core adds a broad range of new capabilities, in many ways bringing Intel up to feature parity with AMD’s Llano (and forthcoming Trinity) IGPs. The headliner is support for the DirectX 11 graphics API, with all that implies, including hardware tessellation capabilities and a broader selection of texture formats. Additionally, like most DX11 GPUs, Ivy’s IGP supports a range of compute-focused features, making it compatible with both Microsoft’s DirectCompute and the OpenCL 1.1 standard. As we understand it, all of the major compute-focused capabilities are truly present in the hardware, not just emulated in software, including double-precision FP datatypes, denorms, and support for atomic operations.

The IGP’s execution unit count is up from Sandy Bridge—from 12 EUs to 16—but don’t let that number lead you astray. The EUs have been totally restructured in what amounts to a doubling of almost all resources versus Sandy Bridge, with the exception of memory bandwidth. Another interesting change is the addition of a 256KB L3 cache in the graphics core, a feature Piazza said was originally intended for Sandy Bridge but was “retracted” because it didn’t offer much performance benefit. Piazza claims this cache delivers an “amazing” reduction of bandwidth utilization between the graphics core and the 8MB last-level cache. Those reductions in ring traffic translate directly into power savings, which turns out to be the cache’s primary benefit.

Overall, it sounds like Intel is cleaning up quite a few loose ends with this IGP refresh. Although the firm has miles to go in catching AMD and Nvidia in terms of software support and game compatibility, we do expect changes like the expansion of texture formats to go a long way toward improving compatibility with existing games. Another issue Piazza says they’re cleaning up this time around is the anisotropic filtering algorithm, which in Sandy varied considerably with the surface’s angle of inclination. Now, he tells us, the IGP will “draw circles instead of flowers” in the aniso tunnel test. In part thanks to the doubling of texture samplers, the IGP’s media processing capabilities should be substantially faster, too, including QuickSync video encoding.

In a bit of a surprise to us, Intel has upped the number of independent displays the IGP can support from two to three, using any major output type, including DVI, HDMI, and DisplayPort.

Piazza told us the IGP has been laid out in five physically distinct “slices” composed of different resource types. Most notably on that front, the EU/texture sampler slice can be scaled up and down. The first use of that capability will surely be the lower-end versions of Ivy with Intel HD 2500 graphics, which should have 8 EUs and half the texturing capacity of the HD 4000. However, Piazza explicitly mentioned future “scale-up opportunities” in this context, as well. Hmm. We’re unsure whether he was thinking of the next “tock,” code-named Haswell, or something more imminent.

A new-ish platform, too

Although Ivy Bridge fits into the same LGA1155 socket as Sandy, hardware compatibility will depend on the motherboard maker and chipset type. At the very least, motherboards based on Intel’s older 6-series chipsets will require a BIOS update to ensure compatibility with Ivy. Some of Intel’s business-focused chipsets officially won’t support Ivy Bridge at all.

Instead, Intel has introduced a range of new 7-series chipsets to go along with its new CPU. The one of most interest to enthusiasts will surely be the Z77 Express. We’ve already published a nice round-up of Z77 boards right here, for those who are interested. The only major update in the 7-series platform controller hub (PCH) silicon is the addition of support for USB 3.0. There are a few software enhancements, though, including the addition of a suspend-to-SSD feature inherited from Intel’s mobile offerings.

Above are a couple of pictures of the Core i7-3770K alongside the MSI Z77A-GD65 motherboard in our test system. As you can see, Ivy’s packaging will be difficult to distinguish from Sandy’s by looks alone.

Test notes

We’ve completely overhauled the portion of Damage Labs dedicated to desktop CPU testing for this review, and we’ve added a number of new tests and methods along the way, as well. Here’s a look at one of our new CPU test rigs, the one destined for Ivy Bridge:

Yep, we’ve mounted it in one of those slick open-air cases from MSI, which is just about ideal for our purposes. Sadly, this MSI case isn’t a commercial product, but stay tuned: we plan to give one away to a lucky reader shortly.

The rest of the hardware involved was provided by several companies who were kind enough to support our efforts. For this system, we used MSI’s Z77A-GD65 motherboard, as we’ve noted. Although they’re kind of hard to see in the pictures above, Corsair provided the Vengeance DIMMs, which are 4GB each and capable of 1600MHz operation at 1.5V. Corsair also supplied the AX650 power supply, which is very efficient and incredibly quiet at low loads, in part because it switches off its cooling fan when demand is light.

That handsome graphics card is a Radeon HD 7950 DD Edition from XFX. These cards have granted our test systems a much higher ceiling, so we can test CPU performance in recent games at common resolutions without running into GPU bottlenecks. These cards draw very little power when idle, so they don’t contribute too much when we test system power draw and CPU efficiency. Last but not least, these Radeons are PCI Express 3.0-compatible, so they should be able to talk to Ivy Bridge at full speed.

Also kind of hidden in the first couple of pictures is the Kingston HyperX 120GB SSD. Based on the latest SandForce controller with synchronous NAND, this drive is one of the best SSD configs available. It’s also completely silent, very power efficient, and cuts our boot times between tests dramatically versus a hard disk drive.

We’ve built four of these test systems for the different CPU socket types out there, so we’re able to test multiple processors concurrently. Our Ivy review here has “only” seven different processor types included, but we expect to be able to expand that number over time and to include a range of different CPU vintages and socket types, just as we’ve done in the past. Just bear with us as we accumulate results with our new methods and test rigs. Fuller specifications for the individual test systems are available below.

Our testing methods

We ran every test at least three times and reported the median of the scores produced.

The test systems were configured like so:

Processor | AMD FX-8150, Phenom II X6 1100T | Core i7-2600K, Core i7-3770K | Core i7-3960X, Core i7-3820 | AMD A8-3850
--- | --- | --- | --- | ---
Motherboard | Asus Crosshair V Formula | MSI Z77A-GD65 | Intel DX79SI | Gigabyte A75M-UD2H
North bridge | 990FX | Z77 Express | X79 Express | A75 FCH
South bridge | SB950 | | |
Memory size | 8 GB (2 DIMMs) | 8 GB (2 DIMMs) | 16 GB (4 DIMMs) | 8 GB (2 DIMMs)
Memory type | AMD Entertainment Edition DDR3 SDRAM | Corsair Vengeance DDR3 SDRAM | Corsair Vengeance DDR3 SDRAM | Corsair Vengeance DDR3 SDRAM
Memory speed | 1600 MT/s | 1600 MT/s | 1600 MT/s | 1600 MT/s
Memory timings | 9-9-9-24 1T | 9-9-9-24 1T | 9-9-9-24 1T | 9-9-9-24 1T
Chipset drivers | AMD chipset 12.3 | INF update 9.3.0.1020, iRST 11.1.0.1006 | INF update 9.2.3.1022, RSTe 3.0.0.3020 | AMD chipset 12.3
IGP drivers | – | 8.15.10.2696 | – | Catalyst 12.3
Audio | Integrated SB950/ALC889 with Realtek 6.0.1.6602 drivers | Integrated Z77/ALC898 with Realtek 6.0.1.6602 drivers | Integrated X79/ALC892 with Realtek 6.0.1.6602 drivers | Integrated A75/ALC889 with Realtek 6.0.1.6602 drivers

They all shared the following common elements:

Component | Configuration
--- | ---
Hard drive | Kingston HyperX SH100S3B 120GB SSD
Discrete graphics | XFX Radeon HD 7950 Double Dissipation 3GB with Catalyst 12.3 drivers
OS | Windows 7 Ultimate x64 Edition Service Pack 1 (AMD systems only: KB2646060, KB2645594 hotfixes)
Power supply | Corsair AX650

Thanks to Corsair, XFX, Kingston, MSI, Asus, Gigabyte, Intel, and AMD for helping to outfit our test rigs with some of the finest hardware available. Thanks to Intel and AMD for providing the processors, as well, of course.

We used the following versions of our test applications:

Some further notes on our testing methods:

The test systems’ Windows desktops were set at 1920×1080 in 32-bit color. Vertical refresh sync (vsync) was disabled in the graphics driver control panel.

We used a Yokogawa WT210 digital power meter to capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (The monitor was plugged into a separate outlet.) We measured how each of our test systems used power across a set time period, during which time we encoded a video with x264.

After consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for the bulk of our desktop processor tests, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled. We did disable these power management features to measure cache latencies, but otherwise, it was unnecessary to do so.

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

We typically kick off our CPU test results with a look at the performance of the memory subsystems, and I figure we might as well continue that tradition. These synthetic tests are intended to measure specific properties of the system and may not end up tracking all that closely with real-world application performance. Still, they can be enlightening.

No real surprises here. We’ve clocked the memory for all of these systems at 1600MHz with an aggressive 1T command rate, and the 3770K does as much with its dual memory channels as any of the other two-channel solutions—though not much more than its predecessor does.

This would be a good time to introduce the various contenders, I think. The 2600K is the Sandy Bridge incumbent, and it’s very similar to the Ivy-based Core i7-3770K in most regards, with the slight exception that its base and Turbo clock speeds are 3.4GHz and 3.8GHz, 100MHz slower than the 3770K’s respective speeds. The 2600K was the fastest Sandy Bridge derivative when that chip was first introduced, and it should be a nice foil to the 3770K throughout our tests. Yes, had I used a Core i7-2700K instead, we’d have had a true clock-for-clock comparison. What we have here is more of a price-parity comparison, since these two CPUs are priced only $5 apart.

Another interesting contender from Intel is the Core i7-3820, which we have not properly reviewed up to this point. The 3820 is a quad-core Sandy Bridge-E part; it shares the same 3.8GHz Turbo peak with the 2600K, but its base clock is 3.6GHz, 100MHz higher than the 3770K’s. The 3820 also has quad memory channels and a 10MB last-level cache. We’re curious to see how often its additional platform bandwidth can grant it an advantage over the regular Sandy and Ivy parts. At $294, the 3820 is moderately priced, probably to offset its higher platform costs.

The Core i7-3960X is the 3820’s big brother, a six-core Sandy Bridge-E monstrosity with a 3.3GHz base clock, a 3.9GHz Turbo peak, and a 15MB LLC. At $999, it is Intel’s fastest desktop processor to date—unless Ivy takes that crown in a bit of an upset. Obviously, the four memory channels on these processors give them substantially more bandwidth, as our test results indicate.
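For a sense of scale, theoretical peak DRAM bandwidth is simply transfer rate × bus width × channel count. The Python sketch below assumes standard 64-bit (8-byte) DDR3 channels at the 1600 MT/s we used on every system; measured throughput in any real test lands below these ceilings.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide

def peak_bandwidth_gbs(mega_transfers_per_s, channels):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9

print(peak_bandwidth_gbs(1600, 2))  # 25.6  (dual channel: 3770K, 2600K, FX, A8, X6)
print(peak_bandwidth_gbs(1600, 4))  # 51.2  (quad channel: 3960X, 3820)
```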

Finally, we have the three contenders from AMD. The FX-8150 is AMD’s fastest desktop processor, based on the new “Bulldozer” microarchitecture. Although it’s a large, eight-core chip, the FX-8150 lists at $245, substantially cheaper than the 3770K. The FX also has a much higher TDP, or thermal envelope, of 130W, like the Sandy-E parts. With dual channels of 1600MHz memory, it extracts nearly as much throughput as Ivy. The A8-3850 is more like Sandy and Ivy’s spiritual competitor, a smaller chip with quad cores and integrated graphics. However, the A8-3850 is based on an older CPU core with less aggressive prefetchers and no L3 cache, so it doesn’t do as much with its dual memory channels. Below that is the Phenom II X6, AMD’s prior desktop leader before Bulldozer arrived. The X6 takes even less advantage of this relatively fast RAM, but you may be surprised by how well it keeps pace with the FX-8150 overall.

This test is multithreaded, so it captures the bandwidth of all caches on all cores concurrently. The different test block sizes step us down from the L1 and L2 caches into L3 and main memory. I think the short answer here is that Ivy’s internal caches are no faster or slower than Sandy’s. Most likely, the 100MHz difference in clock speeds explains the differences between the 3770K and the 2600K here.

This is a new latency testing tool. SiSoft has a nice write-up on it, for those who are interested. We used the “in-page random” access pattern to reduce the impact of prefetchers on our measurements. We’ve also taken to reporting the results in terms of CPU cycles, which is how this tool returns them. The problem with translating these results into nanoseconds, as we’ve done in the past with latency measurements, is that we don’t always know the clock speed of the CPU, which can vary depending on Turbo responses.
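The conversion itself is trivial (cycles divided by clock frequency in GHz gives nanoseconds); the catch is knowing which frequency to divide by. In the Python sketch below, the 30-cycle latency is a hypothetical figure, evaluated at the 3770K’s rated 3.5GHz base and 3.9GHz peak Turbo clocks to show how far the nanosecond answer can drift.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Latency in nanoseconds = cycles / frequency (GHz = cycles per ns)."""
    return cycles / clock_ghz

# A hypothetical 30-cycle access, timed at the 3770K's base and peak Turbo clocks.
for ghz in (3.5, 3.9):
    print(f"{ghz} GHz: {cycles_to_ns(30, ghz):.2f} ns")
```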

The only real divergence between Sandy and Ivy is at the 8MB data point, when we’re right at the edge of the last-level cache. I’d wager the difference there is due to Ivy’s improved cache prefetchers, which can cross page boundaries. Perhaps they’re not fooled by the in-page randomization in Sandra’s access pattern.

I’ve omitted a lot of the other CPUs for the sake of readability. The Intel chips all deliver very similar results. The FX-8150’s latencies don’t look so bad here, especially when you consider that its peak clock speed is 4.2GHz and that the entire architecture was apparently intended to run at even higher frequencies.

Some quick synthetic math tests

We don’t have a proper SPEC rate test in our suite (yet!), but I wanted to take a quick look at some synthetic computational benchmarks to see how the different architectures compare before we move on to more varied and robust application-based workloads. These simple tests in AIDA64 are nicely multithreaded and make use of the latest instructions, including Bulldozer’s XOP in the CPU Hash test and FMA4 in the FPU Julia and Mandel tests. On the Intel chips, the latter two tests use AVX, since neither Sandy Bridge nor Ivy Bridge supports FMA3.

Looks to me like those estimates of 4-6% IPC gains from Sandy to Ivy are probably about right, although the 3770K also has a 100MHz advantage on the 2600K. I warn you, the question of IPC gains versus clock speed differences is going to haunt you in the following pages. My apologies in advance, folks.
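One rough way to untangle the two is to divide the performance ratio by the clock-speed ratio. The Python sketch below does that; the 7.5% score gap is a made-up input, not one of our results, and the clocks are the two chips’ rated peak Turbo speeds (3.9GHz and 3.8GHz), which multithreaded runs may not actually sustain.

```python
def per_clock_gain(score_new, score_old, clock_new_ghz, clock_old_ghz):
    """Estimate the per-clock (IPC) change between two CPUs.

    Divides the performance ratio by the clock-speed ratio, which assumes
    performance scales linearly with frequency. Memory-bound work usually
    scales worse than that, so treat the result as a rough estimate.
    """
    return (score_new / score_old) / (clock_new_ghz / clock_old_ghz) - 1

# Hypothetical: the newer chip scores 7.5% higher at 3.9GHz vs. 3.8GHz.
gain = per_clock_gain(107.5, 100.0, 3.9, 3.8)
print(f"{gain:.1%}")  # 4.7%
```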

The FX-8150 is competitive only in the CPU Hash test, where its eight integer cores and XOP instruction give it the advantage. Otherwise, in the two FPU-focused tests, the FX’s four AVX-capable floating-point units are distinctly disappointing. In theory we’d expect them to be matching Sandy and Ivy clock for clock, but nothing of the sort happens.

Power consumption and efficiency

Well, why wait, right? Let’s take a look at Ivy’s finest attribute, her increased power efficiency, in our first real-world test. Note that we’ve measured total system power consumption at the wall socket, so our results take the whole platform into account.

Our workload for this test was encoding a video with x264, based on a command ripped straight from the x264 benchmark you’ll see later. This encoding job is a two-pass process. The first pass is lightly multithreaded and will give us the chance to see how power consumption looks when mechanisms like SpeedStep and core power gating are in use. The second pass is more widely multithreaded.

There’s the story for Ivy Bridge, right there in those two plots, if you read ’em right. The 3770K draws less power than the 2600K, yet it finishes the job about five seconds faster. Let’s see what specifics we can derive from these data.

Ivy’s power draw at idle is very similar to Sandy’s, despite Ivy’s roughly 40% higher transistor count. Of the other solutions, only AMD’s A8-3850 comes close.

We measured the 2600K’s peak power draw at 17W higher than the 3770K’s. The gap between their TDP ratings? 18W. With Turbo Boost’s dynamic power management, both chips are likely reaching something close to their peaks and remaining there. And Ivy is clearly more efficient.

Here’s a look at energy consumed over our entire test period, where both active and idle time are taken into account. During the entire span, the 3770K’s combination of low peak power, quick execution, and relatively low idle power allows it to take the top spot.

Now we’re looking at just the energy consumed while the video was being encoded. Here, the 3770K’s lead on the other processors grows. Thanks to the benefits of Intel’s 22-nm process and some modest improvements in per-clock performance, Ivy Bridge is the most energy-efficient chip of this bunch by a considerable margin.
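Energy is power integrated over time, which is why a chip that draws a little less power and also finishes sooner pulls ahead on this metric. The Python sketch below shows one simple way to turn evenly spaced wall-power readings into energy using the trapezoidal rule; the one-second interval and the readings themselves are hypothetical, not data from our tests or the meter’s actual logging format.

```python
def energy_joules(power_w, interval_s):
    """Integrate evenly spaced power readings (watts) into energy (joules)
    using the trapezoidal rule."""
    if len(power_w) < 2:
        return 0.0
    inner = sum(power_w[1:-1])
    return interval_s * (inner + (power_w[0] + power_w[-1]) / 2)

def joules_to_wh(joules):
    """Convert joules to watt-hours."""
    return joules / 3600

# Hypothetical readings taken once per second; not data from our tests.
readings = [60.0, 110.0, 112.0, 111.0, 65.0]
e = energy_joules(readings, 1.0)
print(e, round(joules_to_wh(e), 4))
```

To get the task-only figure, you would integrate just the readings that fall between the start and end of the encode.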

The Elder Scrolls V: Skyrim

Now for something completely different.

Yep, it’s time for some game benchmarking, but not, perhaps, as you know it. We tested performance using Fraps while taking a stroll around the town of Whiterun in Skyrim. The game was set to the graphical quality settings shown above. Note that we’re using fairly high quality visual settings, basically the “ultra” presets at 1920×1080 but with FXAA instead of MSAA. Our test sessions lasted 90 seconds each, and we repeated them five times per CPU.

The thing is, as we tested, we were recording the time required to produce every single frame of the animation in the game. Our reasoning behind this madness is explained in my article, Inside the second: A new look at game benchmarking. Much of what we said in that article was oriented toward GPU testing, but the same methods of game benchmarking can apply to CPUs, as well. This is our first chance to give those methods a try in the context of a CPU review, so we’re excited to see what happens.

Frame time (ms) | FPS rate
--- | ---
8.3 | 120
16.7 | 60
20 | 50
25 | 40
33.3 | 30
50 | 20

Here’s a crack at explaining the reasons behind our new testing methods. The constant stream of images produced by a game engine as you play creates the illusion of motion. We often talk about gaming performance in terms of FPS, or frames per second, but most of the tools that measure gaming performance actually average out frame production over an entire second in order to give you a result. That’s not terribly helpful. If you encounter a delay of half a second, or 500 ms, for a single frame surrounded by a stream of lightning-quick 16.7-ms frames, that entire second will average out to just over 30 FPS. Most folks will look at that FPS number and think the performance was reasonably acceptable, if not stellar. (Because, hey, a stream of frames at a constant 30 FPS wouldn’t be half bad.) They will, of course, be very wrong. Even a shorter interruption of, say, 200 ms or less while playing a game will feel like an eternity, destroying the illusion of motion and any sense of immersion, and possibly getting your character killed.
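To make the averaging problem concrete, the Python sketch below computes FPS the way a simple per-second counter would. It assumes a hypothetical second of gameplay made of one 500-ms hitch followed by 30 frames at 16.7 ms, which roughly fills out the second; the counter reports a respectable-looking number while the worst frame is half a second long.

```python
def average_fps(frame_times_ms):
    """FPS as a simple counter reports it: frames divided by elapsed seconds."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000)

# Hypothetical second of gameplay: one 500 ms hitch, then 30 frames at 16.7 ms.
frames = [500.0] + [16.7] * 30
print(f"average: {average_fps(frames):.1f} FPS")  # average: 31.0 FPS
print(f"worst frame: {max(frames):.0f} ms")       # worst frame: 500 ms
```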

Fortunately, we have the tools to measure and quantify gaming performance in much greater detail, and we can bring those to bear in considering CPU performance. Let’s start by looking at plots of the time required to produce individual frames during one of our test runs. (We’ve used just one run for the visualizations, but the rest of our results take all five runs into account.) Remember, since we’re looking at frame times, lower is better in these plots. Also, if you’d rather think in FPS because it’s more familiar, you can convert using the table on the right.

As you can see, the raw data show some clear differences in performance between the CPUs. The faster processors tend to produce more frames, of course. There are spikes in frame times for all of the processors, but the sizes of the spikes tend to be larger in certain cases. Some frames take quite a bit of time to produce, which isn’t good. The AMD chips especially seem to struggle during the opening moments of our test run, where we’re up by the Jarl’s castle, looking out over Whiterun and the mountains beyond.

The traditional FPS average gives us a sense of the performance differences. Obviously, the Core i7-3770K acquits itself well in this test, as do all of the Intel CPUs. The AMD processors are all quite a bit slower. However, even the slowest one averages over 60 FPS. Doesn’t that mean all of the processors are more than adequate for this task?

Not necessarily, as those spikes in frame times tend to show.

Another way of thinking about gaming performance is in terms of real-time frame latencies. That is, after all, what smooth animation relies upon. We’ve borrowed a bit from the transaction latency measurements in the server benchmarking world and suggested that a look at the 99th percentile frame latency might be a good starting point for this approach. This metric simply offers a bit of information, telling you that 99% of all frames were produced in x milliseconds or less. It’s a simple way of thinking about overall frame delivery.
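As a minimal sketch of the idea, here’s a 99th percentile calculation in Python. It assumes the simple nearest-rank definition of a percentile; other definitions interpolate between neighboring values and can give slightly different numbers. The frame times are made up.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest value v such that at least pct% of the values are <= v."""
    if not values:
        raise ValueError("need at least one frame time")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank in the sorted list
    return ordered[max(rank, 1) - 1]


# 100 made-up frame times: mostly quick, a few slower, one big hitch.
frames = [16.7] * 95 + [20.0] * 4 + [45.0]
print(percentile_nearest_rank(frames, 99))  # 20.0: 99% of frames took 20 ms or less
```

Note that the single 45-ms hitch doesn’t move the 99th percentile at all, which is one reason it pays to look at the worst-case frames separately, too.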

Here, all of the Intel processors again perform very well. They’re cranking out 99% of all frames in the 17-18 ms range, not far from the 16.7-ms frame time that equates to a steady 60 FPS. The 99th percentile frame latencies for the AMD chips are nearly double that.

Then again, this metric only considers one select point where 99% of all frames have been produced. We can look at the entire latency picture for each CPU by plotting the latency curve from the 50th percentile up.
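One way to build such a curve is to compute every percentile from the 50th up to the 99th. The sketch below uses Python’s `statistics.quantiles` with linear interpolation (`method="inclusive"`); that choice is ours and isn’t meant to describe the tool behind our graphs. The two “chips” are invented data, shaped to show how one processor can lead at the median and still fall behind in the tail.

```python
import statistics


def latency_curve(frame_times_ms, start=50):
    """Return (percentile, frame time in ms) pairs from `start` through the 99th."""
    # With n=100, quantiles() returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(frame_times_ms, n=100, method="inclusive")
    return [(p, cuts[p - 1]) for p in range(start, 100)]


# Invented frame times: chip A is quicker most of the time but has a slow tail,
# while chip B is a bit slower on typical frames but more consistent.
chip_a = [16.0] * 90 + [30.0] * 10
chip_b = [18.0] * 95 + [24.0] * 5

curve_a = latency_curve(chip_a)
curve_b = latency_curve(chip_b)

# First percentile at which chip A's frame time rises above chip B's.
crossover = next(p for (p, a), (_, b) in zip(curve_a, curve_b) if a > b)
print(f"chip A falls behind chip B at percentile {crossover}")
```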

The contest between the Intel processors is incredibly tight. For most intents and purposes, they are all evenly matched.

Things become more interesting when we look at the AMD CPUs. The Phenom II X6 and the FX-8150 are essentially tied in both the average FPS and 99th percentile results. However, a funny thing happens to the FX-8150 while it’s rendering the toughest 5% of the frames, on the right edge of the plot: its frame times shoot up above the Phenom II X6’s. That outcome is likely the result of a unique characteristic of the Bulldozer architecture: its relatively low per-thread performance in many cases. When this real-time system, the Skyrim game engine, runs into a trouble spot, the FX-8150 doesn’t have the per-thread oomph to power through. I’d say the Phenom II X6 is a better Skyrim companion than the FX-8150, as a result. (Although Lydia is still the best.)

We are, of course, splitting hairs a bit here, just because we can. Even frame latencies in the 30-plus millisecond range are relatively decent. One reality check we can give ourselves is to consider the worst-case scenarios, those long-latency frames that are most likely to ruin the sense of smooth motion. We’ve done that in the past, with GPUs, by looking at the amount of time spent rendering frames beyond a threshold of 50 milliseconds. 50 ms equates to 20 FPS, and we figure if you dip below 20 FPS, most folks are going to notice. However, none of these CPUs deliver frames that slowly. Our next obvious step down is 33.3 milliseconds, or 30Hz. If you have vsync enabled while gaming on a 60Hz monitor, frames that take longer than 33.3 milliseconds won’t be shown until two full display refresh cycles have passed.
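To see why 33.3 ms is a meaningful line with vsync on a 60Hz display, consider a simplified model: a finished frame has to wait for the next refresh, so the time until it appears gets rounded up to a multiple of roughly 16.7 ms. This sketch assumes plain double buffering with rendering kicked off right at a refresh boundary; triple buffering and real driver behavior will complicate the picture.

```python
import math

REFRESH_MS = 1000.0 / 60  # one refresh interval on a 60Hz display, ~16.7 ms


def vsync_display_delay_ms(render_ms):
    """Time from the start of rendering until the frame can appear on screen.

    Assumes double-buffered vsync where each frame starts rendering right at a
    refresh boundary, so a finished frame waits for the next refresh.
    """
    refreshes = max(1, math.ceil(render_ms / REFRESH_MS))
    return refreshes * REFRESH_MS


for render_ms in (12.0, 20.0, 40.0):
    print(f"{render_ms:4.1f} ms to render -> shown after {vsync_display_delay_ms(render_ms):.1f} ms")
```

Under these assumptions, a 40-ms frame misses two refreshes and doesn’t reach the screen until the 50-ms mark.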

None of these CPUs spend much time at all working on frames that take longer than our 33.3 ms threshold. However, we can ratchet things down one more time, to 16.7 milliseconds or a constant 60 FPS, and see what happens then.
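Here’s a minimal sketch of a “time spent beyond X” metric. It assumes only the portion of each frame that exceeds the threshold is counted, so a 40-ms frame adds 6.7 ms against a 33.3-ms cutoff; summing the full duration of every slow frame would be another plausible variant. The frame times are made up.

```python
def time_beyond_ms(frame_times_ms, threshold_ms):
    """Total time spent past the threshold, counting only each slow frame's excess."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# Made-up frame times in milliseconds.
frames = [15.0, 16.0, 25.0, 40.0, 16.5, 60.0]

for threshold in (50.0, 33.3, 16.7):
    print(f"beyond {threshold} ms: {time_beyond_ms(frames, threshold):.1f} ms")
```

Lowering the threshold from 50 to 33.3 to 16.7 ms pulls more frames into the tally, which is why ratcheting it down separates CPUs that all look fine at the higher cutoffs.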

If you are looking for glassy smooth animation in Skyrim, any of these Intel CPUs will deliver it. Interestingly enough, the Ivy Bridge chip with its slightly improved per-clock performance has an ever-so-slim lead over even the mighty Core i7-3960X. The AMD processors, meanwhile, spend quite a bit of time working on frames beyond 16.7 ms. They’re not poor performers here, but the Intel processors ensure more consistent low-latency frame delivery.

Batman: Arkham City

Now that we’ve established our evil methods, we can deploy them against Batman. Again, we tested in 90-second sessions, this time while grappling and gliding across the rooftops of Gotham in a bit of Bat-parkour. Again, we’re using pretty decent image quality settings at two megapixels; we’re just avoiding this game’s rather pokey DirectX 11 mode.

These plots are much spikier than what we saw in Skyrim, and they’re consistent with what we’ve seen from this game in the past, in GPU testing. The severity of those spikes looks to be somewhat CPU-dependent, which could prove interesting.

Once again, nearly all of the solutions average over 60 FPS. The Intel chips score higher, but not by as wide a margin as in Skyrim.

The latency picture is pretty remarkable. The Intel chips fare better across the entire curve, including our stopping point at the 99th percentile. Once again, there’s little difference between them. The AMD CPUs simply require more time to render frames.

As we saw in Skyrim, the Core i7-3770K fares best of all the processors when dealing with the worst-case scenarios. Regardless of where we put our thresholds—at the equivalent of 20, 30, or 60 FPS—the Intel processors fare better. In spite of averaging over 60 FPS, the FX-8150 and Phenom II X6 burn quite a few cycles on long-latency frames. Having a faster CPU doesn’t mean that this game’s frequent latency spikes are eliminated, but it means their durations are reduced to much less consequential levels.

Crysis 2

Our test session in Crysis 2 was only 60 seconds long, mostly for the sake of ensuring a precisely repeatable sequence. Also, we got to stealth kill two Cell soldier dudes with a knife to the chest and a neck snap, which was great for taking out some aggression.

Yeah, those plots are hard to read due to the nature of the data. Sorry about that. The first thing you’ll probably notice here is that big spike near the beginning of the test run on every single CPU. We noticed that while playing; it appears to be a bit of a hitch in the game, probably because the next area is being loaded or something. Let’s zoom in on that portion of the sequence and see how it looks.

The spike happens on every single CPU, but notice that it appears to be at least partially CPU-dependent. We’re waiting for nearly a third of a second on the A8-3850, while Ivy Bridge gets past the problem in under half that time.

The FPS averages are closer than ever between the Intel and AMD camps, and there’s essentially no difference between the various Intel chips once more.

The latency situation is a bit different in this game. Several of the processors have funny shapes to their curves. However, even the 99th percentile frame times fall between 20 and 30 milliseconds for all CPUs, so things never get to be terribly difficult for any of them.

The 3770K continues to be the champ at ensuring consistently low-latency frame times, although the margin between it and the Core i7-2600K is pretty tiny, in the grand scheme of things.

Battlefield 3

As with Crysis 2, our BF3 test sessions were 60 seconds long to keep them easily repeatable. We tested at BF3’s high-quality presets, again at 1920×1080.

You know how some people say that CPUs don’t matter for gaming performance, since they’re all fast enough these days? Here’s a case where that’s actually true. Have a look at all of our metrics, and they all agree.

Any of these CPUs will spit out 99% of the frames rendered at a near-constant 60 FPS rate in BF3. The few spikes we do see don’t add up to much of anything, with roughly a tenth of a second spent rendering beyond our 16.7-ms cutoff, generally.

Multitasking: Gaming while transcoding video

A number of readers over the years have suggested that some sort of real-time multitasking test would be a nice benchmark for multi-core CPUs. That goal has proven to be rather elusive, but we think our new game testing methods may allow us to pull it off. What we did was play some Skyrim, with a 60-second tour around Whiterun, using the same settings as in our earlier gaming test. In the background, we had Windows Live Movie Maker transcoding a video from MPEG2 to H.264, just like in our stand-alone video encoding test. Here’s a look at the quality of our Skyrim experience while encoding.

Overall, these processors handle the dual workloads quite well. As with x264, encoding in Windows Live Movie Maker appears to be a two-pass deal, with the number of threads rising later in the process. We kicked off a new encoding job before starting each test run, so we never got to the later, more heavily threaded encoding workload during our Skyrim runs. With the exception of the A8-3850, all of these processors support at least six simultaneous threads, so they didn’t seem to be too burdened by what we were asking them to do.

We’re curious to add some lower-end chips to the mix, including the Hyper-Threading-deficient quad-core Core i5 parts that look to be good deals, like the 2500K and its Ivy-based analog, the 3570K. We’re interested to see if the lack of Hyper-Threading hinders multitasking smoothness. Among the processors we’ve tested, we can’t help but notice that the Core i7-3960X, with six cores and 12 threads, fares best. Still, the quad-core 3770K isn’t far off its pace.

Civilization V

We have one more gaming test to include before moving on to bigger and better things. This test is a simple scripted one that spits out an FPS average, because there are only so many hours in the day.

Civ V will run this benchmark in two ways, either while using the graphics card to draw everything on the screen, just as it would during a game, or entirely in software, without bothering with rendering, as a pure CPU performance test. Oddly enough, the 3770K comes out the clear winner in the conventional game test, but the six-core 3960X easily takes the top spot without the pesky graphics card getting in the way.

Productivity

Compiling code in GCC

Another persistent request from our readers has been the addition of some sort of code-compiling benchmark. With the help of our resident developer, Bruno Ferreira, we’ve finally put together just such a test. Qtbench measures the time required to compile the Qt SDK using the GCC compiler. Here is Bruno’s note about how he put it together:

QT SDK 2010.05 – Windows, compiled via the included MinGW port of GCC 4.4.0. Even though apparently at the time the Linux version had properly working and supported multithreaded compilation, the Windows version had to be somewhat hacked to achieve the same functionality, due to some batch file snafus. After a working multithreaded compile was obtained (with the number of simultaneous jobs configurable), it was time to get the compile time down from 45m+ to a manageable level. This required severe hacking of the makefiles in order to strip the build down to a more streamlined version that preferably would still compile before hell froze over. Then some more fiddling was required in order for the test to be flexible about the paths where it was located. Which led to yet more Makefile mangling (the poor thing).

The number of jobs dispatched by the Qtbench script is configurable, and the compiler does some multithreading of its own, so we did some calibration testing to determine the optimal number of jobs for each CPU. We found that one job per core worked best on Llano/Phenom II, six on the quad-core Intel chips with Hyper-Threading, and eight on the Core i7-3960X and Bulldozer.
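Here’s a hypothetical sketch of that kind of calibration, not the script we actually used: time a build at each candidate job count and keep the fastest. `time_build` and `best_job_count` are names we made up, and `time_build` assumes a plain `make -j` build; in real use, it would need a clean step between runs so each one rebuilds from scratch.

```python
import statistics
import subprocess
import time
from typing import Callable, Iterable


def time_build(jobs: int, build_dir: str = ".") -> float:
    """Hypothetical timing hook: run `make -j<jobs>` in build_dir, return wall time in seconds.

    A real calibration would clean the build tree before each call.
    """
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=build_dir, check=True)
    return time.perf_counter() - start


def best_job_count(run_build: Callable[[int], float], candidates: Iterable[int], repeats: int = 3) -> int:
    """Return the job count with the lowest median build time over `repeats` runs."""
    medians = {}
    for jobs in candidates:
        times = [run_build(jobs) for _ in range(repeats)]
        medians[jobs] = statistics.median(times)
    return min(medians, key=medians.get)
```

Usage would look like `best_job_count(time_build, range(2, 13))`. Taking the median of a few runs keeps a single noisy build from skewing the pick.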

TrueCrypt disk encryption

TrueCrypt supports acceleration via Intel’s AES-NI instructions, so encryption with the AES algorithm, in particular, should be very fast on the CPUs that support those instructions. We’ve also included results for another algorithm, Twofish, that isn’t accelerated via dedicated instructions.

7-Zip file compression and decompression

SunSpider JavaScript performance

The Ivy-based Core i7-3770K shows us a few flashes of likely IPC improvement in Qtbench, 7-Zip compression, and SunSpider, where it puts some distance between itself and the 2600K. We should also note that many of these productivity tests are widely multithreaded and rely heavily on integer math rather than floating-point. As a result, the FX-8150 tends to be much more competitive than it was in most of our gaming tests.

Image processing

The Panorama Factory photo stitching

The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.

In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. Thus, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.

picCOLOR image processing and analysis

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including SSE extensions, multiple cores, and Hyper-Threading. Many of its individual functions are multithreaded.

At our request, Dr. Müller graciously agreed to re-tool his picCOLOR benchmark to incorporate some real-world usage scenarios. As a result, we now have four tests that employ picCOLOR for image analysis: particle image velocimetry, real-time object tracking, a bar-code search, and label recognition and rotation. For the sake of brevity, we’ve included a single overall score for those real-world tests, along with an overall score for picCOLOR’s suite of synthetic tests of different image processing functions.

Video encoding

x264 HD benchmark

This benchmark tests one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark.

Windows Live Movie Maker 14 video encoding

For this test, we used Windows Live Movie Maker to transcode a 30-minute TV show, recorded in 720p .wtv format on my Windows 7 Media Center system, into a 320×240 WMV video appropriate for mobile devices.

Remember how I said the question of what gains to attribute to the 3770K’s 100MHz clock speed advantage and what gains to credit to Ivy’s IPC improvements would haunt you? Yep.

3D rendering

LuxMark OpenCL rendering

We’ve deployed LuxMark in several recent reviews to test GPU performance. Since it uses OpenCL, we can also use it to test CPU performance, and even to compare performance across different processor types. Because OpenCL code is by nature parallelized and is compiled at run time, it should adapt well to new instructions. For instance, Intel and AMD both offer OpenCL installable client drivers (ICDs) for x86 processors, and they both claim to support AVX. The AMD APP driver even supports Bulldozer’s distinctive instructions, FMA4 and XOP.

We decided to test with both of the ICDs when possible. LuxMark will let you specify which OpenCL devices to use, so we asked it to use the Radeon HD 7950 GPUs in our test systems, as well, for a bit of dramatic flair—and to see if the different CPUs acting in support had any effect on the GPU’s performance. Finally, we combined two devices, the AMD APP x86 ICD and the Radeon HD 7950, to see if a CPU and GPU could team up to complete the job faster than either one could alone.

Funny thing: the AMD APP ICD runs faster on Intel chips than Intel’s own OpenCL driver. Meanwhile, the Intel driver refuses to run on the non-AVX-infused AMD chips.

The fastest processor here, by far, is the Radeon HD 7950. The Core i7-3770K has to settle for a distant third, but it’s the undisputed champ of its weight class. Happily, the poor FX-8150 doesn’t look to be as completely outclassed as it was in our earlier synthetic AVX tests, although none of the AMD CPUs like being asked to team up with a Radeon. Shades of corporate politics, perhaps? Meanwhile, the Intel CPUs can contribute to a higher overall score while also supporting the Radeon HD 7950.

Cinebench rendering

The Cinebench benchmark is based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs first with just a single thread and then with as many threads as the CPU has cores (or hardware threads, on CPUs with more than one thread per core).

POV-Ray rendering

The 3770K fares well enough, if predictably, in the rest of our rendering tests, although they’re not quite as exciting as the new OpenCL hotness, in my view.

Scientific computing

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.

In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database. MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
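To make that job scheme concrete, here’s a minimal Python sketch. It isn’t MyriMatch’s code, and the function names are ours: it splits a database into threads × 10 near-equal jobs and lets worker threads pull new jobs from a shared queue as they finish. With 6714 proteins and four threads, it produces 40 jobs of 167 or 168 sequences. Python’s global interpreter lock means this illustrates the scheduling pattern rather than delivering real CPU parallelism.

```python
import queue
import threading


def split_into_jobs(n_items, n_threads, jobs_per_thread=10):
    """Split item indices into n_threads * jobs_per_thread contiguous, near-equal ranges."""
    n_jobs = n_threads * jobs_per_thread
    base, extra = divmod(n_items, n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first jobs
        jobs.append(range(start, start + size))
        start += size
    return jobs


def process_all(items, n_threads, handle):
    """Run handle() on every item; each worker grabs a new job when it finishes one."""
    work = queue.Queue()
    for job in split_into_jobs(len(items), n_threads):
        work.put(job)
    results = [None] * len(items)

    def worker():
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread is done
            for i in job:
                results[i] = handle(items[i])  # each index belongs to exactly one job

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


jobs = split_into_jobs(6714, n_threads=4)
print(len(jobs), max(len(j) for j in jobs), min(len(j) for j in jobs))  # 40 168 167
```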

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.
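The per-spectrum mutex David describes might look something like this toy Python version. The `Spectrum` class and `toy_score` function are hypothetical stand-ins, not MyriMatch’s real data structures or scoring; the point is that threads scoring different peptides against the same spectrum have to take turns on that spectrum’s lock.

```python
import threading


class Spectrum:
    """Hypothetical spectrum record that keeps its best-scoring peptide match."""

    def __init__(self, spectrum_id):
        self.spectrum_id = spectrum_id
        self.lock = threading.Lock()  # one mutex per spectrum
        self.best_score = float("-inf")
        self.best_peptide = None

    def offer(self, peptide, score):
        # Threads that hit the same spectrum at the same time serialize here.
        with self.lock:
            if score > self.best_score:
                self.best_score = score
                self.best_peptide = peptide


def toy_score(peptide, spectrum_id):
    """Made-up stand-in for a real peptide-to-spectrum scoring function."""
    return sum(ord(c) for c in peptide) % (spectrum_id + 7)


def match_peptides(peptides, spectra):
    for peptide in peptides:
        for spectrum in spectra:
            spectrum.offer(peptide, toy_score(peptide, spectrum.spectrum_id))


spectra = [Spectrum(i) for i in range(8)]
peptides = ["PEPTIDE", "MKVLAA", "GGSGR", "LLDEK", "AAAAK", "WYTHR", "QQNPE", "SSTVK"]

# Each thread takes every fourth peptide; all threads share the same spectra.
threads = [threading.Thread(target=match_peptides, args=(peptides[i::4], spectra)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The more often threads land on the same spectrum at once, the more time they spend waiting on these locks, which is one of the scaling limits David mentions.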

This time around, we’re using a brand-new MyriMatch binary with a larger data set, so our results won’t be comparable to the ones you’ve seen here in the past.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45°. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but they’re oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with optimal thread counts for each processor.

Well, lookie there. I’ll bet you were getting tired of seeing the exact same finishing order for the top four processors. Right before we conclude our tests, the Core i7-3820 finally manages to overtake the 3770K for once, likely thanks to the bandwidth provided by its two extra memory channels. The 3820 doesn’t often appear to benefit from the additional bandwidth, but it does in the final test in our CPU performance suite.

Overclocking

The overclocking-related knobs and dials of recent Intel processors. Source: Intel.

Ivy Bridge offers a few additional tweaking opportunities over Sandy, including slightly higher peak multipliers, memory speeds up to 2667MHz, and the ability to control memory clocks in 200MHz increments. Ivy doesn’t offer the additional flexibility of allowing for multiple base clock speeds, like Sandy Bridge-E does, however.

Fortunately, since our Ivy-based subject is a K-series model with an unlocked multiplier, we didn’t have to worry about fiddling with the base clock. If you care at all about overclocking, we think it’s worth paying a few extra bucks for a K-series part.

I’ve been knee-deep in other work, so Geoff stepped in to handle our overclocking experiments with Ivy. He’s written up his experiences right here, if you’re interested. The bottom line was that our 3770K sample overclocked quite nicely, to 4.4GHz at its stock voltage and to 4.9GHz at 1.35V. However, something funny happened on the way to 5GHz: even with a massive Thermaltake Frio cooler rated for 220W of heat dissipation, our 3770K reached the boiling point of water and began thermal throttling. In other words, our cooler ran out of thermal headroom before our Ivy Bridge chip ran out of clock speed headroom. Geoff checked power consumption, and it turns out the 3770K was indeed drawing enough power to tax that beefy cooler.

So Ivy Bridge appears to be a pretty willing overclocker, but if you’re planning on raising the voltage much above stock, you’d better bring along a good cooler. Here’s a look at how our 3770K performed at 4.9GHz, the highest speed we could maintain without invoking thermal throttling. Note that these scores came from a different test system than our usual one, with an Asus motherboard, although with the same memory speed and timings.

Performance does scale up nicely at 4.9GHz, I’d say.

IGP performance: Skyrim

Before we conclude, let’s take a quick look at how Ivy’s integrated graphics compare to Sandy’s and Llano’s in a couple of recent games. Up first is Skyrim, again in a 60-second loop around Whiterun, at the settings shown below.

Ivy’s HD 4000 graphics have closed the FPS gap with the A8’s integrated Radeon substantially, but the A8 still leads in the FPS sweeps. Look what happens when we consider the frame latency picture, though.

The A8 produces more frames and thus achieves a higher FPS average, but it also has quite a few more spikes caused by longer-latency frames. As a result, the A8’s advantage over the 3770K evaporates as we approach the 99th percentile, and the last 1% of frames take longer to render on the AMD APU.
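It’s easy to reproduce this kind of split verdict with made-up numbers. In this sketch (invented frame times, nearest-rank percentile, helper names ours), the spiky series posts the higher average FPS while its 99th percentile frame time is far worse.

```python
import math


def average_fps(frame_times_ms):
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def p99(frame_times_ms):
    ordered = sorted(frame_times_ms)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]  # nearest-rank 99th percentile


# Invented data: the "spiky" IGP is faster on most frames but hitches now and then.
spiky = [14.0] * 97 + [40.0] * 3
steady = [16.0] * 100

for name, frames in (("spiky", spiky), ("steady", steady)):
    print(f"{name}: {average_fps(frames):.1f} FPS average, {p99(frames):.1f} ms at the 99th percentile")
```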

Those long-latency frames contribute to the A8 spending more time rendering beyond our 33.3-ms threshold. It’s close, but we’d say the 3770K provides a smoother gaming experience, both by the numbers and by the seat of our pants.

As you know, Skyrim is somewhat sensitive to CPU performance, so it’s possible—perhaps even likely—that the A8-3850’s relatively pokey CPU cores could be contributing to those long frame times. The A8 did fare a little better in our earlier test with a discrete graphics card, but remember that Llano will throttle back its CPU cores in order to clear out enough thermal headroom for its IGP. Dynamic power management in Llano is a one-way street.

IGP performance: Battlefield 3

As you may recall, Battlefield 3 tends not to be CPU limited with any of these processors. In that sort of game, the A8-3850 manages to outperform the 3770K any way you measure it.

Still, Intel’s new IGP has closed the gap with AMD’s Llano substantially. We’d say AMD should be concerned, if we weren’t expecting a similar leap in graphics performance from AMD’s own upcoming Trinity processor, which should be arriving very soon.

AMD’s bigger concern, perhaps, might be what happened in Skyrim. If the CPU portion of the processor becomes a limiting factor, then Intel doesn’t have to match the performance of AMD’s integrated Radeons in order to provide a better overall gaming chip.

IGP performance: LuxMark

One more crazy experiment before we tie things up. Intel’s new IGP supports OpenCL 1.1, so how does it compare to Llano’s IGP on that front?

AMD’s old IGP is faster in LuxMark than Intel’s newer one, but, well, they’re both pretty slow—vastly slower than their own CPU cores in this nicely parallel workload, in fact. There is a little bit of performance to be gained by throwing the CPU cores and IGP at the same workload, though. This outcome raises some interesting philosophical questions about the relative worth of the CPU and IGP components of these integrated processors, but we’ll save that discussion for a later date.