Intel is on the verge of transitioning to 32nm. We'll see the first parts this year. What do you do with your 45nm fabs when you start moving volume away from them? Make really cheap quad-core Nehalems of course:

I'm talking $196. I'm talking faster than AMD's entire lineup. I'm talking about arguably the best processor of 2009. I'm talking about Lynnfield, and here's its backside:



Mmm

I spent much of the past year harping on AMD selling Nehalem-sized Phenom IIs for less than Intel sold Nehalems. With Lynnfield, Intel actually made Nehalem even bigger all while driving prices down. Like I said, what do you do when you're still making boatloads of money in a recession and are about to start emptying your 45nm fabs?

I should clear things up before we progress much further. Lynnfield is the codename for mainstream 45nm quad-core Nehalem, while Bloomfield refers to the first Nehalem launched at the end of 2008:

Processor | Manufacturing Process | Die Size | Transistor Count | Socket
Bloomfield | 45nm | 263 mm² | 731M | LGA-1366
Lynnfield | 45nm | 296 mm² | 774M | LGA-1156

Despite being cheaper, Lynnfield is larger than Bloomfield. The larger die is due to one major addition: an on-die PCIe controller.



Bloomfield, The First Nehalem, circa 2008



Lynnfield, Nehalem for All, circa 2009

The pink block to the right of the die is the PCIe controller: 16 PCIe 2.0 lanes coming right off the chip. Say hello to ultra low latency GPU communication. You'd think that Intel was about to enter the graphics market or something with a design like this.

Sacrifices were made to reduce CPU, socket and board complexity. Gone are the two QPI links that each provided 25.6GB/s of bandwidth to other CPUs or chips on the motherboard. We also lose one of the three 64-bit DDR3 memory channels; Lynnfield only has two, like a normal processor (silly overachieving Bloomfield).



Intel's Bloomfield Platform (X58 + LGA-1366)

The sum is that Lynnfield is exclusively single-socket; there will be no LGA-1156 Skulltrail. While the dual-channel memory controller isn't really a limitation for quad-core parts, six and eight core designs may be better suited for LGA-1366.



Intel's Lynnfield Platform (P55 + LGA-1156)

The loss of QPI means that Lynnfield doesn't have a super fast connection to the rest of the system, but with an on-die PCIe controller it doesn't matter: the GPU is fed right off the CPU.

The Lineup

We get three Lynnfield CPUs today: the Core i7 870, Core i7 860 and the Core i5 750. Intel's branding folks told us that the naming would make sense once we saw the rest of the "Core" parts introduced; yeah, that was pretty much a lie. At least there aren't any overlapping part numbers (e.g. Core i5 860 and Core i7 860).

The i7 in this case denotes four cores + Hyper Threading, the i5 means four cores but no Hyper Threading. The rules get more complicated as you bring notebooks into the fray but let's momentarily bask in marginal simplicity.

Processor | Clock Speed | Cores / Threads | Maximum Single Core Turbo Frequency | TDP | Price
Intel Core i7 975 Extreme | 3.33GHz | 4 / 8 | 3.60GHz | 130W | $999
Intel Core i7 965 Extreme | 3.20GHz | 4 / 8 | 3.46GHz | 130W | $999
Intel Core i7 940 | 2.93GHz | 4 / 8 | 3.20GHz | 130W | $562
Intel Core i7 920 | 2.66GHz | 4 / 8 | 2.93GHz | 130W | $284
Intel Core i7 870 | 2.93GHz | 4 / 8 | 3.60GHz | 95W | $562
Intel Core i7 860 | 2.80GHz | 4 / 8 | 3.46GHz | 95W | $284
Intel Core i5 750 | 2.66GHz | 4 / 4 | 3.20GHz | 95W | $196

Keeping Hyper Threading off of the Core i5 is purely done to limit performance. There aren't any yield reasons why HT couldn't be enabled.

Intel was very careful with both pricing and performance of its Lynnfield processors. I'm going to go ahead and say it right now: there's no need for any LGA-1366 processors slower than a Core i7 965:

This is only one benchmark, but it's representative of what you're about to see. The Core i7 870 (LGA-1156) is as fast as, if not faster than, every single LGA-1366 processor except for the ones that cost $999. Its pricing is competitive as well:

For $196 you're getting a processor that's faster than the Core i7 920. I'm not taking into account motherboard prices either, which are anywhere from $50 - $100 cheaper for LGA-1156 boards. I don't believe LGA-1366 is dead, but there's absolutely no reason to buy anything slower than a 965 if you're going that route.

The LGA-1156 Socket: Size and Installation

The first Core i7, Bloomfield, went into a 1366-pin LGA socket:

A year later we have Lynnfield, and it fits in a much tighter space:

The LGA-1156 socket and Lynnfield CPUs are about as big as the old LGA-775 sockets/chips:



From Left to Right: Intel Core i7 "Bloomfield" (LGA-1366), Intel Core i7 "Lynnfield" (LGA-1156), Intel Core 2 Quad "Yorkfield" (LGA-775)



Note the pad density of Lynnfield vs. LGA-775 processors

The installation process is largely the same as any other Intel LGA socket, the difference being that LGA-1156 uses a new one-sided retention mechanism.

After the socket is "open", gently place the CPU on top of the pins. The chip only fits one way, so just pay attention:

With the chip in the socket and the lever still pulled back, move the socket cover over the CPU and slide its teeth under the retention screw on the opposite side:

Then, lower the lever, lock it in place and you're good to go:

New Heatsinks and Motherboards

LGA-1156 processors use a different heatsink than both LGA-1366 and LGA-775 chips.



Lynnfield and its cooler

As the numbers would imply, the LGA-1156 heatsink has a larger footprint than LGA-775 but smaller than LGA-1366.



From Left to Right: Retail LGA-1366 Cooler, Retail LGA-1156 Cooler, Retail 45nm LGA-775 Cooler

The retail LGA-1156 cooler is actually much closer to the 45nm LGA-775 retail cooler than to the LGA-1366 retail HSF:

As you'll see later on in the article, the retail cooler isn't very good for heavy overclocking. Power users will want something a little bigger:

The Lynnfield/P55 launch is huge. Virtually every single motherboard manufacturer has a P55 board available. Prices range from ~$110 - $300 depending on the number of bells and whistles.



Gigabyte's ultra high end UD6 (left) and Gigabyte's lower end micro-ATX UD4 (right)



Gigabyte's high end UD6 comes with 6 DIMM slots like its X58 brethren.

Micro-ATX is increasing in popularity and we actually have some good options this time if you're trying to build a smaller Lynnfield system. Combined with Lynnfield's excellent idle power (the lowest of any quad-core we've ever tested), this could make for an unusually potent HTPC.



A closer look at Gigabyte's micro-ATX P55M-UD4

The only thing we're really missing is a good mini-ITX Lynnfield board. But perhaps the manufacturers will wait until we have on-package graphics before going down that route...

One More Time: New H55 Boards Next Year

As I subtly implied at the end of the last section, Intel is bringing on-package graphics to Nehalem starting in Q4 of this year:

The 32nm Nehalem shrink, codenamed Westmere, will be available with a 45nm Intel graphics core on the processor's package. This graphics core is an evolution of what's currently in the G45 chipset and not Larrabee (although eventually that will change). From what I've heard, this is actually going to be Intel's first reasonably good integrated graphics core.

With the graphics on-package, there needs to be an interface from the processor socket to a video output located on the motherboard. None of the P55 motherboards launching today have this video out. Granted, there aren't any CPUs out to take advantage of it either.



No DVI/HDMI/VGA out...yet

Early next year (or maybe even late this year) we'll see a new breed of LGA-1156 motherboards with video output, designed for use with these Westmere IGP parts. Rumor has it that these motherboards will use Intel's H55 chipset.

Lynnfield early adopters need not worry: 32nm quad-core processors won't be out for at least a year.

Homework: How Turbo Mode Works

AMD and Intel both figured out the practical maximum power consumption of a desktop CPU. Intel actually discovered it first, through trial and error, in the Prescott days. At the high end that's around 130W; for the upper mainstream market it's 95W. That's why all high end CPUs ship with 120 - 140W TDPs.

Regardless of whether you have one, two, four, six or eight cores, the entire chip has to fit within that power envelope. A single core 95W chip gets to have one core eating up the entire power budget. This is where very high clock speed single core CPUs come from. In a 95W dual core processor, each core has to use less power than the core in a single core 95W processor, so tradeoffs are made: each core runs at a lower clock speed. A 95W quad core processor requires that each core use less power than in either a single or dual core 95W processor, resulting in more tradeoffs: each core runs at a lower clock speed than in the 95W dual core processor.

The diagram below helps illustrate this:

TDP tradeoff: single core, dual core, quad core and hex core

The TDP is constant; you can't ramp power indefinitely - you eventually run into cooling and thermal density issues. The variables are core count and clock speed (at least today): if you increase one, you have to decrease the other.
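To put rough numbers on that tradeoff, here is a minimal sketch. It assumes the whole 95W goes to the cores (ignoring the uncore), that the budget is split evenly, and it uses a common rule of thumb, not an Intel figure, that dynamic power grows roughly with the cube of clock speed because voltage has to rise along with frequency.

```python
# Rough model of the core-count vs. clock-speed tradeoff under a fixed TDP.
# Assumptions (illustrative only): the whole TDP goes to the cores (no uncore),
# power is split evenly, and per-core power scales roughly with frequency cubed.

TDP_WATTS = 95.0


def per_core_budget(tdp: float, cores: int) -> float:
    """Watts each core may draw if the TDP is shared evenly."""
    return tdp / cores


def relative_clock(cores: int) -> float:
    """Clock relative to a single-core chip with the same TDP,
    assuming power ~ frequency**3, so frequency ~ power**(1/3)."""
    return (1.0 / cores) ** (1.0 / 3.0)


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(f"{n} core(s): {per_core_budget(TDP_WATTS, n):5.2f} W/core, "
              f"~{relative_clock(n):.2f}x the single-core clock")
```

The cube-law exponent is the assumption doing the work here; a different exponent changes the exact ratios but not the direction of the tradeoff.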

Here's the problem: what happens if you're not using all four cores of the 95W quad core processor? You're only consuming a fraction of the 95W TDP because parts of the chip are idle, but your chip ends up being slower than a 95W dual core processor since it's clocked lower. The consumer thus has to choose between a faster dual core and a slower quad core processor.

A smart processor would realize that its cores aren't frequency limited, just TDP limited. Furthermore, if half the chip is idle then the active cores could theoretically run faster.

That smart processor is Lynnfield.

Intel made a very important announcement when Nehalem launched last year. Everyone focused on cache sizes, performance or memory latency, but the most important part of Nehalem was far more subtle: the Power Gate Transistor.

Transistors are supposed to act as light switches - allowing current to flow when they're on, and stopping the flow when they're off. One side effect of constantly reducing transistor feature size and increasing performance is that current continues to flow even when the transistor is switched off. It's called leakage current, and when you've got a few hundred million transistors that are supposed to be off but are still using current, power efficiency suffers. You can reduce leakage current, but you also impact performance when doing so; the processes with the lowest leakage can't scale as high in clock speed.

Using some clever materials engineering, Intel developed a very low resistance, low leakage transistor that can effectively drop any circuits behind it to near-zero power consumption: a true off switch. This is the Power Gate Transistor.

On a quad-core Phenom II, if two cores are idle, blocks of transistors are placed in the off-state but they still consume power thanks to leakage current. On any Nehalem processor, if two cores are idle, the Power Gate transistors that feed the cores their supply current are turned off and thus the two cores are almost completely turned off - with extremely low leakage current. This is why nothing can touch Nehalem's idle power:

Since Nehalem can effectively turn off idle cores, it can free up some of that precious TDP we were talking about above. The next step then makes perfect sense. After turning off idle cores, let's boost the speed of active cores until we hit our TDP limit.
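The effect of power gating on turbo headroom can be sketched with toy numbers. Every wattage here (the 20W uncore, 18W per busy core, 3W of leakage per clock-gated idle core) is a hypothetical placeholder chosen for illustration, not a measured Nehalem or Phenom II figure; only the shape of the calculation matters.

```python
# Toy comparison of the TDP headroom left for turbo when idle cores are merely
# clock gated (still leaking) versus power gated (near-zero leakage).
# Every wattage below is a hypothetical placeholder, not a measured value.

def turbo_headroom(tdp: float, uncore: float, active: int, active_watts: float,
                   idle: int, idle_watts: float) -> float:
    """Watts left under the TDP after active and idle cores are accounted for."""
    return tdp - uncore - active * active_watts - idle * idle_watts


TDP, UNCORE = 95.0, 20.0       # hypothetical package budget and uncore draw
ACTIVE_W = 18.0                # hypothetical draw of a busy core at base clock
CLOCK_GATED_IDLE_W = 3.0       # hypothetical leakage of a clock-gated idle core
POWER_GATED_IDLE_W = 0.0       # power gate cuts the supply, leakage ~0

# Two cores busy, two cores idle.
clock_gated = turbo_headroom(TDP, UNCORE, 2, ACTIVE_W, 2, CLOCK_GATED_IDLE_W)
power_gated = turbo_headroom(TDP, UNCORE, 2, ACTIVE_W, 2, POWER_GATED_IDLE_W)
print(f"clock gated: {clock_gated:.1f} W spare, power gated: {power_gated:.1f} W spare")
```

Whatever the real numbers are, every watt an idle core stops leaking becomes budget the active cores can spend on higher clocks.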

Every Nehalem (Lynnfield included) dedicates around 1 million transistors (about the complexity of a 486) to the sole task of managing power. This logic turns cores off, underclocks them and is generally charged with making sure that power usage is kept to a minimum. Lynnfield's PCU (Power Control Unit) is largely the same as Bloomfield's. The architecture remains the same, although it has a higher sampling rate for monitoring the state of all of the cores and the demands on them.

The PCU is responsible for turbo mode.

Lynnfield's Turbo Mode: Up to 17% More Performance

Turbo on Bloomfield (the first Core i7) wasn't all that impressive. If you look back at our Core i7 article from last year you'll see that it's responsible for a 2 - 5% increase in performance depending on the application. All Bloomfield desktop CPUs had 130W TDPs, so each individual core had a bit more breathing room for how fast it could run. Lynnfield brings the TDP down by around 27%, meaning each core gets less TDP to work with (the lower the TDP, the greater potential there is for turbo). That, combined with almost a full year of improving yields on Nehalem, means that Intel can be much more aggressive with Turbo on Lynnfield.

Configuration | SYSMark 2007: Overall | Dawn of War II | Sacred 2 | World of Warcraft
Intel Core i7 870 Turbo Disabled | 206 | 74.3 fps | 84.8 fps | 60.6 fps
Intel Core i7 870 Turbo Enabled | 233 | 81.0 fps | 97.4 fps | 70.7 fps
% Increase from Turbo | 13.1% | 9.0% | 14.9% | 16.7%
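The "% Increase from Turbo" row is simply the gain from the turbo-disabled to the turbo-enabled result. A short sketch recomputing it from the scores above:

```python
# Recompute the "% Increase from Turbo" row from the raw Core i7 870 results.
results = {  # benchmark: (turbo disabled, turbo enabled)
    "SYSMark 2007: Overall": (206, 233),
    "Dawn of War II": (74.3, 81.0),
    "Sacred 2": (84.8, 97.4),
    "World of Warcraft": (60.6, 70.7),
}


def percent_increase(before: float, after: float) -> float:
    """Percentage gain going from `before` to `after`."""
    return (after - before) / before * 100.0


for name, (off, on) in results.items():
    print(f"{name}: {percent_increase(off, on):.1f}%")
```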

Turbo on Lynnfield can yield up to an extra 17% performance depending on the application. The biggest gains will be when running one or two threads as you can see from the table below:

Max Speed | Stock | 4 Cores Active | 3 Cores Active | 2 Cores Active | 1 Core Active
Intel Core i7 870 | 2.93GHz | 3.20GHz | 3.20GHz | 3.46GHz | 3.60GHz
Intel Core i7 860 | 2.80GHz | 2.93GHz | 2.93GHz | 3.33GHz | 3.46GHz
Intel Core i5 750 | 2.66GHz | 2.80GHz | 2.80GHz | 3.20GHz | 3.20GHz
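Turbo works in 133MHz steps (bins) of the base clock, so each entry in the table above can be read as a whole number of bins over the stock multiplier. A small sketch doing that conversion, using the rounded MHz values from the table and assuming the standard 133.33MHz BCLK:

```python
# Express each turbo frequency as a number of 133MHz "bins" over the stock clock.
# Frequencies are the rounded MHz values from the table; BCLK is assumed to be
# Nehalem's standard 133.33MHz.

BCLK_MHZ = 400.0 / 3.0  # 133.33MHz

# stock, 4 active, 3 active, 2 active, 1 active (MHz)
turbo_table = {
    "Core i7 870": (2930, 3200, 3200, 3460, 3600),
    "Core i7 860": (2800, 2930, 2930, 3330, 3460),
    "Core i5 750": (2660, 2800, 2800, 3200, 3200),
}


def multiplier(freq_mhz: float) -> int:
    """Nearest whole BCLK multiplier for a frequency."""
    return round(freq_mhz / BCLK_MHZ)


def turbo_bins(freqs) -> list:
    """Bins gained over stock with 4, 3, 2 and 1 cores active."""
    stock = multiplier(freqs[0])
    return [multiplier(f) - stock for f in freqs[1:]]


for cpu, freqs in turbo_table.items():
    print(cpu, "stock multiplier", multiplier(freqs[0]),
          "bins (4/3/2/1 active):", turbo_bins(freqs))
```

Read this way, the Core i7 870 gains up to five bins with one core active, while the Core i5 750 tops out at four.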

If Intel had Turbo mode back when dual-cores first started shipping we would've never had the whole single vs. dual core debate. If you're running a single thread, this 774M transistor beast will turn off three of its cores and run its single active core at up to 3.6GHz. That's faster than the fastest Core 2 Duo on the market today.



WoW doesn't stress more than 2 cores, Turbo mode helps ensure the i7 870 is faster than Intel's fastest dual-core CPU

It's more than just individual application performance, however: Lynnfield's turbo modes can kick in when you're just interacting with the OS or an application. Single threads, regardless of nature, can now execute at 3.6GHz instead of 2.93GHz. It's the epitome of Intel's hurry up and get idle philosophy.

The ultimate goal is to always deliver the best performance regardless of how threaded (or not) the workload is. Buying more cores shouldn't get you lower clock speeds, just more flexibility. The top end Lynnfield is like buying a 3.46GHz dual-core processor that can also run well threaded code at 2.93GHz.

Take this one step further and imagine what happens when you have a CPU/GPU on the same package or, better yet, on the same die. Need more GPU power? Underclock the CPU cores. Need more CPU power? Turn off half the GPU cores. It's always available, real-time-configurable processing power. That's the goal, and Lynnfield is the first real step in that direction.

Speed Limits: Things That Will Keep Turbo Mode from Working

As awesome as it is, Turbo doesn't work 100% of the time; its usefulness depends on a number of factors, including the instruction mix of active threads and processor cooling.

The actual instructions being executed by each core will determine the amount of current drawn and total TDP of the processor. For example, video encoding uses a lot of SSE instructions which in turn keep the SSE units busy on the chip; the front end remains idle and is clock gated, so power is saved there. The resulting power savings are translated into higher clock frequency. Intel tells us that video encoding should see the maximum improvement of two bins with all four cores active.

Floating point code stresses both the front end and back end of the pipe, so here we should expect to see only a 133MHz increase from turbo mode, if any at all. In short, you can't simply look at whether an app uses one, two or more threads; it's what the app does that matters.
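Why the instruction mix matters can be shown with a toy turbo governor that climbs one 133MHz bin at a time until estimated package power would exceed the TDP. The per-core wattages and the 20W uncore figure are hypothetical and were picked only so the outcome mirrors Intel's guidance above; `bin_cap=2` mirrors the Core i7 870's two-bin limit with four cores active. This is not how the PCU actually estimates power.

```python
# Toy turbo governor: with a given number of active cores, climb 133MHz bins
# until the estimated package power would exceed the TDP. All wattages are
# hypothetical; only the workload-dependent outcome is the point.

def max_bins(active_cores: int, base_watts: float, watts_per_bin: float,
             tdp: float = 95.0, uncore_watts: float = 20.0, bin_cap: int = 2) -> int:
    """Highest bin count (up to bin_cap) whose estimated power fits the TDP."""
    best = 0
    for bins in range(1, bin_cap + 1):
        package = uncore_watts + active_cores * (base_watts + bins * watts_per_bin)
        if package > tdp:
            break
        best = bins
    return best


# Hypothetical: SSE-heavy encoding leaves the front end clock gated, so each core
# draws less than with floating point code that keeps the whole pipe busy.
sse_encode = max_bins(active_cores=4, base_watts=16.0, watts_per_bin=1.2)
fp_heavy = max_bins(active_cores=4, base_watts=17.5, watts_per_bin=1.2)
print("SSE-heavy encode:", sse_encode, "bin(s); FP-heavy:", fp_heavy, "bin(s)")
```

Same thread count, same TDP: the only difference is how much power each busy core draws, and that alone decides whether the chip gets one bin or two.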

There's also the issue of background threads running in the OS. Although your foreground app may only use a single thread, there are usually dozens (if not hundreds) of active threads on your system at any time. Just a few of those being scheduled on sleeping cores will wake them up and limit your max turbo frequency (Windows 7 is allegedly better at not doing this).

You can't really control the instruction mix of the apps you run or how well they're threaded, but this last factor you can control: cooling. The trump-all feature that you have to respect is Intel's thermal throttling. If the CPU ever gets too hot, it will automatically reduce its clock speed, including any increase due to turbo mode, in order to avoid damaging the processor.



Lynnfield and its retail cooler

The retail cooler that ships with the Core i7 is tiny, and while it's able to remove heat well enough to allow the chip to turbo up, we've seen instances where it doesn't turbo as well due to cooling issues. Just like we recommended in the Bloomfield days, an aftermarket cooler may suit you well.

Lynnfield: Made for Windows 7 (or vice versa)

Core Parking is a feature included in Windows 7 and enabled on any multi-socket machine or any system with Hyper Threading enabled (e.g. Pentium 4, Atom, Core i7). The feature looks at the performance penalty from migrating a thread from one core to another; if the fall looks too dangerous, Windows 7 won't jump - the thread will stay parked on that core.

What this fixes is a number of situations where enabling Hyper Threading reduces performance because Windows moves a thread from a physical core to a logical core. This also helps multi-socket systems, where moving a thread from one core to the next might mean moving it (and all of its data) from one memory controller to another one on an adjacent socket.
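The placement idea can be sketched as a toy: prefer a fully idle physical core, fall back to a Hyper Threading sibling only when no physical core is free, and skip a migration whose estimated cost outweighs its benefit. This is a hypothetical illustration of the behavior described above, not Windows 7's actual scheduler logic; the function names and the microsecond costs are invented for the example.

```python
# Toy illustration of HT-aware thread placement plus a "don't jump" migration
# check. NOT Windows 7's actual algorithm; names and costs are hypothetical.

from typing import List, Optional


def pick_physical_core(busy_threads_per_core: List[int]) -> Optional[int]:
    """Return the index of a physical core to use for a new thread.

    busy_threads_per_core[i] is how many of core i's two logical CPUs are busy.
    """
    for core, busy in enumerate(busy_threads_per_core):
        if busy == 0:
            return core          # a whole physical core is free: best choice
    for core, busy in enumerate(busy_threads_per_core):
        if busy == 1:
            return core          # share a physical core via its HT sibling
    return None                  # every logical CPU is busy


def should_migrate(expected_gain_us: float, migration_cost_us: float) -> bool:
    """Only move a running thread if the move is expected to pay for itself."""
    return expected_gain_us > migration_cost_us
```

For example, with cores loaded as `[1, 0, 1, 1]` the new thread lands on core 1, the only fully idle physical core, rather than on a sibling of a busy one.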

Core Parking can't help an application that manually assigns affinity to a core. We've still seen situations where HT reduces performance under Windows 7, for example with AutoCAD 2010 and World of Warcraft.

With support in the OS however, developers should have no reason to assign affinity in software - the OS is now smart enough to properly handle multi-socket and HT enabled machines.

Lynnfield's Un-Core: Faster Than Most Bloomfields

A few years ago I had a bet going with AMD's Ian McNaughton. We were at an AMD event where the Phenom architecture was first being introduced and he insisted that the L3 cache was part of the memory controller. This didn't make any sense to me so I disagreed. Minutes later a presentation slide went up on a projector talking about how the L3 cache and memory controller were on the same voltage plane; that's what he meant. Ian laughed a lot and to this day he holds it over my head.

The moral of the story: in Phenom, and later in Nehalem, the processor is divided into two parts. Intel named them the core and the un-core. The "core" of these multi-core processors is made up of each individual processor core and its associated private caches (L1/L2). The "uncore" refers to everything else: PCIe controller, memory controller, DMI/QPI and the L3 cache.

The uncore isn't as critical for performance but is made up of a ton of transistors; roughly 400 million in the case of Lynnfield/Bloomfield (more if you count the PCIe controller). In order to save power, Intel uses slower transistors that have lower leakage for the un-core. As a result, the un-core can't clock up as high as the core and runs at a lower multiplier.

Take the Bloomfield Core i7 975 for example. The core runs at 25x BCLK (25 x 133MHz = 3.33GHz), but the un-core runs at 20x BCLK (20 x 133MHz = 2.66GHz). Apart from the other Extreme part, the 965, the rest of the chips, including Lynnfield, have slower un-cores:

CPU | Socket | Core Clock | Un-Core Clock
Intel Core i7 975 Extreme | LGA-1366 | 3.33GHz | 2.66GHz
Intel Core i7 965 Extreme | LGA-1366 | 3.20GHz | 2.66GHz
Intel Core i7 950 | LGA-1366 | 3.06GHz | 2.13GHz
Intel Core i7 940 | LGA-1366 | 2.93GHz | 2.13GHz
Intel Core i7 920 | LGA-1366 | 2.66GHz | 2.13GHz
Intel Core i7 870 | LGA-1156 | 2.93GHz | 2.40GHz
Intel Core i7 860 | LGA-1156 | 2.80GHz | 2.40GHz
Intel Core i5 750 | LGA-1156 | 2.66GHz | 2.13GHz

Here's another area where Lynnfield is better than the lower end Bloomfields: its uncore runs at 2.40GHz instead of 2.13GHz. The exception is the Core i5 750, whose uncore is stuck at 2.13GHz as well. Once again, only the "Extreme" Bloomfields have a faster uncore.
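Since both clocks are whole multiples of BCLK, the un-core table boils down to a pair of multipliers per chip. A quick sketch, with the multipliers derived from the table (e.g. 2.40GHz / 133.33MHz = 18); note that formatting to two decimals prints 2.67GHz for the value the table, following Intel's convention, lists as 2.66GHz.

```python
# Core and un-core clocks are both whole multiples of BCLK (133.33MHz).
# Multipliers are derived from the un-core table (e.g. 2.40GHz / 133.33MHz = 18).
BCLK_MHZ = 400.0 / 3.0

multipliers = {  # CPU: (core multiplier, un-core multiplier)
    "Core i7 975 Extreme": (25, 20),
    "Core i7 870": (22, 18),
    "Core i5 750": (20, 16),
}

for cpu, (core_x, uncore_x) in multipliers.items():
    print(f"{cpu}: core {core_x * BCLK_MHZ / 1000:.2f}GHz, "
          f"un-core {uncore_x * BCLK_MHZ / 1000:.2f}GHz")
```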

Lynnfield's Memory Controller: Also Faster than Bloomfield

Intel only officially supports two memory speeds on Bloomfield: DDR3-800 and DDR3-1066. Obviously we're able to run it much faster than that, but this is what's officially validated and supported on the processors.

Lynnfield is a year newer and thus gets a tweaked memory controller. The result? Official DDR3-1333 support.



Three Lynnfield memory kits (left to right): OCZ, Patriot and Kingston

The same sort of rules apply to Lynnfield memory kits that we saw with Bloomfield. You don't want to go above 1.65V and thus all the kits we've seen run at 1.5V for the stock JEDEC speeds or 1.65V for the overclocked modules.



Like Bloomfield, 1.65V is the max we'll see on Lynnfield

Discovery: Two Channels Aren't Worse Than Three

Intel told me something interesting when I was out in LA earlier this summer: it takes at least 3 cores to fully saturate Lynnfield's dual-channel DDR3-1333 memory bus. That's three cores all working on memory bandwidth intensive threads at the same time. That's a pretty stiff requirement. In the vast, vast majority of situations Lynnfield's dual channel DDR3 memory controller won't hurt it.

Move up to 6 or 8 core designs and a third memory channel is necessary, and that's why we'll see those processors debut exclusively on LGA-1366 platforms. In fact, X58 motherboards will only need a BIOS update to work with the 6-core 32nm Gulftown processor next year. P55 looks like it'll be limited to four cores and below.

Because of this, Lynnfield's memory bandwidth and latency scores are actually quite similar to Bloomfield's. I used Everest to look at memory bandwidth and latency between a Core i7 975 and Core i7 870 (Lynnfield):

Lynnfield's memory controller is good, easily as good as what's in Bloomfield if not slightly better.

Both processors turbo'd up to 3.46GHz, indicating that Everest's memory test uses no more than two threads. The 975 ran DDR3-1066 memory (the highest it officially supports), while the 870 used DDR3-1333. The faster memory gave the 870 the advantage. Since we're not taxing all four cores, Lynnfield is at no disadvantage from a bandwidth perspective. Surprisingly enough, even SiSoft Sandra (which does use four cores for its memory bandwidth test) shows Lynnfield's dual-channel DDR3-1333 memory controller as equal to Bloomfield's triple-channel DDR3-1066 interface.

SiSoft Sandra 2009.SP4 | Intel Core i7 975 | Intel Core i7 870
Aggregate Memory Bandwidth | 17.8 GB/s | 17.3 GB/s
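Theoretical peak bandwidth helps explain the Sandra result: each 64-bit DDR3 channel moves 8 bytes per transfer. The sketch below compares peak figures with the measured numbers, assuming Sandra reports decimal GB/s (10^9 bytes); if it uses binary units the percentages shift slightly.

```python
# Theoretical peak DRAM bandwidth: transfers/s x 8 bytes per 64-bit channel
# x number of channels. Measured figures are the Sandra results in the table.

def peak_gb_s(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) for 64-bit DDR channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


bloomfield_peak = peak_gb_s(3200 / 3, channels=3)  # triple-channel DDR3-1066 (1066.67 MT/s)
lynnfield_peak = peak_gb_s(4000 / 3, channels=2)   # dual-channel DDR3-1333 (1333.33 MT/s)

for name, peak, measured in (("Core i7 975", bloomfield_peak, 17.8),
                             ("Core i7 870", lynnfield_peak, 17.3)):
    print(f"{name}: peak {peak:.1f} GB/s, measured {measured} GB/s "
          f"({measured / peak:.0%} of peak)")
```

On paper the triple-channel setup has more headroom, but the dual-channel controller extracts a larger fraction of its peak, which is consistent with the two measured results landing so close together.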

Long story short? Lynnfield won't be memory bandwidth limited with DDR3-1333 for the overwhelming majority of usage cases.

The Best Gaming CPU?

When I first previewed Lynnfield I theorized that its aggressive turbo modes would make it the best gaming CPU on the market. Most games these days use between two and four threads, not enough for Hyper Threading to be truly beneficial. As a result, Nehalem never really did all that well in games. It was generally faster than the competition, but not much and not on a performance-per-dollar basis.

I ran a few new game tests under Windows 7 to accompany our usual game benchmarks. The competitors here are limited to Lynnfield (of course), Bloomfield, Penryn and AMD's Phenom II.

Dawn of War II doesn't actually shatter any expectations. While turbo clearly benefits Lynnfield, it isn't enough to dethrone Bloomfield. The Core i7 920 is marginally faster than the new i5 750. Here's where things get interesting though: look at minimum frame rates. In both Lynnfield platforms, the minimum frame rates are higher than the competing Bloomfield system. That appears to be Lynnfield's aggressive turbo modes at work. While they're not constantly pushing Lynnfield to a higher clock speed, they do apparently help out when it matters the most.

The other thing to notice is the lowest Lynnfield is a faster gaming CPU than Intel's fastest dual-core: the E8600.

Sacred 2 shows more typical performance standings. Lynnfield can't seem to outperform Bloomfield, and the Core i5 750 actually falls slightly behind AMD's Phenom II X4 965 BE.

With World of Warcraft we're back to turbo mode having a very positive impact. The Core i7 870 is nearly as fast as the i7 975, while the i5 750 is a bit slower than the i7 920. Both are faster than the Phenom II X4 965 BE, which is in turn faster than the Q9650.

These three benchmarks seem to outline the three most realistic options for Lynnfield's gaming performance. In situations where its turbo modes can work, Lynnfield can be equal to if not faster than Bloomfield. In those situations where it doesn't kick in, Lynnfield is at least competitive with Phenom II and Bloomfield. In all situations the old Core 2 Quad Q9650 is at the bottom of the charts.

I'll throw in one more option just to complicate things. Have a look at this:

Not exactly the norm, but here we have the Phenom II X4 965 BE faster than everything - including the Core i7 975. Unfortunately there's no one benchmark that will sum up how these things perform, but overall it looks like Lynnfield is going to be one capable gaming CPU.

Multi-GPU SLI/CF Scaling: Lynnfield's Blemish

When running in single-GPU mode, the on-die PCIe controller maintains a full x16 connection to your graphics card:



Hooray.

In multi-GPU mode, the 16 lanes have to be split in two:

To support this the motherboard maker needs to put down ~$3 worth of PCIe switches:

Now SLI and Crossfire can work, although the motherboard maker also needs to pay NVIDIA a few dollars to legally make SLI work.

The question is do you give up any performance when going with Lynnfield's 2 x8 implementation vs. Bloomfield/X58's 2 x16 PCIe configuration? In short, at the high end, yes.
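The raw numbers behind that split are easy to work out. PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, which leaves 500MB/s of data per lane in each direction; a short sketch of per-card bandwidth for both configurations:

```python
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 500 MB/s of data per lane
# in each direction.

def pcie2_gb_s(lanes: int) -> float:
    """Per-direction PCIe 2.0 bandwidth in GB/s for a given link width."""
    gt_per_s = 5.0        # gigatransfers per second per lane
    encoding = 8 / 10     # 8b/10b: 8 data bits per 10 bits on the wire
    return gt_per_s * encoding / 8 * lanes  # bits -> bytes, times lane count


print("X58, 2 x16:", pcie2_gb_s(16), "GB/s per card, per direction")
print("P55, 2 x8: ", pcie2_gb_s(8), "GB/s per card, per direction")
```

Each card on P55 gets half the link bandwidth of its X58 counterpart; whether that matters depends on how much data the GPUs actually move.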

I looked at scaling in two games that scaled the best with multiple GPUs: Crysis Warhead and FarCry 2. I ran all settings at their max, resolution at 2560 x 1600 but with no AA.

I included two multi-GPU configurations. A pair of GeForce GTX 275s from EVGA for NVIDIA:



A coupla GPUs and a few cores can go a long way

And to really stress things, I looked at two Radeon HD 4870 X2s from Sapphire. Note that each card has two GPUs so this is actually a 4-GPU configuration, enough to really stress a PCIe x8 interface.

First, the dual-GPU results from NVIDIA.

NVIDIA GeForce GTX 275 | Crysis Warhead (ambush) | Crysis Warhead (avalanche) | Crysis Warhead (frost) | FarCry 2 Playback Demo Action
Intel Core i7 975 (X58) - 1GPU | 20.8 fps | 23.0 fps | 21.4 fps | 41.0 fps
Intel Core i7 870 (P55) - 1GPU | 20.8 fps | 22.9 fps | 21.5 fps | 40.5 fps
Intel Core i7 975 (X58) - 2GPUs | 38.4 fps | 42.3 fps | 38.0 fps | 73.2 fps
Intel Core i7 870 (P55) - 2GPUs | 38.0 fps | 41.9 fps | 37.4 fps | 65.9 fps

The important data is in the next table. What you're looking at here is the % speedup from one to two GPUs on X58 vs. P55. In theory, X58 should have higher percentages because each GPU gets 16 PCIe lanes while Lynnfield only provides 8 per GPU.

GTX 275 -> GTX 275 SLI Scaling | Crysis Warhead (ambush) | Crysis Warhead (avalanche) | Crysis Warhead (frost) | FarCry 2 Playback Demo Action
Intel Core i7 975 (X58) | 84.6% | 83.9% | 77.6% | 78.5%
Intel Core i7 870 (P55) | 82.7% | 83.0% | 74.0% | 62.7%
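The scaling percentages come from a simple ratio: two-GPU frame rate divided by one-GPU frame rate, minus one. A sketch reproducing the GTX 275 scaling figures from the frame rates in the previous table:

```python
# Multi-GPU scaling = (two-GPU fps / one-GPU fps - 1) x 100, using the
# GTX 275 frame rates (ambush, avalanche, frost, FarCry 2).
single = {"X58": [20.8, 23.0, 21.4, 41.0], "P55": [20.8, 22.9, 21.5, 40.5]}
dual = {"X58": [38.4, 42.3, 38.0, 73.2], "P55": [38.0, 41.9, 37.4, 65.9]}


def scaling_pct(one_gpu: float, two_gpu: float) -> float:
    """Percentage speedup from adding the second GPU."""
    return (two_gpu / one_gpu - 1.0) * 100.0


def scaling_row(platform: str) -> list:
    """Scaling for each test on one platform, rounded to 0.1%."""
    return [round(scaling_pct(a, b), 1)
            for a, b in zip(single[platform], dual[platform])]


for platform in ("X58", "P55"):
    print(platform, scaling_row(platform))
```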

For the most part, the X58 platform was only a couple of percent better in scaling. That changes with the Far Cry 2 results where X58 manages to get 78% scaling while P55 only delivers 62%. It's clearly not the most common case, but it can happen. If you're going to be building a high-end dual-GPU setup, X58 is probably worth it.

Next, the quad-GPU results from AMD:

AMD Radeon HD 4870 X2 | Crysis Warhead (ambush) | Crysis Warhead (avalanche) | Crysis Warhead (frost) | FarCry 2 Playback Demo Action
Intel Core i7 975 (X58) - 2GPUs | 25.8 fps | 31.3 fps | 27.0 fps | 70.9 fps
Intel Core i7 870 (P55) - 2GPUs | 24.4 fps | 31.1 fps | 26.6 fps | 71.4 fps
Intel Core i7 975 (X58) - 4GPUs | 27.0 fps | 57.4 fps | 47.9 fps | 117.9 fps
Intel Core i7 870 (P55) - 4GPUs | 24.2 fps | 50.0 fps | 36.5 fps | 116 fps

Again, what we really care about is the scaling. Note how single-card (dual-GPU) performance is close between Bloomfield and Lynnfield, but quad-GPU performance is noticeably lower on Lynnfield. This isn't going to be good:

4870 X2 -> 4870 X2 CF Scaling | Crysis Warhead (ambush) | Crysis Warhead (avalanche) | Crysis Warhead (frost) | FarCry 2 Playback Demo Action
Intel Core i7 975 (X58) | 4.7% | 83.4% | 77.4% | 66.3%
Intel Core i7 870 (P55) | -1.0% | 60.8% | 37.2% | 62.5%

Ouch. Maybe Lynnfield is human after all. Almost across the board the quad-GPU results significantly favor X58. It makes sense given how data hungry these GPUs are. Again, the conclusion here is that for a high end multi-GPU setup you'll want to go with X58/Bloomfield.

A Quick Look at GPU Limited Gaming

With all of our CPU reviews we try to strike a balance between CPU and GPU limited game tests in order to show which CPU is truly faster at running game code. In fact all of our CPU tests are designed to figure out which CPUs are best at a number of tasks.

However, the vast majority of games today will be limited by whatever graphics card you have in your system. The performance differences we talked about earlier will all but disappear in these scenarios. Allow me to present data from Crysis Warhead running at 2560 x 1600 with maximum quality settings:

NVIDIA GeForce GTX 275 | Crysis Warhead (ambush) | Crysis Warhead (avalanche) | Crysis Warhead (frost)
Intel Core i7 975 | 20.8 fps | 23.0 fps | 21.4 fps
Intel Core i7 870 | 20.8 fps | 22.9 fps | 21.5 fps
AMD Phenom II X4 965 BE | 20.9 fps | 23.0 fps | 21.5 fps

They're all the same. This shouldn't come as a surprise to anyone; it's always been the case. Any CPU near the high end, when faced with the same GPU bottleneck, will perform the same in game.

Now that doesn't mean you should ignore performance data and buy a slower CPU. You always want to purchase the best performing CPU you can at any given price point. That ensures you're always left with the best performance possible, regardless of the CPU/GPU balance in your applications and games.

The Test



Motherboard: Intel DP55KG (Intel P55), Intel DX58SO (Intel X58), Intel DX48BT2 (Intel X48), Gigabyte GA-MA790FXT-UD5P (790FX)
Chipset: Intel X48, Intel X58, Intel P55, AMD 790FX
Chipset Drivers: Intel 9.1.1.1015 (Intel), AMD Catalyst 9.8
Hard Disk: Intel X25-M SSD (80GB)
Memory: Qimonda DDR3-1066 4 x 1GB (7-7-7-20), Corsair DDR3-1333 4 x 1GB (7-7-7-20), Patriot Viper DDR3-1333 2 x 2GB (7-7-7-20)
Video Card: eVGA GeForce GTX 280
Video Drivers: NVIDIA ForceWare 190.62 (Win7 64), NVIDIA ForceWare 180.43 (Vista64), NVIDIA ForceWare 178.24 (Vista32)
Desktop Resolution: 1920 x 1200
OS: Windows Vista Ultimate 32-bit (for SYSMark), Windows Vista Ultimate 64-bit, Windows 7 64-bit

Turbo mode is enabled for the P55 and X58 platforms.

SYSMark 2007 Performance

Our journey starts with SYSMark 2007, the only all-encompassing performance suite in our review today. The idea here is simple: one benchmark to indicate the overall performance of your machine.



I already spoiled the surprise and gave out the SYSMark data earlier, but this should put things in perspective. See how the Core i7 870 and Core i5 750 fit nicely in between the Core i7 920 and 975? Yeah, that's because pretty much anything below the 975 doesn't make sense anymore thanks to Lynnfield.

Guess what else doesn't make sense anymore? AMD's pricing on the Phenom II X4 965 BE. The 965 BE is priced at $245 while the i5 750 is a $196 processor. The 750 is about 6% faster here. AMD will need to adjust its prices downward after this.
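Putting the price gap and the performance gap together gives a performance-per-dollar comparison. The sketch assumes the i5 750 is exactly 6% faster in SYSMark (the text says "about 6%") and uses the list prices quoted above.

```python
# SYSMark performance per dollar relative to the Phenom II X4 965 BE,
# assuming the i5 750 is exactly 6% faster (the text says "about 6%").
price = {"Core i5 750": 196, "Phenom II X4 965 BE": 245}
relative_perf = {"Core i5 750": 1.06, "Phenom II X4 965 BE": 1.00}

perf_per_dollar = {cpu: relative_perf[cpu] / price[cpu] for cpu in price}
advantage = perf_per_dollar["Core i5 750"] / perf_per_dollar["Phenom II X4 965 BE"]
print(f"i5 750 delivers {advantage:.3f}x the SYSMark performance per dollar")
```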

The standings move around a bit in the individual SYSMark tests, but the bottom line remains: the Core i5 750, despite lacking Hyper Threading, is worthy.















Adobe Photoshop CS4 Performance

To measure performance under Photoshop CS4 we turn to the Retouch Artists’ Speed Test. The test does basic photo editing: a couple of color space conversions, many layer creations, a color curve adjustment, image and canvas size adjustments, an unsharp mask, and finally a Gaussian blur performed on the entire image.

The whole process is timed, and thanks to the use of Intel's X25-M SSD as our testbed drive, performance is far more predictable than it was back when we tested on mechanical disks.

Time is reported in seconds and the lower numbers mean better performance. The test is multithreaded and can hit all four cores in a quad-core machine.

Hyper Threading does have a real benefit in Photoshop and thus we see the Core i5 750 suffering a bit. It's still faster than the Phenom II 965 BE but it is marginally slower than the i7 920. The 870 is bested only by the i7 975.

DivX 8.5.3 with Xmpeg 5.0.3

Our DivX test is the same DivX / XMpeg 5.0.3 test we've run for the past few years. The 1080p source file is encoded using the unconstrained DivX profile, the quality/performance setting is 5 (balanced), and enhanced multithreading is enabled:

And we're done. DivX, historically a stronghold for AMD's Phenom II processors (at least compared to their price-competitive Penryn counterparts), is faster on the Core i5 750 than on the Phenom II X4 965 BE. What's wrong with that?

The i5 750 costs $196 and the 965 BE costs $245. For once, Intel is selling you more transistors for less money than AMD is.

x264 HD Video Encoding Performance

Graysky's x264 HD test uses the publicly available x264 codec (an open-source H.264 encoder) to encode a 4Mbps 720p MPEG-2 source. The focus here is on quality rather than speed, so the benchmark uses a 2-pass encode and reports the average frame rate of each pass.

In the first pass AMD is quite competitive, outpacing the i5 750, but when we get to the actual encode:

It's close, but the cheaper i5 750 is faster than the Phenom II X4 965 BE once again; Hyper Threading keeps the i7 920 ahead.

Windows Media Encoder 9 x64 Advanced Profile

In order to be codec-agnostic, we've also got a Windows Media Encoder benchmark that looks at the same sort of workload as the DivX and x264 tests, but using WME instead.

AMD is about 6% faster than the i5 750 here; it looks like the Phenom II does have some hope left. Let's see how the rest unfolds...

3dsmax 9 - SPECapc 3dsmax CPU Rendering Test

Today's desktop processors are more than fast enough to do professional level 3D rendering at home. To look at performance under 3dsmax we ran the SPECapc 3dsmax 8 benchmark (only the CPU rendering tests) under 3dsmax 9 SP1. The results reported are the rendering composite scores:

And we're back to utter dominance yet again. The i5 750 is 12.6% faster than the Phenom II X4 965 BE and 20% cheaper. Harder, better, faster, stronger.

Blender 2.48a

Blender is an open source 3D modeling application. Our benchmark here simply times how long it takes to render a character that comes with the application.

To get Blender to perform properly on Lynnfield we actually had to update our graphics drivers; it looks like the on-die PCIe controller requires the latest NVIDIA/ATI drivers to work properly. The results aren't unusual: Intel has done very well in these tests and Lynnfield continues to dominate. The i5 750 is a bit slower than the i7 920 (and the Q9650) because it lacks HT support.





Cinebench R10

Created by the Cinema 4D folks, Cinebench is a popular 3D rendering benchmark that gives us both single- and multi-threaded rendering results.

The single-threaded benchmark tells us everything we need to know. The Core i5 750 and i7 870 are two of the fastest processors we've ever tested in single-threaded applications. Very few microprocessors will be able to retire instructions from a single thread as quickly as Lynnfield. This is actually very noticeable when simply using the OS: many tasks still aren't multithreaded, but they execute very, very fast on Lynnfield.

Crank up the threads and Lynnfield is still competitive. Because it's missing Hyper Threading, the i5 750 is barely faster than the Phenom II X4 965 BE. Although I understand Intel wanting to segment its product line, it seems that the i5's missing HT goes a bit too far.

POV-Ray 3.73 beta 23 Ray Tracing Performance

POV-Ray is a popular, open-source raytracing application that also doubles as a great tool to measure CPU floating point performance.

I ran the SMP benchmark in beta 23 of POV-Ray 3.73. The numbers reported are the final score in pixels per second.

We see the same results under POV-Ray. Regardless of thread count, Lynnfield delivers the best performance possible short of a $1000 CPU.

Microsoft Excel 2007

Excel can be a very powerful mathematical tool. In this benchmark we're running a Monte Carlo simulation on a very large spreadsheet of stock pricing data.

The Excel test is peculiar in its results. This must be one of the few situations where Bloomfield's memory bandwidth advantage shows up, as even the Core i7 870 can't outperform the i7 920. The Core 2 Quad Q9650 does well thanks to its large 12MB L2 cache, as does the Q6600 with its beefy 8MB cache.
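To put rough numbers on Bloomfield's bandwidth advantage, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth: channels × 8 bytes per 64-bit DDR3 channel × transfer rate. It assumes DDR3-1066 on Bloomfield and DDR3-1333 on Lynnfield (their officially supported speeds); measured bandwidth is lower on both platforms, and the Excel result depends on more than peak bandwidth.

```python
# Sketch: theoretical peak DRAM bandwidth for the two Nehalem platforms.
# ASSUMPTION: Bloomfield at DDR3-1066, Lynnfield at DDR3-1333.

BYTES_PER_CHANNEL = 64 // 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_s = channels * BYTES_PER_CHANNEL * mega_transfers_per_s * 1e6
    return bytes_per_s / 1e9


bloomfield = peak_bandwidth_gbs(channels=3, mega_transfers_per_s=1066.67)
lynnfield = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=1333.33)

print(f"Bloomfield (3ch DDR3-1066): {bloomfield:.1f} GB/s")
print(f"Lynnfield  (2ch DDR3-1333): {lynnfield:.1f} GB/s")
print(f"Bloomfield peak advantage: {bloomfield / lynnfield - 1:.0%}")
```

Under these assumptions, the third channel gives Bloomfield about 25.6GB/s of peak bandwidth versus about 21.3GB/s for Lynnfield, roughly a 20% edge, even though Lynnfield's memory runs at the faster data rate.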

Sony Vegas Pro 8: Blu-ray Disc Creation

Although technically a test simulating the creation of a Blu-ray disc, the majority of the time in our Sony Vegas Pro benchmark is spent encoding the 25Mbps MPEG-2 video stream, not actually creating the Blu-ray disc itself.

Hyper Threading is good for about 4% here, giving the 920 the slight edge over the Core i5 750.

Sorenson Squeeze: FLV Creation

In another video-related benchmark, we use Sorenson Squeeze to convert regular videos into Flash videos for use on websites.

The i5 750 pays the HT penalty, taking 20 seconds longer than the i7 920 to render our test video. It is still faster than the Phenom II X4 965 BE, at a much lower cost. The Core i7 870 comes close but can't beat the i7 975.

PAR2 Multithreaded Archive Recovery Performance

Par2 is an application used for reconstructing downloaded archives. It can generate parity data from a given archive and later use it to recover the archive.

Chuchusoft took the source code of par2cmdline 0.4 and parallelized it using Intel’s Threading Building Blocks 2.1. The result is a version of par2cmdline that can spawn multiple threads to repair par2 archives. For this test we took a 708MB archive, corrupted nearly 60MB of it, and used the multithreaded par2cmdline to recover it. The scores reported are the repair and recovery times in seconds.
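For readers unfamiliar with parity-based recovery, here is a deliberately simplified sketch. PAR2 itself uses Reed-Solomon codes, which can rebuild as many damaged blocks as there are recovery blocks; this example swaps in single XOR parity, which can rebuild exactly one missing block. The slice-per-thread split loosely illustrates dividing repair work across cores, but it is not how par2cmdline or Threading Building Blocks actually partition the work, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Simplified stand-in: single XOR parity, NOT the Reed-Solomon coding PAR2 uses.


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_parallel(blocks: list[bytes], slices: int = 4) -> bytes:
    """Compute one parity block by XORing independent byte ranges on a thread
    pool. This shows the work decomposition only; CPython's GIL limits the
    actual speedup for this pure-Python loop."""
    size = len(blocks[0])
    step = max(1, -(-size // slices))  # ceiling division
    ranges = [(start, min(start + step, size)) for start in range(0, size, step)]
    with ThreadPoolExecutor(max_workers=slices) as pool:
        parts = pool.map(lambda r: xor_blocks([b[r[0]:r[1]] for b in blocks]), ranges)
        return b"".join(parts)


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR of all surviving blocks plus the parity block yields the lost block."""
    return xor_blocks(surviving + [parity])


data = [b"Lynnfield", b"Bloomfld!", b"PhenomII."]  # equal-length data blocks
parity = parity_parallel(data)
lost = data.pop(1)  # simulate losing the middle block
assert recover_missing(data, parity) == lost
```

Because each byte range is independent, the work splits cleanly across threads; that independence is what makes repair tools like the multithreaded par2cmdline scale with core count.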

Faster than AMD? Check. Slower than the Core i7 920? Check. Costs under $200? Check. It's a shame that Intel didn't enable Hyper Threading on the Core i5 750; otherwise it would have really ruined most of the LGA-1366 lineup. The Core i7 860 is probably the best of both worlds; unfortunately, it's very hard to come by at this point.

The Core i7 870 is actually faster than the i7 975 here. I'll chalk that up to DDR3-1333 with some aggressive turboing.

WinRAR - Archive Creation

Our WinRAR test simply takes 300MB of files and compresses them into a single RAR archive using the application's default settings. We're not doing anything exotic here, just looking at the impact of CPU performance on creating an archive:

Large file compression is very well threaded, so we see a real difference in performance between the HT-enabled i7 920 and the i5 750 without Hyper Threading. The i7 870, however, is within 5% of the i7 975 at 56% of the cost.

Fallout 3 Game Performance

Bethesda’s latest game uses an updated version of the Gamebryo engine (the engine behind Oblivion). This benchmark takes place immediately outside Vault 101. The character walks away from the vault through the Springvale ruins. The benchmark is measured manually using FRAPS.

The numbers are all very close, but the Core i7 870 edges out the 975 for the lead here. The i5 750 manages to outperform the i7 920 thanks to its more aggressive turbo modes. The Phenom II X4 965 BE is faster than its closest competitor, but it needs a price adjustment in a major way.

Left 4 Dead

Once more we have Lynnfield near the top; the only thing that's faster is the i7 975. In these situations, however, the difference between first and fourth place is negligible.





FarCry 2 Multithreaded Game Performance

FarCry 2 ships with the most impressive benchmark tool we’ve ever seen in a PC game. Part of this is due to the fact that Ubisoft actually tapped a number of hardware sites (AnandTech included) from around the world to aid in the planning for the benchmark.

For our purposes we ran the CPU benchmark included in the latest patch:

Even when four cores are stressed, the i5 750 can pull ahead of the i7 920.





Crysis Warhead

Power Consumption

If you remember last year's Nehalem coverage, I made a point of mentioning that the Nehalem architecture, thanks to its PCU (Power Control Unit) and power gate transistors, was the most power-efficient of the high-end options. The lower the TDP, the more important power efficiency is, so it's no surprise to see Lynnfield truly impress when it comes to power consumption:

At idle the Core i5 and Core i7 870 use less power than any other processor we've ever tested. Note that these idle power figures include an idling GeForce GTX 280. With a lower power graphics card, you could easily get to idle power consumption around 60W. Once we start seeing on-package GPUs, total system power consumption should drop even further.

Under load the Core i5 and Core i7 870 continue to impress. They both draw less power than a Q6600 or a Q9650, all the while outperforming the two. Power consumption is also noticeably lower than Bloomfield.

These things are fast and smart with power. Just wait until Nehalem goes below 65W...

Overclocking: Great When Overvolted, Otherwise...

Back when I asked Intel why anyone would opt for LGA-1366 over LGA-1156 one of the responses I got was: overclocking. The most overclockable CPUs will be LGA-1366 chips.

We tried overclocking three different CPUs: the Core i7 870, Core i7 860 and Core i5 750. We overclocked using two different coolers: the retail low profile HSF and a Thermalright MUX-120 (the heatsink Intel is sending around to reviewers for high performance testing). I'll get one thing out of the way: the retail heatsink pretty much sucks for overclocking:

Intel Core i7 870 Max Overclock (Turbo Disabled)
Intel Retail LGA-1156 Cooler: 3.52GHz (160MHz x 22.0)
Thermalright MUX-120: 4.20GHz (200MHz x 21.0)

The Thermalright enables higher overclocks by removing heat quickly enough to let us increase the CPU voltage. While roughly 1.35V is the limit for the retail cooler, the Thermalright MUX-120 let us go up to 1.40V. In both cases you need a well-ventilated case.



Um, yeah.

Now for the actual overclocking results. We overclocked in two ways: 1) with turbo mode enabled and ensuring stability at all turbo frequencies (both single and multiple cores active), and 2) with turbo mode disabled simply going for highest clock speed.

The results are in the table below:

Intel Core i7 870 (stock clock speed: 2.93GHz)
Max Overclock (Turbo Enabled): 3.39GHz default (154 x 22.0); 3C/4C active: 3.70GHz; 2C active: 4.00GHz; 1C active: 4.16GHz
Max Overclock (Turbo Disabled): 4.20GHz (200 x 21.0)

Intel Core i7 860 (stock clock speed: 2.80GHz)
Max Overclock (Turbo Enabled): 3.23GHz default (154 x 21.0); 3C/4C active: 3.54GHz; 2C active: 3.85GHz; 1C active: 4.00GHz
Max Overclock (Turbo Disabled): 3.99GHz (210 x 19.0)

Intel Core i5 750 (stock clock speed: 2.66GHz)
Max Overclock (Turbo Enabled): 3.2GHz default (160 x 20.0); 3C/4C active: 3.96GHz; 2C active: 4.00GHz; 1C active: 4.16GHz
Max Overclock (Turbo Disabled): 3.92GHz (206.5 x 19.0)

For best performance with all four cores active, disabling turbo mode is the way to go. Otherwise you have to reduce the BCLK in order to make sure your system is still stable when the one-active-core turbo mode kicks in. For example, with our Core i7 870 with turbo disabled we hit 4.2GHz using a 200MHz BCLK. If we used the same BCLK but left turbo enabled, when only one core was active we'd hit 5.4GHz - clearly not realistic with only air cooling.
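The turbo-versus-BCLK tradeoff comes down to core clock = BCLK × multiplier, with each turbo bin adding one to the multiplier. The sketch below assumes the i7 870 turbo bins implied by the overclocking table (at a 154MHz BCLK, 3.70, 4.00 and 4.16GHz correspond to 24x, 26x and 27x over a 22x base); the helper names are hypothetical.

```python
# Sketch: Lynnfield core clock = BCLK x (base multiplier + turbo bins).
# ASSUMPTION: i7 870 turbo bins inferred from the overclocking table above.

I7_870_BASE_MULTIPLIER = 22
# Active cores -> extra multiplier bins when turbo is enabled (inferred).
I7_870_TURBO_BINS = {4: 2, 3: 2, 2: 4, 1: 5}


def core_clock_mhz(bclk_mhz: float, active_cores: int, turbo: bool = True) -> float:
    """Core frequency in MHz for a given BCLK and number of active cores."""
    bins = I7_870_TURBO_BINS[active_cores] if turbo else 0
    return bclk_mhz * (I7_870_BASE_MULTIPLIER + bins)


def max_bclk_for(ceiling_mhz: float) -> float:
    """Highest BCLK that keeps the 1-core turbo bin at or below ceiling_mhz."""
    top_multiplier = I7_870_BASE_MULTIPLIER + max(I7_870_TURBO_BINS.values())
    return ceiling_mhz / top_multiplier


# A 200MHz BCLK with turbo left on would push a single active core to 5.4GHz.
for cores in (4, 2, 1):
    print(f"{cores} core(s) active at 200MHz BCLK: {core_clock_mhz(200, cores):.0f} MHz")

# Capping the 1-core bin at 4.16GHz (the table's stable result) limits BCLK to ~154MHz.
print(f"Max BCLK for a 4.16GHz ceiling: {max_bclk_for(4160):.1f} MHz")
```

Turning turbo off removes the bins entirely, which is why the turbo-disabled runs could push BCLK to 200MHz with a lowered 21x multiplier, while the turbo-enabled runs had to stay near 154MHz.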

The benefit of leaving turbo enabled is that you get a more balanced system that's not always using more power than it needs to.



The Core i5 750



Our Core i7 860 sample wasn't that great of an overclocker



Breaking 4.2GHz with our Core i7 870

With all of these CPUs reaching roughly 4GHz, it's reasonable to say that they are good overclockers. But what about with no additional voltage and the retail heatsink?

Max Overclock, Turbo Disabled (No Additional Voltage)
Intel Core i7 870 (stock clock speed: 2.93GHz): 3.37GHz (22 x 153MHz)

Stock-voltage overclocks just plain suck on Lynnfield; you need added voltage to overclock the chip. With more voltage it behaves just like a Bloomfield or Phenom II, but at stock voltage Lynnfield just doesn't clock very high. And it has nothing to do with yields.

Overclocking Lynnfield at Stock Voltage: We're PCIe Limited

Remember the on-die PCIe controller? Yep. It's to blame.

Lynnfield is Intel's first attempt at an on-die PCIe controller and it actually works surprisingly well. There are no performance or compatibility issues.





The on-die PCIe controller needs more voltage as you overclock Lynnfield, limiting Lynnfield's stock-voltage overclocking potential.

Unfortunately the PCIe controller on Lynnfield is tied to the BCLK. Increase the BCLK to overclock your CPU and you're also increasing the PCIe controller frequency. This doesn't play well with most PCIe cards, so the first rule of thumb is to try and stay at 133MHz multiples when increasing your BCLK.
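To see why BCLK overclocking stresses the PCIe side of the chip, here is a sketch that takes the description above at face value and assumes the PCIe clock scales linearly with BCLK from a 100MHz reference at the stock 133MHz BCLK. That linear relationship is an assumption for illustration, not a documented detail of Lynnfield's clocking.

```python
# Sketch: how far the PCIe clock would drift if it scaled linearly with BCLK.
# ASSUMPTION: linear scaling from 100MHz at the stock 133.33MHz BCLK.

STOCK_BCLK_MHZ = 400 / 3   # 133.33MHz stock base clock
PCIE_REF_MHZ = 100.0       # PCIe reference clock at stock BCLK


def pcie_clock_mhz(bclk_mhz: float) -> float:
    """PCIe clock under the linear-scaling assumption."""
    return PCIE_REF_MHZ * bclk_mhz / STOCK_BCLK_MHZ


def pcie_overclock_pct(bclk_mhz: float) -> float:
    """Percentage above the PCIe reference clock."""
    return (pcie_clock_mhz(bclk_mhz) / PCIE_REF_MHZ - 1) * 100


for bclk in (STOCK_BCLK_MHZ, 154, 200):
    print(f"BCLK {bclk:6.1f} MHz -> PCIe {pcie_clock_mhz(bclk):6.1f} MHz "
          f"({pcie_overclock_pct(bclk):+.0f}%)")
```

Under this assumption, the 200MHz BCLK behind the 4.2GHz i7 870 result would run the PCIe interface 50% above its reference clock, which illustrates why the outward-facing PCIe transistors would need more voltage.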

The second issue is the bigger one. As you increase the BCLK, you increase the frequency of the transistors that communicate with the GPU(s) on the PCIe bus. Those transistors have to send data very far (relatively speaking) and very quickly; when you overclock, you're asking even more of them.

We know that Bloomfield can easily hit higher frequencies without increasing the core voltage, so there's no reason to assume that Lynnfield's core cannot (in fact, we know it can). The issue is the PCIe controller; at higher frequencies those "outside facing" transistors need more juice to operate. Unfortunately on Lynnfield rev 1 there doesn't appear to be a way to selectively give the PCIe transistors more voltage, instead you have to up the voltage to the entire processor.

Intel knows the solution to Lynnfield's voltage requirement for overclocking; unfortunately, it's not something that can be applied retroactively. Intel could decouple the PCIe controller from BCLK by introducing more PLLs into the chip or, alternatively, tweak the transistors used for the PCIe interface. Either way, we can expect this to change in some later revision of the processor. Whether that means we'll see it in the 45nm generation or have to wait until 32nm remains to be seen.

The good news is that Lynnfield can still overclock well. The bad news is that unlike Bloomfield (and Phenom II) you can't just leave the Vcore untouched to get serious increases in frequency.

Final Words

I'll start this conclusion with what AMD must do in response to Lynnfield. The Core i5 750 is a great processor at $196; in fact, it's the best quad-core CPU you can buy at that price today. In nearly every case it's faster than AMD's Phenom II X4 965 BE, despite the AMD processor costing almost $50 more. Granted, you can probably save some money on an integrated 785G motherboard, but if you're comparing ~$120 motherboards, the AMD CPU is simply overpriced.



Lynnfield (top) vs. Phenom II (bottom)

Luckily, the solution isn't that difficult. AMD needs to lower prices. The problem is that AMD has too many products below $200 already. The Phenom II X3 and X4 series both exist below $200 and rumor has it that AMD is also going to introduce a quad-core Athlon II somewhere down there. Lynnfield's arrival causes a lot of price compression on AMD's side. The most AMD should sell the 965 BE for is $199, but if it is to remain competitive the chip needs to be priced much lower. That doesn't leave much room for other AMD CPUs. On the bright side, this could force AMD to simplify its product lines again (similar to what it has quietly been doing already).

The Core i5 750 also finally ends the life of LGA-775. Just as with AMD, the Core 2 Quad Q9650 is easily destroyed by the Core i5 750, and at a lower price. With significantly lower motherboard costs than the LGA-1366 chips, the Core i5 750 can actually compete in the high-end LGA-775 space. It's only a matter of time before the sub-$200 LGA-775 parts are made obsolete as well.

Lynnfield's power consumption is just excellent; these are the most power-efficient quad-core CPUs we've ever tested. They use less power at idle than similarly clocked dual-core processors, and under load they deliver better performance per watt than any of their closest competitors. Later this year we'll see 32nm dual-core Westmere start shipping for notebooks. I don't have performance data, but I'd expect early next year to be the perfect time to buy a new notebook.

Can you tell that I like the Core i5 750? Again, at $196 you can't find a better processor. Intel did its homework very well and managed to deliver something that kept AMD in check without completely upsetting the balance of things. There's no technical reason Intel couldn't have enabled Hyper Threading on the Core i5; it's purely a competitive move. A Core i5 750 with HT would not only defeat the purpose of most of the i7s, but would also widen the performance gap with AMD. Intel doesn't need to maintain a huge performance advantage, just one that's good enough. While I'd love to have a 750 with HT, I'd still recommend one without it.

The Core i7 870 gets close enough to the Core i7 975 that I'm having a hard time justifying the LGA-1366 platform at all. As I see it, LGA-1366 has a few advantages:

1) High-end multi-GPU performance
2) Stock-voltage overclocking
3) Future support for 6-core Gulftown CPUs

If that list doesn't make you flinch, then Lynnfield is perfect. You'll save a bunch on a motherboard, and the CPUs start at $196 instead of $284. We didn't have enough time with our Core i7 860 to include performance results here, but my instincts tell me that, at $284, it'll be the Lynnfield sweet spot. You get excellent turbo modes and Hyper Threading without breaking $300.

Speaking of turbo, I'd say that Intel is definitely on to something here. The performance impact was small with Bloomfield, but turbo on Lynnfield is huge. My tests showed up to a 17% increase in performance depending on the workload, with most CPU-influenced scenarios seeing at least 9 or 10%. The turbo mode transitions happen fast enough to accelerate even simple actions like opening a new window. OS and application responsiveness is significantly improved as a result, and it's something that you can actually feel when using a Lynnfield machine. It all works so seamlessly that you always get the best performance you need. It's like Intel crammed the best single, dual and quad-core processors all into one package.

Perhaps that's what kept me from falling in love with Bloomfield right away. It was fast, but in the same way that its predecessors were fast. If you didn't have a well-threaded application, Bloomfield wasn't any better than a similarly clocked Penryn. Lynnfield's turbo modes change the game. Say goodbye to tradeoffs: the Core i5 and Core i7 are now fast regardless of thread count. It's speed that is useful, speed that you can feel, and it's what truly makes Lynnfield the best desktop microprocessor of 2009. It's not just faster; it's smarter, it's better. It's why today's title borrows from Daft Punk and not Star Wars: it's not more of the same, it's something futuristic and new.

Lynnfield shows us the beginning of how all microprocessors are going to be made in the future. Even AMD is embracing turbo; we'll see it with Fusion in 2011. Extend turbo to its logical conclusion and you end up with something very exciting. Imagine a processor made up of many different cores, large and small, CPU and GPU, each one turning on or off depending on the type of workload, and each running as fast as possible without dissipating more heat than your system can handle.

My only two complaints with Lynnfield are that the chips do require additional voltage (above stock) to overclock and of course the lack of Hyper Threading on the Core i5. It doesn't ruin the processor, but it gives us something to wish for.

Our work is never over.