So far, 2009 has been a year of renewed competition in desktop processors, as a resurgent AMD has rolled out portions of its 45nm CPU lineup with reasonable success. For its part, Intel has responded by dropping prices on its own Core 2 processors to remain competitive, while riding high on the Core i7’s undisputed performance leadership at the top of the market.

Tighter competition means better choices for consumers, but in technology, it almost always means more choices, as well. That principle was on ample display last week as AMD introduced several new products that essentially complete its transition to 45nm technology. Meanwhile, Intel has countered by quietly freshening up its low-end and mid-range offerings a bit and trumpeting the release of a new flagship CPU, the Core i7-975 Extreme.

With these moves, you have, uh, five.. no, four… wait, maybe it’s six different strata of desktop CPUs from which to choose. Not counting the low-power versions. We have new processors to test that fit into… uh, I think four of those categories.

Marketing is hard, folks. Especially when you’re on the receiving end of it.

Anyhow, there’s much ground to cover. Given the scope of the new releases, we decided to compare June’s crop of CPUs against, well, everything we could swing. The result is an enormous roundup of 26 different types of processors, including five new ones. We’ve poked, prodded, tested for performance, measured power efficiency, overclocked, and considered the value propositions. The only question now is whether I’ll pass out before I’m able to finish writing this thing. Should make for interesting times, no?

The new entries, from bottom to top

One of the least expensive chips on our agenda is the most novel. AMD has, at last, produced a native dual-core processor based on its latest technology, and the Athlon II X2 250 is the first incarnation of it. This chip features the same execution core and feature set as the Phenom II, but unlike other recent Athlon X2-branded products, it is not based on a gimpy quad-core chip. Instead, this is a true dual-core, 45nm processor with 1MB of L2 cache per core and no L3 cache at all. The chip itself is made up of “only” 234 million transistors and fits into a die area of 117.5 mm², well under half the size of a Phenom II by both measurements. Yet this is a true Socket AM3 processor, with support for dual channels of DDR3 memory, HyperTransport 3 speeds of up to 4 GT/s, and backward compatibility with Socket AM2+ motherboards and DDR2 memory.

One would, of course, expect this silicon to become a very popular choice for some high-profile missions beyond the desktop, including a range of laptops and sub-notebooks of various designations.

On the desktop, the Athlon II X2 250 looks like a decent “value dual-core” CPU option, with a core clock speed of 3GHz, a relatively tame 65W TDP rating, and a price tag of just $87. Officially, the Athlon II X2 250 supports DDR2 memory at up to 800MHz and DDR3 memory at up to 1066MHz, but it worked just fine for us with 1333MHz DDR3 memory. Since its new-generation CPU cores should achieve higher clock-for-clock performance than older Athlon processors, and since it starts with a relatively high clock frequency, the X2 250 could be a very nice value in this part of the market.

Gigabyte’s MA770T-UD3P

To help illustrate that point, Gigabyte sent us a motherboard it recommends pairing with one of AMD’s new dual-core processors, the MA770T-UD3P. Unfortunately, it arrived in Damage Labs too late for our use in testing, but this compact AMD 770 chipset-based Socket AM3 board offers pretty much everything you’d need for a decent system, along with DDR3 support. The price? 89 bucks, American money. In fact, some places online are selling it for even less.

Choosing the parts for the Econobox in our next system guide just got more interesting, I think.

Should a mobo price like that one free up a few dollars in your CPU budget, you might wish to step up to a slightly faster processor. The Phenom II X2 550 Black Edition is intended to fill that role. Pretty straightforwardly, this one really is a Phenom II with two cores disabled. Each remaining core has 512KB of L2 cache, and the two cores share a common 6MB L3 cache. The X2 550 runs at 3.1GHz, with 2GHz HyperTransport and north bridge/L3 cache speeds, and it has a TDP rating of 80W. As AMD’s top dual-core product, the X2 550 is also a Black Edition, which means its clock multiplier is unlocked to facilitate easier overclocking. The Phenom II X2 550 lists for $102, so for 15 bucks more than the Athlon II X2 250, you get the L3 cache, another hundred megahertz, and an unlocked multiplier.

The final piece of the 45nm puzzle for AMD is a pair of low-power Phenom II processors also introduced last week. The Phenom II X4 905e ticks away at 2.5GHz, has 6MB of L3 cache, is rated for a 65W TDP, and will set you back $195. If you’re willing to drop from four cores to three, the Phenom II X3 705e offers the same basic specs for just $125. We’ve not yet had a chance to test these power-efficient processors, which will face off against some low-power Core 2 Quads from Intel, but we hope to do so soon, so stay tuned.

The natural counter to the Athlon II X2 would come from Intel’s Pentium E series of value dual-core processors. As if in anticipation of the Athlon II, Intel very discreetly let the Pentium E6300 slip into the wild last month. This chip is based on the 45nm “Wolfdale” Core 2 Duo silicon, but with only 2MB of its L2 cache enabled. The rest of the vitals: 2.8GHz core clock, 1066MHz front-side bus, and a 65W TDP. The E6300 is priced just opposite the Athlon II X2 250 at $84. At that price, the E6300 should be a formidable competitor, despite the fact that its name tends to engender confusion with the (quite different) 65nm Core 2 Duo E6300.

The most direct competition for the Phenom II X2 550 would probably be the Core 2 Duo E7400, which is the same thing as the Pentium E6300, only with 3MB of L2 cache instead of 2MB. We unfortunately don’t have an E7400 on hand for testing, but it should be only slightly quicker than the Pentium E6300, which is surely the better value.

We do have a Core 2 Quad Q8400 on hand, though, another product Intel recently released without much fanfare. We have not been big proponents of past “value quad-core” processors, including the Core 2 Quad Q8200 and the Phenom II X4 810, because their modest clock speeds limit performance in lightly threaded applications, including games. The Q8400 aims to remedy this situation somewhat by bumping core clocks up to 2.66GHz. Otherwise, the Q8400 mirrors its siblings with a 1333MHz front-side bus, 4MB total L2 (2MB per chip), and a 95W TDP. The $183 Q8400’s closest competitor is probably the Phenom II X4 940, which lists for $195. The Q8400 steers well clear of the Core i7-920 at $284, so it’s positioned nicely as an affordable quad-core option.

Intel has also released a low-power version of this product, dubbed the Q8400S, with a 65W TDP rating for $245.

The CPU hogging all of the attention, though, is Intel’s new flagship, the Core i7-975 Extreme. This puppy steps directly into the role of “fastest desktop processor on the planet” courtesy of its quad-core Nehalem architecture and 3.33GHz core clock speed.

Well, clock speed is a tricky thing with a Core i7, thanks to its Turbo mode dynamic clock scaling. In reality, the Core i7-975 Extreme will spend much of its time above 3.33GHz, at up to 3.6GHz in single-threaded applications or 3.46GHz with multiple threads, depending on thermal headroom.
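For the curious, the arithmetic behind those Turbo figures can be sketched in a few lines. This is only an illustration: it assumes a 133.33MHz base clock and a default multiplier of 25 for the 975 Extreme, with Turbo adding one multiplier step under multithreaded load and two steps for a single thread. Those parameters are our assumptions, not something read back from the chip.

```python
# Assumed parameters (not read from hardware): 133.33 MHz base clock and a
# default multiplier of 25 for the Core i7-975 Extreme.
BASE_CLOCK_MHZ = 400 / 3
DEFAULT_MULTIPLIER = 25


def core_clock_ghz(multiplier: int) -> float:
    """Core clock in GHz for a given multiplier at the assumed base clock."""
    return multiplier * BASE_CLOCK_MHZ / 1000


# Turbo is modeled here as extra multiplier steps on top of the default.
for label, extra_steps in [
    ("stock", 0),
    ("Turbo, multiple threads", 1),
    ("Turbo, single thread", 2),
]:
    clock = core_clock_ghz(DEFAULT_MULTIPLIER + extra_steps)
    print(f"{label}: {clock:.2f} GHz")
```

The one-step case works out to roughly 3.467GHz, which is the figure quoted above as 3.46GHz.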

Like the Core i7-965 before it, the 975 has a QPI link speed of 6.4 GT/s.

The 975 Extreme is based on a new D-stepping of Nehalem silicon, which brings additional newness and possibly additional goodness in the form of higher clock speeds at lower voltages, if the rumors are true. Some extra headroom might be useful, since the 975 is an Extreme Edition with an unlocked upper multiplier. The Core i7-975 is indeed extreme, too, with a 130W TDP and a sticker price of $999. As you may know, AMD has nothing yet to compete with the Core i7-975 Extreme, although some interesting possibilities do suggest themselves, don’t you think?

If one dollar short of a grand is a little too rich for your blood, you might instead be interested in the Core i7-950, another part of June’s bumper CPU crop. With a 3GHz core clock and a 4.8 GT/s QPI link speed, the Core i7-950 essentially replaces the 2.93GHz Core i7-940. Both occupy the same $562 slot in Intel’s price sheet, which suggests the Core i7-940 isn’t long for this world. The i7-950 should be a minor step up in performance, but obviously not terribly different, which is why we didn’t bother testing this speed grade separately.

Test notes

Pictured above is a trio of DIMMs from OCZ that are intended for use with the Core i7-975 Extreme. These 2GB Blade series DDR3 modules are rated for operation at a blistering 2133MHz. Interestingly enough, the 975 Extreme presents the correct multipliers for both 1866MHz and 2133MHz memory via the BIOS of our Gigabyte EX58-UD3R motherboard, a TR Recommended mobo we’ve selected for our new Core i7 testbed. This board is one of the newer sub-$200 X58 motherboards that offers better overclocking options than the Intel board we’ve used previously.

We did test the Core i7-975 Extreme with the Blade modules, but we used, ahem, a more pedestrian memory speed of “only” 1600MHz in our main testing, though with relatively tight timings. We found that we had to use looser timings to achieve higher memory clocks, and the tradeoff wasn’t always worth it. We continue to tinker, though.

Speaking of which, in order to gauge the impact of memory type on performance and power use, we’ve tested the Phenom II X4 810 both with DDR2 memory on a Socket AM2+ board and with DDR3 memory on a Socket AM3 board. You’ll find the results in the following pages, labeled appropriately.

The Core 2 Quad Q8300 processor we used for testing came to us courtesy of the good folks at NCIX and NCIXUS. Thanks to them for making this comparison possible. We’ve underclocked our Q8300 to simulate a Q8200 for this review.

We’ve simulated several other speed grades via underclocking, too. Specifically, the Phenom II X4 920 is an underclocked 940, and the Core 2 Quad Q9550 is an underclocked Core 2 Extreme QX9650. We expect the performance of these “simulated” speed grades to be identical to the real things, but we sometimes exclude these processors from our power consumption testing because we do anticipate power use could vary slightly from the actual products.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Core 2 Quad Q6600 2.4 GHz; Core 2 Duo E8400 3.00 GHz; Core 2 Duo E8600 3.33 GHz; Core 2 Quad Q8200 2.33 GHz; Core 2 Quad Q9300 2.5 GHz; Core 2 Quad Q9400 2.66 GHz; Core 2 Quad Q9550 2.83 GHz; Core 2 Extreme QX9770 3.2 GHz; Dual Core 2 Extreme QX9775 3.2 GHz; Core i7-920 2.66 GHz; Core i7-940 2.93 GHz; Core i7-965 Extreme 3.2 GHz; Core i7-975 Extreme 3.33 GHz; Athlon 64 X2 6400+ 3.2 GHz; Phenom X3 8750 2.4 GHz; Phenom II X4 920 2.8 GHz; Phenom II X4 940 3.0 GHz; Phenom II X4 810 2.6 GHz; Phenom X4 9950 Black 2.6 GHz; Phenom II X3 720 2.8 GHz; Phenom II X4 810 2.6 GHz; Phenom II X4 955 3.2 GHz; Pentium E6300 2.8 GHz; Core 2 Quad Q8400 2.66 GHz; Athlon II X2 250 3.0 GHz; Phenom II X2 550 3.1 GHz

System bus: 1066 MT/s (266 MHz); 1333 MT/s (333 MHz); 1600 MT/s (400 MHz); 1600 MT/s (400 MHz); QPI 4.8 GT/s (2.4 GHz); QPI 6.4 GT/s (3.2 GHz); QPI 6.4 GT/s (3.2 GHz); HT 2.0 GT/s (1.0 GHz); HT 3.6 GT/s (1.8 GHz); HT 3.6 GT/s (1.8 GHz); HT 4.0 GT/s (2.0 GHz); HT 4.0 GT/s (2.0 GHz); HT 4.0 GT/s (2.0 GHz)

Motherboard: Asus P5E3 Premium | Asus P5E3 Premium | Asus P5E3 Premium | Intel D5400XS | Intel DX58SO | Intel DX58SO | Gigabyte EX58-UD3R | Asus M3A79-T Deluxe | Asus M3A79-T Deluxe | MSI DKA790GX Platinum | Asus M4A79T Deluxe

BIOS revision: 0605; 0605; 0605; XS54010J.86A.1149.2008.0825.2339; SOX5810J.86A.2260.2008.0918.1758; SOX5810J.86A.2260.2008.0918.1758; F5; 0403; 0403; 11/25/08; 0703; 1.6 (1/21/09); 0902; 0802; 0802; 0089 (5/15/09); 1103

North bridge: X48 Express MCH | X48 Express MCH | X48 Express MCH | 5400 MCH | X58 IOH | X58 IOH | X58 IOH | 790FX | 790FX | 790GX | 790FX

South bridge: ICH9R | ICH9R | ICH9R | 6321ESB ICH | ICH10R | ICH10R | ICH10R | SB750 | SB750 | SB750 | SB750

Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032 | INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032 | INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61

Memory size: 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 6GB (3 DIMMs) | 6GB (3 DIMMs) | 6GB (3 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs)

Memory type: Corsair TW3X4G1800C8DF DDR3 SDRAM | Corsair TW3X4G1800C8DF DDR3 SDRAM | Corsair TW3X4G1800C8DF DDR3 SDRAM | Micron ECC DDR2-800 FB-DIMM | Corsair TR3X6G1600C8D DDR3 SDRAM | Corsair TR3X6G1600C8D DDR3 SDRAM | OCZ OCZ3B2133LV2G DDR3 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TW3X4G1600C9DHXNV DDR3 SDRAM

Memory speed (effective): 1066 MHz | 1333 MHz | 1600 MHz | 800 MHz | 1066 MHz | 1600 MHz | 1600 MHz | 800 MHz | 1066 MHz | 1066 MHz | 1333 MHz

CAS latency (CL): 7 | 8 | 8 | 5 | 7 | 8 | 8 | 4 | 5 | 5 | 8

RAS to CAS delay (tRCD): 7 | 8 | 8 | 5 | 7 | 8 | 8 | 4 | 5 | 5 | 8

RAS precharge (tRP): 7 | 8 | 8 | 5 | 7 | 8 | 8 | 4 | 5 | 5 | 8

Cycle time (tRAS): 20 | 20 | 24 | 18 | 20 | 24 | 24 | 12 | 15 | 15 | 20

Command rate: 2T | 2T | 2T | 2T | 2T | 1T | 1T | 2T | 2T | 2T | 2T

Audio: Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated 6321ESB/STAC9274D5 with SigmaTel 6.10.5713.7 drivers | Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers | Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers | Integrated ICH10R/ALC888 with Realtek 6.0.1.5704 drivers | Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers | Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers | Integrated SB750/ALC888 with Realtek 6.0.1.5704 drivers | Integrated SB750/ALC1200 with Realtek 6.0.1.5704 drivers

Hard drive: WD Caviar SE16 320GB SATA

Graphics: Radeon HD 4870 512MB PCIe with Catalyst 8.55.4-081009a-070794E-ATI drivers

OS: Windows Vista Ultimate x64 Edition

OS updates: Service Pack 1, DirectX redist update August 2008

Thanks to Corsair and OCZ for providing us with memory for our testing.

Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket system was powered by a PC Power & Cooling Turbo-Cool 1KW-SR power supply. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

This test gives us a visual representation of the cache and memory subsystems of these CPUs. I’ve limited the results to the five new processors we’re testing, plus one more for comparison. As you can see, the higher-end CPUs tend to have larger, faster caches, as one would expect. The performance of the L2 and L3 caches of the Core i7-975 is truly remarkable. Meanwhile, the only real difference between the Core 2 Quad Q8400 and Q9400 is illustrated at the 4MB block size, where the Q9400’s larger L2 caches come into play.

Since it’s difficult to see the results once we get into main memory, let’s take a closer look at the 256MB block:

The Core 2 processors’ throughput is generally limited by their front-side bus speeds, while the AMD processors and the Core i7s have no such bottleneck.
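To put rough numbers on that bottleneck, here is a back-of-the-envelope sketch of peak theoretical bandwidth. It assumes a 64-bit (8-byte) data path for both the front-side bus and each DDR3 channel, and it ignores protocol overhead entirely, so treat the results as upper bounds rather than anything we measured.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bytes=8, channels=1):
    """Peak theoretical bandwidth in GB/s (decimal), ignoring all overhead."""
    return mega_transfers_per_sec * 1e6 * bus_width_bytes * channels / 1e9


# A 1333 MT/s front-side bus, assumed to be 64 bits wide.
fsb_1333 = peak_bandwidth_gbs(1333)

# Two channels of DDR3-1333 sitting behind that bus, also assumed 64 bits each.
ddr3_1333_dual = peak_bandwidth_gbs(1333, channels=2)

print(f"1333 MT/s FSB:          {fsb_1333:.1f} GB/s")
print(f"Dual-channel DDR3-1333: {ddr3_1333_dual:.1f} GB/s")
```

Under those assumptions, the memory can supply about twice what the bus can carry, which is why faster DIMMs do little for a Core 2. The Phenom II and Core i7 talk to memory through integrated controllers, so the DIMMs themselves set the ceiling.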

The front-side bus also adds a latency penalty to memory accesses, which is why the Core 2 processors and the Pentium E6300 cluster near the bottom of these results. Interestingly, with no L3 cache onboard, the Athlon II X2 250 is a little quicker getting out to memory than the Phenom IIs. Again, though, the Core i7 is just an absolute monster, with the fastest memory subsystem by far in every way we’ve measured.

Crysis Warhead

We measured Warhead performance using the FRAPS frame-rate recording tool and playing over the same 60-second section of the game five times on each processor. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
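Here is a minimal sketch of that bookkeeping, for anyone who wants to apply the same approach to their own FRAPS logs. The frame rates in it are made-up placeholders, not results from our testing, and averaging the per-session averages with a simple mean is an assumption on our part for illustration.

```python
from statistics import mean, median


def summarize_sessions(sessions):
    """Summarize a list of (average_fps, low_fps) pairs, one per session.

    Returns the mean of the session averages and the median of the session
    lows. Taking the median keeps one bad run from dragging the low figure.
    """
    averages = [avg for avg, _ in sessions]
    lows = [low for _, low in sessions]
    return mean(averages), median(lows)


# Hypothetical numbers for five 60-second sessions. Note the outlier low of
# 22 FPS in the third run.
sessions = [(61.2, 38), (59.8, 41), (60.5, 22), (62.0, 40), (60.1, 39)]

avg_fps, low_fps = summarize_sessions(sessions)
print(f"average: {avg_fps:.1f} FPS, median low: {low_fps} FPS")
```

In this made-up sample, the single hitch to 22 FPS doesn't affect the reported low at all, which is the whole point of using the median.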

We tested at relatively modest graphics settings, 1024×768 resolution with the game’s “Mainstream” quality settings, because we didn’t want our graphics card to be the performance-limiting factor. This is, after all, a CPU test.

Right out of the gate, Warhead demonstrates that one need not buy a fancy quad-core processor in order to play even the latest games. In fact, you’re better off with a high-frequency dual-core than you are with a slower quad, as testified by the fact that the Core 2 Duo E8600 outperforms the Core i7-940 here. Similarly, the Phenom II X2 550 essentially ties the Phenom II X4 955.

Warhead does appear to be sensitive to cache sizes, though, judging by the fact that the Athlon II X2 250 falls well behind the Phenom II X2 550. In cases like this one, paying a little more for the extra cache helps. But then that’s a pretty large difference in total effective cache size: 2304KB for the Athlon II X2 250 versus 7424KB for the Phenom II X2 550. The difference between the Core 2 Quad Q8400 and Q9400 is smaller (4096KB vs. 6144KB, respectively, of total effective cache) and doesn’t appear to cross any major dividing lines: only two frames per second separate their averages.
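Those “total effective cache” figures can be reproduced with a little arithmetic. The sketch below rests on two assumptions that are not spelled out above: each AMD core carries 128KB of L1 cache (64KB instruction plus 64KB data), and the AMD hierarchy is treated as exclusive, so every level adds capacity, while on the Core 2 chips only the L2 is counted.

```python
def effective_cache_kb(l1_total_kb, l2_total_kb, l3_total_kb, exclusive):
    """Total effective cache in KB.

    With an exclusive hierarchy, every level holds distinct data, so the
    sizes add up. Otherwise the L1 contents are assumed to be duplicated in
    the larger levels and are left out of the total.
    """
    total = l2_total_kb + l3_total_kb
    if exclusive:
        total += l1_total_kb
    return total


# AMD parts: two cores with an assumed 128KB of L1 each, exclusive caches.
athlon_ii_x2_250 = effective_cache_kb(2 * 128, 2 * 1024, 0, exclusive=True)
phenom_ii_x2_550 = effective_cache_kb(2 * 128, 2 * 512, 6 * 1024, exclusive=True)

# Intel parts: four cores with an assumed 64KB of L1 each, which is not
# counted because the hierarchy is treated as non-exclusive.
core_2_quad_q8400 = effective_cache_kb(4 * 64, 4 * 1024, 0, exclusive=False)
core_2_quad_q9400 = effective_cache_kb(4 * 64, 6 * 1024, 0, exclusive=False)

print(athlon_ii_x2_250, phenom_ii_x2_550, core_2_quad_q8400, core_2_quad_q9400)
```

Under those assumptions, the four results match the figures quoted above.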

Keeping score: Phenom II X4 940 over Q8400, the Pentium E6300 and Athlon II X2 250 essentially tie, and the Core i7-975 Extreme obliterates everything.

Far Cry 2

After playing around with Far Cry 2, I decided to test it a little bit differently by recording frame rates during the jeep ride sequence at the very beginning of the game. I found that frame rates during this sequence were generally similar to those when running around elsewhere in the game, and after all, playing Far Cry 2 involves quite a bit of driving around. Since this sequence was repeatable, I just captured results from three 90-second sessions.

Again, I didn’t want the graphics card to be our primary performance constraint, so although I tested at fairly high visual quality levels, I used a relatively low 1024×768 display resolution and DirectX 9.

The minimum frame rates here are a little higher than in Warhead, generally speaking, which means nearly any of these processors should play this game pretty smoothly. Even the Pentium E6300’s minimum frame rate is 30 FPS.

Unreal Tournament 3

As you saw on the preceding page, I did manage to find a couple of CPU-limited games to use in testing. I decided to try to concoct another interesting scenario by setting up a 24-player CTF game on UT3’s epic Facing Worlds map, in which I was the only human player. The rest? Bots controlled by the CPU. I racked up frags like mad while capturing five 60-second gameplay sessions for each processor.

Oh, and the screen resolution was set to 1280×1024 for testing, with UT3’s default quality options and “framerate smoothing” disabled.

Although the frame rates involved here clearly show that even the slowest processor is very much up to this task, this is one of the rare situations in current games where we can exploit more than two cores to gain additional performance. As a result, the dual-core CPUs congregate in the bottom third of the chart, along with the rather troubled original Phenoms.

Half Life 2: Episode Two

Our next test is a good, old custom-recorded in-game timedemo, precisely repeatable.

This is a very strong showing from the Phenom II X2 550, nearly in league with the Core 2 Duo E8400 and, amazingly enough, faster than the Core i7-940.

Source engine particle simulation

Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.

This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

There will be no upstart dual-cores taking the lead here. This one is about core counts, instructions per clock, and clock speed. The Q8400 nearly ties the Q9400 in this test, and I believe this is the first time we’ve seen the Q8400 finish ahead of the Phenom II X4 940. Need I even say that the Core i7-975 Extreme is at the top of the heap?

WorldBench

WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.

So how in the world did the Core i7-975 Extreme finish behind the 965? Simple: for whatever reason, our new Core i7 test rig didn’t play well with the Nero 7 Ultra test. The application would hang on each attempt through the test. As a result, we couldn’t include results for Nero in the WorldBench composite score for the 975 Extreme, which held it back. The Nero test has long been a sticking point in the WorldBench suite, although it usually runs successfully for us after multiple tries. Hopefully we can overcome this issue next time around.

Beyond that, the general strength of the Intel processors is apparent here, as the blue clusters in the top half of the results and the green in the bottom half. The Pentium E6300 outdoes the Phenom II X2 550 by a single point, and the Q8400 scores another win over the Phenom II X4 940.

Productivity and general use software

MS Office productivity

Firefox web browsing

This very common PC activity emphasizes fast dual-cores over slower quads in fairly dramatic fashion. Hence the very strong showing for the Phenom II X2 550, with its dual cores, large cache, and high clock speeds, and the weak placement of the Core 2 Quad Q8400, with its quad cores, smaller caches, and relatively modest frequencies. I understand this test is also fairly sensitive to memory access latencies, which helps explain why the Athlon II X2 250 so dramatically outruns the Pentium E6300.

Multitasking – Firefox and Windows Media Encoder

WinZip file compression

The new dual-core AMD processors’ relatively strong performance here is a bit puzzling, but it seems this test just doesn’t play well with AMD’s quad-core processors. Ye olde Athlon 64 X2 6400+ outperforms all of the AMD quad-core CPUs, as well.

Nero CD authoring

These results are pretty clearly stratified by disk controller type.

Image processing

Photoshop

AMD processors haven’t performed well in WorldBench’s Photoshop test, and as we’ve noted before, some of the blame apparently lies at the feet of AMD’s disk controller. Between this test and Nero, AMD comes by its poor rankings in WorldBench rather honestly. One suspects, though, that with a better disk controller, the Phenoms and Athlons would put in a better showing.

The Panorama Factory photo stitching

The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.

In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. So this time around, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.

The Intel CPUs come out on top here in every category: E6300 over both new dual-core AMDs, Q8400 over Phenom II X4 940, and Core i7-975 Extreme uber alles.

picCOLOR image analysis

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Many of the individual functions that make up the test are multithreaded.

This final image manipulation test is much closer than the others above. The Intel and AMD products in the same basic categories are pretty evenly matched.

Media encoding and editing

x264 HD benchmark

This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.

If you want to encode video quickly, a cheap dual-core processor definitely isn’t the way to go. You’re vastly better off with a Core 2 Quad Q8400 or its rival, the Phenom II X4 940, which is a little quicker in both tests. Of course, with price as no object, the best option is the Core i7-975 Extreme, or perhaps more than one CPU. If you want to see something really cool, have a look at the dual-socket Nehalem systems reaching into the 50 to 60 FPS range in the more broadly multithreaded second pass of this process.

Windows Media Encoder x64 Edition video encoding

Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Among the value dual-cores and the mid-range quad cores, the AMD processors prove to be a little faster here.

Windows Media Encoder video encoding

Roxio VideoWave Movie Creator

I’m not a big fan of WorldBench’s video encoding tests, which aren’t as multithreaded as real video encoding apps tend to be these days. I’ve included them for the sake of completeness.

LAME MT audio encoding

LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
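The structure described above is a classic two-stage pipeline, and a toy version is easy to sketch. To be clear, this is not LAME MT’s code: the analyze and encode functions are hypothetical stand-ins, and the only point is to show one thread running ahead and buffering its results for a second thread.

```python
import queue
import threading


def analyze(frame):
    """Hypothetical stand-in for the psycho-acoustic analysis of one frame."""
    return sum(frame) / len(frame)


def encode(frame, analysis):
    """Hypothetical stand-in for the rest of the encoder."""
    return (len(frame), analysis)


def pipelined_encode(frames):
    # A small bounded buffer lets the analysis stage run slightly ahead of
    # the encoding stage without racing far in front of it.
    buffered = queue.Queue(maxsize=1)

    def analysis_stage():
        for frame in frames:
            buffered.put((frame, analyze(frame)))
        buffered.put(None)  # sentinel: no more frames

    worker = threading.Thread(target=analysis_stage)
    worker.start()

    encoded = []
    while True:
        item = buffered.get()
        if item is None:
            break
        frame, analysis = item
        encoded.append(encode(frame, analysis))

    worker.join()
    return encoded


print(pipelined_encode([[1, 2, 3], [4, 5, 6]]))
```

With only two stages, there are only ever two threads with work to do, which is why extra cores beyond the second go unused. (In Python specifically, the interpreter lock means this sketch shows the structure rather than any real speedup.)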

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.

The performance differences between the CPUs here are pretty minor. This is, though, one more example where fewer cores and higher clock speeds win out.

3D modeling and rendering

Cinebench rendering

Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores (or threads, in CPUs with multiple hardware threads per core) are available.

The Core i7’s dominance here is staggering. I won’t belabor the point, but do have a look at the single-threaded results, where multiple cores and Hyper-Threading are no help at all. The Core i7-975 Extreme still leads the field by a considerable margin, in part because its Turbo mode mechanism allows it to run at 3.6GHz while rendering with a single thread. We’ve been talking some about the value trade-off between fewer, faster cores and more, slower cores in the context of the other processors. Thanks to Turbo mode, the Core i7 takes the edge off of that trade-off.

POV-Ray rendering

We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support. Some of the beta 64-bit executables have been quite a bit slower than the 3.6 release, but this should give us a decent look at comparative performance, regardless.

3ds max modeling and rendering

Valve VRAD map compilation

This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into games like Half-Life 2.

Some trends emerge from our remaining 3D rendering tests. Among them: of course, having more cores is good. But also look at cache size. The Q8400 and Q9400 shadow one another, with very little daylight between them, and the Phenom II X2 550 barely stays ahead of the Athlon II X2 250, with the only difference likely due to its 100MHz clock speed advantage.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
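For those who like to see the arithmetic, here is a minimal Python sketch of the estimate as described above: average the points per day across the WU types, then multiply by the core count. The function name and the sample figures are made up for illustration, and the third and fourth WU type labels are placeholders. This is not notfred's actual code.

```python
from statistics import mean


def estimated_points_per_day(ppd_by_wu_type, core_count):
    """Average the per-WU-type points-per-day figures, then scale by core count."""
    return mean(ppd_by_wu_type.values()) * core_count


# Hypothetical single-core results for four WU types (not measured data).
sample_results = {
    "Tinker": 100.0,
    "Amber": 120.0,
    "wu_type_3": 300.0,  # placeholder label
    "wu_type_4": 280.0,  # placeholder label
}

print(estimated_points_per_day(sample_results, core_count=2))
```

Note that the multiplication assumes every core performs like the ones measured, which is part of what makes the method a little quirky.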

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

This one splits interestingly among product categories. At the low end, the Pentium E6300 outperforms the two value duallies from AMD. In the mid-range, the Phenom II X4 940 ends up just ahead of the Q8400, but it’s pretty much a tie for practical purposes. And after a little turmoil due to low scores in each individual WU type, since the CPU is coping with two threads per core, the Core i7-975 Extreme only barely establishes its supremacy over its predecessor, the 965 Extreme.

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.

In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database. MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
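The job-queue scheme David describes can be sketched in a few lines of Python. To be clear, this is only an illustration of the general technique under my own assumptions, not MyriMatch's source code: the function names are invented, the "proteins" are just integers, and the per-protein work is a stand-in callback.

```python
import math
import queue
import threading


def split_into_jobs(items, thread_count, jobs_per_thread=10):
    """Cut the database into roughly thread_count * jobs_per_thread jobs."""
    job_count = thread_count * jobs_per_thread
    job_size = math.ceil(len(items) / job_count)
    return [items[i:i + job_size] for i in range(0, len(items), job_size)]


def run_jobs(jobs, thread_count, handle_protein):
    """Each thread pulls a whole job from the queue until none remain."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no jobs left, so this thread is done
            for protein in job:
                handle_protein(protein)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# Stand-in database: 6714 integers in place of real protein sequences.
proteins = list(range(6714))
jobs = split_into_jobs(proteins, thread_count=4)
print(len(jobs), len(jobs[0]))  # 40 jobs, 168 proteins in the first one

handled = []
run_jobs(jobs, thread_count=4, handle_protein=handled.append)
print(len(handled))
```

Handing out work in job-sized chunks means threads only touch the shared queue 40 times in total rather than once per protein, which is the synchronization saving the description points to.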

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.

Here’s how the processors performed.

Make of these rather gratuitously complex results what you will. I’ll just note that the Core i7-975 Extreme finishes in under half the time it takes the Core 2 Quad Q8400 to finish.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45°. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.
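If a score in Hertz seems abstract, this small Python sketch may help. It treats the score as solver cycles completed per second and then asks how well that score scales with thread count. The function names and all of the numbers are hypothetical. I'm only illustrating the arithmetic, not how the Euler3D benchmark is implemented.

```python
def cfd_score_hz(cycles_completed, elapsed_seconds):
    """Cycle frequency: solver cycles finished per second of wall time."""
    return cycles_completed / elapsed_seconds


def scaling_efficiency(score_one_thread, score_n_threads, thread_count):
    """1.0 means perfect scaling; lower values mean threads are being held back."""
    return score_n_threads / (score_one_thread * thread_count)


# Hypothetical timings: the same 10 cycles with one thread and with four.
one_thread = cfd_score_hz(10, 20.0)
four_threads = cfd_score_hz(10, 8.0)

print(one_thread, four_threads)
print(scaling_efficiency(one_thread, four_threads, thread_count=4))
```

An efficiency well below 1.0 in a test like this would be consistent with the memory bandwidth limits Charles mentions, though other factors could produce the same result.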

Again, the Core i7-975 Extreme is over twice as fast as the mid-range quad-cores here. This is the sort of application for which the Nehalem architecture was intended.

Power consumption and efficiency

Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system: the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.

All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile.

I’ve whittled down these results to just the new processors being tested. You can see the results for the other processors in prior reviews.

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

One question pops out immediately as we look at these results: Why does the Core i7-975 Extreme draw so much more power at idle than the 965 Extreme? Chalk it up to our new motherboard. The Gigabyte EX58-UD5 simply draws more power than the Intel DX58SO that we used with the other Core i7 processors. Gigabyte has a purported solution for this problem in the form of its Dynamic Energy Saver Advanced utility, which is supposed to reduce power consumption. I spent some time trying various versions of this utility, including the latest from Gigabyte’s website, on the EX58-UD5, and none of them worked; they were somehow incompatible with the BIOS revision I was using (the latest publicly available). Frustrating. I had expected Gigabyte to have this issue sorted by now.

Beyond that one issue, the rest of the CPUs tested look well within expectations. Notice that with its two disabled cores still present, the Phenom II X2 550 consumes as much power at idle as any AMD quad-core processor. Still, the Phenom II’s idle power draw is respectably low.

Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.
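The windowed average is simple enough to sketch in Python. I'm assuming here that the meter's log boils down to (seconds, watts) pairs sampled once per second. The function name and the sample log are invented for illustration and are not our measured data.

```python
def average_power(samples, start_s, end_s):
    """Mean of the watt readings with timestamps in [start_s, end_s)."""
    window = [watts for seconds, watts in samples if start_s <= seconds < end_s]
    return sum(window) / len(window)


# Hypothetical log: 90 W at idle, 150 W while rendering from 5 s to 40 s.
log = [(t, 150.0 if 5 <= t < 40 else 90.0) for t in range(60)]

print(average_power(log, 15, 25))  # ten readings taken mid-render
```

Averaging over a ten-second window rather than picking the single highest reading keeps one momentary spike from defining a system's peak figure.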

Intel has the edge here in both the value dual-cores and the mid-range quads. The 975 Extreme again draws more power than the 965, probably largely due to the motherboard.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

We can quantify efficiency even better by considering specifically the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
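Here is a rough Python sketch of that energy calculation, using trapezoidal integration of the logged power readings over a chosen time span. The (seconds, watts) log format, the function name, and both sample systems are assumptions made for the sake of the example, not our test data.

```python
def energy_joules(samples, start_s, end_s):
    """Integrate power (W) over time (s) between start_s and end_s, inclusive."""
    window = [(t, w) for t, w in samples if start_s <= t <= end_s]
    total = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        total += (w0 + w1) / 2.0 * (t1 - t0)  # trapezoid between two readings
    return total


# Hypothetical systems, logged once per second for a minute.
# "fast" draws 200 W but finishes the render in 20 s;
# "slow" draws 150 W but needs 40 s. Both idle at 100 W afterward.
fast = [(t, 200.0 if t <= 20 else 100.0) for t in range(61)]
slow = [(t, 150.0 if t <= 40 else 100.0) for t in range(61)]

print(energy_joules(fast, 0, 20), energy_joules(slow, 0, 40))  # render only
print(energy_joules(fast, 0, 60), energy_joules(slow, 0, 60))  # whole span
```

In this made-up case, the system with the higher peak draw uses less energy for the render itself (4000 joules versus 6000) because it finishes in half the time.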

These final results should be no surprise to anyone who has been paying attention. In multithreaded applications, multi-core processors are much more efficient. That’s one reason server processors have been racing toward six and eight cores per socket. Even with its motherboard handicap, the Core i7-975 Extreme places among the top of the pack, because it spent so little time at peak utilization rendering the scene.

The rest of the results are close, yet Intel has a clear edge. The Q8400 proves more efficient than the Phenom II X4 940, and the Pentium E6300 requires less energy to render the scene than the two X2s.

Overclocking

Yep, I overclocked all five of these processors. Took a while, but eh. Just know that these overclocking results are of the quick-and-dirty variety. I didn’t test stability for hours on end, and I didn’t resort to heroic measures in an attempt to squeeze a few extra megahertz out of these CPUs. Instead, I took reasonable steps with common clock and voltage tweaks to reach the best stable speed I could, with air cooling used in all cases. I used smaller stock AMD and Intel coolers for the cheaper processors, but I pulled out the big dawg from Thermalright, a disturbingly large air cooler, for the Core i7-975 Extreme. Let’s take it CPU by CPU:

Athlon II X2 250: I started overclocking this chip by shooting for 3.6GHz, and that’s where I wound up in the end: at 3.6GHz on a 240MHz base HT clock, with the CPU voltage at 1.375V and the HT multiplier dialed back so the effective HyperTransport speed was 1.92GHz. Attempts to go higher were no use, even up to 1.4125V.

Phenom II X2 550 Black Edition: With an unlocked multiplier, this one was definitely easier. My final destination was 3.7GHz at 1.4125V. The system never could boot into Windows at 3.8GHz.

Core 2 Quad Q8400: I started off here by aiming for a logical stopping point: at 3.2GHz on a 1600MHz front-side bus, where all system clocks are in harmony once again. That proved possible on the first attempt with CPU voltage set to “auto” in the Asus BIOS, and I tested performance at that speed. With some tweaking, this baby then reached 3.68GHz on a 460MHz base FSB clock (1840MHz effective). For that, the CPU voltage was at 1.4125V, RAM was at 1226MHz, and FSB voltage was at 1.4V. Unfortunately, getting to this lofty goal involved some unexpected reboots, one of which corrupted my Steam install and prevented me from getting Half-Life 2: Episode Two scores for the Q8400 at 3.68GHz. Re-imaging and running them would have delayed this article another day.

Pentium E6300: The first logical stopping point for the E6300 was at 3.5GHz on a 1333MHz FSB, which it handled perfectly on the first try, with the CPU voltage set to “auto.” Unfortunately, the corruption of my Steam install dampened my enthusiasm for seeing exactly how high I could take the E6300. Perhaps in a future article, I’ll test its limits further.

Core i7-975 Extreme: With its unlocked multiplier, the 975 Extreme was another easy overclock, and it was another one where my initial attempted speed (4GHz, in this case) proved to be the highest one it achieved, at 1.3875V. The system would boot into Windows at 4.1GHz, but it would reboot during a Prime95 stability test, even at 1.4125V.
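For anyone wanting to check the clock math on the multiplier-locked chips, here’s a quick Python sketch. The multipliers below are not quoted anywhere in my notes above. I’ve inferred them by dividing the final speeds by the base clocks (3.6GHz / 240MHz = 15, 3.68GHz / 460MHz = 8, 1.92GHz / 240MHz = 8), so treat them as assumptions, and the function names are mine.

```python
def core_clock_mhz(base_clock_mhz, multiplier):
    """CPU core speed is the base clock times the CPU multiplier."""
    return base_clock_mhz * multiplier


def effective_fsb_mhz(base_clock_mhz):
    """The Core 2's front-side bus is quad-pumped: four transfers per clock."""
    return base_clock_mhz * 4


# Athlon II X2 250: 240 MHz base clock, inferred 15x CPU multiplier.
print(core_clock_mhz(240, 15))
# Same chip's HyperTransport link with an inferred 8x HT multiplier.
print(core_clock_mhz(240, 8))
# Core 2 Quad Q8400: 460 MHz base FSB clock, inferred 8x CPU multiplier.
print(core_clock_mhz(460, 8))
print(effective_fsb_mhz(460))
```

With a locked multiplier, raising the base clock drags the other buses up with it, which is why the HT multiplier had to come down on the Athlon II and why RAM and FSB settings came into play on the Q8400.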

There you have it. Here’s a quick performance test at our overclocked speeds, along with a few overclocked results from other recent processor reviews.

Some nice performance gains there in each case. Obviously, overclocking one of the value processors is a great thing, since it will put you in league with some of the fastest desktop CPUs around. Witness the Phenom II X2 550 Black Edition at 3.7GHz. Still, as nice as that is, the Core i7-975 Extreme at 4GHz is just incredibly impressive.

The value proposition

We’ve taken a long and meandering route through several truckloads of performance data, and in order to help you make sense of it all, we have ripped a page from our last CPU value article.

To create a synthetic “overall performance” score, we computed an unweighted average of the results for a subset of our tests consisting of the benchmarks used in the CPU value article. Our formula includes 22 different benchmarks, but since our aim is practicality, it excludes a few more esoteric ones like the scientific computing applications. As our baseline, the Athlon X2 6400+ gets a 100% score. Other scores are all relative to it.
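Here’s a minimal Python sketch of how an index like this could be computed. Each result is expressed as a percentage of the baseline CPU’s result, and the percentages are averaged with equal weight. One assumption is mine alone: for tests measured in time, where lower is better, I’ve inverted the ratio so that a higher percentage always means faster. The test names and numbers are hypothetical.

```python
def relative_score(result, baseline, higher_is_better=True):
    """Express one result as a percentage of the baseline CPU's result."""
    ratio = result / baseline if higher_is_better else baseline / result
    return ratio * 100.0


def overall_performance(results, baseline_results, higher_is_better):
    """Unweighted average of the per-test relative scores."""
    scores = [
        relative_score(results[name], baseline_results[name], higher_is_better[name])
        for name in baseline_results
    ]
    return sum(scores) / len(scores)


# Hypothetical two-test suite (the real formula uses 22 benchmarks).
higher_is_better = {"game_fps": True, "render_seconds": False}
baseline_cpu = {"game_fps": 50.0, "render_seconds": 120.0}
faster_cpu = {"game_fps": 75.0, "render_seconds": 60.0}

print(overall_performance(baseline_cpu, baseline_cpu, higher_is_better))
print(overall_performance(faster_cpu, baseline_cpu, higher_is_better))
```

Because every test counts equally, a CPU that doubles the baseline in one benchmark moves the index as much as one that gains a few percent across many, which is one reason to take the overall score with a grain of salt.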

Of course, what you see below is a crazy experiment and probably meaningless, but some folks may find it a worthwhile thought exercise, at least. These scatter plots show price versus performance in a fairly intuitive way. To oversimplify slightly, the best CPU values tend to be located closer to the top and left edges of the plot.

As one might expect, some of the new CPUs we’re reviewing today come out looking good in this analysis. The totality of our benchmarks is somewhat biased toward multi-core processors, so the Core 2 Quad Q8400 shows up in a nice spot on this plot, as does its rival, the Phenom II X4 940, whose overall performance is slightly higher. The Pentium E6300 appears to have a clear edge over the Athlon II X2 250, but the Phenom II X2 550 is also a strong value with a higher performance rating.

Of course, the 975 Extreme is no great value, but it is progress over the like-priced Core i7-965, which it replaces.

Now, here’s another crack at the same issue with total system cost taken into account. To get our pricing numbers for the X axis, we’ve added the cost of a motherboard, memory kit, graphics card, and hard drive to that of our processors. Wherever it made sense, we picked components from our latest system guide. Also, we got all our prices from Newegg. Here’s a complete breakdown:

| Component | Intel LGA775 platform | AMD Socket AM2+ platform | Intel Core i7 platform |
| --- | --- | --- | --- |
| Motherboard | Gigabyte GA-EP45-UD3P, $135 | Gigabyte GA-MA790X-UD4P, $110 | Gigabyte GA-EX58-UD3R, $200 |
| Memory | 4GB Kingston DDR2-800, $51 | 4GB Kingston DDR2-800, $51 | 6GB Corsair DDR3-1600, $104 |
| Graphics card | Sapphire Radeon HD 4870 512MB, $165 | Sapphire Radeon HD 4870 512MB, $165 | Sapphire Radeon HD 4870 512MB, $165 |
| Hard drive | Western Digital Caviar Black 640GB, $75 | Western Digital Caviar Black 640GB, $75 | Western Digital Caviar Black 640GB, $75 |
| Total | $426 | $401 | $544 |
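To make the X-axis math explicit, here’s a short Python sketch that totals each platform from the component prices in the breakdown above and then adds a processor price on top. The dictionary layout and function names are my own, and the CPU price in the example is a made-up figure rather than a quote from Newegg.

```python
# Component prices in dollars, taken from the platform breakdown above.
PLATFORMS = {
    "Intel LGA775": {"motherboard": 135, "memory": 51, "graphics": 165, "hard_drive": 75},
    "AMD Socket AM2+": {"motherboard": 110, "memory": 51, "graphics": 165, "hard_drive": 75},
    "Intel Core i7": {"motherboard": 200, "memory": 104, "graphics": 165, "hard_drive": 75},
}


def platform_cost(platform):
    """Sum of the non-CPU components for one platform."""
    return sum(PLATFORMS[platform].values())


def system_cost(platform, cpu_price):
    """Total system cost: platform components plus the processor."""
    return platform_cost(platform) + cpu_price


for name in PLATFORMS:
    print(name, platform_cost(name))

# Hypothetical $100 processor dropped into the AM2+ platform.
print(system_cost("AMD Socket AM2+", cpu_price=100))
```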

Notice that we are making some assumptions here that may not be entirely valid. For instance, we’ve priced the Socket AM3 Phenom II processors on a Socket AM2+ motherboard with DDR2 memory, though we tested most of them with DDR3 memory. As you may have noticed, memory type didn’t make much difference at all to the performance of the Phenom II X4 810, and we expect the story will be similar for the rest. In the same vein, we priced the Core 2 processors with DDR2 memory, though we tested them with DDR3. Our goal in selecting these components was to settle on a standard platform for each CPU type with a decent price-performance ratio, not to exactly replicate our sometimes-exotic test systems.

Thanks to a lower overall cost for the Socket AM2+ platform, the AMD processors separate themselves from the Intels in this plot. Rather dramatically, a line of green dots runs down the left edge of the performance results between 100% and 175%. Assuming this difference in motherboard prices holds up as typical in the market, the Athlon II and Phenom II chips at various price points look to be the better deals, generally. Among the strongest values from Intel are the Core 2 Quad Q8400 and the Core i7-920, which is in a class by itself.

For what it’s worth, the Core i7-950 would presumably sit atop the i7-940 just like the i7-975 does above the i7-965 at the same price point, as a slightly better value.