Intel has agreed to settle a class action lawsuit that claims the company “manipulated” benchmark scores in the early 2000s to make its new Pentium 4 chip seem faster than AMD’s Athlon. Intel will pay affected consumers $15 if they purchased a Pentium 4 system between November 20, 2000 and June 30, 2002. Affected systems include every system with a Pentium 4 CPU purchased between November 20, 2000 and December 31, 2001, plus any system with a first-gen Willamette P4 or a P4 clocked below 2GHz purchased between January and June 2002. The exception is Illinois: if you live in Illinois and bought a P4, too bad for you.

Don’t worry about digging up a receipt for the purchase. The only thing you’re required to do is list the model number of the system you bought, and you qualify for the $15 reimbursement. You are required to verify under penalty of perjury that you belong to the stated class, but that’s the extent of the hassle. Intel will also make a $4 million donation to an education fund as part of its settlement.

Did benchmark manipulations impact AMD’s relative performance?

Short answer? Yes.

Longer answer: Yes, and we can prove it.

Let’s look at two cases. First, there’s Sysmark. AMD CPUs were extremely competitive in Sysmark 2000, but fell far behind the Pentium 3 and Pentium 4 in Sysmark 2001’s Internet Content Creation tests. An investigation turned up the reason why: instead of simply checking whether a CPU supported SSE, Windows Media Encoder 7 checked for the “GenuineIntel” vendor ID string. Since AMD chips don’t report that string, the program refused to use SSE on AMD’s processors.
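To make the difference concrete, here is a minimal Python sketch of the two ways a program can decide whether to use SSE. This is not Windows Media Encoder's actual code. The `Cpu` class, the function names, the `"scalar"` fallback label, and the two example chips are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Hypothetical stand-in for what a program learns from CPUID."""
    vendor: str            # vendor ID string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset    # instruction-set extensions the chip reports


def pick_path_by_vendor(cpu: Cpu) -> str:
    """Vendor-gated dispatch: SSE is used only if the vendor string matches."""
    if cpu.vendor == "GenuineIntel" and "sse" in cpu.features:
        return "sse"
    return "scalar"


def pick_path_by_feature(cpu: Cpu) -> str:
    """Feature-gated dispatch: SSE is used on any chip that reports it."""
    return "sse" if "sse" in cpu.features else "scalar"


# Two made-up chips that both report SSE support.
intel_chip = Cpu("GenuineIntel", frozenset({"mmx", "sse"}))
amd_chip = Cpu("AuthenticAMD", frozenset({"mmx", "sse"}))

for chip in (intel_chip, amd_chip):
    print(chip.vendor, pick_path_by_vendor(chip), pick_path_by_feature(chip))
```

Under the vendor-gated check, the AMD chip lands on the slow path even though it reports the same SSE support as the Intel chip. Under the feature-gated check, both take the fast path.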

At the time this was treated as an unusual one-off, not a systemic campaign to damage AMD’s performance in system benchmarks. In fact, this was an early example of Intel’s “Cripple AMD” compiler function in action. It didn’t matter if AMD chips actually supported SIMD instructions; programs compiled with Intel’s compiler would refuse to use those instructions on AMD processors. (Sysmark 2002 was redesigned to blatantly favor and promote the P4, but that’s another story altogether.)

More troubling is the issue of POV-Ray 3.6.0, which prides itself on being open-source. While this program was released somewhat later, it dropped simultaneously with Prescott’s launch (2004). When I tested it ten years ago, I found its performance to be extremely odd — the included benchmark ran slower on both AMD and Intel Northwood hardware compared to POV-Ray 3.5, but Prescott was much quicker.

A tech journalist scorned…

I wrote about this and declared it an example of benchmark shenanigans. In response, POV-Ray declared that I was lying. In an open letter, POV-Ray wrote “Our source code is openly available. In fact if you had cared to you could have downloaded both the v3.5 and v3.6 source code from our FTP site and compared them for any such tweaks – something that you did not, it appears, do.”

The funny thing is, I did do that — but the programmer friend who helped me with Intel’s compiler could never reproduce the results in POV-Ray 3.6.0, despite compiling six different executables with different optimization levels in an attempt to do so.

Fast forward almost a decade. A few months ago, I decided to play with a Perl script that can strip the “Cripple AMD” functions out of executables compiled by Intel compilers. I tested it on the copy of POV-Ray 3.6.0 I’ve kept on hand ever since. Please note that I tested using modern hardware and under Windows 7, not a 2004-era system. Not only did the script detect and strip out the “Cripple AMD” function, the impact on performance was rather dramatic. (Note: POV-Ray 3.5 was not compiled with an Intel compiler; POV-Ray 3.6.0 was.)
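For readers curious what the detection half of such a tool might involve, here is a deliberately naive Python sketch. It is not the Perl script I used and it patches nothing. It only reports where the literal vendor ID strings appear in a blob of bytes. The function name and the sample bytes are made up for illustration, and a dispatcher that compares the vendor ID piecewise as register values, rather than storing the whole string, would slip past a scan like this.

```python
# Vendor ID strings a CPU dispatcher might compare against.
VENDOR_STRINGS = (b"GenuineIntel", b"AuthenticAMD")


def find_vendor_strings(data: bytes) -> dict:
    """Return {vendor string: [byte offsets]} for each literal match in data."""
    hits = {}
    for needle in VENDOR_STRINGS:
        offsets = []
        pos = data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(needle, pos + 1)
        hits[needle.decode("ascii")] = offsets
    return hits


# Made-up bytes standing in for an executable; a real run would instead use
# something like: data = open("some_benchmark.exe", "rb").read()
sample = b"\x90\x90GenuineIntel\x00\xc3" + b"\x00" * 8 + b"GenuineIntel"

print(find_vendor_strings(sample))
```

A hit only says that the string is present somewhere in the file. It says nothing about whether the check gates a faster code path, which is why the before-and-after benchmark runs matter more than the detection itself.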

I want to stress that this doesn’t mean AMD’s performance was crippled by 50% in the original test — but it’s clear that contrary to what the POV-Ray team was saying then, the 3.6.0 version of the test was compiled in a manner that tilted the competitive landscape towards Intel. But surely that’s a one-time thing, right? An artifact from ten years past?

Well, no. Not exactly. I also took Sysmark 2012 out for a spin, and applied the same script to strip out the Cripple AMD function from both the benchmark and its satellite applications.

Even allowing for run variance, the gaps in some tests are far too wide. To be fair, this doesn’t necessarily point to Intel cheating, because all of these applications are the work of third parties. The Sysmark 2012.exe still showed performance differences after I patched it (according to Bapco, the Sysmark 2012.exe is compiled with Microsoft Visual Studio, not ICC, but the same strings were still detected and adjusted). Not all the performance improvement comes from that change, however, which illustrates how complicated the performance-measuring field can be. It’s difficult to declare a benchmark “neutral” when the application it runs is compiled in a manner that benefits one vendor over another.

The principal reason no one makes a big deal about these gaps anymore is that the difference between Intel and AMD has simply grown too wide. An 8-12% systemic improvement for Intel may make AMD look worse than it otherwise would, but AMD’s performance in Sysmark 2012 can lag Intel by as much as 50%, and that’s not something that compiler patches can fix.
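A back-of-the-envelope calculation shows why. The scores below are hypothetical round numbers, not measured results, and the model is one assumed reading of the figures above: Intel's score is treated as inflated by the stated percentage, and that inflation is then backed out.

```python
def adjusted_deficit(amd_score: float, intel_score: float, intel_bias: float) -> float:
    """AMD's deficit versus Intel after backing out an assumed Intel-side bias.

    intel_bias is the assumed fractional inflation of Intel's score
    (0.08 for 8%), so the neutral Intel score is intel_score / (1 + intel_bias).
    """
    neutral_intel = intel_score / (1 + intel_bias)
    return 1 - amd_score / neutral_intel


# Hypothetical worst case: AMD scores half of what Intel does.
amd, intel = 50.0, 100.0

for bias in (0.08, 0.12):
    deficit = adjusted_deficit(amd, intel, bias)
    print(f"bias {bias:.0%}: AMD still trails by {deficit:.0%}")
```

Under these assumptions, removing the bias narrows a 50% deficit only to somewhere in the mid-40s, which is still a rout.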

Not every benchmark compiled with Intel’s compiler shows evidence of this kind of shift — I tested every Cinebench test going back to 2000 on the A10-7850K, and while several executables were compiled with Intel’s compiler, none of them show any signs of performance difference when patched. But it’s interesting to see how compiler choices continue to influence the performance of supposedly neutral benchmarks.