Ten years ago this week, AMD launched the first consumer 64-bit x86 processors, the Athlon 64 FX-51 and Athlon 64 3200+. It remains one of the most exciting launches I have ever covered, and the technologies the Athlon 64 debuted have continued to shape computing to the present day. The two chips arrived after a delay of nearly a year, at a time when Intel was hammering its smaller rival with repeated clock-speed increases on the Pentium 4. This was a sharp reversal from 1999 to 2001, when the K7-based Athlon more than held its own against Intel’s product line, thanks partly to Intel’s repeated missteps.

After reeling for several years from the Itanium problems, the i820-Rambus debacle, the loss of massive chipset business to VIA, the failed launch (and recall) of the 1.13GHz Pentium III, and the P4’s slow, subpar debut on 180nm technology, Intel came back with a vengeance when the 130nm Northwood P4 arrived in early 2002. Over the next 18 months, Intel drove the Pentium 4’s clock speed from 2GHz to 3.2GHz, an increase of 60%. AMD, meanwhile, was scrambling to keep up. The company’s 180nm Athlon XP family had compared extremely well against late-model Pentium III and early Pentium 4 chips, but as the P4’s clock rose, the Athlon fell behind. AMD couldn’t match Intel’s aggressive clock-speed ramp, the K7 lacked SSE2 support, and the aging core was confined to single-channel RAM, compared with the P4’s dual-channel design.

AMD fought back with K7 refreshes such as Thoroughbred (the first 130nm K7) and Barton (a K7 with 512KB of L2 cache, up from 256KB), but the gulf between the two architectures was simply too wide. Then K8 launched, and everything changed.

The aptly named Hammer

K8’s execution units were actually identical to K7’s, but AMD reorganized the fetch and decode units to deliver more instructions per clock cycle (IPC). Branch prediction improved significantly in K8, and the chip gained SSE2 support. The Athlon 64’s biggest advantage, however, was its integrated memory controller. Before K8, every AMD and Intel system used a separate memory controller chip on the motherboard. That chip ran at the same clock speed as the front side bus: 200-400MHz (effective) for AMD systems and 400-800MHz (effective) for P4 systems. Moving the memory controller onto the CPU meant it ran at the processor’s core clock of 2000-2200MHz instead of at 400MHz. With the controller on the die, access latency also dropped sharply, because memory requests no longer had to travel across the front side bus to a separate chip and back.

The impact on K8’s performance was enormous. Memory latencies dropped from 100-120ns to 50-60ns. The FX-51 and the K7-based Barton 3200+ both ran at 2.2GHz, yet the new AMD chip was anywhere from 15-50% faster than its predecessor. Intel countered with a last-minute launch of the Pentium 4 “Extreme Edition,” a rebranded Gallatin-class Xeon with 2MB of L3 cache. The unexpected new chip kept Intel from being toppled outright in the benchmarks, but with the Athlon 64 3200+ at around $400 and the FX-51 at $733, AMD suddenly had momentum on its side.

I’ve gone back and pulled some benchmark data from the original review to show how significant the jump was. Keep in mind that when the 3.2GHz P4 EE did eventually ship, it cost nearly $200 more than the already pricey FX-51. The top purple bar is the P4 Extreme Edition, the black bar is the Athlon 64 FX-51, and the blue bar at the bottom is the Barton 3200+.
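To put those latency figures in perspective, it helps to convert them into clock cycles: at a clock of f GHz, the core ticks f times per nanosecond, so a memory access lasting t nanoseconds spans roughly t × f cycles. The back-of-the-envelope sketch below assumes the 2.2GHz clock shared by the FX-51 and the Barton 3200+ and simply multiplies through the quoted ranges. It ignores caches, prefetching, and out-of-order execution, which hide part of that cost in practice, and the function name `ns_to_cycles` is purely illustrative.

```python
# Back-of-the-envelope conversion of memory latency (ns) into CPU clock cycles.
# Assumption: a 2.2GHz core clock (FX-51 / Barton 3200+) and the latency ranges
# quoted in the text; real latency varies with memory timings and access patterns.


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """At clock_ghz GHz the core completes clock_ghz cycles every nanosecond."""
    return latency_ns * clock_ghz


CLOCK_GHZ = 2.2

# Latency ranges (low, high) in nanoseconds, as quoted in the article.
LATENCY_RANGES_NS = {
    "off-chip memory controller": (100, 120),
    "on-die memory controller (K8)": (50, 60),
}

if __name__ == "__main__":
    for label, (low_ns, high_ns) in LATENCY_RANGES_NS.items():
        low = ns_to_cycles(low_ns, CLOCK_GHZ)
        high = ns_to_cycles(high_ns, CLOCK_GHZ)
        print(f"{label}: {low:.0f}-{high:.0f} cycles per main-memory access")
```

Run as written, it reports about 220-264 cycles per access for the older off-chip design and 110-132 cycles for K8: each trip to DRAM cost the Athlon 64 roughly half as many clock cycles.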

There were still tests the P4 won thanks to Hyper-Threading; it retained an edge in workstation 3D rendering, for example. But AMD had surged ahead in many single-threaded workloads and had shown it could close the gap with Intel in a number of other areas.
