AMD’s Bulldozer is finally here, after years of development, and its performance is significantly worse than anyone expected. The situation is ugly enough that it may explain why so many executives left AMD over the past twelve months, and why the company was so tight-lipped about those departures. Bulldozer’s general performance has been widely covered; our goal here is to examine why the CPU performs the way it does rather than to benchmark it across a wide range of real-world scenarios.

Note: AMD’s Turbo Core and Intel’s Turbo Mode were disabled on all chips, in order to prevent them from adjusting the CPU’s clock speed and throwing off results. As a consequence, the results here will be lower than in a standard review, particularly for single-thread performance.

The first thing to understand about Bulldozer is that it borrows aspects of simultaneous multi-threading to combine the functions of what would normally be two discrete cores into a single unit (AMD refers to this combination as a “module”). Each module contains what Windows identifies as two cores, but sharing instruction scheduling and execution resources between them affects multi-threaded scaling when compared to the same programs running on “traditional” multi-core processors.

When AMD designed Bulldozer, it was aiming for a CPU that would scale more easily to higher frequencies while maintaining the same IPC (instructions per clock) as its six-core predecessor. To hit higher clock speeds, AMD lengthened the CPU’s pipeline and increased latencies throughout the architecture. Designing chips for high frequency has had a bad rap since the disastrous Prescott Pentium 4, and after seeing Bulldozer’s overall performance, it’s hard to argue that AMD’s gamble paid off. As things stand, the FX-8150 struggles to surpass Thuban in a number of tests, and its IPC has clearly regressed.

Before we dig into the CPU’s architecture, however, there’s an OS factor to discuss. According to AMD, Windows 7 doesn’t understand Bulldozer’s resource allocation very well. Windows 7 “sees” eight independent CPU cores, despite the fact that each module shares scheduling and execution resources. Sometimes it makes the most sense to spin threads off to idle modules before scheduling them on modules already busy with other work; other times, it’s best to place two related threads on the same module. Windows 8 will reportedly be much better at scheduling workloads where it makes the most sense to execute them.
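The two placement strategies can be sketched as lists of logical CPUs. The snippet below is a minimal illustration, assuming (our assumption, not AMD’s documentation) that logical CPUs 2k and 2k+1 share a module:

```python
# Sketch: choosing logical CPUs for n threads on a 4-module/8-core part.
# Assumes logical CPUs 2k and 2k+1 share a module (hypothetical enumeration).

MODULES = 4  # four Bulldozer modules, two cores each

def spread_across_modules(n_threads):
    """Place each thread on its own module (one core per module)."""
    return [2 * (i % MODULES) for i in range(n_threads)]

def pack_into_modules(n_threads):
    """Fill both cores of a module before moving to the next one."""
    return [i % (2 * MODULES) for i in range(n_threads)]

# Four threads: spreading uses one core in each module, leaving every
# module's second core idle; packing shares two modules between them.
print(spread_across_modules(4))  # [0, 2, 4, 6]
print(pack_into_modules(4))      # [0, 1, 2, 3]
```

On Linux, such a list could be applied with `os.sched_setaffinity()`; Windows exposes the same idea through the Win32 `SetThreadAffinityMask` call, which is what a module-aware scheduler effectively does on the application’s behalf.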

This issue has a practical impact on the CPU’s performance because of the way AMD’s Turbo Core is implemented. The new flavor of Turbo Core can raise the maximum clock speed by up to two speed grades when only four cores are active. Since Windows 7 doesn’t know which cores it should leave idle, however, the CPU is less likely to boost its clock as high as it otherwise would. “Turbo” speeds were originally introduced by Intel as a way to squeeze more performance out of lightly-threaded or single-threaded workloads, but Bulldozer’s architecture makes those extra megahertz particularly important.
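As a rough model of that policy: the clock grades below follow the FX-8150’s published 3.6/3.9/4.2GHz bins, but the decision rule itself is our simplification for illustration, not AMD’s actual power-management logic:

```python
# Simplified model of Turbo Core bin selection on an FX-8150-class part.
# The 3.6/3.9/4.2 GHz grades match AMD's published FX-8150 specs; the
# active-core threshold is an illustrative simplification.

BASE_GHZ = 3.6
TURBO_GHZ = 3.9      # one grade up, available with all cores loaded
MAX_TURBO_GHZ = 4.2  # two grades up, only when half the cores are idle

def turbo_clock(active_cores, total_cores=8):
    """Return the boost clock available for a given number of busy cores."""
    if active_cores <= total_cores // 2:
        return MAX_TURBO_GHZ  # half or fewer cores busy: full two-grade boost
    return TURBO_GHZ          # otherwise only the all-core boost applies

# If the scheduler spreads four threads across all eight "cores" instead
# of leaving half of them idle, the chip loses the top bin.
print(turbo_clock(4))  # 4.2
print(turbo_clock(8))  # 3.9
```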

We checked the impact of Windows 7’s scheduler by measuring CPU performance in Maxwell Render 1.7 and Cinebench 11.5. Both programs allow the user to define a specific number of threads (four, in our case). The 4M/8C label means that all eight cores are active, 4M/4C means that all four modules are active, with one core operating per module, and 2M/4C denotes a dual-module/quad-core configuration. Both of these tests show a 4M/4C arrangement outperforming a 4M/8C system by roughly eight percent when four threads are used. This suggests that scheduler inefficiencies could indeed be hurting Bulldozer’s general performance in workloads that can’t take advantage of all eight cores.
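The three labels translate into concrete logical-CPU sets. The mapping below is a sketch that again assumes CPUs 2k and 2k+1 share a module (a hypothetical enumeration):

```python
# Hypothetical mapping of the test labels to logical-CPU sets, assuming
# logical CPUs 2k and 2k+1 share a Bulldozer module.

CONFIGS = {
    "4M/8C": set(range(8)),  # all four modules, both cores of each
    "4M/4C": {0, 2, 4, 6},   # one core active per module
    "2M/4C": {0, 1, 2, 3},   # both cores of two modules
}

def modules_used(cpus):
    """Count the distinct modules touched by a set of logical CPUs."""
    return len({cpu // 2 for cpu in cpus})

for label, cpus in sorted(CONFIGS.items()):
    print(label, "->", sorted(cpus), "modules:", modules_used(cpus))
```

Note that 4M/4C and 2M/4C run the same four threads; the only difference is whether each thread gets a module’s resources to itself, which is exactly what the ~8 percent gap isolates.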