
AMD’s Kaveri is, in many respects, a huge step forward. The new APU’s low-power performance is excellent, its integrated graphics torpedo anything Intel offers at an equivalent price point, and it includes support for features like Mantle, HSA, and TrueAudio. Yet, despite these lauded capabilities, there’s a clear problem sitting in the middle of Kaveri like a turd in the proverbial punchbowl: Steamroller.

Despite significant improvement in the low-power segment, Steamroller remains fundamentally incapable of matching Intel clock-for-clock. HSA might one day help address the problem, but it’ll be years before HSA-compatible software is readily available.

It’s time to take a page from Intel’s book and dump the core. The good news is, much in the same way that Intel’s Pentium M core would eventually replace NetBurst, AMD already has a core that’s capable of stepping into Steamroller’s shoes — it just needs to be fine-tuned for the role.

How killing the Pentium 4 saved Intel

The Pentium M (codename: Banias) was created because Intel recognized that the Pentium 4 wasn’t going to be capable of addressing the mobile market very effectively. The Pentium M design team took Intel’s older Pentium 3 core (Tualatin) and optimized it for high efficiency and low power.

Banias used the P4’s quad-pumped front-side bus, added support for SSE2, and inherited the sophisticated branch prediction unit that the P4 relied on to keep its 20-stage pipeline fed. Over the next few years, as it became increasingly clear that the P4 had run out of gas, Intel cross-pollinated between the two architectures. Efficiency-boosting technologies like SpeedStep and the Pentium M’s indirect branch predictor were ported to the P4 as well. In the long run, it was the Pentium M that gave Intel a path to the Core 2 Duo and Nehalem architectures — not the broken, fundamentally flawed Pentium 4.

Could Kabini’s Jaguar core do something similar? Let’s find out.

Calculating relative efficiency between Kabini, Kaveri, and Richland

The simplest way to compare the efficiency of these chips is to divide each one’s benchmark score in a given application by (CPU Frequency * Core Count). This normalizes for both variables and gives us a measure of intrinsic core performance. The next step was to express Kabini’s normalized figure as a percentage of its big-core rival’s. In a test like Cinebench, where higher scores are better, a result below 100% indicates that Kabini is less efficient than its big-core rival, while a result above 100% means Kabini is more efficient.
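For readers who want to reproduce the arithmetic, the normalization described above can be sketched in a few lines of Python. The function names and the sample figures are illustrative assumptions, not the numbers behind our charts, and the sketch assumes a test where a higher score is better.

```python
def per_clock_per_core(score, freq_ghz, cores):
    """Benchmark score divided by (CPU frequency * core count)."""
    if freq_ghz <= 0 or cores <= 0:
        raise ValueError("frequency and core count must be positive")
    return score / (freq_ghz * cores)


def relative_efficiency(small_core, big_core):
    """Small-core normalized score as a percentage of the big-core one.

    Each argument is a (score, freq_ghz, cores) tuple. Below 100 means
    the small core does less work per clock per core; above 100 means
    it does more.
    """
    return 100.0 * per_clock_per_core(*small_core) / per_clock_per_core(*big_core)


# Hypothetical placeholder values, NOT measured results:
# (benchmark score, clock in GHz, core count)
small_chip = (90, 2.0, 4)
big_chip = (200, 4.0, 4)

print(f"{relative_efficiency(small_chip, big_chip):.1f}%")
```

For a test where lower is better, such as a render time, the ratio would need to be inverted before the same reading of the percentage applies.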

Our test data was drawn from both our own tests and test results published at other major industry sites. The second set of efficiency figures is based on results in 18 synthetic and real-world tests, while the first set compares only real-world results (10 in total). Even if we omit the synthetic tests where Kabini does quite well, the core is still extremely competitive with AMD’s “big core” architecture, with an efficiency gap of less than 10%. More importantly, there’s low-hanging fruit that would close that distance. Turning the L2 cache back up to full speed would help close the performance gap between the two, as would more aggressive branch prediction.
