At long last, AMD has launched the second of its so-called Fusion "APUs," where APU stands for "accelerated processing unit" and refers to a single chip that hosts both a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU). Anandtech is first out of the gate with benchmarks for AMD's Llano testbed notebook, and the results show that the new chip is a win for AMD in two departments.

Llano's battery life is excellent, besting nearly all comers in terms of efficiency—only AMD's "Brazos" platform, with its simple Bobcat core, beats Llano in this department. This is the first time in a long time that AMD has been competitive with Intel in mobile power draw.

Llano's other big win is in graphics, about which we'll talk more in a moment. But before jumping into that discussion, we should briefly mention where Llano lags behind: the CPU.

The 32nm "Stars" cores that form the CPU side of Llano are a straightforward shrink of AMD's existing and venerable Phenom architecture, with a few minor updates. As such, these cores are significantly underpowered compared to Intel's Sandy Bridge cores, and the benchmarks show it. On CPU-bound workloads, Llano gets a sound drubbing from Intel's Sandy Bridge. So the CPU side of Llano is its Achilles heel, a fact that will keep Llano confined to the role of a budget alternative to Sandy Bridge, and not a competitor.

Surprises from Llano's GPU

The GPU is where Llano gets interesting. In both its mobile and desktop incarnations, Llano's GPU is a DX11-class, 400-shader GPU that takes up almost half of the processor die. Given that this GPU is a real, general-purpose graphics coprocessor, this makes Llano equal parts GPU and CPU. (Compare to Intel's Sandy Bridge, where the GPU is quite a bit smaller than the CPU region.)

The fact that Llano's GPU is so beefy is both a blessing and a curse. Unlike Sandy Bridge's relatively anemic GPU (which gets a huge performance boost from moving onto the processor die), Llano's GPU has so much horsepower that it's severely memory-bottlenecked in the on-die configuration. If the desktop Llano's 6550D GPU were put on a discrete graphics card with its own pool of fast GDDR, it would actually out-perform the integrated configuration. This is not something that I expected or predicted, but it makes sense.

To see what I mean, check out Anand's benchmarks of Llano in a desktop configuration. When used with an overclocked memory bus (DDR-1866), Llano leaps up in the rankings and lands squarely in budget discrete GPU territory. Clearly Llano's GPU is massively memory-bound, and CPU and GPU together are suffering from a lack of memory bandwidth in the APU socket. This fact has a few interesting implications.

First, it means that AMD can boost Llano's performance significantly in future versions simply by adding another dual-channel DDR3 controller and introducing a new socket. Alternatively, AMD could push for more widespread use of higher-clocked DDR3.
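To put rough numbers on the bandwidth argument, here's a back-of-the-envelope sketch of theoretical peak memory bandwidth for a few configurations. It is illustrative only: the DDR3-1333 baseline, the 64-bit channel width, and the function name are my assumptions rather than figures from Anand's review, and real-world throughput will come in below these peaks.

```python
# Rough sketch: theoretical peak DRAM bandwidth for a few memory setups.
# Assumptions (not from the benchmarks): a DDR3-1333 baseline and standard
# 64-bit DDR3 channels. The quad-channel row is purely hypothetical.

def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_width_bits=64):
    """Return theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9

configs = {
    "DDR3-1333, dual-channel (assumed baseline)": (1333, 2),
    "DDR3-1866, dual-channel (overclocked bus)": (1866, 2),
    "DDR3-1333, quad-channel (hypothetical second controller)": (1333, 4),
}

for name, (rate, channels) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

Under these assumptions, the jump from DDR3-1333 to DDR3-1866 raises peak bandwidth by about 40 percent, while a second dual-channel controller would double it. That's the kind of headroom a bandwidth-starved GPU can put to use.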

By far the most interesting implication of the Llano results, however, is that this is a chip that would benefit massively from the addition of some kind of IBM-style pool of on-die DRAM (i.e., PS2-style scratchpad RAM). Right now, Llano's CPU and GPU are connected via an internal bus, which is great as far as it goes. But if Llano had a large enough pool of on-die memory on the same bus, it would put it into a whole new performance league. (See below.)

Ultimately, Llano as it currently exists is in fine shape in the notebook segment, where it handily beats Intel's Sandy Bridge in almost every gaming benchmark (the exceptions are one or two CPU-bound titles). Despite the memory bottleneck, the GPU in Llano really delivers the gaming goods, and on a price/performance basis it sets a new standard for budget mobile gaming. Llano is so capable that it's able to compete with midrange discrete mobile graphics solutions, a fact that reinforces just how much trouble NVIDIA is in with this particular market.

In Anand's desktop preview, the desktop version of Llano puts up a solid showing against Intel's Sandy Bridge. Again, it easily bests its opponent in all but a few CPU-bound benchmarks.

All told, Llano is a solid entry into the mobile market that will make a worthy budget alternative to Intel's mobile Sandy Bridge. On the desktop, Llano looks to be similarly positioned, but given that testing at most sites is still ongoing, we'll have to reserve judgment until the official launch.

Postscript: possible futures for Fusion

The Llano + eDRAM idea mentioned above is obviously not going to happen with Llano; the current design is transitional, and will live out its life as a budget part. But who's to say that AMD won't do something like this in a future APU? AMD has made it clear that Llano is just a bridge design, halfway between the integrated graphics of the previous generation and the type of true heterogeneous multiprocessing that will characterize future Fusion efforts.

AMD hasn't really let on what the ultimate Fusion part will look like architecturally, but the endgame for Fusion is probably a giant pool of shader cores, a small number of CPU cores, and a big enough pool of shared, on-die memory that the CPU and GPU can cut way down on the amount of memory traffic that goes off-die.

This pool of shared memory could be a wired-down section of L3 cache (IBM does this with its game console chips), or it could be a separate pool of "scratchpad memory" that the CPU and GPU have access to (Sony and Toshiba did this with the PS2's Emotion Engine). From the perspective of boosting graphics performance, the latter seems preferable to me, but my knowledge in this area is spotty, so more informed readers should feel free to correct me in the comments.

About four years ago, when IBM first began making waves with its eDRAM efforts, the idea that Big Blue's eDRAM cells might show up on an AMD processor was commonly floated. IBM and AMD have collaborated in the past on fab technology (prior to the Globalfoundries spinoff), so the idea is by no means out of the question.

IBM's POWER7 chip, a derivative of which will probably show up under the hood of the Wii U, sports a giant 32MB pool of on-die eDRAM in the form of an L3 cache. Even a fraction of that amount of memory added to Llano could cut back on a lot of off-die memory traffic and give a major leap in graphics performance.