Engineers at North Carolina State University have used a novel technique to boost the performance of an AMD Fusion APU by more than 20%. This speed-up was achieved purely through software and using commercial (probably Llano) silicon. No overclocking was used.

An AMD APU puts a CPU and a GPU on the same piece of silicon. In conventional applications (in a Llano-powered laptop, for example) the CPU and GPU hardly talk to each other; the CPU does its thing, and the GPU pushes polygons. What the researchers have done is to marry the CPU and GPU together to take advantage of each processor’s strengths.

To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode unit, and the GPU becomes the primary computation unit. This works out well because CPUs are generally very strong at fetching data from memory, and GPUs are essentially just monstrous floating point units. In practice, this means the CPU is focused on working out what data the GPU needs (pre-fetching), the GPU’s pipes stay full, and a 20% performance boost arises.
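To make the division of labor concrete, here is a toy Python model of the idea: a "CPU" role runs a few steps ahead and pulls data into a shared cache, while a "GPU" role consumes it. This is only a sketch under our own assumptions, not the researchers' code. The `SharedCache` class, the `run` function, the cache size, and the cycle costs are all hypothetical, and the prefetching is assumed to cost the GPU nothing because it happens in parallel on the CPU.

```python
from collections import OrderedDict


class SharedCache:
    """Tiny LRU cache standing in for a shared cache (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Access addr; return True on a hit, False on a miss (then fill)."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)  # mark as most recently used
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return hit


def run(addresses, prefetch_distance, capacity=64, hit_cost=10, miss_cost=100):
    """Return the simulated cycles the 'GPU' spends waiting on memory.

    The cycle costs are made-up numbers chosen only for illustration.
    """
    cache = SharedCache(capacity)
    cycles = 0
    for i, addr in enumerate(addresses):
        # "CPU" role: run ahead and pull future data into the shared cache.
        if prefetch_distance > 0:
            ahead = i + prefetch_distance
            if ahead < len(addresses):
                cache.touch(addresses[ahead])
        # "GPU" role: consume the data needed for this step.
        cycles += hit_cost if cache.touch(addr) else miss_cost
    return cycles


# A scattered access pattern in which no address repeats.
addresses = [(i * 7919) % 10007 for i in range(1000)]

baseline = run(addresses, prefetch_distance=0)  # GPU fetches its own data
assisted = run(addresses, prefetch_distance=8)  # CPU prefetches 8 steps ahead
print(baseline, assisted)
```

The size of the gap between the two numbers is an artifact of the invented costs and should not be read as a reproduction of the 20% figure. The point is only the mechanism: when something else has already fetched the data, the compute unit spends far less time stalled.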

Now, unfortunately we don’t have the exact details of how the North Carolina researchers achieved this speed-up. We know it’s in software, but that’s about it. The team probably wrote a very specific piece of code (or a compiler) that uses the AMD APU in this way. The press release doesn’t say “Windows ran 20% faster” or “Crysis 2 ran 20% faster,” which suggests we’re probably looking at a synthetic, hand-coded benchmark. We will know more when the team presents its research on February 27 at the International Symposium on High Performance Computer Architecture.

For what it’s worth, this kind of CPU/GPU integration is exactly what AMD is angling for with its Heterogeneous System Architecture (formerly known as Fusion System Architecture). AMD has a huge advantage over Intel when it comes to GPUs, but that means nothing if the software chain (compilers, libraries, developers) isn’t in place. The good news is that Intel doesn’t have anything even remotely close to AMD’s APU coming down the pipeline, which means AMD has a few years to see where this HSA path leads.

If the 20% speed boost can be brought to market in the next year or two, AMD might actually have a chance.

Updated @ 17:54: The co-author of the paper, Huiyang Zhou, was kind enough to send us the research paper. It seems production silicon wasn’t actually used; instead, the software tweaks were carried out on a simulated future AMD APU with shared L3 cache (probably Trinity). It’s also worth noting that AMD sponsored and co-authored this paper.

Updated @ 04:11: Some further clarification: the research paper is a bit cryptic. It seems the engineers wrote some real code, but executed it on a simulated AMD CPU with L3 cache (i.e. probably Trinity). It does seem like their working is correct. In other words, this is still a good example of the speed-ups that heterogeneous systems will bring… in a year or two.

Read more at North Carolina State University