AMD Brings Kabini to the Desktop

AMD Packs a Punch at Low Power

Perhaps we are performing a study of opposites? Yesterday Ryan posted his R9 295X2 review, which covers the 500 watt, dual GPU monster that will be retailing for $1499. It is a card meant only for the extreme enthusiast who has plenty of room in their case, plenty of knowledge about their power supply, and plenty of electricity and air conditioning to keep this monster at bay. The product that I am reviewing could not be any more different: it is inexpensive, cool running, power efficient, and can fit pretty much anywhere. These products can almost be viewed as polar opposites.

The interesting thing of course is that it shows how flexible AMD’s GCN architecture is. GCN can efficiently and effectively power the highest performing product in AMD’s graphics portfolio, as well as their lowest power offerings in the APU market. Performance scales close to linearly as more GCN compute cores are added.

The products that I am of course referring to are the latest Athlon and Sempron APUs, which are based on the Kabini architecture that fuses Jaguar x86 cores with GCN compute cores. These APUs were announced last month, but we did not have the chance at the time to test them. Since then these products have popped up in a couple of places around the world, but this is the first time that reviewers have officially received product from AMD and their partners.

Big Step up from Brazos

Several years ago AMD released the Brazos platform, which combined the Bobcat CPU architecture with the Evergreen graphics component in a product called Zacate. Enjoying all the code words? Wikipedia keeps a nice record of them all. This was AMD’s first low power APU, and it was a pretty solid success for the company. Two Bobcat cores proved to be adequate competition to Intel’s Atom-based products, and the graphics portion was a significant step above what the rest of the industry had to offer in that particular power envelope.

The Bobcat architecture was a big step away from what AMD was doing with Bulldozer at the time. IPC was important, but so was power. AMD went with a lower clockspeed and a lot of pretty clever design decisions to raise IPC without sacrificing too much die space or power. Keeping the clockspeed low allowed them not only to save power, but also to tune the core architecture to push work per clock, as compared to Bulldozer, which relies on higher clockspeeds to achieve good performance.

Jaguar builds upon this to a great degree. AMD has doubled the core count from two to four with Kabini, and they have also improved IPC and power consumption. No real sacrifices were made to achieve this, or at least so the engineers claim. No souls were lost in the design of this APU. AMD did extensive work on the front end (including a beefier branch predictor), re-arranged the integer and SSE/MMX/AVX pipelines to balance out the workload, and improved the caches. Here is a quick reference to what work was done.

Jaguar also embraces all of the latest instruction sets. This includes BMI, AVX, F16C, SSE 4.2, and AES. Unlike Bulldozer and its variants, Jaguar does not utilize a “modular” architecture. Each core in Jaguar is self-sufficient with its own floating point unit, as compared to the unit shared within each dual core “module” in Bulldozer/Piledriver/Steamroller. The only thing shared is the decently sized 2 MB L2 cache. Caches typically use up a lot of power, so the smaller the L2 cache the better… until it gets so small that it has a big negative impact on overall performance. 2 MB seems to be a pretty good balance between performance and power.
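
For readers who want to confirm these instruction sets on their own hardware, here is a minimal Python sketch that checks the flags line Linux exposes in /proc/cpuinfo. The function name missing_flags and the REQUIRED_FLAGS set are hypothetical names made up for this illustration, and the mapping from the marketing names to the kernel's flag spellings (bmi1, avx, f16c, sse4_2, aes) is an assumption about how the list above lines up with what Linux reports.

```python
# Flag spellings as the Linux kernel reports them in /proc/cpuinfo.
# Assumed mapping from the article's list: BMI -> bmi1, SSE 4.2 -> sse4_2.
REQUIRED_FLAGS = {"bmi1", "avx", "f16c", "sse4_2", "aes"}


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(required) - set(value.split())
    # No 'flags' line at all (non-x86 or non-Linux input): report everything.
    return set(required)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # Not on Linux; every flag will be reported as missing.
    print("missing:", sorted(missing_flags(text)) or "none")
```

On a chip that supports everything in the list, the script should report nothing missing; on other operating systems it simply reports every flag as missing, since there is no /proc/cpuinfo to read.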

Kabini sports the latest iteration of AMD’s GCN architecture. It supports up to DX 11.2, OpenGL 4.3, and OpenGL ES 3.0. Kabini is not fully HSA compliant, but it does support a lot of HSA-like features. It fully supports OpenCL 1.2, DirectCompute, and C++ AMP. Not all of the HSA optimizations in Kaveri trickled down to Kabini, but for a product in this particular niche, this is not necessarily a disadvantage. Kabini features 2 GCN compute cores with a total of 128 stream units. Each compute core has four 16-wide vector units (2 x 4 x 16 = 128) along with a scalar co-processor. Multimedia features include UVD 4.2 (Universal Video Decoder) and VCE 2.0 (Video Coding Engine).
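
The stream unit count falls straight out of that layout, and a rough theoretical throughput figure follows from it. The sketch below is only back-of-the-envelope arithmetic: the function names are made up for this example, the 600 MHz clock in the usage lines is an illustrative assumption and not a figure from this article, and the two-FLOPs-per-lane-per-clock factor assumes every lane retires one fused multiply-add each cycle.

```python
def stream_processors(compute_units=2, simd_units_per_cu=4, lanes_per_simd=16):
    """Total vector lanes: compute cores x vector units per core x lanes per unit."""
    return compute_units * simd_units_per_cu * lanes_per_simd


def peak_gflops(shader_clock_mhz, processors):
    """Theoretical single-precision peak, assuming one FMA (2 FLOPs) per lane per clock."""
    return processors * 2 * shader_clock_mhz / 1000.0


if __name__ == "__main__":
    total = stream_processors()       # 2 x 4 x 16
    print(total, "stream units")
    # 600 MHz is an assumed, illustrative clock; substitute the real GPU clock.
    print(peak_gflops(600, total), "GFLOPS theoretical peak")
```

Real workloads land well below a peak number like this, so it is mostly useful for comparing parts that share the same architecture.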

Kabini is made on TSMC’s 28nm HKMG process as of now. This could change in the future, but for the moment AMD is sticking with TSMC for this particular product. TSMC has been producing 28nm products in a variety of flavors for over three years now. If the reader has not gathered by now, Kabini is not new. It was introduced in Q2 of last year and Ryan covered the mobile implementation here. This chip has been used in low end notebooks as well as higher powered tablets.

Something that sometimes gets glossed over is that Kabini is a true SOC. It does not need a separate southbridge for I/O functions. Onboard it features 2 SATA 6G, 2 USB 3.0, and 8 USB 2.0 ports. It has multiple PCI-E 2.0 lanes coming out that can connect to peripherals such as Ethernet and wireless controllers.

Since power is the primary concern, AMD stuck with a single DDR3 channel running at a max speed of DDR3-1600. The chip can drive up to three displays at one time. It natively supports DisplayPort, HDMI, DVI, and VGA outputs.
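
To put the single channel in perspective, here is a quick sketch of the peak bandwidth arithmetic. It assumes the standard 64-bit DDR3 channel width, which is not stated above, and the function name is made up for this example.

```python
def peak_bandwidth_gbs(megatransfers_per_sec=1600, bus_width_bits=64, channels=1):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8  # a 64-bit channel moves 8 bytes per transfer
    return megatransfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    print(peak_bandwidth_gbs())            # single channel DDR3-1600
    print(peak_bandwidth_gbs(channels=2))  # what a second channel would offer
```

That works out to 12.8 GB/s for one channel, and it has to be shared between the four CPU cores and the GPU, so memory bandwidth is one of the things traded away for lower power.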

Jaguar is the basis for the latest generation of consoles from Sony and Microsoft. These custom designed SOCs feature 8-core Jaguar implementations attached to much more robust GCN based units than we see here. Still, it is interesting to see that Jaguar is going to be the baseline CPU target for a lot of games that will be released on both consoles and PCs for many years to come.