AMD's next-generation Kaveri APU architecture has been thoroughly detailed by PC.Watch, which shows how AMD's fourth-generation APU platform combines the latest Steamroller and Radeon GCN cores and shares memory dynamically through hUMA and HSA enhancements. AMD also demonstrated the Kaveri APU at its Computex 2013 seminar, which can be seen here.

Images are courtesy of PC.Watch.Impress!

AMD Next Generation Kaveri APU Architecture Detailed

AMD launched its Richland "3rd Generation" APU platform last month as an update to the existing Trinity APU platform. Since the Richland APUs are a refresh of Trinity, performance did not improve much, but the refresh did bring higher clock speeds and much better power efficiency. AMD achieved this by tweaking the Piledriver core, which is itself a revision of the Bulldozer architecture. However, we knew that the Richland APU platform would be short-lived, since AMD had already planned to launch its fourth-generation Kaveri APU sometime in Q4 2013.

Kaveri APU Equips 28nm Steamroller Core

The biggest architectural change in the Kaveri APU is the use of the latest 28nm Steamroller architecture, a true multi-threaded design that focuses on improving IPC (Instructions Per Cycle) by up to 25%. In each module, the two threads are each given their own instruction decoder, and the decoders work in parallel. Because of these enhancements, the Steamroller die would be larger than Bulldozer and Piledriver, with each module housing two Steamroller cores and a shared L2 cache. You can see the block diagram for the differences between the Bulldozer and Steamroller architectures below:

One thing to note is that the Kaveri APU architecture is built not only around the 28nm Steamroller architecture but also around a 28nm GCN graphics module. Llano APUs featured the VLIW5 architecture, while Trinity and Richland used the updated VLIW4 architecture, which was still older than what AMD's discrete GPUs offered. The Kaveri APU, on the other hand, makes use of the Radeon GCN (Graphics Core Next 2.0) architecture. The 2.0 represents an enhanced, more efficient design that aims to improve graphics performance while keeping power consumption within limits.

AMD GCN Architecture Powers Kaveri Integrated GPU

On the GPU side, the Kaveri APU would make use of up to 512 stream processors (MAD units), which amounts to 8 compute units. It is mentioned that the number of stream processors would range from 384 SPs to 512 SPs, though it could feature an even higher number such as 768 SPs. The AMD Radeon HD 7750 features 512 SPs while the upcoming Radeon HD 7730 would feature 384 SPs, so we are pretty much looking at the performance level of a discrete entry-level GPU here, which would be fantastic for an integrated graphics module. Total compute performance would exceed the 1 TFLOPS mark when the GPU and the FPU performance of the CPU cores are added together.
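As a rough sanity check on these figures, the Python sketch below derives the compute unit count from the stream processor count and estimates theoretical peak single-precision throughput. It assumes 64 stream processors per compute unit (implied by 512 SPs across 8 compute units) and two floating-point operations per SP per clock (one multiply-add). The 800 MHz clock is purely hypothetical, since no GPU clock speed for Kaveri is given here.

```python
# Assumption: 64 SPs per compute unit, implied by 512 SPs / 8 compute units.
SPS_PER_CU = 64
# A multiply-add (MAD) is counted as two floating-point operations.
FLOPS_PER_SP_PER_CLOCK = 2


def compute_units(stream_processors: int) -> int:
    """Return the number of compute units for a given SP count."""
    if stream_processors % SPS_PER_CU != 0:
        raise ValueError("SP count must be a multiple of SPS_PER_CU")
    return stream_processors // SPS_PER_CU


def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak single-precision GFLOPS at the given clock."""
    return stream_processors * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000.0


if __name__ == "__main__":
    hypothetical_clock_mhz = 800.0  # illustrative only, not a Kaveri spec
    for sps in (384, 512, 768):
        cus = compute_units(sps)
        gflops = peak_gflops(sps, hypothetical_clock_mhz)
        print(f"{sps} SPs = {cus} CUs, ~{gflops:.1f} GFLOPS at "
              f"{hypothetical_clock_mhz:.0f} MHz")
```

Under these assumptions a 512 SP part delivers 1.024 GFLOPS per MHz, so the GPU alone would fall short of 1 TFLOPS at the hypothetical 800 MHz clock. That fits the claim that the figure above 1 TFLOPS counts the CPU's FPU throughput as well.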

However, we are skeptical that we will see a higher stream processor count in the Kaveri APU, since the memory type is still limited to DDR3, as seen on the FM2+ motherboard showcased at Computex 2013. DDR3 memory bandwidth could become a bit of an issue for integrated graphics with a higher stream processor count, but AMD might have an answer for this. hUMA and HSA enhancements would allow the GPU and CPU to dynamically share the same virtual memory address space, but we don't know if that would help solve the bandwidth issue. Nevertheless, the GCN part of the Kaveri APU sounds great, since it would at least allow budget gamers to play the latest titles without spending much on a discrete GPU.
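To put the DDR3 concern in numbers, the sketch below estimates theoretical peak memory bandwidth from the transfer rate, bus width, and channel count. The 64-bit channel width is standard for DDR3 modules. The dual-channel configuration and the listed memory speeds are assumptions for illustration, since Kaveri's supported memory speeds are not given here.

```python
def peak_bandwidth_gbs(transfers_mt_s: float,
                       bus_width_bits: int = 64,
                       channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_mt_s: transfer rate in mega-transfers per second
                    (e.g. 1600 for DDR3-1600).
    bus_width_bits: width of one memory channel in bits.
    channels:       number of channels (assumed dual-channel by default).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * bytes_per_transfer * channels / 1000.0


if __name__ == "__main__":
    # Hypothetical DDR3 speed grades, not confirmed Kaveri specifications.
    for rate in (1600, 1866, 2133):
        print(f"DDR3-{rate} dual channel: "
              f"~{peak_bandwidth_gbs(rate):.1f} GB/s")
```

Whatever the final memory speed, this bandwidth is shared between the CPU cores and the integrated GPU, which is why a higher stream processor count could end up starved for data.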

A major feature to be implemented in the Kaveri APUs is the hUMA memory architecture, which is part of HSA (Heterogeneous System Architecture). It would give PCs a unified memory architecture in which system RAM is shared between the GPU and CPU, improving the way both processors access and communicate through memory.

AMD Kaveri APU - FM2+, A88X/A78X Chipset, Backwards Compatibility With Trinity/Richland

The AMD Kaveri APU platform would launch with the AMD A88X "Bolton D4" and A78X chipsets on socket FM2+ motherboards. Socket FM2+ would be backwards compatible with both Richland and Trinity APUs, which is a plus point, but Kaveri won't operate on FM2 motherboards due to a different pin layout. Kaveri would launch in dual-core and quad-core Steamroller SKUs with GCN-based integrated graphics processors. Desktop models would range from 65W to 100W TDP, while mobility models would have a 35W TDP, or even lower for specific APUs. Kaveri would also arrive in the server market as the Berlin platform. Expect more details in the upcoming months; the launch takes place in Q4 2013. Do read the full article at PC.Watch for a detailed introduction to Kaveri.