Diving into the Barts Core



Cypress Core

Unlike what they did with the Evergreen series, AMD isn’t trying to rewrite the book on performance or push new boundaries with their “refreshed” cards. Rather, the dual engine architecture which distinguished Cypress has been carried over with a few modifications made along the way. Barts isn’t the focus of a fundamental architectural change in any way, shape or form. It is all about the gradual refinement of an existing design into something with a smaller die size and superior performance per watt. The end result is that AMD can now push a more affordable high performance solution to end users without sacrificing profit margins.

At first glance, there isn’t all that much to distinguish the full Barts XT core from the outgoing Cypress other than the obvious change in the number of SIMDs, which results in fewer overall SPs (or Stream Cores as AMD calls them). In order to achieve high end performance which is optimized for efficiency, the engineers started with the basic back-end of the Cypress XT and built up from there. This means the graphics engine, including the fixed function stages, L2 cache, ROP arrays and memory controllers, has gone largely untouched. There were some changes to improve tessellation performance and communication between the different stages within the rendering pipeline, but the vast majority of tweaks happened within the SIMD engine layout.

Since the Cypress Pro (which AMD is replacing with the Barts XT) used a full Cypress XT core with a few disabled SIMDs, it was inherently inefficient from a number of perspectives. In order to increase performance per watt, the HD 5850 was taken as a benchmark and the engineers set about trying to match the “sweet spot” it occupied in the market with a slightly revised layout.

The Barts core in both its XT and Pro forms retains the same per-SIMD resources as the Cypress series: 80 SPs, four texture units, 32KB of Local Data Share and 16KB of L1 texture cache.
What has changed is the total quantity of SIMDs per core, which has shrunk from 20 down to the 14 we now see in Barts. This in effect lowers the maximum possible SPs from 1600 down to 1120 and the number of TMUs from 80 to 56. However, since the render back-ends aren’t touched, the Barts XT has a full 32 ROPs. The memory interface also remains a 256-bit GDDR5 design, which is actually a first for an architecture that is aimed exclusively at the sub-$250 market.

From these descriptions and the fact the parts carry the HD 6800 moniker you may assume that Barts is meant to be a direct top to bottom replacement of the HD 5870 and HD 5850, but it isn’t. There will be some overlap, but for the time being the Barts XT (the HD 6870) will take up the HD 5850’s mantle while the HD 5870 will be pushed aside by a completely different beast.

Other than the obvious changes to the SIMD layout, there have also been some tweaks behind the scenes. The main graphics engine, which comprises the fixed function stages of AMD’s architecture, is for the most part carried over from the HD 5800 series without any significant changes, but there is one major addition: an enhanced tessellator.

One of the main critiques leveled against Cypress series GPUs was their tendency to choke under heavy tessellation workloads. Through improved thread management in the shader engines as well as enhanced buffering for tessellation draw calls, AMD has been able to manage up to a twofold increase in overall tessellation performance over the HD 5800 cards. We can also see that in an effort to increase rendering efficiency even more, AMD has split the Ultra Threaded Dispatch Processor in two, with each section having its own instruction and constant cache. This dispatch processor basically acts like a traffic cop, directing draw calls to the SIMD arrays.
With each dispatch processor directing its own “half” of the SIMD engine, rendering information can be processed at a much quicker rate without adding to the overall die size of Barts.

To put this into layman’s terms, the Barts architecture removes the tessellation bottleneck, which allows the rasterizers and SPs to be used more efficiently, and as a result DX11 performance in particular has been increased.

Basically, the architectural tweaks AMD has made are mainly focused upon improving DX11 and tessellation rendering efficiency, but in doing so there have been a number of side benefits as well, such as a large increase in geometry performance.

With the introduction of the Barts core, AMD has taken the first step towards what they call the balancing and evolution of the HD 5000 series architecture. At 1.7 billion transistors Barts is actually smaller than Cypress and uses roughly 25% less silicon, yet can achieve better performance than the HD 5850. Naturally, most of the performance per area increases have been attained through higher core and memory clocks, but the added efficiency brought about by fewer SIMDs and streamlined communication between the numerous stages in the rendering pipeline has a significant effect as well. AMD hopes that all of these minor changes will augment performance enough to better compete with NVIDIA’s current offerings.
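
For readers who want to check the SIMD arithmetic quoted earlier, here is a minimal Python sketch that rebuilds the totals from the per-SIMD figures in this article. The function and variable names are our own and purely illustrative, and the summed LDS and L1 totals are simple multiplication rather than figures AMD has published.

```python
# Per-SIMD resources as quoted in the article (shared by Cypress and Barts).
SPS_PER_SIMD = 80          # stream processors per SIMD
TMUS_PER_SIMD = 4          # texture units per SIMD
LDS_KB_PER_SIMD = 32       # Local Data Share per SIMD, in KB
L1_TEX_KB_PER_SIMD = 16    # L1 texture cache per SIMD, in KB


def core_totals(simd_count):
    """Return whole-core totals for a core built from `simd_count` SIMDs.

    Illustrative helper only: it multiplies the per-SIMD figures above
    and makes no claim about how the hardware aggregates these resources.
    """
    return {
        "simds": simd_count,
        "sps": simd_count * SPS_PER_SIMD,
        "tmus": simd_count * TMUS_PER_SIMD,
        "lds_kb": simd_count * LDS_KB_PER_SIMD,
        "l1_tex_kb": simd_count * L1_TEX_KB_PER_SIMD,
    }


cypress_xt = core_totals(20)  # full Cypress: 20 SIMDs
barts_xt = core_totals(14)    # full Barts: 14 SIMDs

# Fraction of stream processors removed in the move from Cypress XT to Barts XT.
sp_reduction = 1 - barts_xt["sps"] / cypress_xt["sps"]

for name, core in (("Cypress XT", cypress_xt), ("Barts XT", barts_xt)):
    print(f"{name}: {core['simds']} SIMDs, {core['sps']} SPs, {core['tmus']} TMUs")
print(f"SP reduction: {sp_reduction:.0%}")
```

The totals match the figures above: 1600 SPs and 80 TMUs for Cypress XT against 1120 SPs and 56 TMUs for Barts XT, a 30% cut in shader resources. The ROP count and memory bus width are left out because they belong to the render back-ends and do not scale with the SIMD count.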