Oxide Games has pinpointed Asynchronous Shaders as one of the main reasons AMD hardware showed significant gains over Nvidia in DX12, specifically in the recently launched DX12 benchmark for the developer's real-time strategy title Ashes of the Singularity, which is set for release next year. The benchmark, which Oxide Games is adamant accurately represents the game's performance, has been available to download for free since earlier this month. We have already run this test on a variety of graphics cards from both Nvidia and AMD and published our results in an article earlier this month.



What we, and other publications, found was that AMD GPUs consistently showed significantly greater performance gains than their Nvidia counterparts, and in many instances the AMD cards matched or outperformed more expensive Nvidia offerings. On the Nvidia side the results were fairly inconsistent, to say the least: in some instances we, and other publications, registered a performance loss with Nvidia hardware running the DX12 version of the benchmark compared to DX11. We learned later on that this was down to a hardware feature called Asynchronous Shaders/Compute.

DirectX 12 Asynchronous Compute: What It Is And Why It Matters

Asynchronous Shaders/Compute, otherwise known as Asynchronous Shading, is one of the more exciting hardware features that DirectX 12, Vulkan and Mantle before them exposed. This feature allows tasks to be submitted to and processed by the shader units inside GPUs (what Nvidia calls CUDA cores and AMD dubs Stream Processors) simultaneously and asynchronously in a multi-threaded fashion.

One would’ve thought that, with thousands of shader units inside modern GPUs, proper multi-threading support would have already existed in DX11. In fact, one could argue that comprehensive multi-threading is crucial to maximize performance and minimize latency. But the truth is that DX11 only supports basic multi-threading methods that can’t fully take advantage of the thousands of shader units inside modern GPUs. This meant that GPUs could never reach their full potential, until now.



Multithreaded graphics in DX11 does not allow for multiple tasks to be scheduled simultaneously without adding considerable complexity to the design. This meant that a great number of GPU resources would spend their time idling with no task to process because the command stream simply can’t keep up. This in turn meant that GPUs could never be fully utilized, leaving a deep well of untapped performance and potential that programmers could not reach.

Other complementary technologies attempted to improve the situation by enabling prioritization of important tasks over others. Graphics pre-emption allowed for prioritizing tasks, but just like multi-threaded graphics in DX11 it did not solve the fundamental problem, as it could not enable multiple tasks to be submitted and handled simultaneously and independently of one another. A crude analogy would be that graphics pre-emption merely adds a traffic light to the road rather than an additional lane.

Out of this problem a solution was born, one that’s very effective and readily available to programmers with DX12, Vulkan and Mantle. It’s called Asynchronous Shaders and, just as we’ve explained above, it enables a genuine multi-threaded approach to graphics. It allows tasks to be processed simultaneously and independently of one another, so that each of the thousands of shader units inside a modern GPU can be put to as much use as possible to improve performance.
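To make the idea of filling idle shader units a little more concrete, here is a deliberately simplified toy model in Python. It is only a sketch and does not describe how any real GPU scheduler works: the function names, the notion of a "tick", and all of the numbers are assumptions made up for illustration. Graphics work occupies some of the shader units in each tick, and a batch of compute work either waits until the graphics work is finished (serial) or is slotted into whatever units are idle (asynchronous).

```python
def serial_ticks(graphics_demand, compute_work, units):
    """Ticks needed when compute only starts after all graphics work is done.

    graphics_demand: shader units busy with graphics in each tick (hypothetical).
    compute_work:    total compute work, measured in unit-ticks (hypothetical).
    units:           total number of shader units in the toy GPU.
    """
    graphics_ticks = len(graphics_demand)
    compute_ticks = -(-compute_work // units)  # ceiling division
    return graphics_ticks + compute_ticks


def async_ticks(graphics_demand, compute_work, units):
    """Ticks needed when compute may fill units that graphics leaves idle."""
    remaining = compute_work
    ticks = 0
    for demand in graphics_demand:
        idle = units - demand                 # units graphics is not using this tick
        remaining = max(0, remaining - idle)  # compute soaks up the idle units
        ticks += 1
    # Any compute work left over runs alone on the whole GPU.
    ticks += -(-remaining // units)
    return ticks


# Made-up workload: 10 shader units, four ticks of graphics, 12 unit-ticks of compute.
graphics = [6, 4, 8, 5]
print(serial_ticks(graphics, 12, 10))  # 6
print(async_ticks(graphics, 12, 10))   # 4
```

In this invented workload the compute batch fits entirely into units that graphics left idle, so the asynchronous schedule finishes in four ticks instead of six. Real gains depend on how much idle capacity a given frame actually has.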

However, to enable this feature the GPU must be built from the ground up to support it. In AMD’s Graphics Core Next based GPUs this feature is enabled through the Asynchronous Compute Engines (ACEs) integrated into each GPU. These are structures built directly into the GPU itself, and they serve as the multi-lane highway by which tasks are delivered to the stream processors.

Each ACE is capable of handling eight queues, and every GCN-based GPU has a minimum of two ACEs. More modern chips such as the R9 285 and R9 290/290X have eight ACEs. ACEs debuted with AMD’s first GCN-based GPU, codenamed Tahiti, in late 2011. They were originally added to GPUs mainly to handle compute tasks, because they could not be leveraged with the graphics APIs of the time. Today, however, ACEs take on a more important role in graphics processing in addition to compute.
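The queue counts implied by those figures are simple to work out. The short Python sketch below multiplies the eight queues per ACE by the ACE counts quoted above; the constant and function names are our own, chosen purely for illustration.

```python
QUEUES_PER_ACE = 8  # each ACE handles eight queues, per the figures above


def compute_queues(ace_count):
    """Total compute queues exposed by a GPU with the given number of ACEs."""
    return ace_count * QUEUES_PER_ACE


print(compute_queues(2))  # minimum GCN configuration: 16 queues
print(compute_queues(8))  # R9 285 / R9 290 / R9 290X: 64 queues
```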

Asynchronous Shaders Can Provide A 46% Performance Uplift on AMD Hardware With DX12

To showcase the performance advantage that this feature can bring to the table, AMD demoed it via a LiquidVR sample five months ago. The demo ran at 245 FPS with Asynchronous Shaders off and post-processing disabled. After post-processing was enabled, performance dropped to 158 FPS. Finally, when Asynchronous Shaders and post-processing were both enabled, the average went up to 230 FPS, approximately a 46% performance uplift over the 158 FPS result. While this is likely a best-case improvement, it isn't too far off the 30% performance boost that Oxide Games mentioned other devs achieving with this feature on the consoles.
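For readers who want to check the arithmetic, the Python sketch below reproduces the uplift from the three FPS figures quoted above and also converts them to frame times, which shows how much of the post-processing cost was hidden. The variable and function names are ours, and the frame-time view is our own reading of the numbers rather than something AMD presented.

```python
def uplift_percent(before_fps, after_fps):
    """Relative FPS gain going from before_fps to after_fps, in percent."""
    return (after_fps - before_fps) / before_fps * 100.0


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


baseline_fps = 245.0    # async off, post-processing off
post_sync_fps = 158.0   # async off, post-processing on
post_async_fps = 230.0  # async on, post-processing on

uplift = uplift_percent(post_sync_fps, post_async_fps)

# Extra frame time that post-processing adds in each mode.
cost_sync_ms = frame_time_ms(post_sync_fps) - frame_time_ms(baseline_fps)
cost_async_ms = frame_time_ms(post_async_fps) - frame_time_ms(baseline_fps)

print(f"uplift: {uplift:.1f}%")
print(f"post-processing cost without async: {cost_sync_ms:.2f} ms")
print(f"post-processing cost with async:    {cost_async_ms:.2f} ms")
```

The uplift works out to roughly 45.6%, which rounds to the 46% quoted. In frame-time terms, post-processing costs about 2.25 ms per frame without Asynchronous Shaders and about 0.27 ms with them.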



This isn’t just a theoretical exercise either: a number of games have already been released with Asynchronous Shaders implemented. These include Battlefield 4, Infamous Second Son and The Tomorrow Children on the PS4, and Thief when running under Mantle on the PC. Ashes of the Singularity will obviously be joining that list soon as well. AMD always likes to point out that the consoles and the PC share the same GCN graphics architecture, so whatever is achieved on one platform can be carried over to the other.



Naturally, the demo mentioned above only showcases the potential performance improvement that can be attained with Asynchronous Shaders and low-level APIs such as Mantle, Vulkan and DX12.