One of DirectX 12's many promises is enabling mixed configurations of AMD and Nvidia GPUs to work together through its explicit multi-adapter feature. That promise was made quite a while back, but it looks like what Microsoft promised with DirectX 12 is actually coming to fruition in Oxide's upcoming DX12-enabled real-time strategy game, Ashes of the Singularity.

This essentially means that users should be able to run AMD Radeon graphics cards alongside Nvidia GeForce graphics cards in a "Cross-SLI" configuration, if you will. This is possible under DX12 because the API allows developers to address each graphics processor in a given system separately and assign tasks to each GPU independently of the others. So in theory, DX12 games, if programmed to do so, can take advantage of AMD Radeon R7 integrated graphics processors or Intel HD integrated graphics engines in addition to whatever discrete GPU or GPUs are installed in the system.
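The idea of addressing each GPU independently can be sketched with a toy model. This is a conceptual illustration only, not the actual Direct3D 12 API (which works through DXGI adapter enumeration and per-adapter devices and command queues); every class and task name here is hypothetical.

```python
# Conceptual model of explicit multi-adapter task assignment.
# All names are hypothetical illustrations, not real D3D12 API calls.

class GPU:
    def __init__(self, name):
        self.name = name
        self.tasks = []

    def submit(self, task):
        # Under explicit multi-adapter, each GPU gets its own command
        # queue, so work can be assigned per adapter by the developer.
        self.tasks.append(task)

# A mixed system: discrete AMD + discrete Nvidia + integrated Intel GPU.
radeon = GPU("AMD Radeon")
geforce = GPU("Nvidia GeForce")
intel_hd = GPU("Intel HD Graphics")

# The developer, not the driver, decides which GPU runs what.
radeon.submit("main scene geometry")
geforce.submit("shadow maps")
intel_hd.submit("post-processing")

for gpu in (radeon, geforce, intel_hd):
    print(f"{gpu.name}: {gpu.tasks}")
```

The point of the model is the division of responsibility: under DX11-style implicit multi-GPU the driver makes these decisions, while under DX12 the developer does.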

There are two types of Explicit Multi-Adapter enabled through DirectX 12: the first is dubbed Linked and the second Unlinked.

Linked mode is only available when there are multiple near-identical GPUs in the system, similar to what CrossFire and SLI setups look like today. It allows the compute/graphics resources and the memory of multiple GPUs to be combined into one larger addressable unit. So if, say, we have two R9 390 cards in the system, under this mode both GPUs can be combined to form a graphics engine with twice the memory and twice as many GCN shader units, essentially creating a 16GB, 5120 GCN core behemoth.

This is because instead of tasking each GPU with rendering a whole frame on its own, each GPU renders different parts of the same frame using Split Frame Rendering, or SFR for short. A GPU can even render different stages of a frame and pass the rest on to a different GPU. This in turn negates the need to mirror resources across the two separate memory pools of the GPUs. That mirroring is the limitation behind the traditional alternate frame rendering (AFR) technique used today, which prevents developers from addressing the two separate memory pools as one larger common pool of graphics memory.
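The memory implications of AFR versus SFR can be made concrete with a small sketch. The capacity figures are invented for illustration (two hypothetical 8GB cards), and the SFR pooling is the in-principle best case described above, not a guaranteed outcome.

```python
# Hypothetical illustration of effective memory under AFR vs SFR.
# The 8 GB per-GPU figure is an assumption for the example.

PER_GPU_VRAM_GB = 8.0

def afr_usable_memory(num_gpus):
    # AFR: each GPU renders whole frames on its own, so every GPU must
    # hold a full copy of the assets. Memory is mirrored, not combined,
    # and the effective pool never grows with more GPUs.
    return PER_GPU_VRAM_GB

def sfr_usable_memory(num_gpus):
    # SFR under explicit control: GPUs render parts of the same frame,
    # so resources can in principle be distributed across the separate
    # memory pools and addressed as one larger pool.
    return PER_GPU_VRAM_GB * num_gpus

print(afr_usable_memory(2))  # mirrored pools: still 8.0
print(sfr_usable_memory(2))  # combined pool: 16.0
```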

Unlinked mode is designed to take advantage of completely different GPUs in the system, even ones from different vendors. It enables this through a thin abstraction layer that allows the different GPUs to swap data back and forth while giving developers total control. This mode lets developers make use of discrete GPUs alongside integrated GPUs or other discrete GPUs, with each GPU treated as a completely independent graphics engine that developers can utilize however they want.
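One way unlinked mode can be used is to pipeline stages of a frame across dissimilar GPUs. The sketch below models a hypothetical two-stage pipeline; in real DX12 the hand-off would go through cross-adapter shared resources, which is simplified here to passing a plain value between functions.

```python
# Hypothetical unlinked-mode pipeline: a discrete GPU renders the
# scene, then hands the intermediate frame to an integrated GPU for
# post-processing. Function and stage names are illustrative only.

def render_scene(dgpu_name):
    # Stage 1 on the discrete GPU: produce the raw frame.
    return f"frame rendered by {dgpu_name}"

def post_process(igpu_name, frame):
    # Stage 2 on the integrated GPU: tone mapping, UI composite, etc.
    return f"{frame}, post-processed by {igpu_name}"

frame = render_scene("discrete GPU")
final = post_process("integrated GPU", frame)
print(final)
```

The design appeal is that an otherwise idle integrated GPU contributes real work instead of sitting unused, which is exactly the scenario unlinked mode targets.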

DirectX 12 Enables Cross-SLI Between AMD And Nvidia GPUs

I know what you’re thinking: all of this sounds fantastic and wonderful, but does it work? That’s the million-dollar question, but before we get into the tests and benchmarks we should familiarize ourselves with some of the real-world technical and non-technical hurdles that stand in the way.

Let’s take a step back and examine Mantle, AMD’s own low-level API. Mantle actually supports explicit asynchronous multi-GPU control. In fact, this feature was used in Civilization: Beyond Earth, one of Mantle's last titles. The developers at Firaxis used explicit asynchronous multi-GPU control to implement split frame rendering (SFR) for CrossFireX support under Mantle, while the DX11 implementation of CrossFireX used the standard alternate frame rendering (AFR) technique explained earlier in the article.

While explicit control of multi-GPU configurations is very beneficial, it also requires a fairly significant amount of effort from the developer, not only to ensure that multi-GPU configurations work as intended but also that there’s a sufficient performance advantage over regular old AFR. Essentially, it takes some of the responsibility away from the hardware vendor and the driver team and puts it in the hands of the game developer to ensure that this feature is leveraged and implemented well.

A good SFR implementation requires a considerable amount of skill and talent, but if done right it can yield noticeable improvements over AFR, especially in reducing input latency. Implementing SFR is a different kind of technical challenge for each game; some game engines and genres are more suitable for it than others. For example, in a turn-based strategy game input latency wouldn’t be as important as it is in a fast-paced FPS.

Beyond the technical challenge of splitting each frame and assigning the parts to the different graphics processors in the system, which is no small task in itself, especially if those processors have different performance characteristics (a high-end discrete GPU vs. an integrated GPU), there’s also the challenge of dealing with different GPU architectures that support different sets of features. Here developers could choose to dive into the nitty-gritty details of assigning different elements of each part of the rendering to the most suitable GPU, or simply program for the lowest common denominator of the different DX12 GPU architectures made by the various hardware vendors. Interestingly, we're told that the code base to achieve this in DX12 is actually fairly clean and isn't as much of a hassle as one would first imagine.
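The frame-splitting problem for mismatched GPUs can be sketched as a simple load-balancing calculation: give each GPU a slice of the frame proportional to its measured throughput. This is a hypothetical scheme with invented numbers, not how any particular engine does it.

```python
# Hypothetical SFR load balancer: split a frame's scanlines in
# proportion to each GPU's relative throughput, so a fast discrete
# GPU gets a larger slice than a slow integrated one.

def split_frame(height, throughputs):
    """Return per-GPU scanline counts proportional to throughput."""
    total = sum(throughputs)
    rows = [int(height * t / total) for t in throughputs]
    rows[-1] += height - sum(rows)  # hand any rounding remainder to the last GPU
    return rows

# 1080 scanlines split between a discrete GPU assumed ~5x faster
# than the integrated GPU it is paired with.
print(split_frame(1080, [5.0, 1.0]))  # -> [900, 180]
```

A real implementation would also rebalance dynamically, since relative GPU load shifts from scene to scene, but the proportional split captures the core idea.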

This finally brings us to how this technology holds up today. Ryan Smith from Anandtech.com did a really amazing job of testing different configurations at different settings and resolutions in his full write-up about the technology, which you should definitely go and check out.

Credit: Anandtech.com

Interestingly, the best results are achieved when an AMD GPU is used as the main adapter, while an Nvidia GPU is used as the secondary adapter. This yields the highest average framerates consistently, even compared to multi-GPU configurations that contain two similar AMD GPUs or two similar Nvidia GPUs.

Also of note is the case of the GTX 680 and HD 7970: the combination actually results in a drastic performance loss when the GTX 680 is used as the main adapter, and a considerable performance improvement when the HD 7970 is used as the main adapter.

The trend, it seems, is that AMD GPUs are best utilized as the primary or "lead" adapters, as Ryan put it, with Nvidia GPUs as the secondary adapters. This yielded the best results in terms of framerates and frametimes. The peculiar pattern emerged consistently throughout all the tests but is not yet well understood.

This is the first ever testable demonstration of DirectX 12's EMA technology, and it's shaping up far better than one would have reasonably expected.