Windows 10 brings a slew of features to the table—the return of the Start menu, Cortana, the Xbox App—but the most interesting for gamers is obvious: DirectX 12 (DX12). The promise of a graphics API that allows console-like low-level access to the GPU and CPU, as well as improved performance for existing graphics cards, is tremendously exciting. Yet for all the Windows 10 information to trickle out in the three weeks since the OS launched, DX12 has remained the platform's most mysterious aspect. There's literally been no way to test these touted features and see just what kind of performance uplift (if any) there is. Until now, that is.

Enter Oxide Games' real-time strategy game Ashes of the Singularity, the very first publicly available game that natively uses DirectX 12. Even better, Ashes has a DX11 mode too. For the first time, we can make a direct comparison between the real-world (i.e. actual game) performance of the two APIs across different hardware. While earlier benchmarks like 3DMark's API Overhead feature test were interesting, they were entirely synthetic. Such tests focused only on the maximum number of draw calls per second achieved by each API (the higher that ceiling, the more objects, textures, and effects a game engine can draw).

What's so special about DirectX 12?

DirectX 12 features an entirely new programming model, one that works on a wide range of existing hardware. On the AMD side, that means any GPU featuring GCN 1.0 or higher (cards like the R9 270, R9 290X, and Fury X) is supported, while Nvidia says anything from Fermi (400-series and up) will work. Not every one of those graphics cards will support every feature of DirectX 12 though, because the API is split into different feature levels. The higher levels add extra features like Conservative Rasterization, Tiled Resources, Raster Order Views, and Typed UAV Formats.

Some of those features are interesting and very technical (I refer you to this handy glossary if you're interested in exactly what some of them do). But the good news is that the most important features of DirectX 12 are supported across the board. In theory, that means most people should see some sort of performance uplift when moving to DX12. And AMD has been particularly vocal about its performance under the new API, a move that's undoubtedly tied to its poor DX11 performance (particularly on low-end CPUs) compared to Nvidia.

Before a graphics card renders a scene, the CPU first has to send instructions to the GPU. The more complex the scene, the more draw calls need to be sent. Under DX11, Nvidia's driver tended to process those draw calls more efficiently than AMD's, leading to more consistent performance. However, both were held back by DX11. GPUs mostly consist of thousands of small cores (shaders), so they tend to excel at parallel workloads. But DX11 was largely serial in its thinking: it sends one command to the GPU at a time, usually from just one CPU core.

In contrast, DX12 introduces command lists. These bundle together the commands needed to execute a particular workload on the GPU. Because each command list is self-contained, the driver can pre-compute all the necessary GPU commands up-front and in a free-threaded manner across any CPU core. The only serial process is the final submission of those command lists to the GPU, which is theoretically a highly efficient process. Once a command list hits the GPU, the GPU can process all of its commands in parallel rather than having to wait for the next command in the chain to come through. The result should be less CPU time spent per draw call and fewer stalls on a single submission thread, which is where DX12's performance gains are expected to come from.
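To make the shape of that concrete, here is a minimal Python sketch of the record-in-parallel, submit-in-series pattern. It is not Direct3D code: `record_command_list`, `build_frame`, and `submit` are hypothetical names invented for illustration, and Python threads merely stand in for the CPU cores a real driver would spread the work across.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draw_calls):
    """Record one self-contained command list (hypothetical stand-in for driver work)."""
    return [f"draw:{obj}" for obj in draw_calls]


def build_frame(scene_objects, workers=4):
    # Split the scene into one chunk per worker; each chunk becomes its own command list.
    chunks = [scene_objects[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The lists do not depend on each other, so they can be recorded concurrently.
        return list(pool.map(record_command_list, chunks))


def submit(command_lists):
    """The only serial step: hand the finished lists to a simulated GPU queue, in order."""
    gpu_queue = []
    for command_list in command_lists:
        gpu_queue.extend(command_list)
    return gpu_queue


objects = [f"unit{i}" for i in range(10)]
queue = submit(build_frame(objects, workers=4))
print(len(queue))  # 10 commands reach the simulated GPU
```

The point of the structure is that only `submit` has to run on a single thread; everything before it can scale with the number of cores available.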

Nvidia was the undisputed king of the DX11 era, but DX12 is great news for AMD. The company's GCN architecture has long featured asynchronous compute engines (ACE), which up until now haven't really done it any favours when it comes to performance. Under DX12, those ACEs should finally be put to work, with tasks like physics, lighting, and post-processing being divided into different queues and scheduled independently for processing by the GPU. On the other hand, Nvidia's cards are very much designed for DX11. AnandTech found that any pre-Maxwell GPU from the company (that is, pre-980 Ti, 980, 970, and 960) had to either execute in serial or pre-empt to move tasks ahead of each other. That's not a problem under DX11, but it potentially becomes one under DX12.

There's another big feature in DX12 that's going to be of particular interest to those with an iGPU or APU: Explicit Multiadapter. With DX12, support for multiple GPUs is baked into the API, allowing separable and contiguous workloads to be executed in parallel on different GPUs, regardless of whether they come from Intel, AMD, or Nvidia. Post-processing in particular stands to gain a lot from Explicit Multiadapter. By offloading some of the post-processing to a second GPU, the first GPU is free to start on the next frame much sooner.
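A back-of-the-envelope model shows why that offload could help. The Python sketch below is an idealised assumption rather than anything measured: the function names and the 12 ms and 4 ms stage times are made up, and `transfer_ms` is a hypothetical allowance for copying the frame between GPUs.

```python
def frame_rate_single_gpu(render_ms, post_ms):
    # One GPU runs both stages back to back for every frame.
    return 1000.0 / (render_ms + post_ms)


def frame_rate_two_gpus(render_ms, post_ms, transfer_ms=0.0):
    # With the stages pipelined across two GPUs, throughput is set by the slower stage.
    # The copy between adapters is charged to the second GPU in this simple model.
    return 1000.0 / max(render_ms, post_ms + transfer_ms)


print(frame_rate_single_gpu(12, 4))          # 62.5
print(round(frame_rate_two_gpus(12, 4), 1))  # 83.3
```

In this toy case the second GPU only needs to be fast enough to finish post-processing before the next frame arrives, which is why even a modest iGPU could, in principle, contribute.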

Tied into this is Split-Frame Rendering (SFR). Instead of multiple GPUs rendering an entire frame each, a process known as Alternate Frame Rendering (AFR), each frame is split into tiles for each GPU to render before being transferred to a display. In theory, this should eliminate much of the frame variance that afflicts current multi-GPU CrossFire and SLI setups.
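The difference between the two schemes is easiest to see as a work-assignment problem. In the hedged Python sketch below, `afr_assignment` and `sfr_bands` are hypothetical helpers, and SFR is simplified to horizontal bands rather than whatever tile layout a real engine would choose.

```python
def afr_assignment(frame_index, gpu_count):
    # Alternate Frame Rendering: whole frames are handed out round-robin.
    return frame_index % gpu_count


def sfr_bands(height, gpu_count):
    # Split-Frame Rendering (simplified): cut every frame into horizontal bands,
    # one per GPU. Returns (gpu, y_start, y_end) tuples with y_end exclusive.
    base, extra = divmod(height, gpu_count)
    bands, y = [], 0
    for gpu in range(gpu_count):
        rows = base + (1 if gpu < extra else 0)  # spread leftover rows over the first GPUs
        bands.append((gpu, y, y + rows))
        y += rows
    return bands


print([afr_assignment(i, 2) for i in range(4)])  # [0, 1, 0, 1]
print(sfr_bands(1080, 2))                        # [(0, 0, 540), (1, 540, 1080)]
```

Under AFR each GPU works on a different frame, so uneven frame costs show up as uneven pacing; under SFR every GPU contributes to the same frame.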

Finally, DX12 will allow for multiple GPUs to pool their memory. If you've got two 4GB graphics cards in your machine, the game will have access to the full 8GB.

The benchmark

Unfortunately, neither Explicit Multiadapter nor Split-Frame Rendering is currently supported in the Ashes benchmark (trials for each are due to arrive soon). Thankfully, the rest of DX12 is supported, and the Ashes benchmark does a surprisingly good job of digging out performance data from the nascent API.

The benchmark runs a three-minute real-time demo that executes live game code, complete with AI scripts, audio processing, physics, and more. That means the benchmark isn't exactly the same each time, but the variations are low enough to reliably establish larger performance trends.

Once the benchmark has completed, it spits out a huge amount of useful data. At a high level this includes the overall average frame rate as well as a breakdown by the scene categories of normal, medium, and heavy. The normal scene features a low number of draw calls, around 10,000. The medium scene doubles this to around 20,000, while the heavy scenes push things further still. While the overall average frame rate can be useful as a rough indicator of performance, it's the heavier scenes that are of particular interest for testing DirectX 12.

Because there are more draw calls, the CPU has to do more work dishing them out to the GPU, which is a good indicator of CPU performance. In theory, the faster the CPU and the more cores it has, the more draw calls it can send to the GPU. Ashes also adds a "percent GPU bound" score to the results, which shows whether it's the GPU or the CPU that's proving to be the bottleneck in a particular scene. The benchmark tracks the GPU to see if it has completed its work before the game sends the next frame to be rendered. If it hasn't, then the CPU must wait for the GPU, and thus it's the GPU that's the bottleneck. If the percentage is anything below 99 percent, then it's the CPU that's beginning to struggle to keep up with the GPU.
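As a rough illustration of how such a score could be derived, the Python sketch below counts the frames in which the GPU took longer than the CPU. The benchmark's exact method isn't described here, so the comparison rule, the `percent_gpu_bound` name, and the sample timings are all assumptions.

```python
def percent_gpu_bound(cpu_ms, gpu_ms):
    """Share of frames (0-100) in which the CPU would have had to wait for the GPU."""
    if not cpu_ms or len(cpu_ms) != len(gpu_ms):
        raise ValueError("need two equally long, non-empty lists of frame times")
    # Assumed rule: a frame is GPU bound when the GPU's time exceeds the CPU's.
    waiting = sum(1 for cpu, gpu in zip(cpu_ms, gpu_ms) if gpu > cpu)
    return 100.0 * waiting / len(cpu_ms)


cpu_times = [10, 10, 18, 10]  # hypothetical per-frame CPU times in milliseconds
gpu_times = [14, 15, 12, 16]  # hypothetical per-frame GPU times in milliseconds
print(percent_gpu_bound(cpu_times, gpu_times))  # 75.0
```

A result of 75 percent, as in this made-up run, would suggest the CPU was the limiting factor in a quarter of the frames.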

But what if we had an infinitely fast GPU? Ashes provides that data too via the CPU frame rate. It shows the theoretical frame rate of a GPU that could process all of the data the CPU throws at it, if that CPU is doing so quicker than the GPU can handle. This gives a good indication of the performance gulf between, say, a six-core system with hyperthreading and a quad-core system without, or a stock-clocked CPU compared to one that's been overclocked.
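The idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: `cpu_frame_rate` simply ignores GPU time, `delivered_frame_rate` assumes each frame takes as long as the slower of the two processors, and the timings are invented.

```python
def cpu_frame_rate(cpu_ms):
    # Frame rate if the GPU were infinitely fast: only CPU time per frame counts.
    return 1000.0 / (sum(cpu_ms) / len(cpu_ms))


def delivered_frame_rate(cpu_ms, gpu_ms):
    # Assumed model: each frame takes as long as the slower of CPU and GPU.
    frame_ms = [max(cpu, gpu) for cpu, gpu in zip(cpu_ms, gpu_ms)]
    return 1000.0 / (sum(frame_ms) / len(frame_ms))


cpu_samples = [8, 10, 12, 10]   # hypothetical CPU milliseconds per frame
gpu_samples = [16, 20, 24, 20]  # hypothetical GPU milliseconds per frame
print(cpu_frame_rate(cpu_samples))                     # 100.0
print(delivered_frame_rate(cpu_samples, gpu_samples))  # 50.0
```

The gap between the two figures is the headroom the CPU has in reserve, which is what makes the number useful for comparing processors independently of the graphics card.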

Other useful data includes weighted frame rates, where the benchmark squares the millisecond timing of each frame and then takes the square root at the end, weighting slower frames more than faster frames.
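That description matches a root-mean-square of the frame times. The Python sketch below assumes the squared timings are averaged before the square root is taken and that the result is then converted to frames per second; neither step is spelled out by the benchmark, and the function names and sample data are hypothetical.

```python
import math


def average_frame_rate(frame_ms):
    # Plain average: every frame counts equally.
    return 1000.0 / (sum(frame_ms) / len(frame_ms))


def weighted_frame_rate(frame_ms):
    # Root-mean-square of the frame times: squaring makes slow frames count for more.
    rms_ms = math.sqrt(sum(t * t for t in frame_ms) / len(frame_ms))
    return 1000.0 / rms_ms


samples = [10, 10, 10, 50]  # three smooth frames and one 50 ms hitch
print(average_frame_rate(samples))             # 50.0
print(round(weighted_frame_rate(samples), 1))  # 37.8
```

A single hitch drags the weighted figure well below the plain average, so the metric rewards consistent frame delivery rather than raw throughput.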

You can go totally nuts with the data from Ashes of the Singularity if you want to. Among other metrics, the benchmark can record individual frame times for the CPU and GPU, the approximate total frame time spent in the driver, the number of commands sent to the GPU for that frame, and whether the game was CPU or GPU bound at that point.