One of the more exciting features built into Windows 10 is DirectX 12, a new programming interface that promises to modernize the way games talk to graphics chips.

Prior versions of DirectX—and specifically its graphics-focused component, known as Direct3D—are used by the vast majority of today’s PC games, but they’re not necessarily a good fit for how modern GPUs really work. These older APIs tend to impose more overhead than necessary on the graphics driver and CPU, and they’re not always terribly effective at keeping the GPU fed with work. Both of these problems tend to sap performance. Thus, DirectX has often been cited as the culprit when console games make a poor transition to the PC platform in spite of the PC’s massive advantage in raw power.

Although, honestly, you can’t blame an API for something like the Arkham Knight mess. Console ports have other sorts of problems, too.

Anyhow, by offering game developers more direct, lower-level access to the graphics processor, DirectX 12 promises to unlock new levels of performance in PC gaming. This new API also exposes a number of novel hardware features not accessible in older versions of Direct3D, opening up the possibility of new techniques that provide richer visuals than previously feasible in real-time rendering.

So yeah, there’s plenty to be excited about.

DirectX 12 is Microsoft’s baby, and it’s not just a PC standard. Developers will also use it on the Xbox One, giving them a unified means of addressing two major gaming platforms at once.

That’s why there’s perhaps no better showcase for DX12 than Fable Legends, the upcoming game from Lionhead Studios. Game genres have gotten wonderfully and joyously scrambled in recent years, but I think I’d describe Legends as a free-to-play online RPG with MOBA and FPS elements. Stick that in yer pipe and smoke it. Legends will be exclusive to the Xbox One and Windows 10, and it will take advantage of DX12 on the PC as long as a DirectX 12-capable graphics card is present.

In order to demonstrate the potential of DX12, Microsoft has cooked up a benchmark based on a pre-release version of Fable Legends. We’ve taken it for a spin on a small armada of the latest graphics cards, and we have some interesting results to share.

This Fable Legends benchmark looks absolutely gorgeous, thanks in part to the DirectX 12 API and Unreal Engine 4. The artwork is stylized in a not-exactly-photorealistic fashion, but the demo features a tremendously complex set of environments. The video above utterly fails to do it justice, thanks both to YouTube’s compression and a dreaded 30-FPS cap on my video capture tool. The animation looks much smoother coming directly from a decent GPU.

To my eye, the Legends benchmark represents a new high-water mark in PC game visuals for this reason: a near-complete absence of the shimmer, crawling, and sparkle caused by high-frequency noise—both on object edges and inside of objects. (Again, you’d probably have to see it in person to appreciate it.) This sheer solidity makes Legends feel more like an offline-rendered scene than a real-time PC game. As I understand it, much of the credit for this effect belongs to the temporal anti-aliasing built into Unreal Engine 4. This AA method evidently offers quality similar to full-on supersampling with less of a performance hit. Here’s hoping more games make use of it in the future.

DX12 is a relatively new creation, and Fable Legends has clearly been in development for quite some time. The final game will work with DirectX 11 as well as DX12, and it was almost surely developed with the older API and its requirements in mind. The question, then, is: how exactly does Legends take advantage of DirectX 12? Here’s Microsoft’s statement on the matter.

Lionhead Studios has made several additions to the engine to implement advanced visual effects, and has made use of several new DirectX 12 features, such as Async Compute, manual Resource Barrier tracking, and explicit memory management to help the game achieve the best possible performance.

That’s not a huge number of features to use, given everything DX12 offers. Still, the memory management and resource tracking capabilities get at the heart of what this lower-level API is supposed to offer. The game gets to manage video memory itself, rather than relying on the GPU driver to shuffle resources around.

Asynchronous compute shaders, meanwhile, have been getting a lot of play in certain pockets of the ‘net since the first DX12 benchmark, built around Oxide Games’ Ashes of the Singularity, was released. This feature allows the GPU to execute multiple kernels (or basic programs) of different types simultaneously, and it could enable more complex effects to be created and included in each frame.

Early tests have shown that the scheduling hardware in AMD’s graphics chips tends to handle async compute much more gracefully than Nvidia’s chips do. That may be an advantage AMD carries over into the DX12 generation of games. However, Nvidia says its Maxwell chips can support async compute in hardware—it’s just not enabled yet. We’ll have to see how well async compute works on newer GeForces once Nvidia turns on its hardware support.

For now, well, I suppose we’re about to see how the latest graphics cards handle Fable Legends. Let’s take a look.

Our testing methods

The graphics cards we used for testing are listed below. Please note that many of them are not stock-clocked reference cards but actual consumer products with faster clock speeds. For example, the GeForce GTX 980 Ti we tested is the Asus Strix model that won our recent roundup. Similarly, the Radeon R9 Fury and 390X cards are also Asus Strix cards with tweaked clock frequencies. We prefer to test with consumer products when possible rather than reference parts, since those are what folks are more likely to buy and use.

As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:

Processor: Core i7-5960X
Motherboard: Gigabyte X99-UD5 WiFi
Chipset: Intel X99
Memory size: 16GB (4 DIMMs)
Memory type: Corsair Vengeance LPX DDR4 SDRAM at 2133 MT/s
Memory timings: 15-15-15-36 2T
Hard drive: Kingston SSDNow 310 960GB SATA
Power supply: Corsair AX850
OS: Windows 10 Pro

Card | Driver revision | GPU base core clock (MHz) | GPU boost clock (MHz) | Memory clock (MHz) | Memory size (MB)
Sapphire Nitro R7 370 | Catalyst 15.201 beta | – | 985 | 1400 | 4096
MSI Radeon R9 285 | Catalyst 15.201 beta | – | 973 | 1375 | 2048
XFX Radeon R9 390 | Catalyst 15.201 beta | – | 1015 | 1500 | 4096
Asus Strix R9 390X | Catalyst 15.201 150922a | – | 1070 | 1500 | 8192
Radeon R9 Nano | Catalyst 15.201 150922a | – | 1000 | 500 | 4096
Asus Strix R9 Fury | Catalyst 15.201 150922a | – | 1000 | 500 | 4096
Radeon R9 Fury X | Catalyst 15.201 150922a | – | 1050 | 500 | 4096
Gigabyte GTX 950 | GeForce 355.82 | 1203 | 1405 | 1750 | 2048
MSI GeForce GTX 960 | GeForce 355.82 | 1216 | 1279 | 1753 | 2048
MSI GeForce GTX 970 | GeForce 355.82 | 1114 | 1253 | 1753 | 4096
Gigabyte GTX 980 | GeForce 355.82 | 1228 | 1329 | 1753 | 4096
Asus Strix GTX 980 Ti | GeForce 355.82 | 1216 | 1317 | 1800 | 6144

Thanks to Intel, Corsair, Kingston, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Fable Legends performance at 1920×1080

The Legends benchmark is simple enough to use. You can run a test with one of three pre-baked options. The first option uses the game’s “ultra” quality settings at 1080p. The second uses “ultra” at 3840×2160. The third choice is meant for integrated graphics solutions; it drops down to the “low” quality settings at 1280×720.

The demo spits out tons of data in a big CSV file, and blessedly, the time to render each frame is included. Naturally, I’ve run the test on a bunch of cards and have provided the frame time data below. You can click through the buttons to see a plot taken from one of the three test instances we ran for each card. We’ll start with the ultra-quality results at 1920×1080.
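For anyone who wants to poke at that CSV themselves, here is a minimal sketch of pulling the per-frame render times out of it. The column name `frame_time_ms` and the sample rows are placeholders of my own; the benchmark's actual header names and layout aren't documented in this article, so adjust accordingly.

```python
import csv
import io


def load_frame_times(csv_text, column="frame_time_ms"):
    """Return per-frame render times (in milliseconds) from CSV text.

    The default column name is a hypothetical placeholder, not the
    benchmark's real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


# Made-up sample data standing in for the benchmark's output file.
sample = "frame,frame_time_ms\n0,16.2\n1,17.9\n2,41.5\n"
frame_times = load_frame_times(sample)
print(frame_times)
```

With a real log on disk, you would read the file's contents into a string first and pass that in.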

Browse through all of the plots above and you’ll notice something unusual: all of the cards produce the same number of frames, regardless of how fast or slow they are. That’s not what you’d generally get out of a game, but the Legends benchmark works like an old Quake timedemo. It produces the same set of frames on each card, and the run time varies by performance. That means the benchmark is pretty much completely deterministic, which is nice.

The next thing you’ll notice is that some of the cards have quite a few more big frame-time spikes than others. The worst offenders are the GeForce GTX 950 and 960 and the Radeon R9 285. All three of those cards have something in common: only 2GB of video memory onboard. Although by most measures the Radeon R7 370 has the slowest GPU in this test, its 4GB of memory allows it to avoid some of those spikes.

The GeForce GTX 980 Ti is far and away the fastest card here in terms of FPS averages. The 980 Ti’s lead is a little larger than we’ve seen in the past, probably due to the fact that we’re testing with an Asus Strix card that’s quite a bit faster than the reference design. We reviewed a bunch of 980 Ti cards in our recent roundup, and the Strix was our top pick.

The 980 Ti comes back to the pack a little with our 99th-percentile frame time metric, which can be something of an equalizer. The card is fast generally, but it does struggle with a portion of the frames it renders, like all of the cards do.
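To make those two metrics concrete, here is a rough sketch of how an FPS average and a 99th-percentile frame time can be computed from a list of frame times. Since the benchmark renders the same frames on every card, the FPS average is just the frame count divided by the total run time. The nearest-rank percentile method and the sample numbers are assumptions for illustration, not necessarily what our own tools do.

```python
import math


def average_fps(frame_times_ms):
    """Frames rendered divided by total run time in seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile: pct percent of frames finish at or under this time."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


# Made-up run: 98 quick frames plus two slow ones.
run = [10.0] * 98 + [40.0, 50.0]
print(round(average_fps(run), 1))      # FPS average
print(percentile_frame_time(run))      # 99th-percentile frame time in ms
```

Notice how two slow frames out of a hundred barely dent the FPS average but dominate the 99th-percentile number. That is why the percentile metric tends to act as an equalizer.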

The frame time curves illustrate what happens with the most difficult frames to render.

All of the highest-end Radeons and GeForces look pretty strong here. Each of them struggles slightly with the most demanding one to two percent of frames, but the tail of each curve barely rises above 33 milliseconds, which translates to 30 FPS. Not bad.
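If you're curious how a curve like that can be built, one simple approach is to sort the frame times and pair each one with the percentage of frames rendered at least that quickly. This is a sketch under that assumption, with invented sample data; the article doesn't spell out the exact construction behind our plots.

```python
def frame_time_curve(frame_times_ms):
    """Return (percentile, frame_time_ms) points, slowest frames at the tail."""
    ordered = sorted(frame_times_ms)
    count = len(ordered)
    return [((i + 1) * 100.0 / count, t) for i, t in enumerate(ordered)]


# Four made-up frames, deliberately out of order.
for percentile, frame_time in frame_time_curve([20.0, 10.0, 40.0, 30.0]):
    print(f"{percentile:5.1f}% of frames took {frame_time} ms or less")
```

The last few points on the right-hand side of such a curve are the "tail" discussed above, where the most demanding frames live.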

These “time spent beyond X” graphs are meant to show “badness,” those instances where animation may be less than fluid—or at least less than perfect. The 50-ms threshold is the most notable one, since it corresponds to a 20-FPS average. We figure if you’re not rendering any faster than 20 FPS, even for a moment, then the user is likely to perceive a slowdown. 33 ms correlates to 30 FPS or a 30Hz refresh rate. Go beyond that with vsync on, and you’re into the bad voodoo of quantization slowdowns. 16.7 ms correlates to 60 FPS, that golden mark that we’d like to achieve (or surpass) for each and every frame, and 8.3 ms is a relatively new addition that equates to 120Hz, for those with fast gaming displays.
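Here is a minimal sketch of the idea, assuming the metric adds up only the portion of each frame time that exceeds the threshold (so a 60-ms frame contributes 10 ms against the 50-ms mark). The function names and sample frame times are made up for illustration.

```python
def fps_to_ms(fps):
    """Frame time budget in milliseconds for a given frame rate."""
    return 1000.0 / fps


def time_beyond(frame_times_ms, threshold_ms):
    """Total milliseconds spent past the threshold, summed over slow frames."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# Thresholds derived from the frame rates discussed above.
thresholds = {fps: fps_to_ms(fps) for fps in (20, 30, 60, 120)}

# Made-up frame times in milliseconds.
frames = [10.0, 20.0, 40.0, 60.0]
for fps, limit in thresholds.items():
    print(f"beyond {limit:.1f} ms ({fps} FPS): {time_beyond(frames, limit):.1f} ms")
```

Counting only the excess, rather than the whole frame time, means a frame that barely misses the threshold adds almost nothing, while a long hitch adds a lot.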

As you can see, only the four slowest cards here spend any time beyond the 50-ms threshold, which means the rest of the GPUs are doing a pretty good job at pumping out some prime-quality eye candy without many slowdowns. Click to the 33-ms threshold, and you’ll see a similar picture, too. Unfortunately, a perfect 60 FPS is elusive for even the top GPUs, as the 16.7-ms results illustrate.

Now that we have all of the data before us, I have a couple of impressions to offer. First, although the GeForce cards look solid generally, the Hawaii-based Radeons from AMD perform especially well here. The R9 390X outdoes the pricier GeForce GTX 980, and the Radeon R9 390 beats out the GTX 970.

There is a big caveat to remember, though. In power consumption tests, our GPU test rig pulled 449W at the wall socket when equipped with an R9 390X, versus 282W with a GTX 980. The delta between the R9 390 and GTX 970 was similar, at 121W.

That said, the R9 390 and 390X look pretty darned good next to R9 Fury and Fury X, too. The two Fury cards are only marginally quicker than their Hawaii-based siblings. Perhaps the picture will change at a higher resolution?

Fable Legends performance at 3840×2160

I’ve left the Radeon R7 370 and GeForce GTX 950 out of my 4K tests, and I’ve snuck in another contender. I probably should have left out the GeForce GTX 960 and Radeon R9 285, which have no business attempting this feat in 4K.

We’ve sliced and diced these frame-time distributions in multiple ways, but the story these results tell is the same throughout: the GeForce GTX 980 Ti is easily the best performer here, and only it is fast enough to achieve nearly a steady 30 frames per second. The 980 Ti’s 99th-percentile result is 33.8 ms, just a tick above the 33.3-ms threshold that equates to 30 FPS.

The Fury X isn’t too far behind, and it leads a pack of Radeons that all perform pretty similarly. Once again, there’s barely any daylight between the Fury and the 390X. The Fiji GPU used in the Fury and Fury X is substantially faster than the Hawaii GPU driving the 390 and 390X in terms of texturing and shader processing power, but it’s really no faster in terms of geometry throughput and pixel-pushing power via the ROP units. One or both of those two constraints could be coming into play here.

CPU core and thread scaling

I’m afraid I haven’t had time to pit the various integrated graphics solutions against one another in this Fable Legends test, but I was able to take a quick look at how the two fastest graphics chips scale up when paired with different CPU configs. Since the new graphics APIs like DirectX 12 are largely about reducing CPU overhead, that seemed like the thing to do.

For this little science project, I used the fancy firmware on the Gigabyte X99 boards in my test rigs to enable different numbers of CPU cores on their Core i7-5960X processors. I also selectively disabled Hyper-Threading. The end result was a series of tests ranging from a single-core CPU config with a single thread (1C/1T) through to the full-on 5960X with eight cores and 16 threads (8C/16T).

Interesting. The sweet spot with the Radeon looks to be the four-core, four-thread config, while the GeForce prefers the 6C/6T config. Perhaps Nvidia’s drivers use more threads internally. The performance with both cards suffers a little with eight cores enabled, and it drops even more when Hyper-Threading is turned on.

Why? Part of the answer is probably pretty straightforward: this application doesn’t appear to make very good use of more than four to six threads. Given that fact, the 5960X probably benefits from the power savings of having additional cores gated off. If turning off those cores saves power, then the CPU can probably spend more time running at higher clock speeds via Turbo Boost as a result.

I’m not sure what to make of the slowdown with Hyper-Threading enabled. Simultaneous multi-threading on a CPU core does require some resource sharing, which can dampen per-thread performance. However, if the operating system scheduler is doing its job well, then multiple threads should only be scheduled on a CPU core when other cores are already occupied—at least, I expect that’s how it should work on a desktop CPU. Hmmm.

The curves flatten out a bit when we raise the resolution and image quality settings because GPU speed constraints come into play, but the trends don’t change much. In this case, the Fury X doesn’t benefit from more than two CPU cores.

Perhaps we can examine CPU scaling with a lower-end CPU at some point.

So now what?

We’ve now taken a look at one more piece of the DirectX 12 puzzle, and frankly, the performance results don’t look a ton different than what we’ve seen in current games.

The GeForce cards perform well generally, in spite of this game’s apparent use of asynchronous compute shaders. Cards based on AMD’s Hawaii chips look relatively strong here, too, and they kind of embarrass the Fiji-based R9 Fury offerings by getting a little too close for comfort, even in 4K. One would hope for a stronger showing from the Fury and Fury X in this case.

But, you know, it’s just one benchmark based on an unreleased game, so it’s nothing to get too worked up about one way or another. I do wish we could have tested DX12 versus DX11, but the application Microsoft provided only works in DX12. We’ll have to grab a copy of Fable Legends once the game is ready for public consumption and try some side-by-side comparisons.
