At the Game Developers Conference this year, Microsoft pulled back the curtain on Direct3D 12, the first major update to its graphics APIs since 2009. The company announced some pretty big changes, including support for a lower level of abstraction and compatibility with not just Windows, but also Windows Phone and the Xbox One. This will be the first version of Direct3D to unify graphics programming across all of Microsoft’s gaming platforms. It may also be the first version of Direct3D to eke significant performance gains out of current hardware.

I already covered some of those developments in a couple of news posts during the GDC frenzy. Now that I’m back home with all my notes from various sessions and meetings with Microsoft and GPU vendors, I can go into a little more detail. In this article, I’ll try to explore Direct3D 12’s inception, the key ways in which it differs from Direct3D 11, and what AMD and Nvidia think about it.

First, though, let’s straighten something out. We and others have occasionally referred to Microsoft’s new API as DirectX 12, but what premiered at GDC was technically Direct3D 12, the graphics component of DirectX 12. As Microsoft’s Matt Sandy wrote on the official DirectX blog, DirectX 12 will also encompass “other technologies” that “may be previewed at a later date,” including “cutting-edge tools for developers.” That’s probably why we haven’t heard anything about, say, the next version of DirectCompute. The news so far has centered solely on Direct3D.

Now that that’s all cleared up, let’s take a closer look at Microsoft’s new graphics API—starting with a little history.

Direct3D 12’s inception

Given the way Microsoft has presented Direct3D 12, it’s hard not to draw parallels with AMD’s Mantle API. Mantle was introduced last September, and much like D3D12, it provides a lower level of abstraction that lets developers write code closer to the metal. The result, at least in theory, is lower CPU overhead and better overall performance—the same perks Microsoft promises for D3D12.

The question, then, almost asks itself. Did AMD’s work on Mantle motivate Microsoft to introduce a lower-level graphics API?

When I spoke to AMD people a few hours after the D3D12 reveal, I got a strong sense that that wasn’t the case—and that it was developers, not AMD, who had spearheaded the push for a lower-level graphics API on Windows. Indeed, at the keynote, Microsoft’s Development Manager for Graphics, Anuj Gosalia, made no mention of Mantle. He stated that “engineers at Microsoft and GPU manufacturers have been working at this for some time,” and he added that D3D12 was “designed closely with game developers.”

I then talked with Ritche Corpus, AMD’s Software Alliances and Developer Relations Director. Corpus told me that AMD shared its work on Mantle with Microsoft “from day one” and that parts of Direct3D 12 are “very similar” to AMD’s API. I asked if D3D12’s development had begun before Mantle’s. Corpus’ answer: “Not that we know.” Corpus explained that, when AMD was developing Mantle, it received no feedback from game developers that would suggest AMD was wasting its time because a similar project was underway at Microsoft. I recalled that, at AMD’s APU13 event in November 2013, EA DICE’s Johan Andersson expressed a desire to use Mantle “everywhere and on everything.” Those are perhaps not the words I would have used if I had known D3D12 was right around the corner.

The day after the D3D12 keynote, I got on the phone with Tony Tamasi, Nvidia’s Senior VP of Content and Technology. Tamasi painted a rather different picture than Corpus. He told me D3D12 had been in the works for “more than three years” (longer than Mantle) and that “everyone” had been involved in its development. As he pointed out, people from AMD, Nvidia, Intel, and even Qualcomm stood on stage at the D3D12 reveal keynote. Those four companies’ logos are also featured prominently on the current landing page for the official DirectX blog.

Tamasi went on to note that, since development cycles for new GPUs span “many years,” there was “no possible way” Microsoft could have slapped together a new API within six months of Mantle’s public debut.

Seen from that angle, it does seem quite far-fetched that Microsoft could have sprung a new graphics API on a major GPU vendor without giving them years to prepare—or, for that matter, requesting their input throughout the development process. AMD is hardly a bit player in the GPU market, and its silicon powers Microsoft’s own Xbox One console, which will be one of the platforms supporting D3D12 next year. I’m not sure what Microsoft would stand to gain by keeping AMD out of the loop.

I think it’s entirely possible AMD has known about D3D12 from the beginning, that it pushed ahead with Mantle anyhow in order to secure a temporary advantage over the competition, and that it’s now seeking to embellish its part in D3D12’s creation. It’s equally possible AMD was entirely forthright with us, and that Nvidia is simply trying to downplay the extent of its competitor’s influence.

In any event, as we’re about to see, D3D12 indeed shares some notable similarities with Mantle. More importantly, it delivers something developers seem to have wanted for some time: a multi-vendor Windows graphics API that offers a console-like level of abstraction. Whatever part AMD played, it seems developers and gamers alike stand to benefit.

Direct3D 12 on existing hardware

Direct3D 12’s lower abstraction level takes the form of a new programming model, and that programming model will be supported on a broad swath of current hardware. AMD has pledged support for all of its current offerings based on the Graphics Core Next architecture, while Nvidia did the same for all of its DirectX 11-class chips (spanning the Fermi, Kepler, and Maxwell architectures). Intel, meanwhile, pledged support for the integrated graphics in its existing Haswell processors (a.k.a. 4th-generation Core).

Beyond the PC, Direct3D 12’s new programming model will also be exploitable on the Xbox One console and on Windows Phone handsets. Microsoft hasn’t yet said which versions of Windows on the desktop will support Direct3D 12, but it dropped some hints. During the Q&A following the reveal keynote, Microsoft’s Gosalia ruled out Windows XP support, but he declined to give a categorical answer about Windows 7.

Sandy’s blog post identified four key changes that D3D12 makes to the Direct3D programming model: pipeline state objects, command lists, bundles, and descriptor heaps and tables. These are all about lowering the abstraction level and giving developers better control over the hardware. Those of you well-acquainted with Mantle may find that some of those constructs have a familiar ring to them. That familiarity may be partly due to AMD’s role (whether direct or indirect) in Direct3D 12’s development, but I suspect it’s explainable to a large degree by the fact that both D3D12 and Mantle are low-level graphics APIs closely tailored to the behavior of modern GPUs.

For instance, Mantle’s monolithic pipelines roll the graphics pipeline into a single object. Direct3D 12 similarly groups the graphics pipeline into “pipeline state objects,” or PSOs. Here is how Sandy describes them:

Direct3D 12 . . . [unifies] much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.

Gosalia says PSOs “wrap very efficiently to actual GPU hardware.” That’s in contrast to Direct3D 11’s higher-level representation of the graphics pipeline, which induces higher overhead. “For example,” Sandy explains, “many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time.” D3D11’s approach increases overhead and limits the number of draw calls that can be issued per frame.
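For readers who find code easier to follow than prose, here is a toy Python model of the difference Sandy describes. It is not Direct3D code: every class and function below is hypothetical, and the “translations” counter is an assumed stand-in for the CPU work of converting API state into native GPU state. The D3D11-style path is also deliberately pessimistic, since real drivers cache translated state rather than redoing it on every draw.

```python
from dataclasses import dataclass


class ToyDriver:
    """Hypothetical driver that counts how often it turns API state into
    native GPU state, the expensive CPU-side step."""

    def __init__(self) -> None:
        self.translations = 0

    def translate(self, pixel_shader: str, blend: str) -> tuple:
        self.translations += 1
        # Many GPUs fold pixel shader and output-merger (blend) state into a
        # single native representation, so both pieces are needed together.
        return ("native", pixel_shader, blend)


class D3D11StyleContext:
    """State is set piece by piece, so it is only final at draw time."""

    def __init__(self, driver: ToyDriver) -> None:
        self.driver = driver
        self.pixel_shader = ""
        self.blend = ""

    def set_pixel_shader(self, pixel_shader: str) -> None:
        self.pixel_shader = pixel_shader

    def set_blend_state(self, blend: str) -> None:
        self.blend = blend

    def draw(self) -> tuple:
        # Pessimistic on purpose: resolve the combined state on every draw.
        return self.driver.translate(self.pixel_shader, self.blend)


@dataclass(frozen=True)
class PipelineStateObject:
    """Immutable state, translated once when the object is created."""
    native: tuple


def create_pso(driver: ToyDriver, pixel_shader: str, blend: str) -> PipelineStateObject:
    return PipelineStateObject(driver.translate(pixel_shader, blend))


def draw_with_pso(pso: PipelineStateObject) -> tuple:
    # Switching to this PSO only copies precomputed state; nothing is translated.
    return pso.native


if __name__ == "__main__":
    old_driver = ToyDriver()
    ctx = D3D11StyleContext(old_driver)
    ctx.set_pixel_shader("ps_lit")
    ctx.set_blend_state("alpha")
    for _ in range(1000):
        ctx.draw()

    new_driver = ToyDriver()
    pso = create_pso(new_driver, "ps_lit", "alpha")
    for _ in range(1000):
        draw_with_pso(pso)

    print(old_driver.translations, new_driver.translations)  # prints: 1000 1
```

The design point is where the work lands: the PSO pays its translation cost once, at creation, so switching between PSOs mid-frame costs next to nothing.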

D3D12 also replaces D3D11’s context-based execution model with something called command lists, which sound pretty comparable to Mantle’s command buffers. Here’s Sandy’s explanation again:

Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
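The shape of that model can be sketched in a few lines of Python. This is a hypothetical illustration, not the D3D12 API: the class names, the command tuples, and the four-way split of the scene are all assumptions. Because of Python’s global interpreter lock, the threads here show the structure (independent recording, then one ordered submission) rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommandList:
    """Hypothetical self-contained command list; it inherits no state."""
    name: str
    commands: List[Tuple] = field(default_factory=list)

    def set_pipeline_state(self, pso: str) -> None:
        self.commands.append(("set_pso", pso))

    def set_resources(self, *resources: str) -> None:
        self.commands.append(("set_resources", resources))

    def draw(self, vertex_count: int) -> None:
        self.commands.append(("draw", vertex_count))


class CommandQueue:
    """The one serial step: handing finished lists to the GPU, in order."""

    def __init__(self) -> None:
        self.submitted: List[str] = []

    def execute(self, lists: List[CommandList]) -> None:
        for command_list in lists:
            self.submitted.append(command_list.name)


def record_chunk(chunk_id: int, objects: List[str]) -> CommandList:
    # Each worker sets every piece of state it needs, because a command list
    # cannot rely on state left behind by another list.
    command_list = CommandList(f"chunk{chunk_id}")
    command_list.set_pipeline_state("opaque_pso")
    for obj in objects:
        command_list.set_resources(f"{obj}_vertices", f"{obj}_texture")
        command_list.draw(3000)
    return command_list


scene = [f"object{i}" for i in range(16)]
chunks = [scene[i::4] for i in range(4)]  # split the scene across four workers

# Recording happens on four threads; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    recorded = list(pool.map(record_chunk, range(4), chunks))

queue = CommandQueue()
queue.execute(recorded)
```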

D3D12 takes things a step further with a construct called bundles, which let developers reuse commands in order to further reduce driver overhead:

In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation, bundles. Unlike command lists which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
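Sandy’s two-character example can be modeled the same way. In this hypothetical sketch, a bundle translates its draw commands once when it is recorded, and each playback simply reuses them with whatever texture the parent command list has bound. The names and vertex counts are made up for illustration and are not taken from the D3D12 API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bundle:
    """Hypothetical bundle: draw commands recorded and translated once.
    Texture bindings are not baked in; they come from the command list
    that plays the bundle back."""
    native_commands: List[Tuple] = field(default_factory=list)
    translations: int = 0

    def record_draw(self, vertex_count: int) -> None:
        # Translation to "native" form happens here, at record time only.
        self.translations += 1
        self.native_commands.append(("native_draw", vertex_count))


@dataclass
class CommandList:
    commands: List[Tuple] = field(default_factory=list)
    bound_texture: str = ""

    def set_texture(self, texture: str) -> None:
        self.bound_texture = texture
        self.commands.append(("set_texture", texture))

    def execute_bundle(self, bundle: Bundle) -> None:
        # Cheap playback: reuse the precomputed commands with the currently
        # bound texture instead of translating anything again.
        for command in bundle.native_commands:
            self.commands.append(command + (self.bound_texture,))


# Record one character model as a bundle (made-up vertex counts per part).
character = Bundle()
for part_vertices in (1200, 800, 300):
    character.record_draw(part_vertices)

# Play it back twice with different textures.
command_list = CommandList()
command_list.set_texture("knight_tex")
command_list.execute_bundle(character)
command_list.set_texture("orc_tex")
command_list.execute_bundle(character)
```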

Thanks to all of this shader and pipeline state caching, Gosalia says there should be “no more compiles in the middle of gameplay.” Draw-time shader compilation can cause hitches (or frame latency spikes) during gameplay—and developers bemoaned it at AMD’s APU13 event last year. Dan Baker of Oxide Games says that, in D3D12, we “shouldn’t have frame hitches caused by driver at all.”

Both Mantle and D3D12 introduce new ways to bind resources to the graphics pipeline, as well. D3D12’s model involves descriptor heaps, which don’t sound all that dissimilar to Mantle’s descriptor sets. Sandy explains:

Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.

In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
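Here is a rough sketch of the g-buffer case. The article doesn’t say exactly how D3D11 engines work around the limitation, so the per-material version assumes the simplest approach, one full-screen pass per material; real engines have other tricks. Everything here, including the g-buffer contents and function names, is hypothetical.

```python
from typing import List, Tuple


def resolve_per_material(gbuffer: List[int], textures: List[str]) -> Tuple[List[str], int]:
    """Simplified D3D11-style resolve: each pass can only see the texture bound
    for it, so the engine runs one full-screen pass per material."""
    image = [""] * len(gbuffer)
    passes = 0
    for material_id, texture in enumerate(textures):
        passes += 1
        for pixel, pixel_material in enumerate(gbuffer):
            if pixel_material == material_id:
                image[pixel] = texture
    return image, passes


def resolve_dynamic_index(gbuffer: List[int], textures: List[str]) -> Tuple[List[str], int]:
    """Dynamic indexing: one pass, and each pixel uses its stored material ID
    to index straight into the array of texture descriptors."""
    image = [textures[pixel_material] for pixel_material in gbuffer]
    return image, 1


# Toy g-buffer in which each pixel stores a material ID (made-up data).
gbuffer = [3, 0, 2, 2, 1, 0, 3, 1]
textures = [f"material{i}_texture" for i in range(4)]
```

Both functions produce the same image; only the second keeps its pass count at one no matter how many materials the scene uses.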

According to Sandy, descriptor heaps “match modern hardware and significantly improve performance.” The D3D11 approach is “highly abstracted and convenient,” he says, but it requires games to issue additional draw calls when resources need to be changed, which leads to higher overhead.
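A toy model of a descriptor heap and its tables may help show why changing tables is cheap. In this sketch (hypothetical names, not the D3D12 API), descriptors are written into a flat array once, and a table is nothing more than a start index and a length into that array.

```python
from dataclasses import dataclass
from typing import List


class DescriptorHeap:
    """Hypothetical flat array of descriptors written once, up front."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[str] = [""] * capacity
        self.next_free = 0

    def create_view(self, resource: str) -> int:
        if self.next_free >= len(self.slots):
            raise RuntimeError("descriptor heap is full")
        index = self.next_free
        # Stand-in for writing the hardware-native descriptor into memory.
        self.slots[index] = f"desc({resource})"
        self.next_free += 1
        return index


@dataclass(frozen=True)
class DescriptorTable:
    """A table is only a window (start, length) into the heap."""
    heap: DescriptorHeap
    start: int
    length: int

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.heap.slots[self.start + i]


def draw(table: DescriptorTable) -> List[str]:
    # A draw sees exactly the descriptors its table points at.
    return [table[i] for i in range(table.length)]


heap = DescriptorHeap(capacity=8)
for resource in ["rock_albedo", "rock_normal", "grass_albedo", "grass_normal"]:
    heap.create_view(resource)

# Changing materials between draws means pointing at a different sub-range,
# not rewriting any descriptors.
rock = DescriptorTable(heap, start=0, length=2)
grass = DescriptorTable(heap, start=2, length=2)
```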

According to Yuri Shtil, Senior Infrastructure Architect at Nvidia, the introduction of descriptor heaps transfers the responsibility of managing resources in memory from the driver to the application. In other words, it’s up to the developer to manage memory. This arrangement is again reminiscent of Mantle. AMD hailed Mantle’s manual memory allocation as a major improvement and as a means to make more efficient use of GPU memory.

Now, of course, lower-level abstraction of that sort can be a double-edged sword. Because developers have a greater level of control over what happens on the hardware, the driver and API are able to do less work—but this also leads to more opportunities for things to go wrong. Here’s an example from Nvidia’s Tamasi:

Think about memory management, for example. The way DirectX 11 works is, if you want to allocate a texture, before you can use it, the driver basically pre-validates that that memory is resident on the GPU. So, there’s work going on in the driver and on the CPU to validate that that memory is resident. In a world where the developer controls memory allocation, they will already know whether they’ve allocated or de-allocated that memory. There’s no check that has to happen. Now, if the developer screws up and tries to render from a texture that isn’t resident, it’s gonna break, right? But because they have control of that, there’s no validation step that will need to take place in the driver, and so you save that CPU work.
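Tamasi’s trade-off can be expressed as a toy model. Both “drivers” below are hypothetical; the point is only that the D3D11-style driver does a residency check on every draw, while the D3D12-style one does none and leaves the application to get residency right. The string "GPU fault" is an assumed stand-in for whatever actually goes wrong on real hardware.

```python
from typing import Set


class GpuMemory:
    """Toy model of which textures currently live in video memory."""

    def __init__(self) -> None:
        self.resident: Set[str] = set()


class D3D11StyleDriver:
    """Driver checks residency on every use and fixes things up itself."""

    def __init__(self, memory: GpuMemory) -> None:
        self.memory = memory
        self.validation_checks = 0

    def draw(self, texture: str) -> str:
        self.validation_checks += 1  # CPU work spent on every draw
        if texture not in self.memory.resident:
            self.memory.resident.add(texture)  # driver pages it in
        return f"draw({texture})"


class D3D12StyleDriver:
    """Driver trusts the application, which manages residency itself."""

    def __init__(self, memory: GpuMemory) -> None:
        self.memory = memory
        self.validation_checks = 0  # stays at zero: no CPU-side check

    def draw(self, texture: str) -> str:
        # The membership test below models the GPU touching memory, not a
        # driver check: a texture the app forgot to make resident just fails.
        if texture in self.memory.resident:
            return f"draw({texture})"
        return "GPU fault"


memory_11 = GpuMemory()
driver_11 = D3D11StyleDriver(memory_11)
for _ in range(100):
    driver_11.draw("rock")

memory_12 = GpuMemory()
driver_12 = D3D12StyleDriver(memory_12)
memory_12.resident.add("rock")  # the app makes the texture resident
results = [driver_12.draw("rock") for _ in range(100)]
memory_12.resident.discard("rock")  # then evicts it too early
late = driver_12.draw("rock")
```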

Developers who would rather not deal with such risks won’t have to. According to Max McMullen, Microsoft’s Development Lead for Windows Graphics, D3D12 will give developers the option to use the more abstracted programming model from D3D11. “Every single algorithm that you can build on 11 right now, you can build on 12,” he said.

But getting one’s hands dirty with the lower-level programming model should pay some very real dividends. One of the demos shown at GDC was a custom D3D12 version of Futuremark’s 3DMark running on a quad-core Intel processor. The D3D12 demo used 50% less CPU time than the D3D11 version, and instead of dumping most of the workload on one CPU core, it spread the load fairly evenly across all four cores. The screenshots above show the differences in CPU utilization at the top left.

Oxide’s Baker mentioned other potential upsides to D3D12, including a “vast reduction in driver complexity” and “generally more responsive games . . . even at a lower frame rate.” D3D12 may not just extract additional performance and rendering complexity out of today’s hardware. It may also make games feel better in subtle but important ways. Also, if what Baker said about driver robustness checks out, PC gamers may waste less time waiting on game-specific driver fixes and optimizations from GPU manufacturers.

Direct3D 12 on future hardware

Developers will be able to exploit D3D12’s new programming model on a wide range of existing graphics processors. In addition to that programming model, however, D3D12 will introduce some new rendering features that will require new GPUs. Microsoft teased a couple of those rendering features at GDC: new blend modes and conservative rasterization.

I’m not entirely clear on what the new blend modes are supposed to do, but as I understand it, conservative rasterization will help with object culling (that is, hiding geometry that shouldn’t be seen, such as objects behind a wall) as well as hit detection.
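For the curious, here is what “conservative” means in a minimal rasterizer sketch. A standard rasterizer counts a pixel as covered only if the triangle contains the pixel’s center; a conservative one counts it if the triangle touches any part of the pixel, so thin or tiny geometry is never dropped, which is presumably why it helps with culling and hit detection. This sketch uses the common edge-offset test, which can overestimate slightly near sharp corners; it is an assumption for illustration, not a description of D3D12’s actual rasterization rules.

```python
from itertools import product
from typing import List, Set, Tuple

Point = Tuple[float, float]


def edge(v0: Point, v1: Point, p: Point) -> float:
    """Edge function: >= 0 when p lies on the inner side of a CCW edge v0->v1."""
    return (v1[0] - v0[0]) * (p[1] - v0[1]) - (v1[1] - v0[1]) * (p[0] - v0[0])


def covered_pixels(tri: List[Point], width: int, height: int,
                   conservative: bool) -> Set[Tuple[int, int]]:
    """Return the (x, y) pixels covered by a counter-clockwise triangle."""
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    hits = set()
    for px, py in product(range(width), range(height)):
        center = (px + 0.5, py + 0.5)
        inside = True
        for v0, v1 in edges:
            e = edge(v0, v1, center)
            if conservative:
                # Writing the edge function as a*x + b*y + c, its largest
                # value anywhere in the 1x1 pixel is the center value plus
                # half of |a| + |b|.
                a = -(v1[1] - v0[1])
                b = v1[0] - v0[0]
                e += 0.5 * (abs(a) + abs(b))
            if e < 0:
                inside = False
                break
        if inside:
            hits.add((px, py))
    return hits


# A sliver of a triangle, 0.2 pixels tall, that never covers a pixel center.
sliver = [(0.0, 0.1), (4.0, 0.1), (0.0, 0.3)]
standard = covered_pixels(sliver, 4, 2, conservative=False)
conservative = covered_pixels(sliver, 4, 2, conservative=True)
```

With standard rules the sliver vanishes entirely; with the conservative test it covers the whole bottom row of pixels it passes through.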

Nvidia’s Tamasi told us D3D12 includes a “whole bunch more” new rendering features beyond those Microsoft has already discussed. I expect we’ll hear more about them when Microsoft delivers the preview release of the new API to developers, which is scheduled to happen later this year.

Which next-gen GPUs will support those new features? We don’t know yet. Since the first D3D12 titles are expected in the 2015 holiday season, I would be surprised if Nvidia and AMD didn’t have new hardware with complete D3D12 support ready by then. Then again, neither AMD nor Nvidia has announced anything of the sort yet. We’ll have to wait and see what those companies have to say when Microsoft reveals D3D12’s full array of new rendering features.

AMD’s take and the future of Mantle

From the day Microsoft announced DirectX 12, AMD has made it clear that it’s fully behind the new API. Its message is simple: Direct3D 12 “supports and celebrates” the push toward lower-level abstraction that AMD began with Mantle last year—but D3D12 won’t be ready right away, and in the meantime, developers can use Mantle in order to get some of the same gains out of AMD hardware.

At GDC, AMD’s Corpus elaborated a little bit on that message. He told me Direct3D 12’s arrival won’t spell the end of Mantle. D3D12 doesn’t get quite as close to the metal of AMD’s Graphics Core Next GPUs as Mantle does, he claimed, and Mantle “will do some things faster.” Mantle may also be quicker to take advantage of new hardware, since AMD will be able to update the API independently without waiting on Microsoft to release a new version of Direct3D. Finally, AMD is talking to developers about bringing Mantle to Linux, where it would have no competition from Microsoft.

Corpus was adamant that developers will see value in adopting Mantle even today, with D3D12 on the horizon and no explicit support for Linux or future AMD GPUs. Because the API is similar to D3D12, it will give developers a “big head start,” he said, and we may see D3D12 launch titles “very early” as a result.

Naturally, AMD can motivate developers in other ways, too. While Corpus didn’t address that side of the equation, VG247 reported last year that Battlefield 4’s inclusion in the Gaming Evolved program—and its support for Mantle—involved a $5-8 million payment from AMD. That figure was never confirmed officially, but it’s no secret AMD’s and Nvidia’s developer relations and co-marketing programs often involve financial incentives. Supporting Mantle may be a financially lucrative proposition for some game studios.

Nvidia’s take

Nvidia seems to see lower-level graphics APIs as less of a panacea than AMD does. Tamasi told us that, while such APIs are “great,” they’re “not the only answer” because they’re “not necessarily great for everyone.” This statement goes back to what we said earlier about developers having manual control over things currently handled by the API and driver, such as GPU memory management. Engine programming gurus like DICE’s Johan Andersson and Epic’s Tim Sweeney might be perfectly happy to manage resources manually, but according to Tamasi, “a lot of folks wouldn’t.”

Nvidia also believes there’s still some untapped potential for efficiency improvements and overhead reduction in D3D11. Since Mantle’s debut six months ago, Nvidia has “redoubled” its efforts to curb CPU overhead, improve multi-core scaling, and use shader caching to address stuttering problems. (Tamasi freely admitted that Mantle’s release spurred the initiative. “AMD and Mantle should get credit for revitalizing . . . and getting people fired up,” he said.)

We saw first-hand the results of Nvidia’s work two months ago. In a CPU-limited Battlefield 4 test, Nvidia’s Direct3D driver clearly performed better than AMD’s. That optimization work is still ongoing.

The performance data above, supplied to us by Nvidia, shows performance improvements over successive GeForce driver releases in Oxide Games’ Star Swarm stress test. That test also supports Mantle, which helps put Nvidia’s D3D11 optimizations in context. Tamasi conceded AMD’s Mantle version “still has less slow frames” and that D3D11 “still [has] some limiting factors,” but he reiterated his overarching point, which is that it’s possible to “do a much better job” with D3D11. Even going by our own, perhaps less flattering numbers, we’d say that’s a fair assessment.

What about OpenGL?

Direct3D 12 holds a lot of promise, but it won’t help folks running Linux-based operating systems like SteamOS. Game developers seeking to write native ports for those OSes will need to use OpenGL, and they will have to extract whatever optimizations they can out of that API.

Tamasi told us Nvidia, AMD, and Intel have all been “working hard” to help developers achieve “super high efficiency” with OpenGL. In a GDC session entitled “Approaching Zero Driver Overhead in OpenGL,” folks from all three companies demonstrated best practices for OpenGL optimizations. The techniques they outlined can be exploited with the current version of the API on today’s hardware with existing drivers, and they can result in large performance gains.

During the session, we saw performance numbers obtained with APItest, an open-source benchmark developed by Blizzard’s Patrick Doane. In Nvidia’s words, APItest is “designed to showcase and compare between different approaches to common problems encountered in real-time rendering applications.” The results showed order-of-magnitude performance differences between a “naive” approach, which Tamasi described as “writing OpenGL like Direct3D,” and the best practices advocated by GPU manufacturers.

In the graph above, the baseline “naive” approach is the top bar, while the last bar is what Tamasi describes as “writing good code.” The difference amounts to an 18X speedup. Obviously, this is an isolated test case rather than a comprehensive, game-like scenario. But I’d say the difference is large enough to make at least some OpenGL developers rethink the way they optimize their code.

The important takeaway here, I think, is that despite their involvement with D3D12, the big three makers of PC graphics hardware—AMD, Intel, and Nvidia—all have a stake in keeping OpenGL competitive. That’s good news for Linux users, and it’s especially good news for those of us hoping to see SteamOS become a real competitor to Windows in the realm of PC gaming.

Of course, SteamOS isn’t due out until the summer, and the first D3D12 titles aren’t expected until the 2015 holiday season. We’ll have to revisit these matters in the future, when we can see for ourselves how next-gen games really perform on the two platforms.