A couple of weeks ago, NVIDIA published a vademecum of DirectX 12 Do’s and Don’ts that went largely unnoticed. It does, however, contain some interesting tips on how NVIDIA wants developers to use Microsoft's new lower-level API with its existing GPU architecture.

A couple of these tips, for instance, seem to confirm two stories we reported last month about Maxwell's problems with Asynchronous Compute. In case you don't recall, AMD's Robert Hallock said that Maxwell can't perform Async Compute without relying heavily on slow context switching. A few days later, Tech Report's David Kanter mentioned that, according to Oculus employees, preemption context switching was potentially catastrophic for Maxwell GPUs.

Now, under the Pipeline State Objects (PSOs) section, they were very clear:

Don’t toggle between compute and graphics on the same command queue more than absolutely necessary

This is still a heavyweight switch to make
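Here is a minimal Python sketch of what "no more than absolutely necessary" can mean in practice. It is not D3D12 code. The `Pass` type and the `count_switches` and `batch_by_kind` helpers are hypothetical names invented for illustration.

The sketch does two things:
* It counts graphics/compute transitions in one queue's submission order.
* It greedily reorders independent passes so that work of the same kind is recorded back to back, while still honoring dependencies.

NVIDIA's guide gives no cost figures, so the switch count is only a proxy for the real expense.

```python
from dataclasses import dataclass, field


@dataclass
class Pass:
    """A hypothetical render pass recorded into a single command queue."""
    name: str
    kind: str                               # "graphics" or "compute"
    deps: set = field(default_factory=set)  # names of passes that must run first


def count_switches(order):
    """Count graphics<->compute transitions in a submission order."""
    return sum(1 for a, b in zip(order, order[1:]) if a.kind != b.kind)


def batch_by_kind(passes):
    """Greedily reorder passes to reduce graphics/compute toggles.

    At each step, pick the first ready pass (all deps already emitted) of the
    same kind as the previous one; switch kind only when none is ready.
    The original order breaks ties, so the result is deterministic.
    """
    done, order = set(), []
    remaining = list(passes)
    current = remaining[0].kind if remaining else None
    while remaining:
        ready = [p for p in remaining if p.deps <= done]
        if not ready:
            raise ValueError("dependency cycle among remaining passes")
        same_kind = [p for p in ready if p.kind == current]
        nxt = same_kind[0] if same_kind else ready[0]
        current = nxt.kind
        order.append(nxt)
        done.add(nxt.name)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # A hypothetical frame, listed in a naive interleaved order.
    frame = [
        Pass("shadow", "graphics"),
        Pass("light_cull", "compute"),
        Pass("gbuffer", "graphics"),
        Pass("ssao", "compute", {"gbuffer"}),
        Pass("composite", "graphics", {"light_cull", "ssao"}),
    ]
    print(count_switches(frame))  # 4
    batched = batch_by_kind(frame)
    print([p.name for p in batched], count_switches(batched))
    # ['shadow', 'gbuffer', 'light_cull', 'ssao', 'composite'] 2
```

A real engine would also weigh resource barriers and idle GPU time. This greedy rule is just one simple heuristic.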

That's not all they had to say about mixing compute and graphics tasks. Under the Work Submission – Command Lists & Bundles section, NVIDIA warned developers as follows:

Check carefully if the use of separate compute command queues really is advantageous

Even for compute tasks that can in theory run in parallel with graphics tasks, the actual scheduling details of the parallel work on the GPU may not generate the results you hope for

Be conscious of which asynchronous compute and graphics workloads can be scheduled together
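In practice, "check carefully" usually means measuring. The sketch below assumes you have already captured frame times, in milliseconds, for the same workload pairing run two ways:
* with the compute work recorded on the graphics queue;
* with the compute work on a separate compute queue.

The `prefer_async_compute` function and its 2% `min_gain` threshold are hypothetical choices, not something NVIDIA prescribes. Comparing medians keeps a single outlier frame from deciding the result.

```python
from statistics import median


def prefer_async_compute(serial_ms, async_ms, min_gain=0.02):
    """Keep async compute only if it clearly wins for this workload pairing.

    serial_ms: frame times (ms) with compute recorded on the graphics queue.
    async_ms:  frame times (ms) with compute on a separate compute queue.
    min_gain:  required fractional improvement of the async median (hypothetical 2%).
    """
    if not serial_ms or not async_ms:
        raise ValueError("need frame-time samples for both configurations")
    return median(async_ms) <= median(serial_ms) * (1.0 - min_gain)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    serial = [16.4, 16.6, 16.5, 16.5]
    overlapped_well = [15.2, 15.4, 15.3, 15.3]
    overlapped_badly = [16.5, 16.7, 16.6, 16.9]
    print(prefer_async_compute(serial, overlapped_well))   # True
    print(prefer_async_compute(serial, overlapped_badly))  # False
```

NVIDIA stresses which workloads are scheduled together, so this check is meant to run once per pairing rather than as a single global on/off switch.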

Finally, NVIDIA also gave some advice on how best to use Maxwell and DirectX 12 hardware features. They recommend using Conservative Rasterization, which right now is only available on Maxwell cards. They are a bit more cautious about Rasterizer Ordered Views (ROVs), the other feature required by DirectX 12 feature level 12_1.

Use hardware conservative raster for full-speed conservative rasterization

No need to use a GS to implement a ‘slow’ software-based conservative rasterization. See https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization

Make use of NvAPI (when available) to access other Maxwell features

Advanced Rasterization features:

Bounding box rasterization mode for quad based geometry

New MSAA features like post depth coverage mask and overriding the coverage mask for routing of data to sub-samples

Programmable MSAA sample locations

Fast Geometry Shader features:

Render to cube maps in one geometry pass without geometry amplifications

Render to multiple viewports without geometry amplifications

Use the fast pass-through geometry shader for techniques that need per-triangle data in the pixel shader

New interlocked operations

Enhanced blending ops

New texture filtering ops
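The two kinds of rasterization differ in which pixels they mark as covered:
* Standard rasterization covers a pixel only if its sample point (here, the pixel center) lies inside the triangle.
* Conservative rasterization, in its overestimating form, covers every pixel the triangle touches at all.

Below is an idealized Python sketch of both coverage tests. It does not describe how NVIDIA's hardware or the D3D12 runtime implements them. The function names are hypothetical, triangles are assumed to be non-degenerate, and real hardware tiers are allowed some uncertainty near pixel edges.

The conservative test is a separating-axis check with two parts:
* the triangle's bounding box against the pixel square;
* each edge function evaluated at all four pixel corners.

```python
def edge(a, b, p):
    """Edge function: > 0 if p is left of the directed edge a->b (y up)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _ccw_edges(tri):
    """Return the triangle's three directed edges in counter-clockwise order."""
    a, b, c = tri
    if edge(a, b, c) < 0:  # clockwise input: swap two vertices to make it CCW
        b, c = c, b
    return ((a, b), (b, c), (c, a))


def covers_center(tri, px, py):
    """Standard rasterization: is the center of pixel (px, py) inside tri?"""
    center = (px + 0.5, py + 0.5)
    return all(edge(u, v, center) >= 0 for u, v in _ccw_edges(tri))


def covers_conservative(tri, px, py):
    """Overestimated conservative rasterization: does tri touch the pixel square?"""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    # Axis-aligned separating axes: the bounding boxes must overlap.
    if max(xs) < px or min(xs) > px + 1 or max(ys) < py or min(ys) > py + 1:
        return False
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    # Edge-normal separating axes: reject if all corners are outside one edge.
    for u, v in _ccw_edges(tri):
        if all(edge(u, v, q) < 0 for q in corners):
            return False
    return True


if __name__ == "__main__":
    # A thin sliver that misses every pixel center in a 4x4 grid.
    sliver = ((0.1, 0.1), (3.9, 0.2), (0.1, 0.3))
    grid = [(x, y) for y in range(4) for x in range(4)]
    print([p for p in grid if covers_center(sliver, *p)])        # []
    print([p for p in grid if covers_conservative(sliver, *p)])  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

The sliver example shows why techniques such as voxelization rely on conservative rasterization. With center sampling, thin geometry can simply vanish.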

Don’t use Raster Order View (ROV) techniques pervasively

Guaranteeing order doesn’t come for free

Always compare with alternative approaches like advanced blending ops and atomics
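ROVs have a cost because some per-pixel operations give different results depending on the order in which overlapping fragments arrive. The hardware therefore has to serialize those fragments.

The toy Python model below uses a single color channel and hypothetical fragment values. It shows two things:
* Classic "over" alpha blending is order-dependent.
* A max operation, the kind of update an atomic such as HLSL's `InterlockedMax` performs, is not.

Only the former actually needs an ordering guarantee. The helper names are invented for illustration.

```python
from functools import reduce


def blend_over(dst, frag):
    """Standard 'over' blending of one channel; frag is (color, alpha)."""
    color, alpha = frag
    return color * alpha + dst * (1.0 - alpha)


def take_max(dst, frag):
    """Order-independent update: keep the largest color seen so far."""
    return max(dst, frag[0])


def resolve(fragments, op, clear=0.0):
    """Apply fragments to one pixel in the given arrival order."""
    return reduce(op, fragments, clear)


if __name__ == "__main__":
    frags = [(1.0, 0.5), (0.0, 0.5)]  # hypothetical overlapping fragments
    print(resolve(frags, blend_over))        # 0.25
    print(resolve(frags[::-1], blend_over))  # 0.5
    print(resolve(frags, take_max), resolve(frags[::-1], take_max))  # 1.0 1.0
```

If an effect can be expressed with order-independent operations like the max above, it does not need the ordering that ROVs pay for. That is the comparison NVIDIA suggests making.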

For more about DirectX 12, you can check our Fable Legends benchmark results, Lionhead's statement on the DX12 features used in Fable Legends, and our own analysis of Async Compute in the game.