What is it?

View instancing allows a shader to be run multiple times in a single draw call, once per view instance. The shader receives the SV_ViewID semantic, which holds the index of the current view instance. The advantages of this API are hardware acceleration where the GPU supports it and simpler multi-view shading.

Hardware Support:

Nvidia’s Turing architecture (RTX 20 Series) implements this API as “Multi-View rendering” and supports 4 arbitrary views in hardware and up to 32 software views (see below). Pre-Turing GPUs have up to 32 software views and no hardware views. Interestingly, Pascal (GTX 10 Series) does support single pass stereo, which is hardware acceleration for 2 viewports. However, view instancing cannot make use of this, most likely due to a lack of flexibility in the hardware.

[Figures: Pipeline for a software view; Pipeline for a hardware view]

Software Views:

A software view sounds largely redundant. However, software views reduce CPU overhead because the application records its commands only once, and the driver can follow a fast path to reduce the overhead further. There is no major advantage on the GPU: the driver simply loops the draw calls, changing the view ID each time (see above).
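As a rough illustration of the loop described above, the following sketch replays one recorded draw once per enabled view. It is not driver code: `RecordedDraw`, `ReplayForViews` and the `executeDraw` callback are hypothetical names invented for this example, and the view ID passed to the callback stands in for the value a shader would read from SV_ViewID.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a draw the application recorded once.
// This is not a D3D12 type.
struct RecordedDraw {
    std::uint32_t indexCount;
};

// Replays a single recorded draw once per enabled view, roughly how a
// driver could emulate software views. Returns the number of draws issued.
inline std::uint32_t ReplayForViews(
    const RecordedDraw& draw,
    std::uint32_t viewCount,
    std::uint32_t viewMask,
    const std::function<void(const RecordedDraw&, std::uint32_t)>& executeDraw)
{
    assert(viewCount <= 32);  // one mask bit per view
    std::uint32_t issued = 0;
    for (std::uint32_t viewId = 0; viewId < viewCount; ++viewId) {
        // Skip views whose bit is cleared in the mask.
        if ((viewMask & (1u << viewId)) == 0) {
            continue;
        }
        executeDraw(draw, viewId);  // viewId plays the role of SV_ViewID
        ++issued;
    }
    return issued;
}
```

The application-side cost is the single recording. The per-view repetition happens inside the loop, which is why the saving shows up as CPU time rather than GPU time.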

Setup:

To use view instancing, the GPU must support D3D12_VIEW_INSTANCING_TIER_1 or greater. Support can be queried through the D3D12_FEATURE_DATA_D3D12_OPTIONS3 data struct. The API currently limits the number of view instances in a draw command to 4.

Pipeline State Object (PSO):

This API requires the use of the PSO extensions API. It is recommended to use the “d3dx12.h” helper header provided here.

The D3D12_VIEW_INSTANCING_DESC struct provides the ability to define a render target array index or a viewport index for each view. The render target index is added to the SV_RenderTargetArrayIndex set in a shader. Both methods are fully supported.
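The addition of the per-view index to the shader's value can be sketched in a few lines. The `ViewLocation` struct below is a hypothetical, portable stand-in that only mirrors the two per-view fields (in D3D12 they live in D3D12_VIEW_INSTANCE_LOCATION), and `ResolveRenderTargetSlice` is an invented helper name, not part of the API.

```cpp
#include <cstdint>

// Hypothetical mirror of the per-view location fields.
// Not the real D3D12 struct, so the sketch compiles anywhere.
struct ViewLocation {
    std::uint32_t viewportArrayIndex;
    std::uint32_t renderTargetArrayIndex;
};

// The array slice a primitive lands in is the value the shader wrote to
// SV_RenderTargetArrayIndex plus the offset configured for the view.
constexpr std::uint32_t ResolveRenderTargetSlice(
    std::uint32_t shaderRenderTargetIndex, const ViewLocation& view)
{
    return shaderRenderTargetIndex + view.renderTargetArrayIndex;
}
```

For instance, assuming a texture array in which each cube map occupies six consecutive slices, a shader that writes 6 for the second cube and a view configured with offset 2 would end up rendering to slice 8.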

Shaders:

To access the SV_ViewID semantic, shaders must be built with Shader Model 6.1 or above. This requires the use of the new DXIL compiler available here.

View instancing for point light shadow mapping:

In this example, view instancing is used to capture cube shadow maps for a point light. Because of the 4-view limit, the six cube faces are split across two render passes with 3 view instances per pass. There are 4 point lights in the scene. The scene contains a number of varied meshes; the object count is the number of shadowing meshes.
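One way to arrive at the 3 + 3 split is to divide the faces as evenly as possible across the smallest number of passes that respects the view limit. The sketch below does that; `ViewPass` and `PlanPasses` are hypothetical names for illustration and are not taken from the example's actual source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one render pass: which cube face it
// starts at and how many view instances it draws.
struct ViewPass {
    std::uint32_t firstFace;
    std::uint32_t viewCount;
};

// Splits faceCount views into the fewest passes that respect
// maxViewsPerDraw, balancing the view count across the passes.
inline std::vector<ViewPass> PlanPasses(std::uint32_t faceCount,
                                        std::uint32_t maxViewsPerDraw)
{
    assert(maxViewsPerDraw > 0);
    std::vector<ViewPass> passes;
    if (faceCount == 0) {
        return passes;
    }
    // Ceiling division gives the minimum number of passes.
    const std::uint32_t passCount =
        (faceCount + maxViewsPerDraw - 1) / maxViewsPerDraw;
    const std::uint32_t base = faceCount / passCount;
    const std::uint32_t extra = faceCount % passCount;
    std::uint32_t first = 0;
    for (std::uint32_t p = 0; p < passCount; ++p) {
        // The first `extra` passes take one additional view.
        const std::uint32_t count = base + (p < extra ? 1u : 0u);
        passes.push_back({first, count});
        first += count;
    }
    return passes;
}
```

Calling `PlanPasses(6, 4)` yields two passes covering faces 0 to 2 and 3 to 5. An uneven 4 + 2 split would also respect the limit; the balanced split is simply the choice this sketch makes.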

Results:

The following results were captured on a GTX 1080 (Pascal).

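| Object count | Metric   | CPU     | Geometry shader | View instancing |
|--------------|----------|---------|-----------------|-----------------|
| 36 objects   | GPU time | 2.50ms  | 4.01ms          | 2.60ms          |
| 36 objects   | CPU time | 0.63ms  | 0.28ms          | 0.36ms          |
| 1036 objects | GPU time | 13.70ms | 16.30ms         | 16.01ms         |
| 1036 objects | CPU time | 12.20ms | 4.40ms          | 6.50ms          |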

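In the small scene, the geometry shader is slower on the GPU (4.01ms against 2.60ms for view instancing) but has the best CPU time. In the large scene, its GPU time is much closer to that of view instancing (16.30ms against 16.01ms), and its CPU time is still better (4.40ms against 6.50ms).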

The reduced CPU overhead of view instancing is very apparent in the large scene, where CPU time drops from 12.20ms to 6.50ms compared with CPU rendering: 5.7ms faster, an improvement of almost half (about 47%).

Instance masking:

The API exposes a function to mask off views so that they are not rendered to. Masking must first be enabled in the PSO (the D3D12_VIEW_INSTANCING_FLAG_ENABLE_VIEW_INSTANCE_MASKING flag). The “SetViewInstanceMask” function can then be used to set the mask at draw time.
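A mask of this kind is a plain bit field, so building one is straightforward. The helper below is a minimal sketch under the assumption that bit i of the mask enables view instance i; `BuildViewInstanceMask` is a hypothetical name, and the resulting value is what would be handed to SetViewInstanceMask on the command list.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Builds a view instance mask from a list of enabled view indices.
// Assumption: bit i set means view instance i is rendered.
inline std::uint32_t BuildViewInstanceMask(
    std::initializer_list<std::uint32_t> enabledViews)
{
    std::uint32_t mask = 0;
    for (std::uint32_t view : enabledViews) {
        assert(view < 4);  // the API limits a draw to 4 view instances
        mask |= (1u << view);
    }
    return mask;
}
```

In the shadow mapping example, this could be used to skip cube faces that contain no shadow casters for a given light, for instance `BuildViewInstanceMask({0, 2})` to draw only the first and third view of a pass.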

Further uses:

Single Pass VR rendering.

Single Pass Reflection Probe capture.

Links:

DirectX engineering spec: https://microsoft.github.io/DirectX-Specs/d3d/ViewInstancing.html

An Nvidia presentation on Turing features (and diagram source): http://on-demand.gputechconf.com/gtc-eu/2018/pdf/e8300-nvidias-vr-insights-opengl-vulkan-and-dual-input-hmds.pdf