Vulkan is currently in its final stage of development, and we want to share what makes it a great graphics API and how it differs from OpenGL. NVIDIA believes strongly that Vulkan supplements OpenGL and that both APIs have their own strengths.

Vulkan’s strengths lie in its explicit control and multi-threading capabilities, which by design allow us to push more commands to the GPU in less CPU time and give finer-grained control over costs. OpenGL, however, continues to provide easier-to-use access to the hardware. This is especially important for applications that are not CPU-limited. Current NVIDIA technologies such as “bindless” and NV_command_list, as well as the “AZDO” techniques for core OpenGL, can achieve excellent single-thread performance.

To make this introduction easier to follow, we omit some details in both text and illustrations, and not every API object is explained in full detail.

Command Submission

In this post we want to look at the basic operations that normally happen in a rendering frame and which API mechanisms are used. Let’s look at doing a draw-call in Vulkan.

Where OpenGL’s state and drawing commands are often immediate, in Vulkan most of these operations are deferred. The CommandBuffer hosts the typical set of commands that set up rendering state and is then submitted to the Queue for execution.

The actual operations within the CommandBuffer should not sound too unfamiliar. A RenderPass is similar to framebuffer-object binding, and a DescriptorSet handles uniform bindings (buffer, texture…). More about those later.

Device: The device is used to query information and to create most of Vulkan’s API objects.

Queue: A device can expose multiple queues. For example, there can be a dedicated queue for copying data, or separate compute and/or graphics queues. Operations on a single queue are typically processed in order, but multiple queues can overlap in parallel.

CommandBuffer: Here we record the general commands, such as setting state and executing work like drawing from vertex buffers, dispatching compute grids, or copying between buffers. Functionally, none of this is fundamentally different from OpenGL. Building a CommandBuffer still has a cost, but submitting it to the queue is rather quick.
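
The split between recording and submission described above can be illustrated with a small toy model. This is only a sketch: ToyCommandBuffer, ToyQueue and recordDraw are hypothetical names invented for this illustration and are not part of the Vulkan API. The point is that recording merely builds a list, and a queue processes the submitted lists in order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types mimic the roles described in the text
// and are NOT the Vulkan API.
struct ToyCommandBuffer {
    std::vector<std::string> commands;  // recorded, not yet executed

    void record(const std::string& cmd) { commands.push_back(cmd); }
};

struct ToyQueue {
    std::vector<std::string> executed;  // what the "GPU" has processed, in order

    // Submission only walks the pre-built list; the list itself was
    // built earlier, during recording.
    void submit(const ToyCommandBuffer& cb) {
        for (const std::string& cmd : cb.commands) executed.push_back(cmd);
    }
};

// Records a typical draw sequence. Nothing is executed here.
inline ToyCommandBuffer recordDraw() {
    ToyCommandBuffer cb;
    cb.record("begin render pass");
    cb.record("bind pipeline");
    cb.record("bind descriptor set");
    cb.record("draw");
    cb.record("end render pass");
    return cb;
}
```

In this model, calling recordDraw() once and passing the result to ToyQueue::submit several times replays the same five commands each time, which mirrors the re-use of CommandBuffers.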

Command Buffer Usage

We can build and submit multiple CommandBuffers in parallel, and re-use them. Re-use is particularly useful in scenarios that were traditionally CPU-heavy. Imagine re-submitting a scene for multiple shadow maps, or for the left and right eye of virtual-reality glasses, or submitting multiple complex objects or entire scenes for several frames at very low CPU cost. The Vulkan driver doesn’t need to guess or use heuristics about their usage, as the developer provides this information up-front at creation time. The following illustration shows that Vulkan distinguishes between primary and secondary CommandBuffers.

Primary CommandBuffer: Always handles RenderPass setup. All the other typical rendering operations can be either directly recorded, or provided by secondary CommandBuffers.

Secondary CommandBuffer: Can encode a subset of commands.

It is important to note that in core Vulkan there is generally no state-inheritance between CommandBuffers. The only inheritance is that a secondary CommandBuffer does use the active images that are being rendered into, as defined by the primary CommandBuffer.

Common Objects for Rendering

What makes CommandBuffer recording fast? A key aspect of Vulkan is to use more objects with pre-validated state and to reference those in the CommandBuffers. It thereby overcomes some deficits of unextended OpenGL. While OpenGL’s multi-draw-indirect buffer is also re-usable and can be filled in parallel, it doesn’t allow state changes (NV_command_list does to a degree). Going back even further, display-lists allowed far too many kinds of changes, so only a subset could be implemented efficiently. Display-lists also stored their data immutably alongside the commands, while the modern way is to reference data. This means a scene represented by a CommandBuffer can still be fully animated, as the referenced data such as matrices or vertices reside in buffers whose contents can be changed independently.
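
The difference between storing data and referencing data can be shown with a toy model. The sketch below is not the Vulkan API; ToyBuffer, ToyDraw, ToyCommandBuffer and execute are hypothetical names. A recorded draw keeps pointers to its buffers, so re-running the same recording after changing a buffer produces a different result without re-recording anything.

```cpp
#include <cassert>
#include <vector>

// Toy model, not the Vulkan API: a "buffer" is just mutable storage.
struct ToyBuffer {
    std::vector<float> data;
};

// A recorded draw stores references to its data, not copies.
struct ToyDraw {
    const ToyBuffer* positions;  // stands in for vertex data
    const ToyBuffer* offset;     // stands in for a matrix; one float here
};

struct ToyCommandBuffer {
    std::vector<ToyDraw> draws;
};

// "Executes" the recorded draws: reads the buffers as they are right now
// and returns the positions shifted by the current offset value.
inline std::vector<float> execute(const ToyCommandBuffer& cb) {
    std::vector<float> out;
    for (const ToyDraw& d : cb.draws)
        for (float p : d.positions->data)
            out.push_back(p + d.offset->data[0]);
    return out;
}
```

Recording one ToyDraw, executing it, then writing a new value into the offset buffer and executing again yields shifted output from the unchanged recording. A display-list style design would instead have copied the offset at record time.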

The next image shows what objects are used for the various commands.

Image: Represents formatted data organized in regular grids, used for texturing, render-targets… Equivalent to an OpenGL texture.

FrameBuffer: A set of Image attachments that are being rendered into. It must match the configuration of the RenderPass it is used with.

RenderPass: In principle encodes the format of the framebuffer attachments, what type of clears are performed, whether we do multi-pass effects, pass dependencies… This is one of the bigger new features that Vulkan has to offer and will be the subject of a later blog post.

Buffer: Represents raw linear memory used for vertex, index, uniform data… Equivalent to an OpenGL buffer.

Pipeline: Encodes rendering state such as the shaders being used, depth-testing, blending operations… all captured in a single monolithic object. Because all important state is provided up-front at creation time, later usage of the object can be very quick. OpenGL’s internal validation may have to do state-dependent compilation of shaders, which at worst can cause stuttering at draw time. With Vulkan you have precise control over when such validation is triggered.

DescriptorSet: A set of bindings for shader inputs. Instead of binding resources individually as in OpenGL, Vulkan organizes them in groups. You can re-use such a binding group as well. In a later blog post we will cover the various ways to provide uniform data to your compute or draw calls.
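
The idea behind the Pipeline object, paying for validation once at creation so that later use is cheap, can be sketched with a toy model. Nothing below is the Vulkan API: ToyPipelineState, ToyPipeline, ToyDevice and ToyCommandBuffer are hypothetical, and the two validity rules are made up purely for illustration.

```cpp
#include <cassert>
#include <string>

// Toy model, not the Vulkan API. All names and rules are hypothetical.
struct ToyPipelineState {
    std::string vertexShader;
    std::string fragmentShader;
    bool depthTest;
    bool hasDepthAttachment;
};

struct ToyPipeline {
    int id;  // 0 means "creation failed"
};

struct ToyDevice {
    int validations;  // how often the expensive check ran
    int nextId;

    ToyDevice() : validations(0), nextId(1) {}

    // All validation happens here, once, at a time the caller chooses.
    ToyPipeline createPipeline(const ToyPipelineState& s) {
        ++validations;
        bool ok = !s.vertexShader.empty() && !s.fragmentShader.empty();
        if (s.depthTest && !s.hasDepthAttachment) ok = false;
        ToyPipeline p;
        p.id = ok ? nextId++ : 0;
        return p;
    }
};

struct ToyCommandBuffer {
    int boundPipeline;

    ToyCommandBuffer() : boundPipeline(0) {}

    // Binding only stores a handle; no state is re-checked here.
    void bindPipeline(const ToyPipeline& p) { boundPipeline = p.id; }
};
```

In this sketch the validation counter only increases inside createPipeline, so binding the same pipeline many times during recording adds no validation work.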

Allocation Management

Vulkan introduces a level of complexity that didn’t really exist in OpenGL. We will only briefly touch on the topic of allocation management here. In Vulkan, various API objects are generated from other resources, as hinted in the image below.

CommandBufferPool: The CommandBuffers and their content are allocated from these pools.

DescriptorPool: Many DescriptorSets can be allocated from a single pool.

Heap: The device comes with a fixed number of limited heaps, from which memory is allocated.

Memory: Buffers and Images are bound to Memory depending on their requirements and the developer’s preference. This allows manual sub-allocation of resources from a bigger block of memory, or aliasing the memory with different resources.

The pools make it simple to delete many resources allocated from them at once, and using per-thread pools ensures that allocations can be done lock-free. For example, one can use a different CommandBufferPool per frame and create all temporary CommandBuffers from it. After a few frames, when all these CommandBuffers have been completed by the GPU, the pool can be reset and new temporaries generated from it.
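
The per-frame pool pattern can be sketched as a small ring of pools. This is a toy model rather than the Vulkan API: ToyCommandPool and ToyFrameRing are hypothetical names, and the ring size of three frames is an assumed value. A real application would also have to confirm, for example through a synchronization object, that the GPU has finished with a pool's CommandBuffers before resetting it. The sketch simply assumes that a pool is idle by the time the ring wraps around to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not the Vulkan API: a pool hands out command-buffer ids
// and can release all of them in one call.
struct ToyCommandPool {
    std::vector<int> buffers;  // ids of live temporary command buffers

    int allocate() {
        buffers.push_back(static_cast<int>(buffers.size()));
        return buffers.back();
    }

    void reset() { buffers.clear(); }  // frees every allocation at once
};

// One pool per in-flight frame. The ring size is an assumption.
struct ToyFrameRing {
    enum { kFramesInFlight = 3 };

    ToyCommandPool pools[kFramesInFlight];
    std::size_t frameIndex;

    ToyFrameRing() : frameIndex(0) {}

    // Picks this frame's pool and resets it. Assumes the GPU is done with
    // whatever was allocated from it kFramesInFlight frames ago.
    ToyCommandPool& beginFrame() {
        ToyCommandPool& pool = pools[frameIndex % kFramesInFlight];
        pool.reset();
        ++frameIndex;
        return pool;
    }
};
```

Because each frame touches only its own pool, temporaries from earlier frames stay alive until the ring comes back around, and cleanup is a single reset instead of many individual frees.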

Memory management also allows for greater control and new use-cases such as aliasing memory. A memory allocation is rather costly and some operating systems also have fixed overhead for how many allocations are active at once. We therefore encourage developers to sub-allocate resources from larger chunks of memory.
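
Sub-allocation from a larger chunk can be as simple as handing out aligned offsets. The sketch below is a minimal linear allocator, not the Vulkan API and not necessarily how any particular driver or application does it. ToySubAllocator is a hypothetical name, and it assumes each resource comes with a size and a power-of-two alignment requirement.

```cpp
#include <cassert>
#include <cstdint>

// Toy linear sub-allocator, not the Vulkan API. It only does offset
// bookkeeping inside one large block that was allocated once.
struct ToySubAllocator {
    std::uint64_t capacity;  // size of the one large allocation
    std::uint64_t used;      // next free offset

    // Returns true and writes the offset on success.
    // Assumes alignment is a non-zero power of two.
    bool allocate(std::uint64_t size, std::uint64_t alignment,
                  std::uint64_t* offset) {
        // Round the next free offset up to the requested alignment.
        std::uint64_t aligned = (used + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity || size > capacity - aligned) return false;
        *offset = aligned;
        used = aligned + size;
        return true;
    }

    // Releases everything at once; individual frees are not supported.
    void reset() { used = 0; }
};
```

Each resource would then be bound to the single large block at the offset returned here. A linear scheme like this cannot free individual resources, which is acceptable for data with a shared lifetime. Longer-lived or mixed-lifetime resources would need a more elaborate allocator.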

NVIDIA’s Vulkan Driver

Starting with a new API can involve a lot of work, as common utilities may not yet be available. NVIDIA will therefore provide a few Vulkan extensions from day zero, so that you as a developer face fewer obstacles on your path to Vulkan. We will support consuming GLSL shader strings directly, alongside Vulkan's mandatory SPIR-V input. Furthermore, we leverage our industry-leading OpenGL driver and allow you to run Vulkan inside an OpenGL context and present Vulkan Images within it. This allows you to use your favorite windowing and user-interface libraries, and some of our samples will make use of it to compare OpenGL and Vulkan seamlessly.

With this we conclude our first overview of the basics of how Vulkan operates. We hope some of the core principles of the API came across and didn’t look too complex. While the responsibilities involved in detailed usage, as well as the verbosity of the API, may be a struggle at times, the principal mechanisms of Vulkan are not that “alien” to graphics programmers.