Debugging and Profiling Qt 3D applications Learn about new 5.15 features and other useful tools

Qt 3D, being a retained mode high level graphic API abstraction, tries to hide most of the details involved in rendering the data provided by applications. It makes a lot of decisions and operations in the background in order to get pixels on the screen. But, because Qt 3D also has very rich API, developers can have a lot of control on the rendering by manipulating the scene graph and, more importantly, the frame graph. It is however sometimes difficult to understand how various operations affect performance.

In this article, we look at some of the tools, both old and new, that can be used to investigate what Qt 3D is doing in the back end and get some insight into what is going on during the frame.

Built in Profiling

The first step in handling performance issues is, of course, measuring where time is spent. This can be as simple as measuring how long it took to render a given frame. But to make sense of these numbers, it helps to have a notion of how complex the scene is.

In order to provide measurable information, Qt 3D introduces a visual overlay that will render details of the scene, constantly updated in real time.

The overlay shows some real time data:

Time to render last frame and FPS (frames per second), averaged and plotted over last few seconds. As Qt 3D is by default locking to VSync, this should not exceed 60fps on most configurations.

Number of Jobs: these are the tasks that Qt 3D executes on every frame. The number of jobs may vary depending on changes in the scene graph, whether animations are active, etc.

Number of Render Views: this matches loosely to render pass, see below discussion on the frame graph.

Number of Commands: this is total number of draw calls (and compute calls) in the frame.

Number of Vertices and Primitives (triangles, lines and points combined).

Number of Entities, Geometries and Textures in the scene graph. For the last two, the overlay will also show the number of geometries and textures that are effectively in use in the frame.

As seen in the screen shots above, the scene graph contains two entities, each with one geometry. This will produce two draw calls when both objects are in frame. But as the sphere rotates out of the screen, you can see the effect of the view frustum culling job which is making sure the sphere doesn’t get rendered, leaving a single draw call for the torus.

This overlay can be enabled by setting the showDebugOverlay property of the QForwardRenderer to true.

Understanding Rendering Steps

To make sense of the numbers above, it helps to understand the details of the scene graph and frame graph.

In the simple case, as in the screen shots, an entity will have a geometry (and material, maybe a transform). But many entities may share the same geometry (a good thing if appropriate!). Also, entities may not have any geometry but just be used for grouping and positioning purposes.

So keeping an eye on the number of entities and geometries, and seeing how that effects the number of commands (or draw calls), is valuable. If you find one geometry drawn one thousand times in a thousand separate entities, if may be a good indication that you should refactor your scene to use instanced rendering.

In order to provide more details, the overlay has a number of buttons that can be used to dump the current state of the rendering data.

For a deeper understanding of this, you might consider our full Qt 3D Training course.

Scene Graph

Dumping the scene graph will print data to the console, like this:

Qt3DCore::Quick::Quick3DEntity{1} [ Qt3DRender::QRenderSettings{2}, Qt3DInput::QInputSettings{12} ] Qt3DRender::QCamera{13} [ Qt3DRender::QCameraLens{14}, Qt3DCore::QTransform{15} ] Qt3DExtras::QOrbitCameraController{16} [ Qt3DLogic::QFrameAction{47}, Qt3DInput::QLogicalDevice{46} ] Qt3DCore::Quick::Quick3DEntity{75} [ Qt3DExtras::QTorusMesh{65}, Qt3DExtras::QPhongMaterial{48}, Qt3DCore::QTransform{74} ] Qt3DCore::Quick::Quick3DEntity{86} [ Qt3DExtras::QSphereMesh{76}, Qt3DExtras::QPhongMaterial{48}, Qt3DCore::QTransform_QML_0{85} ]

This prints the hierarchy of entities and for each of them lists all the components. The id (in curly brackets) can be used to identify shared components.

Frame Graph

Similar data can be dumped to the console to show the active frame graph:

Qt3DExtras::QForwardRenderer Qt3DRender::QRenderSurfaceSelector Qt3DRender::QViewport Qt3DRender::QCameraSelector Qt3DRender::QClearBuffers Qt3DRender::QFrustumCulling Qt3DRender::QDebugOverlay

This is the default forward renderer frame graph that comes with Qt 3D Extras.

As you can see, one of the nodes in that graph is of type QDebugOverlay . If you build your own frame graph, you can use an instance of that node to control which surface the overlay will be rendered onto. Only one branch of the frame graph may contain a debug node. If the node is enabled, then the overlay will be rendered for that branch.

The frame graph above is one of the simplest you can build. They may get more complicated as you build effects into your rendering. Here’s an example of a Kuesa frame graph:

Kuesa::PostFXListExtension Qt3DRender::QViewport Qt3DRender::QClearBuffers Qt3DRender::QNoDraw Qt3DRender::QFrameGraphNode (KuesaMainScene) Qt3DRender::QLayerFilter Qt3DRender::QRenderTargetSelector Qt3DRender::QClearBuffers Qt3DRender::QNoDraw Qt3DRender::QCameraSelector Qt3DRender::QFrustumCulling Qt3DRender::QTechniqueFilter Kuesa::OpaqueRenderStage (KuesaOpaqueRenderStage) Qt3DRender::QRenderStateSet Qt3DRender::QSortPolicy Qt3DRender::QTechniqueFilter Kuesa::OpaqueRenderStage (KuesaOpaqueRenderStage) Qt3DRender::QRenderStateSet Qt3DRender::QSortPolicy Qt3DRender::QFrustumCulling Qt3DRender::QTechniqueFilter Kuesa::TransparentRenderStage (KuesaTransparentRenderStage) Qt3DRender::QRenderStateSet Qt3DRender::QSortPolicy Qt3DRender::QTechniqueFilter Kuesa::TransparentRenderStage (KuesaTransparentRenderStage) Qt3DRender::QRenderStateSet Qt3DRender::QSortPolicy Qt3DRender::QBlitFramebuffer Qt3DRender::QNoDraw Qt3DRender::QFrameGraphNode (KuesaPostProcessingEffects) Qt3DRender::QDebugOverlay Qt3DRender::QRenderStateSet (ToneMappingAndGammaCorrectionEffect) Qt3DRender::QLayerFilter Qt3DRender::QRenderPassFilter

If you are not familiar with the frame graph, it is important to understand that each path (from root to leaf) will represent a render pass. So the simple forward renderer will represent a simple render pass, but the Kuesa frame graph above contains eight passes!

It is therefore often easier to look at the frame graph in term of those paths. This can also be dumped to the console:

[ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QClearBuffers, Qt3DRender::QNoDraw ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QClearBuffers, Qt3DRender::QNoDraw ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QCameraSelector, Qt3DRender::QFrustumCulling, Qt3DRender::QTechniqueFilter, Kuesa::OpaqueRenderStage (KuesaOpaqueRenderStage), Qt3DRender::QRenderStateSet, Qt3DRender::QSortPolicy ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QCameraSelector, Qt3DRender::QTechniqueFilter, Kuesa::OpaqueRenderStage (KuesaOpaqueRenderStage), Qt3DRender::QRenderStateSet, Qt3DRender::QSortPolicy ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QCameraSelector, Qt3DRender::QFrustumCulling, Qt3DRender::QTechniqueFilter, Kuesa::TransparentRenderStage (KuesaTransparentRenderStage), Qt3DRender::QRenderStateSet, Qt3DRender::QSortPolicy ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QCameraSelector, Qt3DRender::QTechniqueFilter, Kuesa::TransparentRenderStage (KuesaTransparentRenderStage), Qt3DRender::QRenderStateSet, Qt3DRender::QSortPolicy ] [ Kuesa::PostFXListExtension, Qt3DRender::QViewport, Qt3DRender::QFrameGraphNode (KuesaMainScene), Qt3DRender::QLayerFilter, Qt3DRender::QRenderTargetSelector, Qt3DRender::QBlitFramebuffer, Qt3DRender::QNoDraw ]

Hopefully this is a good way of finding out issues you may have when building your custom frame graph.

Draw Commands

On every pass of the frame graph, Qt 3D will traverse the scene graph, find entities that need to be rendered, and for each of them, issue a draw call. The number of objects drawn in each pass may vary, depending on whether the entities and all of their components are enabled or not, or whether entities get filtered out by using QLayer s (different passes may draw different portions of the scene graph).

The new profiling overlay also gives you access to the actual draw calls.

So in this simple example, you can see that two draw calls are made, both for indexed triangles. You can also see some details about the render target, such as the viewport, the surface size, etc.

That information can also be dumped to the console which makes it easier to search in a text editor.

Built in Job Tracing

The data above provides a useful real time view on what is actually being processed to render a particular frame. However, it doesn’t provide much feedback as to how long certain operations take and how that changes during the runtime of the application.

In order to track such information, you need to enable tracing.

Tracing tracks, for each frame, what jobs are executed by Qt 3D’s backend. Jobs involve updating global transformations and the bounding volume hierarchy, finding objects in the view frustum, layer filtering, picking, input handling, animating, etc. Some jobs run every frame, some only run when internal state needs updating.

If your application is slow, it may be because jobs are taking a lot of time to complete. But how do you find out which jobs take up all the time?

Qt 3D has had tracing built in since a few years already, but it was hard to get to. You needed to do your own build of Qt 3D and enable tracing when running qmake. From thereon, every single run of an application linked against that build of Qt 3D would generate a trace file.

In 5.15, tracing is always available. It can be enabled in two ways:

By setting the QT3D_TRACE_ENABLED environment variable before the application starts (or at least before the aspect engine is created). This means the tracing will happen for the entire run of the application.

environment variable before the application starts (or at least before the aspect engine is created). This means the tracing will happen for the entire run of the application. If you’re interested in tracing for a specific part of your application’s life time, you can enable the overlay and toggle tracing on and off using the check for Jobs. In this case, a new trace file will be generated every time the tracing is enabled.

For every tracing session, Qt 3D will generate one file in the current working directory. So how do you inspect the content of that file?

KDAB provides a visualisation tool but it is not currently shipped with Qt 3D. You can get the source and build it from GitHub here. Because jobs change from one version of Qt 3D to the next, you need to take care to configure which version was used to generate the trace files. Using that tool, you can open the trace files. It will render a time line of all the jobs that were executed for every frame.

In the example above, you can see roughly two frames worth of data, with jobs executed on a thread pool. You can see the longer running jobs, in this case:

RenderViewBuilder jobs, which create all the render views, one for each branch in the frame graph. You can see some of them take much longer that others.

FrameSubmissionPart1 and FrameSubmissionPart2 which contain the actual draw calls.

Of course, you need to spend some time understanding what Qt 3D is doing internally to make sense of that data. As with most performance monitoring tools, it’s worth spending the time experimenting with this and seeing what gets affected by changes you make to your scene graph or frame graph.

Job Dependencies

Another important source of information when analysing performance of jobs is looking at the dependencies. This is mostly useful for developers of Qt 3D aspects.

Using the profiling overlay, you can now dump the dependency graph in GraphViz dot format.

Other Tools

Static capabilities

Qt 3D 5.15 introduces QRenderCapabilities which can be used to make runtime decisions based on the actual capabilities of the hardware the application is running on. The class supports a number of properties which report information such as the graphics API in use, the card vendor, the supported versions of OpenGL and GLSL. It also has information related to the maximum number of samples for MSAA, maximum texture size, if UBOs and SSBOs are supported and what their maximum size is, etc.

Third Party Tools

Of course, using more generic performance tools is also a good idea.

perf can be used for general tracing, giving you insight where time is spent, both for Qt 3D and for the rest of your application. Use it in combination with KDAB’s very own hotspot to get powerful visualisation of the critical paths in the code.

Using the flame graph, as show above (captured on an embedded board), you can usually spot the two main sections of Qt 3D work, the job processing and the actual rendering.

Other useful tools are the OpenGL trace capture applications, either the generic ones such as apitrace and renderdoc, or the ones provided your hardware manufacturer, such as nVidia or AMD.

Conclusion

We hope this article will help you get more performance out of your Qt 3D applications. The tools, old and new, should be very valuable to help find bottlenecks and see the impact of changes you make to your scene graph or frame graph. Furthermore, improvements regarding performance are in the works for Qt 6, so watch this space!