In my blog post about single buffered strip rendering I talked about reducing latency by shortening the graphics pipeline. In the second post I described one way of speeding up the barrel distortion render by moving the distortion transformation out of the fragment shader. Both methods increase throughput and therefore help reduce latency for the VR use case. There are other techniques that accelerate VR content creation even further. For example, one interesting approach is to move the distortion transformation into the content generation step, which makes a second presentation render basically obsolete. Although this is a huge win, it needs direct support from the VR application and therefore isn’t usable in a generic VR framework.

Another technique is to reproject the final content shortly before presenting it to the user and also to decouple the content generation from the VR post-processing. As with single buffered strip rendering, correct timing is important here.

In this blog post I will explain why a GPU with fine-grained preemption support is essential when using those advanced techniques in VR.

CPU pinning

Before we look at the GPU side of things let’s talk about the CPU and Linux scheduling first.

A minimum frame rate of 60 fps is required to maintain presence in a VR environment. High-end desktop PC VR platforms target even higher numbers. A smooth and steady frame rate is more important than, for example, high-resolution content. Developers should always adjust their content to achieve those frame rates. However, even if they do so, there might be times when content creation can’t keep up and falls below 60 fps. It is important to remember that Android/Linux is a multi-process operating system and realtime behaviour of specific processes/threads is not guaranteed. A process can be interrupted at any time, and moving a process from one CPU to another is an expensive operation because, for example, some caches are per CPU and the process loses their contents when it moves.

One way to partly overcome this is to pin a specific thread to a specific CPU. Using sched_setaffinity improves the scheduling behaviour of a VR application considerably. Furthermore, Android has started to use cpusets to assign specific processes (such as background services) to selected CPU groups. There is even a top-app cpuset which includes one exclusive CPU to be used by the foreground application only.
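A minimal sketch of the pinning step described above, assuming a Linux or Android target where sched_setaffinity is called with a pid of 0 so that it applies to the calling thread. The helper names pin_current_thread_to_cpu and pinned_cpu are made up for this example; which CPU to choose depends on the device and on the cpuset the process is in.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t and the CPU_* macros in <sched.h> */
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity). */
static int pin_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the CPU the calling thread is pinned to, or -1 if the thread
 * may run on more than one CPU or the affinity cannot be queried. */
static int pinned_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    if (CPU_COUNT(&set) != 1)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

Each thread would call pin_current_thread_to_cpu once at start-up, for example with CPU 2 for content creation and CPU 3 for VR post-processing. The call fails if the requested CPU is outside the set the process is allowed to use, so the return value is worth checking.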

Systrace of Linux scheduling with no CPU pinning. The two threads freely change CPUs at any point.

Systrace of Linux scheduling with CPU pinning. The content creation thread is pinned to CPU 2 and the VR post-processing thread is pinned to CPU 3.

Pinning an important thread to a specific CPU or even giving it exclusive access to one CPU is an important step towards a consistent rendering experience. SoCs nowadays usually consist of four or more CPUs, so this isn’t a huge hurdle anymore.

However, even with these precautions in place realtime behaviour can’t be guaranteed.

GPU Preemption

One of the more advanced techniques in VR is asynchronous time warping. In the context of VR this was first implemented by Oculus. It is used to achieve two main goals:

reducing latency
increasing frame rate

I will not explain how it achieves those goals; you can read about it in the Oculus blog post or watch this very informative video on the topic. The key takeaway is that the algorithm delays the start of the final render until a very short time before the vsync. It does this to be able to query the sensors again and therefore reduce the motion-to-photon latency. This final render has a fixed workload, and an SoC vendor can approximate the time it normally takes to finish this task.
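The timing described above can be sketched as a small calculation: find the next vsync and subtract the time budget reserved for the final render. This is only an illustration under stated assumptions, not how any particular VR runtime does it. The function names are made up, all timestamps are assumed to come from the same monotonic clock in nanoseconds, now_ns is assumed not to be earlier than last_vsync_ns, and the budget is assumed to be shorter than one refresh period.

```c
#include <stdint.h>

/* Timestamp of the first vsync strictly after now_ns, given the timestamp
 * of one known past vsync and the display refresh period. */
static int64_t next_vsync_ns(int64_t last_vsync_ns, int64_t period_ns,
                             int64_t now_ns)
{
    int64_t elapsed = now_ns - last_vsync_ns;
    int64_t periods = elapsed / period_ns + 1;

    return last_vsync_ns + periods * period_ns;
}

/* Point in time at which the post-processing thread should wake up,
 * read the sensors again and submit the final render. */
static int64_t warp_start_ns(int64_t last_vsync_ns, int64_t period_ns,
                             int64_t now_ns, int64_t render_budget_ns)
{
    int64_t start = next_vsync_ns(last_vsync_ns, period_ns, now_ns)
                    - render_budget_ns;

    /* Too late for the upcoming vsync: aim for the one after it. */
    if (start <= now_ns)
        start += period_ns;

    return start;
}
```

On Linux the thread could then sleep until that timestamp with clock_nanosleep using CLOCK_MONOTONIC and TIMER_ABSTIME. The smaller render_budget_ns is, the fresher the sensor data, which is exactly why the GPU must be able to start this render without delay.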

Furthermore, to increase the frame rate, the content creation is decoupled from the VR post-processing by using two distinct threads which operate independently.
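One way to picture this decoupling is a tiny "latest frame" mailbox between the two threads. The sketch below uses C11 atomics and invented names (frame_mailbox, mailbox_publish, mailbox_latest); it only tracks which frame is newest. A real implementation also has to manage buffer ownership so the content thread never overwrites a frame that is still being read.

```c
#include <stdatomic.h>

/* Index of the newest completed content frame, or -1 if none exists yet. */
typedef struct {
    atomic_int latest;
} frame_mailbox;

static void mailbox_init(frame_mailbox *m)
{
    atomic_init(&m->latest, -1);
}

/* Called by the content creation thread whenever a frame is complete. */
static void mailbox_publish(frame_mailbox *m, int frame_index)
{
    atomic_store_explicit(&m->latest, frame_index, memory_order_release);
}

/* Called by the post-processing thread once per vsync. It never blocks:
 * if no new frame has arrived, it gets the previous index again and
 * simply reprojects that frame with fresh sensor data. */
static int mailbox_latest(frame_mailbox *m)
{
    return atomic_load_explicit(&m->latest, memory_order_acquire);
}
```

Because the reader never waits for the writer, the post-processing thread can keep presenting at the display rate even when the content thread needs more than one refresh period per frame.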

Where does GPU preemption fit in here? As I said, the GPU has very little time to process the VR post-processing render. An SoC vendor would always choose the smallest possible value to get the most out of the late reading of the sensors. The problem is that an (asynchronous) content render, or anything else in the system, may have submitted other work at the same time. How can we guarantee that our important post-processing render finishes within our target time?

Context Priority

First of all, we need to understand that a modern GPU also has a scheduler for distributing multiple render tasks to only one or a small number of hardware units. The scheduler takes into account whether a task is ready to run or has dependencies which are not yet fulfilled, in which case another task can run in the meantime. Furthermore, the scheduler is also able to interrupt running tasks to switch to a task with a higher priority. The PowerVR Rogue hardware architecture is designed to do this interruption at a very fine-grained level: a task can be interrupted between finishing one tile and starting the next. A tile size of 32×32 pixels, for example, allows for many interruption points while processing a fullscreen render.
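To get a feeling for how many interruption points a fullscreen render offers, the tile count can be worked out directly. The 1920×1080 resolution used below is an assumption for illustration only, and the count is an upper bound on tile boundaries rather than a statement about how the hardware schedules tiles.

```c
/* Number of tiles needed to cover `pixels` along one axis (rounded up). */
static unsigned tiles_along(unsigned pixels, unsigned tile_size)
{
    return (pixels + tile_size - 1) / tile_size;
}

/* Number of tiles needed to cover a width x height render target. */
static unsigned tile_count(unsigned width, unsigned height,
                           unsigned tile_size)
{
    return tiles_along(width, tile_size) * tiles_along(height, tile_size);
}
```

For a 1920×1080 target with 32×32 tiles this gives 60 × 34 = 2040 tiles, so a fullscreen render has on the order of two thousand tile boundaries at which a higher priority task could take over.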

To make this advanced GPU scheduling control mechanism available to OpenGL ES (or compute) developers, Imagination Technologies proposed the EGL_IMG_context_priority extension back in 2009. This extension was ratified by Khronos in the same year. It defines three priority levels to differentiate between individual workload requirements. In our VR use case we obviously choose EGL_CONTEXT_PRIORITY_HIGH_IMG for the post-processing thread and EGL_CONTEXT_PRIORITY_MEDIUM_IMG for the content render (which is also the default).
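As a rough sketch, requesting a priority comes down to adding one attribute pair to the list passed to eglCreateContext. The helper name build_context_attribs is made up for this example. The token values are the ones published in the EGL headers and are only defined here so the snippet compiles on its own; a real application would include EGL/egl.h and EGL/eglext.h instead, and would use EGLint where this sketch uses int32_t.

```c
#include <stdint.h>

/* Normally provided by <EGL/egl.h> and <EGL/eglext.h>. */
#ifndef EGL_NONE
#define EGL_NONE                        0x3038
#define EGL_CONTEXT_CLIENT_VERSION      0x3098
#endif
#ifndef EGL_CONTEXT_PRIORITY_LEVEL_IMG
#define EGL_CONTEXT_PRIORITY_LEVEL_IMG  0x3100
#define EGL_CONTEXT_PRIORITY_HIGH_IMG   0x3101
#define EGL_CONTEXT_PRIORITY_MEDIUM_IMG 0x3102
#define EGL_CONTEXT_PRIORITY_LOW_IMG    0x3103
#endif

/* Fill `out` with a context attribute list that requests the given
 * OpenGL ES version and context priority. The list is EGL_NONE terminated. */
static void build_context_attribs(int32_t gles_version, int32_t priority,
                                  int32_t out[5])
{
    out[0] = EGL_CONTEXT_CLIENT_VERSION;
    out[1] = gles_version;
    out[2] = EGL_CONTEXT_PRIORITY_LEVEL_IMG;
    out[3] = priority;
    out[4] = EGL_NONE;
}
```

The post-processing thread would build its list with EGL_CONTEXT_PRIORITY_HIGH_IMG and pass it as the last argument of eglCreateContext. The extension treats the requested level as a hint, so it is worth checking the extension string first and reading back the level that was actually granted with eglQueryContext and EGL_CONTEXT_PRIORITY_LEVEL_IMG.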

The effect of this can be seen in the following systrace:

Systrace highlighting GPU preemption.

I artificially increased the workload of the content render to see how it influences the post-processing render task. We can see that the content creation thread submits a render roughly every 32 ms. The VR post-processing thread submits a render every 16.7 ms to keep a steady frame rate of 60 fps. The green content render tasks get interrupted by the blue post-processing tasks, as highlighted in the “GPU: 3D” row. At one point, the green “Content Creation Task 2” is interrupted three times by blue tasks. This ensures the blue post-processing tasks are able to finish in time for our target frame rate.

Conclusion

Fine-grained GPU preemption makes techniques like single buffered strip rendering or asynchronous time warping possible. It helps in balancing the creation of rich content and VR post-processing even at times when the CPU or GPU can’t keep up with their tasks in a timely fashion. Obviously there might be other use cases for GPU preemption, and developers are free to make use of the EGL_IMG_context_priority extension for their own applications. Since VR is such a demanding task for a portable device, developers should also keep an eye on the CPU scheduling and make use of all the profiling tools available.