At Oculus we’re looking to pave the way for consumer augmented and virtual reality, defining the components needed to help both developers and users have the most immersive, enjoyable experience in VR. Today we’re going in depth on one of our recently announced technologies, Passthrough+. We’ll explain how it works and how it is powered by the NVIDIA Optical Flow SDK on Turing GPUs.

Passthrough+

Passthrough+ is a feature in Oculus Rift S which channels the front-facing cameras into the virtual world in a stereo-correct passthrough mode. Traditional VR passthrough is limited in quality because the cameras are often nowhere near the player’s eyes. This is a physical limitation: it is impractical to place the cameras exactly where the player’s eyes are. We can, however, approximate this. Passthrough+ is the state-of-the-art approach to projecting the real-world camera images as though the cameras were at the user’s eyes, which reduces depth disparity and increases comfort. Despite the lower frame rate of the cameras, the world feels natural and appears to update at full frame rate. This works because, by comparing two frames taken at the same time from different perspectives, we can recover disparity information. From this disparity, measured in pixels, and the known camera locations, we can infer actual distance values. Passthrough+ performs this disparity estimation in real time by repurposing our Asynchronous Spacewarp technology.
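The step from pixel disparity to distance can be sketched with the standard pinhole stereo relation, depth = focal length × baseline / disparity. This is a minimal illustration that assumes a rectified stereo pair; the function name and the camera values (400 px focal length, 10 cm baseline) are hypothetical and are not taken from the Rift S or the Oculus runtime.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px:    horizontal shift of the point between the two images
    focal_length_px: camera focal length expressed in pixels (assumed value)
    baseline_m:      distance between the two cameras in metres (assumed value)
    """
    if disparity_px <= 0:
        # No measurable shift: treat the point as infinitely far away.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 400 px focal length, cameras 10 cm apart.
for disparity in (40.0, 20.0, 4.0):
    depth = depth_from_disparity(disparity, 400.0, 0.10)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Large disparities correspond to nearby objects and small disparities to distant ones, which is why precise sub-pixel disparity matters most for far-away geometry.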

Asynchronous Spacewarp 2.0

We recently announced Asynchronous Spacewarp (ASW) 2.0. ASW has become the industry standard for VR reprojection since its 1.0 release back in 2016. It reduces requirements both on the user’s machine and on the developer’s optimization effort. It uses optical flow to infer motion within the scene and extrapolate it forward. This means that if the application can no longer maintain the frame rate of the VR display, we can start synthesizing plausible intermediary frames to fill in. Shown below is an image from Lucky’s Tale, with motion vectors applied to the scene, exhibiting the direction of movement.
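The idea of extrapolating a frame from motion vectors can be illustrated with a toy forward warp. This is only a sketch of the general technique, not how ASW is implemented: the function name, the integer per-pixel vectors, and the use of None to mark uncovered pixels are all assumptions made for illustration.

```python
def extrapolate_frame(frame, flow):
    """Predict the next frame by pushing each pixel along its motion vector.

    frame: 2D list of pixel values.
    flow:  2D list of (dx, dy) integer tuples, same shape as frame, giving
           the motion each pixel showed between the previous two frames.
    Pixels that nothing moves into are left as None (a disocclusion hole
    that a real implementation would have to fill).
    """
    height, width = len(frame), len(frame[0])
    predicted = [[None] * width for _ in range(height)]

    # Pass 1: stationary pixels stay where they are.
    for y in range(height):
        for x in range(width):
            if flow[y][x] == (0, 0):
                predicted[y][x] = frame[y][x]

    # Pass 2: moving pixels are written last so they cover the background.
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            if (dx, dy) == (0, 0):
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                predicted[ny][nx] = frame[y][x]
    return predicted


# A single bright pixel moving one step to the right.
frame = [[0, 9, 0, 0, 0]]
flow = [[(0, 0), (1, 0), (0, 0), (0, 0), (0, 0)]]
print(extrapolate_frame(frame, flow))  # [[0, None, 9, 0, 0]]
```

The None left behind the moving pixel shows why motion vector quality matters: any error in the vectors, or any region revealed by motion, has to be patched up in the synthesized frame.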

Optical flow is a computer vision technique for identifying what changes from frame to frame. The input could be a movie, where motion is tracked to follow objects or the scene as a whole. Or it could be two images shot at the same time from different cameras, for disparity analysis. At a much more basic level, this kind of block matching and tracking is the core component of modern video encoding. The comparison between optical flow and block-matching-based motion estimation in video encoding is a good one, but the goals differ. In video encoding, the goal of block matching is not to find motion in the scene, though that’s an interesting side effect. Instead, the goal is to locate similar patterns and exploit the correlation for compression. That is, a video encoder will pick whichever match produces the best output for the most condensed stream of data. While this often follows the flow of the scene, there’s no guarantee or expectation that it will. Optical flow, by contrast, is about finding the natural movement of objects within the scene. It doesn’t matter if the result would be very sub-optimal for compression, so long as objects move en masse and the vectors trace their real-life movement.
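To make the block matching idea concrete, here is a textbook exhaustive search that scores candidate positions by sum of absolute differences (SAD). It is a minimal sketch under stated assumptions: the function names, the tiny 6×6 frames, and the SAD cost are illustrative and do not describe what NVENC or any particular encoder does internally.

```python
def sad(prev, curr, px, py, cx, cy, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(prev[py + j][px + i] - curr[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(prev, curr, bx, by, size, search):
    """Find the (dx, dy) that best maps the block at (bx, by) in prev into curr."""
    height, width = len(curr), len(curr[0])
    best = (0, 0)
    best_cost = sad(prev, curr, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = bx + dx, by + dy
            # Skip candidates that fall outside the frame.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = sad(prev, curr, bx, by, cx, cy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(x0, y0, width=6, height=6):
    """Black frame with a bright 2x2 square whose top-left corner is (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for j in range(2):
        for i in range(2):
            frame[y0 + j][x0 + i] = 255
    return frame


prev_frame = make_frame(1, 1)
curr_frame = make_frame(3, 2)
print(best_motion_vector(prev_frame, curr_frame, 1, 1, size=2, search=2))
```

Here the lowest-cost match happens to coincide with the true motion of the square. In textured or repetitive scenes the lowest-cost block can sit somewhere unrelated to the real movement, which is fine for compression but not for flow.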

Repurposing the video encoder has, until now, been the underpinning of ASW. The major advantage of using a video encoder for ASW is that it operates in parallel with graphics rendering on the GPU. This means an application that is falling behind in rendering performance is not further penalized as we generate synthetic frames. Instead, the application has all the resources it had before, along with the ability to predict motion. The major downside, however, is that prediction in the synthetic frames isn’t as accurate as it could be.

Optical Flow

Enter optical flow. NVIDIA has produced a faster and more precise SDK for estimating plausible optical flow in a scene. The NVIDIA Optical Flow SDK, announced earlier this year, replaces the previous motion-estimation-only mode of the NVENC video encoder. Available on Turing hardware, NVIDIA optical flow quadruples the macroblock resolution, increases motion vector resolution, can follow objects through intensity changes, and emphasizes plausible optical flow over compression ratios. The result is half the average end-point error of traditional video encoding motion vectors. The qualitative results are equally impressive. With ASW, near-field objects track more reliably. Swinging flashlights produce hallucinated motion much less frequently, and the increased precision means movement is tracked more accurately to individual particles and objects.
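Average end-point error, the metric quoted above, is the mean Euclidean distance between each estimated flow vector and the corresponding reference vector. The sketch below shows the calculation; the function name and the sample vectors are made up for illustration and are not measurements from either motion estimator.

```python
import math


def average_endpoint_error(estimated, ground_truth):
    """Mean Euclidean distance between estimated and reference flow vectors.

    Both arguments are equal-length sequences of (dx, dy) pairs in pixels.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("flow fields must be non-empty and the same length")
    total = sum(
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    )
    return total / len(estimated)


# Made-up vectors: one estimate is off by (3, 4) pixels, the other is exact.
reference = [(0.0, 0.0), (2.0, 1.0)]
estimate = [(3.0, 4.0), (2.0, 1.0)]
print(average_endpoint_error(estimate, reference))  # 2.5
```

A lower value means the estimated vectors land closer to where the scene content actually moved.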

Depth From Disparity

For Passthrough+ this means increased stereo resolution of the projected world, with thin objects correctly tracked and followed. When faced with low-contrast or over-exposed areas, NVIDIA optical flow can still infer meaningful disparity values, preventing visual holes or missing data as we estimate the depth of the scene. Below is a video and depth feed generated from analyzing disparity with the respective hardware. The inset window shows the estimated depth of the scene, with darker areas being farther away. Passthrough+ operates acceptably with video encoder motion vectors, but the improvement with NVIDIA Optical Flow is visible in the depth buffer, which shows a much more precise approximation of depth.

Launching Soon!

While ASW and Passthrough+ run well on existing minimum and recommended specification hardware across vendors, the best motion estimation is available on NVIDIA Turing hardware through the NVIDIA Optical Flow SDK. As a user, there is nothing to do except have Turing hardware and the latest NVIDIA drivers installed when running the Oculus runtime. The integration will ship in version 1.38 of the Oculus runtime, available in June 2019.

Future Work

This isn’t the end of the line for ASW and optical flow. As demonstrated here, optical flow doesn’t just track movement, but also informs how scenes are arranged spatially. There’s a great deal of information we can glean about environments using optical flow, none of which we can do well with a traditional video encoder approach. The Optical Flow SDK opens up many opportunities for further research, so be sure to check back for more learnings and insights!