Today we are going to focus on the second major cause of discomfort when using VR headsets: The inaccuracy of sound and the mismatch between sound and vision when streaming.

Why sound is important

As with the lag in vision we discussed in the last post, your brain treats VR as “real”, and thus any oddness it perceives in the simulation will cause you discomfort.

Notably, humans have become thoroughly desensitized to static visual abnormalities. We spend so much time with illustrated, painted or rendered media that very few of us still have adverse reactions to any of it. If you are over 30 and grew up around very old relatives, you may still have met people who got motion sickness from TV. But I digress.

The second most dominant sense in most humans is hearing. We hear all the time, and sound helps us in a myriad of ways – from communication to locating things around us. This also means that if sound goes “wrong”, it can cause nausea or other discomfort very quickly.

Wrong surround sound

Humans have two ears and earphones have two speakers. Perfect, right? Well, not quite. By modulating sound intensity between the two ears, we can very accurately simulate whether a sound is coming from directly to our left or directly to our right. This system is poor at conveying whether a sound comes from the front, back, above or below, however. To localize sounds in those directions, our brains rely on minute delays and spectral changes that occur when a sound wave is bounced into the ear canal from different angles. The plain stereo signal used by earphones and the vast majority of games cannot replicate this information.
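To make the left/right part of this concrete, here is a minimal sketch (in Python, with function and parameter names of my own invention) of the kind of constant-power intensity panning a plain stereo mix relies on. Note its built-in limitation: a source directly ahead and one directly behind would receive identical gains, which is exactly the front/back ambiguity described above.

```python
import math

def pan_stereo(azimuth_deg):
    """Constant-power intensity panning: map a horizontal angle
    (-90 = hard left, 0 = center, +90 = hard right) to a pair of
    left/right channel gains. This can only encode a left/right
    position -- front and back would get the same gain pair."""
    # Clamp the angle to the left/right arc and map it to 0..1.
    pan = (max(-90.0, min(90.0, azimuth_deg)) + 90.0) / 180.0
    # Constant-power law: gains trace a quarter circle, so the
    # total perceived loudness stays roughly constant while panning.
    left = math.cos(pan * math.pi / 2)
    right = math.sin(pan * math.pi / 2)
    return left, right
```

A sound dead center gets equal gains in both channels (about 0.707 each); panning fully to one side silences the other channel entirely.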

Luckily, modern software has become surprisingly good at simulating surround sound on stereo earphones. It’s far from perfect, but games that make use of such technology can give you a reasonably good impression of where a sound originates from.

Unfortunately, many, many games do not do this. Small studios can be forgiven; investing time and money into surround sound understandably comes after polishing the visuals. But even larger studios make the same mistake. SkyrimVR, for example, does not support surround sound simulation without mods. This leads to a constant slight disconnect between where a sound appears to be coming from and where our eyes tell us the sound source is located.

Practically speaking, the sound you hear when talking to an NPC will always roughly match up with the location of said NPC, but locating a dragon you haven’t yet spotted using sound alone is all but impossible in the vanilla game.

Streaming and Delayed Sound

When streaming VR games to wireless headsets like the Oculus Quest, another problem enters the equation: encoding video and encoding audio are two very different tasks. When you transcode a video file, the problem is a minor one – usually the video is encoded first, and then both streams are joined together perfectly. When streaming something that is generated on the fly (like a game), however, the problem becomes much harder.

I won’t go into too much technical detail here, but the TL;DR is that it is surprisingly difficult to align audio and video when encoding in real time. The two are usually offset by about 100ms. That doesn’t sound like much, but it can have a huge impact.
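To get a feel for why roughly 100ms matters, here is a back-of-the-envelope sketch (the numbers are illustrative assumptions, not measurements): if the audio lags the video while your head is turning, the sound field appears rotated away from where your eyes say it should be, in proportion to both the lag and the turn rate.

```python
def apparent_offset_deg(latency_s, head_turn_deg_per_s):
    """Rough angular mismatch between where a sound appears to be
    and where its visual source sits, if the audio lags the video
    by latency_s while the head turns at head_turn_deg_per_s."""
    return latency_s * head_turn_deg_per_s

# A casual head turn of 90 degrees per second with 100 ms of audio lag
# puts the sound roughly 9 degrees away from its visual source.
mismatch = apparent_offset_deg(0.1, 90.0)
```

Nine degrees is far more than the localization error a healthy listener tolerates for a source they are looking straight at, which is why the mismatch is so noticeable.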

Imagine you are in an immersive 3D world like that of Skyrim, Nympho Trainer or Amoreon. You are sitting in front of a constant sound source, such as a music player of some sort. You turn your head, your visuals change, the location of the sound changes – but they are slightly out of sync.

Trust me when I say that this will drive you up the wall within minutes. It can also induce nausea in some individuals. The solution to this problem is tricky. First of all, there are a myriad of ways that video actually gets encoded. Some Quest users on AMD cards encode H.264 video on their CPU. Others on Nvidia cards encode it with NVENC on their GPU. There are dozens of current GPUs and hundreds of CPUs, each with its own performance profile when it comes to encoding sound and video. It is impossible for streaming solution developers to predict how an individual player’s system will behave.

Some users get lucky and the sound and video align by chance. Others are very unlucky instead. Many streaming solutions offer a configuration setting that introduces an additional delay in the sound. However, this only helps if the sound arrives before the video, and the opposite is just as common. Ultimately, this is a challenge that will need to be solved by the creators of encoders. We have seen great strides in real-time encoding thanks to game streamers gaining popularity on sites like YouTube and Twitch. It can therefore be expected that, in the coming months and years, the increased popularity of streaming to VR headsets will incentivize vendors to reduce the drift between audio and video in real-time encoding.
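The “audio delay” workaround mentioned above boils down to something like the following sketch (Python, names hypothetical): prepending silence pushes every audio sample later. This also shows why the setting only helps in one direction – there is no way to play samples earlier than they were generated.

```python
def delay_audio(samples, delay_ms, sample_rate=48000):
    """Shift an audio stream later by a fixed delay, roughly what a
    streaming app's audio-delay slider does: prepend silence.
    This can only push audio LATER; if the audio already arrives
    after the video, a positive delay makes the mismatch worse."""
    pad_samples = int(sample_rate * delay_ms / 1000)
    return [0.0] * pad_samples + list(samples)

# 1 ms of delay at 48 kHz prepends 48 samples of silence.
delayed = delay_audio([0.25, -0.25], delay_ms=1)
```

Real streaming stacks delay compressed packets or adjust presentation timestamps rather than raw sample lists, but the one-directional nature of the fix is the same.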

As a last resort, changing the bitrate of the video streamed to your device shifts a myriad of parameters that determine how quickly audio and video are encoded, which may help in extreme cases. This is nothing but a hack, however.

Closing Thoughts

Hearing is the second most important sense for most people, so issues in audio perception often hamper VR experiences. Streaming lag is a currently unavoidable result of encoding technology, though it should improve in the future. If you develop immersive games, try to invest some time into simulating surround sound on stereo speakers. Many frameworks, such as Unity, have ready-made plugins for this purpose.