So you just built yourself a CAVE. Or maybe you already have one, but you just finished implementing an awesome CAVE application. Either way, you want to tell the world about your achievement. You can invite people to come over and experience it, and while that’s the best way of going about it, you can only reach a handful of people.

So what do you do? Well, you make a movie of course. A movie can be seen by millions of people, and they can all see how awesome CAVEs are. Just one tiny little problem: how do you actually go about filming that movie?

The most obvious approach is to take a video camera, fire up the CAVE, put someone in there, point, and shoot. Since the CAVE’s walls are projection screens, the camera will record what’s on the walls, and that should be good enough, right? Well, I don’t think so. There are plenty of such videos out there, and I picked one more or less at random (no offense to whoever made this):

Now, I honestly don’t think this is convincing at all. If you haven’t already experienced a CAVE first hand, can you actually tell what’s going on? The images on the screens are blurry due to the stereoscopic display, severely distorted due to head-tracked rendering, and move and wobble around as the user moves in the CAVE. It is impossible to tell what the 3D model displayed in this CAVE really looks like, and I believe it is impossible to understand — for a lay audience — what the person in the CAVE is actually experiencing. The user being all “woo” and “aah” and “awesome” doesn’t really help either — looking at this with a cynical eye, you’d rightfully feel someone is trying to pull wool over your eyes. If I didn’t already know better, and saw this video, I wouldn’t be keen on seeing a CAVE in person. It looks rather lame and headache-inducing, to be honest.

So what’s the mistake? A CAVE works because the images on the screens are generated specifically for the person viewing them, which is why positional head tracking is a required ingredient for any CAVE (and why CAVEs are in principle single-user environments). But in this case, the person in the CAVE is not the intended audience: the people watching the movie are the audience. The solution is very simple: instead of head-tracking the user in the CAVE, you have to head-track the camera. And turn off stereoscopy while you’re at it, because the video camera is only monoscopic after all.

Here is one of my old movies, showing what that looks like:

This is a bit better. The movie is not particularly good quality (it’s old; I need to record a few new ones), but at least viewers can see what’s going on. The difference between this and the previous movie is that here the images are correctly projected onto the CAVE walls. Looking closely, you will see the seams where the walls meet, but you will also see that the 3D images cross those seams without being broken up or distorted. If you squint enough no longer to see the seams, it will seem as if the CAVE is one big flat screen. As a result, virtual 3D objects show up at their proper size and in proper relation to the real user in the CAVE. If the user touches a part of the data using the hand-held input device, this will show up properly in the video. In the latter parts, where a yellow “selection sphere” is attached to the input device, the sphere’s image in the video shows up in exactly the right place and size.

There are still two things wrong with this video: for one, it appears as if the user is having trouble working with the CAVE. We’ll address that later. The second issue is that the CAVE doesn’t look particularly dynamic. This is because the camera is on a tripod and doesn’t move throughout the movie (the movie only cuts between two different camera setups). One very strong depth cue that particularly applies to 2D video is motion parallax, where 3D objects move in a very particular way as the camera filming them is moved, and our brains are particularly good at picking up on that. Because the camera here doesn’t move, there is no motion parallax, and the CAVE looks somewhat “flat.”

This next movie addresses that issue by using a hand-held camera, which is still tracked by the CAVE as in the previous movie:

This looks a lot better. In fact, it looks so convincing that I have gotten many questions asking how I made the video to be 3D. Answer is, of course, that it’s not a 3D movie; it’s a regular 2D movie exploiting motion parallax. The trick is that I’m moving the camera as much as I can, to show how the virtual 3D objects appear to move in the exact same particular way as the real objects (in this case the user). Real filmmakers would tell me to cool it, but in this case egregious camera movement is a necessary evil. While I’m panning the camera, the virtual and real objects move in exactly the same way; in other words, the virtual objects appear real, which is exactly how and why a CAVE works.

But even this does not address the remaining issue: the 3D interactions captured in these movies seem awkward, as if the users didn’t know what they were doing, or, worse, as if a CAVE or VR in general were very hard to use, and not particularly effective. I call this the catch-22 of filming VR. Above I mentioned that in a CAVE, the images on the walls are generated specifically for one viewer, in order to appear real. But in these movies, the images are generated for the camera — and not even in stereo, to boot.

This means the actual user in the movies does not see the virtual objects properly, and is essentially flying blind. That’s the reason why the interactions here look awkward. Instead of simply being able to touch a virtual object as if it were real and then interact with it, the poor user has to judge his or her actions against the feedback of the generated images (which, from his or her perspective, don’t look like real objects at all), and adjust accordingly. This is why it was so hard to properly measure a distance on the globe in the second video, it was basically pointing trial-and-error.

So the choice seems to be: let the user in the CAVE be the “main” viewer, allowing them to interact properly and fluently, but create a movie that looks utterly unconvincing, or let the camera be the “main” viewer, capturing beautiful video, but giving the impression that CAVEs are hard to use. If you want to communicate that CAVEs are awesome to look at and easy to use, that’s a lose-lose situation.

Or at least that’s what I thought, until a member of the KeckCAVES group applied some lateral thinking and suggested to “split” the CAVE: on half of the screens, show the images from the user’s point of view; on the other half, show them from the camera’s point of view. Then, if the user only looks at the first half, and the camera only looks at the second half, you can capture good-looking video with fluent interactions. The only thing left for me to do was say “D’oh” and do it right away. Here’s one of the early videos showing this new approach:

This is more like it. For this particular setup, I only set up the right CAVE wall to render for the camera, and left the other three screens (back and left wall and floor) for myself. I put the camera on a tripod and aimed it straight at the right wall, and stood at the very edge of the CAVE to give the camera the best possible view. You’ll notice how I misaimed a bit: at the left edge of the video, the camera is capturing some of the user-centered stereo projection on the back wall, and it looks weird. But it’s not bad enough to warrant a complete re-shoot.

Now, unfortunately this means we’re back to static non-moving cameras. In this particular case, I shot the whole movie by myself, meaning I couldn’t have done hand-held anyway, but I’m not quite sure how to do hand-held in this setup. While moving the camera around, it will naturally capture more than one screen unless one is very careful, and if the camera sees one of the for-user screens, the illusion will break down. In short, I’m not sure. I guess the best compromise for now is to film two versions of the same movie and intercut them (like I did for the LiDAR Viewer movie): one with a hand-held camera to show how a CAVE looks, and one with a split CAVE and fixed camera to show how interactions work. With some clever editing, that could create stellar results — but I haven’t tried it yet, primarily for lack of time. Capturing good CAVE video is not quick.