I have briefly mentioned HoloLens, Microsoft’s upcoming see-through Augmented Reality headset, in a previous post, but today I got the chance to try it for myself at Microsoft’s “Build 2015” developers’ conference. Before we get into the nitty-gritty, a disclosure: Microsoft invited me to attend Build 2015, meaning they waived my registration fee, and they gave me, like all other attendees, a free HP Spectre x360 notebook (from which I’m typing right now because my vintage 2008 MacBook Pro finally kicked the bucket). On the downside, I had to take Amtrak and BART to downtown San Francisco twice, because I wasn’t able to get a one-on-one demo slot on the first day, and got today’s 10am slot after some finagling and calling in of favors. I guess that makes us even. 😛

So, on to the big question: is HoloLens real? Given Microsoft’s track record with product announcements (see 2009’s Project Natal trailer and especially the infamous Milo “demo”), there was some well-deserved skepticism regarding the HoloLens teaser released in January, and even the on-stage demo that was part of the Build 2015 keynote:

The short answer is: yes, it’s real, but…

The long answer is, well, long. To tackle it, we have to first detail what Microsoft is promising: at the most basic level, to “add holograms to your world” (for the sake of peace, let’s set aside the discussion of what is and isn’t holographic for now). On a slightly more technical level, this means HoloLens is promised as an untethered see-through Augmented Reality (AR) headset that seamlessly merges virtual three-dimensional objects into the real world, and allows users to interact with those objects via gaze, gesture, and voice (“GGV,” as it was called during the demos). On a deeper technical level, this means HoloLens has to have the following features:

A stereoscopic see-through display, ideally one with high resolution, high brightness and contrast, large field-of-view, and a way to remove real-world objects that are supposed to be occluded by virtual objects.

Low-latency 6-DOF (positional and orientational) tracking, to render virtual 3D objects from the user’s correct point-of-view, and track the user’s gaze direction for interaction.

A real-time scanning system that creates a 3D model of the user’s real environment, to have virtual objects interact with that environment (such as placing a virtual object onto a table, or hanging a virtual picture on a real wall).

A reliable, low-latency, and accurate hand tracker to detect gestures, and allow direct manipulation of virtual objects (such as grabbing an object with one’s hand, and moving it to a different position).

A reliable speech recognition engine.

Display

Let’s start with the display. The biggest question going in was: how big is the field of view? And the answer is: small. As I was stripped of all devices and gadgets before being allowed into the demo room, I had to guesstimeasure it by covering the visible screen with my hands (fingers splayed) at arm’s length, ending up with 1 3/4 hands horizontally, and 1 hand vertically (in other words, a 16:9 screen aspect ratio) (see Figure 1). In non-Doc-Ok units, that comes out to about 30° by 17.5° (for comparison, the Oculus Rift DK2’s field of view is about 100° by 100°). From a practical point of view, this means that virtual objects only appear in the center of the viewer’s field of view, which turns out to be very distracting and annoying. Interestingly, this is compounded by the visor’s much larger physical field of view: on the plus side, the user doesn’t get tunnel vision (it feels a bit like wearing lab safety glasses), but on the other hand, the virtual objects get cut off at the edges of the screen without any visible cause, such as a spectacle frame, and simply vanish into thin air.
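The hand-width guesstimeasurement above can be turned into degrees with a bit of trigonometry. The hand span and arm length below are assumed values of my own choosing (the measurement itself only yields hand counts), picked to show how the ~30° by 17.5° figure falls out:

```python
import math

def angular_size_deg(width_m: float, distance_m: float) -> float:
    """Angular size, in degrees, of an object of width_m seen at distance_m."""
    return math.degrees(2.0 * math.atan((width_m / 2.0) / distance_m))

# Assumed values, not part of the measurement itself: a splayed adult hand
# spans roughly 0.20 m, held at an arm's length of roughly 0.65 m.
HAND_SPAN_M = 0.20
ARM_LENGTH_M = 0.65

horizontal_fov = angular_size_deg(1.75 * HAND_SPAN_M, ARM_LENGTH_M)  # 1 3/4 hands
vertical_fov = angular_size_deg(1.0 * HAND_SPAN_M, ARM_LENGTH_M)     # 1 hand

print(f"{horizontal_fov:.1f} deg x {vertical_fov:.1f} deg")  # roughly 30.2 x 17.5
```

The result is obviously only as good as the assumed hand span and arm length, but it shows the estimate is internally consistent.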

Another big question was: can HoloLens present opaque virtual objects? As the screen is transparent, it by itself can only add light to the user’s view, which would give virtual objects a ghostly appearance. There was some speculation whether HoloLens has an additional display layer that can turn the screen opaque pixel-by-pixel, but it turns out it does not. The screen is bright enough that, in a controlled environment like the darkened demo rooms, the background is effectively erased by virtual objects, but when viewing objects against a bright background (I used a table lamp), they become barely visible.

And the third, and final, big display question was: does HoloLens provide accommodation cues, i.e., does it present virtual objects at the proper focal distance, like a real hologram or a light field display? This one I can’t answer definitively. I was going to test it by moving very close to a virtual object and comparing the object’s focus against my hand right next to it, but it turns out the HoloLens’ near plane is set at about 60cm, meaning objects can’t be viewed up close. As HoloLens is supposed to augment human-sized environments, it can assume that virtual objects only appear between 60cm (near plane) and a few meters distance, and could get away with a fixed focal distance somewhere in the middle, which I think is exactly what it does. In practice, virtual objects looked sharp and in focus throughout the range that was available. It will be interesting to see how this changes with applications that extend the real space, by virtually punching holes into the real environment. According to attendees of the “holographic academy,” one of the demo applications there did just that, but I did not find any comments regarding focus.
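As a rough sanity check on the fixed-focal-distance hypothesis: the strain caused by a mismatch between where the eyes converge and where the display focuses is usually quantified in diopters (inverse meters). The 2 m focal plane below is purely my assumption; Microsoft has published no such number:

```python
def diopters(distance_m: float) -> float:
    """Optical power corresponding to a focal distance, in 1/m."""
    return 1.0 / distance_m

FOCAL_PLANE_M = 2.0  # assumed fixed focal distance; a guess, not a published spec
NEAR_PLANE_M = 0.6   # the ~60 cm near plane observed in the demo

# The worst-case vergence/accommodation mismatch occurs at the near plane:
mismatch_d = abs(diopters(NEAR_PLANE_M) - diopters(FOCAL_PLANE_M))
print(f"{mismatch_d:.2f} D")  # about 1.17 D
```

A commonly cited rule of thumb puts the comfortable mismatch well under one diopter, so under this assumption a fixed-focus display would be most strained right at the near plane, which may be one reason the near plane is set where it is.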

Aside from these points, the display’s overall quality was very good. It had high resolution (individual pixels were barely visible), good brightness and contrast (at least in the darkened demo room), and good sharpness across the entire screen. If I were to guess — and it’s only a guess — I’d say it’s a 1280×720 display. For comparison, the Oculus Rift DK2 has 960×1080 pixels per eye, spread over a 100°x100° field of view, and looks comparatively blocky. Stereoscopic presentation was also very good (the demo team measured and entered my IPD prior to the demo); objects were embedded into the real environment at apparently proper depth and scale. I would have liked to test this thoroughly by moving very close to an object, but alas, the 60cm near plane prevented that.
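Taking the guessed 1280×720 resolution and my field-of-view estimate at face value, the angular pixel density works out heavily in HoloLens’ favor, which would explain why the DK2 looks comparatively blocky:

```python
# Horizontal resolution divided by the horizontal field of view it covers.
# The HoloLens numbers are my estimates (guessed resolution, measured FOV);
# the DK2 numbers are its published per-eye specs.
hololens_ppd = 1280 / 30.0   # ~42.7 pixels per degree
dk2_ppd = 960 / 100.0        # 9.6 pixels per degree

print(f"HoloLens ~{hololens_ppd:.1f} ppd vs. DK2 {dk2_ppd:.1f} ppd")
```

In other words, packing a modest resolution into a small field of view buys more than four times the angular pixel density, at the cost of everything discussed above.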

6-DOF Tracking

On to tracking. I still can’t say with confidence what tracking method the HoloLens uses. It sports four cameras, one (stereo?) pair each on either side of the visor, facing approximately 45° forward-left and forward-right. Until more technical details are released, I am assuming that tracking is based on sensor fusion of inertial dead reckoning and real-time simultaneous localization and mapping (SLAM), which could be the purview of the HoloLens’ mysterious “holographic processing unit.” I tried tripping up the tracking system by covering both camera pairs with my hands, but tracking didn’t break down. This could be due to me getting my hands in the wrong place, or there being additional cameras behind transparent covers, or due to tracking using an entirely different mechanism. It’s pure speculation at this point.
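For illustration only (this is a generic sensor-fusion pattern, not HoloLens’ actual algorithm): a one-axis complementary filter dead-reckons with the gyroscope at high rate and slowly pulls the estimate toward an absolute reference such as a SLAM pose, which keeps gyro bias from accumulating into unbounded drift:

```python
def fuse_step(angle_deg, gyro_rate_dps, dt_s, vision_angle_deg, alpha=0.98):
    """One filter update: predict from the gyro, correct toward the vision fix."""
    predicted = angle_deg + gyro_rate_dps * dt_s
    return alpha * predicted + (1.0 - alpha) * vision_angle_deg

# Simulate a stationary head with a gyro that reads a constant 0.5 deg/s bias,
# while the (slower, absolute) vision system correctly reports 0 degrees.
GYRO_BIAS_DPS = 0.5
DT_S = 0.01

dead_reckoned = 0.0
fused = 0.0
for _ in range(10_000):  # 100 seconds of simulated time
    dead_reckoned += GYRO_BIAS_DPS * DT_S
    fused = fuse_step(fused, GYRO_BIAS_DPS, DT_S, vision_angle_deg=0.0)

print(f"dead reckoning drifts to {dead_reckoned:.1f} deg; fused stays at {fused:.2f} deg")
```

Pure dead reckoning drifts to 50° over the simulated 100 seconds, while the fused estimate settles at a small bounded offset; real systems use full 6-DOF state estimators, but the principle is the same.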

But whatever tracking method is used, it works very well. There is no noticeable tracking noise while holding one’s head still, and there is only a little jitter while moving or rotating one’s head slowly. There is noticeable tracking latency, such that virtual objects get visibly dragged along with gaze direction changes. But unlike in head-mounted Virtual Reality, where such lag can quickly lead to simulator sickness, in Augmented Reality lag manifests as virtual objects becoming “untethered,” and wobbling around their intended positions. This is visible, and can be a minor distraction, but doesn’t cause problems as the user’s visual and vestibular system always stay locked to the real environment.
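The “dragging” effect has simple geometry behind it: during a head turn, a rendered object lags behind its intended position by the head’s angular velocity times the total motion-to-photon latency. The numbers below are illustrative assumptions, not measurements of the HoloLens:

```python
def lag_offset_deg(head_rate_dps: float, latency_s: float) -> float:
    """Angular offset between where an object should be and where it is drawn."""
    return head_rate_dps * latency_s

# A brisk head turn at 200 deg/s with an assumed 50 ms motion-to-photon latency:
offset = lag_offset_deg(200.0, 0.050)
print(f"{offset:.1f} deg of visible drag")  # 10.0 deg
```

On a screen only 30° wide, a 10° error would be a third of the display width, which is why even moderate latency is so conspicuous in AR.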

Environment Scanning

The HoloLens’ environment scanner was a prominently featured part of the demo. As part of initial setup, I was asked to look around the demo room to capture it into a 3D model. This process was visualized in a neat way, by drawing the evolving triangle mesh as a semi-transparent virtual object at 1:1 scale with the real environment. The mesh included complex objects such as a potted plant, at a resolution comparable to scanning an environment with a Kinect camera and the KinectFusion software. That said, I did get the impression that the room had been scanned beforehand and was already loaded up as a 3D model, because I only got to scan the forward portion of the room (the demo was very rushed, with a pushy host), but later on, I could hang virtual objects on the side walls.

A big scanning-related question going in was whether HoloLens correctly embeds virtual objects into the real environment, meaning, whether virtual objects are occluded by real objects in front of them. There are two parts to this: static environment occlusion, and hand/arm occlusion. Surprisingly, there was no environment occlusion. I dragged a virtual object onto a part of the side wall jutting out from the main wall, and then walked into the alcove behind it. To my surprise, I could still see the backside of the object on the wall (and yes, the wall was part of what I had scanned during setup). To my dismay, I cannot say with certainty whether body occlusion works as expected. Due to the limited gesture interface (more on that later), I forgot to try and occlude virtual objects with my hands. But as I did not notice any artifacts when using gestures later, I am assuming for now that there is no body occlusion, in other words, virtual objects will appear overlaid onto the user’s hands. The very limited vertical field of view helps here: the “sweet spot” for gesture recognition is somewhat in front of the user’s chest, which usually ends up completely underneath the display area.
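Static environment occlusion, which the demo surprisingly lacked, is conceptually just a per-pixel depth test of the virtual object against the scanned room mesh. A minimal sketch with plain depth values, where None means “nothing there”:

```python
def occlusion_mask(env_depth, obj_depth):
    """True where the virtual object should be drawn: it covers that pixel
    and is nearer than the scanned environment (or no real surface is
    in front of it)."""
    return [
        [o is not None and (e is None or o < e)
         for e, o in zip(env_row, obj_row)]
        for env_row, obj_row in zip(env_depth, obj_depth)
    ]

# One scanline of the environment: a wall at 1.0 m, a wall at 2.0 m, open space.
env = [[1.0, 2.0, None]]
# A virtual object at a constant 1.5 m depth across all three pixels.
obj = [[1.5, 1.5, 1.5]]

print(occlusion_mask(env, obj))  # [[False, True, True]]
```

In a real renderer this amounts to drawing the scanned environment mesh into the depth buffer with color writes disabled before drawing the virtual objects, so it is puzzling that the demo did not do it despite having the mesh available.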

A somewhat related question is how deeply virtual objects interact with the environment. Specific object types such as panels or toolboxes snapped to walls, but other objects did not seem to have collision detection. Also, there was no apparent interaction between the room’s real lighting and illumination of virtual objects. In the demo I saw, all objects were rendered in a cartoony style, with very diffuse lighting. There was no visible correspondence to real-world lighting in the room (which, as said, was darkened).
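Snapping a panel to a wall, as observed in the demo, is straightforward once walls are available as planes extracted from the scan; the planes and point below are made-up examples, not anything from the demo:

```python
def snap_to_nearest_plane(point, planes):
    """Project point onto whichever plane it is closest to.
    Each plane is (nx, ny, nz, d) with unit normal n, chosen so that
    n . p + d is the signed distance of p from the plane."""
    best_normal, best_dist = None, None
    for nx, ny, nz, d in planes:
        dist = nx * point[0] + ny * point[1] + nz * point[2] + d
        if best_dist is None or abs(dist) < abs(best_dist):
            best_normal, best_dist = (nx, ny, nz), dist
    nx, ny, nz = best_normal
    # Move the point along the plane normal by its signed distance.
    return (point[0] - best_dist * nx,
            point[1] - best_dist * ny,
            point[2] - best_dist * nz)

# Two walls: the plane x = 0 and the plane z = 3.
walls = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.0, -3.0)]

print(snap_to_nearest_plane((0.2, 1.0, 2.9), walls))  # (0.2, 1.0, 3.0)
```

Lighting interaction is a much harder problem than snapping, since it requires estimating the real room’s light sources, so the cartoony diffuse rendering may be a deliberate dodge.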

Gaze, Gesture, and Voice

In the demo application I saw, object selection was completely based on head position and orientation, in other words, HoloLens does not provide, or the application did not use, eye/gaze tracking (as an aside, the demo host in charge of group introduction dismissed gaze tracking as an input method). Feedback was provided by drawing a glyph at the intersection point between the view ray and a virtual object, or at some distance along the ray in case it missed all objects. The selection ray exhibited jitter more obviously than the virtual objects themselves, which is expected, but made selecting some of the smaller objects a bit tricky.
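The head-ray selection behavior described above (glyph at the hit point, or floating at some fixed distance along the ray on a miss) can be sketched with a standard ray/sphere intersection; the scene below is invented for illustration:

```python
def gaze_cursor(origin, direction, spheres, default_dist=2.0):
    """Place the selection glyph at the nearest object hit by the view ray,
    or at default_dist along the ray if it misses everything.
    direction is assumed to be a unit vector; spheres are (center, radius)."""
    best_t = None
    for center, radius in spheres:
        oc = tuple(o - c for o, c in zip(origin, center))
        b = sum(d * v for d, v in zip(direction, oc))
        disc = b * b - (sum(v * v for v in oc) - radius * radius)
        if disc >= 0.0:
            t = -b - disc ** 0.5  # nearer of the two intersections
            if t > 0.0 and (best_t is None or t < best_t):
                best_t = t
    t = best_t if best_t is not None else default_dist
    return tuple(o + t * d for o, d in zip(origin, direction))

scene = [((0.0, 0.0, 5.0), 1.0)]  # one unit sphere 5 m straight ahead

print(gaze_cursor((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))  # hit: (0.0, 0.0, 4.0)
print(gaze_cursor((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene))  # miss: (0.0, 2.0, 0.0)
```

This also shows why the ray exhibits jitter more obviously than the objects themselves: a small angular error in the head orientation turns into a positional error that grows with distance along the ray.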

Gestures, sadly, were very limited. The only gesture recognized by the demos was the now famous “air tap,” which was detected reliably, but the position of the user’s hand was either not tracked, or not used. In other words, HoloLens treats two hands and ten fingers as nothing but a single button. I can only hope that this was a limitation of current pre-release technology, and will be fixed soon.

On the other hand, voice recognition worked reliably. The demo application understood commands from a limited vocabulary (“movement!” “rotation!” “rescale!” “undo!” etc.), but had some problems with my accent (it didn’t understand me saying “movement”). I’m assuming that some user-specific training would have fixed that.

Comfort and Battery Life

There isn’t much to say about that. The HoloLens attaches to the user’s head via a rigid brace that is tightened via a thumbscrew in the back, and the visor hangs out in front but does not rest on the user’s nose. The device is not noticeably heavy, and was comfortable to wear during the short demo time. It stayed in place even while I was moving around, and I was able to adjust it to see the entire screen (the small screen, that is) quickly. It’s impossible to judge battery life from a 15-minute demo, but as it so happened, mine ran out of juice about halfway through the demo, and I had to start over with a second one (the world I had already created did not persist). Make of that what you will.

Conclusion

So, how does the real HoloLens compare to what was shown during the keynote presentation? Pretty well, actually, with the glaring discrepancy being field-of-view. In the presentation, the view through a HoloLens was simulated by attaching a 6-DOF tracker (hopefully the same one as used by the real HoloLens) to a camera, and then rendering virtual objects for the camera’s point-of-view, and compositing them into the camera’s video stream in real time — in other words, exactly what a standard AR application for a smartphone or tablet does. This created the impression that the presenters on stage were completely surrounded by “holograms,” which is only true from a certain point of view. Yes, they were surrounded, but they could only see the “holograms” when looking directly at them. Most blatant misrepresentation: at some point during the demo, the presenter snapped a video player window to a wall, and enlarged it to fill the entire wall, to simulate a very big screen TV. In reality, the presenter would only have been able to see a small part of the video at a time, and would have had to move his head around to see other parts. As the parts outside the field of view would have been completely invisible, it would not have been possible to watch a video like that.

But aside from the very limiting field of view, HoloLens works. It embeds virtual 3D objects into a real environment convincingly (as long as they are not occluded by real objects), and allows the user to interact with them (in a somewhat limited form at this point). If there was no sneaky trickery in the demo (such as external tracking), then Microsoft is close to a releasable product that I would use. Increasing the field of view is probably going to be very difficult, but it should be possible to address the other outstanding issues (no room occlusion, no hand interaction besides a single non-localized click gesture).

In summary, here are the main take-home points:

Very small field of view (estimated 30°x17.5°).

High resolution, good brightness and contrast, good image quality.

Purely additive display, meaning virtual objects appear translucent, especially when in front of a bright background.

Supports all main depth cues except accommodation, making it a holographic display in my book.

Virtual objects are not occluded by the real environment and/or the user’s arms or hands.

Usable gaze-based selection, but no hand tracking and only a single “click” gesture.

Reliable voice recognition.

Completely untethered, light, and comfortable to wear.

Probably short battery life.

Would I buy it? Not as it is now, but with an improved field of view and reliable hand tracking and more detected gestures, it would be a very good match for most of our applications. I really want to do this with HoloLens, but with the current prototype it’s not yet possible: