Thanks to the Kinect, Microsoft is well known for its expertise in developing motion-sensing systems. While the technology has improved over the years, tracking human hands still has a long way to go. Enter Microsoft Research’s Handpose.

In short, Handpose is a real-time articulated hand-tracker. The system can accurately reconstruct complex hand poses using only a single depth camera (such as the Xbox One’s Kinect). As you might expect, Microsoft Research’s goal is to enable new human-computer interactions.

Notice that Handpose works with a variety of subjects and is capable of continually recovering from tracking failures. Microsoft emphasizes that the Handpose tracker is flexible in terms of camera placement and operating range, meaning this technology has a lot of promise for real-world applications.

Tracking a hand, which is smaller and can make highly complex and subtle movements, is much more difficult than recognizing a whole body’s movements. Not only are fingers and wrists smaller than the larger parts of a full human body, but they’re also quite flexible. Fingers can thus be difficult to differentiate from each other and their surroundings. Furthermore, they can also easily be hidden from the camera’s view.

That’s why machine learning, which works great for tracking the whole body, wasn’t enough. Microsoft researchers had to incorporate 3D hand modeling as well to achieve the level of quality you see in the video above.

Let your imagination do the rest: new forms of video games, sign language translation, directing robots and drones, or just more accurately manipulating objects on a screen. If computers begin to understand these more nuanced hand motions, it could also become easier for humans to teach robots how to perform certain tasks.

The abstract for the project is as follows:

We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most uniquely, our tracker is highly flexible, dramatically improving upon previous approaches which have focused on front-facing close-range scenarios. This flexibility opens up new possibilities for human-computer interaction with examples including tracking at distances from tens of centimeters through to several meters (for controlling the TV at a distance), supporting tracking using a moving depth camera (for mobile scenarios), and arbitrary camera placements (for VR headsets). These features are achieved through a new pipeline that combines a multi-layered discriminative reinitialization strategy for per-frame pose estimation, followed by a generative model-fitting stage. We provide extensive technical details and a detailed qualitative and quantitative analysis.
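To make the two-stage idea in the abstract a little more concrete, here is a minimal toy sketch in Python. It is not Microsoft's implementation: the function names (`render`, `energy`, `propose`, `refine`, `track_frame`), the three-angle "pose," and the made-up depth signal are all assumptions chosen for illustration. A learned predictor is replaced by random guessing, and the model-fitting stage is replaced by a simple random local search.

```python
import math
import random

def render(pose):
    """Toy stand-in for rendering a hand model: joint angles -> fake depth signal."""
    return [math.sin(a) + 0.5 * math.cos(2 * a) for a in pose]

def energy(pose, observed):
    """Squared error between the rendered model and the observed signal."""
    return sum((r - o) ** 2 for r, o in zip(render(pose), observed))

def propose(observed, n_candidates, rng):
    """Stand-in for discriminative reinitialization (hypothetical).

    A real system would predict likely poses from the frame with a trained
    model; this toy version just samples angles uniformly at random.
    """
    return [[rng.uniform(-1.0, 1.0) for _ in observed] for _ in range(n_candidates)]

def refine(pose, observed, rng, steps=200, step_size=0.1):
    """Stand-in for generative model fitting: keep perturbations that lower the energy."""
    best, best_e = list(pose), energy(pose, observed)
    for _ in range(steps):
        cand = [a + rng.gauss(0.0, step_size) for a in best]
        e = energy(cand, observed)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e

def track_frame(observed, previous_pose, rng, n_candidates=20):
    """Process one frame: gather candidate poses, pick the best, then refine it."""
    candidates = propose(observed, n_candidates, rng)
    if previous_pose is not None:
        # The previous frame's result competes with the fresh proposals.
        candidates.append(previous_pose)
    start = min(candidates, key=lambda p: energy(p, observed))
    return refine(start, observed, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    true_pose = [0.3, -0.4, 0.7]          # hypothetical ground truth
    observed = render(true_pose)          # pretend this came from a depth camera
    pose, err = track_frame(observed, None, rng)
    print("estimated pose:", pose, "residual:", err)
```

Because fresh candidates are generated on every frame, a bad estimate from the previous frame does not have to be carried forward, which is the intuition behind recovering from temporary tracking failures. Note that in this toy setup a low residual does not guarantee the true angles were found, since different angles can produce the same fake signal.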

You can read the full 10-page paper here: Accurate, Robust, and Flexible Real-time Hand Tracking (PDF).