Enhancing AR with machine learning

Layering ML on top of AR apps extends their usefulness.

As Augmented Reality (AR) technologies improve, we are starting to see use cases that stretch beyond marketing and novelty demos. These include product visualization, remote assistance, enhanced learning, quality control and maintenance.

Apple’s Measure is one of my favorite AR apps. It’s a simple and reliable way of taking physical measurements with your smartphone, and it demonstrates how robust AR tracking has become.

Apple’s Measure in action. The results are accurate within ±1 cm!

At its core, AR is driven by advanced computer vision algorithms. But we can do more. By layering machine learning systems on top of the core AR tech, the range of possible use cases can be expanded greatly. In the following sections, I will show you how.

AR is computer vision.

AR technology often uses SLAM (simultaneous localization and mapping): a computer vision algorithm that compares visual features between camera frames in order to map and track the environment. Combined with sensor data from the smartphone’s gyroscope and accelerometer, this makes very reliable tracking possible.

Companies like Apple, Google and 6D.ai are working tirelessly to improve these algorithms. That means that we, as developers, will be able to continuously build better and more reliable applications using AR.

6D.ai mapping the world in real time.

You can think of AR apps as a set of technological layers. At the core, AR frameworks like ARKit and ARCore implement the computer vision algorithms that do tracking and mapping. If you use Unity3D, AR Foundation provides a unified API spanning several AR frameworks. At the edge are the code and assets specific to your application.

A hierarchy of AR technologies, going from the core to the edge.
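To make the layering concrete, here is a minimal sketch of what working at the AR Foundation layer looks like in Unity3D: a single raycast call that places an object on a detected plane, regardless of whether ARKit or ARCore is doing the tracking underneath. The `TapToPlace` component and its wiring are my own invention for illustration.

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

// Moves this GameObject to wherever the user taps on a detected plane.
public class TapToPlace : MonoBehaviour
{
    [SerializeField] ARRaycastManager raycastManager;
    static readonly List<ARRaycastHit> hits = new List<ARRaycastHit>();

    void Update()
    {
        if (Input.touchCount == 0) return;

        // The same call works whether ARKit or ARCore is tracking underneath.
        if (raycastManager.Raycast(Input.GetTouch(0).position, hits,
                                   TrackableType.PlaneWithinPolygon))
        {
            Pose hitPose = hits[0].pose;
            transform.SetPositionAndRotation(hitPose.position, hitPose.rotation);
        }
    }
}
```

The point of the unified API is exactly this: your edge code never has to know which core framework is running on the device.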

Computer vision at the edge.

The frameworks at the core of the AR technology stack often provide neat extra features like image tracking, pose estimation and, recently with ARKit 3, people occlusion. Many of these features are powered by machine learning.

But there is potential for more!

In fact, it is fitting to enrich AR applications with machine learning because:

1. The camera is always open and collecting image data (see the sketch below).
2. Objects tracked in AR can reliably be located in the camera image.
3. AR visualization is a great way to display information to users.
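On the first point, here is a rough sketch of how an app might grab the live camera image through AR Foundation so it can be fed to a machine learning model. The `FrameGrabber` name is mine, and the exact method names have shifted between AR Foundation versions (older releases call it `TryGetLatestImage`), so treat this as an illustration rather than a drop-in component:

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

// Grabs the latest camera image on the CPU so it can be fed to an ML model.
public class FrameGrabber : MonoBehaviour
{
    [SerializeField] ARCameraManager cameraManager;

    void OnEnable()  { cameraManager.frameReceived += OnFrame; }
    void OnDisable() { cameraManager.frameReceived -= OnFrame; }

    void OnFrame(ARCameraFrameEventArgs args)
    {
        // Older AR Foundation versions call this TryGetLatestImage.
        if (!cameraManager.TryAcquireLatestCpuImage(out XRCpuImage image))
            return;

        using (image)
        {
            // Convert the native YUV image to RGBA pixels.
            var conversion = new XRCpuImage.ConversionParams(
                image, TextureFormat.RGBA32, XRCpuImage.Transformation.MirrorY);
            var buffer = new NativeArray<byte>(
                image.GetConvertedDataSize(conversion), Allocator.Temp);

            image.Convert(conversion, buffer);
            // buffer now holds the frame's pixels, ready for an ML model.
            buffer.Dispose();
        }
    }
}
```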

Wouldn’t it be nice if this app could learn the difference between rugs, tables, lamps and windows?

Whenever an AR application faces a situation where useful information can be gathered from the environment, a computer vision layer driven by machine learning can be added.

You might want to:

📖 Read text on signs.

👀 Detect what kind of object(s) the user is looking at.

👫 Determine which clothes a person is wearing.

❌ Detect if an object has visual flaws or anomalies.

Auto-filling the inspection report for a damaged sticker in Kanda’s ARC.

At Kanda, we’re making an application for quality control and maintenance with AR (codename ARC). Using a custom-built machine learning layer on top of the AR framework, the app can detect visual flaws on relevant assets and provide a sort of “auto-fill” during the inspection process, making life easier for frontline workers.

The application exploits the fact that the asset being inspected is already tracked in AR. This provides the positional information needed to crop the object nicely from the camera frame. We then use custom-built neural networks to detect visual flaws in the cropped image.
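The actual implementation isn’t public, but the idea can be sketched: project the corners of the tracked asset’s world-space bounding box into the camera image and take the enclosing pixel rectangle. The `AssetCropper` helper below is hypothetical, assuming the asset sits in front of the camera:

```csharp
using UnityEngine;

// Projects the 8 corners of a tracked asset's world-space bounds into the
// camera image and returns the pixel rectangle that encloses them.
public static class AssetCropper
{
    public static RectInt GetCropRect(Bounds bounds, Camera arCamera,
                                      int imageWidth, int imageHeight)
    {
        var min = new Vector2(float.MaxValue, float.MaxValue);
        var max = new Vector2(float.MinValue, float.MinValue);

        for (int i = 0; i < 8; i++)
        {
            // Enumerate all corner combinations of the bounding box.
            Vector3 corner = bounds.center + Vector3.Scale(bounds.extents,
                new Vector3((i & 1) == 0 ? -1 : 1,
                            (i & 2) == 0 ? -1 : 1,
                            (i & 4) == 0 ? -1 : 1));

            // Viewport coordinates are normalized [0,1]; scale to pixels.
            // (Assumes the asset is in front of the camera.)
            Vector3 vp = arCamera.WorldToViewportPoint(corner);
            var px = new Vector2(vp.x * imageWidth, vp.y * imageHeight);
            min = Vector2.Min(min, px);
            max = Vector2.Max(max, px);
        }

        // Clamp so the crop never reads outside the image.
        int x = Mathf.Clamp(Mathf.FloorToInt(min.x), 0, imageWidth);
        int y = Mathf.Clamp(Mathf.FloorToInt(min.y), 0, imageHeight);
        int w = Mathf.Clamp(Mathf.CeilToInt(max.x), 0, imageWidth) - x;
        int h = Mathf.Clamp(Mathf.CeilToInt(max.y), 0, imageHeight) - y;
        return new RectInt(x, y, w, h);
    }
}
```

Depending on how the camera image is laid out in memory, the y axis may need flipping before the rectangle is applied.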

LEGO Hidden Side using object detection to locate the kit.

This year, LEGO released their Hidden Side AR theme. The accompanying AR app makes use of object detection (with Vuforia) to locate the LEGO kit in the camera frame and render a 3D world on top of it. I think that is pretty neat!

AR demo from Wikitude and Anyline to read & submit utility meter data.

Anyline provides a text recognition SDK that can be used in AR. In the demo above, AR is used to easily locate and select a utility box. Then, its energy consumption meter is read automatically using OCR (optical character recognition).

This is significantly faster (and more accurate) than manual data entry, which is why Anyline’s technology is already being used by Edison, Italy’s main energy provider.

Solving technical challenges.

One of the main challenges when enhancing AR apps with machine learning is the requirement for low latency. Most often, a custom machine learning layer needs to make information available in real time, so the model must run on the device’s hardware rather than in the cloud.

Sometimes, your problem can be solved with existing software products like the ones offered by Vuforia and Anyline. When you need something customized, you’ll have to figure out how to run machine learning models efficiently on your desired application platform.

Kanda’s ARC being developed in Unity3D.

Kanda’s go-to development platform for AR apps is Unity3D. It’s a game engine that allows for efficient 3D programming and computer graphics. After integrating several edge machine learning systems with this platform, though, I am comfortable telling you that Unity3D wasn’t made for machine learning inference.

However, a new project from Unity (codename Barracuda) is on the path to becoming an efficient (and pretty easy-to-use) machine learning inference engine for Unity3D!

Enter The Matrix - Compute Shaders make matrix multiplications real fast!

Barracuda uses Compute Shaders to run models efficiently on device hardware. These are programs that run on the graphics card but, unlike regular shaders, can be used for non-graphics tasks.

To sweeten the deal, Compute Shaders can utilize the graphics hardware on many platforms: macOS, iOS, Android, Windows, PS4 and Xbox. This means that machine learning inference with Unity3D can be fast AND cross-platform 🤩
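Barracuda is still in preview and its API has been changing between releases, so take the following as a rough sketch of what inference looks like rather than a definitive recipe. The `FlawDetector` component and the model are placeholders of my own:

```csharp
using Unity.Barracuda; // earlier previews use the "Barracuda" namespace
using UnityEngine;

// Runs an imported ONNX model on a cropped image and returns a flaw score.
public class FlawDetector : MonoBehaviour
{
    [SerializeField] NNModel modelAsset; // ONNX model imported into Unity
    IWorker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        // The ComputePrecompiled backend runs the network as compute shaders.
        worker = WorkerFactory.CreateWorker(
            WorkerFactory.Type.ComputePrecompiled, model);
    }

    public float Score(Texture2D croppedImage)
    {
        // A Tensor can be built straight from a texture (batch of 1, 3 channels).
        using (var input = new Tensor(croppedImage, channels: 3))
        {
            worker.Execute(input);
            Tensor output = worker.PeekOutput(); // owned by the worker
            return output[0]; // e.g. the probability of a visual flaw
        }
    }

    void OnDestroy()
    {
        worker?.Dispose();
    }
}
```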

In conclusion.

Because of the technological progress made in the last couple of years, AR apps are exceeding expectations in terms of what they can do.

Since the camera image is always available, these apps also provide a unique opportunity to enrich the experience with visual data: enabling the app to read text, find objects or detect anomalies, for example. These features can be powered by a custom machine learning layer built on top of the core tech.