Augmented reality (AR) has featured prominently in nearly all of Apple's events since iOS 11 was introduced. Tim Cook has said he believes it will be as revolutionary as the smartphone itself, and AR was Apple’s biggest focus in sessions with developers at WWDC this year.

But why? Most users don’t think the killer app for AR has arrived yet—unless you count Pokémon Go. The use cases so far are cool, but they’re not necessary and they’re arguably a lot less cool on an iPhone or iPad screen than they would be if you had glasses or contacts that did the same things.

From this year's WWDC keynote to Apple’s various developer sessions hosted at the San Jose Convention Center and posted online for everyone to view, though, it's clear that Apple is investing heavily in augmented reality for the future.

We’re going to comb through what Apple has said about AR and ARKit this week, go over exactly what the toolkit does and how it works, and speculate about the company’s strategy—why Apple seems to care so much about AR, and why it thinks it’s going to get there first in a coming gold rush.

What ARKit is and how it works

Let’s start with exactly what ARKit is and does. We are going to thoroughly review the high-level features and purposes of the toolkit. If you want even more detail, Apple has made talks and documentation on the subject available on its developer portal.

The simplest, shortest explanation of ARKit is that it does a lot of the heavy lifting for app developers in terms of working with the iOS device’s camera, scanning images and objects in the environment, and positioning 3D models in real space and making them fit in.

Or as Apple puts it:

ARKit combines device motion tracking, camera scene capture, advanced scene processing, and display conveniences to simplify the task of building an AR experience. You can use these technologies to create many kinds of AR experiences using either the back camera or front camera of an iOS device.

Apple initially launched ARKit with iOS 11 in 2017. App developers could use Xcode, Apple’s software-development environment on Macs, to build apps with it. ARKit primarily does three essential things behind the scenes in AR apps: tracking, scene understanding, and rendering.
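To make those three jobs concrete, here is a minimal sketch of an AR screen in Swift. It assumes a UIKit app that renders with SceneKit through ARKit's ARSCNView, which pairs an ARSession (tracking and scene understanding) with rendering. The class name ARViewController is hypothetical. ARWorldTrackingConfiguration and the session's run and pause methods are ARKit's own API.

```swift
import UIKit
import SceneKit
import ARKit

/// Hypothetical minimal AR screen: ARSCNView draws the camera feed and any
/// SceneKit content, positioned using the poses that its ARSession tracks.
final class ARViewController: UIViewController {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        sceneView.scene = SCNScene() // empty scene; virtual objects get added later
        view.addSubview(sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // Start camera capture and world tracking while the screen is visible.
        sceneView.session.run(ARWorldTrackingConfiguration())
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Stop the camera and motion sensors when the screen goes away.
        sceneView.session.pause()
    }
}
```

Everything app-specific (which planes to detect, what to place, how to light it) hangs off that configuration and the scene view.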

Tracking keeps tabs on a device’s position and orientation in the physical world, and it can track objects like posters and faces—though some of those trackable items were not supported in the initial iOS 11 release.

Scene understanding essentially scans the environment and provides information about it to the developer, the app, or the user. In the first release, that meant horizontal planes and a few other things.

Rendering means that ARKit handles most of the work for placing 3D objects contextually in the scene captured by the device’s camera, like putting a virtual table in the middle of the user’s dining room while they’re using a furniture shopping app.

ARKit does this by tracking the environment in some specific ways. Let’s review what the initial release supported on that front.

Orientation tracking

In the orientation tracking configuration, ARKit uses the device’s internal sensors to track rotation in three degrees of freedom. It’s like turning your head without walking anywhere: changes in physical position aren’t tracked, just orientation in a spherical virtual environment with the device at the origin. Orientation tracking is especially useful for augmenting far-off objects and places outside the device’s immediate vicinity.
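The sketch below shows how an app might request orientation-only tracking. AROrientationTrackingConfiguration is ARKit's class for this mode. The helper name is hypothetical, and setting the .gravityAndHeading world alignment, which orients the axes to compass heading, is our own assumption about what suits far-off landmarks rather than a requirement.

```swift
import ARKit

/// Hypothetical helper: an orientation-only (3DOF) configuration, or nil when
/// the device does not support it.
func makeOrientationOnlyConfiguration() -> AROrientationTrackingConfiguration? {
    guard AROrientationTrackingConfiguration.isSupported else { return nil }
    let configuration = AROrientationTrackingConfiguration()
    // Assumption: align axes to gravity and compass heading so distant,
    // real-world directions (a landmark to the north, say) line up.
    configuration.worldAlignment = .gravityAndHeading
    return configuration
}
```

Pass the result to session.run(_:) as with any other ARKit configuration. The nil return covers devices that report this mode as unsupported.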

World tracking

World tracking goes further. It tracks the orientation of the device’s camera and any changes in the device’s physical location, for six degrees of freedom in total. So unlike orientation tracking, it understands if the device has moved two feet to the right. It also does this without any prior information about the environment.

Further, ARKit uses a process called visual-inertial odometry, which combines data from the device’s motion sensors with camera-based analysis that identifies key physical features in the environment around the device. Those features are recorded from multiple angles as the device is moved and reoriented in physical space (moving is required; rotation alone doesn’t provide enough information). The images captured in this process are used together to understand depth, much as humans perceive depth with two eyes.
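The two-eyes comparison can be made concrete with textbook two-view triangulation. This is a simplified illustration only, not ARKit's actual pipeline, which also fuses motion-sensor data and many views. Take two rectified images captured a baseline B apart, with focal length f in pixels. A feature that appears at horizontal coordinates x_L and x_R in the two images has disparity d, and its depth Z follows from similar triangles. The numbers in the second line are hypothetical.

```latex
Z = \frac{f\,B}{d}, \qquad d = x_{\mathrm{L}} - x_{\mathrm{R}}

f = 1500\,\mathrm{px},\; B = 0.10\,\mathrm{m},\; d = 30\,\mathrm{px}
\;\Rightarrow\; Z = \frac{1500 \times 0.10}{30}\,\mathrm{m} = 5\,\mathrm{m}
```

This also shows why rotation alone is not enough. If the device only turns in place, the two camera centers coincide, so B = 0 and the views carry no parallax from which depth could be recovered.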

This generates what Apple calls a world map, which can be used to position and orient objects, apply lighting and shadows to them, and much more. The more a user moves and reorients, the more information is tracked, and the more accurate and realistic the augmentations can become. When ARKit builds the world map, it matches it to a virtual coordinate space in which objects can be placed.
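That coordinate space shows up directly in the API. The sketch below assumes an ARSessionDelegate attached to a running world-tracking session. Each ARFrame's camera transform carries the device's position in meters, and a transform derived from it can anchor an object. PositionLogger and transform(inFrontOf:distance:) are hypothetical names.

```swift
import ARKit
import simd

/// Hypothetical delegate that records where the device is in ARKit's world space.
final class PositionLogger: NSObject, ARSessionDelegate {
    private(set) var lastPosition: simd_float3?

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // The fourth column of the 4x4 camera transform is its translation (meters).
        let t = frame.camera.transform.columns.3
        lastPosition = simd_float3(t.x, t.y, t.z)
    }
}

/// Hypothetical helper: a pose `distance` meters straight ahead of the camera.
/// In ARKit's camera space the camera looks down its -Z axis.
func transform(inFrontOf cameraTransform: simd_float4x4, distance: Float) -> simd_float4x4 {
    var offset = matrix_identity_float4x4
    offset.columns.3.z = -distance
    return simd_mul(cameraTransform, offset)
}
```

ARSession holds its delegate weakly, so keep a strong reference to the logger elsewhere. To pin content half a meter ahead of the device, an app could call session.add(anchor: ARAnchor(transform: transform(inFrontOf: frame.camera.transform, distance: 0.5))).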

The device needs uninterrupted sensor data, and this process works best in well-lit environments that are textured and that contain very distinct features; pointing the camera at a blank wall won’t help much. Too much movement in the scene can also trip the process up.

ARKit tracks world map quality under the hood and reports one of three states, which developers are advised to communicate to users in some way:

Not available: The world map is not yet built.

Limited: Some factor has prevented an adequate world map from being built, so functionality and accuracy may be limited.

Normal: The world map is robust enough that good augmentation can be expected.
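In ARKit's API these states surface as ARCamera.TrackingState (.notAvailable, .limited with a reason, and .normal). They are delivered through the session(_:cameraDidChangeTrackingState:) delegate callback. The minimal sketch below turns them into user-facing hints. The function name and message strings are hypothetical.

```swift
import ARKit

/// Hypothetical helper mapping ARKit's tracking state to a short user-facing hint.
/// Returns nil when tracking is normal and there is nothing to report.
func trackingHint(for state: ARCamera.TrackingState) -> String? {
    switch state {
    case .normal:
        return nil
    case .notAvailable:
        return "AR isn't available yet."
    case .limited(.initializing):
        return "Move your device slowly to get started."
    case .limited(.excessiveMotion):
        return "Slow down; the device is moving too fast."
    case .limited(.insufficientFeatures):
        return "Point the camera at a well-lit, textured area."
    case .limited(.relocalizing):
        return "Return to where you left off to resume."
    default:
        // Covers any limited-tracking reasons added in later iOS releases.
        return "Tracking is limited right now."
    }
}
```

The limited-state reasons mirror the conditions described above: too little texture, too much motion, or a session that is still starting up.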

Plane detection

Plane detection uses the world map to detect surfaces on which augmented reality objects can be placed. When ARKit launched with iOS 11, only horizontal planes were detected and usable, and variations like bumps and curves could easily disturb efforts to accurately place 3D objects in the view.

Using these three techniques, developers can tap ARKit to place 3D objects they’ve modeled onto a detected plane in the camera view shown on the device’s screen.
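The sketch below illustrates that workflow with SceneKit rendering. It turns on horizontal plane detection, then hit-tests a screen tap against detected planes and places a node at the result. The class name and the 30 cm box standing in for a modeled object are hypothetical. hitTest(_:types:) is the 2018-era ARSCNView API, and later iOS versions deprecate it in favor of raycasting.

```swift
import UIKit
import SceneKit
import ARKit

/// Hypothetical controller: detect horizontal planes, drop a box where the user taps one.
final class PlacementController: NSObject {
    let sceneView: ARSCNView

    init(sceneView: ARSCNView) {
        self.sceneView = sceneView
        super.init()
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tap)
    }

    func start() {
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        sceneView.session.run(configuration)
    }

    @objc private func handleTap(_ gesture: UITapGestureRecognizer) {
        let point = gesture.location(in: sceneView)
        // Only accept hits inside a detected plane's estimated extent.
        guard let hit = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else { return }
        // A 30 cm box stands in for a modeled 3D object.
        let box = SCNNode(geometry: SCNBox(width: 0.3, height: 0.3, length: 0.3, chamferRadius: 0))
        let p = hit.worldTransform.columns.3
        // Raise the box by half its height so it rests on the plane.
        box.position = SCNVector3(p.x, p.y + 0.15, p.z)
        sceneView.scene.rootNode.addChildNode(box)
    }
}
```

UIKit gesture recognizers do not retain their targets, so the owning view controller should keep the PlacementController alive.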

Features added in iOS 11.3

Apple released ARKit 1.5 with iOS 11.3 earlier this year. The update improved the accuracy and quality of the experiences developers could build with ARKit, without requiring significant extra effort from them. It also increased the resolution of the camera view shown on the user’s screen during AR experiences.

Vertical planes

The initial version of ARKit could only detect, track, and place objects on flat horizontal surfaces, so ARKit 1.5 added the ability to do the same with vertical surfaces and (to some extent) irregular surfaces that aren’t completely flat. Developers could place objects on the wall, not just the floor, and to a point, literal bumps in the road were no longer figurative bumps in the road.

Image recognition

ARKit 1.5 added basic 2D image detection, meaning that ARKit apps could recognize something like a page in a book, a movie poster, or a painting on the wall. Developers could easily have their apps introduce objects into the environment once the device recognized those 2D images. For example, a life-sized Iron Man suit could be placed in the environment when the user points the device’s camera at an Avengers movie poster.
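Both ARKit 1.5 additions are opt-in settings on the world-tracking configuration. In the sketch below, "AR Resources" is a hypothetical name for an asset-catalog group of reference images (such as a movie poster), each with its real-world printed size entered in Xcode. The function and class names are also hypothetical, and the delegate only logs what it found, where a real app would attach content to the node.

```swift
import SceneKit
import ARKit

/// Hypothetical helper: world tracking with both ARKit 1.5 additions turned on.
func makeConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    // Detect walls as well as floors and tabletops.
    configuration.planeDetection = [.horizontal, .vertical]
    // Load reference images from a (hypothetical) asset-catalog group.
    if let posters = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) {
        configuration.detectionImages = posters
    }
    return configuration
}

/// Hypothetical delegate reacting when ARKit adds an anchor for a plane or known image.
final class DetectionDelegate: NSObject, ARSCNViewDelegate {
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        if let imageAnchor = anchor as? ARImageAnchor {
            let name = imageAnchor.referenceImage.name ?? "unnamed image"
            print("Recognized \(name); content could be attached to this node")
        } else if let planeAnchor = anchor as? ARPlaneAnchor, planeAnchor.alignment == .vertical {
            print("Found a vertical plane such as a wall")
        }
    }
}
```

Assign a DetectionDelegate to sceneView.delegate (keeping a strong reference, since the view holds it weakly) and start the session with sceneView.session.run(makeConfiguration()).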

What Apple will add in iOS 12

That brings us to WWDC on June 4, 2018, where Apple announced iOS 12 and some major enhancements and additions to ARKit that make the platform capable of a wider range of more realistic applications and experiences.

The changes allow for virtual objects that fit into the environment more convincingly, multi-user AR experiences, and objects that remain in the same location in the environment across multiple sessions.

Saving and loading maps

Previously, AR world maps were not saved across multiple sessions, and they were not transferable between devices. That meant that if an object was placed in a scene at a particular location, a user could not revisit that location and find that the app remembered it. It also meant that AR experiences were always solo ones in most ways that mattered.

In iOS 11.3, Apple introduced relocalization, which let users restore a state after an interruption, such as when the app was suspended. iOS 12 significantly expands on that. Once a world map is acquired, the user can relocalize to it in a later session, or the world map can be shared with another user or device, for example using the MultipeerConnectivity framework. Sharing can happen via AirDrop, Bluetooth, Wi-Fi, or a number of other methods.
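The sketch below outlines the iOS 12 flow. It assumes the app already has a running world-tracking session and, for sharing, an MCSession with connected peers. getCurrentWorldMap(completionHandler:), ARWorldMap's secure-coding support, ARWorldTrackingConfiguration.initialWorldMap, and MCSession.send(_:toPeers:with:) are real API. The three function names are hypothetical, and writing the Data to disk is left out.

```swift
import ARKit
import MultipeerConnectivity

/// Hypothetical helper: snapshot the current world map as archivable Data.
func captureWorldMap(from session: ARSession, completion: @escaping (Data?) -> Void) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else {
            print("World map unavailable: \(error?.localizedDescription ?? "unknown error")")
            completion(nil)
            return
        }
        // ARWorldMap supports NSSecureCoding, so it archives to plain Data.
        let data = try? NSKeyedArchiver.archivedData(withRootObject: map, requiringSecureCoding: true)
        completion(data)
    }
}

/// Hypothetical helper: restart the session with a saved map so ARKit relocalizes to it.
func restore(_ data: Data, in session: ARSession) throws {
    guard let map = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self, from: data) else {
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = map
    session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}

/// Hypothetical helper: send the same Data to every connected nearby peer.
func share(_ data: Data, over mcSession: MCSession) throws {
    try mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
}
```

It is sensible to capture the map only after ARFrame's worldMappingStatus reports .mapped. After a restore, the camera's tracking state typically stays at .limited(.relocalizing) until ARKit recognizes the scene.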

ARKit recognizes that the device is in the same scene as in an earlier session, or as another device, and it can determine its position within that previous world map.

Apple demonstrated this with SwiftShot, an AR game it built for developers to study and emulate, in which multiple users interact with the same 3D objects on multiple devices at once.

But multi-user gaming is not the only possible use case. Among other things, saving and loading maps could allow app developers to create persistent objects in a certain location, like a virtual statue in a town square, that all users on iOS devices would see in the same place whenever they visited. Users could even add their own objects to the world for other users to find.

There are still some limitations, though. Returning to a scene that has changed significantly in the real world since the last visit can obviously cause relocalization to fail, but even changed lighting conditions (like day vs. night) could cause a failure, too. This is a notable new feature in ARKit, but some work still needs to be done to fully realize its potential.