Where does AI end and AR begin?

Because AI operates both beneath AR scenes (powering tracking and scene understanding) and on top of them (driving effects and interactions), it can be difficult to know which tools provide which functionality. When building a mobile app, you’ll switch back and forth between various APIs to build the experience you want.

Let’s run through a few of the most popular developer tools and when to apply each:

ARKit and ARCore

ARKit and ARCore are the canonical augmented reality SDKs on iOS and Android, respectively. Though they differ slightly in their APIs, they perform the same basic functions. They combine data from a device’s sensors to model the 3D world, track movement, render digital objects, and mediate interactions between digital and physical content. You’ll use them primarily to place and manipulate objects within scenes. Though they may make use of AI, those models are typically abstracted away from developers, who are given access only to high-level outputs (e.g. occlusion masks for people).
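
For instance, when you tap the screen to place an object, both SDKs perform a raycast (ARKit’s raycasting APIs, ARCore’s hit tests) from the camera into the scene. Here is a minimal sketch of the underlying math for a detected horizontal plane; the function name is illustrative, not part of either SDK:

```python
# Sketch of the hit-test math AR SDKs perform when you tap to place an object:
# intersect a camera ray with a detected horizontal plane to get a world-space
# anchor position. Names below are illustrative, not ARKit/ARCore APIs.

def ray_plane_intersection(ray_origin, ray_dir, plane_y):
    """Intersect a ray with the horizontal plane y = plane_y."""
    ox, oy, oz = ray_origin
    dx, dy, dz = ray_dir
    if abs(dy) < 1e-9:
        return None  # ray is parallel to the plane
    t = (plane_y - oy) / dy
    if t < 0:
        return None  # plane is behind the camera
    return (ox + t * dx, oy + t * dy, oz + t * dz)

# Camera 1.5 m above the floor, looking down and forward at 45 degrees.
anchor = ray_plane_intersection((0.0, 1.5, 0.0), (0.0, -0.7071, -0.7071), 0.0)
```

The returned point becomes the transform of a new anchor, and the SDK keeps that anchor stable as the device moves.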

Core ML and TensorFlow Lite

Core ML and TensorFlow Lite are the on-device AI frameworks for mobile devices. They’re used to execute models independently of augmented reality. These APIs provide low-level control over the data flowing into and out of models, and they allow developers to insert their own custom models, trained to perform tasks specific to their applications.
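
The shape of that low-level control is the same in both frameworks: preprocess raw input into the tensor format the model expects, run inference, then map raw outputs back to something meaningful. A minimal sketch, where a stub function stands in for a compiled .mlmodel or .tflite file and all names are illustrative:

```python
# Sketch of the preprocess -> infer -> postprocess flow that Core ML and
# TensorFlow Lite give you low-level control over. The stub model stands in
# for a real compiled model; all names here are illustrative.

def preprocess(pixels, scale=1.0 / 255.0):
    """Normalize raw 0-255 pixel values to the 0-1 range a model expects."""
    return [p * scale for p in pixels]

def stub_model(inputs):
    """Stand-in for an on-device classifier: returns per-class scores."""
    return [sum(inputs), 1.0 - sum(inputs) / len(inputs)]

def postprocess(scores, labels):
    """Map the highest-scoring output index back to a human-readable label."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

frame = [128, 64, 255, 0]  # a tiny fake camera frame
scores = stub_model(preprocess(frame))
label = postprocess(scores, ["cat", "dog"])
```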

The most common way for developers to combine AR and AI models is to take images or audio from a scene, run that data through a model, and use the model output to trigger effects within the scene. Here are a few examples:

Image or scene labeling: A camera frame is run through an AI model that classifies the image. The classification triggers an AR label at that location.
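
A sketch of that loop, where a stub classifier stands in for the real model (e.g. a Core ML image classifier) and place_label is a hypothetical helper that would create an anchored text node in a real app:

```python
# Sketch of the labeling loop. The classifier and helper are stand-ins;
# a real app would call an on-device model and an AR scene graph API.

def classify_frame(frame):
    """Stand-in classifier: returns (label, confidence)."""
    return ("coffee mug", 0.92) if "mug" in frame else ("unknown", 0.1)

def place_label(label, confidence, position, threshold=0.5):
    """Only place an AR label when the model is confident enough."""
    if confidence < threshold:
        return None
    return {"text": label, "position": position}

node = place_label(*classify_frame("mug_photo"), position=(0.0, 0.1, -0.5))
```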

Object detection: A camera frame is passed to an AI model that estimates the position and extent of objects within a scene. Location information is then used to form hit boxes and colliders that facilitate interactions between physical and digital objects.
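
A sketch of turning a detector’s output into a collider. The box format (x, y, w, h in normalized 0–1 image coordinates) and the helper names are assumptions, not any specific detector’s schema:

```python
# Sketch: scale a normalized bounding box into scene space and use it as an
# axis-aligned collider for physical/digital interactions.

def bbox_to_collider(box, scene_width, scene_height):
    """Scale a normalized (x, y, w, h) box into scene-space min/max corners."""
    x, y, w, h = box
    return {
        "min": (x * scene_width, y * scene_height),
        "max": ((x + w) * scene_width, (y + h) * scene_height),
    }

def collides(collider, point):
    """Axis-aligned containment test: does a digital object touch the box?"""
    (min_x, min_y), (max_x, max_y) = collider["min"], collider["max"]
    px, py = point
    return min_x <= px <= max_x and min_y <= py <= max_y

collider = bbox_to_collider((0.25, 0.25, 0.5, 0.5), 640, 480)
```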

Semantic segmentation and occlusion: While ARKit may provide generic people occlusion capabilities, a custom AI model can be used to segment and occlude cars or other objects.
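
The compositing step reduces to a per-pixel choice: wherever the mask marks a segmented real object as being in front, the virtual pixel is dropped and the camera pixel shows through. A minimal sketch with flat lists standing in for images:

```python
# Sketch of per-pixel occlusion with a binary segmentation mask.
# mask[i] == 1 means a segmented real object occludes the virtual layer there.

def composite(camera, virtual, mask):
    """Choose the camera pixel where the mask occludes, else the virtual one."""
    return [cam if m else virt
            for cam, virt, m in zip(camera, virtual, mask)]

camera  = ["c0", "c1", "c2", "c3"]
virtual = ["v0", "v1", "v2", "v3"]
mask    = [0, 1, 1, 0]
out = composite(camera, virtual, mask)
```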

Pose estimation: An AI model infers the position of objects like hands and fingers, which are used to control AR content.
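
For example, a pinch gesture can be derived from the distance between two inferred fingertips. The keypoint names and threshold below are assumptions, not a specific hand-tracking model’s output:

```python
# Sketch of using pose keypoints to drive AR content: a pinch (thumb near
# index fingertip) could toggle a grab on a virtual object. Keypoint names
# and the distance threshold are illustrative assumptions.
import math

def is_pinching(keypoints, threshold=0.05):
    """Detect a pinch from the thumb-to-index fingertip distance (meters)."""
    (x1, y1), (x2, y2) = keypoints["thumb_tip"], keypoints["index_tip"]
    return math.hypot(x2 - x1, y2 - y1) < threshold

keypoints = {"thumb_tip": (0.40, 0.30), "index_tip": (0.42, 0.31)}
```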

Text recognition and translation: An AI model detects, reads, and translates text in an image. Augmented reality APIs are then used to overlay translated text back into the 3D world.
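
A sketch of that flow, where a stub OCR result and a tiny lookup table stand in for real recognition and translation models; in an app, the overlay would become a textured quad anchored over the original text:

```python
# Sketch of the translate-and-overlay flow. The lookup table and OCR result
# are stand-ins for real text recognition and translation models.

TRANSLATIONS = {"sortie": "exit"}  # illustrative lookup, not a real service

def translate_overlay(ocr_result):
    """Replace detected text with its translation at the same position."""
    text, box = ocr_result
    return {"text": TRANSLATIONS.get(text.lower(), text), "box": box}

# A detected word and its normalized bounding box in the image.
overlay = translate_overlay(("Sortie", (0.1, 0.2, 0.3, 0.1)))
```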

Audio recognition: AI models listen for specific words that trigger AR effects. For example, a user says the word “Queen” and a virtual crown appears on their head.
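
The trigger logic is a keyword-to-effect mapping over a transcript. In the sketch below, the transcript would come from an on-device speech model, and the table mirrors the “Queen” example above; all names are illustrative:

```python
# Sketch of keyword-triggered AR effects: scan a speech transcript for
# trigger words and return the matching effect. Names are illustrative.

EFFECTS = {"queen": "crown", "rain": "storm_particles"}

def effect_for_utterance(transcript):
    """Return the AR effect for the first trigger word found, else None."""
    for word in transcript.lower().split():
        if word in EFFECTS:
            return EFFECTS[word]
    return None

effect = effect_for_utterance("hail the Queen")
```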

Conclusion

Augmented reality and artificial intelligence are separate but complementary technologies. Smaller, faster, and more accurate AI models will increasingly power core AR functionality, thanks to their ability to track and understand the 3D world. They’ll also continue to enhance AR experiences, adding effects and interactivity to AR scenes.