Context — Tags

Part of the power of AR is the ability to discover tasks and information naturally — by encountering them in the spaces we live in. Right now this capability is locked behind the limitations of mobile OSes.

Mobile AR’s biggest challenge is the friction of needing to download and open a single-task app in the right context. Users may be willing to do it once for a brief branded experience, but as part of day-to-day life it simply isn’t scalable.

Discovery is also a huge issue for task-oriented AR apps in both mobile and an AR-OS. Users need to be prompted in the real world (‘Hey — download our app and scan this!’), or remember seeing some marketing to even be aware an AR experience exists at a location or on an object.

Tags place themselves in the environment automatically, without requiring the user to recognise they would be useful in a particular spot and launch them. Tags could include things like name labels on plants, calorie counts on meals, or historical points of interest.

A video game example of a tag highlighting possible interactions or information (Deus Ex: Human Revolution)

More esoteric tag examples could include:

Removing all advertising from the world around the user

Building maintenance tags that show the age and last certification date of all installed equipment.

Social tags that allow others to create AR fashion & cosmetics which the user then sees on their body

Tags that give overall color grading & aesthetic effects to the world

‘Order more’ tag when seeing a near-empty consumable item

One of my previous startups produced AR cosmetics that let you create an augmented digital self. (Metaverse Makeovers)

On top of user-installed tags, and tags enabled in the AR-OS by default, location-based tags that automatically show content from a location’s owner will be available. Imagine augmented exhibits in museums, or way-finding in a hospital that directs you to your specific appointment.

Context-based augmentation — entering a space and having additional information, interactions and aesthetics layered on top — is mostly uncharted territory in terms of architecture and UX, as it is only really possible via AR (though you could argue precursors include context-based tech like beacons on mobile devices).

Some of the possibilities (including the role of the web as a tag in AR, and ownership of spaces) are covered in the next article in this series.

Managing Resources — Attention & Space

The limitations of phone screens forced a step back from windowed multitasking UIs to single-task ones. Multitasking on mobile is possible, but remains awkward to this day — a concession made to the limitations of the hardware.

Despite having the ability to present information anywhere, in any form, an AR-OS will still be tightly constrained. This time, however, the concessions are mainly to the user rather than the hardware — a user has perceptual and mental limitations, such as attention and spatial comprehension, that can easily be overwhelmed, causing frustration and discomfort.

An AR-OS will need to manage the resources of the user as much as the resources of the device itself, which will require some very careful, empathetic UX design. Attention will need to be treated as a limited resource to be managed, and balanced against the needs of the real world. Physical space will also need to be managed carefully, to ensure the AR-OS is always easily comprehensible.

Attention

With current devices, an OS can assume that if the user is looking at the display, they are devoting most of their attention to the task being presented. If the user wishes to shift their attention, it’s as simple as looking away.

In AR, the ‘display’ consists of everything the user can see, so the same assumptions cannot be made. If you fill a user’s space with whirling information while they try to do another, non-AR task, they will become distracted and frustrated.

Users have a finite amount of attention they can pay to the world, so an AR-OS must manage attention as a resource. Its goal is to present only the information the user needs in the world, with as little mental burden as possible.

So, how do we manage the user’s attention in a way that minimises distraction? Some rules can be easily intuited (a rough sketch of how they might combine follows the lists below):

Intensity

Fast or constant motion will be distracting, especially if it is towards the user or in the periphery of vision.

Objects within arm’s reach have much greater physical presence (as anyone who has tried VR can confirm).

Bright or saturated colors will naturally draw more attention.

Objects that are large in scale will take up more attention.

Complexity

Objects overlapping in the same spatial volume will need more mental effort to understand.

Complex shapes, such as highly detailed icons or interfaces, will require more attention to understand.
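
As a very rough illustration, these factors could be combined into a simple attention-cost score for each virtual object. Everything below (names, fields and weights) is a hypothetical sketch, not a real AR-OS API:

```typescript
// Hypothetical sketch: scoring the attention cost of a virtual object
// from the intensity and complexity factors above. All names and weights
// are illustrative assumptions.

interface VirtualObject {
  speed: number;         // motion in metres per second
  distance: number;      // metres from the user
  angleFromGaze: number; // degrees away from the centre of vision
  saturation: number;    // colour saturation, 0..1
  scale: number;         // bounding-sphere radius in metres
  overlaps: number;      // number of objects sharing its volume
  detail: number;        // visual complexity of the shape, 0..1
}

function attentionCost(obj: VirtualObject): number {
  // Intensity: motion is weighted more heavily when the object is
  // within arm's reach or in the periphery of vision.
  const peripheral = obj.angleFromGaze > 60 ? 1.5 : 1.0;
  const proximity = obj.distance < 0.8 ? 2.0 : 1.0;
  const motion = obj.speed * peripheral * proximity;

  // Bright colours and large scale draw attention on their own.
  const salience = obj.saturation + 0.5 * obj.scale;

  // Complexity: overlapping volumes and detailed shapes cost mental effort.
  const complexity = 0.5 * obj.overlaps + obj.detail;

  return motion + salience + complexity;
}
```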

Again, since we are dealing with a user’s visual cortex rather than their display, there is more complexity than in traditional UI design. As just one example, designers will need to avoid startling their users:

Sudden, small-scale motion nearby in peripheral vision will startle the user, as we have evolved to reflexively assume it is a creature that needs our attention so we can judge whether it is dangerous.

Similarly, any sudden large-scale motion may also startle users.

These concepts are intuitive to reason about, but they can demand different ways of thinking from 2D UI design, encompassing ideas from architecture, way-finding, product-design, psychology and neurology.

Attention and Tags

Attention is simple to manage when the user is focused on a single activity — because the user has launched a task, we can assume they intend to give it attention, which the creator of the task-oriented software can then design for. For example, a 3D editing tool would expect to take up most of the user’s focus and visual field, whereas an IKEA build guide would reasonably expect the furniture it refers to to be present and visible.

However, much of the usefulness of an AR-OS comes from its ability to offer a range of information and actions based on the world around the user. To avoid distraction and annoyance, this ambient UI’s goal should be to reach the lowest-attention state possible while still delivering the desired information.

It then follows that the ideal ‘low-energy’ state of an AR-OS is to look as close to plain reality as possible. It’s worth noting that this is directly opposed to our movie-influenced vision of how future AR UIs look — information filling the space in front of a person, or densely-designed panels pinned all around them.

This also suggests information presented by tags should be temporary but retrievable later — for example, upon recognising a dish, a calorie-count tag may only have access to a limited space around the dish for a set period of time to show the count. The tag may also be limited to food the user is directly interacting with. As a result of these restrictions, the tag could not permanently put counts above every object in a supermarket at once. This prevents both spammy and poorly-designed tags.
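
As a loose sketch of how such a restriction might be expressed (all names and fields here are hypothetical, for illustration only), a tag’s display rights could be modelled as a time- and volume-limited grant issued by the OS:

```typescript
// Hypothetical sketch: a time- and volume-limited display grant for a tag.
// Fields and names are illustrative assumptions, not a real AR-OS API.

interface Volume {
  center: [number, number, number]; // world coordinates, metres
  radius: number;                   // metres
}

interface DisplayGrant {
  tagId: string;
  volume: Volume;         // limited space around the recognised object
  expiresAt: number;      // epoch ms — the content disappears after this
  requiresFocus: boolean; // only show for objects the user interacts with
}

// The OS issues a grant when a tag's trigger fires (e.g. a dish is
// recognised), rather than letting the tag draw wherever it likes.
function issueGrant(tagId: string, around: Volume, ttlMs: number): DisplayGrant {
  return {
    tagId,
    volume: around,
    expiresAt: Date.now() + ttlMs,
    requiresFocus: true,
  };
}
```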

A notification system will be needed to manage these events and subtly prompt the user that something is available within a particular space even after the initial notification has passed. This notification system will be expanded on later in the series.

Assumed attention burden for the UI modes discussed

Space

The OS will be aware of the available space around the user, which will vary greatly depending on their location (e.g. a phone booth vs a park).

Placing virtual objects ‘inside’ real objects or walls may be visually uncomfortable, especially if the real object is heavily patterned, as the disparity in depths creates a ‘cross-eyed’ sensation. Avoiding this means the space that tasks and tags can occupy is limited, and varies between environments.

The OS will be expected to choose the best place to locate information and activities to minimise discomfort and distraction. Since so much of AR is contextual, many use-cases rely on information being presented next to or on the object being referred to, so activities will need to be able to request specific volumes of space.

As mentioned, overlapping experiences need to be avoided, so the OS will need ways of managing requests for access to the same volume, or presenting the user with a notification that information/an activity is available.

Details like activity type, total volume requested, anchor-point requested, task priority and current context will all need to be considered by the OS when assigning space.
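
To make this concrete, here is a minimal sketch of what a space request and the OS’s grant decision might look like. The fields and thresholds are assumptions for illustration, not a proposal for a real API:

```typescript
// Hypothetical sketch: a spatial-allocation request and a simple grant
// decision. All names, fields and thresholds are illustrative assumptions.

type ActivityType = 'task' | 'tag' | 'notification';

interface SpaceRequest {
  activity: ActivityType;
  volume: number;   // cubic metres requested
  anchor?: string;  // optional ID of the real-world object to attach to
  priority: number; // 0..1, declared by the requester and capped by the OS
}

interface UserContext {
  freeVolume: number; // usable space around the user, cubic metres
  userBusy: boolean;  // e.g. walking, driving, or in conversation
}

// Grant only if the space exists and the user's current context can
// afford the attention; busy contexts demand much higher priority.
function canGrant(req: SpaceRequest, ctx: UserContext): boolean {
  if (req.volume > ctx.freeVolume) return false;
  const threshold = ctx.userBusy ? 0.8 : 0.2;
  return req.priority >= threshold;
}
```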

This UI needs to allow many experiences in one location without taking up the user’s attention. Some potential solutions will be discussed in the next article.

Trust

The user understandably wants control over their privacy and how much data third parties have access to. Trust will be a huge factor in how an AR-OS is structured, given the presence of cameras, microphones and other sensors in a device that may be worn continually throughout the day.

This issue of trust extends to others around the user as well. Although most of us already carry microphones & cameras on us (and use them continually), the backlash to Google Glass shows that HMDs inhabit a different mental category to smartphones (partly due to their intrusive & outlandish appearance). An AR-OS has an uphill battle with already-negative public perception, and needs to be far more restrictive with sensor access than smartphones in order to win any trust at all.

It’s worth noting that trust is also being placed in the system due to its ability to alter visual perception. Malicious (or simply ill-designed) AR software can obscure dangers in the environment, or cause dangerous, sudden visual distractions.

For full-time use, the user needs to have a level of trust in the OS close to airline autopilot or banking software. As previously mentioned, most of us have been carrying cameras and microphones in our pockets for a decade, and staring at them most of the day without much security beyond the occasional sticker on a laptop webcam. The head-mounted nature of AR glasses is much more viscerally personal, and any intrusion would be immensely disturbing. Someone with access to those sensors is hijacking our entire life experience, hitching a ride in our head.

Seemed unpleasant

Permissions/Access

The default for any third-party code on the device is that it does not have access to raw sensor data, including video from the cameras, depth/environment data, and microphone. This is essential to preserve the privacy of the user.

An environment mesh built from the user’s surroundings can be accessed by apps at a limited rate, and within a particular volume assigned by the OS’s spatial management. This prevents malicious code from building high-resolution maps of a user’s environment, while still allowing placement of objects on surfaces and other basic environment interaction.
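
A minimal sketch of how that clamping might work, assuming hypothetical names throughout:

```typescript
// Hypothetical sketch: third-party mesh queries are rate-limited and
// clamped to an OS-assigned volume. All names are illustrative assumptions.

interface Volume { center: [number, number, number]; radius: number; }
interface MeshChunk { vertices: Float32Array; }

// True if `inner` lies entirely within `outer`.
function contains(outer: Volume, inner: Volume): boolean {
  const dx = outer.center[0] - inner.center[0];
  const dy = outer.center[1] - inner.center[1];
  const dz = outer.center[2] - inner.center[2];
  const dist = Math.sqrt(dx * dx + dy * dy + dz * dz);
  return dist + inner.radius <= outer.radius;
}

class MeshAccess {
  private lastQuery = 0;

  constructor(
    private allowedVolume: Volume, // assigned by the OS's spatial management
    private minIntervalMs: number, // minimum time between queries
  ) {}

  query(requested: Volume): MeshChunk | null {
    const now = Date.now();
    if (now - this.lastQuery < this.minIntervalMs) return null; // rate-limited
    if (!contains(this.allowedVolume, requested)) return null;  // out of bounds
    this.lastQuery = now;
    // Stand-in for the OS returning a deliberately low-resolution mesh.
    return { vertices: new Float32Array(0) };
  }
}
```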

The most common use of the environment mesh is for occlusion — blocking 3D rendered objects when they are behind something in the real world (like under a table). This is handled securely at the OS level by the renderer/compositor, allowing for high resolution visual interaction with the full environment, without exposing the data itself.

Third-party apps have access to low-security data, but not high-security sensor or render data

By default, tags have access to a narrow range of notification/display methods standardised by the OS — a current example would be how push notifications work on mobile: they have a fixed UI design, and their behaviour is limited to what’s allowed by the OS.

Tags can request additional notification privileges in ambient-use mode to allow for more dynamic behaviour. This allows tags to place content in volumes immediately adjacent to objects of interest, or location tags to display automatically.

A travel tag with default privileges would pop up an OS-provided text box saying ‘Flight Boarding at Gate 11’, whereas a tag with more privileges could show a glowing pathway to the gate.
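
A small sketch of how these two privilege tiers might behave, using the flight example above. The API and names here are assumptions, not a real system:

```typescript
// Hypothetical sketch: tiered tag notification privileges. A default tag
// may only use the OS-standardised notification; an ambient-privileged tag
// can render custom content near its anchor. Names are assumptions.

type NotificationPrivilege = 'default' | 'ambient';

interface TagNotification {
  text: string;           // always available: OS-rendered, fixed-design text
  customContent?: string; // ambient-privileged only, e.g. 'glowing-pathway'
}

function notify(privilege: NotificationPrivilege, n: TagNotification): void {
  if (privilege === 'default' || !n.customContent) {
    // OS-provided text box, comparable to a mobile push notification.
    console.log(`[OS notification] ${n.text}`);
    return;
  }
  // The OS allots a volume near the anchor and lets the tag render its
  // custom content inside it (e.g. a pathway to the gate).
  console.log(`[ambient] rendering '${n.customContent}' for: ${n.text}`);
}

notify('default', { text: 'Flight Boarding at Gate 11' });
notify('ambient', {
  text: 'Flight Boarding at Gate 11',
  customContent: 'glowing-pathway',
});
```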

Apps in task-oriented mode can also request extra privacy permissions, allowing access to raw video, depth and microphone data instead of only the default position-tracking data.

Apps can also request access to full environment meshes and social permissions, which would allow features like pinning an object in your home and having a family member see it in the same place.
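
As a rough sketch of that request flow (the permission names and the prompt are simulated assumptions):

```typescript
// Hypothetical sketch: an app in task-oriented mode requesting elevated
// permissions. Permission names and the prompt flow are assumptions.

type Permission =
  | 'raw-video' | 'depth' | 'microphone' // raw sensor data
  | 'full-mesh' | 'shared-anchors';      // environment & social features

// Simulated user prompt; a real OS would show UI and await the user.
function promptUser(perm: Permission): boolean {
  return perm !== 'raw-video'; // e.g. the user declines camera access
}

function requestPermissions(requested: Permission[]): Set<Permission> {
  const granted = new Set<Permission>();
  for (const perm of requested) {
    if (promptUser(perm)) granted.add(perm);
  }
  // Any granted sensor permission would also switch on the persistent
  // OS-level recording indicator discussed below.
  return granted;
}

const granted = requestPermissions(['raw-video', 'depth', 'shared-anchors']);
console.log([...granted]); // ['depth', 'shared-anchors']
```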

Above all, the user must be aware an app has direct access to sensor data for the entire duration of its access. A visual indicator will probably be required at the OS level — think the now-universal ‘red dot = recording’.

A Note on Machine Learning & Security

Machine learning is a powerful tool for an AR-OS, but it also presents its own challenges, as it needs full access to sensor data and has a huge range of possible uses. Common uses include recognising and labelling objects, performing visual searches, and anticipating the user’s needs and actions.