I have been working on human-computer interaction for almost 30 years. I spent a few decades in the game industry as a gameplay and AI programmer, a designer, and a general instigator. I have made everything from a Sega Genesis JRPG, to an arcade game about dinosaurs, to a billion-dollar online phenom.

I have spent my life trying to feel what others feel, to respect their ways and how they see things. I have tried to also show people new experiences, so that they will remain open to better and more compelling ways of doing things…whether that’s entertainment or productivity or any other activity.

The emergence of wearable XR computing technology has always turned my head. In the 90s I experimented heavily with early VR equipment, and continued to tinker as the technology evolved over time. I immediately saw how XR devices fundamentally changed the equation between the person and the computer, and I have worked hard to try and find the edges of this new possibility space.

I currently work at Magic Leap, the primary reason being that I see XR as the predominant way that people will use computers in the future. I sometimes joke that “the world will soon be covered in water, and Magic Leap is one of the few places teaching people how to swim.” However, this article reflects my own thoughts on the matter, not a declaration of Magic Leap’s intentions or official point of view.

Summary

Spatial computing may be a new concept to you, or maybe you’ve been working inside of AR/VR/MR/XR or some other newly defined space. For this article, let’s simplify this to XR, which we will define as “all of these new spaces, unless specifically called out.” A lot of definitions get tossed around, and a lot of different philosophies exist about what it is, or isn’t, or should be. In this article, I hope to push the definition a bit, in order to frame what I feel is the most important aspect of this new generation of computing.

A spatial computer is just like any other computer, in that it provides computational resources, as well as the ability to interface with a person, which means inputs and display technology. Where it differs from other computers, like your laptop for example, is that instead of using a keyboard and a touchpad as its only interface, it has sensors that can perceive the user and the world around it. Also, depending on the device, it can display pixels in a more integrated 3D manner instead of on a screen.

Too many people working in the industry think of “spatial computing” as JUST pretty pictures: a visual lightfield. This misses a lot of the point. I would say that a quality lightfield is third on the list of most important things that spatial computing brings.

In my opinion, the top three most important things a wearable device will deliver are:

1. Understanding the humanity of the wearer at a deep level, for the first time in computing.

2. Providing computing resources while allowing users to act like people, as opposed to computer users. Not human-centric. Human FIRST.

3. Allowing for visuals and other feedback at the right place and time.

Said another way:

1. An intent-driven, goal-based user model.

2. A human-first computing interface.

3. A spatial lightfield and soundfield.

Being wearable allows us to work on #1. A change in philosophy around computing, made possible by #1, then gets us #2. It also gives us a bunch of new superpowers for work, play, and everything in between. The hardware to get us #3 is thus vital, but is more of a service to #1 and #2.

A fundamental change in form factor

When you take a powerful computer and mount it directly to a person, you make it wearable. That’s a huge gift to computing, because the sensors become a direct proxy for the person: they share the natural movements and environmental stance of the wearer. We can start the long process of forging the massive streams of data necessary to understand the complexity and nuance of being human into a cohesive behavioral and mental model of humanity, one that future computers can use to interface with people, instead of the other way around.

It’s a shift towards a human-first focus. Here’s an analogy that makes this distinction clearer:

● If you have a small heater that you can carry, you have mobile heating. But the same product features that make it small and portable mean that it has a narrow Field of Heating, and you have to use your hands to carry it around, or a part of your table to support it. A mobile heater is built with the “function of heating” as its primary focus, with portability as a secondary one.

● However, if you have a wearable heater, it’s called a coat. It’s built directly for humans and lets you do all the things you want, just warmer. The fact that humans need to be able to be humans is the first goal of this device. It’s Human First.

Just know that this is where computing is heading, and not just for XR. The notion of a human-first input model doesn’t require XR: if you had some way of using this type of integrated input on your current device, it would add massive value.

So far, computers have been Chicken-and-Egging their way upwards in input complexity as the years go by. We went from wired connections, to knobs and switches, to the keyboard, then the mouse, and finally touch surfaces. Each one took us a tiny step forward in reclaiming more natural ways of using our hands…but always in service of explaining input to a device rooted in the notion that the end product was displayed on a 2D screen. Hence our UX and our inputs have slowly optimized around capturing the small part of human behavior that governs how we think about and communicate 2D content. We never really needed all the extra sensing and compute necessary to capture more 3D human behavior, because our computers were natively 2D, and as such most of the tasks we used them for, and the ways that we communicated with our computers, were also 2D.

Spatial computers change all that. Wearable spatial devices require native 3D sensing and deep human integration in order to function at all (because of headpose, the fundamental spatial computing feature…make sure you check out my other posts on headpose and all the ways it can be used), and as such it’s the first time that we’ve had all the right precursors in the same place. So we not only get a computer that speaks 3D natively, we suddenly get a computer that can speak Human natively.
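To make the headpose idea concrete, here is a minimal sketch using the WebXR browser API (just one of several real spatial APIs; it is not a description of any particular vendor’s SDK). The whole app loop hangs off knowing, every frame, where the wearer’s head is; content anchored in the reference space stays fixed in the room only because that pose is always known.

```ts
// Minimal sketch: reading headpose every frame with the WebXR API.
// Rendering is omitted; the point is that the device's core loop is
// built around knowing where the wearer's head is at all times.

async function startHeadposeLoop(): Promise<void> {
  // Ask the browser for an immersive AR session.
  const session = await navigator.xr!.requestSession("immersive-ar");
  // A "local" reference space: a coordinate system fixed to the room.
  const refSpace = await session.requestReferenceSpace("local");

  session.requestAnimationFrame(function onFrame(time: number, frame: XRFrame) {
    const viewerPose = frame.getViewerPose(refSpace);
    if (viewerPose) {
      // Headpose: position and orientation of the head in the room's space.
      const { position } = viewerPose.transform;
      // Anything placed relative to refSpace now stays put in the world,
      // because every frame we know exactly where the head is looking from.
      console.log(
        `head at (${position.x.toFixed(2)}, ${position.y.toFixed(2)}, ${position.z.toFixed(2)})`
      );
    }
    session.requestAnimationFrame(onFrame);
  });
}
```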

Don’t get me wrong, the display makes spatial computing way better. It was a necessary tool to have first, because it let us see and work in a natively 3D environment. It also allowed us to more naturally see content that has always been 3D but has up until now been imprisoned on a 2D screen.

The manner of interacting with that content was still rooted in the past, however. Many people immediately filled the room with buttons, lists, big flat grids of objects, and all manner of UIs (User Interfaces) carried over from our screen-based paradigms. It very quickly became apparent that this could be a far more human-friendly computing environment, since the content now lives in the same 3D space as the user, and the device can capture a huge amount of human context data while you work. Over the last few years a lot of progress has happened.

That being said, there is still a lot of work to do:

● Feedback, and the actual motion profile of the user’s inputs, need to have as close to zero latency as possible. Right now, latency on headpose will make you nauseous…which is why XR companies have worked so hard to eliminate that problem. In contrast, latency on hand motion might not make you sick, but it similarly degrades the usability and overall enjoyment of the system. People will not want to continue if every interaction is muddy and unreal. Thus, hand tracking needs the same level of careful thought and optimization in order for us to get past the toy nature of it, so that we can count on it to not make our hands “ill” (see the prediction sketch after this list). We also need to pave the way for other inputs, like voice, eyes, and everything else we can easily sense in our wearable form factor.

● A human first model that fully encompasses how humans do work. People are not logical, or rational, or physically modeled, or hierarchical, or organized by any of the other classical computing schemes we have attributed to them. For the most part, none of those are really how people do things. We tend to organize around singular tasks. We collect the info we need, and we work socially with others to try and break down those tasks. We tend towards lots of sideways leaps of thought. We have messy brains, and that tends to make our thought processes themselves messy and non-Euclidean, and that’s ok. But our notion of how a computer supports us needs to change to take that into account.

● Crossover interactions and practices for use across the general XR device ecosystem. We need to develop this new paradigm such that similar classes of interaction can happen no matter what device you are currently computing with. Content creators can then build a single app that interfaces with users in as human-first a way as possible, without special one-off builds for various devices.
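As one concrete illustration of how input latency can be attacked (a generic sketch, not a description of how any particular headset actually does it), here is a simple forward-prediction of a tracked hand position: estimate velocity from the last two tracking samples and extrapolate by the expected sensing-to-display delay, so rendered feedback lands closer to where the hand actually is when the photons arrive.

```ts
// Latency-compensation sketch: extrapolate a tracked position forward by the
// expected sensing-to-display delay, using the velocity between the last two
// samples. Real systems filter velocity and use per-sensor timestamps; this
// only illustrates the idea.

interface Vec3 { x: number; y: number; z: number; }

interface Sample { position: Vec3; timestamp: number; } // timestamp in seconds

function predictPosition(prev: Sample, curr: Sample, latencySeconds: number): Vec3 {
  const dt = curr.timestamp - prev.timestamp;
  if (dt <= 0) return curr.position; // no usable velocity estimate; use latest sample

  // Finite-difference velocity from the last two tracking samples.
  const v: Vec3 = {
    x: (curr.position.x - prev.position.x) / dt,
    y: (curr.position.y - prev.position.y) / dt,
    z: (curr.position.z - prev.position.z) / dt,
  };

  // Push the pose forward by the time we expect to lose before display.
  return {
    x: curr.position.x + v.x * latencySeconds,
    y: curr.position.y + v.y * latencySeconds,
    z: curr.position.z + v.z * latencySeconds,
  };
}

// Usage: assume ~20 ms of expected sensing-to-display delay.
// const drawAt = predictPosition(previousSample, latestSample, 0.020);
```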

Human First API

Make no mistake, we are working on the API for humans, so that we can give it to computers and say “this is your new interface.” No longer are people going to have to learn programs and hardware. Computers are going to come to us. Before now, devices had no ability to do so at the fidelity level we required. The computer mouse, while wondrously low latency and human centric (in that it is a 1-to-1 hand mimic), still abstracts the hand into a 2D coordinate and a few digital buttons. In contrast, the goal should be to build for humans first, and computers second. We can sacrifice some of the computing power to do the heavy lifting of understanding the situation, instead of making the user do all that work themselves. This is fundamentally different from how things are typically built now.

I work in XR and spatial computing because I want to help change how humans use computers. I want people to be able to use computers without sacrificing so much of their humanity. I’ve always been somewhat weirded out by the fact that with most of their tools, humans work hard bending the technology to their will. We shape it to fit our hands and our weird human ways until it becomes an extension of us…and yet with computers, I feel that we’ve taken a different path.

Computers are not shaped to fit anything about us. They’re optimized around being computers. Recently they’ve at least been optimized around being mobile, so that we can carry them around more easily. But they’re still mostly really good computers. Not really good humanistic devices, as anybody looking at a group of people in a public space can clearly see: instead of people fully in the world empowered by advanced technology, you have a bunch of humans bent over their tiny mobile computers, sucked into the screen and barely in the world at all. But we can change that. We are changing that.

This is not just a chunk of AI, by the way. While it will probably be called “the new pinch and zoom,” it is a lot more than just some new interaction mechanics. A full Human First API includes a number of different things (a hypothetical sketch follows this list):

● A full Intent model for the user. We don’t want to know that you pushed a button. That’s a reactive model. We want to know that you intend to push a button next. We want predictive power so we can support the effort in the best way and don’t have to wait until it’s done.

● A full Attentional model for the user. The focus of attention that humans employ at any moment is complex, very contextual, and task specific. This isn’t about a smart cursor. It’s about again knowing what the user is attempting and keeping track of all the important elements directly, so that the system is considering things in the same way as the person. You’re reading email, and you’re on step 4 of the process? Ok, then you most likely need this set of tools. Step 5? A different set.

● New UI elements that take all of this into account, and respect the physical world where need be, are also necessary. We need to marry the strengths of digital stuff with the strengths of physical stuff, and right now UI is built around screens and pixels. It’s also not built around the physical organizational systems that we all know well, and that we would intrinsically know how to use if they were presented in the digital world. We have had some of this happen…like the date picker wheels from iOS. They look like lock tumblers, and as such we easily know how to swipe them and make what we need happen. But they’re not real…any given column on the tumbler could have a radically different number of values than any other. Our brain can use the physical affordance we know, without worrying that the digital powerup of that affordance quietly ignores some of the limits of the physical.
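To make the shape of this more concrete, here is a purely hypothetical sketch of what such an API could look like. None of these types or method names come from a real SDK; they only illustrate the shift from subscribing to raw button events to subscribing to the user’s predicted intent and current focus of attention.

```ts
// Hypothetical Human First API sketch (not a real SDK).
// The app listens to intent predictions and attention changes rather than
// to raw input events.

type TaskStep = { taskId: string; stepIndex: number };

interface IntentPrediction {
  action: string;        // e.g. "open-reply", "grab-object"
  confidence: number;    // 0..1, how sure the model is before the action happens
  expectedTarget?: string;
}

interface AttentionState {
  focusedObjectId: string | null; // what the person is actually attending to
  currentStep: TaskStep | null;   // where they are in the task they're doing
  modality: "gaze" | "hands" | "voice" | "mixed";
}

interface HumanFirstSession {
  // Predictive, not reactive: fires before the interaction completes.
  onIntent(handler: (intent: IntentPrediction) => void): void;
  // Tracks what the person cares about right now, so the system reasons
  // about the task the same way the person does.
  onAttentionChange(handler: (attention: AttentionState) => void): void;
}

// Example use: surface the right tools for the step the user is on.
function wireUp(session: HumanFirstSession): void {
  session.onAttentionChange((attention) => {
    if (attention.currentStep?.stepIndex === 4) {
      // show the toolset for step 4 (e.g. the email-triage tools)
    }
  });
  session.onIntent((intent) => {
    if (intent.action === "open-reply" && intent.confidence > 0.8) {
      // pre-warm the reply UI before the user finishes the gesture
    }
  });
}
```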

New low-level Superpowers

So what does all this tech buy us as people? It effectively multiplies many of our natural efforts and minimizes many of our natural deficiencies. Let’s consider just some of the biggest wins: better feedback, frictionless compute, and higher quality collaboration.

Realtime Intelligent Feedback

Many of the current use cases being explored in XR involve more directly interacting with datasets that are not completely intuitive on a screen. Native 3D content at scale, and the use of far more axes of control than a mouse can comfortably offer, are both very common. Before now we had very few ways of visualizing these datasets in real time.

Complex datasets require lots of control in order to manipulate fully. Most computing systems haven’t had enough easily controlled inputs for an average person to drive all the fields they need in order to consume the kinds of complex systemic data we are talking about. But by directly using the high-degree-of-freedom, natural physical affordances of your body and senses, we can give people the control they need without requiring them to learn some abstracted scheme. In games, we found that a casual user could comfortably control around 1–3 axes of control at any given time. A hardcore player could go upwards of 7–9 simultaneous axes of control. But each of your hands is actually a 26-degree-of-freedom (DOF) object, and you can easily control a 6-DOF pose with barely a thought. When you don’t need to learn an abstraction, suddenly you can control a lot more by piggybacking on a human system.
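As a rough sketch of that difference (hypothetical types, not any particular SDK): a tracked 6-DOF hand pose can drive a grabbed object one-to-one, so the user controls six axes at once with no learned abstraction in between, whereas a mouse has to route the same manipulation through modes and modifier keys.

```ts
// Hypothetical direct-manipulation sketch: the 6-DOF hand pose (3 position
// axes + 3 rotation axes) maps one-to-one onto a grabbed object.

interface Quaternion { x: number; y: number; z: number; w: number; }
interface Point3 { x: number; y: number; z: number; }
interface Pose6DOF { position: Point3; rotation: Quaternion; }

interface GrabState {
  // Offset between hand and object captured at the moment of the pinch,
  // so the object doesn't snap to the hand when grabbed.
  positionOffset: Point3;
}

function beginGrab(hand: Pose6DOF, object: Pose6DOF): GrabState {
  return {
    positionOffset: {
      x: object.position.x - hand.position.x,
      y: object.position.y - hand.position.y,
      z: object.position.z - hand.position.z,
    },
  };
}

// Called every frame while the pinch is held: the object simply follows the hand.
function updateGrab(hand: Pose6DOF, grab: GrabState): Pose6DOF {
  return {
    position: {
      x: hand.position.x + grab.positionOffset.x,
      y: hand.position.y + grab.positionOffset.y,
      z: hand.position.z + grab.positionOffset.z,
    },
    rotation: hand.rotation, // orientation tracks the hand 1:1 (offset omitted for brevity)
  };
}
```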

Wearables will allow the perceptual gap between action and feedback to be made smaller. When you do something, you can get feedback that happens IMMEDIATELY and in the specific location of the activity. Anyone who’s ever tried to train their pet will tell you…if you can give them praise the instant they do the good behavior, they make the connection faster, and we operate the exact same way. This will vastly improve our ability to learn, make proper associations in the mind, and determine proper cause and effect. This should improve education for a huge number of subjects and/or systems that suffer from loose or slow feedback loops.

Lastly, there’s a percentage of people that suffer from aphantasia…a condition where one does not possess a functioning mind’s eye and cannot voluntarily visualize imagery. Current research shows that approximately 1–3% of people suffer from this disorder. Ed Catmull wrote about its high occurrence in the Pixar staff. It may be why they worked so hard to make the tools necessary to visualize things on a computer. Having a wearable device can give these people (and those of us working directly with them) the tools to allow them to see hazily defined concepts more realistically and quickly. Remember that while that number seems low, that’s the percentage that is unable to visualize imagery. There’s a full range of this ability, and a MUCH larger percentage of people are just “low to medium visualization skill.” Leveling the playing field will make it easier for all sorts of folks along the spectrum of this skillset to suddenly communicate better.

Frictionless Imbued Compute

Like most people, I typically have a laptop and a smartphone nearby at all times. What this means is that while I’m working, I can multiply my efforts by looking up facts and figures, and use my calculator or calendar or any other app I need. I have computing-level needs, and so I carry around computing resources. Things will be a heck of a lot better once I have computing available “on tap” at all times, when I’m wearing the computer directly.

Instead of stopping what I’m doing, pulling out my phone, unlocking it and searching for something…what if all those steps were removed or minimized?

I might just look at something and push a button, or directly inquire with voice. I might set up certain classes of things to get auto-computed or auto-queried (people’s names in the area, or anything around me written in Spanish, for example). I could look over to the right, see a growing word cloud of the keywords in my email inbox, and choose one directly as a search filter. And on and on.
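A hypothetical sketch of what that “look and ask” flow could reduce to in code; none of these names exist in a real SDK, they just show the gaze target supplying the context that a phone search would otherwise cost several unlock-and-type steps to establish.

```ts
// Hypothetical "look and ask" sketch (not a real SDK).

interface GazeTarget { objectId: string; label: string; }  // e.g. a sign, a face, a machine part

interface QueryResult { text: string; }

interface FrictionlessCompute {
  currentGazeTarget(): GazeTarget | null;
  ask(question: string, context: GazeTarget | null): Promise<QueryResult>;
}

// Voice inquiry about whatever the wearer is looking at, without breaking flow.
async function lookAndAsk(device: FrictionlessCompute, spokenQuestion: string): Promise<string> {
  const target = device.currentGazeTarget();
  const result = await device.ask(spokenQuestion, target);
  return result.text; // rendered next to the target rather than on a phone screen
}
```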

Compute tasks of all kinds can be streamlined and made frictionless with a wearable computer. It will actually be the first big reason why people keep the device on for long periods.

Collaboration

Once you have advanced inputs, feedback, visualization, and have immediate access to heavy duty compute…then suddenly it becomes a heck of a lot easier to have complex conversations with other human beings.

Right now, with a whiteboard and a smartphone, communication can happen (although slower or at much lower fidelity if I’m not an artist), but in the future I can quickly pull up high quality imagery, spawn a complex simulation because it will help me make my point, mass communicate what I’m currently looking at without letting go of what I’m holding, and ask for immediate feedback, and on and on. Wearable computers will make collaboration not just easier, but infinitely more powerful.

Conclusion

I want more people to see what I see. A world where computers do the heavy lifting of interfacing with humans. Where people can effortlessly use computing when it makes sense, and not be taken out of the real world in doing so. Where the computing doesn’t remove us from the conversation, but rather makes the conversation better and more powerful.

Part of the reason tech currently feels so isolating is because task switching between reality and computing is expensive, both in time and attention. I have to stop talking to you, pull out my phone, search for the fact, adjust the search and keep trying until I find it, and then re-engage with you. People have begun to optimize their productivity and communication by just staying in their screens. If people can use those tools friction-free while they rub elbows directly with coworkers, fellow students, and family…it will help stem the tide of the current tech immersion that keeps people isolated from each other.

What we are primarily doing in spatial computing is bringing HUMANS to the forefront.

That’s magical.