Why Apple’s glasses won’t include ARKit

It could be 2021 before they do

About the only thing more exciting than ARKit is the thought of Apple Glasses that run ARKit. There are rumors and rumors-of-rumors, all claiming it’s “when, not if” they are coming. I’m going to attempt to answer the “when” by looking in depth at “what” technology is needed. I know glasses are being worked on at Apple, and the prototypes are state of the art. I also know what it takes to build a full-stack wearable AR HMD, having built fully functional prototypes from scratch. A lot of elements need to work before a consumer product can exist, and these elements don’t all exist today (even at the state of the art).

Freudian slip?

This post is a “bottom-up” look at what is needed before Apple (or anyone) can ship the consumer product we all want. I’ll avoid hyperbole & reliance on magic R&D breakthroughs. I’ll list what tech problems still need to be solved, give some indication of where the state of the art is today for each element, and make an educated guess as to when it will be “consumer ready”.

I don’t have any specific knowledge of Apple’s roadmaps. However, I do have a great deal of knowledge of the underlying enabling technologies & design problems, a large number of friends who have seen small unreleased pieces of Apple’s current AR work, and enough time in the tech industry to see how history rhymes with respect to platform transitions. From there I’m just joining the dots, based on my experience designing and building similar things over the last 9 years.

First, let’s define the end state. For the sake of this post, ARKit Glasses means fashionable glasses that we want to wear in public, that can render digital content as if it’s part of the real world, and that are internet connected with an App ecosystem. There’s no point getting pedantic about this definition. There will be great products that we buy & love that deliver part of this vision before it arrives (we all loved our Palm Pilots & Blackberries at the time), and there will be better & better versions that solve problems shipped in version 1. What I’m trying to explore is: when could Apple ship their version of AR Glasses that “just work” and that consumers desire?

As an aside, when it comes to glasses/eyewear/HMDs the term “AR” is muddied. There’s a ton of confusion about what is a wearable Heads-Up-Display (HUD) and what are AR Glasses. Both put a transparent display on your face (maybe you look right through it, or maybe it’s off to the side; maybe it covers one eye, or both), but the UX (and overall product) is completely different. Marketing departments have been calling HUDs AR Glasses, and calling AR Glasses things like Mixed Reality or Holographic displays. The difference is simple to detect: if all the content is “stuck to the display”, i.e. you turn your head and all the content stays on the display, it’s a HUD. Google Glass and Epson Moverio are good examples (in fact most head mounted displays are HUDs, though you can often add an AR SDK like Vuforia and turn them into simple AR Glasses, which is what ODG do, for example). A HUD alone isn’t AR. It’s just a regular display you wear on your head. If the content can be “stuck to the real world”, as ARKit enables with a phone, or as Hololens or Meta do, then you have a pair of AR Glasses. AR Glasses are a superset: they are a HUD with a see-through lens, plus 6dof tracking, plus a lot more…

FYI this AR Glasses buyer’s guide is the definitive list of AR Glasses & wearable HUDs on the market.

This is the UX of AR Glasses. The content stays in the world where you put it. When you turn your head, the TV stays in place on the wall. If you were using Google Glass (which is a HUD, not AR), the TV would stay in the display above your eye all the time as you move around.
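To make the HUD vs AR Glasses distinction concrete, here is a minimal sketch in Python. It uses a toy 2D model that I’ve assumed purely for illustration (x = right, y = forward, head yaw in degrees, positive = turning left); real systems use full 6dof poses. A HUD draws content at a fixed display position no matter how the head moves, while AR Glasses store content in world coordinates and re-express it relative to the current head pose every frame, which is why the virtual TV stays on the wall.

```python
import math

def world_to_head(point_world, head_pos, head_yaw_deg):
    """Express a world-space (x, y) point in head coordinates.

    Toy 2D convention (an assumption for this sketch): x points right,
    y points forward, and a positive yaw turns the head to the left.
    """
    dx = point_world[0] - head_pos[0]
    dy = point_world[1] - head_pos[1]
    yaw = math.radians(head_yaw_deg)
    # Undo the head's rotation (rotate the offset by -yaw).
    x = math.cos(yaw) * dx + math.sin(yaw) * dy
    y = -math.sin(yaw) * dx + math.cos(yaw) * dy
    return (x, y)

def hud_position(content_display, head_pos, head_yaw_deg):
    # HUD: content is "stuck to the display", so the head pose is ignored.
    return content_display

def ar_position(content_world, head_pos, head_yaw_deg):
    # AR Glasses: content is "stuck to the world", so it is re-expressed
    # in head coordinates whenever the head moves.
    return world_to_head(content_world, head_pos, head_yaw_deg)

tv = (0.0, 3.0)  # a virtual TV 3 m straight ahead of the starting position
for yaw in (0.0, 45.0, 90.0):
    hx, hy = hud_position(tv, (0.0, 0.0), yaw)
    ax, ay = ar_position(tv, (0.0, 0.0), yaw)
    print(f"yaw {yaw:5.1f}: HUD at ({hx:.2f}, {hy:.2f}), AR at ({ax:.2f}, {ay:.2f})")
```

Turning the head 90° to the left leaves the HUD content where it was on the display, while the world-locked TV moves to the wearer’s right (x ≈ 3, y ≈ 0), just as a real TV on the wall would.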

8 problems stopping us from having AR Glasses right now

Here are 8 big problems that still need to be solved for Apple (well, anyone) to ship Consumer AR Glasses. The important thing to understand is why each problem needs to be solved.

Fashionable and cool hardware. This is the most important IMO. It’s hard to think of a more personal wearable object than what we wear around our eyes. Whatever we wear says a lot about us (good or bad). Nothing stops great designers from designing great fashionable eyewear today; the constraint is the tech. Enterprise AR will get traction first, purely because people will be paid to wear ugly versions that have the tech problems solved but not the design problems.

A high-res OLED microdisplay from eMagin, suitable for an HMD (or a Watch). The image is projected into the waveguide, which spreads it over the plastic lens in the glasses so you can see it in front of your face.

Optics that fit inside a consumer frame and are bright enough for daylight use, sharp enough to easily read text, and have a wide enough field of view. There’s really not much point to AR products that only work indoors. The whole point of AR is that you can get out & about & wear them as part of your daily life, so the optics need to work outdoors on sunny days and be sharp enough to read text. FoV is less important to AR than to VR, as the utility use-cases aren’t as immersive as games, which depend on peripheral vision. FoV mostly matters to avoid the distraction of content being clipped at the edge of the display. Today the state of the art that can be manufactured in volume is an OLED microdisplay powering an injection molded waveguide, with a single focal plane. We’re a year or two away from better displays that work nicely, and probably another year or so after that before those better displays can be manufactured in volume. I heard a story a couple of years ago about a very technical strategic investor from a large OEM who passed on investing in Magic Leap early on. Not because he thought the displays were snake oil (they weren’t; they worked & the demos were amazing), but because he understood the challenges of scaling up manufacturing volume while maintaining a good yield without optical defects. I haven’t heard that ML has solved these challenges, and The Information reported recently that they may be using a different & easier to manufacture type of display. This is the reason the military still uses OLEDs & waveguides today (not for a lack of funding & research into better systems!). Also, the unit cost to produce a waveguide is very low ($1–2) if it is injection molded. There’s a 7 figure up-front, one-time optics design cost, and hi-res micro-OLEDs have been expensive, but they are dropping in price fast. And as yet no one has applied smartphone-scale production economics to HMD manufacturing.
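The waveguide economics above are simple amortization: the per-unit cost is the molded unit cost plus the one-time design cost spread over the production run. Here is a rough sketch, where the $2 unit cost (top of the $1–2 range) and $1,000,000 design cost (the low end of “7 figures”) are my own illustrative assumptions, and the micro-OLED is left out entirely:

```python
def per_unit_optics_cost(unit_cost, one_time_cost, volume):
    """Molded unit cost plus the one-time design cost amortized over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return unit_cost + one_time_cost / volume

# Illustrative (assumed) numbers, not real quotes.
UNIT_COST = 2.0            # dollars per injection molded waveguide
ONE_TIME_COST = 1_000_000  # dollars of up-front optics design

for volume in (10_000, 1_000_000, 100_000_000):
    cost = per_unit_optics_cost(UNIT_COST, ONE_TIME_COST, volume)
    print(f"{volume:>11,} units -> ${cost:,.2f} per waveguide")
```

At enterprise volumes the design cost dominates; at smartphone volumes it all but disappears, which is why the lack of smartphone-scale production economics for HMDs matters so much.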

Some of Mark Billinghurst’s research from 2014. Five years out from then would mean…

Hardware to capture “natural” input from a user, and software to reliably determine the user’s intent from that input. This is big and not close to being solved. The more you think about it, the more difficult you realize the problem is. Lots of effort is going into perfecting single modes of input (perfect voice recognition, perfect gestures, perfect computer vision etc), but even if you can perfect one mode (I doubt anyone can), there are going to be lots of circumstances where the user will never want to use that mode, e.g. voice input during a movie (a watch tap is better) or gestures while in public (voice might be better). To avoid being horribly compromised & embarrassing to use (at least sometimes), a multi-modal system is needed, with an AI to choose which input system best captures the user’s intent at that time. The state of the art here is research on multi-modal input for AR done by my SV partner Mark Billinghurst in his lab. We understand Apple and MSFT and Google (I recently heard a rumor of a motion tracked ring) are all working on multi-modal systems. I’d guess these are 9–12 months away from shipping in products (in a simple form). Airpods/Siri + iPhone are probably the best example of sort-of multi-modal input that is widely available on the market today: you can control your phone by tapping the display, tapping your ear, or talking. It’s still very basic. You are the AI deciding which mode to use.
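As a toy illustration of the “choose the best input mode for the moment” idea, here is a hand-written scoring heuristic standing in for the AI the paragraph describes. The context signals, mode names and weights are all hypothetical; a real system would infer context from sensors and learn these weights rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class Context:
    # Hypothetical context signals the system is assumed to infer from sensors.
    quiet_venue: bool   # e.g. a cinema, where talking is unacceptable
    in_public: bool     # other people can see you
    hands_busy: bool    # e.g. carrying shopping
    has_watch: bool     # a paired watch is available for taps

def choose_input_mode(ctx: Context) -> str:
    """Score each input mode for the current context and return the best one."""
    scores = {
        "voice": 1.0,
        "gesture": 1.0,
        "watch_tap": 0.5 if ctx.has_watch else float("-inf"),
    }
    if ctx.quiet_venue:
        scores["voice"] -= 2.0      # talking during a movie annoys everyone
        scores["gesture"] -= 0.5    # arm movement is distracting in the dark
    if ctx.in_public:
        scores["gesture"] -= 1.5    # waving at thin air in public is embarrassing
    if ctx.hands_busy:
        scores["gesture"] -= 2.0
        scores["watch_tap"] -= 1.0
    # max() returns the first mode with the highest score.
    return max(scores, key=scores.get)

movie = Context(quiet_venue=True, in_public=True, hands_busy=False, has_watch=True)
street = Context(quiet_venue=False, in_public=True, hands_busy=False, has_watch=True)
print(choose_input_mode(movie))   # watch_tap
print(choose_input_mode(street))  # voice
```

The specific weights don’t matter; the point is that no single mode wins in every context, so something has to arbitrate.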

Sensors and Processors that give high enough performance-per-watt to work for long periods without heat/weight ergonomic concerns. Better integration also gives designers more freedom to be fashionable. Moving from “works on my phone” to “works on a see-through display” means finding big improvements in performance-per-watt and motion-to-photon engineering. Just because some AR feature works nicely on my iPhone doesn’t mean it can just be copied over to glasses. 3D reconstruction, Machine Learning applications and coherent rendering are examples where the state of the art can (just) be achieved today on a phone, but they are quite heavy users of CPU/GPU, which drives up heat and sucks battery life. Allow 12 months from “works on phone” until you could expect it to “work on glasses”. In some cases custom silicon may be the best solution (e.g. Apple’s W1 chip, Movidius’ CVGPU or MSFT’s forthcoming HPU v2), which might mean an even longer wait.
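One reason “works on my phone” doesn’t carry over is motion-to-photon latency. On a phone, the camera image and the rendered content are shown together, so latency delays both and they stay aligned; on a see-through display the real world arrives with no delay, so any latency shows up as content lagging behind as you turn your head. A back-of-envelope sketch, assuming a constant head speed and a display with uniform pixels per degree; the 100°/s head turn and 32 pixels per degree are assumed example values, not specs of any product:

```python
def registration_error(head_speed_deg_s, latency_ms, px_per_deg):
    """How far world-locked content lags its true position during a head turn.

    Assumes the head turns at a constant angular speed and the displayed
    frame was rendered from a pose that is `latency_ms` old.
    Returns (error in degrees, error in display pixels).
    """
    error_deg = head_speed_deg_s * latency_ms / 1000.0
    return error_deg, error_deg * px_per_deg

HEAD_SPEED = 100.0   # deg/s, an assumed moderate head turn
PX_PER_DEG = 32.0    # assumed display resolution, e.g. 1280 px across 40 degrees

for latency in (50.0, 20.0, 5.0):
    deg, px = registration_error(HEAD_SPEED, latency, PX_PER_DEG)
    print(f"{latency:4.0f} ms -> {deg:.1f} deg, {px:.0f} px of lag")
```

Cutting latency costs compute and power, which is exactly the performance-per-watt squeeze described above.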

With semantic segmentation, the AR system can begin to distinguish individual objects in a scene, label them and pass that information up from ARKit to the App so it can decide on appropriate interactions with content.

An ability for the system to understand the structure & semantics of the world in 3D in real-time (ARKit is the very beginning of this, providing a 6dof pose relative to yourself and a basic ground plane). People are only just becoming aware of how important this is, and it will become a widely understood problem over the next 12 months. There’s no point building an AR App unless it interacts with the physical world in some way. That’s the definition of “AR-Native”. If there’s no digital+physical interaction, then a regular smartphone app will give a better UX. Tabletop 3D games on a flat table are the perfect example of “why bother doing this in AR?” In order to interact with the world, the system needs to capture or download a virtual 3D model of the world in front of me (see my previous post). Once developers understand this problem & systems start to serve up the world to them (as Tango, Hololens & Occipital can do today), the next problem becomes the big one: “how do I lay out my cool content onto a 3D scene when I don’t know in advance what the scene will look like?” Right now very simple heuristics can be used, but they are really crude and developers need to roll their own. Someone needs to develop a procedural content layout tool which is simple enough for developers to use (MSFT had a good shot at this with FLARE a few years ago. They really are way ahead!). Film SFX software which can procedurally generate armies of digital Orcs in a scene is today’s state of the art. I know of one startup that has this tech in an architecture product & is thinking about how to apply it to AR. I haven’t seen any other solutions or people working on the problem. The state of the art today in 3D reconstruction is real-time, dense, large-scale 3D reconstruction on a phone with a depth camera. Doing this without the depth camera is about 9 months of research away from working, and another 9–12 months away from productization.
The state of the art in 3D semantic segmentation is that it works in a basic way on a phone today via academic code, but again it is at least a couple of years from being good enough for consumers. Large-scale tracking and 3D Apple Maps-type integration is still research today. Some research implementations exist, and Google Tango VPS is the state of the art. Probably 12 months until we see basic versions of this in consumer phones.
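To show the kind of crude, roll-your-own layout heuristic I mean, and why semantic labels matter, here is a minimal sketch. The `Surface` type and its labels are hypothetical stand-ins for whatever a reconstruction and segmentation pipeline might hand up to an App; this is not an ARKit API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Surface:
    # Hypothetical output of 3D reconstruction + semantic segmentation.
    label: str        # e.g. "wall", "window", "table", "floor"
    vertical: bool
    width_m: float
    height_m: float

def place_virtual_tv(surfaces: List[Surface], tv_w: float = 1.2,
                     tv_h: float = 0.7) -> Optional[Surface]:
    """Pick the largest vertical surface labelled 'wall' that fits the TV.

    Without the semantic label, the largest vertical surface in the scene
    might be a window, which is a poor place to hang a TV.
    """
    candidates = [
        s for s in surfaces
        if s.vertical and s.label == "wall"
        and s.width_m >= tv_w and s.height_m >= tv_h
    ]
    if not candidates:
        return None  # nothing suitable; the App needs a fallback layout
    return max(candidates, key=lambda s: s.width_m * s.height_m)

scene = [
    Surface("window", True, 4.0, 2.0),   # biggest vertical surface, wrong label
    Surface("wall", True, 3.0, 2.4),
    Surface("wall", True, 1.0, 2.4),     # too narrow for the TV
    Surface("table", False, 1.5, 0.9),
]
best = place_virtual_tv(scene)
print(best)  # Surface(label='wall', vertical=True, width_m=3.0, height_m=2.4)
```

Real scenes need far more than this (occlusion, reachability, viewing angle, other content competing for space), which is why a proper procedural layout tool is needed.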

A poor quality image of two players who can see each other’s buggies in real-time, racing around on the same table. A multi-user shared virtual AR environment in 2013.

An ability to share/communicate our experiences with others. At Dekko we built one of the first (if not the first) commercially available iOS multi-player AR games (Tabletop Speed, a digital R/C buggy which could collide with and be occluded by real-world objects). The lift in enjoyment from playing with others who could also see your buggy was exponential. Right now nearly all AR demos and apps assume you are the only AR user in the world. When lots of people have AR capable devices we will want to share “what we are seeing” and also share an “augmented view of ourselves”. Charlie Sutton & Cliff Warren figured this out at Samsung and it was an extremely compelling (and obvious in hindsight) product prototype. Imagine that virtual hat from a Snap Filter being something you could virtually wear all day, and everyone else wearing AR Glasses (or only the people you choose) could see it on you. The basic tech to do this is understood, but it needs more robust outdoor localization (SLAM) tech & an ability to share “SLAM maps” with each other, so we are both using the same coordinate system relative to each other. Tango VPS is the state of the art today; it’s still a year or more until it’s really solid across the industry. Consumer UX/Apps and APIs etc that take advantage of this are another 6–12 months after that. One of the most exciting use-cases here is the concept of Holoportation.
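Sharing a coordinate system boils down to finding the rigid transform between two devices’ tracking frames. Here is a minimal 2D sketch, assuming both devices can recognize the same physical anchor (say, a spot on the table) and report its pose in their own frame. Real systems do this in 6dof by relocalizing against a shared SLAM map rather than a single hand-picked anchor, and the `Pose2D` type here is purely illustrative.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Pose2D:
    """A 2D rigid transform: rotate by theta (radians), then translate by (x, y)."""
    x: float
    y: float
    theta: float

    def compose(self, other: "Pose2D") -> "Pose2D":
        # self * other: apply `other` first, then `self`.
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(self.x + c * other.x - s * other.y,
                      self.y + s * other.x + c * other.y,
                      self.theta + other.theta)

    def inverse(self) -> "Pose2D":
        # Inverse of (R, t) is (R^T, -R^T t).
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(-(c * self.x + s * self.y),
                      -(-s * self.x + c * self.y),
                      -self.theta)

    def apply(self, px: float, py: float):
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.x + c * px - s * py, self.y + s * px + c * py)

def a_to_b(anchor_in_a: Pose2D, anchor_in_b: Pose2D) -> Pose2D:
    """Transform mapping points in device A's frame into device B's frame,
    given the same physical anchor as tracked by each device."""
    return anchor_in_b.compose(anchor_in_a.inverse())

# Each device tracks the anchor in its own, unrelated coordinate frame.
anchor_a = Pose2D(1.0, 0.0, 0.0)
anchor_b = Pose2D(0.0, 2.0, math.pi / 2)
t_ba = a_to_b(anchor_a, anchor_b)

# Player A places their buggy 1 m past the anchor; where should B draw it?
bx, by = t_ba.apply(2.0, 0.0)
print(f"B draws the buggy at ({bx:.2f}, {by:.2f})")  # (0.00, 3.00)
```

Once both devices agree on this transform, anything either player places can be drawn in the right spot for the other, which is the foundation for multi-player games, shared “augmented views of ourselves” and Holoportation alike.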

Why wouldn’t I want to wear this all day, not just in my selfie app? This is how I want everyone else (who is wearing AR glasses) to see me!

An AR-Native HMD “GUI”, which means an entirely new paradigm for “applications”, as the Desktop metaphor we’ve used for the past 40 years doesn’t really hold anymore. This is such a huge design rabbit hole to go down… suffice to say a 4x6 grid of square icons filling our transparent display FoV won’t work. There’s an opportunity for an entirely new app eco-system apart from iOS/Android to emerge, as almost none of the prior 10 years of work building smartphone apps will come across into AR. Interestingly, I found that designers who have an industrial design background picked up AR design better than App designers (or, god forbid, game designers, who really struggled to empathize with the real-world problems people have). I figured this is because industrial designers are trained to solve real-world problems for real people via physical 3D products. With AR you just leave the product/solution in its digital state (and maybe give it a bit more interactivity). The state of the art today is in the military, with interesting work going on in self-driving car UIs and some 3D gaming UIs. Hololens is the best readily available AR GUI to try. Try it & you’ll see how far there is still to go. Academic research is pretty solid (Doug Bowman is the man. Not the Doug Bowman @stop who used to work at Twitter, but the 3D UI Professor from Virginia Tech), but that research needs to find its way into products. Rumors are that Apple has a mature 3D GUI solution that works nicely in their labs today.

Magic Leap’s version of an AR GUI. Some elements are in the HUD (music controls in the corner, the clock), some elements are in the world (YouTube & toy). The Mail app could be either. How the heck do you choose to close & switch to a new “app”? Maybe pointing at the physical coffee cup is the way you tell the “starbucks app” to deliver you a coffee? Smartphone app eco-systems won’t work here.

An ecosystem of apps to enable useful and entertaining use-cases. Like the smartphone, there’s no single “killer app” for AR. There needs to be a great UX/GUI/Input, which then lets us connect to the internet & do what we already like to do, in a way that takes advantage of what only AR can deliver.

This is a long list of problems still to be solved before ARKit Glasses can exist. I don’t think Apple will take the Hololens approach & try to solve everything at once in a single product. To *really* understand this list, get hold of a Hololens and look at it through the lens of the 8 items above. Hololens is amazing because it was the first product to actually solve ALL the problems above and ship in a single integrated product for enterprise (not military) pricing. That had never been done before! It’s the Nokia Communicator of AR! Not very useful, but it proves it can be done. Yes, all the solutions to those 8 points are very simple and way short of where they need to be, but they did what no one had ever done before. It also means Microsoft probably understands these problems better than anyone else…

Two tracks for Apple to build out to ARGlasses

It makes sense to me that Apple will take a two-track product strategy to eventually get to an ARKit Glasses product. Apple is famously design led, and historically their products solve only one design problem at a time, as user behavior is hard to change. A great example of this is the original iPhone, which solved smartphone input via Multi-touch. This changed everything. All the other capabilities of the original iPhone already existed in other phones on the market.

One set of problems is technology problems: Computer Vision, sensors, SDKs, 3D developer tools etc. The other set, which is just as difficult to solve (and under-appreciated in the AR community), is Design problems. These include what use-case(s) the products address, how I interact with the UX, how the product expresses my identity (I am going to wear it on my face!), and what makes the product desirable.

I see Technology problems being solved on the iPhone platform via an increasingly capable ARKit SDK.

I see Design problems being solved by evolving Apple’s wearable products (watch, airpods, non-ARKit glasses… and HomePod to a degree) into a “constellation” of products that all work seamlessly together.

Then they’ll merge both tracks to create ARKit Glasses.

Some of the most interesting predictions concern the intermediary products & market opportunities between now & then, from Apple and from others. Blackberries, Nokias and Palm Pilots were all hugely successful products in the “transition years” between being able to connect a phone to the internet and iPhone/Android dominance.

Apple’s track to solve AR Technology problems

ARKit is a big deal for the AR industry. Not just because it’s got Apple’s marketing fairy dust, but because it means that developers now have a high quality “6dof pose” to use in AR Apps. EVERYTHING in AR is built on top of a high quality pose: solid registration of digital assets in the world, indoor & outdoor navigation, 3D reconstruction algorithms, large scale tracking systems, collision & occlusion of digital & real, gesture detection. All of these only work as well as the pose the tracking system provides. My earlier article explains more about how Apple did this & how it all works.
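To see why everything hinges on pose quality, here is a small sketch that projects a virtual object into the camera image with a simple pinhole model, once with the true pose and once with a slightly wrong one, and measures how far the object appears to slip. The intrinsics (1000 px focal length, 1280×720 image) and the yaw-only camera model are assumptions for illustration, not ARKit’s actual camera parameters or API.

```python
import math

def project(point, cam_pos, cam_yaw_deg, f=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a 3D world point into pixel coordinates.

    Assumed camera model: looks along +z with x right and y down, and is
    rotated only by `cam_yaw_deg` about the vertical (y) axis.
    """
    yaw = math.radians(cam_yaw_deg)
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    # Rotate the world offset into the camera frame (inverse of the camera's yaw).
    xc = math.cos(yaw) * dx - math.sin(yaw) * dz
    yc = dy
    zc = math.sin(yaw) * dx + math.cos(yaw) * dz
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (f * xc / zc + cx, f * yc / zc + cy)

def pixel_error(point, true_pose, estimated_pose):
    """On-screen distance between where content should be and where it is drawn."""
    u0, v0 = project(point, *true_pose)
    u1, v1 = project(point, *estimated_pose)
    return math.hypot(u1 - u0, v1 - v0)

anchor = (0.0, 0.0, 2.0)          # virtual object 2 m in front of the camera
true_pose = ((0.0, 0.0, 0.0), 0.0)
print(pixel_error(anchor, true_pose, ((0.0, 0.0, 0.0), 1.0)))   # 1 degree of yaw error
print(pixel_error(anchor, true_pose, ((0.01, 0.0, 0.0), 0.0)))  # 1 cm of position error
```

With these assumed numbers, 1° of yaw error moves the object about 17 px and 1 cm of position error about 5 px, which is immediately visible as content drifting or swimming on a surface.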

Apple launched ARKit on the iPhone hardware platform, being well aware that a hand-held form-factor is severely compromised for AR use-cases (too long a topic to cover here, I’ll write more about this). Apple knows this because many of the leaders of their AR teams knew this before they joined Apple! The reason to launch ARKit on iPhone is that consumer expectations will be much lower, as ARKit v1 is really quite a limited SLAM system (though excellent at what it does do). Developers have the opportunity to learn how to build AR Apps (very different than smartphone or VR apps!). Consumers see lots of great demos on YouTube & start to be educated about the potential of AR. Apple gets to bring better algorithms to market on the device that has the most CPU/GPU/Sensor power without trying to deal with wearable hardware constraints.

Lots of processing power. Compromised form-factor.

When will ARKit tech be good enough for ARGlasses? My view is that the system will need to handle large scale / outdoor tracking & localization, as well as dense 3D reconstruction in real-time via a monocular RGB camera. This is the minimum for ARKit to enable “AR as it’s popularly imagined to be”. Note also that tech problems to enable natural input also need to be solved.

Apple’s track to solve AR Design problems

This is the fun stuff. At heart I’m a technologist who understands business. But I’m married to an AR Designer (@silkamiesnieks) and both my AR product development teams were co-led by Designers (Silka at Dekko, and @cactuswool at Samsung) who taught me a lot. As an aside… I still haven’t seen other AR product teams do this, which I think is a missed opportunity (I bet Apple & Snap have designers involved at the top of the AR product org). I believe (from hands on experience!) that building AR Glasses is more of a “Design Problem” than a technical problem, even though the tech problems are about as hard as they get in tech. Prioritizing only the tech leads to a “boil the ocean” scenario, which one or two AR OEMs seem to be struggling with. The super-hard tech problems will be solved before the super-harder-er design problems are solved, but it’s the design decisions that determine which tech problems to solve.

I had the pleasure of seeing computer vision experts sitting across the table from senior product designers, learning each other’s language, and observing how it changed the way both teams approach AR problems.

I’ve been very impressed with Apple’s softly-softly approach to AR hardware. Airpods are IMO the first successful AR hardware product. I usually get blank looks when I state this, but they solve 2 critical problems for AR Glasses. They enable voice input/output for a natural mode of interaction (plus basic touch/tap). They also let us “augment” our surroundings with sound; a simple example would be the audio guides at every museum. This is audio AR. I’ve taken to wearing mine continuously while out & about, in restaurants & bars etc. No one cares, no one has punched me, and staff/friends speak to me assuming I can hear them (which I can, as I pause the music in advance). It surprised me how quickly they have been accepted as an all-day product. I predict we will see more “AI enabled headphones” (hearables) from other AR platform OEMs in the next 12 months. Startup acquisitions happening in 3, 2, 1…
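Audio AR uses the same world-locking idea as visual AR: a sound can be anchored to a place (say, an exhibit in that museum) and re-rendered relative to your head as you move. Here is a minimal sketch using constant-power stereo panning, a deliberately simple stand-in for the HRTF-based spatial audio a real product would use; the 2D convention (x right, y forward, positive yaw turns left) and the function name are assumptions for illustration.

```python
import math

def stereo_gains(source_world, head_pos, head_yaw_deg):
    """Left/right gains for a world-anchored sound, via constant-power panning."""
    dx = source_world[0] - head_pos[0]
    dy = source_world[1] - head_pos[1]
    yaw = math.radians(head_yaw_deg)
    # Express the source in head coordinates (undo the head's rotation).
    x = math.cos(yaw) * dx + math.sin(yaw) * dy
    y = -math.sin(yaw) * dx + math.cos(yaw) * dy
    azimuth = math.atan2(x, y)        # 0 = straight ahead, +90 deg = hard right
    pan = math.sin(azimuth)           # -1 (left) .. +1 (right)
    angle = (pan + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so total power stays constant as the sound moves.
    return math.cos(angle), math.sin(angle)

exhibit = (0.0, 2.0)  # an audio guide anchored 2 m ahead of the visitor
for yaw in (0.0, 90.0, -90.0):
    left, right = stereo_gains(exhibit, (0.0, 0.0), yaw)
    print(f"head yaw {yaw:6.1f}: left {left:.2f}, right {right:.2f}")
```

Turn your head left and the exhibit’s audio moves to your right ear, so it stays “in” the room. Plain stereo panning can’t tell front from back, which is one reason real spatial audio uses HRTFs and head tracking.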

Augmented Reality that is cool to wear. Augmented Audio, who cares about graphic displays?

The other significant aspect of Airpods is that they are physically separate from the Glasses. This reduces the size and cost of the eventual Glasses product. It should be obvious that many of us own multiple pairs of regular glasses (sunglasses, reading glasses, sports glasses etc), so why wouldn’t we want the same with our AR Glasses? The industry has (probably correctly) assumed we won’t buy a $1500 face-computer multiple times for fashion, but if the glasses are little more than a plastic frame, an injection moulded waveguide and a beefed up W1 chip (which can handle video), plus maybe a smartphone camera & IMU, the BOM unit cost could be in the low tens of dollars…

…of course, as Tim Cook said earlier this week, the iPhone will be “even more essential” for augmented reality.

…also, OLED micro-displays are today’s best way to drive a wearable display, so the iPhone 8 shifting to OLED is also useful. Soon OLED microdisplays will be superseded for HMDs by fancier display technology with multiple depths of focus, maybe retinal projection & other magic, but today it’s OLED.

I don’t think Apple will ship a camera in their first Glasses. They will be a HUD. More bluntly… an Apple Watch on your face. Solve one design problem at a time (new display form factor), and re-use an existing use-case. I’ve spoken to people who have held & used Apple’s prototypes, and they didn’t have a camera. Allowing for the 3D printed frame, they looked good, a lot like Ray-Ban Wayfarers. The recent Reddit thread was accurate. The use-cases being tested were in regard to “notifications”. This would be a natural fit for a Siri service similar to Google’s Assistant (and also synergistic with Airpods). I heard about 9 months ago that there were no glasses on Apple’s 12 month marketing roadmap. I expect we’ll see some Glasses ship next year (not AR capable, but with a Heads Up Display for notifications), with this year taken up with marketing ARKit technology on iPhone. There’s a decent chance a camera could ship in Apple Glasses v1 (there’s no technical reason not to), but that would be unlike Apple when a simpler product could ship that solves only the single design problem of getting people to wear a wearable display.

Marc Newson’s recent high-end fashionable glasses range

The most important advantage regarding the Glasses is that Apple has Marc Newson on the team. Look him up if you don’t know who he is. He knows how to design cool glasses. One lesson I learnt at Samsung is that Industrial Designers view glasses as one of the most difficult physical products to design, mostly because everyone’s face shape & taste is different, and there’s physically almost nothing to them (i.e. you can’t add features to keep everyone happy; the design has to be very pure & simple).

But what about the tech specs??!!!

Apple has also been learning how to sell Fashion. The Apple Watch has been invaluable in this regard. I still don’t know any tech specs about the watch, but I know Beyonce wore a launch edition in Vogue. Apple’s also learnt how to sell a range of colors and bands (and price points) so that people can express their individuality. No other tech company working on AR (except maybe Snap, who is learning fast) understands this.

Anyone who thinks the mass market won’t view AR Glasses as first & foremost a fashion purchase is probably still wearing Google Glass. They need to be designed & marketed & priced with that in mind.

The Bottom Line

This is pretty much what I think Apple’s AR “constellation” will look like to wear…

I don’t find it hard to imagine in a few years that people will be wearing their (cool) Apple Watch along with some (cool) Apple Glasses and Airpods all day. They may have a 2nd pair of glasses in their bag. And an iPhone in their pocket. The Watch can serve as a secondary way to give input (tap the watch to select in the glasses). It’s easy to take your Glasses off when you walk into a bar, and still be connected via Airpods & Watch. You can still pull out your iPhone if you need a display to share (or a keyboard), or want some more privacy.

In terms of when? …

late 2018

I could easily imagine non-ARKit Apple Glasses (display, some basic input, mostly controlled from the phone or AirPods, no camera) in 2018. Very simple (“Trivial! Wasted opportunity!!” the AR industry will cry). Emphasis on fashion, lots of frame styles & price points. Marketed in fashion magazines. ARKit on iPhone expanded to support 3D reconstruction, plus some features to improve realistic content rendering (light source detection?) & multi-user AR experiences.

2019

non-ARKit Glasses, now with a Camera (for Photos/Video, more like Snap Spectacles). Look for a “W2” wireless chip that supports video. ARKit on iPhone expands to support large scale tracking & large-scale mono RGB 3D reconstruction, and integrates deeply with Apple Maps & Siri.

Probably another company ships their version of “full-stack” consumer AR glasses, which work OK. Enterprise AR really starts to gain traction. A few mobile ARKit apps show strong metrics.

2020

Version 3 of the Glasses with lots of issues fixed, better power, displays, wireless, input, GUI etc. ARKit gets lots of tweaks & enhancements mostly to handle larger areas and more simultaneous users and content creation tools are mature enough that most developers can easily use them. People complain that Apple is missing the AR Glasses market as decent competitor products start to ship.

2021

The merged ARKit Glasses product range ships to huge success! We arrive at the Rainbows End.

“Obvious” questions that don’t have obvious answers

What about ARKit on iPhone 8? It’s awesome, and will be on 400 million phones this year and a gazillion billion next year.

Yes and Yes. However… Handheld AR on Mobile has a number of inherent UX challenges to overcome (it’s handheld, for a start), and ARKit itself is limited today in the UX it can enable (e.g. no collision/occlusion, no absolute coordinates). I think there are some great use-cases that are possible with Mobile AR, but they aren’t the type of thing to capture tens of millions of daily users next year. You could still build a great startup on iPhone ARKit that exits for $100m+ in the next few years, but it’s unlikely you’ll be the next Google. Topic for another post. Update: here is that post…

AR will always be a niche product. The market potential is similar to Games Consoles (tens of millions of devices). How can AR Glasses possibly be a smartphone sized market (billions of devices)?

Looking at AR as fundamentally an entertainment product (the way VR is mostly viewed) does restrict the market size. However, I have believed for many years that the real potential of AR is as a communication and information device. These are the things we use our smartphones for today, and if AR products can deliver compelling user experiences that let us communicate better & understand the world better, then it’s a smartphone sized market.

Alternatively, if you look at VR as something that lets me “escape” to another place, while AR enhances where I already am, then simply looking at how a typical person spends the hours in their day tips the balance towards AR. We maybe spend a couple of hours a day “escaping” into a book or TV etc, but most of our hours are spent engaging with the world.

Whether we prefer to communicate with each other in VR (FB Spaces or RecRoom type of thing) or AR (Holoportation) is hard to predict. My bet is on AR, though it’s harder to build the tech.

What about VR, that is where all the action is?

Prior to ARKit I would have agreed & said “just wait for AR”. Now I don’t have to say that… :-)

Why won’t other companies solve these problems before Apple?

Microsoft is years ahead of everyone else, and has all the software & hardware assets to win this race (an operating system + developers, Bing Maps, Xbox 3D graphics, OEM hardware partners, MSFT Research). Google has great AI & device ecosystem advantages. I think other companies will ship products with all the features needed to be called consumer AR glasses, but I don’t think they’ll be able to sell them, as it will be a fashion-led buying decision by the consumer.

I’m wrong because…..

There are so many ways I could be wrong with these predictions. Most likely is that Apple has come up with a creative design solution that delivers a great user experience without needing the tech problems to be fully solved.

There’s also a pretty good chance that one of the “outsiders” in mobile wins a dominant market position (the way Google usurped MSFT & Intel & the P.C. OEMs). This could be Snap or a startup in a garage, or even an MSFT comeback. This sort of potential industry shakeup hasn’t been viable for over a decade. These are exciting times. Facebook is interesting… topic of another post. The scale of the smartphone supply chain eco-system, & the fact that better hardware integration enables a more fashionable design, gives an advantage to players who can leverage them. The transition to a fashion-led purchase with completely new App paradigms creates an opportunity for disruptive new entrants.

I’d love to hear other evidence based reasons that I could be wrong about these predictions….