At Snips, we have spent the past year building technologies for assistants, from analyzing your location patterns to extracting relevant stuff from your email and chat messages. But we don’t want to be just a technology company. We want to build products on top of our technology, simplifying the way we interact with our devices on a daily basis.

Compared to product categories like calendars or photo-sharing apps, assistants are relatively new. As such, designers around the world have been experimenting with various ideas, from bots and voice assistants to proactive suggestions and physical devices.

It’s still unclear which UX will end up being the right one, and some of us now believe that an assistant isn’t a product in itself, but rather something you add to existing products.

One thing in particular we are currently exploring at Snips is how we can add an assistant to existing chat apps instead of building our own chat app (try it here on Android). This would enable us to focus on the assistant, and not worry about the chat part or getting hundreds of millions of users to switch apps.

The issue though is that most chat apps don’t have an API, and thus, we can’t access the conversations. Fortunately, we found a way around that on Android by leveraging two specific platform features: the ability to read the content on the screen, and the ability to draw things on top of an existing view.

To test our ideas, we chose to focus on a very specific use case: extracting places mentioned in conversations, showing information about them, and deeplinking into transit apps to go there.

Extracting places from conversations

This section is a bit technical, so skip to the next one if you want UX stuff!

The core piece of technology behind this location assistant is the ability to read conversations and extract the places mentioned in them. Whether it’s a restaurant, a city or a street, it should be identified and linked to the corresponding entry in Google Maps. This is a specific field of NLP called Named Entity Recognition (NER).

Extracting places in particular is one of the hardest tasks, due to the high variability of ways we refer to them, as well as the fact that place names are often a combination of common words. Thus, a robust model should be able to learn the representation of places in sentences instead of relying on a possibly infinite set of rules.
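To make this concrete, NER training data is usually represented as per-token labels, for instance in the common IOB scheme. Here is a made-up example (not from Snips’ actual dataset) showing how a labeled chat message looks and how the place span is read back out:

```python
# One hypothetical training example in the common IOB labeling scheme:
# O = outside any place, B-PLACE = first token of a place, I-PLACE = continuation.
message = "let's meet at cafe oberkampf at 8"
tokens = message.split()
labels = ["O", "O", "O", "B-PLACE", "I-PLACE", "O", "O"]

# The model's job is to predict these labels for unseen messages,
# from which the place span can be recovered:
place = " ".join(t for t, l in zip(tokens, labels) if l.endswith("PLACE"))
# place == "cafe oberkampf"
```

Note how the place here is a combination of a common word (“cafe”) and a lowercase proper noun, which is exactly what makes chat-message NER harder than rule-based matching.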

This is where machine learning comes into play. By giving examples of messages containing places, we can teach assistants to recognize the structural patterns in which they occur, and automatically identify them in new messages. Although a number of techniques exist, including using deep learning, we had very specific constraints for our algorithm:

it should run on device to avoid sending text messages to our servers, and protect the user’s privacy

it should be very fast, since we analyze every single message in the background

it should have limited impact on battery, memory and CPU

it should learn from a very small number of examples (we have about 20k messages in our dataset, but only about 600 contain places)

it should work on chat messages, which typically lack the structure, such as punctuation or capitalization, that you would find in emails or books.

This pretty much rules out off-the-shelf solutions: they either rely on huge models that can’t be downloaded to a phone, require large amounts of training data that we don’t have, are too computationally or memory intensive, or run on a server, which would neither scale to the volume of messages we need to process nor protect privacy.

In the end, we opted for an approach that combines two steps:

a first model to identify messages containing a place. This model is optimized for speed (since there are so many messages to filter) and for a low false-negative rate (only about 3% of messages contain places, so we want to keep them!). We managed to filter out 70% of messages without a place, while keeping 95% of those that contain one.

a second model that looks at each of these messages, and identifies words that correspond to the actual place. If the message doesn’t contain a place, it is discarded. Combined with the previous step, we get over 80% precision.
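Plugging the numbers above into a quick back-of-the-envelope calculation shows why the second model is needed. This is illustrative Python using the rates quoted in the post, not the actual models:

```python
# Applying the quoted rates to a hypothetical batch of 10,000 messages.
BASE_RATE = 0.03        # ~3% of messages mention a place
STAGE1_RECALL = 0.95    # stage 1 keeps 95% of messages with a place
STAGE1_FILTER = 0.70    # stage 1 discards 70% of messages without one

def stage1_outcome(n_messages: int) -> tuple[float, float]:
    """Return (kept positives, kept negatives) after the fast filter."""
    positives = n_messages * BASE_RATE
    negatives = n_messages * (1 - BASE_RATE)
    return positives * STAGE1_RECALL, negatives * (1 - STAGE1_FILTER)

kept_pos, kept_neg = stage1_outcome(10_000)
# ~285 place messages and ~2910 non-place messages survive stage 1,
# so precision after stage 1 alone is only about 8.9%.
precision = kept_pos / (kept_pos + kept_neg)
```

Stage 1 alone leaves precision under 9%, which is why the slower second model, labeling the actual place words and discarding false positives, is needed to reach the 80%+ combined precision.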

It is important to keep in mind that everything the assistant does, every message it analyzes, is done directly on device. Nothing is sent to the cloud, which is how it can monitor the screen content 24/7 without any impact on privacy.

Integrating our assistant into existing user flows

Being able to recognize relevant places in conversations isn’t useful in itself, of course. These places need to be surfaced somehow, which is where the assistant comes into play.

As we mentioned earlier, we don’t see assistants as standalone products. Rather, we want to integrate them into existing user flows, so that we can remove friction without changing habits. As a reminder, our use case is as follows:

User receives a message containing a place. He wants to see where it is, and go there, now or later.

If the user is already in a conversation, and wants to look up a place mentioned and go there now, he would typically go through a copy-pasting flow, removing superfluous text after pasting it into a transit app:

The friction increases dramatically though when the user wants to retrieve a location from a message later. In that case, he has to open his chat app, find the conversation, find the actual message, and then go through the whole copy-paste flow:

The reason we have so much friction is simply that apps don’t communicate with each other, forcing users to manually pass data around while juggling multiple apps. Surely we can do better!

Initial design: long pressing the home button

Our first design solved each problem differently. For the “now” flow, we created a feature similar to Google Now on Tap, where long pressing the home button would analyze the content on the screen, extract places mentioned, and show cards with information and deeplinks to transit apps.

Although this worked well on paper, it turned out to be hard to set up, hard to discover and hard to remember to use. As such, we had very low retention, even though we had good feedback during user testing sessions. In retrospect, this is sort of obvious, and is due to the fact that:

we were competing for the long press with the native Google apps that offer similar functionalities

changing the behavior of the long press required going through multiple settings screens, with no shortcut available

there were no visual cues to remind the user that he could long press to trigger Snips and see something smart, so we couldn’t induce a behavior change

We tried addressing the discoverability/retention issue by popping notifications when we found something on screen, but this quickly became annoying and thus resulted in users uninstalling the app.

We then had a different flow for when the user needed to retrieve a location mentioned in a previous message, and go there. Since the assistant is able to read and analyze messages in the background, it is able to keep track of places mentioned in any chat app, and resurface them later.

This was achieved by long pressing the home button while on the home screen (as opposed to being in a conversation). A screen would pop up with a list of cards containing all places that had been retrieved passively. Clicking on one of those cards showed information about the place, and deeplinks to transit apps.

Furthermore, these places were indexed and made searchable, including searching by contact and words in the message itself (e.g. “door code”). This is handy not just as a way to retrieve a place and go there, but also more generally to search content from any chat app.
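To make the idea of such an index concrete, here is a minimal sketch in Python. The data model (place, contact, message) and the class names are hypothetical stand-ins, not Snips’ actual implementation:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class PlaceEntry:
    place: str    # the extracted place name
    contact: str  # who sent the message
    message: str  # full message text, kept for keyword search

class PlaceJournal:
    """Tiny inverted index over extracted places (illustrative only)."""

    def __init__(self):
        self.entries: list[PlaceEntry] = []
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, entry: PlaceEntry) -> None:
        idx = len(self.entries)
        self.entries.append(entry)
        # Index the contact name and every word of the message.
        for token in [entry.contact.lower(), *entry.message.lower().split()]:
            self.index[token].add(idx)

    def search(self, query: str) -> list[PlaceEntry]:
        # A hit must match every query token (contact name or message word).
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return [self.entries[i] for i in sorted(hits)]
```

With this sketch, searching for “door code” would surface the place mentioned in the message that contained those words, and searching for a contact’s name would surface every place they mentioned.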

This was quite promising, but unfortunately we still had the setup issues (since it used a long press), and even when the user opened the app directly via its icon, we still didn’t find a way to change behavior and make them go through us instead of their usual flow. Putting an app in the middle simply added too much friction in itself!

The contextual bubble: a new hope

With these issues in mind, we started experimenting with a radically different UX: a bubble that would offer contextual assistance whenever relevant. Inspired by what Google Translate does by popping a bubble when you copy a text, we inserted our bubble assistant inside the existing user flows.

Depending on context, the bubble will offer different functionalities. For instance, if the user copies a message that contains a place, the bubble will appear on screen, extract the place mentioned, show a map of where it’s located and offer deeplinks to transit apps. This means the user never has to actually paste the text into a transit app himself.
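As a point of reference, a deeplink like this on Android usually boils down to building a geo: URI (a standard intent format that most map and transit apps handle), with a Google Maps URL as a fallback. Here is a minimal sketch in Python, assuming the address has already been extracted:

```python
from urllib.parse import quote

def geo_deeplink(address: str) -> str:
    """Build a standard Android geo: URI that map and transit apps can handle."""
    return f"geo:0,0?q={quote(address)}"

def maps_deeplink(address: str) -> str:
    """Universal Google Maps search URL, as a fallback when no app handles geo:."""
    return f"https://www.google.com/maps/search/?api=1&query={quote(address)}"
```

Because the assistant has already isolated the place from the surrounding text, the URI contains only the address, which is exactly the paste-then-delete step the bubble removes for the user.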

This flow solves the issues around discoverability and retention, since it integrates into a common user flow that users go through regardless of whether they remember to use Snips. Furthermore, it removes obvious friction, since it is no longer necessary to go through the home screen, open a transit app, paste the content and delete the extra text. All of that is done in a single step via the bubble, directly from the conversation!

The bubble also works nicely when the user isn’t in a conversation, but wants to go to a place mentioned in one. In this case, simply opening a transit or map app will trigger the bubble, which will then suggest places that were either copied before, or extracted passively from reading the screen. Clicking on one of these suggestions will automatically pre-fill the app’s text field with the address.

This removes a lot of friction, since it is no longer necessary to find the original message and copy-paste things around. The simple fact that the user read that message means the bubble will surface the place it contains when the user needs it later, namely in a transit app!

The actual Snips app itself now serves purely as a journal of all places copied and seen, as well as a search engine, in case the user didn’t want to use a transit app, but just wanted to look up information about the place.

A quick word about privacy

While reading this post, you were probably wondering about the privacy implications of accessing the user’s messages and reading his screen. After all, the assistant has access to everything the user sees: every message, email, website, or app. And because this content is analyzed passively, every single screen is fed to the assistant.

From a privacy perspective, this is of course a complete nightmare, which is why we built our assistant to be private by design. By running fully on-device, we ensure that no one but the user can access their data. This was really hard to build, but we felt it was an important issue to tackle if we want our users to feel safe using our products.

What’s next

The contextual bubble is currently an experiment, so we would love for you to try it and give us some feedback!

Although it only works with the copy/paste and transit app flows for now, our goal is to quickly expand its range of functionalities to include more intelligence: recognizing when people ask for your availability and suggesting free slots from your calendar, extracting movies and showing you ratings as well as cinemas where you can watch them, etc.

Hopefully, the more useful the assistant becomes, the less time we will have to spend on our phones. Because let’s face it: there are better things we can do with our time!

If you enjoyed this article, it would really help if you hit recommend below and tried the app :)

Follow us on twitter @randhindi & @snips

If you want to work on AI + Privacy, check our jobs page!