That’s fine for pure exploration or seeing what a technology can do, and often inspires new product thinking. However, if you aren’t aligned with a human need, you’re just going to build a very powerful system to address a very small — or perhaps nonexistent — problem.

So our first point is that you still need to do all that hard work you’ve always done to find human needs. This is all the ethnography, contextual inquiries, interviews, deep hanging out, surveys, reading customer support tickets, log analysis, and getting proximate to people to figure out if you’re solving a problem or addressing an unstated need people have. Machine learning won’t figure out what problems to solve. We still need to define that. As UXers, we already have the tools to guide our teams, regardless of the dominant technology paradigm.

2. Ask yourself if ML will address the problem in a unique way

Once you’ve identified the need or needs you want to address, you’ll want to assess whether ML can solve these needs in unique ways. There are plenty of legitimate problems that don’t require ML solutions.

A challenge at this point in product development is determining which experiences require ML, which are meaningfully enhanced by ML, and which do not benefit from ML or are even degraded by it. Plenty of products can feel “smart” or “personal” without ML. Don’t get pulled into thinking those are only possible with ML.

Gmail looks for phrases including words like “attachment” and “attached” to pop a reminder when you may have forgotten an attachment. Heuristics work great here. An ML system would most likely catch more potential mistakes but would be far more costly to build.
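To make the heuristic idea concrete, here is a minimal sketch of a keyword-based reminder. It is not Gmail’s actual implementation: the function name `should_remind`, the `attachment_count` parameter, and the exact pattern are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches "attached", "attachment", and "attachments".
# The real keyword list is not described in this article.
ATTACHMENT_PATTERN = re.compile(r"\battach(?:ed|ments?)\b", re.IGNORECASE)


def should_remind(body: str, attachment_count: int) -> bool:
    """Return True when the message mentions an attachment but has none."""
    mentions_attachment = bool(ATTACHMENT_PATTERN.search(body))
    return attachment_count == 0 and mentions_attachment


if __name__ == "__main__":
    print(should_remind("I've attached the report.", attachment_count=0))
    print(should_remind("I've attached the report.", attachment_count=1))
```

A rule like this is a handful of lines and easy to reason about. It will miss phrasings it was never told about, and that is the trade-off against a costlier ML system.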

We’ve created a set of exercises to help teams understand the value of ML to their use cases. These exercises do so by digging into the details of what mental models and expectations people might bring when interacting with an ML system as well as what data would be needed for that system.

Here are three example exercises we have teams walk through and answer about the use cases they are trying to address with ML:

Describe the way a theoretical human “expert” might perform the task today.

If your human expert were to perform this task, how would you respond to them so they improved for the next time? Do this for all four phases of the confusion matrix.

If a human were to perform this task, what assumptions would the user want them to make?

Spending just a few minutes answering each of these questions reveals the automatic assumptions people will bring to an ML-powered product. They are equally good as prompts for a product team discussion or as stimuli in user research. We’ll also touch on these a bit later when we get into the process of defining labels and training models.

After these exercises and some additional sketching and storyboarding of specific products and features, we then plot out all of the team’s product ideas in a handy 2x2:

Plot ideas in this 2x2. Have the team vote on which ideas would have the biggest user impact and which would be most enhanced by an ML solution.

This allows us to separate impactful ideas from less impactful ones as well as see which ideas depend on ML vs. those that don’t or might only benefit slightly from it. You should already be partnering with Engineering in these conversations, but if you aren’t, this is a great time to pull them in to weigh in on the ML realities of these ideas. Whatever has the greatest user impact and is uniquely enabled by ML (in the top right corner of the above matrix) is what you’ll want to focus on first.
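If your team tallies its votes in a spreadsheet or script, the sorting step might look something like the sketch below. Everything in it is an assumption for illustration: the `Idea` fields, the cutoff values, and the quadrant labels other than the “focus first” corner are ours, not part of a formal method.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact_votes: int  # votes for "biggest user impact"
    ml_votes: int      # votes for "most enhanced by an ML solution"


def quadrant(idea: Idea, impact_cutoff: int, ml_cutoff: int) -> str:
    """Place an idea in one corner of the 2x2 using simple vote cutoffs."""
    high_impact = idea.impact_votes >= impact_cutoff
    high_ml = idea.ml_votes >= ml_cutoff
    if high_impact and high_ml:
        return "focus first"  # top right: high impact, uniquely enabled by ML
    if high_impact:
        return "high impact, little ML benefit"
    if high_ml:
        return "ML-heavy, low impact"
    return "low impact, little ML benefit"


if __name__ == "__main__":
    # Made-up ideas and vote counts, purely for demonstration.
    ideas = [Idea("idea A", 6, 5), Idea("idea B", 6, 1), Idea("idea C", 1, 4)]
    for idea in ideas:
        print(idea.name, "->", quadrant(idea, impact_cutoff=4, ml_cutoff=4))
```

The cutoffs are a judgment call. The point is only to make the team’s votes visible, not to replace the conversation with Engineering.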

3. Fake it with personal examples and wizards

A big challenge with ML systems is prototyping. If the whole value of your product is that it uses unique user data to tailor an experience to that user, you can’t just prototype that up real quick and have it feel anywhere near authentic. Also, if you wait to have a fully built ML system in place to test the design, it will likely be too late to change it in any meaningful way after testing. However, there are two user research approaches that can help: using personal examples from participants and Wizard of Oz studies.

When doing user research with early mockups, have participants bring in some of their own data — e.g. personal photos, their own contact lists, music or movie recommendations they’ve received — to the sessions. Remember, you’ll need to make sure you fully inform participants about how this data will be used during testing and when it will be deleted. This can even be a kind of fun “homework” for participants before the session (people like to talk about their favorite movies after all).

With these examples, you can then simulate right and wrong responses from the system. For example, you can simulate the system returning the wrong movie recommendation to the user to see how she reacts and what assumptions she makes about why the system returned that result. This helps you assess the cost and benefits of these possibilities with much more validity than using dummy examples or conceptual descriptions.

The second approach that works quite well for testing not-yet-built ML products is conducting Wizard of Oz studies. All the rage at one time, Wizard of Oz studies fell from prominence as a user research method over the past 20 years or so. Well, they’re back.

Chat interfaces are one of the easiest experiences to test with a Wizard of Oz approach. Simply have a teammate ready on the other side of the chat to enter “answers” from the “AI.” (image from: https://research.googleblog.com/2017/04/federated-learning-collaborative.html)

Quick reminder: Wizard of Oz studies have participants interact with what they believe to be an autonomous system, but which is actually being controlled by a human (usually a teammate).

Having a teammate imitate an ML system’s actions, like chat responses, suggestions of people the participant should call, or movie suggestions, can simulate interacting with an “intelligent” system. These interactions are essential to guiding the design because when participants can earnestly engage with what they perceive to be an AI, they will naturally tend to form a mental model of the system and adjust their behavior according to that model. Observing their adaptations and second-order interactions with the system is hugely valuable to informing its design.

4. Weigh the costs of false positives and false negatives

Your ML system will make mistakes. It’s important to understand what these errors look like and how they might affect the user’s experience of the product. In one of the questions in point 2 we mentioned something called the confusion matrix. This is a key concept in ML and describes what it looks like when an ML system gets it right and gets it wrong.

The four states of a confusion matrix and what they likely mean for your users.

While all errors are equal to an ML system, not all errors are equal to all people. For example, if we had an “is this a human or a troll?” classifier, then accidentally classifying a human as a troll is just an error to the system. It has no notion of insulting a user or the cultural context surrounding the classifications it is making. It doesn’t understand that people using the system may be much more offended being accidentally labeled a troll compared to trolls accidentally being labeled as people. But maybe that’s our people-centric bias coming out. :)
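One way to make this asymmetry explicit is to count the four confusion-matrix outcomes and then attach a human cost to each kind of error. The sketch below assumes the troll classifier from the example. The labels, the sample data, and the cost weights are invented for illustration; real weights should come from your user research.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="troll"):
    """Tally true/false positives and negatives for one 'positive' label."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def weighted_error_cost(counts, fp_cost, fn_cost):
    """Total cost of mistakes when the two error types hurt differently."""
    return counts["fp"] * fp_cost + counts["fn"] * fn_cost


if __name__ == "__main__":
    # Made-up labels: what each account really is vs. what the system said.
    actual = ["human", "human", "troll", "troll", "human"]
    predicted = ["troll", "human", "troll", "human", "human"]
    counts = confusion_counts(actual, predicted)
    # Assumption: calling a human a troll hurts five times more than the reverse.
    print(dict(counts), weighted_error_cost(counts, fp_cost=5, fn_cost=1))
```

To the system, the one false positive and the one false negative here are both just a single error. The weights are where your team’s understanding of people enters the picture.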

In ML terms, you’ll need to make conscious trade-offs between the precision and recall of the system. That is, you need to decide if it is more important to include all of the right answers even if it means letting in more wrong ones (optimizing for recall), or to minimize the number of wrong answers at the cost of leaving out some of the right ones (optimizing for precision). For example, if you are searching Google Photos for “playground”, you might see results like this: