Sorting shopping lists with deep learning using Keras and TensorFlow.

Shopping for groceries is hard.

Stores are large and have complex layouts that are confusing to navigate. The hummus you want could be in the dairy section, the deli section, or somewhere else entirely. Efficiently navigating a store can be a daunting task.

At Instacart, our customers can order millions of products from hundreds of retail partners. Our fleet of tens of thousands of personal shoppers must find these items at thousands of store locations. We are always looking for opportunities to enable our shoppers to move faster.

Shopping with Deep Learning

Enter Deep Learning

By observing how our shoppers have picked millions of customer orders through our app, we have built models that predict the sequences our fastest shoppers will follow. Then, when a shopper is given a new order to pick, we use this predicted fastest sequence to sort the items for them.
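Once a model can predict the next item from the item just picked, one simple way to turn it into a full sort is greedy decoding: take the item the model ranks first, then repeatedly append the highest-scoring item among those still unpicked. The sketch below assumes a hypothetical `Scorer` interface and a made-up `toy_score` built from invented store positions. It only illustrates the idea and is not a description of Instacart's production system.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical interface: score(previous, candidate) says how likely `candidate`
# is to be picked next, given the item picked just before (None at the start).
Scorer = Callable[[Optional[str], str], float]


def greedy_sort(items: Iterable[str], score: Scorer) -> List[str]:
    """Order items by repeatedly taking the highest-scoring remaining item."""
    remaining = sorted(set(items))  # sorted so that ties break alphabetically
    order: List[str] = []
    previous: Optional[str] = None
    while remaining:
        best = max(remaining, key=lambda item: score(previous, item))
        order.append(best)
        remaining.remove(best)
        previous = best
    return order


# Toy stand-in for a trained model (invented numbers): items that sit close
# together on a made-up one-dimensional store layout get higher scores.
POSITION: Dict[str, int] = {
    "bread": 0, "cookies": 1, "chocolate": 2, "coffee": 3, "meat": 4, "pizza": 5,
}


def toy_score(previous: Optional[str], candidate: str) -> float:
    origin = POSITION[previous] if previous is not None else 0
    return -abs(POSITION[candidate] - origin)


print(greedy_sort(["pizza", "meat", "coffee", "chocolate", "cookies", "bread"], toy_score))
# ['bread', 'cookies', 'chocolate', 'coffee', 'meat', 'pizza']
```

Greedy decoding is cheap (one pass over the remaining items per step), but it commits to each choice and never revisits it.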

This approach has reduced our shopping times by minutes per trip. At scale, every minute saved per trip will translate into 618 years of shopping time per year.

So how do we do it? Our options are limited: we can’t build warehouses, obtain accurate store data, or map every store location. Traditional machine learning approaches (we ❤️ XGBoost) don’t work either, because of the sequential nature of the problem.

So instead, we spread some deep learning on it.

What We Tested

We ran a test where every batch (a set of items to be picked by a shopper) was randomly assigned to one of four list sorting algorithms:

Control (red): departments sorted in a random order; items sorted alphabetically within departments

Human (green): aisles sorted by humans using store layouts; items sorted alphabetically within aisles

TSP (teal): a traveling salesman solution using average inter-department picking times; items sorted alphabetically within departments

Deep (purple): our final deep learning architecture, which directly sorts items in the batch
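The Control and TSP baselines above differ only in how departments are ordered; in both, items are sorted alphabetically within a department (the Human baseline has the same shape, with a hand-made aisle order instead). The sketch below assumes a hypothetical item-to-department mapping and an invented table of average inter-department picking times (`avg_time`). It solves the TSP exactly by brute force as an open path with no fixed start, which is only feasible for a handful of departments; which TSP solver was actually used is not stated here.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence, Tuple

Item = Tuple[str, str]  # (item name, department); hypothetical representation


def sort_within_departments(items: Sequence[Item], dept_order: Sequence[str]) -> List[Item]:
    """Order items by their department's position, then alphabetically by name."""
    rank = {dept: i for i, dept in enumerate(dept_order)}
    return sorted(items, key=lambda item: (rank[item[1]], item[0]))


def control_sort(items: Sequence[Item], seed: Optional[int] = None) -> List[Item]:
    """Control: departments in random order, items alphabetical within each."""
    depts = sorted({dept for _, dept in items})
    random.Random(seed).shuffle(depts)
    return sort_within_departments(items, depts)


def tsp_department_order(depts: Sequence[str],
                         avg_time: Dict[Tuple[str, str], float]) -> List[str]:
    """Brute-force open-path TSP: the department order with the smallest total
    average transition time. Exact, but only feasible for a few departments."""
    def path_cost(order: Tuple[str, ...]) -> float:
        return sum(avg_time[(a, b)] for a, b in zip(order, order[1:]))
    return list(min(itertools.permutations(depts), key=path_cost))


def tsp_sort(items: Sequence[Item], avg_time: Dict[Tuple[str, str], float]) -> List[Item]:
    """TSP: departments in TSP order, items alphabetical within each."""
    depts = sorted({dept for _, dept in items})
    return sort_within_departments(items, tsp_department_order(depts, avg_time))


# Invented example data (not real store timings).
items = [("milk", "dairy"), ("apples", "produce"), ("bread", "bakery"),
         ("cheese", "dairy"), ("bananas", "produce")]
avg_time = {
    ("bakery", "dairy"): 2.0, ("dairy", "bakery"): 2.5,
    ("bakery", "produce"): 1.0, ("produce", "bakery"): 1.0,
    ("dairy", "produce"): 5.0, ("produce", "dairy"): 4.0,
}
print(tsp_sort(items, avg_time))
# [('apples', 'produce'), ('bananas', 'produce'), ('bread', 'bakery'), ('cheese', 'dairy'), ('milk', 'dairy')]
```

Note that neither baseline looks at individual items when ordering the trip; only the Deep sort does.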

We then look at how each sort performed in terms of shopper speed (y-axis) as a function of the size of the batch picked (x-axis):

Note that we mask the exact picking speeds as this is a competitively sensitive KPI for Instacart.
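The comparison described above comes down to grouping batches by sort and batch size and averaging a speed metric in each cell. Because the real KPI is masked, this sketch assumes speed means items picked per minute; the `Batch` fields and the records are invented toy values, not Instacart data, and each batch is assumed to have already been randomly assigned to a sort.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Batch(NamedTuple):
    arm: str          # sort the batch was assigned to ("control", "human", "tsp", "deep")
    batch_size: int   # number of items in the batch
    minutes: float    # total picking time for the batch


def speed_by_arm_and_size(batches: Iterable[Batch]) -> Dict[Tuple[str, int], float]:
    """Mean picking speed (items per minute) for each (arm, batch_size) cell."""
    cells: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for b in batches:
        cells[(b.arm, b.batch_size)].append(b.batch_size / b.minutes)
    return {key: mean(speeds) for key, speeds in cells.items()}


# Toy records, invented for illustration only.
toy = [
    Batch("control", 10, 20.0),
    Batch("control", 10, 25.0),
    Batch("deep", 10, 16.0),
]
print(speed_by_arm_and_size(toy))
```

Plotting each arm's cell means against batch size gives curves of the kind compared in this test; judging whether two arms differ statistically would additionally need per-cell variance or confidence intervals.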

The Control sort performed the worst, as expected. The TSP and Human sorts performed significantly better, but were not statistically different from each other.

The deep learning model beat them all by a large margin: at large batch sizes, the gain in picking speed from Human to Deep is 50% larger than the gain from Control to Human.

In the remainder of this post, we will define the problem (using emojis of course), and then introduce a naive initial architecture. We will inspect that architecture and point out some key flaws, and then conclude with a final architecture that is more efficient and effective.

List Sorting in Keras

Problem Definition

Suppose a customer ordered 10 items and their personal shopper picked those items in this sequence:

We can observe this sequence because our shoppers weigh or scan the barcode of every item they pick. To learn the sequence, we need to formulate it as a supervised learning problem. Suppose that we are looking back in time at this order, and we pause after the shopper picks the 🍞:

We want to predict the next item that the shopper will pick (a 🍪 in this case), given they just picked the 🍞 and can choose from one of the five candidate products remaining (🍪🍫🍕🍖☕). Are you getting hungry yet?

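In emoji-math (one of the benefits of working at Instacart is doing emoji math), we can rewrite this as:

P(🍪 | 🍞, {🍪, 🍫, 🍕, 🍖, ☕})

that is, the probability that the next pick is 🍪, given that the last pick was 🍞 and that the remaining candidates are 🍪, 🍫, 🍕, 🍖, and ☕.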

Note that this probability is non-trivial to compute. It’s not enough to ask how often cookies are picked after bread. Cookies may be incredibly common (they are in my household), so this naive probability might be biased high. We want to measure how likely cookies are to be chosen given we can only choose from a fixed set of remaining items.
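The problem definition above can be sketched as a supervised-learning setup: each observed pick sequence yields one example per step (the previous pick, the set of items still remaining, and the item actually picked next), and model scores are turned into probabilities with a softmax taken only over the remaining candidates. This is a minimal sketch assuming each item appears once per trip; `Example`, `make_examples`, `candidate_probs`, `example_loss`, the scorers, and the trip contents are all hypothetical, and only the step after bread mirrors the example in the text. The cross-entropy loss is an illustrative choice, not necessarily the one used here.

```python
import math
from typing import Callable, Dict, FrozenSet, Iterable, List, NamedTuple, Optional

Scorer = Callable[[Optional[str], str], float]  # hypothetical model interface


class Example(NamedTuple):
    previous: Optional[str]      # item picked just before (None at the start)
    candidates: FrozenSet[str]   # items still left to pick, including the target
    target: str                  # item the shopper actually picked next


def make_examples(sequence: List[str]) -> List[Example]:
    """One example per step; assumes each item appears once in the sequence.
    The last step is skipped: a single remaining candidate carries no signal."""
    examples = []
    for t, target in enumerate(sequence):
        candidates = frozenset(sequence[t:])
        if len(candidates) < 2:
            continue
        previous = sequence[t - 1] if t > 0 else None
        examples.append(Example(previous, candidates, target))
    return examples


def candidate_probs(previous: Optional[str], candidates: Iterable[str],
                    score: Scorer) -> Dict[str, float]:
    """Softmax over the remaining candidates only (max-subtracted for stability)."""
    logits = {c: score(previous, c) for c in candidates}
    top = max(logits.values())
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def example_loss(ex: Example, score: Scorer) -> float:
    """Cross-entropy of the true next pick under the candidate-restricted softmax."""
    return -math.log(candidate_probs(ex.previous, ex.candidates, score)[ex.target])


def likes_cookies(previous: Optional[str], candidate: str) -> float:
    """Toy scorer (invented, not a model) that favors cookies everywhere."""
    return 2.0 if candidate == "cookies" else 0.0


# Hypothetical 10-item trip; only the step after "bread" mirrors the text.
trip = ["apples", "milk", "eggs", "cheese", "bread",
        "cookies", "chocolate", "pizza", "meat", "coffee"]
examples = make_examples(trip)
step = examples[5]
print(step.previous, sorted(step.candidates), step.target)
print(candidate_probs(step.previous, step.candidates, likes_cookies))
```

Because the normalizing sum runs only over items the shopper could still pick, the resulting probability reflects how 🍪 compares with the specific alternatives left in the batch, rather than how often cookies follow bread overall.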

Initial Architecture

Our initial deep learning architecture, implemented in TensorFlow using Keras, was the following: