SQL Recipes Marketing Funnel Analysis

Funnel Analysis in SQL

Most websites, applications and business processes have a sequence of steps (events) that you'd expect a customer to take to achieve an ultimate goal or conversion. This is described as a goal or conversion funnel. Tools like Google Analytics lets you perform a funnel analysis on your checkout process or marketing lead-capture page quite easily. Funnels are not limited just to those processes and at times you'll have to get to the raw data to get the answers you want.

There are three use-cases for building your own custom funnel analysis:

You want to analyze a flow that doesn't fit one of the models provided by other tools out of the box, for example customer onboarding, filling out forms or questionnaires, etc. Those other tools require you to instrument your funnel beforehand, making retroactive funnel analysis cumbersome. You want to combine your funnel analysis with custom activation programs within your organization for the full acquisition, engagement and retention parts of the customer lifecycle.

This is where funnel analysis on raw, row-level data comes into play and the full power of SQL can be unleashed.

1. The data schema

Consider an event table with the following schema:

timestamp user_id event properties 2017-10-01 user_1 View Product {product: Boots, category: Footwear} 2017-10-02 user_1 Add to Cart {...} 2017-10-01 user_1 Checkout {...} 2017-10-01 user_2 View Cart {...} ... ... ... ...

The above data could come from an e-commerce site and a customer goes through the three-step funnel: "View Product" ➡ "Add to Cart" ➡ Checkout flow.

If your data is organized as a set of normalized tables – common in event analytics tools – you might have to union the tables to create a schema like the above.

The definition of a funnel step

There are two parameters that we'll use to define a funnel step:

Users who have performed a particular event The event is performed within a particular time frame

These two conditions can be implemented using two where clauses.

2. Constructing the funnel steps

We'll build the funnel steps as three successive queries that build upon each other:

First, the distinct users who've viewed any product (event: 'View Product') on our site since the start of September 2017:

select distinct user_id from events where event = 'View Product' and timestamp between '2017-09-01'::timestamp and now()

Next, we'll construct the second step with the users who've added products to the cart (event: 'Add to Cart') within 7 days of viewing a product. We'll accomplish this using an inner join and an additional clause on the timestamp.

select distinct user_id from events s2 inner join ( select distinct user_id from events where event = 'View Product' and timestamp between '2017-09-01'::timestamp and now() ) s1 on s1.user_id = s2.user_id and s2.timestamp < (s1.timestamp + '7 days'::interval) -- and s2.timestamp > s1.timestamp where s2.event = 'Add to Cart'

(we're also adding an additional clause to make sure the events in s2 come after the events in s1)

We'll repeat the same pattern to construct step 3 of the funnel for users who've reached the checkout page.

select distinct user_id from events s3 inner join ( select distinct user_id from events s2 inner join ( select distinct user_id from events where event = 'View Product' and timestamp between '2017-09-01'::timestamp and now() ) s1 on s1.user_id = s2.user_id and s2.timestamp < (s1.timestamp + '7 days'::interval) -- and s2.timestamp > s1.timestamp where s2.event = 'Add to Cart' ) s2 on s3.user_id = s2.user_id and s3.timestamp < (s2.timestamp + '7 days'::interval) -- and s3.timestamp > s2.timestamp where s3.event = 'Checkout'

3. Charting the Funnel count metrics and drop-off rates

Once we have the aggregate tables at each funnel step with the above queries, it's straightforward to chart the counts and the step-over-step drop-off/conversion rates. If the results of the above queries are stored as tables s1, s2 and s3 (could use a view or a CTE here):

select 'View Product' as step, count(*) as cnt from s1 union select 'Add to Cart' as step, count(*) as cnt from s2 union select 'Checkout' as step, count(*) as cnt from s3

To get the step-over-step drop rates, we can use a window function to find the difference from the previous row as a percentage change. To demonstrate the lag window function:

select step, cnt, lag(cnt, 1) over () from ( select 'View Product' as step, count(*) as cnt from s1 union select 'Add to Cart' as step, count(*) as cnt from s2 union select 'Checkout' as step, count(*) as cnt from s3 ) x

step cnt lag View Product 100 Add to Cart 60 100 Checkout 35 60

Now we just have to incorporate our percentage formula in the lag function:

select step, cnt as user_count, coalesce(1.0 - cnt::float/lag(cnt, 1) over () , 1) as drop_off_rate from ( select 'View Product' as step, count(*) as cnt from s1 union select 'Add to Cart' as step, count(*) as cnt from s2 union select 'Checkout' as step, count(*) as cnt from s3 ) x

Conversion is simply 100% - drop-off rate.

4. Customizing the funnel steps

The join definition in the above queries is the key to the flexibility and customizability of the funnel analysis. By changing the join definition, you can answer questions like:

4a. Who are the users who performed the first event and the second event?

Join the event tables with an inner join as demonstrated above. This is the base case.

4b. Who are the users who performed the first event and not the second event?

This query can be used to find users who have dropped off. Join the first event table with the second event table with a left join and filter the resulting table for a row that has the second event table join key that is null .

5. More advanced customizations

After you're satisfied with the basic funnel analysis, you'll want to extend it to further understand your data. We present four ideas to do this and will go in-depth in a future recipe.

5a. Adding a time-dimension to the funnel

While the basic funnel analysis is powerful on its own, you'll want to figure out how the funnel is improving over time. This study can help you quantify the results of your A/B testing, personnel, and process changes. You'll want to present your data as a pivoted table for clarity.

Time period Users Step 1 Step 2 Step 3 Month 1 100 - 60% 30% Month 2 120 - 57% 27% Month 3 140 - 55% 32% Month 4 160 - 60% 33% Month 5 180 - 65% 35% ... ... ... ... ...

5b. Adding a dimension segmentation to the funnel

We overlooked the properties column in the event table so far. You can add additional where clauses to further segment your data. It's also useful to present side-by-side comparisons between two properties:

Channel Users Step 1 Step 2 Step 3 Channel_1 40 - 60% 30% Channel_2 60 - 50% 26%

You can read our recipe on Querying JSON data types in PostgreSQL if you're looking for tips on working with schemaless column types with PostgreSQL.

5c. Adding a session restriction to the funnel

Sometimes you want to restrict a funnel to a single user session. You need to make a couple of changes to the queries: first, you need to group by both user_id as well as session_id . Secondly, you will want to include the session_id in the join clauses.

We have a SQL recipe on sessionization in the works and we'll release it shortly.

5d. Rolling conversion rates

The funnel analysis queries so far have used a hard cutoff of 7 days in order to determine the next funnel step. In practice, you'll want to create a variety of cutoffs, for example windows of 1 day, 7 days and 30 days in order to compute conversion rates. This is known as rolling conversion rates.