Multi-Channel Marketing Attribution using Segment, Google BigQuery, dbt and Looker

If you’re a marketer responsible for acquiring new users for your online service, you don’t want to rely just on the reporting and attribution models provided by your ad networks. Why? Because they only show the effectiveness of each channel in isolation and don’t really consider the way in which visitors interact with a number of different channels and touch-points on the path to conversion. What you really want is attribution and conversion reporting thats independent of the networks themselves, uses a multi-touch attribution model that considers all the touch-points in the conversion journey and is under your control, not the advertisers. When you control your own attribution model and how those numbers are reported you can switch between models that align with the goals that your business has at particular points in-time; for example using a first click attribution model if your priority is lead generation, last click if conversions are your priority or even time-decay, if you want to place more emphasis on touchpoint interactions closer to conversion. And just as importantly; if you connect your downstream operational systems to your attribution model and user tracking service you can use the actual revenue or contribution each customer has delivered rather than guessing at an average contribution, something particularly important if one of the channels you’re spending your acquisition budget on brings in users that churn quickly whilst another brings in loyal, long-term and valuable customers.

For example, an online marketplace may use Facebook Ads, Google Ads and other ad networks to recruit new users who then, to some degree or another go on to generate revenue for the business each time they transact. Like most high-growth online businesses they’re spending a significant amount of their current funding round on growing their user base and wanted to know which channels brought in the ones that actual generate revenue. They use Segment Connections connected to a downstream Google BigQuery data warehouse to record all their website visitor page views and interactions, and we deployed dbt, Stitch and Looker a while ago to provide their team with point-and-click operational analytics dashboards using the same BigQuery data warehouse to hold their analytics data. We therefore added the Segment package to their dbt project and used it to stitch together all of the interactions by a site visitor both when they were unauthenticated as a visitor to the site and after they’d registered and authenticated, so that we could look at the complete history of a user’s sessions including each session’s UTM campaign parameters using a BigQuery SQL query like this:

Now because we’d previously centralised this customer’s operational system’s data into the same BigQuery data warehouse, it was a simple matter to join the blended user ID coming from Segment to each users’ ID coming from their main operational system to retrieve their date of registration. Adding a few more columns to the query output to give us the first page in session and other potentially useful information and filtering to only return those sessions taking place up until the time of registration, the SQL query we then used against the centralised Segment and operational dataset now looked like this:

To return a user-specific measure of value that we can then use in our attribution model, we’ll create a simple query that returned each user’s ID and the total number of transactions they’d completed on the platform.

Now comes the part when we start adding attribution logic into the SQL query using the following rules: For each of the sessions for a given user, use the FIRST_VALUE analytic SQL function to work out if that session is the first one in sequence for the user; if it is and we’re using the first click attribution method then allocate 100% of the conversion value to that session and none to the ones that follow, an attribution method that places all the value on the acquisition channel that first brought that user to our website

Similarly, use the LAST_VALUE function on each user session row and allocate 100% of the conversion value to that row if we’re using the last click attribution method, and none for the ones that preceded it, placing all the value on the channel that eventually converted the visitor into a paying user

If we’re using the even click attribution method, count how many sessions the user took in-total to eventually convert and allocate the conversion value equally to each of them

And if we’re using the time decay attribution method, work out how many days did the visitor take to convert and then allocate 100% of the value to sessions in the 7 days up to conversion, 50% of the conversion value to those session between 7 and 14 days and halve and halve the value attributed again for sessions up to 21 and 28 days before conversion. Added to our SQL query as a CTE with the user transactions SELECT as a second CTE, the query to implement these attribution rules looks like this:

Then we complete the attribution model query with a final SELECT from both CTEs at the end, multiplying the total transactions for each user by the various model percentages for each user session row coming from Segment, like this:

The final, compete SQL query used for the attribution model is also available for download as a Github gist here. Finally, we can add this query into dbt as a model or just plug it directly into Looker as a SQL derived table, then create reports such as this one that aggregate each week’s registrations by week and channel to show where each week’s new users came from, using just one of the attribution models:

Or you can create a Looker dashboard with attribution models selectable via a dashboard filter, allowing users to show the value of each channel, campaign and source over time and by share of revenue generated using either of the first click, last click, even click or time-decay attribution methods.