Make sense of your Google Ads data in BigQuery using data build tool

Patryk Buczyński · Mar 8


Let’s take another step in the quest of bringing the don’t-repeat-yourself attitude to digital and marketing analytics. The previous attempt was about making life easier when using Google Analytics BigQuery data. Now it’s time to make something out of Google Ads and the 88 tables it exports to BigQuery with Data Transfer.

Yes, that’s right. When you use the Data Transfer service (recently made free by Google) to get your Google Ads data, you end up with 88 tables in your BigQuery project. And it’s not even that strange. When you think about it, Google Ads as a system is just a collection of data points and metrics that you can use to tweak your campaigns, so it’s understandable that the export is huge. But that doesn’t change anything: the schema might be overwhelming, which probably prevents some analysts from actually using the data. The fact that Google Ads metrics are built on some hidden logic, which is usually not apparent and requires working experience with Google Ads to guess, doesn’t help.

But there is some good news. Digging into the data structure, it becomes apparent that it’s actually a decently designed star schema, built around some common principles. Google Ads data is by nature much more standardized than Google Analytics data and does not depend as much on the client-specific setup of the tool. You do not have events, custom dimensions or custom metrics with varying names that you might need to include in your queries. The only thing that might change between accounts is the conversion names.

All of that makes this a perfect case for a do-once-and-reuse approach.

Large Google Ads accounts tend to create performance issues when reporting. Of course, there are several options available: predefined or custom downloadable reports, or BI connectors. Heck, you can even use a script to get the data into Google Sheets, and you’ll be fine if you work on an account of moderate size. But, as you probably guessed by now, it doesn’t scale. After all, we’ve all been in the place where Excel or a BI tool works as if we were playing a turn-based strategy game where each player takes all the time in the world to think about the next move.

A fast dashboard powered by BigQuery (plus BI Engine, perhaps) is a perfect solution. But to build it you need to transform the data from the initial state of 88 tables into something that looks like a real Google Ads report. Let’s build an example model that does just that.

I don’t think I need to add that we’ll use data build tool (dbt) for this.

Prerequisites and notes

In order to use the model, you need to understand how data build tool works. Some SQL and BigQuery fluency is also required.

The source needs to be Google Ads data transferred to BigQuery using the BigQuery Data Transfer service.

Using the model might overwrite your BQ data if you already have tables named like the ones in the model (“campaignData”, “adGroupData”, “creativeUrl”, “textAds”). It might also incur some storage costs, so as always: please use the model only if you know what you’re doing.

The model

You’ll find the full model here: https://github.com/patbuczynski/googleAds-dbt along with basic documentation and configuration instructions.

All of the tables in the model contain a standard set of metrics: Impressions, Clicks, Cost, Conversions, CTR, Conversion Rate, CPC, Cost Per Conversion. They share the same set of time columns (Year, Month, Week) and carry both the names and IDs of the dimension they describe. But the most important requirement is obvious: the data must exactly match what you see in Google Ads.

The model creates 4 tables (campaignData, adGroupData, textAds, creativeUrl) based on a somewhat arbitrary choice. Here I’ll focus on adGroupData only. It should become more evident later on, but the logic used to create the tables is very universal. Working through just one example should give you enough intuition to build your own models with ease. Sounds fun, right?

Step 1: Understand the data structure

Even though it looks complicated at first glance, the structure of Google Ads data in BigQuery revolves around only 2 types of tables:

meta tables (e.g. p_AdGroup, p_Ad, p_Keyword)

stats tables (e.g. p_AdGroupBasicStats, p_AdGroupConversionStats, p_CampaignStats)

Both types exist for different dimensions (a dimension here is just a Google Ads entity like AdGroup, Campaign, Keyword, Budget or Placement).

This has a lot of very important implications; understanding them will suddenly make working with Google Ads data in BQ much easier:

As a rule of thumb, each of those dimensions will have a meta table and at least one stats table. You can expect some inconsistencies (e.g. the meta table for the AccountStats_ tables is called p_Customer) and exceptions like the standalone p_PaidOrganicStats table, which doesn’t come with a meta table. But the rule applies to most of the tables, including the most important ones.

To some degree, you can expect the same kind of metrics in every table of the same type:

BasicStats tables will contain the most typical Google Ads metrics (clicks, impressions, conversions) for a given dimension.

ConversionStats tables will contain more detailed conversion information (like conversion types, values and categories) for a given dimension.

NonClickStats tables will contain mostly video-related information (which is why they are available for some dimensions only),

and so on. So if you want basic click and cost stats for ad groups, you can guess that you’ll find them in the AdGroupBasicStats table.

Any descriptive data, like campaign names, ad group names or keywords, is available only in meta tables. Stats tables contain only the IDs (CampaignStats contains Campaign ID, and so on). Also, as you probably expect, meta tables do not store any stats.

All of this paints a picture of a star schema that needs to be denormalized. Or, in plain English: we will need to get the metrics and the ID from one table and the names or other descriptive data from another.
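To make the pattern concrete, here’s a minimal sketch of a stats-to-meta join. Treat it as illustrative rather than copy-paste ready: the project, dataset and the <customer_id> suffix are placeholders, and column names follow the standard Data Transfer export schema.

```sql
-- Illustrative sketch: attach campaign names (meta) to metrics (stats).
-- Replace project, dataset and the <customer_id> suffix with your own.
SELECT
  meta.CampaignName,
  stats.CampaignId,
  SUM(stats.Clicks) AS Clicks,
  SUM(stats.Impressions) AS Impressions
FROM `project.dataset.p_CampaignBasicStats_<customer_id>` AS stats
JOIN `project.dataset.p_Campaign_<customer_id>` AS meta
  ON stats.CampaignId = meta.CampaignId
-- Meta tables are daily snapshots; keep only the latest one.
WHERE meta._DATA_DATE = meta._LATEST_DATE
GROUP BY meta.CampaignName, stats.CampaignId
```

The `_DATA_DATE = _LATEST_DATE` filter is the standard Data Transfer idiom for picking the freshest snapshot of a meta table.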

Step 2: Let’s do some lookups

Now that we know we’ll need to join the data from stats and meta tables to get the names of ad groups and other dimensions, we can create simple lookups to use in our future models.

AdGroupLookup.sql

This is the staging model that gets you a lookup table of AdGroupId and AdGroupName. Sadly, it is also the first demonstration of the underlying Google Ads logic that we need to guess and recreate. The LAST_VALUE bit is needed to get only the latest name of the ad group. Otherwise, if the ad group name was changed at some point, the result will contain duplicate rows.
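As a rough sketch of what such a lookup can look like (I’m assuming standard Data Transfer column names and a dbt `source()` named `google_ads` here; the actual model in the repo may differ in details):

```sql
-- AdGroupLookup sketch: one row per ad group, carrying its latest name.
-- Meta tables keep daily snapshots, so a renamed ad group would
-- otherwise yield one row per name.
SELECT DISTINCT
  AdGroupId,
  LAST_VALUE(AdGroupName) OVER (
    PARTITION BY AdGroupId
    ORDER BY _DATA_DATE
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS AdGroupName
FROM {{ source('google_ads', 'p_AdGroup') }}
```

Note the explicit window frame: without `ROWS BETWEEN … AND UNBOUNDED FOLLOWING`, BigQuery’s default frame would make LAST_VALUE return the current row’s value instead of the truly last one.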

Other lookups can be found in the Staging folder and follow the same or very similar logic.

Step 3: Get metrics

In the requirements we’ve outlined that those metrics are needed:

Impressions

Clicks

Cost

Conversions

CTR

Conversion Rate

CPC

Cost per Conversion

There are 8 of them, but only Impressions, Clicks, Cost and Conversions need to be taken directly from the source. The rest can be calculated. That doesn’t change the fact that we’ll need three separate staging models to get them:

adGroupBase.sql

This gets cost and clicks from the base table. Nothing surprising here; the model uses some Jinja macros to get the cost and date values. You can look them up in the macros folder.
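A simplified version of what such a base model does, with the macro calls inlined for readability (column names are assumed from the Data Transfer schema; cost in the export is stored in micros, which is one of the things the macros handle):

```sql
-- adGroupBase sketch: clicks and cost per ad group and day.
-- Cost is exported in micros, hence the division by 1,000,000.
SELECT
  AdGroupId,
  Date,
  SUM(Clicks) AS Clicks,
  SUM(Cost) / 1000000 AS Cost
FROM {{ source('google_ads', 'p_AdGroupBasicStats') }}
GROUP BY AdGroupId, Date
```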

adGroupConversions.sql

This is required if you need the number of conversions split between the defined conversion names. The conversion names need to be the same as the ones defined in Google Ads and should be specified in the dbt_project.yml file.
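A hedged sketch of the idea — the variable name `conversion_names` and the `ConversionTypeName` column are my assumptions for illustration; check the repo and your own schema for the real ones:

```sql
-- adGroupConversions sketch: pivot conversions into one column
-- per conversion action listed in dbt_project.yml.
SELECT
  AdGroupId,
  Date,
  {% for name in var('conversion_names') %}
  SUM(IF(ConversionTypeName = '{{ name }}', Conversions, 0))
    AS {{ name | replace(' ', '_') }}
    {%- if not loop.last %},{% endif %}
  {% endfor %}
FROM {{ source('google_ads', 'p_AdGroupConversionStats') }}
GROUP BY AdGroupId, Date
```

This is the dbt pattern for the account-specific part mentioned earlier: conversion names are the one thing that varies between accounts, so they live in a project variable instead of being hardcoded.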

adGroupImpressions.sql

In theory, we could get the impressions in the base query, just as we did with cost and clicks. But this is one of those times when the underlying Google Ads logic comes into play. If you look into the source table, you’ll see something like this:

The Impressions column is repeated across different click types (which tell you whether a user clicked the main area of the ad or a sitelink extension). This makes some sense, as splitting the number of impressions based on the subsequent click would be conceptually hard. But we need to deal with it; otherwise our impression count will be nowhere near the true one. This is what WHERE ClickType = “URL_CLICKS” does, and this is why we have a separate staging model for impressions (applying the same filter to clicks would result in deflated numbers).
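In model form, that filter is essentially the only thing distinguishing the impressions staging from the base one (again, a sketch with assumed column names):

```sql
-- adGroupImpressions sketch: impressions are repeated per ClickType,
-- so keep a single click type to avoid multiplying the count.
SELECT
  AdGroupId,
  Date,
  SUM(Impressions) AS Impressions
FROM {{ source('google_ads', 'p_AdGroupBasicStats') }}
WHERE ClickType = 'URL_CLICKS'
GROUP BY AdGroupId, Date
```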

With that done, we’re finished with the staging models, which means that we can…

Step 4: Get it all together

adGroupData.sql

After joining the staging models and calculating the derived metrics, we’re done. There’s nothing overly complicated here, and those few steps seem to be a decent framework for working with Google Ads data in BigQuery. If you check the other parts of the model, you’ll see that they follow the exact same logic: staging models that get the metrics, lookups that get the names, and one final query that merges them.
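The final step can be sketched like this — a simplified version assuming the staging models expose AdGroupId, Date and a total Conversions column (BigQuery’s SAFE_DIVIDE returns NULL instead of erroring on division by zero, which is handy for rows with no traffic):

```sql
-- adGroupData sketch: join the stagings and derive the ratio metrics.
SELECT
  l.AdGroupName,
  b.AdGroupId,
  b.Date,
  b.Clicks,
  b.Cost,
  i.Impressions,
  c.Conversions,
  SAFE_DIVIDE(b.Clicks, i.Impressions) AS CTR,
  SAFE_DIVIDE(c.Conversions, b.Clicks) AS ConversionRate,
  SAFE_DIVIDE(b.Cost, b.Clicks) AS CPC,
  SAFE_DIVIDE(b.Cost, c.Conversions) AS CostPerConversion
FROM {{ ref('adGroupBase') }} AS b
JOIN {{ ref('adGroupImpressions') }} AS i USING (AdGroupId, Date)
LEFT JOIN {{ ref('adGroupConversions') }} AS c USING (AdGroupId, Date)
JOIN {{ ref('AdGroupLookup') }} AS l ON b.AdGroupId = l.AdGroupId
```

The LEFT JOIN on conversions is deliberate: ad groups with traffic but no conversions should still appear in the report.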

That approach should help you get most of what you need to analyze your campaigns. With one reservation: the dreaded underlying Google Ads logic. Sometimes it might be hard to guess what is missing, but there are two directions that I can point you to whenever you have one of those frustrating moments when you have no idea what’s wrong:

1) Just as in the impressions case, the logic of Google Ads sometimes requires repeating some data in the source tables. It used to be a problem when getting the cost data, which was repeated over the Location (TargetLocation?) field, but that seems to be fixed in the current export schema. However, if you encounter strangely inflated numbers, it’s a good idea to check whether some suspicious dimension in the source table is causing the problem.

2) In some cases the data seems to be built on the backend Ads logic and does not follow the common understanding of the tool. This is evident when it comes to keywords: they are one of the most important features of Ads and have a separate, highly visible place in the interface. But, presumably, the backend treats them as just one of the possible targeting criteria, and the exported data reflects that. You won’t find a “KeywordName” field similar to the one you can use to get the ad group name. Instead, you need to query the “Criteria” field (and “CriteriaId” if you need the keyword ID for joins). In general, trying to predict how Google Ads data is built on the other side might help, but to do that you probably need some experience working with the tool.
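For example, a keyword lookup would query those generic criteria fields rather than anything keyword-specific (a sketch assuming the standard p_Keyword meta table):

```sql
-- Keywords are exported as generic targeting criteria:
-- Criteria holds the keyword text, CriteriaId its ID.
SELECT DISTINCT
  CriteriaId,
  Criteria AS Keyword
FROM {{ source('google_ads', 'p_Keyword') }}
```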

What’s next?

You can get the full model from GitHub, update the variables in dbt_project.yml as specified in the configuration instructions, run it, and get all of the tables.

But this is just a demonstration that covers some very specific cases. You’ll be much better off getting familiar with the logic and adjusting the model so you can get any Google Ads data much faster and easier. You can optimize it further, add other dimensions or metrics, and customize it to fit your needs.

The wonder of the DRY approach to data transformations is that once you get through the first phase of building the model, adding new things on top of it is extremely easy, especially compared to digging into huge SQL queries. If you’re not using dbt (or another tool supporting a similar approach) for your data transformations, maybe it’s time to consider it.