Adapting an existing Python script to pull bulk data from Google Search Console into Google BigQuery.

Update February 2020: I’ve written a whole new approach to getting data from Search Console into BigQuery — check out my new post: Revisiting Google Search Console Data into BigQuery.

Google now allows users of the Search Console API to pull back 16 months of data, making year-over-year comparison easy and saving us all from having to store this data ourselves in huge Excel spreadsheets.

You can explore this data through the new and improved beta Search Console interface, but you may also be interested in pulling this data out in bulk, to do some more advanced analysis.

I was in this situation, and came across a neat Python script which automates API queries and returns bulk results as .csv files. Here’s a link to this code on GitHub:

https://github.com/stephan765/Google-Search-Console-bulk-query

And also to the original blog post, which gives some really excellent advice on getting it up and running, activating the API and setting up OAuth credentials:

https://moz.com/blog/how-to-get-search-console-data-api-python

Awesome, so we can query all this data and get it all back in many CSVs — then what? Upload them all to BigQuery manually? It’s possible, but with one file per day and the new ability to query 16 months of data, nobody wants to be doing that!

I took the excellent work already done and simply added in the option to push the results into BigQuery too.

If you’re interested in pulling bulk Google Search Console data into Google BigQuery, check out my code on GitHub:

https://github.com/benpowis/Google-Search-Console-bulk-query-to-GBQ

To set up the BigQuery export, open up the main .py file and replace the placeholder credentials with those of your account:

```python
project_id = "PROJECT_ID"
pkey = 'MY_PRIVATE_KEY.json'
my_table = 'MYDATASET.MYTABLE'
import_action = 'append'  # 'replace' will overwrite the table if it exists; 'fail' will not write to the table if it exists.
```
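For context, here’s a minimal sketch of how these settings feed into the upload. The `push_to_gbq` function name and the sample data are mine, not part of the original script, and the actual call is commented out since it only works once your credentials are in place:

```python
import pandas as pd

# The placeholder settings from the script:
project_id = "PROJECT_ID"
pkey = "MY_PRIVATE_KEY.json"
my_table = "MYDATASET.MYTABLE"
import_action = "append"

def push_to_gbq(df):
    """Push a results DataFrame to BigQuery (requires pandas-gbq)."""
    from pandas.io import gbq  # imported lazily so the rest runs without it
    gbq.to_gbq(df, my_table, project_id=project_id,
               if_exists=import_action, private_key=pkey)

# Example frame in the shape the script produces (values are made up):
sample = pd.DataFrame({"keyword": ["example query"], "clicks": [3]})
# push_to_gbq(sample)  # uncomment once credentials are in place
```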

The BigQuery table: column headers are not included in the data sent to BigQuery, so create the table before you begin, with a schema matching the output columns (keyword, page, clicks, impressions, click-through rate, average position and date).
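As a sketch, the schema could be written out like this. The column names follow the script’s output, but the shortened field names (`ctr`, `position`) and the BigQuery types are my assumptions — adjust them to match your table:

```python
# Hypothetical schema for the target table; field names and types
# are assumptions, mirroring what you would enter in the BigQuery UI.
SCHEMA = [
    {"name": "keyword", "type": "STRING"},
    {"name": "page", "type": "STRING"},
    {"name": "clicks", "type": "INTEGER"},
    {"name": "impressions", "type": "INTEGER"},
    {"name": "ctr", "type": "FLOAT"},
    {"name": "position", "type": "FLOAT"},
    {"name": "date", "type": "DATE"},
]

for field in SCHEMA:
    print(field["name"], field["type"])
```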

Additional changes

Further details on the changes I have made to the original code:

We now import the `pandas` and `gbq` modules.

By default the code now creates a .csv output file AND pushes the results to BigQuery; if you want to skip one of these steps, simply comment out the appropriate line of the loop, around line 285.

I’ve added some additional columns to the data, so by default you will get keyword, page, clicks, impressions, click-through rate and average position. Since all your data goes into the same table (by default), I have also added a date column to the output.
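To illustrate why the date column matters, here’s a made-up example: with one API query per day all landing in the same table, the date is the only way to tell the days apart afterwards.

```python
import pandas as pd

# Made-up results for the same keyword on two different days:
day1 = pd.DataFrame({"keyword": ["example query"], "clicks": [10]})
day1["date"] = "2018-07-01"

day2 = pd.DataFrame({"keyword": ["example query"], "clicks": [12]})
day2["date"] = "2018-07-02"

# Once appended into one table, the date column keeps the rows distinguishable.
combined = pd.concat([day1, day2], ignore_index=True)
print(combined)
```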

That’s all, let me know how you get on pulling all your historic data out of Search Console!