Alright. Now, when the web is nearly overburdened with the abc solutions simply explaining how to use built-in analytics tools brought almost to perfection by Google engineers, I guess it’s time for some really advanced analytics stuff. So, in this post I will guide you step-by-step how to export data from Google Analytics to Google Bigquery.

Hopefully, this is what you get if you implement the solution correctly. This is a table containing pure nonsampled Google Analytics data!

Post content

Before we begin

I’d like to express my profound gratitude to Alexander Eroshkin, a project manager at Yandex and the author of the solution described in this post, for his comprehensive aid while implementing and testing this solution!

I presume there’s no need in describing the advantages you gain when you use Google Bigquery for big data analytics. However, there are certain use-cases you might find valuable for your analytics tasks/goals. You can find them here.

Required tools

As a matter of fact, there are not many tools required to export data from Google Analytics to Google Bigquery. All you need is the following:

Google Analytics account Google Cloud platform account and of course Google Tag Manager account

The solution described in this post involves the use of GTM, so if you have GA snippet hard-coded, leave a request message in the comment section, and I will send you a detailed implementation instruction.

A step-by-step guide for exporting raw data from Google Analytics to Google Bigquery

Now, when we’re done the preliminary part, let us linger no more and delve right into it! Please remember that the solution is not the easiest web-analytics tool in the world to implement/use. So, should any issues/questions arise, please let me know in the comment section.



Step 1: create GCP account

If you don’t have an account at Google Cloud Platform (GCP), go and get one right here.

Please note that you should use real bank card details. Don’t worry, you’re NOT going to be charged without your permission. Check GCP pricing for more details.

Step 2: download and install Python

If you don’t already have it, download and install Python 2.7

Step 3: download project’s zip

Go to Alexander’s github and download the zip file of the project. With the help of this app we’re going to export data from Google Analytics to Google Bigquery.





Step 4: download Google App Engine SDK for Python

One last thing to download is the GAE SDK for Python. You can find it by clicking on the “Or, you can download the original App Engine SKD…” link.





Step 5: create a new project in GCP console

Try to give your project a descriptive yet short and readable name. Perhaps, something like mysite-ga-datastreaming. We’re going to use that name for our app settings.





Step 6: unzip the project you’ve downloaded on step 3

Unzip the content of the project and then paste all the content to a folder named after you project’s id in GCP. So, according to the recommendation given in step 5, the folder should be named like “mysite-ga-datastreaming”.



Step 7: configure the app.yaml file

Open the app.yaml file (you can find in the folder with the app content) and type in your project’s id just like it is shown below.





Step 8: create a custom js variable in order to export data from Google Analytics to Google Bigquery

In my previous post I shared a custom js variable that allows you to export data from Google Analytics to Google Bigquery (or anywhere you like). We’re going to need it right now!

So, copy carefully the JS content below, then create a custom js variable and paste the code right into it.

IMPORTANT! Paste your project id instead of “your-project-id-here” fragment. Also, if you want to have GA clientId to be sent with each hit, create a custom dimension and change the cd’s index right in this place '&cd1=' + model.get("clientId"); to whatever index you’ve received after creating the dimension.

Here’s the code

function(){return function(t){function e(t,e){var n=!1; if("onload"in new XMLHttpRequest)try{var o=new XMLHttpRequest; o.open("POST",e,!0),o.setRequestHeader("Content-Type","text/plain"), o.send(t),o.onreadystatechange=function(){ if(4==this.readyState)return 200!=this.status?n:void(n=!0)}} catch(t){}else try{var r=new XDomainRequest;r.open("POST",e), setTimeout(function(){r.send(t)},0),r.onerror=function(){return n}, r.onload=function(){n=!0}}catch(t){}return n}function n(t,e){ var n=!1;try{document.createElement("img").src=e+"?"+t,n=!0} catch(t){}return n}function o(t,o){var r; if(!(2036>=t.length&&n(t,o))){r=!1;try{navigator.sendBeacon&& navigator.sendBeacon(o,t)}catch(t){}}return r||n(t,o)||e(t,o)} var r=t.get("hitPayload")+"&cd1="+t.get("clientId"); o(r,"https://your-project-id-here.appspot.com/collect"), o(r,"https://www.google-analytics.com/collect")}} 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 function ( ) { return function ( t ) { function e ( t , e ) { var n = ! 1 ; if ( "onload" in new XMLHttpRequest ) try { var o = new XMLHttpRequest ; o . open ( "POST" , e , ! 0 ) , o . setRequestHeader ( "Content-Type" , "text/plain" ) , o . send ( t ) , o . onreadystatechange = function ( ) { if ( 4 == this . readyState ) return 200 != this . status ? n : void ( n = ! 0 ) } } catch ( t ) { } else try { var r = new XDomainRequest ; r . open ( "POST" , e ) , setTimeout ( function ( ) { r . send ( t ) } , 0 ) , r . onerror = function ( ) { return n } , r . onload = function ( ) { n = ! 0 } } catch ( t ) { } return n } function n ( t , e ) { var n = ! 1 ; try { document . createElement ( "img" ) . src = e + "?" + t , n = ! 0 } catch ( t ) { } return n } function o ( t , o ) { var r ; if ( ! ( 2036 >= t . length && n ( t , o ) ) ) { r = ! 1 ; try { navigator . sendBeacon && navigator . sendBeacon ( o , t ) } catch ( t ) { } } return r || n ( t , o ) || e ( t , o ) } var r = t . get ( "hitPayload" ) + "&cd1=" + t . get ( "clientId" ) ; o ( r , "https://your-project-id-here.appspot.com/collect" ) , o ( r , "https://www.google-analytics.com/collect" ) } }

Next, add a sendHitTask field to each of your UA tags and use the variable we have just created as a value for it. Thus you will have all your Google Analytics raw data sent to Google Bigquery.





Step 8.1 (optional): modify your GA code snippet

Here’s what you need to do if you still have GA hard-coded on your site.

First off, change the gabq.js file so it contains your project id domain

function GaBqPlugin(tracker) { ga(function(tracker) { var originalSendHitTask = tracker.get('sendHitTask'); tracker.set('sendHitTask', function(model) { var payLoad = model.get('hitPayload'); originalSendHitTask(model); var gifRequest = new XMLHttpRequest(); // paste your project ID here var gifPath = "http://your-project-domain/collect"; gifRequest.open('get', gifPath + '?' + payLoad, true); gifRequest.send(); }); }); } ga('provide', 'gabqplugin', GaBqPlugin); 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 function GaBqPlugin ( tracker ) { ga ( function ( tracker ) { var originalSendHitTask = tracker . get ( 'sendHitTask' ) ; tracker . set ( 'sendHitTask' , function ( model ) { var payLoad = model . get ( 'hitPayload' ) ; originalSendHitTask ( model ) ; var gifRequest = new XMLHttpRequest ( ) ; // paste your project ID here var gifPath = "http://your-project-domain/collect" ; gifRequest . open ( 'get' , gifPath + '?' + payLoad , true ) ; gifRequest . send ( ) ; } ) ; } ) ; } ga ( 'provide' , 'gabqplugin' , GaBqPlugin ) ;

Next, modify GA code snippet so it has a link to the plugin AND and the plugin itself. This is how it should look like

<script> (function (i, s, o, g, r, a, m) { i['GoogleAnalyticsObject'] = r; i[r] = i[r] || function () { (i[r].q = i[r].q || []).push(arguments) }, i[r].l = 1 * new Date(); a = s.createElement(o), m = s.getElementsByTagName(o)[0]; a.async = 1; a.src = g; m.parentNode.insertBefore(a, m) })(window, document, 'script', 'https://www.google-analytics.com/analytics.js', 'ga'); ga('create', 'UA-XXXXXX-X', 'site.ru'); // plugin required ga('require', 'gabqplugin'); ga('send', 'pageview'); </script> <script async src="http://your-project-domain.appspot.com/js/gabq.js"></script> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <script> ( function ( i , s , o , g , r , a , m ) { i [ 'GoogleAnalyticsObject' ] = r ; i [ r ] = i [ r ] || function ( ) { ( i [ r ] . q = i [ r ] . q || [ ] ) . push ( arguments ) } , i [ r ] . l = 1 * new Date ( ) ; a = s . createElement ( o ) , m = s . getElementsByTagName ( o ) [ 0 ] ; a . async = 1 ; a . src = g ; m . parentNode . insertBefore ( a , m ) } ) ( window , document , 'script' , 'https://www.google-analytics.com/analytics.js' , 'ga' ) ; ga ( 'create' , 'UA-XXXXXX-X' , 'site.ru' ) ; // plugin required ga ( 'require' , 'gabqplugin' ) ; ga ( 'send' , 'pageview' ) ; </script> <script async src = "http://your-project-domain.appspot.com/js/gabq.js" > </script>

Please mind that this step is required only in case you have GA hard-coded.

Step 9: GCP auth set-up

Now go hit this page – https://console.cloud.google.com/apis/credentials – and set the auth configuration as shown on screenshots below.





Step 10: create a new dataset in the Bigquery console

Create a new dataset in your project. See the screenshot below.





Step 11: set dataset and project id names

Open the bqloader.py file and paste the names of your project id and dataset just like it shown below. In addition, you can aslo rename the table or leave it as it is in the file.

Finally, we’re done with the setup. Now it’s time for the action!



Step 12: exporting your app

Open the Google App Engine Launcher and then click File -> Add Existing Application. Set all the required fields like it is shown below.

Next, recheck if everything is set correctly. If it is, click deploy. In case you did everything right, this is what you should see in the console.





Step 13: creating a data table for your Google Analytics data

Now, here’s a final step. Open your browser and go to the following page:

[Project ID].appspot.com/tasks/create_bq_table

By entering this page you trigger the creation of the table for your GA data. It should say “ok”.

Step 14: be patient

You won’t see data immediately appearing in the table. Since it usually takes from a couple of minutes to an hour for data to appear, please be patient and don’t beat false alarm!

Alright. Can you see it now? Yes? YES?! YEAH!





Reasons to export data from Google Analytics to Google Bigquery

As a result, you get a table containing all the raw Google Analytics data. Not only this allows you to solve a couple of common GA issues, but also gain a really helpful insight on what is really going on there on the site. Here’s a short list of advantages you receive.

No more data sampling! This is a huge step-up for those sites having really heavy traffic on a daily basis. An opportunity to build ANY reports with ANY segments using ANY configuration you like Implement big data analytics tools for your site or mobile app Visualize data ANYWHERE you like (not only in Google Data Studio) Should any issues occur with you GA property, you will always have a back-up data source

Certainly, there are some challenges you should be ready for. Say, composing advanced SQL queries. But I’m sure if you got this far, that won’t be a problem for you!



Working with the table

In conclusion, I’d like to show you a couple of simple queries you can use for mining precious insights.

So, here’s a query showing a number of page views your site received for certain dates.

select count(type) as pageviews, date(sec_to_timestamp(timestamp)) as hitDate from [mydata:mysite_ru_data_streaming.GoogleAnalyticsData] where date(sec_to_timestamp(timestamp)) in ('2017-07-28', '2017-07-29') and type = 'pageview' group by hitDate; 1 2 3 4 5 6 7 8 9 10 11 select count ( type ) as pageviews, date (sec_to_timestamp( timestamp )) as hitDate from [mydata:mysite_ru_data_streaming.GoogleAnalyticsData] where date (sec_to_timestamp( timestamp )) in ( '2017-07-28' , '2017-07-29' ) and type = 'pageview' group by hitDate;

And here’s another query, more complexed, showing numbers of pageviews, users and sessions in a single table.

select count(htable.type) as pageviews, count(distinct htable.clientId) as users, count(stable.sessionId) as sessions from [mydata.streaming_20170704] as htable join [mydata.session_streaming_20170704] as stable on htable.clientId = stable.clientId where htable.type = 'pageview'; 1 2 3 4 5 6 7 8 9 10 11 select count (htable. type ) as pageviews, count ( distinct htable.clientId) as users, count (stable.sessionId) as sessions from [mydata.streaming_20170704] as htable join [mydata.session_streaming_20170704] as stable on htable.clientId = stable.clientId where htable. type = 'pageview' ;

These queries are just a very simple example of how you can use you SQL knowledge in order to gain data you need. If you want more complexed (interesting!) examples, let me know in the comment section below.



Summary

WOW! Now this is what I call a detailed step-by-step guide. Seems like this post turned out to be the longest one for the whole blog.

As always, if you’re having any troubles while implementing this solution, let me know in the comment section! This is not an easy-to-do one, but the result you get in the end does worth it!