Google Analytics is lying to you (massively)

20 Aug 2018 Edward Kay

Summary

My research reveals Google Analytics (and other tools) only receive a tiny fraction of real traffic levels. I’ve found over 78% of real site visitors are not being tracked.

This presents serious challenges when using such data to inform decisions.

What’s more, I’ve even seen Google Analytics report the opposite trend to reality. Analytics data is now too unreliable for any trend analyses.

In this article you’ll discover:

Real data from three separate membership organisations showing declines in tracked visits.

The reasons for these declines.

The data you can still trust, and the data you cannot.

Read on...

Initial data discrepancies

Year Out Group have been concerned their website visitor numbers in Google Analytics have been dropping year-on-year since 2013. This is despite strong search engine rankings and a new website launched in March 2018.

The new Year Out Group website is hosted with WP Engine. WP Engine provide detailed analytics on the site traffic based on their log files. These log file stats are processed to filter out non-human requests (e.g. search engine bots) from real visitors – just like Google Analytics tries to do.

I decided to compare the traffic reported by WP Engine’s log files with that in Google Analytics.

Web traffic analysis is complex. Each available data set will include some form of filtering or processing. I fully expected some variation between the two data sets in absolute terms, but thought the general trends would be broadly consistent.

I was wrong.

And this is where it gets interesting.

The WP Engine logs show a strong trend of visit numbers increasing, while Google Analytics shows a continued gradual decline:

Year Out Group stats. Google Analytics shows a decline while WP Engine logs show a growing audience.

Some variation in the actual numbers is one thing.

But when the two sets of data are reporting fundamentally different trends, there are huge implications.

Digging deeper

The next step was to see if other clients with the same WP Engine logs available showed similar characteristics.

And they did.

I ran the same analysis for Scottish Association of Landlords and Professional Speaking Association. These two sites had the added advantage of being able to go back a little further with the hosting logs to provide a larger data set.

All three datasets show large differences between visitor numbers. Google Analytics is only capturing 20-80% of the visit numbers from the log analyses.

More crucially, the proportion of log file visits reported in Google Analytics is decreasing for all three sites. The sites with the highest percentage of log file visits in Google Analytics show the sharpest declines:

Proportion of log file visits recorded as Google Analytics sessions.

Note the downwards trend across three separate websites.



(The data points with ratios over 100% are due to Cloudflare caching. See notes for details).

Over the same time period, all three sites show strong growth in visits from the log file data while their corresponding Google Analytics data report visits as either static or in decline.

Possible causes

Incorrect data

We have to trust that data available from Google and WP Engine are accurate. There will always be an element of filtering and processing of these data.

I am working on the assumption that all such processing is applied consistently.

Ad blockers, privacy settings and network filters

Google Analytics relies on the user’s browser to send data to it:

How Google Analytics works under normal circumstances.

Clearly, if this information is not sent, the visit will not be tracked in Google Analytics.

Many ad blockers – whose purpose is to hide adverts on the websites you visit – also block tracking and analytics services, including Google Analytics.

Firefox even includes a tracking protection option without the need for any extensions. When enabled, this explicitly stops data being sent to a huge list of services, including Google Analytics.

Network administrators can also filter out traffic. Net neutrality laws (in the UK and EU at least) prevent telecoms companies from blocking adverts. These laws thwarted the attempts by mobile operator Three to block adverts at the network level. But there is nothing to stop corporate network managers filtering out such traffic before it reaches the public internet.

Visit data in server log files are not affected by the use of an ad blocker:

How ad blockers stop Google Analytics collecting data. Note how the log file data are not affected.



(Fun fact: the URL of this image initially included google-analytics, but was then blocked by my ad blocker!)

The use of ad blocking technology is difficult to quantify. Various reports from 2017 (the latest available) suggest ad blockers are used by between 11% and 58% of users.

A study by Jason Packer – How Many Users Block Google Analytics, Measured in Google Analytics – goes into much more detail on the challenges of measuring the number of visitors who are blocking Google Analytics. His experiments, albeit on relatively small sample sizes, suggest Google Analytics is under-reporting real traffic levels by around 8-11%.

Another study – Ad Blockers Can Affect Analytics Reporting from Practical Ecommerce – builds on Jason Packer’s work and found up to 42% of visits were not being tracked by Google Analytics.

Implications

The implications of these findings are profound.

It is not just Google Analytics data that are being under-reported. Other third-party services that rely on the user’s browser to send them data can be affected by ad blockers. These include:

Remarketing code – used to target ads to people who have already visited your site, e.g. Google Ads remarketing tags and Facebook Pixel

Marketing automation tools – used to manage marketing activity based on website interactions, e.g. Drip, HubSpot, Pardot, Eloqua, ConvertKit, Infusionsoft

Content personalisation and split-testing tools – used to adjust website content based on user data and/or to test content variations, e.g. OptinMonster, RightMessage, Optimizely

Other web analytics tools – e.g. Mixpanel, Piwik, Segment, New Relic, CrazyEgg, Hotjar

All code managed through Google Tag Manager

None of these tools should ever be considered completely accurate. There are many reasons why they don’t work all the time, e.g. dropped connections; users having JavaScript disabled; someone using multiple browsers/devices; navigating to a new page before the code has executed. This has always been the case.

The problem now, based on these data, is that the proportion of our visitors for whom these tools actually work can be tiny: less than 22% in the case of the Professional Speaking Association.

I now believe trend-based tools, including analytics, are largely meaningless.

Tools that work on single sample points, e.g. heatmaps and visitor recordings, are less affected. We just have fewer data points to use.

The value in web analytics comes from using relative changes (i.e. trends) in the data to drive further action.

This works fine if we are confident the tools are capturing data from a large and consistent proportion of our audience. We have now shown this is not the case.

But there is more to it than this.

The proportion of visitors being tracked is not consistent. All of the datasets evaluated here show strong declines in the percentage of visitors being captured by Google Analytics – especially Scottish Association of Landlords and Year Out Group.

For Year Out Group in particular, the rate of decline in tracking exceeds the visitor growth rate. This means Google Analytics is showing a downward trend in visitors when visitor numbers in the logs are growing.
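To make the arithmetic concrete, here is a minimal sketch of how a falling tracked share can outweigh real growth. The figures are hypothetical and chosen only for illustration; they are not taken from any of the three sites.

```python
# Hypothetical figures for illustration only (not data from the article).
def tracked_visits(real_visits: float, tracked_share: float) -> float:
    """Visits an analytics tool reports if it only sees a share of real traffic."""
    return real_visits * tracked_share

# Assumed scenario: real visits grow by 20%, while the share of visits
# that reach the analytics tool falls from 60% to 45%.
real_before, real_after = 10_000, 12_000
reported_before = tracked_visits(real_before, 0.60)
reported_after = tracked_visits(real_after, 0.45)

real_change = (real_after - real_before) / real_before
reported_change = (reported_after - reported_before) / reported_before

print(f"Real change:     {real_change:+.0%}")      # +20%
print(f"Reported change: {reported_change:+.0%}")  # -10%
```

In this made-up scenario the logs would show 20% growth while the analytics tool would show a 10% decline, which is the same pattern as the one described above.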

What to do about ad blockers impacting your analytics

It can be tempting to follow the path of various media outlets and put up a fight against ad blockers. But I believe that’s a losing battle.

I love my ad blocker. I’m not going to turn it off just so the sites I visit get better stats.

My motivations for using an ad blocker are better security, removing interruptions and improving speed. These mirror the findings from PageFair’s 2017 report into ad blocker usage. Only 6% cited privacy as a reason for blocking ads.

Threats from malware are increasing so the ad blocker is here to stay.

I recommend two routes of action:

1: Keep the analytics, but understand the limitations

Web analytics still have a place, despite these serious shortcomings. They still provide some useful data on the visits that are tracked, such as measuring the relative performance of different content, e.g. Google Ads.

We just need to remember that we’re only seeing a relatively small portion of the overall traffic.

The proportion of traffic not captured by your analytics is unique to your site. Running a log file comparison like this will provide an indication of the accuracy of your data.

2: Focus on the metrics that actually matter

Web analytics has long been a black hole. It is full of fascinating but often useless stats. It takes sustained discipline and effort to use the information effectively.

A far more effective strategy is improving the metrics that actually matter. For the membership sector these include:

Membership numbers

Membership renewal rate (or membership churn)

New member renewal rate – i.e. the number of memberships that continue beyond their first period

Event registrations

Event attendance

Enquiries received

Website logins

Use of self-service functions

Email opens and clicks

Member benefits used, such as member-only resources accessed and purchases made with member discounts



The good news is none of these metrics (with the possible exception of email opens) are impacted by the use of ad blockers.

Conclusions

This is a long article with a lot of information. To sum up the key points:

Standard tools used to measure website usage are missing lots of data due to the use of ad blockers.

Blocked tools include Google Analytics (which also reports conversions back to Google Ads), Google Tag Manager, Facebook Pixel and Google Ads remarketing code.

The proportion of visitors not being measured can be massive – over 78% in the case of the Professional Speaking Association for example.

There is a strong downward trend in the proportion of visitors that can be measured.

This presents challenges for all organisations. I recommend:

Understanding and monitoring the level of misreporting of your users. This will inform the level of trust you can have in any tools affected.

Focusing on the most important metrics, e.g. email clicks, membership numbers and event ticket sales. These are not affected by ad blockers.

Considering alternative technology, e.g. website log file analysis, to understand website use. But only if these data provide insights that drive useful actions.
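As a rough illustration of what basic log file analysis involves, the sketch below counts daily human visitors from access log lines. It is not WP Engine’s processing, which is not documented here. It assumes logs in the common Combined Log Format, treats each unique IP address and user agent pair as one visitor per day, and uses a deliberately short, hypothetical list of bot markers.

```python
import re
from collections import defaultdict

# Assumed input: Combined Log Format, i.e.
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Illustrative markers only; real bot filtering needs a far longer list.
BOT_MARKERS = ("bot", "crawler", "spider")

def daily_human_visitors(lines):
    """Count unique (IP, user agent) pairs per day, skipping obvious bots."""
    visitors = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # skip requests that identify themselves as bots
        visitors[match["day"]].add((match["ip"], match["agent"]))
    return {day: len(seen) for day, seen in visitors.items()}

# Made-up sample lines: two requests from one browser, one from a crawler.
sample = [
    '203.0.113.5 - - [20/Aug/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '203.0.113.5 - - [20/Aug/2018:10:01:00 +0000] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '198.51.100.7 - - [20/Aug/2018:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(daily_human_visitors(sample))
```

Because the counting happens on the server, an ad blocker in the visitor’s browser has no effect on the result. Requests served from a cache in front of the server would still be missed.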

Notes

Cloudflare and caching

All three sites analysed are consistently using Cloudflare caching. This caching improves performance by serving popular resources directly rather than requesting them from WP Engine.

Cloudflare’s caching reduces the data in WP Engine’s logs but does not affect JavaScript-based tools such as Google Analytics.

Real visitor numbers could therefore be much higher, making the difference even more pronounced.

Analytics in Cloudflare show the proportion of requests served from cached data for Professional Speaking Association, Scottish Association of Landlords and Year Out Group were 51%, 50% and 56% respectively.

Cloudflare cache analytics reports on all requests. They don’t filter out non-human traffic like Google Analytics and WP Engine do. These requests also include access to supplementary resources such as images.

Google Analytics spikes for Scottish Association of Landlords

The plot of Google Analytics sessions as a proportion of log file visits for Scottish Association of Landlords shows a couple of spikes (early May and early July) where the proportion is greater than 100%. This seemed odd, so I looked into these data points more closely.

Both periods show a noticeable increase in traffic in Google Analytics that is not recorded in the server logs. I believe this is down to Cloudflare serving cached content directly.

The early May spike correlates with a lot of pre-GDPR member communication. These communications directed members to public web pages that are perfect for caching.

The early July spike is matched with a large increase in Google Analytics sessions on a single day: 7 July. 73% of sessions on this day were single-page (0 second) sessions from overseas. This is very different from the normal traffic patterns, suggesting some form of unfiltered bot. The content accessed by this traffic was ideal for caching, so would also have been served by Cloudflare rather than WP Engine.

Google Analytics view filters

Neither Professional Speaking Association nor Year Out Group have any view filters configured on the Google Analytics views used in this analysis.

Scottish Association of Landlords have a single filter to exclude traffic from their office IP address. This will result in slightly lower Google Analytics figures compared to their logs. Even so, Scottish Association of Landlords still have the largest proportion of visits tracked in Google Analytics out of all the sites analysed.

Methodology

The process used for producing the graphs above was:

Download usage CSV for the site from the overview page in the WP Engine user portal. This contains daily summary data on site usage.

Retain just the date and visitor count columns.

Reverse the order of the data so it is listed chronologically.

Review the monthly plan usage reports from WP Engine (also in the user portal). These reports show the number of human visits vs bots and other requests.

Calculate the percentage of human visits to all visits for each month from the monthly plan usage reports.

Add a new column in the spreadsheet next to visitors called ‘WP Engine human visits’. Populate this column as the visitor count × percentage of human visits for the corresponding dates.

Download the daily session count report from Google Analytics for the same time period in CSV format.

Paste the session numbers into a new column alongside the corresponding date.

Add two new columns. Fill these with the 7-day rolling averages for both the ‘WP Engine human visits’ and ‘Google Analytics sessions’ data. This smooths out the spikes caused by differences in weekday vs weekend traffic levels.

Add a final column for the percentage of WP Engine human visits tracked in Google Analytics. Use the 7-day rolling average figures for this calculation.

Plot the data in a chart. Add a linear trend line.
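The analysis itself was done in a spreadsheet. For anyone who prefers to script it, the calculation steps above could be sketched roughly as follows. The function names and the sample figures are hypothetical, the human share is assumed to be supplied per day (the monthly percentage repeated for each day in that month), and the trend line is an ordinary least-squares fit.

```python
from statistics import mean

def rolling_mean(values, window=7):
    """Trailing rolling average; the first window-1 days have no value."""
    return [
        None if i + 1 < window else mean(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]

def tracked_share(log_visitors, human_shares, ga_sessions, window=7):
    """Share of log file human visits that appear as Google Analytics sessions."""
    # 'WP Engine human visits' = visitor count x percentage of human visits.
    human_visits = [v * s for v, s in zip(log_visitors, human_shares)]
    log_avg = rolling_mean(human_visits, window)
    ga_avg = rolling_mean(ga_sessions, window)
    return [
        None if log is None or log == 0 else ga / log
        for log, ga in zip(log_avg, ga_avg)
    ]

def trend_slope(ys):
    """Least-squares slope (change per day), ignoring days with no value."""
    points = [(x, y) for x, y in enumerate(ys) if y is not None]
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x, _ in points)
    return numerator / denominator

# Made-up sample: 14 days of steady log traffic and falling GA sessions.
log_visitors = [1000] * 14
human_shares = [0.5] * 14
ga_sessions = [300] * 7 + [250] * 7

share = tracked_share(log_visitors, human_shares, ga_sessions)
print(share[6], share[13])
print(trend_slope(share) < 0)
```

A negative slope means the tracked share is falling over the period, which is what the linear trend lines in the charts show.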

Credits

Thank you to Year Out Group for the initial questions that led to this research and for allowing me to publish your data.

Thank you also to Professional Speaking Association and Scottish Association of Landlords for your kind permission allowing me to use your data for comparison.

Icons in diagrams made by Smashicons from www.flaticon.com