in Tips and Tricks by

After observing an unusual phenomenon in a client’s Google Analytics account, I discovered (after an hour or so, and with the help of a co-worker) that you can detect when the Google Bot comes to your website from within Google Analytics.

The Mystery

If you don’t want to know my thought process, just skip to the conclusion (after all, the title gives away the culprit).

One Friday afternoon, when things were dying down from the week and I had a spare moment, I delved into Google Analytics. This is one of those things we SEOs always say we should do, but often have a hard time getting to. While diving into a client’s account, I noticed an unusually high amount of traffic. Not that traffic is a problem, but I wanted to see where it was coming from, so I could see if I could get more of it. As I dug in, I discovered that a lot of that traffic was from a Direct source. That’s unusual. Stranger still, the traffic was going directly to specific interior pages.

Direct Traffic, as you know, is the result of people typing in a web address. However, these URLs were very specific (and very long) for someone to remember and type in later. Direct Traffic can also come from bookmarks in a browser. These could have been bookmark visits, but there were an awful lot of them on one particular day, which didn’t seem likely.

In my experience, Google Analytics will sometimes default to Direct Traffic when it can’t determine a source. I ran a Screaming Frog crawl to make sure the GA code was on every page. It was.
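If you don’t have a crawler handy, you can approximate this check with a small script. Here’s a minimal sketch; the page list and the `UA-XXXXXX-X` property ID format are illustrative, and a real audit would also follow internal links rather than use a fixed URL list:

```python
# Minimal sketch: check pages for a Google Analytics tracking snippet.
# Looks for either a UA- property ID or the analytics.js script reference.
import re
import urllib.request

GA_PATTERN = re.compile(r"UA-\d{4,10}-\d{1,4}|google-analytics\.com/analytics\.js")

def has_ga_code(html: str) -> bool:
    """Return True if the page source appears to contain a GA snippet."""
    return bool(GA_PATTERN.search(html))

def audit_pages(urls):
    """Fetch each URL and report whether the GA snippet is present."""
    results = {}
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            results[url] = has_ga_code(resp.read().decode("utf-8", "replace"))
    return results
```

Any page that comes back `False` is a page whose visits (human or bot) never reach your GA data at all.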

Hmmmm…. what’s going on here?

I called-in a co-worker to brainstorm. She was also in a Friday kind of mood and had some time to brainstorm. After talking about the possibilities, something struck me.

A few months earlier, Google had announced a feature that let you exclude bots from your GA data. This announcement always struck me as strange: one of the advantages of Google Analytics as a way to measure website traffic and interaction was that it already excluded bots, because it used JavaScript in its tracking. Other platforms couldn’t distinguish visitors from bots (side anecdote: I once had a client who thought they had a million visitors a day thanks to a server-side tracking platform that counted bots alongside humans; they were very disappointed to find out how few people were actually coming to their site), but because GA used JavaScript, this was easy. Why, then, was GA announcing you could exclude bots from your visitors?

Because of another innovation: the Google Bot was getting better at rendering JavaScript on web pages. This matters as more and more websites use JavaScript to render parallax pages or are built on JavaScript frameworks, but the side effect is that the Google Bot now executes the GA code and fires a pageview.

The Conclusion

To see the traffic from the Google Bot I went to Audience > Technology > Network in GA and found a line for “google inc.” under “Service Provider”. I then created a Custom Segment that only showed pageviews from that particular Service Provider.
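The same filter can be expressed programmatically. In the Analytics Reporting API v4, the Service Provider dimension is `ga:networkLocation`, so a sketch of a `batchGet` request body limited to that provider might look like this (the view ID and date range are placeholders, and actually sending the request would also require authorized credentials, which are omitted here):

```python
# Sketch of a Reporting API v4 request body that mirrors the custom segment:
# pageviews filtered to hits whose Service Provider is "google inc.".
def googlebot_report_body(view_id: str) -> dict:
    """Build a batchGet request body filtered to the 'google inc.' provider."""
    return {
        "reportRequests": [{
            "viewId": view_id,  # placeholder: your GA view ID
            "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
            "metrics": [{"expression": "ga:pageviews"}],
            "dimensions": [{"name": "ga:pagePath"}],
            "dimensionFilterClauses": [{
                "filters": [{
                    "dimensionName": "ga:networkLocation",
                    "operator": "EXACT",
                    "expressions": ["google inc."],
                }],
            }],
        }],
    }
```

Either way, in the UI or via the API, the filter is the same: keep only hits whose network location resolves to Google.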

From there I could view all my GA data in light of only traffic for the Google Bot. This explained the mysterious direct traffic to specific pages on the site.

Curious, I took this a step further and went to Google Webmaster Tools, where I viewed Crawl > Crawl Stats > Pages crawled per day.

When I compared this graph to Behavior > Site Content > All Pages in GA, I could clearly see the corresponding spikes, convincing me that I was indeed looking at the Google Bot’s visits in Google Analytics.

(I tried to scale these two graphs so you could see the comparison clearly, but failed. The graphs above are for this website, and the pattern is not as obvious as it was on the client site I originally looked at; my client gets many more bot visits than my site does. I hope you get the idea.)

Consequently, I set up a Custom Intelligence Alert to email me every time the Google Bot comes to my website:

Interesting sidenote: I am seeing the Google Bot in my analytics (with the above custom segment) as early as January 4th, 2014: five months before Google’s announcement that it was improving its ability to read JavaScript, and seven months before the ability to exclude robots from GA. I’m curious to find out what other people are seeing here.

Okay, so that’s cool. What can you do with it?

I can think of a couple things:

Whenever I launch a new website, I’m always trying to fix as many errors as possible before the Google Bot comes for a visit and begins the re-indexing process. This segment can help me tell whether I’m fixing the errors before the bot begins to index the website.

By looking at the Behavior > Site Content > Landing Pages report in GA, with the above Google Bot custom segment selected, I can tell which pages on my site the Google Bot visits most frequently. Heck, if I set up a Custom Report (and mail it to myself regularly) I can keep up with this. Could this tell me about referring websites that encourage the bot to visit me? Could this tell me which are the most important or valuable pages on my site, according to Google? I’m not sure, but that would be great.

It’s also just kinda cool to know that Google has come by for a visit. Stop by anytime. You’re always welcome.
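One caveat: a Service Provider of “google inc.” is strong evidence but not proof that a hit came from the Google Bot. If you also have server logs with visitor IPs (GA itself does not expose them), Google documents a reverse-then-forward DNS check for verifying Googlebot. A minimal sketch:

```python
# Sketch of Google's documented Googlebot verification:
# 1) reverse-resolve the visitor IP to a hostname,
# 2) check the hostname is in googlebot.com or google.com,
# 3) forward-resolve the hostname and confirm it maps back to the same IP.
import socket

def is_google_hostname(hostname: str) -> bool:
    """True if the reverse-DNS name belongs to Google's crawler domains."""
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip: str) -> bool:
    """Full reverse-then-forward DNS check (requires network access)."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not is_google_hostname(hostname):
        return False
    # Forward-confirm: the hostname must resolve back to the original IP,
    # so a spoofed reverse-DNS record alone can't pass the check.
    return ip in socket.gethostbyname_ex(hostname)[2]
```

The suffix check matters: a plain substring match would let an attacker pass with a hostname like `googlebot.com.attacker.net`.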

How could you use this information, or how are you using it already? Share it with us in the comments below.