Google Trends is both fascinating science and a dangerous tool. The following example is lifted from Andrew Sullivan's blog.



First off, this is a supreme example of turning volumes of data into useful information. (Anyone who does data mining will appreciate how much work it takes to generate something like this automatically.) The chart compares the volume of traffic for different search keywords over time. The lines are sharp, and the amount of smoothing is well chosen: some spikes show through, but not too many. The idea of flagging certain "special" points is also admirable. No wonder this caught the attention of lots of marketers!

However, user beware! For unexplained reasons, much of the information required to interpret this chart is missing. There is no vertical scale, which means that we do not know how many searches include the word "blog". While the relative gap between the lines is large, the absolute difference may in fact be tiny.



Also, what sample size was used? How were the samples selected? This gets even trickier because Google then breaks the results down by city, region and language. Do they have enough samples to make meaningful statements at that level of detail? Similarly, on the time scale, what kind of smoothing was employed?
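
To see why the undisclosed smoothing matters, here is a minimal sketch in Python. The daily counts are invented, and the trailing moving average is only an assumption for illustration; we do not know what smoothing Google Trends actually applies. The helper names `moving_average` and `peak_after_smoothing` are mine.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def peak_after_smoothing(values, window):
    """Height of the tallest point once the series has been smoothed."""
    return max(moving_average(values, window))


# Hypothetical daily search counts: a flat baseline with one news-driven spike.
daily = [10] * 6 + [80] + [10] * 6

for window in (1, 3, 7):
    print(window, round(peak_after_smoothing(daily, window), 2))
```

With these made-up numbers, the same one-day spike reads as 80 with no smoothing, about 33 with a three-day window, and 20 with a seven-day window. Without knowing the window, a reader cannot tell a big one-day event from a modest week-long one.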



The special flags, while a wonderful concept, fall flat in practice, highlighting the limitations of machine intelligence. On the right, I copied the headlines for the flags. You may also be bewildered by the choice: not one has anything to do with comparing NYT and blogs.



Such half-baked tools are very dangerous indeed, as demonstrated by Andrew's comment. Andrew is one of the pioneers of news blogging who defected from mainstream media, so his bias is well known. Using this chart, he proclaimed: "They're [NYT] doomed."



Not so fast. It is unfair to "spread their votes" by using "new york times", "nytimes" and "ny times" as three separate entries. Besides, NYT is only one publication; pitting it against a world of blogs is absurd, especially when the top 8 regions searching for "blog" are outside North America! (see the light blue bars on the right)
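
The vote-splitting problem is easy to show with a small Python sketch. The weekly volumes are invented for illustration (the chart hides the real ones), `combine_terms` is a hypothetical helper, and simply adding the variants assumes that no single search is counted under more than one of them.

```python
# Hypothetical weekly search volumes, invented for illustration only.
weekly_volume = {
    "new york times": [40, 42, 41, 43],
    "nytimes": [25, 26, 27, 26],
    "ny times": [15, 14, 16, 15],
    "blog": [60, 62, 65, 66],
}


def combine_terms(volumes, terms):
    """Add several keyword series together, week by week."""
    return [sum(week) for week in zip(*(volumes[t] for t in terms))]


nyt_variants = ["new york times", "nytimes", "ny times"]
nyt_total = combine_terms(weekly_volume, nyt_variants)

print("NYT combined:", nyt_total)
print("blog:        ", weekly_volume["blog"])
```

In this made-up data, every NYT variant sits below "blog" on its own, yet the combined NYT line sits above it every week. Whether that holds for the real numbers is exactly what the chart does not let us check.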



Meanwhile, this bar chart is also impossible to interpret. By "normalization", one assumes they are removing the effect of the total number of searches, or else the US would always end up at the top. Normalization is forever a double-edged sword: if you are a marketer, even if you see Peru as having the highest % of searches using "blog", you can't conclude that Peru is the market you should go after, since you would still need to know how widespread Internet/Google penetration is in Peru. By hiding the scale (again), Google Trends stubbornly remains just a toy.
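
Here is the double-edged sword as a short Python sketch. All figures are invented, and dividing by total searches is only my guess at what "normalization" means here; `blog_share` is a hypothetical helper, not anything Google documents.

```python
# Hypothetical figures, invented for illustration; not real Google data.
regions = {
    "Peru": {"blog_searches": 5_000, "total_searches": 100_000},
    "United States": {"blog_searches": 400_000, "total_searches": 40_000_000},
}


def blog_share(stats):
    """Fraction of a region's searches that include the word "blog"."""
    return stats["blog_searches"] / stats["total_searches"]


# Ranking by normalized share versus ranking by raw volume.
by_share = sorted(regions, key=lambda r: blog_share(regions[r]), reverse=True)
by_volume = sorted(regions, key=lambda r: regions[r]["blog_searches"], reverse=True)

print("by share: ", by_share)
print("by volume:", by_volume)
```

With these made-up numbers, Peru tops the normalized ranking at 5% against 1% for the US, while the US has 80 times as many "blog" searches in absolute terms. A marketer needs both views, and the chart offers only one.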









