[This article was first published on, and kindly contributed to, R-bloggers.]

At JSM 2011 today, three Google employees (amongst the more than 20 Google delegates there) gave a little insight into how statistical analysis with R yields better results for companies using Google’s various advertising products.

Bill Heavlin from Google kicked off the session with a talk about conditional regression models, a statistical technique Google uses to evaluate the factors that drive user satisfaction with its products, such as when users are surveyed on their satisfaction with search, or asked to rate YouTube videos. Google has graciously shared the fruits of Bill’s research by publishing an open-source R package for conditional regression.
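The Google package isn’t named in the talk, but conditional (stratified) logistic regression is also available in base R’s `survival` package via `clogit`. A minimal sketch using the built-in `infert` matched case-control dataset (this is an illustration of the general technique, not Google’s package or data):

```r
# Conditional logistic regression with survival::clogit.
# infert is a built-in matched case-control dataset on infertility;
# strata(stratum) conditions the fit on each matched group.
library(survival)

fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
summary(fit)
```

Conditioning on the stratum means each matched group acts as its own control, so group-level nuisance effects drop out of the likelihood.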

Next up was Tim Hesterberg from Google, who talked about how Google determines the effectiveness of display ads for its customers. When a brand-name company places a display (or banner) ad on a popular website like ESPN.com or CNN.com, it can be hard to judge its effectiveness, because only a small percentage of visitors will ever click on a display ad. But that’s not to say that a display ad won’t affect future purchasing behavior: a user might search for “HTC” or visit the HTC website a couple of days after seeing a display ad for an HTC phone. Using observational data from more than 10 million web users, Google compares the search behavior of people who were not exposed to the display ad (i.e. those who never visited a web page displaying the ad) to that of similar users who did see the ad, to figure out how many additional people visit the advertiser’s web site as a result of seeing the display ad.

Tim was very clear in pointing out that no private information from any individual web user is used to make this determination, and that several techniques are used to minimise the bias inherent in using an observational, rather than experimental, process to make the estimate of additional visitors. (For example, Google tests the uplift of irrelevant “decoy” phrases, like searching for “wool socks”, to make sure no spurious benefit is detected.) Google runs hundreds of studies each month, using R software for the statistical analysis and visualization, to ensure that its advertisers are always getting the best bang for their marketing dollar.
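The comparison described above, including the “decoy” check, can be sketched in a few lines of R. All of the data here is simulated and the numbers are invented purely for illustration; this is not Google’s actual methodology or code:

```r
# Simulated comparison of exposed vs. unexposed users (illustrative only).
set.seed(42)
n <- 10000
exposed <- rbinom(n, 1, 0.5)                 # 1 = user saw the display ad
# simulate a small lift in site-visit probability for exposed users
visited <- rbinom(n, 1, 0.02 + 0.01 * exposed)

# estimated uplift: difference in visit rates between the two groups
rate_exposed   <- mean(visited[exposed == 1])
rate_unexposed <- mean(visited[exposed == 0])
uplift <- rate_exposed - rate_unexposed

# a "decoy" outcome unrelated to the ad (e.g. searches for "wool socks")
# should show no uplift; a nonzero result would flag spurious bias
decoy <- rbinom(n, 1, 0.02)
decoy_uplift <- mean(decoy[exposed == 1]) - mean(decoy[exposed == 0])
```

In practice the hard part is choosing the “similar users” comparison group, which is where the bias-reduction techniques Tim mentioned come in; the decoy check then serves as a sanity test of that matching.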

Finally, John Vaver from Google discussed yet another method Google uses for measuring ad effectiveness, this time with respect to the ads that appear alongside Google searches. For advertisers who buy ads around the world, an elegant statistical trick is used to determine how spending in a geographic region drives additional benefits (as measured by goal completions, such as ordering a product or signing up for a newsletter). By temporarily turning off ads in a given region, and cycling this through all the regions covered, Google can double up on the data used to determine the effectiveness of the ad: once when the ad is turned off, and again when it’s turned back on. This information is then combined to determine the overall effectiveness of the ad. Once again, R was used for the data analysis and visualization.
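The on/off cycling idea reduces to comparing goal completions within each region between its ads-on and ads-off periods. A toy sketch in R, with region names and counts entirely invented for illustration (this is not Google’s model):

```r
# Toy geo-experiment: each region has one ads-on and one ads-off period.
geo <- data.frame(
  region      = rep(c("A", "B", "C"), each = 2),
  ads_on      = rep(c(TRUE, FALSE), times = 3),
  conversions = c(120, 95, 200, 160, 80, 70)
)

# per-region lift: ads-on conversions minus ads-off conversions
on_counts  <- geo$conversions[geo$ads_on]
off_counts <- geo$conversions[!geo$ads_on]
lift_per_region <- on_counts - off_counts   # 25, 40, 10

# average lift across regions as a simple overall effectiveness estimate
mean(lift_per_region)                       # 25
```

A real analysis would also adjust for seasonality and region-level trends, but the pairing of on and off periods within each region is what lets the same geography serve as both treatment and control.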

Overall, the session was a fascinating insight into how advanced statistical analysis on massive data sets, powered by the R statistical software system, is used by Google to help marketers get the best value out of their advertising.