Google wants to know what you search for, and plenty of people have wondered why. The company's global privacy counsel, Peter Fleischer, recently posted an answer to this question on Google's official blog, and his reasons are quite simple: logging leads to better search, less fraud, and government compliance. Nothing evil about that, is there?

Two months ago, Google announced a plan to anonymize its logs, but only after retaining the data for 18 to 24 months. After that time, user searches will still be stored, but it should be impossible to link search queries up with individual users. Of course, this is what AOL researchers thought when they released their own search logs, but queries often turn out to be highly specific things... the sort of things that can eventually be used to identify individuals.

Commentators generally praised Google for at least taking steps to safeguard the privacy of information, but others wondered why Google truly needed to retain this information at all. According to Fleischer, log data is used to improve core Google search services, including the spell check component. "Google's spell checking software automatically looks at your query and checks to see if you are using the most common version of a word's spelling," Fleischer says. "If it calculates that you're likely to generate more relevant search results with an alternative spelling, it will ask 'Did you mean: (more common spelling)?' We can offer this service by looking at spelling corrections that people do or do not click on. Similarly, with logs, we can improve our search results: if we know that people are clicking on the #1 result we're doing something right, and if they're hitting next page or reformulating their query, we're doing something wrong." Sounds good—though it's not clear why this couldn't be done just as well with anonymous data.

The company also uses the information to deal with fraud and abuse. "Immediate deletion of IP addresses from our logs would make our systems more vulnerable to security attacks, putting the personal data of our users at greater risk," says Fleischer. "Historical logs information can also be a useful tool to help us detect and prevent phishing, scripting attacks, and spam, including query click spam and ads click spam."

But when it comes to the issue of government compliance, the argument gets less straightforward. Fleischer claims that retaining personal data for two years is necessary because of European and US data retention laws, even though those laws do not yet exist. The EU's Data Retention Directive was passed in late 2005 but has yet to be implemented by the various member states (which have until 2009). The directive requires each country in the EU to adopt a retention requirement of between six and 24 months.

"Since these laws do not yet exist, and are only now being proposed and debated," Fleischer says, "it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. It's therefore too early to state whether such laws would apply to particular Google services, and if so, which ones." Even though the laws are not yet in force in Europe and won't apply retroactively, Google still uses the law as an argument to retain data now, and to do so for the longest possible period the law provides for.

In the US, no general data retention laws have been passed, though the government has mooted numerous proposals for a two-year retention requirement to combat child pornography and other ills. Fleischer suggests that Google's behavior is proper because the government has simply "called for 24-month data retention laws."

In the past, the company has stood up for user privacy against Department of Justice subpoena requests, and it has adopted a comprehensive anonymity policy. But the company does itself no favors by engaging in rhetorical sleight of hand and claiming that laws which don't yet exist ought to guide its current behavior. Google should just admit that the reasons are business-related and be done with it.