Not many companies could get away with defending controversial data retention practices by saying that the data is needed to "learn from good guys, fight off bad guys, [and] invent the future." But that's how Google sees itself and its practices—not surprising from a company that would give itself an unofficial motto like "don't be evil."

I had the chance recently to sit down with two of Google's top privacy people: deputy general counsel Nicole Wong and security/privacy engineer Alma Whitten. While the "good guy/bad guy" and "don't be evil" quotes may seem too cute by half to some, Wong and Whitten made a strong pitch for the truth of both slogans. In their view, Google really is fighting the good fight when it comes to your online privacy.

Anonymization and its discontents

Google logs an astonishing amount of data, including the search logs from its flagship product. It keeps this data indefinitely, so searching for a combination of [your wife's name] and [your address] and "rat poison in her cereal" is not a particularly smart idea (though search users do this sort of thing anyway).

But the company does "anonymize" this data eventually. The last octet of the IP address is wiped after nine months, which leaves 254 possible host addresses for the IP in question (on a typical /24 network, .0 and .255 are reserved for the network and broadcast addresses). After 18 months, Google anonymizes the unique cookie data stored in these logs.
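As a rough sketch of what this kind of partial anonymization amounts to (the function below is illustrative, not Google's actual pipeline), zeroing the last octet collapses up to 256 addresses onto a single logged value:

```python
# Illustrative sketch of last-octet IPv4 "anonymization" -- not Google's
# actual code. Zeroing the final octet maps all addresses on a /24
# (254 usable hosts, plus network and broadcast) onto one logged value.

def anonymize_ipv4(ip: str) -> str:
    """Replace the last octet of a dotted-quad IPv4 address with 0."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(octets[:3] + ["0"])

print(anonymize_ipv4("203.0.113.42"))  # -> 203.0.113.0
```

The privacy criticism follows directly from the math: a reader of the anonymized log still knows the user sits somewhere among, at most, 254 hosts.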

This isn't especially ambitious; Europe's data protection supervisors have called for IP anonymization after six months and competing search engines like Bing do just that (and Bing removes the entire IP address, not just the last octet). Yahoo scrubs its data after 90 days.

But Whitten, who was involved in Google's decisions on such issues, said that Google has done the best it can to keep the retention period to a minimum while still extracting maximum value from that data... and that this "value" isn't just to Google but also to users.

"Wonderful things that can be done with an abundance of data," she said. When Google's teams began looking at the data retention issue a few years back, they "started with zero" and tried to see if they could make it work. They could not; Google would lose the ability to do too many useful things.

Search data is mined to "learn from the good guys," in Google's parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google's famously algorithm-driven approach to problems like spell check, machine translation, and improving its main search engine. Without that data, Google Translate wouldn't be able to support less-used languages like Catalan and Welsh.

Data is also mined to watch how the "bad guys" run link farms and other Web irritants so that Google can take countermeasures.

Google eventually settled on anonymizing the IP address after nine months, though even here, "we believe that we have lost the ability to do things," said Whitten.

Web users don't mind being tracked?

Instead of cutting the data retention period further, Google is more focused on 1) transparency and 2) keeping the data locked down safely. The company believes that when users know what Google keeps and why it keeps it—and when they have the chance to opt out—users are often happy to let Google do its thing.

Wong points to behavioral advertising, which Google jumped into last year. This sort of advertising relies on a vast ad network across many sites, and the ads record a visitor's unique cookie. Google can collate this data on the back end and compile a list of interest categories associated with a particular user cookie; since most users never clear their cookies, this works well as a general ad targeting mechanism.
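The back-end collation Wong describes can be sketched roughly like this (the cookie IDs, category names, and data structures are invented for illustration; Google's real system is of course far more elaborate):

```python
from collections import defaultdict

# Toy sketch of cookie-based interest profiling -- all names and data
# here are invented. Each ad impression logs the visitor's unique
# cookie ID plus the category of the page the ad ran on; the back end
# aggregates these into a per-cookie interest profile.

impressions = [
    ("cookie-abc123", "adventure travel"),
    ("cookie-abc123", "adventure travel"),
    ("cookie-abc123", "photography"),
    ("cookie-def456", "cooking"),
]

profiles: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for cookie_id, category in impressions:
    profiles[cookie_id][category] += 1

# The most frequently seen category becomes the top targeting signal.
top = max(profiles["cookie-abc123"].items(), key=lambda kv: kv[1])
print(top)  # -> ('adventure travel', 2)
```

This is also why the mechanism survives only as long as the cookie does: clearing cookies severs the link between the browser and its accumulated profile.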

When Google rolled out the system in March 2009, VP Susan Wojcicki said the things that advertisers always say on such occasions: this is good for consumers.

"We believe there is real value to seeing ads about the things that interest you," she wrote. "If, for example, you love adventure travel and therefore visit adventure travel sites, Google could show you more ads for activities like hiking trips to Patagonia or African safaris. While interest-based advertising can infer your interest in adventure travel from the websites you visit, you can also choose your favorite categories, or tell us which categories you don't want to see ads for."

Choosing your favorite categories—and opting out of behavioral ads altogether—is made possible by Google's Ads Preferences manager. The site gets limited use; despite the hundreds of millions who use Google services or are served Google ads, only "tens of thousands" visit the Ads Preferences site each week, I'm told. One might assume that these would be the most motivated "opt-outers," those who actually understand what behavioral advertising is, know how it works, and hate it with a passion.

The Google folks insist that this isn't actually what happens when people visit the Ad Preferences page. Compared to the number of people who choose to opt out entirely, four times more people merely edit their categories, while ten times more people do nothing at all.

This could mean several things (are most users just confused about the options and simply do nothing?), but Google takes it as vindication of its willingness to be transparent about what it does, and its willingness to put users in control. Certainly, there are other companies that could take a page from the Google playbook. The Ads Preferences manager makes it simple to opt out with a single click, but this only applies to one browser; Google has also built a browser plugin that can remember the setting across browsers and after cookie purges.

Given the sheer amount of hate directed at Google-owned DoubleClick that erupted in our recent comment thread on ad blocking, though, it looks like Google still has some way to go before it convinces the geekerati that its opt-out behavioral targeting practices truly aren't "evil."

As Google services rack up increasing amounts of data on users, the company's strategy for reassuring users is based on such transparency, user control, and data safety. Whitten stresses with pride that Google's data doesn't leak, and Wong notes how aggressively the company pushed back against a broad Department of Justice data request in 2005.

"We're not holding onto this frivolously," Whitten said. "It's fundamental to bring value to our users."