Netflix yesterday handed a $1 million check to the stats geeks of "BellKor's Pragmatic Chaos," the team that won the Netflix Prize by improving on the company's algorithm for predicting which movies subscribers might like to watch next. Chief Product Officer Neil Hunt then announced a sequel, the Netflix Prize 2, but one law professor is already calling the new contest a "multi-million dollar privacy blunder."

The Netflix data set released for use with the first prize was supposedly "anonymized," but security researchers found a way to link the anonymous movie ratings with data from other sites in order to identify individual Netflix users. If your information appeared in the first data set, someone who knew a bit about your movie preferences could figure out the complete set of movies that you rated on Netflix, an information leak that would certainly not please all users.
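
To make the idea concrete, here is a minimal sketch of how such a linkage could work in principle. It is not the researchers' actual method; the records, the movie titles, the one-star tolerance, and the function names `match_score` and `candidates` are all invented for illustration.

```python
# Toy illustration of a linkage attack; all records below are invented.
from typing import Dict, List

Ratings = Dict[str, int]  # movie title -> star rating (1-5)

# Hypothetical "anonymized" release: names replaced by opaque IDs.
anonymized: Dict[str, Ratings] = {
    "user_001": {"Movie A": 5, "Movie B": 2, "Movie C": 4, "Movie D": 1},
    "user_002": {"Movie A": 5, "Movie B": 4, "Movie E": 3},
    "user_003": {"Movie B": 2, "Movie C": 1, "Movie F": 5},
}


def match_score(known: Ratings, record: Ratings, tolerance: int = 1) -> int:
    """Count known ratings that the record reproduces within `tolerance` stars."""
    return sum(
        1
        for movie, stars in known.items()
        if movie in record and abs(record[movie] - stars) <= tolerance
    )


def candidates(known: Ratings, release: Dict[str, Ratings]) -> List[str]:
    """Return the IDs of records consistent with every known rating."""
    return [
        uid for uid, record in release.items()
        if match_score(known, record) == len(known)
    ]


# What an acquaintance, or a public review on another site, might reveal.
known_about_target = {"Movie A": 5, "Movie C": 4}

matches = candidates(known_about_target, anonymized)
if len(matches) == 1:
    # A unique match exposes the target's full rating history.
    print(matches[0], anonymized[matches[0]])
```

In this toy release, two known ratings are enough to single out one record, at which point every other rating in that record is exposed as well.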

For the Netflix Prize 2, the company is releasing even more information, despite the security weaknesses of the first data set. Gender, ZIP codes, ages, genre ratings, and more will be released in a data set with more than 100 million entries. That has law professor Paul Ohm seeing a bright Netflix red.

Ohm has worried for years about privacy breaches, and his newest paper argues that "anonymizing" data doesn't actually keep it anonymous. Given the tremendous advances in "reidentification" technology over the last few years, Ohm argues that releasing so much new data should itself be considered a "privacy breach."

"Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans," he wrote in a blog entry. "True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of 'information entropy': even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach."

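The "simple arithmetic" Ohm mentions can be sketched roughly as follows. The population, ZIP code count, and age range below are round-number assumptions chosen for illustration, not figures from Ohm or Netflix, and the calculation assumes people are spread evenly across every combination, which real populations are not.

```python
import math

# Rough, assumed figures for illustration only (not from the article):
US_POPULATION = 300_000_000  # order-of-magnitude US population
NUM_ZIP_CODES = 40_000       # approximate number of US ZIP codes
NUM_GENDERS = 2              # as typically recorded in such data sets
NUM_AGES = 80                # distinct ages, assumed evenly populated


def bits(n: float) -> float:
    """Information, in bits, needed to pick one item out of n equally likely ones."""
    return math.log2(n)


def average_group_size(population: int, *cardinalities: int) -> float:
    """Average number of people sharing one combination of attribute values."""
    return population / math.prod(cardinalities)


bits_needed = bits(US_POPULATION)  # bits required to single out one person
bits_revealed = bits(NUM_ZIP_CODES) + bits(NUM_GENDERS) + bits(NUM_AGES)
group = average_group_size(US_POPULATION, NUM_ZIP_CODES, NUM_GENDERS, NUM_AGES)

print(f"bits needed to identify one person: {bits_needed:.1f}")
print(f"bits revealed by ZIP + gender + age: {bits_revealed:.1f}")
print(f"average people per combination:      {group:.1f}")
```

Under these assumptions the three attributes reveal most of the bits needed to single someone out, leaving an average group of only a few dozen people. A populous ZIP code pushes that figure up: a hypothetical ZIP code with 50,000 residents would give 50,000 / (2 × 80), or roughly 312 people per combination, in line with Ohm's "at most a few hundred."
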
Because Netflix already knows about the privacy implications of its data releases, it can't simply claim ignorance, and in Ohm's view the company might well be liable for damages.

"The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a 'video tape service provider' (a broadly defined term) from revealing 'personally identifiable information' about its customers. Aggrieved customers can sue providers under the VPPA and courts can order 'not less than $2,500' in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages."

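How quickly the statutory minimum adds up can be sketched as follows. The class sizes are hypothetical, and the sketch assumes exactly one violation per class member; how a court would actually count violations is not something the article addresses.

```python
# Statutory floor quoted above: "not less than $2,500" for each violation.
STATUTORY_MINIMUM_USD = 2_500


def minimum_damages(violations: int) -> int:
    """Lower bound on damages, assuming each violation earns the statutory minimum."""
    return violations * STATUTORY_MINIMUM_USD


# Hypothetical class sizes, assuming one violation per class member.
for class_size in (400, 10_000, 1_000_000):
    print(f"{class_size:>9,} members -> at least ${minimum_damages(class_size):,}")
```

On these assumptions, a class of just 400 members would already reach the $1 million mark.
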
Netflix claims, "As with the first Netflix Prize, all data provided is anonymous and cannot be associated with a specific Netflix member." It's an odd thing to say, as the first data set has already been de-anonymized, but the company is going ahead with the release. This time, the "Netflix Prize 2 focuses on the much harder problem of predicting movie enjoyment by members who don't rate movies often, or at all, by taking advantage of demographic and behavioral data carrying implicit signals about the individuals' taste profiles."

Rather than wait for years before awarding a prize this time around, Netflix has decided to give $500,000 to the team in front after six months, and another $500,000 to whichever team is in the lead after a year and a half.