The idea of “Big Data” is in the air. At the South by Southwest Interactive conference last month, it was probably the hot topic, surfacing in or dominating numerous panels, including one on which I spoke: “Big Data: Privacy Threat or Business Model?”

Let’s be clear on what we’re talking about. The term refers to something more specific than the general fact that companies and government agencies are collecting lots of personal information about people. What it refers to is the fact that once you store up huge amounts of information, you can mine those databases to discover subtle patterns, correlations, or relationships that our brains can’t perceive on their own because the scales involved are beyond our ability to process (either the time scales at work, or the sheer number of data points).

Such data mining has been called “the macroscope”—like a telescope or microscope, making things visible to us that have never been visible before.

In many ways Big Data is just a new buzzword for data mining, which we and others have been grappling with since not long after 9/11. The New York Times, for example, wrote about it using the term “data mining” in this 2007 piece.

A more recent (and much-discussed) article by Charles Duhigg in the New York Times offers a good example to keep in mind during discussions of the subject. The piece described how Target identifies customers who are pregnant (sometimes before their own family members know) by tracking customers’ purchases and identifying patterns in their behavior. It then uses that insight to sell them baby-related goods.
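To make the mechanism concrete, here is a minimal sketch of the kind of purchase-pattern scoring Duhigg describes: products that correlate with the condition being predicted each contribute points, and a customer whose total crosses a threshold gets flagged. The product names and point values below are invented for illustration; they are not Target's actual model.

```python
# Hypothetical purchase-pattern scoring. Products and weights are
# invented for illustration, not drawn from any real retailer's model.

WEIGHTS = {
    "unscented lotion": 4,
    "mineral supplements": 3,
    "cotton balls": 2,
    "lawn mower": 0,  # uncorrelated purchases contribute nothing
}

def prediction_score(purchases):
    """Sum the points for each purchase; a higher total is a stronger signal."""
    return sum(WEIGHTS.get(item, 0) for item in purchases)

# A customer whose score crosses some threshold gets targeted marketing:
customer = ["unscented lotion", "mineral supplements", "cotton balls"]
print(prediction_score(customer))  # 9
```

The point of the sketch is that no single purchase is revealing; it is the aggregation across many purchases that lets the inference be drawn.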

And that may be only the beginning. What else might companies be identifying? Customers who are showing signs of Parkinson’s, or diabetes, or depression? I suppose that could actually be helpful if they notified you of their findings. But there’s reason to think they won’t: Target found that revealing what it knows tends to freak out its customers, so according to Duhigg the company has taken to hiding its ads aimed at pregnant women among “decoy” ads for things like lawnmowers, so that the targets of the ads will think they’re just receiving the same flyers as everyone else.

Okay, so that’s all pretty spooky. But if we try to put our fingers on the precise privacy problems with Big Data, what are they?

They include:

It incentivizes more collection of data and longer retention of it. If any data set might turn out to reveal some obscure but valuable correlation, you might as well collect it and hold on to it. In the long run, the more useful Big Data proves to be, the stronger this incentive will become. But in the short run it almost doesn’t matter; the current buzz over the idea is enough to do the trick.

When you combine someone’s personal information with vast external data sets, you can infer new facts about that person (such as the fact that they’re pregnant, or are showing early signs of Parkinson’s disease, or are unconsciously drawn toward products that are colored red or purple). And when it comes to such facts, a person (a) might not want the data owner to know, (b) might not want anyone to know, and (c) might not even know themselves. The fact is, humans like to control what other people do and do not know about them; that’s the core of what privacy is, and data mining threatens to violate that principle.

Many (perhaps most) people are not aware of how much information is being collected (for example, that stores are tracking their purchases over time), let alone how it is being used (scrutinized for insights into their lives). The fact that Target goes to considerable trouble to hide its knowledge from its customers tells you all you need to know on that front.

Big Data can further tilt the playing field toward big institutions and away from individuals. In economic terms, it accentuates the information asymmetries between big companies and other economic actors, allowing people to be manipulated. If a store can learn just how badly I want to buy something, just how much I can afford to pay for it, just how knowledgeable I am about the marketplace, or the best way to scare me into buying it, it can extract the maximum profit from me.

It holds the potential to accentuate power differentials among individuals in society by amplifying existing advantages and disadvantages. Those who are savvy and well educated may get improved treatment from companies and government, while those who are poor, underprivileged, and perhaps already have some strikes against them in life (such as a criminal record) will be easily identified and treated worse. In that way, data mining may increase social stratification.

Data mining can be used for so-called “risk analysis” in ways that treat people unfairly and often capriciously—for example, by insurance companies or banks to approve or deny applications. Credit card companies sometimes lower a customer’s credit limit based on the repayment history of the other customers of stores where a person shops. Such “behavioral scoring” is a form of economic guilt-by-association based on making statistical inferences about a person that go far beyond anything that person can control or be aware of.
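A minimal sketch makes the guilt-by-association mechanism explicit: the customer's score is driven not by their own repayment record but by the average record of other customers at the stores they frequent. The store names and figures below are invented for illustration.

```python
# Hedged sketch of "behavioral scoring": a customer is scored by the
# average repayment rate of OTHER customers at the stores they visit.
# All store names and figures are invented.

STORE_REPAYMENT_RATE = {  # fraction of each store's customers who repay on time
    "discount_mart": 0.62,
    "luxury_goods": 0.95,
    "pawn_shop": 0.55,
}

def behavioral_score(stores_visited):
    """Average the repayment rates of the stores a customer shops at."""
    rates = [STORE_REPAYMENT_RATE[s] for s in stores_visited]
    return sum(rates) / len(rates)

# The same person earns different scores based solely on where they shop:
print(round(behavioral_score(["luxury_goods"]), 3))
print(round(behavioral_score(["discount_mart", "pawn_shop"]), 3))
```

Note that nothing in the score reflects anything the individual did or could even know about, which is exactly the unfairness the paragraph above describes.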

Its use by law enforcement raises even sharper issues—and when our national security agencies start using it to try to spot terrorists, those stakes can get even more serious. We know too little about how our security agencies are using Big Data, but such approaches have been discussed since the days of the Total Information Awareness program and before—and there is strong evidence that it’s being used by the NSA to sift through the vast volumes of communications that agency collects. The threat here is that people will be tagged and suffer adverse consequences without due process, the ability to fight back, or even knowledge that they have been discriminated against. The threat of bad effects is magnified by the fact that data mining is so ineffective at spotting true terrorists.
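That last point deserves a quick back-of-the-envelope calculation. As an illustration (the numbers below are invented round figures, not real statistics), consider what happens when even a very accurate classifier searches an entire population for an extremely rare target:

```python
# Illustrative base-rate arithmetic: when the target is rare enough,
# even an accurate screen produces overwhelmingly false alarms.
# All numbers are assumptions chosen as round figures.

population = 300_000_000      # people whose data is screened
true_targets = 1_000          # actual wrongdoers among them
true_positive_rate = 0.99     # fraction of real targets the system flags
false_positive_rate = 0.01    # fraction of innocents it wrongly flags

flagged_guilty = true_targets * true_positive_rate
flagged_innocent = (population - true_targets) * false_positive_rate

# Of everyone flagged, what fraction is actually a real target?
precision = flagged_guilty / (flagged_guilty + flagged_innocent)
print(f"innocents flagged: {flagged_innocent:,.0f}")
print(f"chance a flagged person is a real target: {precision:.2%}")
```

Under these assumptions, a “99% accurate” system flags roughly three million innocent people, and any given flagged person is almost certainly not a real target.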

Over time such consequences will lead to chilling effects, as people become more reluctant to engage in any behaviors that will put them under the macroscope (more about that in a future post).

Update: subsequent post on the potential chilling effects of Big Data.