Personal data harvested by marketers is growing so vast and far-reaching that it threatens to unleash a new wave of digital discrimination, one that ordinary people won't even be able to see happening, Microsoft principal researcher Kate Crawford is warning.

Kate Crawford speaking at MIT Media Lab. (Photo: Siemond Chan)

Combining the troves of information collected by retailers, mobile carriers, Internet companies and others into massive databases creates so-called big data sets. Computers then trawl the data for patterns that can be used to make predictions about consumer habits.

“Some people think that big data is really quite fantastic because you're working at a mass level and therefore you can't actually conduct group-based discrimination,” Crawford said, speaking at the EmTech conference at MIT this week. “It's actually quite the opposite. Big data is not color blind, it's not gender blind and, in fact, marketers are using big data to have ever-more precise categories about you.”

A recent study at Cambridge University that analyzed almost 60,000 people's Facebook "likes" was able to predict their gender, race, sexual orientation and even a tendency to drink excessively with a high degree of accuracy. For example, the model told gay men from straight men correctly 88% of the time and African Americans from white Americans 95% of the time. Government agencies, employers or landlords could easily obtain such data, Crawford warns.

A lender who didn't want borrowers of a certain race, for example, could show online offers only to people whose social network activity fit certain parameters. Banks must report detailed statistics about their actual lending activity to regulators, but the parameters used to target web advertising can appear free of discrimination. By never putting offers in front of unwanted groups, and thus never formally rejecting them, those who engage in online discrimination could sidestep the fair lending and anti-redlining laws that apply in the physical world.

Most concern about data collection has focused on the government, particularly after the revelations from former National Security Agency contractor Edward Snowden. Crawford welcomed the increased skepticism following the Snowden leaks but warns that commercial misuse of data carries considerable potential for harm as well.

“It's not that big data is effectively discriminating -- it is, we know that it is,” says Crawford. “It's that you will never actually know what those discriminations are.”

Another problem can arise when collected data isn’t representative of the entire population. For example, well-off people are more likely to carry smartphones than the poor. Two years ago, the City of Boston released an app called Street Bump that automatically sends reports about potholes using data from smartphone sensors. But the city had to be mindful that reports were more likely to come from areas with higher phone ownership rates.

Big data predictions and pigeonholing can also be harmful when they are wrong. A decade ago, some TiVo users spent weeks trying to convince their machines to stop recording shows aimed at demographic groups they didn't belong to. "If TiVo Thinks You Are Gay, Here's How to Set It Straight," read one Wall Street Journal headline from 2002. Today, mistaken predictions could scare off employers, college admissions officers or others who screen candidates using big data. "If I predict something about you and I'm right, that can be just as dangerous as if I predict something about you and I am wrong," Crawford says.

Crawford also wants to temper the enthusiasm for using real-time Twitter activity to guide rescue efforts during natural disasters. A review of activity on the social network during Hurricane Sandy last year, for example, found that the peaks of activity occurred not in the places with the most damage or need for help, like the outskirts of Queens and Staten Island, but in areas where Twitter use was most prevalent, like Manhattan.
