Scholars have mined data on Facebook behavior for years

In July 2014, a team of four Swedish and Polish researchers began using an automated program to better understand what people posted on Facebook.

The program, known as a “scraper,” let them log every comment and interaction from 160 public Facebook pages for nearly two years. By May 2016, they had amassed enough information to track how 368 million Facebook members behaved on the social network. It is one of the largest known sets of user data ever assembled from Facebook.

“We’re concerned about how easy it was to collect this,” said Fredrik Erlandsson, one of the researchers and a lecturer at the Blekinge Institute of Technology in Sweden. In December, he and his colleagues published a research paper in the journal Entropy detailing how their methods of trawling social media sites could be replicated.

For more than a decade, professors, doctoral candidates and researchers from academic institutions around the world have harvested information from Facebook using techniques similar to those of Erlandsson and his team. They have compiled hundreds of Facebook data sets that captured the behavior of a few thousand to hundreds of millions of individuals, according to interviews with more than a dozen scholars.

Their practices came to light in March when the New York Times and the Observer of London reported that Aleksandr Kogan, a University of Cambridge psychology professor, had obtained the data of up to 87 million Facebook users through a quiz app. Kogan sold the information to Cambridge Analytica, a political consulting firm with ties to the Trump campaign, so it could build psychographic profiles of American voters. Last week, Cambridge Analytica said it would cease operations after the uproar over its use of personal information.

But while what happened with Kogan’s Facebook data set is now known, the fate of other information hoards is murkier. In many cases, the data were used for research or scholarly articles. The information was then sometimes left unsecured and stored on open servers that offered access to anyone. Some academics said the data could have been easily copied and sold to marketers or political consulting firms.

The potential result is more leakage of Facebook users’ information through academic circles, said Rasmus Kleis Nielsen, a professor of political communication at the University of Oxford who has studied data collection from Facebook.

“The academic world is highly decentralized, and each individual, each institution, has a different way of securing their data,” Nielsen said. “Even if almost everyone in the academic community is careful and protects the data, you still can end up in a situation where someone is careless or acts in bad faith and sells access. It’s hard to imagine how Facebook stops that from happening.”

The Times reviewed half a dozen Facebook data sets compiled by academics from 2006 to 2017. One, gathered from 2015 to 2017 by researchers in Denmark and New Zealand, examined 1.3 million people in Denmark, about a quarter of the country’s population, to determine how liking one political page on Facebook could predict how someone would vote in the future. Another set, compiled in 2013 by a group of Norwegian academics, focused on the civic engagement of 21 million Facebook members on four continents.

The Danish research team did not respond to a request for comment. Petter Bae Brandtzaeg, one of the Norwegian researchers, said he understood concerns about data gathering.

“As a researcher you get immediate access to people’s behavior, attitudes, feelings and relationships, which are of course tempting for all,” he wrote in an email. He said many researchers lacked the technical expertise to properly secure data.

The data were typically amassed through scraper programs that crawled Facebook to document what was posted, or through quiz apps that requested access to people’s profiles. The results included users’ locations, interests, political affiliations, Facebook interactions and even music preferences.

In most cases, researchers assigned numbers to people whose Facebook information they had obtained to maintain anonymity. But the more data there are, the easier it is to overlay one information set with another to identify someone. One 2015 paper published in the journal Science looked at credit card spending data and found that data scientists could pinpoint 90 percent of the shoppers by name with just four random pieces of information from sites like Facebook, Instagram and Twitter.

Once people are identified and their interests and interactions known, they can be targeted with advertising and mobilized for political campaigns or other causes.

For years, Facebook had no specific policies about academics’ access to user data, though it had guidelines on working with third parties. While the Menlo Park company has a rule that forbids the use of scrapers, it has not enforced that policy against scholars. And at times, it has assisted researchers with studies.

In 2014, though, Facebook began limiting third-party apps, like quizzes, from obtaining users’ information.

Since Kogan’s actions were revealed, Facebook has made further changes. The company has given people more control over their privacy settings. It has said it will audit all apps that collected large amounts of Facebook data, and it temporarily stopped allowing new apps to gather information.

Last month, Facebook also narrowed the number of academics it would work with, saying it would collaborate with those who wanted to research the effect of social media on elections through an “independent election research commission.” Only scholars with election-related projects can apply.

“We are taking a hard look at the information apps can use when you connect them to Facebook, as well as other data practices,” Facebook spokeswoman Susan Glick said. “These other data practices include academic research.”

One of the earliest known academic Facebook data sets was collected in 2006 by Harvard professors. It covered 1,700 people who agreed to have their Facebook information anonymously analyzed. Other academics later easily traced the data back to Harvard freshmen.

In Britain, researchers were doing similar work. In 2007, Michal Kosinski, then deputy director at the Psychometrics Center at the University of Cambridge, worked with colleague David Stillwell to create My Personality, a quiz app that offered to assess people’s personalities in exchange for data about them. It was one of the first times a quiz app had been used for obtaining Facebook members’ information.

My Personality has now collected details on more than 6 million Facebook users, according to the academics who have gathered the data. Many researchers have since copied the quiz app method, including Kogan.

Sheera Frenkel is a New York Times writer.