For the past two years, researchers at Northwestern University have been analyzing the habits of tens of thousands of scientists, using Dropbox. Looking at data about academics' folder-sharing habits, they found that the most successful scientists share certain collaboration behaviors. And on Friday, they published their results in an article for the Harvard Business Review.

The study quickly attracted the notice of academics, but not for the reason Dropbox and the researchers had hoped. One sentence in particular caught readers' attention: “Dropbox gave us access to project-folder-related data, which we aggregated and anonymized, for all the scientists using its platform over the period from May 2015 to May 2017—a group that represented 1,000 universities.” The article was written by Northwestern University Institute on Complex Systems professors Adam Pah and Brian Uzzi and Dropbox Manager of Enterprise Insights Rebecca Hinds, and that wording suggested Dropbox had handed over personally identifiable information on hundreds of thousands of customers.

By Tuesday, Harvard Business Review had corrected that part of the article to say the data was anonymized and aggregated prior to being given to the researchers. “Before providing any Dropbox users’ data to the researchers, Dropbox permanently anonymized the data by rendering any identifying user information unreadable, including individual emails and shared folder IDs,” a Dropbox spokesperson told WIRED. But while Dropbox's more than half a billion users can rest easy that their identifiable data isn't readily shared with researchers, the only consent Dropbox obtained from customers involved in the study was their agreement to its privacy policy and terms of service, according to representatives for Dropbox.

"Before sharing the activity data with NICO, we randomized or hashed the dataset and grouped it into wide ranges to further ensure that no identifying information could be derived," Dropbox elaborated. "In addition, our research partners at NICO are bound by strict confidentiality obligations." Northwestern's Pah supported that statement, telling WIRED that he and his team were never able to see any personal information or the content of any Dropbox folders or files. His team sent Dropbox citation information from the Web of Science—an index that ranks researchers according to how often their work is cited—which Dropbox then paired with folder data, anonymized and aggregated, and sent back for analysis.

Even if personal names are removed, folder titles and file structures can potentially be used to identify individuals, according to University of Colorado Boulder professor Casey Fiesler, who teaches in the Department of Information Science. In a blog post Dropbox's Hinds published on Friday, she appears to directly address that concern, writing "information like university ranks and number of citations were grouped into ranges." Representatives for Dropbox say the techniques they used to anonymize and aggregate the data would make reverse identification impossible, though they couldn't share details about how that process worked.
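
Dropbox has not described its actual process, so the following is only an illustrative sketch of the two general techniques its statements mention: hashing identifiers and grouping values into wide ranges. Everything in it is an assumption made for the example, including the function names, the record fields, the bin edges, and the choice of a keyed SHA-256 hash. It is not Dropbox's method.

```python
import hashlib
import hmac
import secrets

# Hypothetical bin edges. The article does not say which ranges were used.
CITATION_BINS = [(0, 99), (100, 999), (1000, None)]


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC).

    Without the key, the digest cannot be recomputed from a guessed
    email address or folder ID.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def to_range(value: int, bins=CITATION_BINS) -> str:
    """Map an exact count to a coarse range label such as '100-999'."""
    for low, high in bins:
        if high is None:
            if value >= low:
                return f"{low}+"
        elif low <= value <= high:
            return f"{low}-{high}"
    raise ValueError(f"value {value} falls outside all bins")


def anonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with hashed IDs and a binned count."""
    return {
        "user": pseudonymize(record["email"], key),
        "folder": pseudonymize(record["folder_id"], key),
        "citations": to_range(record["citations"]),
    }


if __name__ == "__main__":
    # A random key generated once and then discarded makes the mapping
    # impossible to reproduce later.
    key = secrets.token_bytes(32)
    sample = {"email": "researcher@example.edu", "folder_id": "f-123", "citations": 250}
    print(anonymize_record(sample, key))
```

Even a scheme like this leaves open the concern Fiesler raises: hashing and binning the listed fields does nothing about identifying details that survive in other fields, such as folder titles or file structures.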

But it still appears this research was conducted without the express consent of the thousands of customers whose information Dropbox and the researchers accessed (the HBR article originally suggested that 400,000 users' data was analyzed, while Dropbox says that the study dealt with data from 16,000 customers). Late Tuesday HBR added a second editors' note indicating that the researchers started with information on 400,000 "unique users" but pared the data set down to 16,000 after incorporating data from Web of Science. HBR editors also updated the article to indicate that it wasn't 1,000 universities that were included, but rather 1,000 separate departments.

Informed consent, one of the cornerstones of academic research, is one of the things that got Facebook in so much trouble back in 2014 when it published results from its controversial “Emotional Contagion Study.” That study was never approved by an institutional review board (IRB), which is tasked with maintaining ethical standards in research; since the data had already been collected by Facebook and was not identifiable, the university where it was conducted reportedly considered it IRB-exempt. Dropbox representatives said that the same was true for this study, because the data was delivered to the researchers deidentified.