But when the parameters of his research changed, Tomasi admits he didn’t inform the IRB. For minor changes, that’s allowed. But Tomasi got permission to record indoors, not outdoors. And more significantly, he promised to allow access to the database only upon request. Instead, he opened it to anyone to download, he admitted to the Chronicle. “IRB is not to be blamed, as I failed to consult them at critical junctures. I take full responsibility for my mistakes, and I apologize to all the people who were recorded and to Duke for their consequences,” his statement reads.

Duke ultimately decided to delete the data set related to the research. Stanford did the same thing with a similarly derived data set its researchers created from patrons filmed at a San Francisco café. At UCCS, where researchers recorded students to test identification software, the lead researcher says the team never collected individually identifying information. Researchers for the Stanford and UCCS projects didn’t respond to requests for comment. In separate statements, each university reiterated that ethics boards approved all research, and underscored its commitment to student privacy.

But the problem is that university ethics boards are inherently limited in their scope. They oversee certain narrow aspects of how research is conducted, but not always where it ends up. And in the information age, most academic research goes online, and what’s online lives forever. Other researchers, unbound by IRB standards, can download a data set and use it however they wish, introducing all manner of consequences for people with no way of being informed or offering consent.

Those consequences can be far beyond what researchers imagine. Adam Harvey, a countersurveillance expert in Germany, found more than 100 machine-learning projects across the globe that cited Duke’s data set. He created a map that tracked the spread of the data set around the world like a flight tracker, with long blue lines extending from Duke University in every direction. Universities, start-ups, and institutions worldwide used the data set, including SenseTime and Megvii, Chinese surveillance firms linked to the state repression of Muslim minorities in China.

Every time a data set is accessed for a new project, the intention, scope, and potential for harm change. The portability and pliability of data meet the speed of the internet, massively expanding the possibilities of any one research project and scaling the risk far beyond what any one university can be held accountable for. For better or worse, ethics boards can regulate only the intentions of the original researchers.

The federal government’s Office for Human Research Protections explicitly asks board members not to consider “possible long-range effects of applying knowledge gained in the research.” Instead, they’re asked to focus only on the subjects directly involved in a study. And if those subjects are largely anonymous people briefly idling in a public space, there’s no reason to believe they’ve been explicitly harmed.