Data and Goliath's Big Idea

Data and Goliath is a book about surveillance, both government and corporate. It’s an exploration in three parts: what’s happening, why it matters, and what to do about it. This is a big and important issue, and one that I’ve been working on for decades now. We’ve been on a headlong path of more and more surveillance, fueled by fear­–of terrorism mostly­–on the government side, and convenience on the corporate side. My goal was to step back and say “wait a minute; does any of this make sense?” I’m proud of the book, and hope it will contribute to the debate.

But there’s a big idea here too, and that’s the balance between group interest and self-interest. Data about us is individually private, and at the same time valuable to all us collectively. How do we decide between the two? If President Obama tells us that we have to sacrifice the privacy of our data to keep our society safe from terrorism, how do we decide if that’s a good trade-off? If Google and Facebook offer us free services in exchange for allowing them to build intimate dossiers on us, how do we know whether to take the deal?

There are a lot of these sorts of deals on offer. Waze gives us real-time traffic information, but does it by collecting the location data of everyone using the service. The medical community wants our detailed health data to perform all sorts of health studies and to get early warning of pandemics. The government wants to know all about you to better deliver social services. Google wants to know everything about you for marketing purposes, but will “pay” you with free search, free e-mail, and the like.

Here’s another one I describe in the book: “Social media researcher Reynol Junco analyzes the study habits of his students. Many textbooks are online, and the textbook websites collect an enormous amount of data about how­–and how often­–students interact with the course material. Junco augments that information with surveillance of his students’ other computer activities. This is incredibly invasive research, but its duration is limited and he is gaining new understanding about how both good and bad students study­–and has developed interventions aimed at improving how students learn. Did the group benefit of this study outweigh the individual privacy interest of the subjects who took part in it?”

Again and again, it’s the same trade-off: individual value versus group value.

I believe this is the fundamental issue of the information age, and solving it means careful thinking about the specific issues and a moral analysis of how they affect our core values.

You can see that in some of the debate today. I know hardened privacy advocates who think it should be a crime for people to withhold their medical data from the pool of information. I know people who are fine with pretty much any corporate surveillance but want to prohibit all government surveillance, and others who advocate the exact opposite.

When possible, we need to figure out how to get the best of both: how to design systems that make use of our data collectively to benefit society as a whole, while at the same time protecting people individually.

The world isn’t waiting; decisions about surveillance are being made for us­–often in secret. If we don’t figure this out for ourselves, others will decide what they want to do with us and our data. And we don’t want that. I say: “We don’t want the FBI and NSA to secretly decide what levels of government surveillance are the default on our cell phones; we want Congress to decide matters like these in an open and public debate. We don’t want the governments of China and Russia to decide what censorship capabilities are built into the Internet; we want an international standards body to make those decisions. We don’t want Facebook to decide the extent of privacy we enjoy amongst our friends; we want to decide for ourselves.”

In my last chapter, I write: “Data is the pollution problem of the information age, and protecting privacy is the environmental challenge. Almost all computers produce personal information. It stays around, festering. How we deal with it­–how we contain it and how we dispose of it­–is central to the health of our information economy. Just as we look back today at the early decades of the industrial age and wonder how our ancestors could have ignored pollution in their rush to build an industrial world, our grandchildren will look back at us during these early decades of the information age and judge us on how we addressed the challenge of data collection and misuse.”

That’s it; that’s our big challenge. Some of our data is best shared with others. Some of it can be ‘processed’­–anonymized, maybe­–before reuse. Some of it needs to be disposed of properly, either immediately or after a time. And some of it should be saved forever. Knowing what data goes where is a balancing act between group and self-interest, a trade-off that will continually change as technology changes, and one that we will be debating for decades to come.

This essay previously appeared on John Scalzi’s blog Whatever.

EDITED TO ADD (3/7): Hacker News thread.

Posted on March 6, 2015 at 2:10 PM • 39 Comments