Last night’s Data Skeptics Meetup talk by Suresh Naidu was great, as I suspected it would be. I’m not going to be able to cover everything he talked about (a discussion is forming here as well) but I’ll touch on a few things related to my chosen topic for the day, namely who stays off the data radar.

In his talk Suresh discussed the history of governments tracking people with data, which more or less until recently was the history of the census. The issue of trust or lack thereof that people have in being classified and tracked has been central since the get-go, and with it the understanding by the data collectors that people respond differently to data collection when they anticipate it being used against them.

Among other examples he mentioned the efforts of the U.S. Census Bureau to stay independent (specifically, away from any kind of tax decisions) in order to be trusted but then turning around during war time and using census tracks to put Japanese into internment camps.

It made me wonder, who distrusts data collection so much that they manage to stay off the data radar?

Suresh gave quite a few examples of people who did this out of fear of persecution or what have you, and because, at least in the example of the Domesday Book, once land ownership was written down it was somehow “more official and objective” than anything else, which of course resulted in some people getting screwed out of their land.

It’s not just a historical problem, of course: it’s still true that certain populations, especially illegal immigrant populations, are afraid of how the census will be used and go undercounted. Who can say when the census might start being used to deport illegal immigrants?

As a kind of anti example, he mentioned that the census was essentially canceled in 1920 because the South knew that so many ex-slaves were moving north that their representation in government was growing weak. I say anti-example because in this case it wasn’t out of distrust, to avoid detection, but it was a savvy and political move, to remain looking large.

What about the modern version of government tracking? In this case, of course, it’s not just census data, but anything else the NSA happens to collect about us. I’m no expert (tell me if you know data on this) but I will hazard a guess on who avoids being tracked:

Old people who don’t have computers and never have, Members of hacking group Anonymous who know how it works and how to bypass the system, and People who have worked or are now working at the NSA.

Of course there are a few other rare people that just happen to care enough about privacy to educate themselves on how to avoid being tracked. But it’s hard to do, obviously.

Let me soften the requirements a bit – instead of staying off the radar completely, who makes it really hard to find them?

If you’re talking about individuals, I’d start with this answer: politicians. In my work with Peter Darche and Lee Drutman from the Sunlight Foundation (blog post coming soon!) trying to follow money in politics, it’s amazed me time and time again how difficult it’s been to put together the political events for a given politician – events that are individually publicly recorded but are seemingly intentionally siloed so it will be extremely difficult to put together a narrative. Thanks to Peter’s recent efforts, and the Sunlight Foundations long-term efforts, we are getting to the point where we can do this, but it’s been a data munging problem from hell.

If you’re generalizing to entities and corporations, then the “making data collection hard” award should probably go to the corporations with hundreds of subsidiaries all over the world which now don’t even need to be reported on tax forms.

Funny how the very people who know the most about how data can be used are paranoid about being tracked.