Seeing Like a Geek

Yes, as through this world I've wandered I've seen many men, I guess; Some will rob you with a six gun, And some with a GIS.

In the state of Tamil Nadu, near the town of Marakkanam, right next to a reserved forest, lies a contested plot of land. Records say these three acres belong to a member of the Mudaliar caste, but lower-caste Dalits living nearby claim the plot should be part of the reserved forest, which is not privately owned. The Dalits claim that the Mudaliars have pulled a fast one, using their influence in the local bureaucracy to fix the land records, and that older records will bear out the Dalit claim. Complicating the case, officials say that boundaries between land parcels in the area are often difficult to ascertain.

According to Bhuvaneswari Raman, the Dalit claim was sideswiped by a Tamil Nadu government program to standardize, digitize and centralize land records. The program, promoted by the World Bank as a pro-poor, pro-transparency initiative, was undertaken to capitalize on the boom in nearby Chennai. The absence of clear land titles made extensive land purchases time-consuming and expensive, and this was a bottleneck to large-scale development projects. As part of the program, the Tamil Nadu government declared that the digitized records would be the only evidence admissible in court for land claims, so the older records and less precise data that formed the basis of the Dalit claims lost any legal footing they had, and their claim was sunk.

A new generation of land developers grew up alongside the digitized records: firms with the skills and information to make efficient use of this new resource. These developers lobbied effectively for records and spatial data to be made open, and then used their advantages to displace smaller firms who, as Raman writes, "relied on their knowledge of local histories and relationships to assemble land for development". The effects went far beyond the three-acre plot near Marakkanam: newly visible master plans came to be used as "the reference point to label legal and illegal spaces and as a justification for evicting the poor from their economic and residential spaces." The "pro-poor" initiative turned out to be anything but. Tamil Nadu was not alone in running an open data project that made life harder for the poor. Neighbouring Karnataka’s "Bhoomi" (or ‘land’) e-governance program has had similar effects: a 2007 publication concluded that "the digitization of land records led to increased corruption, much more bribes and substantially increased time taken for land transactions. At another level, it facilitated very large players in the land markets to capture vast quantities of land at a time when Bangalore experiences a boom in the land market."

The open data doppelgänger

Making data "open" has two effects:

1. By cutting the price of the data to zero, for everyone and for any purpose, it undermines the power of those who previously controlled access to it.

2. Just as cheap fish increases the demand for chips, so free data increases the demand for, and raises the value of, complementary resources and skills.

Effect 1 has many benefits, both real and potential. While all point out that open data is just one part of a complete breakfast, the essays by Victoria Stodden, Tom Lee and Matthew Yglesias in this seminar highlight the possibilities for improved accountability in government, those by Clay Shirky and Steven Berlin Johnson focus on the possibilities for improved services, and Beth Noveck emphasizes the possibilities for enhanced participation.

But there is an inevitable flip side to open data, which is the rise of new markets in its complements (effect 2). The point of this post is to draw attention to this open government data doppelgänger (the shadow of commercial interests that follows civic hackers wherever they go; the new markets that spring up inevitably from the ruins of the old) and to its dangers. I am suspicious of this doppelgänger, more so than most open data proponents. They tend to use the language of entrepreneurship and innovation when discussing companies that work with open data, they contrast the new firms with the aging business models they seek to replace, and they often present commercial use as a complement to civic use.

The problem is, it’s not just that new markets and new businesses replace old ones. The markets undermined by open data are generally traditional in structure, characterised by decreasing returns, with market power that is distributed and limited in scope. Before digitization, the property developers of Tamil Nadu had particular knowledge about land ownership patterns in a specific area and each used that knowledge to build their own little empire. In contrast, constant fixed costs and zero marginal costs are "the baseline case" for information goods, so markets in open data environments are likely to consist of a few big firms, each with significant market power. It’s no surprise that the new generation of property developers in Tamil Nadu were larger than those they displaced.
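The cost argument can be made concrete with a little arithmetic. The sketch below is a minimal illustration in Python, not drawn from any of the studies cited here: the function names and every number in it are assumptions chosen for illustration. It compares the average cost per unit of an information good (a one-off fixed cost and zero marginal cost) with that of a traditional producer whose costs rise faster than output, which is one simple way to model decreasing returns.

```python
def average_cost_information_good(fixed_cost, quantity):
    """Average cost when the only cost is a one-off fixed cost (zero marginal cost)."""
    return fixed_cost / quantity


def average_cost_traditional(fixed_cost, quantity, unit_cost, congestion):
    """Average cost when each extra unit costs more than the last.

    Total cost is fixed_cost + unit_cost * q + congestion * q**2,
    a simple stand-in for decreasing returns (an assumption, not a fitted model).
    """
    total = fixed_cost + unit_cost * quantity + congestion * quantity ** 2
    return total / quantity


if __name__ == "__main__":
    # All figures are hypothetical.
    for q in (10, 100, 1000):
        info = average_cost_information_good(1000.0, q)
        trad = average_cost_traditional(1000.0, q, unit_cost=5.0, congestion=0.1)
        print(f"q={q:>5}  information good: {info:8.2f}  traditional: {trad:8.2f}")
```

With these made-up numbers the information good keeps getting cheaper per unit as output grows, so the largest firm always has the lowest costs. The traditional producer's average cost falls and then rises again, which puts a ceiling on how big any one firm can usefully become.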

The dynamic is familiar from other "open" movements and from previous price changes forced by digitization. A range of institutions have been overthrown (with much rhetoric about the stifling effect of "gatekeepers" and the democratizing nature of the Internet), only to be replaced by fewer, bigger institutions.

The digitization of books undermined publishers and booksellers, and gave us a great big Amazonian bookseller/publisher.

The digitization of video pulled the market from under the feet of Blockbuster and from independent video stores, and now we have Netflix.

The mass sharing of digital music toppled major music labels, and saw the global rise of iTunes as the whole world’s music store.

All this is, of course, very general, but the downsides of open data are real and need to be addressed. Describing them as paradoxical "unintended consequences" (see Tauberer p 14) suggests they are anomalous edge cases, which misses the ubiquity of the problem.

Effective use: empowering the empowered

A small chorus of voices has been calling attention to the dangers of open data’s free-market doppelgänger, particularly in countries where the gap between rich and poor is large. The work of Bhuvaneswari Raman, Solomon Benjamin and others on land record digitization in India (above) is one set of voices. Another is Michael Gurstein, a leading light in the field of "community informatics" who has been constructively raising concerns about how open data may "empower the empowered" for some time. The skills and resources needed to make "effective use" of data are complements to it. As just one case, Gurstein quotes from a recent study of who uses the British mySociety TheyWorkForYou.com open government initiative:

"people above the age of 54 tend to be over-represented, while those younger than 45 are under-represented in comparison to the Internet population. In terms of demographics there is a strong male bias and a strong overrepresentation of people with a university degree that also translates into strong participation from high income groups… One in five users (21%) of the site has not been politically active within the last year"

Gurstein comments that:

this attempt to enhance democratic participation has ended up providing an additional opportunity for those who already, because of their income, education, and overall conventional characteristics of higher status (age, gender etc.) have the means to communicate with and influence politicians. The additional information and an additional communications channel thus has the effect of reinforcing patterns of opportunity that are already there rather than widening the base of participation and influence. (link)

Another dissenting voice is Kentaro Toyama, an expert in the use of information technology for development. He argues that "in contexts where literacy and social capital are unevenly distributed, technology tends to amplify inequalities rather than reduce them. An email account cannot make you more connected unless you have some existing social network to build on." Again, in thinking about the effects of new technologies we must look at the complements to the technology, and how those complements shape new markets.

Seeing like a geek

Shunning the free-market doppelgänger can have a positive effect on outcomes.

Development studies scholar Kevin Donovan sees similarities between open data efforts and the demands of the state as described in James Scott’s "Seeing Like a State". Open standards and structured, machine-readable data are key parts of the open data programme. For Donovan this formalization and standardization is "far more value-laden than typically considered". Open data programmes, like the state, seek to "make society legible through simplification". Standardized data, like the state, "operate[s] over a multitude of communities and attempt[s] to eliminate cultural norms through standardization". He writes:

Eliminating illegibility in this way reduces the public’s political autonomy because it enables powerful entities to act on a greater scale. Scott argued, ‘A thoroughly legible society eliminates local monopolies of information and creates a kind of national transparency through the uniformity of codes, identities, statistics, regulations and measures. At the same time it is likely to create new positional advantages for those at the apex who have the knowledge and access to easily decipher the new state-created format’

Open data undermines the power of those who benefit from "the idiosyncrasies and complexities of communities… Local residents [who] understand the complexity of their community due to prolonged exposure." The Bhoomi land records program is an example of this: it explicitly devalues informal knowledge of particular places and histories, making it legally irrelevant; in the brave new world of open data such knowledge is trumped by the ability to make effective queries of the "open" land records. The valuing of technological facility over idiosyncratic and informal knowledge is baked right into open data efforts.

More encouragingly, Donovan looks at how some "data geeks" recognized their own myopia in the Map Kibera project. The project started as a community-mapping project to trace the massive Nairobi slum. Some questioned the need for the project as "locals [already] knew their surroundings intimately", arguing that making mapping information available would more likely benefit external parties than the residents themselves.

The problems the project seeks to address (Kibera’s poverty and marginalization) are of the class Donovan calls "wicked" problems: ill-defined, tangled, and resistant to technological fixes. However, "Although it began as an example of misdiagnosing a wicked problem… as a tame one (insufficient information availability), Map Kibera has admirably grown beyond a reductionist approach"; it has expanded to include other forms of activity such as citizen reporting, and has taken steps to ensure local ownership of the project. The project has moved beyond a technological goal to a set of social goals. Its list of sponsors, interestingly, includes only non-commercial organizations.

Donovan contrasts Map Kibera’s evolution with that of commercial and more narrowly technological mapping projects, such as Google’s Map Maker initiatives, which have been accused of unethical "exploitation of open communities." The danger of such projects is that, by eliminating the illegibility that privileges local knowledge over outsider knowledge, they may allow "more powerful entities to see like a slum" and benefit those already in power.

When it comes to development programs, Donovan concludes, making data available is not enough. Instead, transparency must be linked with deliberative development. Effecting social change cannot avoid the need to actually address underlying dynamics of power.

Have you considered the benefits of an alarm system?

Combining open data with its complements is a step on the road to surveillance.

One of the most valuable complements to open data is, of course, other data (mashups!): a bus schedule is more valuable if you can combine it with a map. This combinatorial aspect of open data raises problems for government-collected data, as legal scholars Teresa Scassa and Lisa M. Campbell highlighted recently, because data protection legislation "typically requires that information collected for specific purposes should not be used for other purposes without consent."

Scassa and Campbell look at how "even relatively low quality spatial data may attract the application of data protection or privacy law, particularly when it is matched or combined with other data sets". Take, for example, Ottawa Police’s crime mapping tool (link), which is a map of calls for police assistance provided through a collaboration between Ottawa Police and US company Public Engines. If insurance companies make decisions about rates or insurability based on the crime-mapping data, or if security companies use it to target specific areas for marketing campaigns ("Did you know there were three robberies on your street in the last two months? Would you like a visit from one of our salespeople?") then this site could be violating those conditions.

Again, it is worth thinking about what knowledge increases in value and what is displaced when local data is made digitally public in this way. Brandon, Manitoba released property tax and assessment data for every single property in town (here). For residents of Brandon, and particularly for local real-estate agents, this data release will not tell them a lot that is new. But now you don’t need to know anything about Brandon–even where it is–to have a good idea of the wealth level of each inhabitant. Who cares about such stuff? The people who attend the Toronto Dx3 Canada event in January, for sure: "the first and only trade show dedicated to Digital Marketing, Digital Advertising and Digital Retail, is offering attendees the chance to get intimate with the City of Toronto’s Open Data Initiative."

Open data advocates commonly address privacy issues by reference to personally identifiable information, but there is no clear dividing line between data that identifies individuals and data that doesn’t. It is well known that the right way to think of privacy when it comes to data made available in a "release and forget" manner (which open data is by definition) is in terms of information entropy or, to be less jargony, in a twenty-questions kind of way. Each question reveals a little more about the subject; no one question tells us what we need to know, but by successive filtering we arrive at the only possible answer.
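The twenty-questions picture can be sketched in a few lines. The following is an illustrative Python example, not taken from any of the sources cited here; the population size and attribute frequencies are invented so the arithmetic comes out cleanly, and they describe no real place. It counts how many bits are needed to single out one person, and how many bits each released attribute contributes, assuming (unrealistically) that the attributes are statistically independent.

```python
import math


def bits_to_identify(population):
    """Bits of information needed to single out one person among `population`."""
    return math.log2(population)


def bits_revealed(share_of_population):
    """Bits revealed by learning an attribute held by this share of the population."""
    return -math.log2(share_of_population)


def expected_matches(population, shares):
    """Expected number of people matching every attribute, assuming independence."""
    remaining = float(population)
    for share in shares:
        remaining *= share
    return remaining


if __name__ == "__main__":
    population = 65536  # hypothetical town, chosen so the arithmetic is clean
    # Hypothetical attribute frequencies, not measured from any data set.
    attributes = {
        "lives on a given street": 1 / 64,
        "home assessed in a given band": 1 / 16,
        "born in a given year": 1 / 64,
    }
    print(f"bits needed: {bits_to_identify(population):.1f}")
    for name, share in attributes.items():
        print(f"{name}: {bits_revealed(share):.1f} bits")
    print(f"expected matches: {expected_matches(population, attributes.values()):.2f}")
```

In this made-up case the three answers add up to 16 bits, which is exactly what it takes to narrow 65,536 people down to one, even though no single attribute looks identifying on its own.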

The commercial potential of combining open government data with other data sets is an irresistible temptation for the open data doppelgänger, regardless of the privacy consequences. We need to stay vigilant about its vulnerability to these temptations.

Reining in the doppelgänger

When I have brought up conflicts between markets and civic open data initiatives, I have occasionally been accused of cynicism and negativity (who me?) and exhorted to "get involved" instead. Many open data activists see themselves as idealistic and positive, yet they retain a deep cynicism about government agencies while maintaining faith in the market’s ability to maintain diversity and consumer power. I find it odd to see this combination of attitudes in a movement that often describes itself in egalitarian terms.

The faith in markets sometimes goes further among open data advocates. It’s not just that open data can create new markets: a substantial portion of the push for open data explicitly seeks to create new markets as an alternative to providing government services. Influential advocate Tim O’Reilly claims not to be in favour of such an agenda (see comment here), but his "Government as Platform" initiative has been readily adopted by many who are.

In a recent paper, Jo Bates highlights the way in which open government data programs can be used as a form of privatization and deregulation: a deliberate attempt to create new markets in "Public Sector Information (PSI) reuse" instead of providing government services. Here is a summarizing quotation that I’ve used before:

the current ‘transparency agenda’ [of the UK government, supported by prominent Open data advocates] should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer. Further, whilst democratic ends are claimed in the desire to enable ‘the public’ to hold ‘the state’ to account via these measures, there is an issue in utilising a dichotomy between the state and a notion of ‘the public’ which does not differentiate between citizens and commercial interests… The construction… encourages those attracted to civic engagement into an embrace of solidarity with profit seeking interests, distanced from the ever suspect notion of the state.

Here is the kind of activity that now comes under "open data" initiatives (again from Jo Bates, here):

[T]here has been significant lobbying by the financial industry to get better access to UK weather data so that it is able to compete in this [weather risk management] market. Groups such as the Lighthill Risk Network, of which Lloyds of London are a member, have lobbied government for better weather data so that they can develop risk based weather products. Similarly, the insurance industry has requested real time information on the pretext that they might respond more quickly to extreme weather events such as flooding. My own research and the recent announcement suggest that these demands have been met enthusiastically by well placed policy makers in national government who are keen to develop a UK weather derivatives market.

Weather risk management might seem like an odd duck, but Bates reports that "This weather risk management market far outweighs the USA’s commercial weather products market which in 2000 was estimated at approximately $500 million a year", touching over $45 billion in 2005-06.

Welcoming corporate involvement in open data activities will lead to new Amazons and Apples, while undermining the community activism that is the movement’s strong point. Whatever we think of Amazon and Apple from a consumer point of view, it is difficult to see how their rise has positive political outcomes.

A final example: one of the leading companies in the open data space is Palantir Technologies, highlighted by the civic-minded Code for America ("The success of Palantir or Socrata in offering innovative, web 2.0-style services for government shows the way forward for new government-focused enterprise companies." – here), a sponsor of O’Reilly’s gov 2.0 summit (link) and adopter of "Government as a Platform" terminology, and an early partner of USAID’s Food Security Open Data Challenge. And what do we know about Palantir? It is hooked closely into US intelligence agencies, with early funding by the CIA through its In-Q-Tel (http://en.wikipedia.org/wiki/In-Q-Tel) venture capital arm and Peter Thiel’s Founders Fund, both organizations known for their profound commitment to openness and equality. It is deeply involved in anti-terrorism programmes. Peter Thiel is Palantir’s Chairman of the Board: perhaps he will be pursuing open data projects for the secretive Bilderberg Group, on whose Steering Committee he sits?

Are there ways to rein in open data’s free-market doppelgänger? The parallels between the economics of information goods and the economics of cultural goods can give us some ideas for dealing with the new oligopolies that threaten to grow around open data.

One lesson of cultural economics is that creative works for which there is significant demand in a small market can be swamped by near-zero-marginal-cost exports from large markets. It is more profitable for TV stations in smaller markets to broadcast cheap American shows than it is to broadcast more expensive home-grown material, even in cases where the latter would draw a bigger audience, because cultural producers seek to cover their costs in their home market and typically sell at discounted rates elsewhere.

To maintain cultural diversity in the face of winner-take-all markets, governments in smaller countries have designed a toolbox of interventions. The contents include production subsidies, broadcast quotas, spending rules, national ownership, and competition policy. In general, such measures have received support from those with a left-leaning outlook.

Unfortunately, the Open Data Movement demands that data be provided without borders and in a uniform way: machine processable, available to anyone, and license-free. It mandates non-discriminatory licensing, focuses on standards-based formats, and generally insists that data be accessible to rich and poor alike, like justice and the Ritz. It insists that any measures governments would like to take to favour, for example, non-commercial users or local users be taken off the table. It strikes me as bizarre that this logic has gained such a significant hold among left-leaning digital enthusiasts that it has become orthodoxy.

I am not convinced that a coherent case can be made for "open data" as a public good, independent of the social changes that must accompany it, until the movement confronts its doppelgänger. This will require putting far more emphasis on experimentation in standards, licensing, and selective provision of data at the municipal and higher levels of government to ensure that what is a potentially valuable public resource is not plundered by those with the digital skills and resources to make most use of it.