In our work with government data, we encourage governments to develop great-looking websites with fantastic APIs. But you can’t have true transparency unless you go all the way down to the source, giving citizens access to the raw bulk data.

Take, for instance, USAspending.gov, often lauded by our community as a great boon to transparency. It certainly was a great leap forward: the website gives citizens access to descriptions of federal contracts, has XML feeds, and has a great search engine. But take a look at this $375,000 expenditure by the Department of the Navy to make Facebook “sites”. On the surface, this is pretty incredible, but since I have a summary of the contract, not the actual contract, I can’t see what actually went on here. It is pretty easy to jump to some harsh conclusions.

Before you rush to judgment, though, ask some key questions: What are the technical specifications of the project? Is it really just Facebook? Was it for advertising? Who authorized the purchase? Was there hardware purchased? Was it to buy a license that is valid for other things? And what does “NAVY OFFICER FACEBOOK SITES” mean, because I didn’t spend any money on my Facebook site.

In all likelihood this is an advertising expenditure. But to confirm that, I probably have to file a FOIA request, which can take time and money and might never be fulfilled. Despite USAspending.gov providing an API, XML feeds, CSV files, and a nice search engine, it isn’t adequate because the raw data, the contract itself, isn’t available. Without the raw data, it isn’t transparent (or disclosed).

The interesting thing is that with the raw data, the rest of the stuff can be built by non-government types (like, for instance, us!) in ways that may provide an interesting take on the data that government can’t or won’t provide. This is very clearly what our friends over at OpenSecrets.org do. They take data from the Federal Election Commission, combine it with sector and industry codes, and provide quality control that makes searching through OpenSecrets data a more informative experience than looking at the FEC data on FEC.gov.
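To make the idea of combining raw records with industry codes concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the column names, the employers, the amounts, and the lookup table are hypothetical, and this is not how OpenSecrets actually does its coding. It only shows the general shape of the join that bulk data makes possible.

```python
import csv
import io

# Hypothetical raw bulk export: one row per contribution (all values invented).
RAW_CONTRIBUTIONS = """contributor,employer,amount
A. Example,Acme Oil Co,500
B. Sample,Acme Oil Co,250
C. Placeholder,Widget Software Inc,1000
D. Nobody,Unknown Employer LLC,100
"""

# Hypothetical lookup table mapping employers to industry labels.
INDUSTRY_BY_EMPLOYER = {
    "Acme Oil Co": "Energy",
    "Widget Software Inc": "Technology",
}


def totals_by_industry(raw_csv, industry_by_employer):
    """Sum contribution amounts per industry; unmatched employers land in 'Uncoded'."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        industry = industry_by_employer.get(row["employer"], "Uncoded")
        totals[industry] = totals.get(industry, 0) + int(row["amount"])
    return totals


if __name__ == "__main__":
    results = totals_by_industry(RAW_CONTRIBUTIONS, INDUSTRY_BY_EMPLOYER)
    for industry, total in sorted(results.items()):
        print(f"{industry}: ${total}")
```

The "Uncoded" bucket is where the quality-control work would come in: someone has to decide what to do with records the lookup table doesn't recognize.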

Government should be providing us access to data in three steps:

First, give us bulk access to the data in its rawest form. Give us scanned images of checks and contracts. Give us the campaign contribution information as it comes in from the campaigns at the FEC (they do!). Second, give us machine-readable APIs so that we can take the data and mash it up with data across agencies. Finally, create a user experience that allows non-technical citizens to access the data.

But if you start with providing a website and don’t provide bulk data access or an API (as with THOMAS, for instance), then we’ve got to go through the effort of parsing all the pages using something like BeautifulSoup to turn them into things like OpenCongress.org. What a drag!
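To give a sense of that effort, here is a minimal sketch of the kind of page scraping involved. It uses Python's standard-library `html.parser` rather than BeautifulSoup so that it runs without extra installs. The HTML fragment, the `/bills/` URL pattern, and the `BillLinkParser` and `extract_bills` names are all hypothetical; real legislative pages are far messier, and this is not the actual OpenCongress code.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a legislative listing page.
SAMPLE_PAGE = """
<html><body>
<ul>
  <li><a href="/bills/example-1">Example Bill One</a></li>
  <li><a href="/bills/example-2">Example Bill Two</a></li>
  <li><a href="/about">About this site</a></li>
</ul>
</body></html>
"""


class BillLinkParser(HTMLParser):
    """Collect (href, link text) pairs for links whose href starts with /bills/."""

    def __init__(self):
        super().__init__()
        self.bills = []
        self._current_href = None  # set while inside a matching <a> tag
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/bills/"):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text_parts).strip()
            self.bills.append((self._current_href, title))
            self._current_href = None


def extract_bills(html):
    """Return the list of (href, title) pairs found in the given HTML string."""
    parser = BillLinkParser()
    parser.feed(html)
    parser.close()
    return parser.bills


if __name__ == "__main__":
    for href, title in extract_bills(SAMPLE_PAGE):
        print(href, "-", title)
```

Every assumption baked into that parser (the link pattern, the page layout) breaks the moment the site is redesigned, which is exactly why bulk data or an API is the better starting point.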