UPDATE: See how this issue resolved.

Rep. Ander Crenshaw (FL4) and the House subcommittee he chairs decided this week that the American public can’t be trusted with more thorough records about what Congress is doing.

Take Action: Write your rep to oppose Crenshaw’s report.

In a committee report on the legislative branch appropriations bill H.R. 5882, the subcommittee responded to requests from GovTrack, the Sunlight Foundation, Washington Watch, and other watchdog groups about particular technological measures Congress can take to improve legislative transparency. I wrote about our request most recently in March, though I have been asking for the improvements for more than 10 years. We asked for “bulk data,” which means comprehensive records in a format that is machine-processable.

The committee’s response to our request, starting at the bottom of page 17, is steeped in technical language, making it hard to quote here. Here are the important parts:

The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact . . . once they are removed from the Government’s domain to private sites. . . . [How would we pay for] Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML?

What they’re saying is that they fear that if the American public is given more detailed and precise records about Congress, we’ll distort them and, well, hurt ourselves I guess. And then, according to the committee, Congress will have to go around correcting us. How insulting!

Especially since for the last eight years I’ve been making this sort of information available on GovTrack, and last I checked that was a good thing. Even Congress’s staff uses GovTrack: Crenshaw’s own staff has probably used GovTrack for their research.

A world without bulk legislative data is a world where everyday citizens can’t ask simple questions such as how often a bill is enacted, where a bill is in the legislative process, whether their representative is moderate or extreme, and what bills are coming up next week. Without bulk data there are no tools for legislative tracking, like email updates, discussion forums, and write-your-rep websites. And in the world Crenshaw envisions there is no need for journalists, since the only information the public needs can be found in whatever “intact” files Congress thinks are sufficient.
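To make that concrete, here is a minimal sketch of how a question like "how often is a bill enacted" falls out of bulk data once it is machine-processable. The XML layout, the `status` attribute, the sample bills, and the function names are all invented for illustration; this is not the format Congress or GovTrack actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical bulk file. The element names, attribute names, and the bills
# themselves are made up for this sketch; no real schema is implied.
SAMPLE_BULK_XML = """
<bills>
  <bill id="hr1" status="enacted"/>
  <bill id="hr2" status="referred"/>
  <bill id="s3" status="passed_house"/>
  <bill id="hr4" status="enacted"/>
</bills>
"""


def enactment_rate(xml_text):
    """Return the fraction of bills in the file whose status is 'enacted'."""
    root = ET.fromstring(xml_text.strip())
    bills = root.findall("bill")
    if not bills:
        return 0.0
    enacted = sum(1 for bill in bills if bill.get("status") == "enacted")
    return enacted / len(bills)


def bills_with_status(xml_text, status):
    """Return the ids of all bills currently at the given status."""
    root = ET.fromstring(xml_text.strip())
    return [bill.get("id") for bill in root.findall("bill")
            if bill.get("status") == status]


if __name__ == "__main__":
    print(enactment_rate(SAMPLE_BULK_XML))
    print(bills_with_status(SAMPLE_BULK_XML, "referred"))
```

With comprehensive records in a form like this, each of the questions above is a few lines of code. Without them, the same answers have to be pieced together by hand from individual web pages.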

The report focuses on “digital signatures on XML documents,” a way to distinguish official government documents from fakes. While that’s important, it’s not relevant to the question of bulk data. First, the report claims there is no such thing as a digital signature for an XML document, but that’s simply false. In fact, Congress has already been using digital signatures for XML documents for years. (Thanks to Eric Mill for pointing that out.)

Second, and most importantly, no one actually cares. The millions of individuals who use GovTrack to find the status of a bill are not looking for an official government document. They want an explanation of what is going on with the bill, and that’s not provided by the government. But it’s something we can create more easily with bulk data.

Third, Crenshaw’s colleagues in the House Republican leadership have been putting out all sorts of new bulk data in the last year without digital signatures, and it hasn’t been a problem. A few weeks ago I blogged that the House did good work in making the week ahead’s schedule more available as bulk XML data. I try to look on the positive side of these things. In that blog post I congratulated the House on its achievements with XML. But I can’t find a positive way to look at this committee report.

Daniel Schuman at Sunlight Foundation has been covering the recent developments as well. Check out his blog post for more background on what’s going on here.

What I’m asking Congress and the Library of Congress to do is to share their internal database of legislative information that powers their official THOMAS website. That database would make GovTrack and dozens of other websites more accurate. Over the last six months, GovTrack and its data partners have been used by millions of individuals, again including Congress’s own staff. More precise data would immediately benefit all of them.

“Bulk data” is today considered a core component of any government information dissemination program. In 2009, the Government Printing Office began offering bulk data for bill text and other publications. Executive branch agencies are all now under a directive to embrace data. This is a no-brainer.

I’m sympathetic to other reasons to put off bulk data, such as cost (actually it’s cheap) or there being other priorities for Congress to address. I think bulk data is important, but I can understand if not everyone thinks it’s so important to do right now. Although cost was mentioned in the report, the main gist was the techno-nonsense about digital signatures.

Crenshaw has some explaining to do if he doesn’t think Americans can handle the data. I reached out to his office for a comment but did not receive a reply.