Government web sites have been a joke for almost as long as there have been web sites. They tend to be slow, clunky, and far behind their private-sector counterparts. All three presidential candidates have signaled that they want the federal government to make better use of the Internet, but there's a real danger that the next administration will simply add another coat of lipstick to a very ugly pig.

A new paper from researchers at Princeton University suggests a different strategy. David Robinson, Harlan Yu, William Zeller, and Ed Felten, all of Princeton's Center for Information Technology Policy, argue that government officials should abandon the dream of developing usable web sites and instead focus on providing raw public data, such as regulatory decisions, Congressional votes, and campaign finance data, in open, structured formats such as RSS and XML. This raw data would be made freely and publicly available to anyone who wanted it and could be used for any purpose.

Robinson et al. predict that the private sector would quickly surpass the feds at the task of organizing and presenting this information in a user-friendly manner. Indeed, they note that in several cases, private parties have already produced user-friendly web sites with government data despite the high barrier to entry created by the need to manually scrape the data from the feds' existing web sites.



[Image: The THOMAS system]

For example, GovTrack presents information about the legislative process in ways that are superior in some respects to THOMAS, the government's official source for legislative information. GovTrack is maintained by linguistics graduate student Joshua Tauberer in his free time.

The Princeton researchers suggest that once the private sector has been relieved of the irritating task of manually scraping data from government web sites, a proliferation of user-friendly sites will allow people to sort, search, and analyze the data in a variety of ways.
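
To make the difference concrete, here is a minimal sketch of what sorting and searching might look like once data arrives in a structured format rather than as scraped HTML. The feed below is hypothetical: the element names, attribute names, bill numbers, and members are invented for illustration and do not reflect any real government schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical feed: every element and attribute name here is an assumption.
SAMPLE_FEED = """\
<votes>
  <vote bill="H.R. 1" member="Member A" party="D" position="Yea"/>
  <vote bill="H.R. 1" member="Member B" party="R" position="Nay"/>
  <vote bill="H.R. 1" member="Member C" party="R" position="Yea"/>
  <vote bill="H.R. 2" member="Member A" party="D" position="Nay"/>
</votes>
"""


def load_votes(xml_text):
    """Parse the feed into a list of plain dicts, one per recorded vote."""
    root = ET.fromstring(xml_text)
    return [dict(vote.attrib) for vote in root.iter("vote")]


def tally(votes, bill):
    """Count how many members took each position on one bill."""
    return Counter(v["position"] for v in votes if v["bill"] == bill)


def votes_by_member(votes, member):
    """Return (bill, position) pairs for one member, sorted by bill."""
    return sorted(
        (v["bill"], v["position"]) for v in votes if v["member"] == member
    )


votes = load_votes(SAMPLE_FEED)
print(tally(votes, "H.R. 1"))
print(votes_by_member(votes, "Member A"))
```

No page layout has to be reverse-engineered here. A scraper, by contrast, would need to be rewritten whenever the agency redesigned its pages.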



[Image: GovTrack.us]

To ensure that governments' release of structured data does not become an afterthought, the study's authors advocate a requirement that agencies' user-facing web sites must exclusively use the publicly available data sources as their "back ends." This would give agencies a strong incentive to ensure the data sources remain in good working order, and that they contain complete and up-to-date information.
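
The "back end" arrangement can be sketched roughly as follows, assuming a hypothetical agency that publishes its decisions as an XML feed. The feed contents, the `fetch_public_feed` stand-in, and the page layout are all invented for illustration; the paper does not prescribe an implementation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical public feed; the schema and records are assumptions.
PUBLIC_FEED = """\
<decisions>
  <decision id="2008-001" date="2008-05-01" title="Rule on &lt;widgets&gt;"/>
  <decision id="2008-002" date="2008-05-15" title="Notice of inquiry"/>
</decisions>
"""


def fetch_public_feed():
    """Stand-in for an HTTP GET of the published feed.

    The agency's own site and any third party would call the same source.
    """
    return PUBLIC_FEED


def render_agency_page(feed_xml):
    """Build the agency's HTML listing from the public feed and nothing else."""
    root = ET.fromstring(feed_xml)
    rows = [
        "<li>{date}: {title} ({id})</li>".format(
            date=escape(d.get("date")),
            title=escape(d.get("title")),
            id=escape(d.get("id")),
        )
        for d in root.iter("decision")
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


page = render_agency_page(fetch_public_feed())
print(page)
```

Because the official page has no private data path, a stale or broken feed would show up on the agency's own site right away, which is the incentive the authors are counting on.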

Hack, Mash, and Peer

Robinson and his colleagues are not the only researchers advocating that government agencies release public data in structured formats. Last fall, Jerry Brito of the Mercatus Center at George Mason University released a paper called "Hack, Mash, and Peer." The paper explained the basics of structured data formats and argued that releasing raw data in open, structured formats would make government more transparent and accountable.

Mandating the release of structured data would also discourage the manipulation of public data for self-serving reasons. For example, Brito relates the story of a Washington Post researcher who discovered out-of-date XML files on the Senate web site containing Senators' votes. The researcher e-mailed the Senate webmaster, who replied that the XML files had been disabled because Senators "have a right to present and comment on their votes to their constituents in the manner they prefer." In other words, allowing access to structured voting data might expose Senators to too much public scrutiny.

The current administration has proven less than enthusiastic about government transparency, so it is unlikely to make these proposals a priority. But when a new administration takes office, it will have an historic opportunity to increase government transparency through the use of open, structured data formats. Researchers at Princeton, George Mason, and elsewhere are laying the groundwork by developing specific proposals for administrative and legislative changes. We can only hope the next president listens.

Disclosure: Beginning this fall, Timothy B. Lee will be a student at Princeton's Center for IT Policy, studying under Ed Felten, one of the study's authors. Also, Lee and Jerry Brito both contribute to the Technology Liberation Front blog.