One year ago this Friday, United States chief information officer Vivek Kundra launched an ambitious website called Data.gov to make the government’s vast stores of data available to the public. The thinking behind the site then, as now, was to give app developers access to these rich, comprehensive datasets on all sorts of topics — health care, education, energy, the environment and so on — in the hope that they would create useful tools for analyzing a range of information, from air quality by county to crime statistics by neighborhood and foreign aid by nation.

“Part of what we’re trying to do [with Data.gov] is realize the president’s vision when he set out,” Kundra told Wired.com by phone. “He issued, on his first full day, a memorandum on transparency and open government.”

To make good on that promise, Data.gov launched with 47 datasets on May 21, 2009. On its first anniversary, Data.gov will have ballooned to more than 250,000 datasets and racked up 97.6 million hits — not bad for a website whose main attractions are massive databases and wonky graphs.

However, the sailing has not been entirely smooth for Kundra and company, which mirrors, in a way, Obama’s presidency. Federal Computer Week complained that Data.gov was a demonstration of “how not to do open government” because it didn’t pass “the ‘mom test.’” By that, FCW meant it was too hard for the average person to figure out what to do on the site and where everything was.

Kundra’s team redesigned all of Data.gov with that sort of feedback in mind, and we’re inclined to agree that the new homepage, currently password-protected but set to launch Friday, is markedly more intuitive and user-friendly, as you can see for yourself on the following pages.

Still, Kundra concedes that plenty of work remains, particularly in the mobile app department, which he described as “the next frontier” along with the semantic web (more on that below). On a deeper level, one big problem with wrangling government data is that some of its myriad systems run software that is decades past its expiration date.

“Some of the challenges that remain are challenges around, frankly, systems: outdated technology within the federal government,” said Kundra. “We’ve noticed this huge gap in tech investment between the private sector and the public sector. The Patent and Trademark Office, for example, is using 30-year-old systems built on Cobol. Therefore, releasing a lot of the data that we want to, very fast, is difficult.” He added that the Veterans Administration is another department mired in ancient technology.

But perhaps the most damning criticism one could make about Data.gov is that people haven’t been using it enough. In a country of 300 million or so people, many of the datasets have only been voted on a handful of times, and only 250 or so apps are listed on the site (granted, Kundra said “countless” others may have been developed without being listed on the site).

To encourage more widespread use of the government’s data, the Obama administration recently brought on an evangelist for the program: Jeanne Holm, formerly chief knowledge architect for NASA’s Jet Propulsion Laboratory, whose new title is communications and collaborations lead for Data.gov.

In her new role, modeled somewhat on Kundra’s own outreach to Rensselaer Polytechnic Institute (which resulted in some of the most powerful uses of the government’s data to date, as you’ll see below), Holm will travel to grade schools, high schools and universities over the coming year to explain to students and teachers how they can make use of these datasets through basic science projects, in the case of younger students, and more advanced data analysis at higher educational levels.

With luck, this will increase the nation’s awareness of these now-public resources and, in turn, the number of web and mobile apps that developers build on them over the coming years.

“[Holm] is going to help us evangelize, and create an army of developers who are going to create applications that we can’t even imagine, as we think through innovation in the coming century,” said Kundra.

In addition, the government plans to make tools available that enable non-programmers to mash its data into public-facing apps, “making it as easy as writing an article on Wikipedia is today — that’s the vision, that’s what we’re focused on in the coming year.”

Kundra walked us through the redesigned Data.gov, set to launch Friday. Come along on a little tour through the site, which you won’t find anywhere else, because Wired.com is the only publication with the password.

Note: The design of the following webpages is subject to change before the site goes live on Friday, and some of these examples are visible on the current, pre-redesign version of Data.gov.

Let’s take a look:

Slicker, simpler, more intuitive

Kundra and his team redesigned Data.gov from the ground up, altering everything from fonts and page layout to the site’s organizational and search features, in an effort to make it simpler and more powerful. The homepage is less crowded and surfaces the most popular data sets, while a prominent search function now appears at the top of every page on the site.

Basically, the site looks more professionally done, approaching a commercial level of slickness, whereas the old design looked more like, well, the work of a government agency.

“We’ve designed it to make it as simple as possible, if you compare this [redesigned version] to the current website,” said Kundra. “We’ve used consumer search tools to make it really easy for people who are just looking for a dataset or if they’re interested in a policy area to quickly search, and people who are interested in apps can quickly find them, rather than organizing it by catalogs and so forth at the top level.”

Other countries adopting the Data.gov model

The next stop on our tour: A graph showing how the Data.gov idea itself has started to spread around the world over the past year. “Other countries are following the lead of the Obama administration,” said Kundra.

So far, Australia, Canada, Estonia, New Zealand, Norway and the United Kingdom have set up similar sites and released datasets to the public, as the graph shows.

It’s unlikely that countries such as North Korea would post such information, but as more countries come on board with the concept, programmers will be able to mash data not only in those countries but across their borders — an especially powerful tool for tracking worldwide phenomena such as global warming and air or water pollution.

Tracking obesity and health trends geographically

One of the most effective uses of the government’s health data developed as part of the Data.gov initiative is the National Obesity Comparison Tool, powered by Tableau Software’s visualization software. Tableau’s charts represent about as much fun as you can have analyzing obesity data by state and county.

Mouse over any of the counties to see detailed information, or click one to see where it stands relative to the rest of the state, as well as the national average on the graphs below, which track low consumption of fruits and vegetables, lack of exercise and tobacco smoking in addition to obesity. It’s a quick and fascinating way to browse the health habits of your fellow citizens.

“The vision that I want to realize is a YouTube for data,” said Kundra. “You can embed your analysis on a website, and the content and presentation layer will stay alive as the data gets updated over time. Imagine a world where people can literally share data and analysis in the same way today we share YouTube videos across the world.”

On-time flights

The FlyOnTime.us web app “allows the American people to make intelligent decisions around which flight to pick. The only reason this app can exist is that we release data from the FAA.”

As part of the app, citizens report how long it takes to get through the security line at various airports via Twitter, so you can schedule your trip accordingly (another tip: we hear you can get an iPad through security without taking it out of your bag, whereas laptops must be removed and scanned individually).

To report security wait times using your cellphone, bookmark flyontime.us/m/lines/security or tweet “#airportsecurity [three-letter airport code] in” when you enter the security line, and “#airportsecurity [airport code] out” once you’ve gotten through.
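For developers curious how an app might read those check-ins, here is a minimal Python sketch. It assumes the tweet format quoted above and nothing more; the function names, the input shape and the sample timestamps are made up for illustration and are not taken from FlyOnTime.us.

```python
import re
from datetime import datetime

# Pattern for the tweet format quoted above: "#airportsecurity SFO in" / "... out".
# Assumption: the airport code is three letters and matching is case-insensitive.
CHECKIN = re.compile(r"#airportsecurity\s+([A-Za-z]{3})\s+(in|out)\b", re.IGNORECASE)

def parse_report(text):
    """Return (airport_code, direction), or None if the tweet does not match."""
    match = CHECKIN.search(text)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).lower()

def wait_minutes(tweets):
    """Pair one traveler's 'in' and 'out' tweets and return minutes waited per airport.

    `tweets` is a list of (timestamp, text) tuples from a single user,
    a hypothetical input shape chosen for this illustration.
    """
    entered = {}
    waits = {}
    for when, text in sorted(tweets):
        report = parse_report(text)
        if report is None:
            continue
        airport, direction = report
        if direction == "in":
            entered[airport] = when
        elif airport in entered:
            waits[airport] = (when - entered.pop(airport)).total_seconds() / 60
    return waits

# Made-up sample tweets, 25 minutes apart.
sample = [
    (datetime(2010, 5, 21, 8, 0), "#airportsecurity SFO in"),
    (datetime(2010, 5, 21, 8, 25), "#airportsecurity SFO out"),
]
print(wait_minutes(sample))
```

An “out” tweet with no matching “in” is simply ignored, which is one reasonable choice among several for handling incomplete reports.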

So far, however, Data.gov hasn’t received many of these submissions. Come on, America; Uncle Sam wants you to tweet these times.

Web 3.0: Making Data Talk to Data

Kundra visited Rensselaer Polytechnic Institute (RPI) to encourage students there to apply “new international standards for persistent government data (and metadata),” otherwise known as the semantic web or Web 3.0, to Data.gov’s datasets. These standards will make it easier for governmental data to power deep-linked data mash-ups combining various data sources in a consistent way.

“In the same way that the ability to link documents across the world essentially gave birth to the world wide web, where you could literally create HTML pages on the fly, our vision is to create an open data-linked web,” explained Kundra. “There are hundreds of applications that RPI has created that have brought in datasets from all over the world that [for instance] compare crime in the U.K. versus crime in the United States.

“Imagine these living, breathing sites that are constantly updating data from databases and linking them in an open format. That’s one of the big shifts that we have moving forward as we create this community of developers.”

To demonstrate how this works, he showed a map of U.S. foreign aid that quickly surfaces how much aid, and what kind of aid (security, development, child survival, agriculture, narcotics control and so on), our country sent to others around the world in 2008 (with subsequent years to come), mashed together with news from The New York Times and facts from the CIA World Factbook. Extensive graphs can be generated to reflect historical trends in a number of aid categories.

Other successful semantic web-style mash-ups created by RPI, and served via SPARQL queries over RDF data (both W3C standards), include a graphical representation of the top 100 visitors to the White House and who they met with, nationwide maps depicting the presence of pollution, the bankruptcies of public companies and the ratio of library books to population.
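To give a rough sense of why shared identifiers make these mash-ups easier, here is a small Python sketch. It is not real SPARQL and uses no RDF library; it only imitates the idea of matching (subject, predicate, object) triples across two sources. Every identifier, headline and number in it is a placeholder invented for illustration, not government data or RPI’s code.

```python
# Two hypothetical datasets expressed as (subject, predicate, object) triples.
# Every identifier and number here is a placeholder, not real government data.
aid_data = [
    ("ex:CountryA", "ex:aidReceived", 100),
    ("ex:CountryB", "ex:aidReceived", 250),
]
news_data = [
    ("ex:CountryA", "ex:headline", "Placeholder headline one"),
    ("ex:CountryB", "ex:headline", "Placeholder headline two"),
    ("ex:CountryC", "ex:headline", "Placeholder headline three"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples that agree with every field that is not None."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield s, p, o

def mash_up(graph):
    """Join aid figures to headlines wherever the subject identifier is shared."""
    rows = []
    for country, _, amount in match(graph, predicate="ex:aidReceived"):
        for _, _, headline in match(graph, subject=country, predicate="ex:headline"):
            rows.append((country, amount, headline))
    return rows

# Because both sources use the same identifiers, merging is plain concatenation.
combined = aid_data + news_data
for row in mash_up(combined):
    print(row)
```

Only the two countries that appear in both sources come out of the join. The point of the sketch is that no custom glue code is needed per dataset once everyone names things the same way.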

In addition to RPI’s participation, Kundra said he has been particularly impressed by the Sunlight Foundation’s $25,000 contest to make the best use of Data.gov datasets and by Silicon Valley entrepreneurs creating profitable businesses on the back of this data.

Worldwide earthquake tracking

Anyone can request that the government make a dataset available in the Dialogue section of the site, and users can also vote on which of those suggestions they would most like to see. So long as the release of the data doesn’t compromise citizens’ privacy or national security, the Data.gov team honors those requests in the order of popularity.

“Right after [the] Haiti [earthquake], we got feedback from the public that they wanted to see datasets around earthquakes all over the world,” said Kundra. “Part of what we’ve incorporated over the past one year is the ability to get public feedback and act on it. You can also see ratings of these datasets — overall, utility, ease of access — we’re crowdsourcing which datasets we want to democratize and the order of operation.”

We told Kundra one dataset we would like to see would expose the environmental impact of the ongoing oil spill in the Gulf of Mexico, which some experts expect to spread to East Coast beaches over the summer.

“We’ve got a lot of datasets from NORAD and the EPA that speak to that issue, and as more is generated, we will make that available on Data.gov.”

Making Data.gov itself more transparent

If Data.gov is about making the government’s data more transparent, and the site itself can be considered a form of data, then why not surface data about how people are using the site? Over the past year, Kundra and his team have added all sorts of data to that effect, all of which can be downloaded and sliced in various ways.

“We’ve driven transparency at the data element level,” said Kundra. “You can see the number of downloads per dataset … the Top 10 [non-U.S.] countries that are visiting Data.gov [Canada, the U.K. and Germany are the top three], the states that are visiting datasets [California, Georgia, the District of Columbia],” how the traffic to the site dips on the weekends, the number of datasets published by each governmental department each month, and more.
