I came across this video via Twitter from my friend Jim Hendler (blog | @jahendler). It's a walkthrough of http://ethics.data.gov by US Deputy Chief Technology Officer Chris Vein.

Walkthrough of Ethics.Data.Gov

This website brings together key open data sets such as White House visitors, lobbying, campaign donations, etc. As the URL shows, it's a subsite of the overall US open data project, http://data.gov. You can see in the image below the datasets that make up the Ethics data site:

The data is available for download, and the website offers some nifty ways of working with, visualizing, and embedding the data. For instance, I've embedded the White House Visitor data right here. Go ahead, do some searching or filtering.

You can change the column order by using the Manage button:

You can set up some fairly decent filters (is, contains, etc.) on the columns, too. Here are the visitors named Karen Lopez:

That's not me. (I seem to recall that I am mayor of the Lincoln Bedroom on Foursquare, though.) This is the problem with trying to use something like First Name and Last Name as a primary key. My data does show up in the Federal Campaign donations list, though. Only one donation, because my other donation was returned to me: "Canadians can't donate to US campaigns." Unfortunately for that candidate, they assumed that I was Canadian based on my residency, not my citizenship. They lost the money, but the other campaign got to keep mine. The entire world is one big data modeling problem, I tell ya. Get your semantics and your syntax right and you can take over the world. Or at least the US.
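To make the primary key problem concrete, here is a minimal Python sketch. The records, the city values, and the `visitor_id` field are made up for illustration; they are not taken from the actual White House visitor file. Keying rows by (first name, last name) silently merges two different people, while a unique identifier keeps them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    visitor_id: str  # hypothetical unique identifier; the real data set may not have one
    first_name: str
    last_name: str
    city: str


# Illustrative records only: two different people who happen to share a name.
visitors = [
    Visitor("v-001", "Karen", "Lopez", "Washington"),
    Visitor("v-002", "Karen", "Lopez", "Toronto"),
]


def index_by_name(rows):
    """Use (first, last) as the key: a later row overwrites an earlier one."""
    index = {}
    for row in rows:
        index[(row.first_name, row.last_name)] = row
    return index


def index_by_id(rows):
    """Use the unique identifier as the key: every person keeps their own entry."""
    return {row.visitor_id: row for row in rows}


by_name = index_by_name(visitors)
by_id = index_by_id(visitors)

print(len(by_name))  # 1: two people collapsed into one
print(len(by_id))    # 2
```

The name-keyed index reports one visitor where there were really two, and nothing in the data warns you that it happened.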

The real power in open data is being able to find correlations. As Deputy CTO Vein mentions, one could match up the data from the White House visitors, lobbyists and campaign donations to see if you find any matches. That's not bad; it's just more information. This is tough to pull off with any certainty, though, because of that dang primary key issue I mentioned above. What might help this? URIs. Or some other way of uniquely identifying people and organizations.
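Here is a rough Python sketch of the kind of cross-match Vein describes. The rows, the column names, and the `person_uri` field are all hypothetical, assumed for illustration rather than taken from the real data sets. Matching on names flags anyone who merely shares a name; matching on a shared URI only pairs records that point at the same person.

```python
# Hypothetical rows; person_uri stands in for a real unique identifier.
visitors = [
    {"first": "Karen", "last": "Lopez", "person_uri": "http://example.org/person/1"},
    {"first": "John", "last": "Smith", "person_uri": "http://example.org/person/2"},
]
donors = [
    {"first": "Karen", "last": "Lopez", "person_uri": "http://example.org/person/3"},
    {"first": "John", "last": "Smith", "person_uri": "http://example.org/person/2"},
]


def match_on(left, right, key):
    """Return (left_row, right_row) pairs whose key values are equal (a hash join)."""
    right_index = {}
    for row in right:
        right_index.setdefault(key(row), []).append(row)
    return [(l, r) for l in left for r in right_index.get(key(l), [])]


def name_key(row):
    # Case-insensitive first/last name: the fragile natural key.
    return (row["first"].lower(), row["last"].lower())


def uri_key(row):
    # A URI identifies one person, no matter how many share the name.
    return row["person_uri"]


name_matches = match_on(visitors, donors, name_key)
uri_matches = match_on(visitors, donors, uri_key)

print(len(name_matches))  # 2: the Karen Lopez pair is a false positive
print(len(uri_matches))   # 1
```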

To cross-match data, you'll need to use one of the Export methods, use the API (Socrata), or download the data to your own tools.
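If you go the API route, Socrata's SODA API exposes each dataset as a resource endpoint that accepts query parameters such as `$where`, `$limit`, and `$offset`. The Python sketch below only builds the request URL. The domain (`data.example.gov`), the dataset ID (`abcd-1234`), and the column name (`last_name`) are placeholders I've assumed, so check the dataset's own API page for the real values.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen


def soda_url(domain, dataset_id, where=None, limit=1000, offset=0):
    """Build a SODA resource URL; domain and dataset_id are placeholders."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where  # SoQL filter expression
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


def fetch_rows(url):
    """Fetch one page of JSON rows (needs network access; not called here)."""
    with urlopen(url) as response:
        return json.load(response)


url = soda_url("data.example.gov", "abcd-1234", where="last_name = 'Lopez'", limit=100)
print(url)
```

To pull a whole dataset, you would repeat the request with `offset` increased by `limit` until a page comes back empty, then run the rows through whatever matching you like.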

Data is available for download in these formats:

You can also discuss the datasets right on the site (registration required). There are only 7 datasets that are part of this ethics website, but the data stewards are eager to find out what datasets you’d like to see added. I’d also like to hear what data you think should be part of an ethics website focused on data. I’m thinking:

Expenditures that required extra approval/oversight

Travel data (who went where and why)

Some of the criticism that I've heard about data.gov is that there are too few datasets or that so much more could be provided. I've even heard complaints about money being spent on this service. As Tony Clement, Canadian MP and President of the Treasury Board (site | @tonyclementCPC), said recently about the Canadian open data initiatives: open data is about transparency. We can't wait until we have all the data, in a perfect format, to share it. He also mentioned that open data is saving the Canadian Government money by significantly reducing the cost of handling Freedom of Information Access requests. Think about it. What open data will become is self-serve FOIA. No waiting around for someone to spend weeks or months finding some data, then thousands of dollars preparing and providing it.

I'm also hoping that the move to open data will allow government data architects to influence good data management practices. Exposing the data to sunshine is going to allow us, the people who fund the data collection and processing, to point out where the data is of poor quality. The usability of the data sets, and our ability to integrate them, are going to be key to making them useful.

I’m thinking that I’d like to use some of these sets and others from data.gov for some upcoming demos.