Wanted - Data Curators to Maintain Key Datasets in High-Quality, Easy-to-Use and Open Form

Wanted: volunteers to join a team of “Data Curators” maintaining “core” datasets (like GDP or ISO-codes) in high-quality, easy-to-use and open form.

What is the project about: collecting and maintaining important and commonly-used (“core”) datasets in high-quality, standardized and easy-to-use form - in particular, as up-to-date, well-structured Data Packages. The “Core Datasets” effort is part of the broader Frictionless Data initiative.

What would you be doing: identifying and locating core (public) datasets, cleaning and standardizing the data, and making sure the results are kept up to date and easy to use.

Who can participate: anyone can contribute. Details on the skills needed are below.

Get involved: read more below or jump straight to the sign-up section.

What is the Core Datasets effort?

Summary: Collect and maintain important and commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages).

Core = important and commonly-used datasets e.g. reference data (country codes) and indicators (inflation, GDP)

Curate = take existing data and provide it in high-quality, reliable, and easy-to-use form (standardized, structured, open)

Full details: including a slide deck, at http://data.okfn.org/roadmap/core-datasets.

Live examples: you can find already-packaged core datasets at http://data.okfn.org/data/ and in “raw” form on GitHub at https://github.com/datasets/

What Roles and Skills are Needed

We need people in a variety of roles, from identifying new “core” datasets to packaging the data to performing quality control (checking metadata, etc.).

Core Skills - at least one of these skills will be needed:

Data wrangling experience. Many of our source datasets are not complex (just an Excel file or similar) and can be “wrangled” in a spreadsheet program. We therefore recommend at least one of: experience with a spreadsheet application such as Excel or (preferably) Google Docs, including use of formulas and (desirably) macros (you should at least know how you could quickly convert a cell containing ‘2014’ to ‘2014-01-01’ across 1000 rows); or coding for data processing (especially scraping) in one or more of Python, JavaScript, or Bash.

Data sleuthing - the ability to dig up data on the web (specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, and you know how to find the URL a form posts to).
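To give a feel for the level of wrangling involved, the ‘2014’ → ‘2014-01-01’ conversion mentioned above is a one-liner in code as well as in a spreadsheet. A minimal Python sketch (the list of rows here is a hypothetical year-only column, not real source data):

```python
# Convert year-only cells such as "2014" into full ISO 8601 dates
# ("2014-01-01") across a whole column - the kind of small cleanup
# step data curation regularly involves.

rows = ["2012", "2013", "2014"]  # hypothetical column of year-only values

# Strip whitespace and append "-01-01" to each year.
dates = [f"{year.strip()}-01-01" for year in rows]

print(dates)  # ['2012-01-01', '2013-01-01', '2014-01-01']
```

In a spreadsheet the equivalent would be a single formula filled down the column; in code it scales to as many rows as the file contains.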

Desirable Skills (the more the better!):

Data vs metadata: know the difference between data and metadata

Familiarity with Git (and Github)

Familiarity with a command line (preferably bash)

Know what JSON is

Mac or Unix is your default operating system (this will make access to the relevant tools that much easier)

Knowledge of Web APIs and/or HTML

Use of curl or similar command line tool for accessing Web APIs or web pages

Scraping using a command line tool or (even better) by coding yourself

Know what a Data Package and a Tabular Data Package are

Know what a text editor is (e.g. notepad, textmate, vim, emacs, …) and know how to use it (useful for both working with data and for editing Data Package metadata)
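Since several of these skills revolve around Data Packages, here is a rough sketch of the datapackage.json descriptor that defines one. The field names follow the Frictionless Data specifications, but the dataset name, file path, and columns below are invented for illustration - consult the spec for the authoritative layout:

```python
import json

# Illustrative Tabular Data Package descriptor: a datapackage.json file
# that sits alongside the CSV data it describes. The dataset name, path,
# and columns are hypothetical.
descriptor = {
    "name": "country-codes",  # hypothetical dataset name
    "resources": [
        {
            "name": "data",
            "path": "data/country-codes.csv",  # hypothetical CSV path
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "iso2", "type": "string"},
                ]
            },
        }
    ],
}

# Editing this metadata is exactly the text-editor work mentioned above.
print(json.dumps(descriptor, indent=2))
```

The descriptor is just JSON, which is why "know what JSON is" and "know how to use a text editor" appear in the skills list: most curation work is editing files like this one alongside the data itself.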

Get Involved - Sign Up Now!

We are looking for volunteer contributors to form a “curation team”.

Time commitment: members of the team commit to at least 8-16 hours per month (though this is an average - if you are especially busy with other things one month and do less, that is fine).

Schedule: there is no fixed schedule, so you can contribute at any time that is good for you - evenings, weekends, lunchtimes, etc.

Location: all activity is carried out online, so you can be based anywhere in the world.

Skills: see above.

To register your interest, fill in the following form. If you have any questions, please get in touch directly.


Want to Dive Straight In?

Can’t wait to get started as a Data Curator? You can dive straight in and start packaging the already-selected (but not packaged) core datasets. Full instructions here:

http://data.okfn.org/roadmap/core-datasets#contribute
