We had a lot of CSV data to parse and slice when we built our end of year report. We couldn’t easily parse CSV using sed , because a simple “split on commas” strategy quickly fell down. CSV has a surprisingly complicated spec that (for example) allows commas inside values. Instead, we needed something that’s built to parse CSV in all of its intricacy.

Enter CSVkit, “a suite of utilities for converting to and working with CSV”. It’s cut , sort , and grep for CSV.

Let’s try out CSVkit’s tools on this list of conferences in CSV format:

name,start_date,end_date,location,url "Rubyfuza","02/06/2014","02/08/2014","Cape Town","http://www.rubyfuza.org/" "Launch","02/26/2014","02/28/2014","SF","http://events.launch.co/festival" "EmberConf","02/23/2014","03/24/2014","Portland","http://ember.com" "NSConf","03/17/2014","03/19/2014","England","http://nsconference.com" "RailsConf","04/22/2014","04/25/2014","Chicago","http://www.railsconf.com/"

Let’s use csvcut to grab the name and location of each conference:

$ csvcut --columns name,location conferences.csv

name,location Rubyfuza,Cape Town Launch,SF EmberConf,Portland NSConf,England RailsConf,Chicago

The --columns option determines which column the command is operating on. It’s used in nearly all of the CSVkit tools.

csvsort can sort data by a specific column. Let’s sort by name:

$ csvsort --columns name conferences.csv

name,start_date,end_date,location,url EmberConf,2014-02-23,2014-03-24,Portland,http://ember.com Launch,2014-02-26,2014-02-28,SF,http://events.launch.co/festival NSConf,2014-03-17,2014-03-19,England,http://nsconference.com RailsConf,2014-04-22,2014-04-25,Chicago,http://www.railsconf.com/ Rubyfuza,2014-02-06,2014-02-08,Cape Town,http://www.rubyfuza.org/

Note that the output of csvsort , like the output from other CSVkit tools, is itself valid CSV. That means we can pipe it to other CSVkit tools, or take in other tools’ input. Here are the names and locations of the conferences, sorted by name:

$ csvcut --columns name,location conferences.csv | csvsort --columns name

name,location EmberConf,Portland Launch,SF NSConf,England RailsConf,Chicago Rubyfuza,Cape Town

csvgrep filters your data to only rows where a column matches a specific value.

Let’s search the name column for values that match the word “Conf”:

$ csvgrep --columns name --match Conf conferences.csv

name,start_date,end_date,location,url EmberConf,02/23/2014,03/24/2014,Portland,http://ember.com NSConf,03/17/2014,03/19/2014,England,http://nsconference.com RailsConf,04/22/2014,04/25/2014,Chicago,http://www.railsconf.com/

We can also use regexes with the --regex option. Let’s find conferences with at least two capital letters in their name:

$ csvgrep --columns name --regex "[A-Z].[A-Z]" conferences.csv

name,start_date,end_date,location,url EmberConf,02/23/2014,03/24/2014,Portland,http://ember.com NSConf,03/17/2014,03/19/2014,England,http://nsconference.com RailsConf,04/22/2014,04/25/2014,Chicago,http://www.railsconf.com/

Since CSVgrep is written in Python, it interprets regexes the same way Python does. If it works in Python, it should work when passed to --regex .

This covers the 80% use case of CSVkit, but if there’s something else you want to do, I strongly recommend reading the CSVkit docs. They’re exceptionally well-written and have helpful real-life examples.