One of the most frequent questions I get is, “What software do you use to visualize data?” A lot of people are excited to play with their data, but don’t know how to go about doing it or even where to start. Here are the tools I use or have used and resources that I own or found helpful for data visualization, starting with organizing the data, then graphs and charts, and lastly, animation and interaction.



Organizing the Data

Data are hardly ever in the format that you need them to be in. Maybe you got a comma-delimited file and you need it to be in XML; or you got an Excel spreadsheet that needs to go into a MySQL database; or the data are stuck on hundreds of HTML pages and you need to get it all together in one place. Data organization isn’t incredibly fun, but it’s worth getting to know these tools/languages. The last thing you want is to be restricted by data format.

PHP

PHP was the first scripting language I learned that was well-suited for the Web, so I’m pretty comfortable with it. I often use PHP to get CSV files into some XML format. The function fgetcsv() does just fine. It’s also a good hook into a MySQL database or for calling API methods.
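Here’s a rough sketch of that CSV-to-XML step. It’s written in Python with the standard csv and xml.etree.ElementTree modules rather than PHP’s fgetcsv(), so treat it as an illustration of the idea and not the script I actually use. The function name csv_to_xml, the tag names, and the sample data are all made up for the example, and it assumes the column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="rows", row_tag="row"):
    """Turn CSV text with a header row into an XML string.

    Hypothetical helper: each CSV row becomes a <row> element and each
    column becomes a child element named after its header.
    """
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # Assumes the header is a valid XML name (no spaces, etc.).
            ET.SubElement(row, column).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, just to show the shape of the output.
sample = "city,population\nSpringfield,30000\nShelbyville,25000\n"
print(csv_to_xml(sample))
```

For a real file you would read the text from disk first and probably clean up the header names before using them as tags.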

RESOURCES:

Python

Most computer science types – at least the ones I’ve worked with – scoff at PHP and opt for Python mostly because Python code is often better structured (as a requirement) and has cooler server-side functions. My favorite Python toy is Beautiful Soup, which is an HTML/XML parser. What does that mean? Beautiful Soup is excellent for screen scraping.
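To give a feel for what screen scraping looks like, here’s a minimal sketch that pulls the cells out of an HTML table. One loud caveat: it uses html.parser from Python’s standard library instead of Beautiful Soup, so it runs without installing anything. Beautiful Soup makes this kind of thing shorter and far more forgiving of messy markup. The class name TableScraper, the function scrape_table, and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                # Tolerate a cell that shows up without an enclosing <tr>.
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Trim whitespace once the whole cell has been read.
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def scrape_table(html):
    """Return table cells as a list of rows (hypothetical helper)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page fragment standing in for a real web page.
page = """
<table>
  <tr><th>Year</th><th>Value</th></tr>
  <tr><td>2007</td><td> 12 </td></tr>
</table>
"""
print(scrape_table(page))
```

Once the rows are out of the HTML, they can go straight into a CSV file or a database.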

RESOURCES:

MySQL

When I have a lot of data – like on the magnitude of tens to hundreds of thousands of records – I use PHP or Python to stick it in a MySQL database. MySQL lets me subset the data pretty much any way I please.
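As a small sketch of the load-then-subset workflow, the example below uses Python’s built-in sqlite3 module with an in-memory database as a stand-in for MySQL, since that runs with no server to set up. The SQL would look much the same against MySQL, though the connection code and placeholder style depend on the driver you pick. The table, the function load_and_subset, and the numbers are all invented for the example.

```python
import sqlite3

# Invented sample rows: (group label, year, value).
rows = [
    ("A", 2006, 36.1),
    ("A", 2007, 36.4),
    ("B", 2006, 19.1),
    ("B", 2007, 19.2),
]


def load_and_subset(rows, label, min_year):
    """Load rows into a throwaway table, then pull out one subset."""
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    conn.execute(
        "CREATE TABLE measurements (label TEXT, year INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT year, value FROM measurements "
        "WHERE label = ? AND year >= ? ORDER BY year",
        (label, min_year),
    )
    result = cursor.fetchall()
    conn.close()
    return result


print(load_and_subset(rows, "B", 2007))
```

Changing the WHERE clause is all it takes to slice the data a different way, which is the whole appeal.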

RESOURCES:

R

Ah, good old R. It’s what statisticians use, and pretty much nobody else. Everyone else has it installed on their computer, but hasn’t gotten around to learning it. I use R for analysis. Sometimes though, I use it to extract useful subsets from a dataset when the conditions are more complex than those I’d use with MySQL, and then export the subsets as CSV files.

RESOURCES:

Microsoft Excel

We all know this one. I use Excel from time to time when my dataset is small or if I’m in a point-and-click mood.

Charts and Graphs





Alright, the data are processed, formatted, and ready to go. Now it’s time to visualize. The software I use for static charts and graphs depends on the task at hand, so I try not to limit myself to any one piece of software. For example, R is good for quick results, but no good for a Web application.

Adobe Illustrator

I use Adobe Illustrator for publication-level graphics. I learned how to use it when I was at The Times out of necessity and have been enjoying it since. You can manipulate every element of a graph with a simple click and a drag – which can be a blessing and a curse.

RESOURCES:

R

If you have a particular type of (non-animated, non-interactive) statistical visualization in mind, R has probably got it. R is free with countless libraries available. If you can’t find a library to suit your needs, you can always script it yourself. One cool thing about R is that you can save your graphics as PDFs and then polish them in Adobe Illustrator.

RESOURCES:

PHP Graphics Library

I’ve only had limited experience with the PHP GD library. There are several PHP graphing packages available, but I haven’t found one that I liked a whole lot, so I’m usually more satisfied drawing my own graphs with the GD library. The Sparklines PHP graphing library isn’t half bad either.

RESOURCES:

HTML + CSS + Javascript

You can do a surprising amount with some simple HTML and CSS. You can make graphs and of course tables, as well as control colors and sizes. For example, a lot of the tag clouds you see on the Web are just HTML and CSS. Throw Javascript into the mix and you’ve got yourself a party, i.e., interaction capabilities.
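To show how little is needed for a tag cloud, here’s a minimal sketch that generates one as plain HTML with inline CSS, sizing each tag by how often it appears. It’s written in Python purely to produce the markup. The function name tag_cloud, the pixel range, the linear scaling, and the counts are all assumptions made up for the example.

```python
from html import escape


def tag_cloud(counts, min_px=12, max_px=36):
    """Return HTML spans whose font size scales linearly with each count."""
    if not counts:
        return ""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    parts = []
    for tag in sorted(counts):
        size = min_px + (counts[tag] - lo) * (max_px - min_px) / span
        parts.append(
            f'<span style="font-size:{round(size)}px">{escape(tag)}</span>'
        )
    return " ".join(parts)


# Invented tag counts.
counts = {"maps": 10, "charts": 4, "r": 1}
print(tag_cloud(counts))
```

The most common tag gets the largest font and the rarest gets the smallest. Everything else about the look (color, spacing, hover effects) can be handled in a stylesheet.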

RESOURCES:

Flash/Actionscript

Flash and Actionscript are better known for animating and moving data, but they can be used for static stuff too. They’re pretty good if you want to add interaction to your visualization, like highlighting or filtering. I’ve done some stuff from scratch and also played around with Flare, the Actionscript visualization toolkit.

RESOURCES:

Microsoft Excel

It’s pretty rare that I use Excel for graphics. If I need something really quick though and the data are already in an Excel spreadsheet, I’ll click that graph button.

RESOURCES:

Animating the Data

There are several options for creating animated and interactive data visualizations, but these are the only ones I use (and, for the most part, they dominate what you see on the Web).

Processing

Yeah, it’s called Processing. I’ve seen mostly designers use it, but there’s no reason it can’t be used elsewhere. Processing uses a canvas metaphor where you draw and make sketches and then get a Java applet out of it. Processing was created to make programmatic goodness available to non-programmers.

RESOURCES:

Flash/Actionscript

Flash and Actionscript have been my point of interest lately, mostly because the Java applet is dead as far as the Web is concerned. The interactive/animated visualizations you see from places like The New York Times, Stamen Design, and web applications are usually implemented with Flash and Actionscript. Not sure if it’s Flash? The telltale sign is a simple right click on whatever you’re looking at. Take a look at my previous post on How to Learn Actionscript for Data Visualization for more details.

RESOURCES:

Phew, that was a lot. I started this out as a list of 10 tools and resources, and it just kept growing. I didn’t realize I use so many things. It just goes to show that for any given job, there’s a tool that’s right and one that’s not.

The amazing thing is that these are only the tools I use. There are lots of others out there. Do you use something that’s not on the list to visualize data or know of another resource that would be useful?