Visualization is easy, right? After all, it’s just some colorful shapes and a few text labels. But things are more complex than they seem, largely because of the ways we see and digest charts, graphs, and other data-driven images. While scientifically backed studies do exist, there is still a lot we don’t know about how and why visualization works. To help you make better decisions when visualizing your data, here’s a brief tour of the research.

The Early Years of Understanding Data

While the early days of visualization go back over 200 years, actual research into how it works only started in the 1960s. Jacques Bertin’s Sémiologie Graphique (Semiology of Graphics), published in 1967, was the first systematic treatment of the different ways graphical representations encode data. Bertin coined many terms of the trade, such as the mark, the basic unit of every visualization: a bar, a line, a circle sector. He also defined a number of retinal variables, the visual properties we use to express the data, such as color, size, and position.

In the early 1980s, Bertin’s work was picked up by researchers in statistical graphics and the nascent field of visualization (which didn’t quite have its name yet). William Cleveland and Robert McGill ran experiments to determine which of Bertin’s retinal variables were best suited for particular types of data, while Jock Mackinlay built a system that applied Bertin’s framework and their findings to generate visualizations from data automatically.

Thanks to Cleveland and McGill, we know that our perception is most precise when judging the position of a mark, followed closely by our ability to perceive length. We are less adept at perceiving area and orientation, and our ability to distinguish colors is worse still. We can see tiny differences in direction between lines that are almost, but not exactly, parallel, yet we have a hard time quantifying an angle to say what percentage of a pie chart a slice represents. We can tell fewer than a dozen colors apart when their hues are very distinct, and we can precisely compare shades placed right next to each other; but move them apart and surround them with very different colors, and it all goes out the window.

This may all seem interesting, but its practical uses are not obvious. To turn the theory into practice, Mackinlay’s system automatically assigned data fields to visual variables in a way that optimized readability. Most visualization tools today still don’t offer that kind of intelligence, though Tableau’s Show Me feature is built on a very similar idea.
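To make the idea concrete, here is a minimal sketch of ranking-based channel assignment: the most important field gets the most accurately perceived visual variable still available for its data type. This is not Mackinlay’s actual system, and the rankings below are simplified and illustrative, loosely following Cleveland and McGill’s results for quantitative data.

```python
# Toy sketch of effectiveness-ranked channel assignment.
# The orderings are illustrative, not the published rankings.
RANKINGS = {
    "quantitative": ["position", "length", "angle", "area", "color value"],
    "nominal": ["position", "color hue", "shape", "area"],
}

def assign_channels(fields):
    """fields: list of (name, type) pairs in decreasing order of importance.

    Greedily gives each field the highest-ranked channel that is both
    appropriate for its data type and not yet taken.
    """
    used = set()
    assignment = {}
    for name, dtype in fields:
        for channel in RANKINGS[dtype]:
            if channel not in used:
                assignment[name] = channel
                used.add(channel)
                break
    return assignment

print(assign_channels([("price", "quantitative"),
                       ("region", "nominal"),
                       ("volume", "quantitative")]))
# {'price': 'position', 'region': 'color hue', 'volume': 'length'}
```

The key design point is the greedy order: because fields are processed by importance, the most important field always claims position, the most accurate channel for either data type.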

More Knowledge, More Questions

A lot has happened since the 1980s, but there has been something of a standstill when it comes to understanding the basics. Many questions remain open today, and we are also coming to see the gaps and problems in some of the earlier work.

As a case in point, Cleveland promoted an idea that he called banking to 45 degrees. The idea is simple: in a line chart, the average slope should be 45 degrees. That makes intuitive sense, since very steep charts tend to look overly dramatic and very flat ones make it hard to see any change in the data at all. Cleveland’s recommendation was based on research on how well we are able to compare the slopes of lines. He found that accuracy was highest when the lines being compared had an average inclination of 45 degrees.
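One common way to operationalize the rule is the median-absolute-slope heuristic: choose the plot’s width-to-height ratio so that the median line segment is drawn at 45 degrees. The sketch below is mine, under that assumption; it is one of several banking heuristics, not Cleveland’s own code.

```python
def bank_to_45(xs, ys):
    """Return a width/height aspect ratio that banks the line to 45 degrees.

    Uses the median-absolute-slope heuristic: in normalized (unit-square)
    coordinates, a segment's drawn slope is its data slope times
    height/width, so setting width/height to the median absolute slope
    makes the median segment's drawn angle exactly 45 degrees.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Absolute slopes of consecutive segments in normalized coordinates.
    slopes = sorted(
        abs((y1 - y0) / y_range) / abs((x1 - x0) / x_range)
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
        if x1 != x0
    )
    n = len(slopes)
    return (slopes[n // 2] if n % 2
            else (slopes[n // 2 - 1] + slopes[n // 2]) / 2)

bank_to_45([0, 1, 2, 3], [0, 2, 4, 6])  # → 1.0: linear data already
# fills the unit square diagonally, so a square plot banks it to 45°.
```

A steep series yields a large median slope and therefore a wide, short plot; a flat series yields a tall, narrow one.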

But it turns out that this is not the whole story. There were limitations in Cleveland’s study that made 45 degrees look like the best option; shallower angles actually appear to work better. This was shown in a research paper that Justin Talbot, John Gerth, and Pat Hanrahan published in October 2012 at the annual VisWeek conference. The left line graph below is closer to 45 degrees on average, but the right one, while shallower, has fewer areas that produce large errors (indicated by the dark red color).

There is more. My former student Caroline Ziemkiewicz and I found that there is a potential interaction between the visual metaphor used to show data and the linguistic metaphor used to ask a question. We found this when looking at visualizations of trees, or hierarchies. The two most popular visualization techniques for this type of data, treemaps and node-link diagrams, differ in the way they show the hierarchy: node-link diagrams use levels (or “above-ness”), while treemaps use nesting. A question asked using a levels metaphor (“Which of the nodes below node D …”) is easier to answer with a node-link diagram, whose visual metaphor matches, while a question asked using containment (“Which of the directories inside directory D …”) works better with a treemap. The different metaphors are illustrated below, with treemaps on the left and node-link diagrams on the right.

We have only scratched the surface here; many other metaphors are at work in visualization, whether obvious or not. Barbara Tversky and Jeff Zacks found in the early 2000s that lines imply transitions whereas bars imply individual values. The seemingly simple choice between a bar chart and a line chart thus has implications for how we perceive the data.

Bizarrely, so does gravity. In our work on metaphors, Ziemkiewicz and I found that people interpreted round shapes as unstable because, they said, the shapes might roll away. But for something to roll, a force must cause the movement. Studying this effect further, we found that the points in a scatterplot appear to attract each other and to be pulled down by gravity: we remember points not where they actually are in the plot, but shifted toward clusters and drifted slightly downward.

Findings and distinctions in visualization can be subtle, but they can have a profound impact on how well we can read the information and how we interpret it. There is much more to be learned about how visualization works and how best we can represent, analyze, and communicate data.