By David Mendoza - Monday, May 4, 2015

This bar chart recently appeared in the Washington Post. It attempts to compare the ages of several presidential candidates with previous presidents. It’s a fairly innocuous graphic, except for one major problem: The baseline of the chart starts at 40 — not zero. This is an egregious error, which greatly distorts the data presented in the chart.

Since bar charts visually encode data through length, starting the chart at 40 exaggerates how large the age differences between candidates and presidents actually are. By doing this, the designer also violates one of Edward Tufte’s principles of graphical integrity. In “The Visual Display of Quantitative Information,” Tufte wrote, “The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.” That’s not even remotely true of this chart.
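To make the distortion concrete, here is a minimal Python sketch of how a truncated baseline changes the ratio between two bar lengths. The function name `bar_length` is hypothetical, and the two ages (77 and 45) are the Reagan and Rubio figures used elsewhere in this post.

```python
def bar_length(value, baseline):
    """Drawn length of a bar, in axis units, when the axis starts at `baseline`."""
    return value - baseline

older, younger = 77, 45  # ages taken from this post (Reagan, Rubio)

for baseline in (0, 40):
    ratio = bar_length(older, baseline) / bar_length(younger, baseline)
    print(f"baseline {baseline}: longer bar is {ratio:.2f}x the shorter one")
```

With a zero baseline the longer bar is about 1.71 times the shorter one, which matches the ratio of the ages. With a baseline of 40 it is 7.40 times as long, close to the roughly 7.4x ratio of the bars measured on the chart (402 mm versus 54 mm).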

In the figure above, I condensed the original version from the Washington Post to a few bars and measured the lengths of the longest and shortest bars in millimeters. I did this to calculate the Lie Factor, a measure Tufte devised to quantify how much a graphic distorts the data it presents. The Lie Factor is calculated by dividing the size of the effect shown in the graphic by the size of the effect in the data.

In this instance, the bars displaying the ages of Ronald Reagan and Marco Rubio are 402 mm and 54 mm, respectively. This means the increase shown in the graphic works out to 644.4%. However, this seriously overstates the effect in the actual data. Reagan was 77 when he was inaugurated and Rubio would be 45 on his hypothetical Inauguration Day. That’s an increase of only 71.1% from Rubio’s age to Reagan’s. The Lie Factor of the bar chart, then, is 644.4 divided by 71.1, or about 9.1. As Tufte wrote, a graphic with a Lie Factor larger than 1.05 represents a “substantial distortion.”
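The arithmetic can be checked with a short Python sketch. The helper names `percent_change` and `lie_factor` are my own, not Tufte's. The inputs are the bar lengths and ages quoted above.

```python
def percent_change(first, second):
    """Percentage increase from `first` to `second`."""
    return (second - first) / first * 100

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    shown = percent_change(graphic_first, graphic_second)
    actual = percent_change(data_first, data_second)
    return shown / actual

shown = percent_change(54, 402)    # bar lengths in mm (Rubio, Reagan)
actual = percent_change(45, 77)    # ages in years (Rubio, Reagan)
factor = lie_factor(54, 402, 45, 77)

print(f"effect shown in graphic: {shown:.1f}%")
print(f"effect in data: {actual:.1f}%")
print(f"Lie Factor: {factor:.1f}")
```

A chart drawn in exact proportion to the data would give a Lie Factor of 1.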

To put it another way, the effect shown in the chart is commensurate with data that would have Rubio’s age as 45 and Reagan’s age as 335.
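That figure of 335 comes from scaling Rubio's age by the ratio of the two measured bar lengths. Here is a small sketch of the calculation, with `implied_value` as a hypothetical helper name.

```python
def implied_value(data_small, graphic_small, graphic_large):
    """Value the larger bar would represent if bar length were proportional to the data."""
    return data_small * graphic_large / graphic_small

# Rubio's age is 45; the bars measure 54 mm (Rubio) and 402 mm (Reagan).
implied_age = implied_value(45, 54, 402)
print(f"Reagan's implied age: {implied_age:.0f}")
```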

I present a more effective way to visualize this data below.

The original designer’s instincts were correct. The area between 0 and 40 on each bar is superfluous. If we made a bar chart that included this area, it would inhibit the ability of the viewer to easily compare where each bar ends. However, the solution the designer used (i.e., truncating the bars) doesn’t work — as I’ve shown above.

Instead, he should have used a dot chart, since it doesn’t require a zero baseline. Unlike bar charts, dot charts don’t rely on the viewer to compare the length of each item. Rather, viewers compare the position of each dot along a common scale. In “The Elements of Graphing Data,” William Cleveland points to this feature of the dot chart as the reason why it’s superior to the bar chart. “Ordinary bar charts,” he wrote, “have not been used so far in this book.” Instead, he explains that he used dot charts because “they are a more flexible display” and “they do not require a meaningful baseline on the scale line.”
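As a rough illustration of position encoding, the sketch below prints a plain-text dot chart. It is not the code behind my chart. The `dot_row` helper, the 40-to-80 axis range, and the 40-character width are all assumptions chosen for the example, and only the two ages quoted in this post are plotted.

```python
def dot_row(label, value, axis_min, axis_max, width=40):
    """Return one text row with a dot placed by its position along a shared scale."""
    if not axis_min <= value <= axis_max:
        raise ValueError("value lies outside the axis range")
    # Map the value to a column; only position matters, so the axis need not start at zero.
    column = round((value - axis_min) / (axis_max - axis_min) * (width - 1))
    return f"{label:<8}" + " " * column + "o"

ages = {"Rubio": 45, "Reagan": 77}  # ages quoted in this post

for name, age in ages.items():
    print(dot_row(name, age, axis_min=40, axis_max=80))
```

Because nothing is drawn between the axis origin and the dot, there is no length for the eye to misread. The viewer only judges where each dot sits along the shared scale.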

I will note that my example isn’t exactly a “dot” chart because I decided to use the age of each candidate in place of dots. This further improves the ability of the viewer to make meaningful comparisons. Additionally, I only included the ages of the youngest, oldest, and current presidents on the chart in order to save space.
