Bert Gunter writes:

This link is to an online CNN “analysis” of school shootings in the U.S. I think it is a complete mess (you may disagree, of course).

The report in question is by Christina Walker and Sam Petulla.

Gunter lists two problems:

1. Graph labeled “Race Plays A Factor in When School Shootings Occur”:

AFAICT, they are graphing number of casualties vs. time of shooting. But they should be graphing the number of shootings vs. time; in fact, as they should be comparing incident *rates* vs. time by race, they should be graphing the proportion of each category of schools that have shooting incidents vs. time (I of course ignore more formal statistical modeling, which would not be meaningful for a mass market without a good deal of explanatory work).

2. Graph of “Shootings at White Schools Have More Casualties”:

The area of the rectangles in the graph appears to be proportional to the casualties per incident, but with both different lengths and widths, it is not possible to glean clear information by eye (for me anyway). And aside from the obvious huge 3 or 4 largest incidents in the White Majority schools, I do not see any notable differences by category. Paraphrasing Bill Cleveland, the graph is a puzzle to be deciphered: it appears to violate most of the principles of good graphics. Moreover, it is not clear that casualties per incident is all that meaningful anyway. Maybe White schools involved in shootings just have more students so that it’s easier for a shooter to amass more casualties. The “appropriate” analysis is: “Most school shootings everywhere involve 1 or 2 people, except for a handful of mass shootings at White schools.” The graph is a deliberate attempt to mislead, not just merely bad.

Unfortunately, as you are well aware, due to intense competition for viewer eyeballs, both formerly print-only (NYT, WSJ, etc.) and purely online news media are now full of such colorful, sometimes interactive, and increasingly animated data analyses whose quality is, ummm… rather uneven. So it is impossible to discuss the statistical deficiencies, and the possible political/sociological consequences of such mass media data analytical malfeasance, in all of it.

My reply:

I think the report is pretty good. Sure, some of the graphs don’t present data patterns so clearly, but as Antony Unwin and I wrote a few years ago, infovis and statistical graphics have different goals and different looks. In this case, I think these are the main messages being conveyed by these plots:

– There have been a lot of school shootings in the past decade.

– They’ve been happening all over the place, at all different times and to all different sorts of students.

– This report is based on real data that the researchers collected.

Indeed, at the bottom of the report they provide a link to the data on Github.

Regarding Gunter’s points 1 and 2 above, sure, there are other ways of analyzing and graphing the data. But (a) I don’t see why he says the graph is a deliberate attempt to mislead, and (b) I think the graphs are admirably transparent.
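To make point 1 concrete, here is a minimal sketch of the rate comparison Gunter describes: for each category of schools and each time block, the share of that category's schools that had at least one shooting, rather than a count of casualties. The function name, the category labels, and all the numbers are made up for illustration; none of them come from the report or its data.

```python
def incident_rates(incidents, schools_per_category):
    """Share of each category's schools with at least one incident, by time block.

    incidents: iterable of (school_id, category, time_block) records.
    schools_per_category: total number of schools in each category (the denominator).
    """
    schools_hit = {}
    for school_id, category, time_block in incidents:
        # Collect distinct schools, so repeat incidents at one school count once
        # and casualties do not enter the calculation at all.
        schools_hit.setdefault((category, time_block), set()).add(school_id)
    return {
        (category, time_block): len(ids) / schools_per_category[category]
        for (category, time_block), ids in schools_hit.items()
    }


# Hypothetical records and school counts, for illustration only.
incidents = [
    ("s1", "category_a", "daytime"),
    ("s1", "category_a", "daytime"),  # same school again: still one school hit
    ("s2", "category_a", "evening"),
    ("s3", "category_b", "daytime"),
]
schools_per_category = {"category_a": 1000, "category_b": 500}

rates = incident_rates(incidents, schools_per_category)
```

With these invented inputs, one daytime school out of 500 in category_b gives a higher rate than one out of 1000 in category_a, even though category_a has more incidents. The denominator is what a casualty count leaves out.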

Consider for example the first two graphs in the report, here:

and here:

Both these graphs have issues, and there are places where I would’ve made different design choices. For example, I think the color scheme is confusing in that the same palette is used in two different ways. Also, I think it’s just wack to make three different graphs for early morning, daytime, and late afternoon and evening (and to compress the time scales for some of these). It’s also a mistake to compress Sat/Sun into one day: distorting the scale obscures the data. Instead, they could simply have rotated that second graph 90 degrees, running day of week down from Monday to Sunday on the vertical axis and time of day from 00:00 to 24:00 on the horizontal axis. One clean graph would then display all the shootings and their times.

The above graph has a problem that I see a lot in data graphics, and in statistical analysis more generally, which is that it is overdesigned. The breaking up into three graphs, the distortion of the hour and day scales, the extraneous colors (which convey no information, as time is already indicated by position on the plot) all just add confusion and make a simple story look more complicated.
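The single-panel alternative could look something like the sketch below: one row per day, Monday at the top and Sunday at the bottom, an undistorted 0 to 24 hour axis, and one color. This assumes matplotlib is available for the plotting step; the function names, the output filename, and the three example records are my own inventions and are not taken from the report's data.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def to_plot_coords(shootings):
    """Map (day_name, 'HH:MM') pairs to (hour_of_day, day_index) points."""
    points = []
    for day, clock in shootings:
        hours, minutes = clock.split(":")
        x = int(hours) + int(minutes) / 60  # linear 0-24 scale, no compression
        y = DAYS.index(day)                 # Saturday and Sunday each get their own row
        points.append((x, y))
    return points


def plot_shootings(shootings, path="shootings_by_day_and_hour.png"):
    """Draw every shooting as one dot on a single day-by-hour panel."""
    # Imported here so the coordinate helper above runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    points = to_plot_coords(shootings)
    fig, ax = plt.subplots(figsize=(8, 3))
    # One color only: position already encodes day and time.
    ax.scatter([x for x, _ in points], [y for _, y in points], color="black", alpha=0.5)
    ax.set_xlim(0, 24)
    ax.set_xticks(range(0, 25, 3))
    ax.set_xlabel("Time of day (hour)")
    ax.set_yticks(range(len(DAYS)))
    ax.set_yticklabels(DAYS)
    ax.set_ylim(6.5, -0.5)  # reversed limits put Monday at the top
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


# Made-up records for illustration only.
example = [("Monday", "07:45"), ("Friday", "15:30"), ("Saturday", "22:10")]
coords = to_plot_coords(example)
```

Calling `plot_shootings(example)` would write the figure to disk. With real data the dots will pile up during school hours, which is where the partial transparency helps.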

So, sure, the graphs are not perfect. Which is no surprise. We all have deadlines. My own published graphs could be improved too.

The thing I really like about the graphs in Walker and Petulla’s report is that they are so clearly tied to the data. That’s important.

If someone were to do more about this, I think the next step would be to graph shootings and other violent crimes that occur outside of schools.