Charting Death: Reality vs Reported

Background

How do people die? How do people think we die? And is there a difference?

Well, it turns out there's a fascinating study conducted by Paul Slovic and Barbara Combs where they looked at how often different types of deaths were mentioned in the news. They then compared the frequency of news coverage with the actual frequency of deaths from each cause. The results are what one might cynically expect:

"Although all diseases claim almost 1,000 times as many lives as do homicides, there were about three times as many articles about homicides than about all diseases. Furthermore, homicide articles tended to be more than twice as long as articles reporting deaths from diseases and accidents."

Since 1979, when the original Combs and Slovic study was conducted, there have been several more empirical analyses which have found largely similar results. (Notably, here and here.)

For our final capstone project for the fantastic Bradley Voytek's COGS 108 course at UCSD, we thought it would be interesting to have our own go at examining potential disparities between actual deaths and their corresponding media attention. For anyone curious about any of the steps in this project, the original data and code we used for this analysis are available here on GitHub.

Data: The Gathering

For our project, we looked at four sources:

- The Centers for Disease Control and Prevention’s WONDER database for public health data (1999-2016)
- Google Trends search volume (2004-2016)
- The Guardian’s article database (1999-2016)
- The New York Times’ article database (1999-2016)

For all of the above data, we looked at the top 10 largest causes of mortality, as well as terrorism, overdoses, and homicides, three other causes of death which we believe receive a lot of media attention.

In all the charts below, we’ve normalized each value by dividing it by the sum of all values across causes for that year. Thus, the values given represent relative shares rather than absolute counts. This is mainly to make comparisons between distributions easier, as what we really care about here is the proportionality in representation across different sources.

First off, as our “ground truth”, we’ll look at the causes of mortality as given by the CDC.

[Interactive chart: CDC share of deaths by cause, per year from 1999 and averaged over all years]

Immediately, we can see that cancer and heart disease take up a major chunk of all deaths, each responsible for around 30% of the total death count. On the graph, everything is visible except for terrorism, which is so small it doesn’t show up unless we zoom in (in the interactive chart, you can do this by clicking on different causes in the legend to “strike them out” from the graph).

Next, here’s the Google Trends data. (Because Google Trends didn’t start until 2004, we alas aren’t able to explore search data from 1999-2003.)

[Interactive chart: Google Trends share of search volume by cause, per year from 2004 and averaged over all years]

The two major changes here seem to be that heart disease is underrepresented and terrorism is very much overrepresented. Suicide also appears to have several times its actual share of deaths. The rest of the causes look like they’re within the right order of magnitude compared with the CDC data.

Now here’s the data for The Guardian and The New York Times. We put them both below as they appear quite similar. (We’ll be able to quantify the degree of similarity in the next section.)

[Interactive charts: The Guardian and The New York Times share of articles by cause, per year from 1999 and averaged over all years]

Here, we see that terrorism, cancer, and homicides are the causes of death most mentioned in the newspapers. Though the share that cancer occupies seems largely proportional, the shares given to homicides and terrorism appear grossly overrepresented, given their respective shares of total deaths.

Finally, all of the above data was presented in one graph, so we can see the sources side-by-side.

[Chart: relative share by cause for the CDC, Google Trends, The Guardian, and The New York Times]
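The per-year normalization described in this section can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `relative_shares` and the example numbers are made up for the sketch, and it assumes each source provides one raw value (death count, search volume, or article count) per cause for a given year.

```python
def relative_shares(counts_by_cause):
    """Convert one year's raw values into shares that sum to 1."""
    total = sum(counts_by_cause.values())
    if total <= 0:
        raise ValueError("need a positive total to normalize")
    return {cause: value / total for cause, value in counts_by_cause.items()}


# Hypothetical values for a single source and year (not taken from the project data).
example_year = {"cancer": 550, "heart disease": 600, "homicide": 50}
shares = relative_shares(example_year)

for cause, share in shares.items():
    print(f"{cause}: {share:.3f}")
```

Because every source is reduced to shares in this way, a death count and an article count become directly comparable even though their raw scales differ.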

Data Analysis

After our cursory glance at the data, we have reason to think that the distributions across these causes of death are not in fact the same for each source (CDC, Google Trends, The Guardian, and The NYT).

To examine whether or not these distributions are the same, we’ll use a χ² (chi-squared) test for homogeneity, which tells us whether a categorical variable is distributed the same way in two groups.

We’ll run χ² tests with these four pairings of our data:

1. CDC and Google Trends
2. CDC and The Guardian
3. CDC and The New York Times
4. The Guardian and The New York Times

Here are the results:

| Data Compared | χ² Test Statistic | p-value |
| --- | --- | --- |
| CDC and Google Trends | 49.242 | 1.897×10⁻⁶ |
| CDC and The Guardian | 1198.758 | 3.205×10⁻²⁴⁹ |
| CDC and The NYT | 1204.499 | 1.860×10⁻²⁵⁰ |
| The Guardian and The NYT | 0.056 | 0.999 |

As we guessed, the χ² values for tests 1-3 are indeed quite high. Especially for tests 2 and 3, the p-value is incredibly low, meaning that we would basically never expect to see results of this kind if our null hypothesis (that the newspaper’s categorical distribution matches the CDC’s distribution) were true.

We can also see that the comparison between The NYT and The Guardian has a very low χ² value and a p-value near 1, so the data are consistent with both coming from the same distribution.

So now we have evidence that our two media sources are roughly similar to each other, and that their shared distribution differs from how causes of death actually affect the population.

During our preliminary graphing of the data, we noted that terrorism and homicides appeared overrepresented in the news data, and that heart disease appeared underrepresented. Below, we’ve listed the factor by which each of the 13 causes of death is over- or underrepresented in the newspapers.

(For the Factor of Difference column, we took the larger value of Avg Deaths Prop. / Avg News Prop. and Avg News Prop. / Avg Deaths Prop. and added "Over" or "Under" to denote whether the cause was over- or underrepresented relative to the Avg Deaths Proportion value.)

| Cause of Death | Avg Deaths Proportion | Avg Newspaper Proportion | Factor of Difference |
| --- | --- | --- | --- |
| Alzheimer's Disease | 0.036 | 0.009 | 4.172 Under |
| Cancer | 0.279 | 0.171 | 1.631 Under |
| Car Accidents | 0.057 | 0.025 | 2.285 Under |
| Diabetes | 0.035 | 0.028 | 1.260 Under |
| Heart Disease | 0.305 | 0.029 | 10.388 Under |
| Homicide | 0.008 | 0.251 | 30.796 Over |
| Kidney Disease | 0.023 | 0.002 | 10.793 Under |
| Lower Respiratory Disease | 0.064 | 0.018 | 3.520 Under |
| Overdose | 0.014 | 0.002 | 7.143 Under |
| Pneumonia & Influenza | 0.028 | 0.041 | 1.486 Over |
| Stroke | 0.053 | 0.059 | 1.119 Over |
| Suicide | 0.017 | 0.118 | 6.878 Over |
| Terrorism | 0.000 | 0.306 | 3906.304 Over |

Here's a graphical representation of the Avg News Prop. / Avg Deaths Prop. factors. (Note that the y-axis is log-scaled.)

[Chart: Avg News Prop. / Avg Deaths Prop. by cause of death, log-scaled y-axis]

The most striking disparities here are those of kidney disease, heart disease, terrorism, and homicide. Kidney disease and heart disease are both about 10 times underrepresented in the news, while homicide is about 31 times overrepresented, and terrorism is a whopping 3,900 times overrepresented. Kidney disease is a little surprising; we had guessed at the other three, but it was only by calculating the factor here that this disparity became visible.
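The two calculations in this section, the χ² test for homogeneity and the factor of difference, can be sketched in plain Python. This is an illustrative sketch and not the project's code: the function names are hypothetical, the example counts are made up, and the p-value helper uses a closed form that only holds for an even number of degrees of freedom (comparing two sources over 13 causes gives 12). In practice a library routine such as `scipy.stats.chi2_contingency` would be the more usual choice.

```python
import math


def chi2_homogeneity(row_a, row_b):
    """Chi-squared statistic and degrees of freedom for a 2 x k table."""
    if len(row_a) != len(row_b):
        raise ValueError("both sources need the same categories")
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b  # assumption: every category has a positive column total
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(row_a) - 1


def chi2_sf_even(stat, dof):
    """Upper-tail p-value; this closed form is valid only for even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form needs a positive, even dof")
    half = stat / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (stat/2)**i / i!
        total += term
    return math.exp(-half) * total


def factor_of_difference(deaths_prop, news_prop):
    """Larger of the two ratios, labelled relative to the deaths share."""
    if deaths_prop <= 0 or news_prop <= 0:
        raise ValueError("both proportions must be positive")
    if news_prop >= deaths_prop:
        return news_prop / deaths_prop, "Over"
    return deaths_prop / news_prop, "Under"


# Made-up counts for two sources over three categories.
stat, dof = chi2_homogeneity([10, 20, 30], [30, 20, 10])
p_value = chi2_sf_even(stat, dof)
print(stat, dof, p_value)

# Rounded heart disease shares from the table above.
print(factor_of_difference(0.305, 0.029))
```

As a sanity check, passing the reported CDC versus Google Trends statistic (49.242) to `chi2_sf_even` with 12 degrees of freedom gives roughly 1.9×10⁻⁶, in line with the table. Factors computed from the rounded table proportions differ slightly from the listed ones (about 10.5 rather than 10.388 for heart disease), presumably because the listed factors were computed from unrounded values; terrorism's rounded deaths share of 0.000 cannot be used at all.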

Conclusion

We set out to see if the public attention given to causes of death was similar to the actual distribution of deaths. After looking at our data, we found that, as in earlier studies, the attention given by news outlets and Google searches does not match the actual distribution of deaths.

This suggests that general public sentiment is not well-calibrated with the ways that people actually die. Heart disease and kidney disease appear largely underrepresented in the sphere of public attention, while terrorism and homicides capture a far larger share of attention than their share of deaths.

Though we have shown a disparity between attention and reality, we caution against drawing immediate conclusions for policy. One major issue we have failed to address here is that of tractability; just because a cause of death claims more lives does not mean that it is easily addressable. A more nuanced look at which causes of mortality to prioritize would likely call for a fuller evaluation framework.