[This article was first published on Wiekvoet, and kindly contributed to R-bloggers.]

There are a number of online efforts to register victims of shootings. Shootingtracker tries to register all mass shootings, defined as those with four or more victims. Slate had the Gun Death Tally (GDT), which counted gun deaths starting at Newtown and running through December 31, 2013. That project is continued in the Gun Violence Archive.

In this post I compare the 2013 data of shootingtracker and GDT with CDC data from 2009 to 2011. Shootingtracker and GDT are similar to each other, but the CDC data have much higher counts.

Shootingtracker and Gun Death Tally

Shootingtracker has data on shootings with four or more victims. Since not everybody who is shot dies, these data are not directly comparable to CDC data. However, by restricting the selection to shootings with four or more killed, it is still possible to make a comparison with GDT data. The GDT data are not organized by incident, but rather by victim. It also appears that the state given is not the state of the incident, but rather the residence of the victim. In addition, the dates used in GDT and shootingtracker are not the same. Since both GDT and shootingtracker have web links for each record, it is possible to compare them manually. After this check there were 53 incidents in total: 49 from shootingtracker, 46 from GDT, and 42 in common. Based on these data, a capture-recapture formula estimates approximately 54 incidents.
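The estimate of about 54 can be reproduced with a few lines of R. This is a minimal sketch that assumes the simple two-source Lincoln-Petersen estimator; the post does not say which capture-recapture formula was used, and the variable names are only illustrative.

```r
# Incident counts as reported in the text
n_shootingtracker <- 49   # incidents found in shootingtracker
n_gdt             <- 46   # incidents found in GDT
n_both            <- 42   # incidents found in both sources

# Incidents seen in at least one of the two sources
n_observed <- n_shootingtracker + n_gdt - n_both

# Lincoln-Petersen estimate of the total number of incidents
# (assumed estimator: product of the two counts divided by the overlap)
n_estimated <- n_shootingtracker * n_gdt / n_both

print(n_observed)
print(round(n_estimated, 1))
```

Under this assumption the observed total is 53 and the estimate is about 53.7, which rounds to the 54 mentioned above. The estimator treats the two sources as independent samples of the same incidents, which is unlikely to hold exactly for two registries that both draw on news reports.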

Gun Death Tally and CDC

For the CDC data, the crude rates for 2009 to 2011 were extracted for the following ICD-10 codes:

X72 (Intentional self-harm by handgun discharge),

X73 (Intentional self-harm by rifle, shotgun and larger firearm discharge),

X74 (Intentional self-harm by other and unspecified firearm discharge),

X93 (Assault by handgun discharge),

X94 (Assault by rifle, shotgun and larger firearm discharge),

X95 (Assault by other and unspecified firearm discharge)

Deaths in the GDT data are counted per state and divided by the number of inhabitants to obtain a rate per 100,000.
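As an illustration of this step, the sketch below counts victims per state and converts the counts to rates per 100,000 using base R only. The state codes, victim records and population figures are hypothetical placeholders, not GDT or census data.

```r
# Hypothetical victim-level records, one row per victim (not real GDT data)
victims <- data.frame(State = c('AA', 'AA', 'BB', 'AA', 'BB'),
                      stringsAsFactors = FALSE)

# Hypothetical population table (not real census figures)
population <- data.frame(State = c('AA', 'BB'),
                         Population = c(200000, 50000),
                         stringsAsFactors = FALSE)

# Count victims per state
killed <- as.data.frame(table(State = victims$State),
                        responseName = 'Killed',
                        stringsAsFactors = FALSE)

# Attach the population and compute the rate per 100,000 inhabitants
rates <- merge(killed, population, by = 'State')
rates$Rate <- 100000 * rates$Killed / rates$Population

print(rates)
```

Because the GDT state appears to be the residence of the victim rather than the place of the incident, a rate computed this way is closer to a rate by residence than to a rate by place of death.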

The plot shows huge differences. While the years covered are different, the year-to-year variation in the CDC data seems much smaller than the difference with GDT. Washington DC, which seemed so bad in shootingtracker, is bad in all databases. However, it does not stick out as much; it appears that incidents are simply more easily registered there.

Appendix 1: CDC data

Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2011 on CDC WONDER Online Database, released 2014. Data are from the Multiple Cause of Death Files, 1999-2011, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html on Nov 2, 2014 10:56:15 AM

Dataset: Underlying Cause of Death, 1999-2011

Query Parameters:

Title:

2013 Urbanization: All

Autopsy: All

Gender: All

Hispanic Origin: All

ICD-10 Codes: X72 (Intentional self-harm by handgun discharge), X73 (Intentional self-harm by rifle, shotgun and larger firearm discharge), X74 (Intentional self-harm by other and unspecified firearm discharge), X93 (Assault by handgun discharge), X94 (Assault by rifle, shotgun and larger firearm discharge), X95 (Assault by other and unspecified firearm discharge)

Place of Death: All

Race: All

States: All

Ten-Year Age Groups: All

Weekday: All

Year/Month: 2009, 2010, 2011

Group By: State, Year

Show Totals: False

Show Zero Values: False

Show Suppressed: False

Calculate Rates Per: 100,000

Rate Options: Default intercensal populations for years 2001-2009 (except Infant Age Groups)

Appendix 2: R code for plot

library(plyr)
library(dplyr)
library(ggplot2)

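# import of the CDC WONDER export (call not given)
cdc <- ...
# states ordered by mean crude rate; the group_by() step is assumed
state_order <- group_by(cdc, State) %>%
    summarize(.,CR=mean(Crude.Rate)) %>%
    arrange(.,CR) %>%
    .$State
# further processing of state_order (not given)
state_order <- ...
# reduction of cdc to the columns used in the plot (not given)
cdc <- ...
cdc$Origin='CDC'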

# file name of the GDT download not given; read.csv() is assumed
slate1 <- read.csv(...,
        stringsAsFactors=FALSE) %>%
    mutate(.,Date=as.Date(date,format="%Y-%m-%d")) %>%
    mutate(.,State=toupper(state)) %>%
    select(.,Date,State) %>%
    filter(.,Date>as.Date('2013-01-01') )

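# state abbreviations and names; the StateAbb column is assumed from the join below
states <- data.frame(StateAbb=as.character(state.abb),
    State=as.character(state.name) )
# http://www.census.gov/popest/data/state/totals/2013/index.html
# import of the population estimates (call not given)
inhabitants <- ...
#put it all together
# 'DC' as abbreviation is assumed
states <- rbind(states, data.frame(StateAbb='DC',
    State='District of Columbia'))
# combination of states with inhabitants (not given)
states <- ...
# count of victims per state with a Freq column (call not given)
slate2 <- ... %>%
    rename(., Killed=Freq) %>%
    inner_join(states,.,by=c('StateAbb'='State')) %>%
    mutate(.,Rate=100000*Killed/Population) %>%
    mutate(.,Origin='Slate') %>%
    mutate(.,Year=2013) %>%
    select(.,State,Year,Rate,Origin)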

# stacking the two sources with rbind() is assumed
rates <- rbind(cdc, slate2) %>%
    mutate(Year=factor(Year)) %>%
    mutate(State=factor(State,levels=state_order))

ggplot(rates,aes(x=State,y=Rate,colour=Year,shape=Origin) ) +
    geom_point() +
    ylab('Rate (per 100000)') +
    coord_flip()