Let me first say this: Nate Silver is one of my idols. His work on FiveThirtyEight pioneered statistical writing as a popular medium, particularly in sports; without it this blog wouldn’t exist and I might not have even been inspired to get my master’s degree in Analytics.

However, I was struck by Silver’s most recent article on FiveThirtyEight and wanted to offer a rebuttal. In the article, Silver argues that the states Clinton has won are more demographically similar to Democrats in states that have yet to vote in primaries. He suggests that her success in these states favors Clinton moving forward, particularly in states whose demographics are closer to average. To support this, Silver calculated the root mean squared error (RMSE) between a state’s demographics and average Democratic demographics, where a low RMSE indicates that a state’s demographics are more representative of all Democrats. Silver noted that of the nine states with the lowest RMSEs, Hillary Clinton won eight.
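For readers unfamiliar with the metric, here is a minimal sketch of an RMSE calculation of this kind. The demographic groups and percentages are made-up placeholders rather than FiveThirtyEight’s data, and the function name is hypothetical; Silver’s exact inputs and categories may differ.

```python
import math


def demographic_rmse(state_shares, national_shares):
    """Root mean squared error between two demographic profiles.

    Both arguments map a demographic group to its share of Democratic
    voters, in percent. The two profiles must cover the same groups.
    """
    if state_shares.keys() != national_shares.keys():
        raise ValueError("profiles must cover the same groups")
    squared_errors = [
        (state_shares[group] - national_shares[group]) ** 2
        for group in national_shares
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical numbers for illustration only (not real data).
national = {"white": 60.0, "black": 20.0, "hispanic": 12.0, "other": 8.0}
state_a = {"white": 62.0, "black": 18.0, "hispanic": 12.0, "other": 8.0}
state_b = {"white": 90.0, "black": 2.0, "hispanic": 4.0, "other": 4.0}

# state_a is close to the national profile, so its RMSE is small;
# state_b is far from it, so its RMSE is much larger.
print(demographic_rmse(state_a, national))
print(demographic_rmse(state_b, national))
```

A state whose profile matches the national one exactly would score 0; the further the profile drifts, the higher the score.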

To be clear: nothing Silver wrote in this piece is incorrect. It’s true that Clinton has outperformed Sanders in states with more “Democratic” demographics. However, implying that Clinton is in better shape because demographics favor her is disingenuous, and it ignores more substantial reasons why Clinton performed the way she did. As you can see from the plot below, demographic similarity to the average explains only 13% of the variance in voting outcome, and the relationship is, in fact, not statistically significant¹:

Even if we look only at wins and losses and ignore margin of victory, a t-test also finds no statistically significant relationship between demographic similarity and support for Clinton².
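A “percent of variance explained” figure like the 13% above is the R² of the regression, which for a single predictor is just the squared correlation between the two variables. The sketch below shows that calculation in plain Python. The data points are invented for illustration and are not the values behind my plot, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a one-predictor linear fit.

    For simple linear regression this equals the squared Pearson
    correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)


# Hypothetical data: each state's RMSE and Clinton's margin in points.
rmse = [3.0, 5.0, 8.0, 12.0, 15.0, 20.0]
clinton_margin = [10.0, -4.0, 25.0, -12.0, 6.0, -20.0]

print(r_squared(rmse, clinton_margin))
```

An R² near 0 means the predictor tells you almost nothing about the outcome; an R² of 1 means the points fall exactly on a line.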

Really, using demographics to explain state polling behaviors is (literally) black and white. States with a large percentage of black voters tend to support Clinton and states with a large percentage of white voters tend to support Sanders, both to a statistically significant degree³:

To say that Sanders only does well with white voters or Clinton only does well with black voters isn’t quite true, as both sides are eager to point out: they perform roughly equally among the Hispanic/Latino and Asian/Other demographics, although Asian voters tend to lean slightly toward Sanders. But if we focus only on white and black voters, it’s clear Sanders has an advantage going forward. States with upcoming primaries have a higher percentage of white voters and a lower percentage of black voters than states that have already voted:

To take this analysis a step further, I used a linear regression model to predict the voting outcome in states that haven’t voted yet, based solely on state demographics. I then used each state’s pledged delegate count to estimate each candidate’s delegate share. As it turns out, Sanders has more projected pledged delegates moving forward:

| State | Predicted Outcome | Delegates | Projected Clinton Delegates | Projected Sanders Delegates |
|---|---|---|---|---|
| New Jersey | Clinton +12.85 | 126 | 71.1 | 54.9 |
| New York | Clinton +11.48 | 247 | 137.68 | 109.32 |
| Maryland | Clinton +29.65 | 95 | 61.58 | 33.42 |
| Pennsylvania | Clinton +3.51 | 189 | 97.82 | 91.18 |
| California | Sanders +12.23 | 475 | 208.45 | 266.55 |
| Delaware | Sanders +9.56 | 21 | 9.5 | 11.5 |
| Kentucky | Clinton +18.49 | 55 | 32.58 | 22.42 |
| Connecticut | Sanders +19.63 | 55 | 22.1 | 32.9 |
| Indiana | Sanders +5.37 | 83 | 39.27 | 43.73 |
| District of Columbia | Clinton +45.79 | 20 | 14.58 | 5.42 |
| Rhode Island | Sanders +12.43 | 24 | 10.51 | 13.49 |
| New Mexico | Sanders +6.5 | 34 | 15.9 | 18.1 |
| Montana | Sanders +10.78 | 21 | 9.37 | 11.63 |
| South Dakota | Sanders +13.41 | 20 | 8.66 | 11.34 |
| North Dakota | Sanders +25.37 | 18 | 6.72 | 11.28 |
| West Virginia | Sanders +3.22 | 29 | 14.03 | 14.97 |
| Oregon | Sanders +47.37 | 61 | 16.05 | 44.95 |
| TOTAL | | | 776 | 797 |

Based purely on demographics, my model projects Sanders to win 11 upcoming states, compared to just 6 for Clinton, leading to a projected delegate margin of 797 to 776. Silver does have a point: of the four states whose demographics most closely resemble those of all Democrats, my model projects Clinton as the favorite in all four. Clinton surely has an advantage in both New York and New Jersey, both important primary states. However, his analysis swept aside one huge factor: California, which has a massive 475 pledged delegates, has voter demographics that are substantially friendlier to Sanders than to Clinton. FiveThirtyEight projects Democratic voters in the Golden State to be only 11% African American, and the state has the second-largest proportion of Asian American voters, which bodes well for Sanders in the most crucial state for Democrats.

Now, a few caveats. First, there are many, many more confounding factors that go into predicting a state’s voting behavior (caucus vs. primary, closed vs. open, etc.). But for the sake of this post and my rebuttal to Nate Silver, I’m focusing only on demographics here. Second, I’m not a political scientist by any means and didn’t research how each state distributes its delegates (if you know, leave a comment!). For simplicity’s sake, I took each state’s number of pledged delegates and multiplied it by my model’s predicted vote share to calculate a rough projected delegate count for each candidate.
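The delegate arithmetic is simple enough to sketch in a few lines. The version below assumes a two-candidate race, so a predicted margin converts directly into two vote shares that sum to 100%; the function name and the sign convention (negative margin means Sanders leads) are hypothetical choices for this example. It is a rough proportional split, not any state’s real allocation rule.

```python
def project_delegates(clinton_margin, delegates):
    """Split pledged delegates in proportion to predicted vote share.

    clinton_margin is Clinton's predicted margin in percentage points;
    a negative value means Sanders is projected to win. Assumes the two
    candidates' shares sum to 100%.
    """
    clinton_share = (100.0 + clinton_margin) / 200.0
    clinton_delegates = delegates * clinton_share
    sanders_delegates = delegates - clinton_delegates
    return round(clinton_delegates, 2), round(sanders_delegates, 2)


# Margins and delegate counts taken from the table above.
print(project_delegates(12.85, 126))   # New Jersey (Clinton +12.85)
print(project_delegates(-12.23, 475))  # California (Sanders +12.23)
```

With these inputs the split matches the New Jersey and California rows of the table (71.1 to 54.9 and 208.45 to 266.55). Fractional delegates are obviously not real, which is part of why these totals are only a rough estimate.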

So keep in mind that these results are a relatively quick-and-dirty look at the relationship between state demographics and voting tendencies. And I think it’s safe to say that at the very least, demographics are not something Sanders has to worry about moving forward.

1 p-value obtained from a linear regression model with voting outcome regressed on RMSE. I used an alpha threshold of .05.

2 I used a Shapiro-Wilk test for normality and determined that a t-test is valid in this scenario.

3 p-value obtained from a linear regression model with voting outcome regressed on white/black voting percent. I used an alpha threshold of .05.