A simple model based on two predictors, the racial composition of the Democratic primary electorate and a dummy variable for region, explains over 90% of the variance in Hillary Clinton’s vote share in this year’s Democratic primaries through March 8. The results of a regression analysis of Clinton’s vote share in the 12 states for which exit poll data are available are presented in Table 1. I excluded Vermont from this analysis since Bernie Sanders clearly enjoyed a substantial advantage in his home state.

Table 1: Results of Regression Analysis of Clinton Vote Share in Democratic Primaries

The results in Table 1 show that both the nonwhite share of the electorate and region (South vs. Non-South) had strong effects on Clinton’s vote share. In these 12 states, a one percentage point increase in the nonwhite share of the electorate was associated with an increase of about one-third of a percentage point in Clinton’s vote share. And even after controlling for the nonwhite share of the electorate, Clinton’s vote share was almost 15 percentage points higher in southern states than in states outside the South (defined here as the 11 states of the Old Confederacy).
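
The model described above is a two-predictor linear equation, and a minimal sketch of it in Python may make the arithmetic easier to follow. The slope (one-third) and the southern effect (15 points) are the rounded values quoted in the text, not the exact coefficients from Table 1. The intercept is not stated in the text, so the function requires the caller to supply it; the function name and parameter names are illustrative only.

```python
def predict_clinton_share(nonwhite_share, is_south, intercept,
                          slope=1 / 3, south_effect=15.0):
    """Return a predicted Clinton vote share, in percentage points.

    nonwhite_share -- nonwhite share of the primary electorate (0-100)
    is_south       -- True for the 11 states of the Old Confederacy
    intercept      -- regression constant; NOT given in the text, so it
                      must be taken from Table 1 (assumption of this sketch)
    slope          -- rounded from "about one-third of a percentage point"
    south_effect   -- rounded from "almost 15 percentage points"
    """
    region_dummy = 1 if is_south else 0
    return intercept + slope * nonwhite_share + south_effect * region_dummy
```

Whatever the intercept, two states with the same nonwhite share differ by the southern effect alone, and each additional point of nonwhite share adds about a third of a point to the prediction.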

Figure 1: Scatterplot of Clinton vote share in Democratic primaries by predicted Clinton vote share

Figure 1 displays a scatterplot of the relationship between the predicted vote share for Clinton and her actual vote share in these 12 states. These results show that the predictions were highly accurate: almost all of the points are very close to the regression line. It is especially interesting that the point for the Michigan primary falls exactly on the regression line: Clinton’s predicted vote share was identical to her actual vote share of 48 percent. In Michigan, then, the model was far more accurate than the pre-election polls, which had Clinton leading by an average margin of about 20 points.

Predictions of March 15 primary results

The estimates from Table 1 can be used to predict the results of future Democratic primary contests by simply plugging in the region dummy variable and an estimate of the nonwhite share of the electorate. Table 2 presents a prediction range for each of the five states holding Democratic primaries on March 15: Florida, Illinois, Missouri, North Carolina, and Ohio. The prediction range is based on two estimates of the nonwhite share of the 2016 Democratic primary electorate. The first estimate uses the nonwhite share of the electorate in the 2008 exit polls. The second estimate adds a correction of 6 percentage points to the 2008 figure, based on the average change in the nonwhite share of the Democratic primary electorate between 2008 and 2016 in the 12 states for which 2016 exit poll data are available.

Table 2: Predicted vote shares and winners of March 15 primaries

Note: Predictions have an estimated standard error of plus or minus 5 percentage points, which is reflected in the ratings.

Based on the race and region model, we can predict that Hillary Clinton will win three states and Bernie Sanders will win two states on March 15. Clinton is predicted to receive between 65% and 67% of the vote in North Carolina, between 64% and 66% of the vote in Florida, between 52% and 54% of the vote in Illinois, between 46% and 48% of the vote in Ohio, and between 45% and 47% of the vote in Missouri.
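
Each range above is two points wide, and that width follows from the method described earlier: the two electorate estimates differ by about six points, and a slope of about one-third turns that into a two-point difference in predicted vote share. The sketch below illustrates the calculation. The slope and southern effect are the rounded values from the text, while the intercept and the 2008 nonwhite share passed in the example call are hypothetical placeholders, not figures from Table 1 or from any exit poll.

```python
def prediction_range(nonwhite_2008, is_south, intercept,
                     slope=1 / 3, south_effect=15.0, correction=6.0):
    """Return (low, high) predicted Clinton vote shares for one state.

    low  uses the 2008 exit-poll nonwhite share as-is.
    high uses the 2008 share plus the average 2008-2016 increase.
    intercept is NOT given in the text and must come from Table 1.
    """
    def predict(nonwhite_share):
        region_term = south_effect if is_south else 0.0
        return intercept + slope * nonwhite_share + region_term

    low = predict(nonwhite_2008)
    high = predict(nonwhite_2008 + correction)
    return low, high


# Hypothetical inputs, chosen only to show the width of the range.
low, high = prediction_range(nonwhite_2008=30.0, is_south=False, intercept=35.0)
range_width = high - low  # about 2 points, regardless of the inputs
```

Because the width depends only on the slope and the correction, every state gets the same two-point range; the five-point standard error in the note to Table 2 is a separate and larger source of uncertainty.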

A note of caution and conclusions

These predictions assume that the effects of race and region on the outcomes of future Democratic primaries will be the same as their effects in the 12 states used in this analysis. That assumption may or may not turn out to be correct. In addition, the estimates of the nonwhite share of the electorate clearly are subject to error. On average, the nonwhite share of the Democratic electorate increased by about six percentage points between 2008 and 2016 in the 12 states used in this analysis. However, the change ranged from -1 point in Virginia to +24 points in Mississippi.

Leaving these important caveats aside, our results suggest that Bernie Sanders is likely to present a strong challenge to Hillary Clinton in the remaining Democratic primaries. Clinton has had a big advantage in the nomination race thus far because so many of the contests have been in the South. After next Tuesday, however, there will be no more primaries in the South. Based on the results presented here, she will be favored over Sanders only in non-southern states in which the nonwhite share of the Democratic primary electorate is at least 40 percent. The key question may be whether the huge delegate lead she has built up by winning southern primaries by landslide margins will be enough to sustain her through the rest of the primary season.
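
The 40 percent threshold mentioned above is the point at which the model's prediction for a non-southern state crosses 50 percent. A rough sketch of that break-even calculation follows. It assumes a two-candidate race in which "favored" means a predicted share above 50 percent, and it uses the rounded slope from the text. Back-solving from the 40 percent threshold implies an intercept of roughly 37, but that is arithmetic on rounded figures, not the constant reported in Table 1.

```python
def break_even_nonwhite_share(intercept, is_south,
                              slope=1 / 3, south_effect=15.0, threshold=50.0):
    """Nonwhite share (in points) at which the predicted share hits threshold.

    intercept is NOT given in the text; supply it from Table 1.
    """
    baseline = intercept + (south_effect if is_south else 0.0)
    return (threshold - baseline) / slope


def implied_intercept(break_even_share, slope=1 / 3, threshold=50.0):
    """Intercept implied by a stated non-southern break-even share."""
    return threshold - slope * break_even_share


# Back-solve from the 40 percent figure in the text (rounded inputs).
approx_intercept = implied_intercept(40.0)  # roughly 36.7
```

Under the same rounded figures, the southern effect lowers the break-even nonwhite share by 45 points, which is consistent with Clinton's landslide margins in the southern primaries.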