Late last week, the Crystal Ball published a simple forecasting model that I created to predict the results of the Democratic primaries. The model is based on three predictors: region (South versus North), the African-American percentage of primary voters in 2008, and the Democratic percentage of primary voters in 2008. It outperformed pre-election polls in the five Democratic primaries held on April 26. Table 1 compares the model's forecasts with the results of the Democratic primaries in Connecticut, Delaware, Maryland, Pennsylvania and Rhode Island.

Table 1: Accuracy of April 26 Democratic primary forecasts

The model correctly predicted the winner in all five states, with an average error of only one percentage point. Pre-election polls, by contrast, missed Bernie Sanders’ victory in Rhode Island and badly underestimated Hillary Clinton’s margin in Delaware. Measured by the margin between the candidates, the model's average error across all five states was 3.6 percentage points, compared with 8 percentage points for pre-election polls, based on data from RealClearPolitics. The model performed as well as the polls in the three states with many polls (Pennsylvania, Maryland and Connecticut) and much better in the two states with only one or two polls available (Delaware and Rhode Island).
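The comparison above rests on two simple yardsticks: whether a forecast picked the right winner, and the average absolute error of the predicted margin. The following is a minimal Python sketch of both measures, assuming margins are expressed as Clinton minus Sanders in percentage points. The function names and the numbers in the usage lines are hypothetical placeholders, not the figures in Table 1.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between forecast and actual margins (points)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def winners_called(predicted, actual):
    """Count states where the forecast margin has the same sign as the result.

    A positive margin means Clinton leads; a negative one means Sanders leads.
    """
    return sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))


# Hypothetical margins (Clinton minus Sanders) for three made-up states.
forecast = [10.0, -4.0, 6.0]
result = [12.0, 1.0, 5.0]
print(mean_absolute_error(forecast, result))  # mean of 2, 5 and 1 points
print(winners_called(forecast, result))       # the second state is missed
```

The two measures can disagree: a forecast can miss the winner in a close race while still posting a small margin error, which is why the comparison reports both.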

Table 2: Updated Democratic primary model

Table 2 displays the results of an updated regression of Clinton’s vote share on the model's three predictors, adding results from the three states voting on April 26 for which exit poll data are available: Connecticut, Maryland and Pennsylvania. The model continues to perform extremely well, with an adjusted R² of .90, meaning that it accounts for roughly 90% of the variation in Clinton's vote share. All three coefficients are highly statistically significant.
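For readers who want to see the mechanics, here is a minimal sketch of this kind of regression: ordinary least squares of Clinton's vote share on a South dummy variable and the two 2008 exit-poll shares, followed by the adjusted R², computed as 1 - (1 - R²)(n - 1)/(n - p - 1) for n states and p predictors. It assumes NumPy is available. The function name and the synthetic data in the usage lines are hypothetical; the coefficients used to generate that data are invented and are not the estimates in Table 2.

```python
import numpy as np


def fit_primary_model(south, black_pct, dem_pct, clinton_share):
    """OLS of Clinton's vote share on a South dummy and two 2008 shares.

    Returns (coefficients, adjusted R^2); coefficients are ordered as
    [intercept, south, black_pct, dem_pct].
    """
    y = np.asarray(clinton_share, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                    # intercept
        np.asarray(south, dtype=float),     # 1 = Southern state, 0 = other
        np.asarray(black_pct, dtype=float),
        np.asarray(dem_pct, dtype=float),
    ])
    n, k = X.shape  # k counts the intercept, so p = k - 1 predictors
    if n <= k:
        raise ValueError("need more states than estimated parameters")
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes extra predictors: n - p - 1 == n - k.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, adj_r2


# Synthetic, made-up data purely to exercise the function.
rng = np.random.default_rng(0)
n_states = 25
south = rng.integers(0, 2, n_states)
black = rng.uniform(5, 50, n_states)
dem = rng.uniform(55, 95, n_states)
share = 20 + 8 * south + 0.5 * black + 0.3 * dem + rng.normal(0, 3, n_states)
beta, adj = fit_primary_model(south, black, dem, share)
print(beta.round(2), round(adj, 2))
```

With only a few dozen states, the adjusted rather than the raw R² is the fairer summary, since each added predictor would otherwise inflate the fit mechanically.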

Table 3: Predicted Clinton vote share in May primaries

Note: African-American and Democratic shares of the electorate are based on 2008 Democratic primary exit polls.

Finally, Table 3 displays forecasts of Hillary Clinton’s vote share in the four Democratic primaries coming up in May: Indiana on May 3, West Virginia on May 10, and Oregon and Kentucky on May 17. Based on the African-American share of the electorate in 2008, the Democratic share of the electorate in 2008, and the fact that all four states are located outside the South, the model predicts Sanders victories in Indiana and Oregon, a Clinton victory in Kentucky, and a tie in West Virginia. The main reason Sanders is favored in Indiana and Oregon while Clinton is favored in Kentucky is that Democrats made up a much larger share of primary voters in Kentucky than in Indiana or Oregon in 2008. Although Oregon’s primary, like Kentucky’s, is technically closed, self-identified independents made up a much larger share of Oregon’s Democratic primary voters in 2008, and I assume that this will also be the case in 2016. And although West Virginia holds an open primary, Democrats made up almost 80% of its Democratic primary voters in 2008.
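Each forecast in Table 3 comes from plugging a state's values into the fitted regression equation and reading off the predicted share. The sketch below shows that step, assuming the outcome is Clinton's share of the two-candidate vote and that a forecast within a small band around 50% is labeled a tie. The function names, the coefficients (20, 8, 0.5, 0.3) and the one-point `tie_band` are hypothetical illustrations, not the model's actual estimates or the author's rule for calling a tie.

```python
def predict_clinton_share(coefs, south, black_pct, dem_pct):
    """Apply a linear model: intercept + slopes times the three predictors.

    coefs is (intercept, b_south, b_black, b_dem); the values used below are
    hypothetical, not the estimates reported in Table 2.
    """
    intercept, b_south, b_black, b_dem = coefs
    return intercept + b_south * south + b_black * black_pct + b_dem * dem_pct


def call_race(clinton_share, tie_band=1.0):
    """Label a forecast; tie_band (hypothetical) is the toss-up half-width."""
    if clinton_share > 50 + tie_band:
        return "Clinton"
    if clinton_share < 50 - tie_band:
        return "Sanders"
    return "Tie"


hypothetical = (20, 8, 0.5, 0.3)
# Two made-up non-Southern states that differ only in Democratic share.
low_dem = predict_clinton_share(hypothetical, 0, 10, 60)   # 43.0
high_dem = predict_clinton_share(hypothetical, 0, 10, 90)  # 52.0
print(call_race(low_dem), call_race(high_dem))             # Sanders Clinton
```

Holding region and African-American share fixed, a positive coefficient on Democratic share is what pushes a state with more registered Democrats toward Clinton, which is the pattern the paragraph describes for Kentucky versus Indiana and Oregon.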

While the model predicts that Bernie Sanders has a chance to win three of the next four Democratic primaries and is clearly favored in two, the relatively small number of delegates at stake in these three states and the closeness of the predicted margins mean that he is unlikely to gain much ground in the overall delegate race. As a result, Hillary Clinton’s substantial lead over Bernie Sanders in pledged delegates is unlikely to change very much in the next month.
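The delegate argument is simple arithmetic: because Democratic delegates are awarded roughly in proportion to the vote, a narrow win in a state with few delegates yields only a small net gain. A minimal sketch, assuming a single statewide proportional split of the two-candidate vote; the actual rules allocate delegates at both the district and statewide levels and apply a 15% threshold, which this ignores. The function name and delegate counts are hypothetical.

```python
def net_delegate_gain(pledged_delegates, clinton_share):
    """Sanders' net delegate gain under one statewide proportional split.

    clinton_share is Clinton's percentage of the two-candidate vote. This is a
    simplification: real allocation happens by district and statewide, with a
    15% threshold. Python's round() rounds exact halves to the even integer.
    """
    if not 0 <= clinton_share <= 100:
        raise ValueError("clinton_share must be a percentage between 0 and 100")
    clinton = round(pledged_delegates * clinton_share / 100)
    sanders = pledged_delegates - clinton
    return sanders - clinton


# Hypothetical: a 55-45 Sanders win in a state with 60 pledged delegates.
print(net_delegate_gain(60, 45))  # 6
```

Even a 10-point win in a state of that size nets only a handful of delegates, so several such wins barely dent a lead measured in hundreds.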