By: Dr. Ikjyot Singh Kohli

Election season is upon us again, and a number of people, from political analysts to campaign advisors, are making a huge deal about winning the Iowa caucuses. This seems to be the standard “wisdom.” I decided to run some analysis on the data to see whether it holds up.

I looked at every Democratic primary since 1976 and tried to find which states are absolutely “must-win” for a candidate to become the Democratic presidential nominee. Because the dataset is small by data science standards, I ran Monte Carlo bootstrap sampling on it, fitting a classification tree to each resample, to come up with the results.
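The resampling idea can be illustrated with a minimal sketch. It does not reproduce the analysis in this post: the rows below are made-up toy data for three hypothetical states, and using Gini impurity to pick the first split (as a CART-style tree would) is an assumption, since the post does not say which tree algorithm or software was used. All function names are illustrative.

```python
import random
from collections import Counter

# Hypothetical toy data, NOT real primary results. Each row is one candidate:
# a tuple of state outcomes (1 = won, 0 = lost) and whether they were nominated.
STATES = ["State A", "State B", "State C"]
ROWS = [
    ((1, 1, 1), 1),
    ((0, 1, 1), 1),
    ((1, 0, 0), 0),
    ((0, 0, 0), 0),
    ((1, 1, 0), 1),
    ((0, 0, 1), 0),
    ((1, 0, 1), 0),
    ((0, 1, 0), 1),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(rows, feature):
    """Weighted Gini impurity after splitting on one binary feature."""
    lost = [y for x, y in rows if x[feature] == 0]
    won = [y for x, y in rows if x[feature] == 1]
    return (len(lost) * gini(lost) + len(won) * gini(won)) / len(rows)


def best_root(rows):
    """Index of the state a greedy tree would split on first.

    Ties are broken in favor of the lowest index.
    """
    return min(range(len(STATES)), key=lambda f: split_impurity(rows, f))


def bootstrap_roots(rows, n_samples, seed=0):
    """Resample rows with replacement and count each root-split state."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        sample = [rng.choice(rows) for _ in rows]
        counts[STATES[best_root(sample)]] += 1
    return counts
```

Calling `bootstrap_roots(ROWS, 1000)` returns a `Counter` of how often each hypothetical state ends up as the first split. With a dataset this small, different resamples can favor different root splits, which is one way several distinct trees can keep recurring.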

Interestingly, irrespective of the number of bootstrap samples, three classification tree results kept coming up, which I now present:

Winning a given state was encoded as a binary variable: “0” indicates that a candidate lost the state, while “1” indicates that the candidate won it.

Very interestingly, this classification tree shows that the most important state for a candidate to win, in order to have the highest probability of being the Democratic nominee, is Illinois.

The second result from bootstrap sampling was as follows:

Here we see that winning Texas is of paramount importance. In fact, all subsequent paths to the nomination stem from winning Texas.

There is also a third result that came from the bootstrap simulation:

We see that in this simulation, once again Illinois is of prime importance. However, even if a candidate does lose Illinois, evidently a path to the nomination is still possible if that candidate wins Maryland and Arizona.

Conclusion: Analyzing the data shows that Iowa and New Hampshire are actually not very important to becoming the Democratic Party nominee. Rather, Illinois and Texas are much more important in giving a candidate a high probability of being the Democratic nominee.