Six states vote today, and a total of 694 pledged delegates will be apportioned among them; California is the big fish with 475. California was also the most heavily polled of the six, which means there is actually data to analyze. This article focuses on California, as we aren't quite done gathering data in the other five states. North Dakota, the lone caucus today, has no polling, and we've once again set up a crowd-sourced results page complete with a real-time map.

The Sanders campaign has focused much of its effort over the last three weeks on the Golden State, and for the most part that effort has paid off. New Jersey, on the other hand, appears to be a lost cause for the Sanders campaign; the ground organization in NJ was poor and many volunteers were used inefficiently. The get-out-the-vote efforts in California and New Jersey were structured very differently: New Jersey stuck with the top-down hierarchical model, while California's effort was largely distributed. To put it another way, there were more volunteers in NJ than the campaign could effectively allocate; this is a new problem in political campaigns and will be the subject of further analysis in a later article.

There were a total of 21 polls conducted in California, the vast majority of which provided demographic sub-samples. This means our usual table of "projections" is actually statistically meaningful this time. These projections are based solely on polling data and aren't intended to predict the outcome. Polling itself assumes that existing patterns continue; if the patterns hold, polling is accurate, and if they don't, it isn't:

The California Primary permits registered Democrats and unaffiliated citizens to vote; the registration deadline was May 23rd. These open participation rules make it slightly more difficult for pollsters to predict the electorate, because it is hard to gauge how many newly registered and unaffiliated voters will participate for the first time. Participants in closed primaries, like those held in New York and Pennsylvania, are more consistent and easier to predict, which yields more accurate polling. About the only basis for assessing unaffiliated turnout is the 2008 primary, in which, according to exit polling, unaffiliated voters made up just 18% of the electorate [1].

We haven't historically collected party affiliation data for primaries, as it's generally not interesting, but we have gone through and collated that data for California. Bernie is dominant among unaffiliated respondents, while Clinton wins among Democrats. The same pattern shows up consistently in general election polling, with Georgia as the most evident example. In most states (we don't yet have data for all of them), Bernie does better than Clinton among Republicans and Independents, but a larger number of Democrats remain undecided.

Based on the polling projections broken down by party affiliation, if 34% of the California Primary electorate is unaffiliated, the outcome would be 50-50. If today's electorate matches the electorate from eight years ago, as it anecdotally has in other states, the outcome would be roughly 54%-46% in favor of Clinton.
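One back-of-the-envelope way to see how those two scenarios relate is to treat the statewide result as a turnout-weighted average of Bernie's two-way share among unaffiliated voters and among Democrats. The sub-sample shares themselves aren't listed in this paragraph, so the sketch below back-solves them from the two stated scenarios (34% unaffiliated gives 50%, 18% unaffiliated gives roughly 46% for Bernie). It assumes a two-candidate race and that nothing but the unaffiliated share changes; the function names are hypothetical.

```python
def bernie_share(unaffiliated_frac: float, share_unaffiliated: float, share_democrat: float) -> float:
    """Bernie's two-way statewide share as a turnout-weighted average of the two sub-samples."""
    return unaffiliated_frac * share_unaffiliated + (1 - unaffiliated_frac) * share_democrat


def implied_shares(p1: float, outcome1: float, p2: float, outcome2: float) -> tuple[float, float]:
    """Back-solve the (unaffiliated, Democratic) shares that reproduce two (turnout, outcome) scenarios.

    Assumption: outcome = d + p * (u - d), i.e. the simple weighted average above.
    """
    gap = (outcome1 - outcome2) / (p1 - p2)  # u - d: how much better Bernie does among unaffiliated voters
    d = outcome1 - p1 * gap                  # Democratic share
    u = d + gap                              # unaffiliated share
    return u, d


# The two scenarios stated in the text: 34% unaffiliated -> 50%, 18% unaffiliated -> ~46% for Bernie.
u, d = implied_shares(0.34, 0.50, 0.18, 0.46)
print(f"implied unaffiliated share: {u:.3f}, implied Democratic share: {d:.3f}")

for p in (0.18, 0.25, 0.34):
    print(f"{p:.0%} unaffiliated -> Bernie {bernie_share(p, u, d):.1%}")
```

Under these assumptions the implied gap between the two groups is about 25 points, so each additional point of unaffiliated turnout moves Bernie's statewide share by roughly a quarter of a point.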

There are, however, some reasons to believe that Bernie's numbers may be understated. Comparing the party affiliation "projections" to exit polling from previous states (Michigan [2], Wisconsin [3] and Indiana [4]) suggests that the implied margin among unaffiliated voters in California likely underestimates Bernie's support by something in the neighborhood of 5-10 percentage points. That may seem significant, but if unaffiliated voters make up only 18% of the electorate, it shifts the statewide result by about 2 points at most.
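The "2 points at most" figure follows from weighting the sub-sample error by the sub-sample's share of the electorate. A minimal sketch, assuming the 5-10 point underestimate applies only to unaffiliated voters and that Democratic respondents are measured correctly (the function name is hypothetical):

```python
def topline_shift(unaffiliated_frac: float, subsample_bias_pts: float) -> float:
    """Points added to the statewide share when only the unaffiliated sub-sample is under-measured.

    Assumption: the Democratic sub-sample is unbiased, so the whole correction
    comes from the unaffiliated share of the electorate.
    """
    return unaffiliated_frac * subsample_bias_pts


# 18% unaffiliated (the 2008 exit-poll figure) with a 5-10 point sub-sample underestimate.
for bias in (5, 10):
    print(f"{bias}-point underestimate -> {topline_shift(0.18, bias):.1f} points statewide")
```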

Our last point regarding a potential outcome is to reiterate Bernie's support among male voters. Bernie hasn't lost men since Florida, 84 days ago. The current polling projections show him slightly ahead among men, which is consistent with historical trends.

Unfortunately, the media won't be conducting exit polling in California, or in any other state today, so most of this analysis cannot be verified. The media conglomerates' decision to forgo exit polling was purely economic; it does a disservice to the political science community and will inhibit retrospective analysis, both now and in the next election cycle.