Three recent surveys from highly rated polling firms (Marist College, Field and Public Policy Institute of California) show Bernie Sanders just 2 points behind Hillary Clinton in California. Clinton is ahead by double digits, however, in other polls, including one that has her up by 18 percentage points. It’s making for another confusing finish in a primary season that has already had plenty of them. And it’s an indication of how little we know about how Hispanic Democrats (and Asian-American Democrats) are voting this year.

Our polls-only model, taking all the various polls into account, gives Clinton a 5 percentage point lead, and translates that into an 86 percent chance of her winning California. Even though Clinton has led in every poll, that seems overconfident given the generally mixed track record of the polls in the Democratic primaries this year, and I’d happily take Sanders at the 6-to-1 odds the model offers.
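An 86 percent chance for Clinton leaves 14 percent for Sanders, and the 6-to-1 figure is simply that probability restated as odds against him. Here is a minimal sketch of the conversion. The function name is illustrative, and the sketch ignores any bookmaker’s cut.

```python
def odds_against(win_prob: float) -> float:
    """Return the 'X-to-1' odds against an outcome that has probability win_prob."""
    if not 0 < win_prob < 1:
        raise ValueError("win_prob must be strictly between 0 and 1")
    return (1 - win_prob) / win_prob


sanders_prob = 1 - 0.86  # the polls-only model's 86 percent for Clinton
print(f"Sanders: {odds_against(sanders_prob):.1f}-to-1")  # prints Sanders: 6.1-to-1
```

Rounded, that gives the 6-to-1 quoted above. Taking those odds is a good bet only if you think Sanders’s true chance is better than about 1 in 7.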

But unlike in Michigan or Indiana, the two previous states that Sanders won as an underdog in our polling model, the demographics in California aren’t necessarily out of step with polls showing a Clinton lead.

At various stages throughout the Democratic primaries, we’ve been issuing forecasts from a demographic model. The ingredients of the model have changed slightly over time as we’ve learned more about what determines the Democratic vote — that Sanders does better than Clinton in caucuses, for example, or that Clinton does better in primaries that are closed to independent voters.

These demographic models have always accounted for race, but there are a lot of different ways to do it. Here are five alternative strategies:

I. Account for the percentage of white voters in a state; group all nonwhite voters (black, Hispanic, etc.) together.

II. Account for the percentage of black voters in a state; group all nonblack voters (white, Hispanic, etc.) together.

III. Account for the percentage of white voters and the percentage of black voters; group Hispanic, Asian and “other” voters together.

IV. Account for the percentage of black voters and the percentage of Hispanic voters; group white, Asian and “other” voters together.

V. Account for the percentage of white voters, the percentage of black voters and the percentage of Hispanic voters; group Asian and “other” voters together.
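One way to see what separates these strategies is to treat each one as a rule for collapsing a state’s racial makeup into model inputs. The sketch below is a hypothetical illustration, not our model’s code. The category names and the example state’s shares are made up, and the sketch builds only the race variables, not the regression that uses them.

```python
# Racial/ethnic categories; a state's shares should sum to 1.0.
GROUPS = ("white", "black", "hispanic", "asian", "other")

# Groups each strategy tracks separately; everyone else is lumped together.
STRATEGIES = {
    "I": ("white",),
    "II": ("black",),
    "III": ("white", "black"),
    "IV": ("black", "hispanic"),
    "V": ("white", "black", "hispanic"),
}


def race_features(shares: dict[str, float], strategy: str) -> dict[str, float]:
    """Collapse a state's racial composition into the variables one strategy uses."""
    tracked = STRATEGIES[strategy]
    features = {g: shares[g] for g in tracked}
    # The lumped residual is implied by the tracked shares, so a regression
    # would usually leave it out as the baseline category.
    features["lumped"] = sum(shares[g] for g in GROUPS if g not in tracked)
    return features


# Hypothetical state composition, for illustration only.
example_state = {"white": 0.40, "black": 0.05, "hispanic": 0.35, "asian": 0.15, "other": 0.05}

for name in STRATEGIES:
    rounded = {k: round(v, 2) for k, v in race_features(example_state, name).items()}
    print(name, rounded)
```

Under strategy I, a heavily Hispanic state and a heavily black state with the same white share get identical inputs. Under strategy II, a heavily Hispanic state looks like a heavily white one. Only strategies IV and V give Hispanic voters a variable of their own.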

I’ve seen all of these strategies applied by various people over the course of the campaign. We ourselves haven’t been totally consistent about it: we started out with strategy I, have most commonly used strategy IV and have made occasional forays into strategy III.

Most of the time, it doesn’t make a lot of difference, but it does for California, and for some of the other states set to vote on June 7. Model I is very favorable for Clinton, for example. It notes that Clinton usually does well in states with lots of nonwhite voters. Therefore it projects Clinton to get 60 percent of the two-way vote in California, a state with lots of nonwhite voters, meaning that she’d beat Sanders by about 20 percentage points.

Model II has Clinton as a slight underdog in California, by contrast. It notes that the state has relatively few black voters (more of those nonwhite voters are Hispanic or Asian), and that its primary is open to independent voters. To model II, California looks a lot like Indiana — another open primary state with few black voters — and it expects Clinton to lose California by 3 or 4 percentage points, similar to her margin of defeat in the Hoosier State.

These differences are even more profound in other states. Depending on which model you use, Clinton is either an underdog in New Mexico, which has few black voters but lots of Hispanics and Native Americans, or a 50-point favorite.

How you model race makes a lot of difference in your forecast

PROJECTED CLINTON VOTE SHARE
STATE         MODEL I  MODEL II  MODEL III  MODEL IV  MODEL V
California      60.1%     48.4%      50.5%     54.9%    54.2%
Montana         46.2%     39.1%      39.6%     38.4%    36.7%
New Jersey      63.7%     59.1%      59.9%     60.3%    59.4%
New Mexico      76.0%     46.7%      52.4%     63.2%    61.0%
North Dakota    26.5%     29.5%      28.8%     28.3%    28.8%
South Dakota    47.4%     39.5%      40.2%     38.3%    36.0%

All models control for whether the election is open or closed to independent voters, whether it’s a primary or a caucus, and national polls at the time of the election.
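Because these are shares of the two-way vote, each projection converts to a margin as 2 × share − 100 (Clinton’s share minus Sanders’s). This short sketch applies that conversion to the table’s numbers to show how wide the range of projected margins is in each state.

```python
# Projected Clinton share of the two-way vote, models I through V, from the table.
PROJECTIONS = {
    "California": (60.1, 48.4, 50.5, 54.9, 54.2),
    "Montana": (46.2, 39.1, 39.6, 38.4, 36.7),
    "New Jersey": (63.7, 59.1, 59.9, 60.3, 59.4),
    "New Mexico": (76.0, 46.7, 52.4, 63.2, 61.0),
    "North Dakota": (26.5, 29.5, 28.8, 28.3, 28.8),
    "South Dakota": (47.4, 39.5, 40.2, 38.3, 36.0),
}


def margin(share: float) -> float:
    """Clinton's two-way margin over Sanders, in points: share - (100 - share)."""
    return 2 * share - 100


for state, shares in PROJECTIONS.items():
    margins = [margin(s) for s in shares]
    print(f"{state:<13} margin range {min(margins):+.1f} to {max(margins):+.1f}")
```

New Mexico’s range, from Clinton trailing by about 7 points to leading by 52, is the widest of the six states.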

This serves as a neat illustration of how small choices in building a model make a lot of difference. I don’t mean to make it seem like an exercise in futility, however. Instead, I think there are pretty good reasons to use one of the models (IV or V) that account separately for the Hispanic vote. Those models would have Clinton winning California by 8 to 10 percentage points, consistent with her lead in the polling average or maybe just a pinch better than it.

One reason is that Clinton has a good track record this year in states with large Hispanic populations, having blown Sanders out in Florida, Arizona, Texas and New York, and edged him out in Nevada (although she lost the Colorado caucuses). As you can see below, it’s hard to explain the vote in those states unless you have a variable to account for the Hispanic vote. (For the sake of simplicity, I’ve limited the table to models II and IV.) So it’s not just that the Hispanic vote is a statistically significant predictor of Clinton’s vote; it’s also highly practically significant.

Accounting for states’ Hispanic populations produces more accurate results

CLINTON VOTE SHARE
STATE     MODEL II PROJECTION  MODEL IV PROJECTION  ACTUAL
Nevada                  42.7%                47.7%   52.7%
Colorado                39.6%                43.3%   40.6%
Texas                   56.4%                66.4%   66.3%
Arizona                 49.3%                55.1%   57.6%
Florida                 63.1%                64.9%   65.9%
New York                56.2%                57.2%   58.0%
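Mean absolute error is one simple yardstick for “more accurate.” It is a choice made for this sketch, not necessarily how the models were evaluated. Here it is computed from the six states in the table.

```python
# Clinton vote share from the table: (Model II projection, Model IV projection, actual).
RESULTS = {
    "Nevada": (42.7, 47.7, 52.7),
    "Colorado": (39.6, 43.3, 40.6),
    "Texas": (56.4, 66.4, 66.3),
    "Arizona": (49.3, 55.1, 57.6),
    "Florida": (63.1, 64.9, 65.9),
    "New York": (56.2, 57.2, 58.0),
}

MODEL_II, MODEL_IV, ACTUAL = 0, 1, 2  # positions within each tuple


def mean_abs_error(model: int) -> float:
    """Average absolute miss, in percentage points, of one model across the six states."""
    misses = [abs(row[model] - row[ACTUAL]) for row in RESULTS.values()]
    return sum(misses) / len(misses)


print(f"Model II mean miss: {mean_abs_error(MODEL_II):.1f} points")  # 5.6
print(f"Model IV mean miss: {mean_abs_error(MODEL_IV):.1f} points")  # 2.0
```

By that yardstick, Model IV misses by about 2 points on average, versus more than 5 for Model II. Most of the gap comes from Nevada, Texas and Arizona.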

Another reason is that, in those states, Clinton has done well in heavily Hispanic areas. So far, 17 majority-Hispanic districts have voted in the Democratic campaign: 10 congressional districts in Arizona, Florida, Illinois and New York, and seven state Senate districts in Texas (which tabulates its vote based on state Senate boundaries rather than congressional boundaries). Of those 17 districts, Clinton has won 16. In fact, she’s dominated them, winning an average of 66 percent of the vote to Sanders’s 34 percent. The lone, weird exception is Chicago’s earmuff-shaped 4th Congressional District, where Sanders won by 16 percentage points.

Clinton has dominated majority-Hispanic districts

STATE     DISTRICT  SHARE HISPANIC  CLINTON VOTE  SANDERS VOTE
Texas           27           89.1%         72.4%         27.6%
Florida         27           83.1%         70.5%         29.5%
Texas           29           82.0%         67.5%         32.5%
Texas           20           77.5%         71.8%         28.2%
Florida         25           75.7%         72.8%         27.2%
Florida         26           74.0%         69.4%         30.6%
Texas            6           73.8%         73.0%         27.0%
Illinois         4           73.5%         41.8%         58.2%
Texas           21           72.3%         67.4%         32.6%
New York        15           70.4%         72.8%         27.2%
Arizona          7           70.1%         58.5%         41.5%
Texas           26           68.4%         67.8%         32.2%
Arizona          3           67.0%         60.1%         39.9%
Texas           19           66.7%         71.0%         29.0%
New York        13           61.1%         62.9%         37.1%
Florida          9           52.4%         67.1%         32.9%
New York        14           51.8%         58.1%         41.9%

District numbers in Texas refer to state Senate districts; in other states, to congressional districts.
Sources: Pew Research, The Green Papers, State of Texas
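The 16-of-17 record and the 66 percent average can be checked directly against the table. Only Clinton’s share is needed, since the two candidates’ shares in each district sum to 100.

```python
# Majority-Hispanic districts from the table: (state, district, % Hispanic, Clinton %).
DISTRICTS = [
    ("Texas", 27, 89.1, 72.4),
    ("Florida", 27, 83.1, 70.5),
    ("Texas", 29, 82.0, 67.5),
    ("Texas", 20, 77.5, 71.8),
    ("Florida", 25, 75.7, 72.8),
    ("Florida", 26, 74.0, 69.4),
    ("Texas", 6, 73.8, 73.0),
    ("Illinois", 4, 73.5, 41.8),
    ("Texas", 21, 72.3, 67.4),
    ("New York", 15, 70.4, 72.8),
    ("Arizona", 7, 70.1, 58.5),
    ("Texas", 26, 68.4, 67.8),
    ("Arizona", 3, 67.0, 60.1),
    ("Texas", 19, 66.7, 71.0),
    ("New York", 13, 61.1, 62.9),
    ("Florida", 9, 52.4, 67.1),
    ("New York", 14, 51.8, 58.1),
]

clinton_shares = [clinton for _, _, _, clinton in DISTRICTS]
wins = sum(1 for share in clinton_shares if share > 50)
average = sum(clinton_shares) / len(clinton_shares)
losses = [(state, district) for state, district, _, clinton in DISTRICTS if clinton < 50]

print(f"Clinton won {wins} of {len(DISTRICTS)} districts")
print(f"Average Clinton share: {average:.1f}%")
print(f"Lost: {losses}")
```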

Exit polls also show some evidence of Clinton’s strong performance with Hispanics, although with some inconsistencies. They had her winning Hispanics by more than 40 percentage points in Florida and Texas and by nearly 30 points in New York, although narrowly losing them in Nevada and Illinois. In California, by contrast, recent polls do not show Clinton performing especially well with Hispanics. Instead, they have her winning them by about 7 percentage points, on average, similar to her overall lead on Sanders.

Polls show a close race for California’s Hispanic vote

MARGIN OF SUPPORT
POLLSTER        OVERALL      WITH HISPANICS/LATINOS
Field Poll      Clinton +2   Clinton +4
Marist College  Clinton +2   Sanders +3
SurveyUSA       Clinton +18  Clinton +6
PPIC            Clinton +2   Clinton +9
YouGov          Clinton +13  Clinton +12
USC*            Clinton +10  Clinton +11

*USC poll did not break out results among likely Hispanic voters. Results are extrapolated based on Hispanic registered voters.
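The “about 7 points, on average” figure is consistent with a simple average across the six polls. This sketch assumes an unweighted mean, with no adjustment for sample size, date or pollster quality. Sanders leads are entered as negative Clinton margins.

```python
# Clinton's margin in each California poll from the table (negative = Sanders ahead):
# (overall, among Hispanics/Latinos). USC's Hispanic figure is extrapolated from
# registered voters, per the table note.
POLLS = {
    "Field Poll": (2, 4),
    "Marist College": (2, -3),
    "SurveyUSA": (18, 6),
    "PPIC": (2, 9),
    "YouGov": (13, 12),
    "USC": (10, 11),
}

# Unweighted means: every poll counts equally.
overall_avg = sum(overall for overall, _ in POLLS.values()) / len(POLLS)
hispanic_avg = sum(hispanic for _, hispanic in POLLS.values()) / len(POLLS)

print(f"Average overall margin:  Clinton {overall_avg:+.1f}")   # +7.8
print(f"Average Hispanic margin: Clinton {hispanic_avg:+.1f}")  # +6.5
```

That works out to roughly 6.5 points among Hispanics against roughly 8 points overall. This is why the California Hispanic numbers look so unremarkable next to the 40-point margins in the Florida and Texas exit polls.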

The Hispanic vote is not monolithic; Mexicans, Puerto Ricans, Cubans and other groups all vote somewhat differently from one another. Age can matter a lot: Clinton performs well among older Hispanics while Sanders does well among younger ones. The predominantly Spanish-speaking Hispanic population can vote differently from the English-speaking Hispanic population. All of this can make it dangerous to extrapolate results from one state to another. But it also makes it tricky for the polls, which often have small sample sizes for ethnic subgroups and trouble reaching a representative sample of Hispanic voters. To add to the complication, California also has a significant Asian-American population, and we have very little evidence about how Asian-Americans are voting this year.
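To see why small subgroup samples are a problem, consider the textbook 95 percent margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n). This is a minimal sketch assuming simple random sampling. The sample sizes are hypothetical, and real polls, which are weighted, usually carry somewhat more error than this formula suggests.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, on one candidate's share.

    Assumes a simple random sample of size n; weighted real-world polls usually
    have a larger effective error. p = 0.5 gives the widest (most conservative) value.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes: a full poll versus progressively smaller subgroups.
for n in (1500, 400, 150):
    share_moe = margin_of_error(n)
    # In a two-way race, margin = 2 * share - 100, so its error is twice the share's.
    print(f"n = {n:>4}: share +/- {share_moe:.1f} pts, margin +/- {2 * share_moe:.1f} pts")
```

Cutting the sample from 1,500 to 150 roughly triples the error, since it scales with 1/sqrt(n). At that size, a Hispanic subsample could plausibly show anything from a Sanders lead to a double-digit Clinton lead.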

So while the polls could be off by enough for Sanders to win California — I like his odds better than our polling model does — they could also be off in the other direction, meaning that Clinton could win by 15 to 20 percentage points. In 2008, Clinton significantly outperformed her polls in California, in part by winning the Hispanic vote 2-to-1 over Barack Obama.

Whatever the outcome, it’s almost certainly too late to help Sanders win the nomination; he’d need to win every remaining state by roughly 35 percentage points to catch up to Clinton in pledged delegates. But California may tell us something about whether Hispanic Democrats are already standing with Clinton, or whether she’ll have some outreach to do to ensure they turn out for her in November.