Sam Wang is a data scientist, a co-founder of the Princeton Election Consortium and an associate professor of neuroscience and molecular biology at Princeton University. Follow him on Twitter @SamWangPhD.

As the midterm election season heats up, politically inclined quantitative nerds like me have been trying to predict which party will end up in charge of the Senate. For us, it is the most suspenseful question of the year. Control could easily go either way.

Today, there’s a glut of forecasts out there, each one promising to be more accurate than the last. Their authors range from veteran handicappers like Charlie Cook and Larry Sabato to relative newcomers like the Monkey Cage and the Upshot. How to make sense of the free-for-all? You just need to keep a few basic principles in mind.


The Data

Polls tend to be scarce before Memorial Day, so early predictions of the November election outcome must rely on indirect indicators of how voters are likely to behave—what we call “fundamentals.” To make a sports analogy, these predictions are like a team’s initial seeding in a tournament. They just tell us who’s looking good at the outset of the campaign.

Once polls become available, they can capture the same ballpark range of November performance that fundamentals do—and with much less uncertainty. Years of polling have shown that what voters say they want “right now” is a strong starting point for predicting, give or take a few points, how they will vote in the fall. Because of that—no matter the race—the most accurate predictions are made using polling data, when enough of it is available.

The bottom line: Even at this early stage, polls are our best way to predict November outcomes. In the 2012 election, for instance, polling data available in July and knowledge of how far presidential polls tend to move in the months leading up to the election were enough to give President Obama’s reelection a probability of 91 percent. That crept up to nearly 100 percent as the election approached. However, predicting the partisan control of the Senate in 2015 is a far harder problem.

The Model Type

In the election-prediction business, most models fall somewhere between two extremes: Type 1, which is purely fundamentals-based, and Type 2, which is purely poll-based.

Here are a few different quantitative models and how they answer the question: Will Republicans take over the Senate? The GOP needs to pick up six seats in the fall, on top of the 45 the party currently controls.

Model: GOP Takeover Probability

New York Times (The Upshot): 41 percent

FiveThirtyEight (Enten and Silver): over 50 percent (not yet specified)

Washington Post (The Monkey Cage): 77 percent

At first glance, these three predictions seem very much at odds with one another. But in my experience, any contest with probabilities between 20 percent and 80 percent should be regarded as a toss-up, with no solid favorite. All three models above therefore suggest a knife-edge situation. Although I love to point out that all predictions in this range are basically hedged bets, I recognize that it is natural to ask why the probability given is above or below the magic 50 percent threshold. To understand the answer, it helps to take a closer look at the models behind these numbers.

Type 1: Fundamentals only. Type 1 models, which rely on no polling data at all, have the advantage that they can be created before the campaign even starts. The Monkey Cage model is currently pure Type 1, relying on a large number of fundamentals, from candidate “quality” to economic growth. This year, the most important fundamental is that, in midterm elections, national public opinion tends to go against the president’s party. That gives us some idea of the range of possible outcomes: Basically, Democrats are going to lose seats.

Interestingly, because of this reliance on national public opinion, as a general rule, with a Democratic president in power, the more a model relies on non-poll-based assumptions, the more it will favor the Republicans. Note that the probability of a GOP takeover is higher in the Monkey Cage model than it is in the others.

When it comes to extremely close races, though, Type 1 models are of limited use. Modelers put in lots of factors that have been shown to affect election outcomes (“signals,” in engineering parlance), such as the economy and incumbency. But each factor you add also contributes “noise”: accumulating uncertainties that, once added, cannot be taken out. For example, during a midterm election year, the generic congressional poll (Would you rather vote for a Democrat or a Republican?) tends to move against the president’s party, but actual outcomes on Election Day range from an 11 percentage-point loss to a 4 percentage-point gain in the national popular vote margin.

Fundamentals can be national factors, such as the generic congressional ballot, which captures a general national mood. Or they can be local, such as whether an incumbent is in the race, a factor that attempts to capture how well known a candidate is. But these are simplifications. From a reader’s standpoint, probabilities in Type 1 models should never be read with more certainty than, say, the National Weather Service’s numbers. Rain forecast probabilities are good enough to help us plan our weekend outings, and even they are uncertain enough always to be rounded to the nearest 10 percent.

Rather, Type 1 models are hypotheses about where a campaign is naturally headed. You can think of them as asking, “Do our assumptions about how politics works give the correct prediction?” They tend to be of most use after the results are in. In 2012, Type 1 presidential models ranged from predicting a Romney win to an Obama landslide, and everything in between.

If past history is any guide, when FiveThirtyEight comes up with a more exact model, it will have a strong Type 1 component but will also include some polling data. That probably explains why FiveThirtyEight’s state-by-state win probabilities seem to give Democrats a better shot than the Monkey Cage does.

Type 2: Polls only. Once we have a sufficient amount of polling data, fundamentals lose their importance for prediction purposes. All the fundamentals are naturally baked into the polling data. Even today, about 160 days before the election, polls are fairly predictive, and are enough by themselves to form a clear snapshot of the current state of play. The Upshot is closer to a Type 2 model. It focuses on polling data, using fundamentals about candidate quality and national trends to set expectations for how voter sentiment might change between now and the election. (This is an excellent approach to combining polls with fundamentals, one that was pioneered by Drew Linzer’s Votamatic.) The Upshot’s method is likely to be more accurate than the others, though the Monkey Cage does plan to update its model to reflect new polling data.

One question faced by all the models is when to start phasing in polling data and phasing out fundamentals. To estimate the possible range of outcomes, it is even possible to skip using fundamentals at all, simply by using the ups and downs of polls to estimate the range of likely movement between now and Election Day. In 2012, I used a polls-only Type 2 approach to get all 10 close Senate races correct, while the election-eve FiveThirtyEight calculations, which leaned heavily on fundamentals, got two races wrong.
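For readers who want to see the arithmetic behind a polls-only forecast, here is a minimal Python sketch. It assumes a simple normal model in which today's uncertainty in the polling margin and the typical drift in polls before Election Day add in quadrature. The function names and the `drift_sd` parameter are hypothetical illustrations, not the calculation used for any forecast quoted in this article.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def november_win_probability(margin_today, sem_today, drift_sd):
    """Chance that a candidate leading by `margin_today` points still leads in November.

    sem_today: uncertainty (in points) in today's polling margin.
    drift_sd:  assumed typical movement (in points) between now and Election Day.
    Both sources of uncertainty are treated as independent and normal,
    which is a simplifying assumption.
    """
    total_sd = math.hypot(sem_today, drift_sd)  # sqrt(sem^2 + drift^2)
    return normal_cdf(margin_today / total_sd)


# Hypothetical inputs: a 3-point lead, 1 point of polling uncertainty,
# and 2 points of assumed drift before the election.
print(round(november_win_probability(3.0, 1.0, 2.0), 2))
```

The more the polls are assumed to drift, the closer any lead gets pulled toward a 50 percent coin flip, which is why the same lead means more in October than in May.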

What does a pure Type 2 approach for 2014 look like? Using my time-tested methods, I took a shot at it.

My first step was to take the median and the uncertainty in the median (which I calculated) of up to five recent polls (no more than six weeks old) for each close Senate race. There are nine: Alaska, Arkansas, Colorado, Georgia, Iowa, Kentucky, Louisiana, Michigan and North Carolina. (Taking the median rather than the average reduces the impact of outliers caused by a biased pollster or some other error. Though high-quality Senate polls are hard to find, with this approach the number of available polls is more important than their quality.) I used the median and its uncertainty to calculate a win probability for each seat and then a total seat count.
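The step from a handful of polls to a single-race win probability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not necessarily the exact calculation described above: the spread is estimated from the median absolute deviation, the win probability comes from a normal curve, and the `min_sem` floor and the sample margins are hypothetical.

```python
import math
import statistics


def median_and_sem(margins, min_sem=0.5):
    """Median poll margin and a rough standard error of that median.

    margins: Democratic-minus-Republican margins, in percentage points.
    The spread uses the median absolute deviation (MAD), scaled by 1.4826
    so that it matches a standard deviation for normally distributed data.
    min_sem is a hypothetical floor so that identical polls do not imply
    zero uncertainty.
    """
    med = statistics.median(margins)
    mad = statistics.median([abs(m - med) for m in margins])
    sem = 1.4826 * mad / math.sqrt(len(margins))
    return med, max(sem, min_sem)


def win_probability(median, sem):
    """Probability that the true margin is above zero, under a normal model."""
    return 0.5 * (1.0 + math.erf(median / (sem * math.sqrt(2.0))))


# Made-up margins for one race (not real polls).
polls = [3.0, -1.0, 2.0, 4.0, 1.0]
med, sem = median_and_sem(polls)
print(med, round(sem, 2), round(win_probability(med, sem), 3))
```

Because the median ignores how far out an outlier sits, a single wild poll moves this estimate much less than it would move an average.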

At the time I made this calculation, in three states (Alaska, Georgia, Iowa) the GOP nominee was not yet settled. In those states I used the front-runner in the upcoming primary.

Here is what today’s snapshot looks like:

What this graph shows is a histogram of the likeliest outcomes of an election held today. It takes into account all 512 possible combinations of outcomes across nine close races. The probability of Republicans winning 51 seats or more is 33 percent—which you can get by adding up all the red-labeled probabilities—well within the knife’s edge range that I mentioned before. Note the similarity with the Upshot’s number (41 percent—derived from a poll-heavy model).
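A histogram like this can be built by brute force, as the Python sketch below illustrates. It walks through every win/loss combination across the close races (2 to the 9th power, or 512, for nine races) and adds up the probability of each seat total. The sketch treats the races as independent in a snapshot of today's polls, which is an assumption, and the function names and coin-flip inputs are hypothetical placeholders rather than the probabilities behind the graph.

```python
from itertools import product


def seat_distribution(win_probs):
    """Return {k: probability that exactly k of the races are won}.

    win_probs: one win probability per race for the same party.
    Every win/loss combination is enumerated and its probability is
    added to the bucket for its seat total. Races are assumed independent.
    """
    dist = {k: 0.0 for k in range(len(win_probs) + 1)}
    for outcome in product((0, 1), repeat=len(win_probs)):
        p = 1.0
        for won, prob in zip(outcome, win_probs):
            p *= prob if won else 1.0 - prob
        dist[sum(outcome)] += p
    return dist


def prob_at_least(dist, seats_needed):
    """Probability of winning `seats_needed` or more of the races."""
    return sum(p for seats, p in dist.items() if seats >= seats_needed)


# Placeholder inputs: nine pure coin-flip races.
dist = seat_distribution([0.5] * 9)
for seats, p in dist.items():
    print(seats, round(p, 3))
```

Adding up the bars at or above the number of seats a party needs, as `prob_at_least` does, gives that party's overall chance of control.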

A shift in public sentiment of even a few points nationally could change the picture significantly. When Senate races move, they all tend to move in the same direction. Currently, Senate races in Kentucky and Louisiana lean slightly toward the Republican candidate, but that could change if Democrats were to surge after, say, congressional Republicans came close to staging another government shutdown. Conversely, the races in Arkansas, Colorado and Georgia lean slightly Democratic, but this could be reversed in a matter of weeks if President Obama’s approval ratings, now on the uptick, go back into decline.

Using today’s polls as a starting point, I re-calculated the possible Senate outcomes in the event that the polls in those nine races were to shift by a given number of percentage points in the Democratic or Republican direction.

As you can see, a 1 percentage-point increase in the Democratic vs. Republican vote margin would yield, on average, about 0.8 more Democratic seats, and the same is true in the Republican direction. A swing of two points toward the Democrats would put their odds of controlling the Senate at 19-1, far better than the current 2-1 odds in their favor. A swing of two points in the Republican direction puts that party’s odds of control at 3-1 in the GOP’s favor, up from 2-1 against them.
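The odds quoted here are just probabilities in another form: 19-1 corresponds to 95 percent, 3-1 to 75 percent, and 2-1 to about 67 percent. The Python sketch below shows one way such a sensitivity check could be run, by sliding every race's margin by the same amount and recomputing the chance of control. The margins, uncertainties and seat threshold are made-up placeholders, the races are treated as independent once the common shift is applied, and the normal model is an assumption, so the output will not reproduce the figures in the article.

```python
import math
from itertools import product


def win_probability(margin, sem):
    """P(true margin > 0) under a normal model (an assumption)."""
    return 0.5 * (1.0 + math.erf(margin / (sem * math.sqrt(2.0))))


def control_probability(margins, sems, seats_needed, shift=0.0):
    """Chance of winning at least `seats_needed` races after a uniform shift."""
    probs = [win_probability(m + shift, s) for m, s in zip(margins, sems)]
    total = 0.0
    for outcome in product((0, 1), repeat=len(probs)):
        if sum(outcome) >= seats_needed:
            p = 1.0
            for won, prob in zip(outcome, probs):
                p *= prob if won else 1.0 - prob
            total += p
    return total


def odds_in_favor(p):
    """Convert a probability below 1 into odds in favor, e.g. 0.75 -> 3.0 (3-1)."""
    return p / (1.0 - p)


# Hypothetical Democratic-minus-Republican margins and uncertainties for
# nine races; these are made-up numbers, not the polls behind the article.
MARGINS = [4.0, 2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0, -4.0]
SEMS = [2.0] * 9
SEATS_NEEDED = 5  # hypothetical threshold for control

for shift in (-2, -1, 0, 1, 2):
    p = control_probability(MARGINS, SEMS, SEATS_NEEDED, shift)
    print(f"shift {shift:+d}: P(control) = {p:.2f}, odds = {odds_in_favor(p):.1f}-1")
```

Because a common shift moves every close race at once, a point or two changes the chance of control far more than it would change any single race.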

Such swings are well within the range of possibility, which explains why—even with all of these fancy charts—it’s so hard to predict who will win the Senate in the fall.

What to Watch For This Summer

In the weeks and months ahead, be on the lookout for the following changes.

In June, watch for Democratic gains. President Obama’s approval ratings have been rising for the past month. Yet some Senate polls are up to six weeks old. In June, as state polling catches up with presidential approval data, look for all of the models to move in the Democrats’ direction.

Starting in July, watch for Republican gains. After the primaries, Republican-leaning voters will coalesce around their nominees in Alaska, Georgia and Iowa, boosting their fortunes. The size of this trend is critical for the GOP’s success.

Keep an eye on Louisiana and Kentucky. In these two states, the incumbents are outperforming their approval ratings, at least for now. Both Mary Landrieu (D-La.) and Mitch McConnell (R-Ky.) are leading slightly, yet their approval ratings are dismal, in the 30s. Their weakness opens a major opportunity for their challengers.

Watch President Obama’s approval ratings. The president is currently at about 5 percent net disapproval (approval minus disapproval) in the HuffPollster average. This is an improvement from November, during the early rollout of HealthCare.gov. If Obama’s numbers continue to improve, the chances for Democrats to keep control of the Senate improve as well. If his numbers get worse, a GOP takeover will become more likely.

***

So after all that, what have we learned? In the end, it is near certain that the Senate will be more closely divided in 2015 than it is now. Assuming neither party has a massive meltdown, I would not be surprised to see a 49-51 split in one direction or the other. Vice President Joseph Biden might end up with plenty of tie-breaking work to do.