If you’ve been following the wonky world of political science and data journalism, you’ve probably heard someone say that you should ignore early polling – especially in a presidential primary. When data nerds deploy phrases like this, we aren’t advocating for wholesale avoidance of these surveys – the polls provide reams of useful and interesting data. Instead we’re pointing out that in chaotic primaries (such as this cycle’s Republican contest) polls done long before the actual voting typically don’t do a good job of predicting election results. One only needs to scroll through the Washington Post’s Past Frontrunners feed to find Newt Gingrich, Rudy Giuliani, Howard Dean and other non-nominees topping the polls in the November before previous primary contests.

But this “ignore the polls” mantra raises important questions: When will the surveys become predictive? And which polls are most helpful in making accurate forecasts?

There are a couple of different ways to tackle these questions, but my analysis of the data suggests that we should start paying a bit more attention to Iowa and New Hampshire polls sometime in the next few weeks. The predictive power of polls in these states should – if the pattern from past primaries holds – start to increase soon and continue to rise steadily as these early contests approach.

Early State Polls Should Become More Predictive After You Finish Your Leftover Turkey

Around Thanksgiving during the last three primaries (the 2012 Republican primary, and the 2008 Republican and Democratic primaries), the races hadn’t fully taken shape; as such, the polls weren’t great predictors of final results. Then, roughly two weeks after Thanksgiving (indicated by the black line on the graphic below), early-state voters began to tune in and decide whom they would eventually support.

A quick note on the math here (those who aren’t interested can skip to the next paragraph). To get this figure I first obtained the RCP averages for each candidate in the 2008 Republican, 2008 Democratic and 2012 Republican New Hampshire primary and Iowa caucuses for a few months before each contest. Then, for every day for which I had complete data, I did a simple linear regression of those values against the final results in those states and plotted the R squared as “predictive power.” For the statistically uninitiated: R squared is a statistic between zero and one that basically communicates how well one set of things (the RCP averages on a given day) maps onto some other thing (the final election result). In other words, this graph shows how well the RCP average for early states on a given day predicted the final results.
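For readers who want to try the calculation themselves, here is a minimal Python sketch of the idea described above: for each day, regress the final results on that day's polling averages and record the R squared. It is an illustration under stated assumptions, not the code behind the graphic. The function names, the data layout and every number in it are made up for the example; none of them are real RCP averages or real election results.

```python
from statistics import mean


def r_squared(xs, ys):
    """R squared of a simple linear regression of ys on xs (one predictor plus an intercept).

    For this kind of regression, R squared equals the squared Pearson correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R squared is undefined when either series is constant")
    return sxy ** 2 / (sxx * syy)


def predictive_power_by_day(averages_by_day, final_results):
    """Map each day to the R squared of that day's averages against the final results.

    Days that lack an average for any candidate are skipped, mirroring the
    "every day for which I had complete data" condition in the text.
    """
    keys = sorted(final_results)
    power = {}
    for day, averages in averages_by_day.items():
        if any(key not in averages for key in keys):
            continue  # incomplete data for this day
        xs = [averages[key] for key in keys]
        ys = [final_results[key] for key in keys]
        power[day] = r_squared(xs, ys)
    return power


# HYPOTHETICAL numbers, invented purely for illustration.
# Keys are (contest, candidate) pairs; values are vote or poll percentages.
final_results = {
    ("IA", "Candidate A"): 30.0,
    ("IA", "Candidate B"): 25.0,
    ("IA", "Candidate C"): 10.0,
    ("IA", "Candidate D"): 35.0,
}
averages_by_day = {
    "early": {("IA", "Candidate A"): 10.0, ("IA", "Candidate B"): 30.0,
              ("IA", "Candidate C"): 25.0, ("IA", "Candidate D"): 20.0},
    "late": {("IA", "Candidate A"): 28.0, ("IA", "Candidate B"): 26.0,
             ("IA", "Candidate C"): 12.0, ("IA", "Candidate D"): 33.0},
    "partial": {("IA", "Candidate A"): 29.0},  # skipped: incomplete
}

power = predictive_power_by_day(averages_by_day, final_results)
for day, value in power.items():
    print(f"{day}: R squared = {value:.2f}")
```

In this toy data the "late" averages sit close to the final results and the "early" ones do not, so the late day gets the higher R squared. A real version would pool the candidates from all of the contests named above on each day rather than use a single made-up caucus.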

The story is relatively clear here. Polling averages become slightly more predictive throughout the fall, but until Thanksgiving they typically had a value of around 0.6 or less on a scale of zero to one (where zero means the polls predict absolutely nothing and one means the polls can be used to perfectly predict the final outcome). A value of 0.6 isn’t very good, but neither is it meaningless. For instance, around Thanksgiving of 2011 the RCP average for the Iowa Republican caucuses had Michele Bachmann, Rick Santorum, Rick Perry and Jon Huntsman at the back of the pack and Newt Gingrich, Mitt Romney, Herman Cain and Ron Paul at the front. Cain and Gingrich would crater and Santorum would surge before the January voting, but the rest of the candidates stayed pretty close to their previous rankings. So the polls at this point in 2011 weren’t perfect predictors, but they were far from useless.

A couple weeks after Thanksgiving, the surveys’ predictive power began to increase bit by bit. That doesn’t mean these primaries weren’t chaotic – a number of candidates saw short-term surges – but that, on average, as voters tuned in more and candidates competed for their support, the polls were better able to discern what the final result would look like. By the time the primary contest rolled around, voters had a pretty good idea of what they were going to do, and the polls reflected that.

This isn’t to say that Thanksgiving is some magical date after which voters decide to pay attention to primaries and that polls suddenly become better predictors. Iowa and New Hampshire are about a month later this cycle than they were in 2008 and 2012, so voters might not start to tune in until late December or early January this time. But our data provide us with a good rule of thumb – the polls will start becoming better predictors soon, so start paying attention to them now and really focus in on them as the actual contests get closer.

National Polls Matter – But Pay Special Attention to Iowa and New Hampshire

So far I’ve mostly focused on polls in the first two voting states rather than national surveys. That might seem odd considering how important the latter are. National polls provide the best picture we have of each candidate’s overall strength, help inform the decisions of key donors, control who has and hasn’t been on the main stage of each debate and shape media coverage of the race.

But when it comes to predicting the results of the primary, it may be a better idea to focus on New Hampshire and Iowa polls for now. That’s because national polls measure a race that, in the most technical sense, will never happen. There is no single day when voters nationwide will all go to their polling places to register their preference in the presidential primary. These contests advance through a months-long calendar, a few states at a time, and candidates drop out as the voting rolls on. To see why this matters, suppose a California Republican voter is questioned in a national poll and says he or she supports Chris Christie. California’s primary is in June. It’s possible that Chris Christie drops out long before June, or that some candidate has won enough delegates to secure the nomination before the California primary even happens. The fact that that voter supports Christie could provide some insight into the New Jersey governor’s base of demographic support, but that data point isn’t necessarily useful for prediction. Late-state polls especially suffer from this issue – by the time voters in Texas cast their ballots, the results from Iowa, New Hampshire, South Carolina and Nevada will probably have knocked out more than a few competitors.

Again, national polls really do matter – but Iowa and New Hampshire vote first, and that special position makes correctly predicting their outcomes crucial to any good primary forecast.

Beware of Last-Minute Surges

Finally, it’s important to note that even good polls conducted close to an election can miss the mark. Sometimes a candidate catches fire at the last minute, as Rick Santorum did in the run-up to the 2012 Iowa caucuses. The polls did a good job of gauging support for many of the other candidates, but getting the right winner matters – and they missed that one. Similarly, the polls failed to detect Hillary Clinton’s 2008 New Hampshire Democratic primary win – a victory that revived her campaign and helped her stay in a long, tough fight with Barack Obama.

That being said, the polls are the most useful data we have heading into an election. And RCP’s poll averages provide a much better read on the state of the race than any single survey. But even if the polls end up being less than perfect heading into Iowa and New Hampshire, start paying attention to them now because they will likely become powerful predictors soon.