Advance Summary

Seat Polls Vs Uniform Swing

Free Data Outperforms Seat Polling!

How Seat Polls Should Be Used

Is Repeat Seat Polling More Accurate?

Cases Where Seat Polling Should Be More Useful

In the leadup to the "Super Saturday" federal by-elections, some unexpected seat-polling results have been met with widespread derision on social media. Results like a 54-46 to Liberal ReachTEL in Braddon, and a 52-48 to LNP ReachTEL in Longman, fly in the face of a historic pattern that governments don't win opposition seats in by-elections (especially not when they're already polling badly). Given the recent history of seat polls performing badly at federal elections, while national polls have performed well, it's been very easy for people to dismiss these findings out of hand.

The widespread disbelief in seat polls has been advanced by the recent publication of Simon Jackman and Luke Mansillo's excellent analysis (PDF download) of the performance of seat polls at the 2016 election. Jackman and Mansillo found, among other things, that seat polls at that election were so bad that they should be treated as if their sample size was one-sixth of what it actually was. Errors on the primary votes were especially severe.

At the 2013 federal election, seat polls were even worse than in 2016, displaying a massive average pro-Coalition skew. A big miss by ReachTEL in the Darling Range by-election (about 7% out two-party preferred, for a poll taken only one week from the election) won't do wonders for any confidence still out there in seat polling.

So we know seat polls are pretty bad, and there are plenty of reasons being suggested as to why that might be so, but that doesn't answer an important question: how, if at all, should seat polls be used?

I'll give an example of why seat poll data shouldn't automatically be thrown away just on account of it being more erratic than it seems it should be. Suppose I know nothing about two candidates for an upcoming election and I do a poll with sample size 800 and get a 50-50 result.
Now, someone else comes along using exactly the same methods with a sample size of 200 and gets 55-45 to one candidate over the other. Obviously, no-one should believe the smaller sample. But if the two polls are combined, the expected result is 51-49 to the candidate who led in the smaller poll. So long as the smaller poll was conducted using the same methods, it is better to combine the two samples than to throw away data.

However, suppose we find out that the smaller poll might not have been conducted by the same method, but might have been conducted using some method that would have caused a skew to one of the candidates. Then, unless we could establish how large that skew was, we should ignore the smaller poll.

Even if (per Jackman and Mansillo) a seat poll with sample size 750 should be treated as really having a sample size of 125, that alone is not enough reason to ignore it totally. It might still be reasonable to allow it to nudge our opinion about the seat slightly one way or the other.

Seat Polls Vs Uniform Swing

If seat polls were really good face-value sources of information about upcoming results, one thing we should expect them to do is to out-predict a really simple model such as uniform swing. The simplest version of a uniform swing prediction is to estimate a nationwide swing based on polling, and assume exactly that swing will occur in every seat.

I had a look at the 2016 seat polls to see if they did that. I looked at 70 seat polls from 38 classic-2PP (ie Liberal/National vs Labor) seats taken during the (long) 2016 campaign itself. I only looked at the 2PP estimates because that is what determines who is claimed to be winning the seat (thus, polls without a published 2PP estimate were ignored). I ignored who commissioned the polls and what 2PP preferencing method each pollster used.

On this basis the average absolute error of these polls (ignoring which direction it went in each case) was [figure missing] points.
That means a margin of error (not the same thing) of about 6.3 points.

However, if we plug the national 2PP swing that happened (3.13 points) into a uniform swing model and match it to each of the seat polls (which means counting several seats more than once), the average error drops to [figure missing] points. That is, the seat polls performed worse than assuming there was no variation between the different seats at all.

Of course, that's not a fair test. Nobody can know exactly what the national swing will be in advance, and if someone was doing the same thing using a national swing estimate that was a bit wrong, presumably they would make worse errors and might not beat the seat polls?

Well, actually, no. In fact, most remotely reasonable 2PP swing estimates (and some remotely unreasonable estimates too!) would have still beaten the seat polls by the humble uniform swing method. Any estimate of the national swing between 2.36 points (ie 51.15% 2PP for Coalition) and 5.39 points (ie 48.12% for Coalition) would have done the job. The smallest error would have been [figure missing] points, off a swing estimate of 4.10 points (49.41%), on account of the seat polls including a few seats where the Government got mugged by monster swings.

Free Data Outperforms Seat Polling!

So suppose you were a newspaper staffer and you commissioned a seat poll of one of these seats in 2016. You then wrote up an article about the seat using no source of predictions except for the seat poll. It turns out that on average, in 2PP predictive terms, you spent money to obtain a 2PP result from a pollster that was less accurate than if you'd just said "National polling aggregates say there is a 3% swing against the government, based on which the 2PP result expected in (seat blah) is (blah)". If being accurate in predicting election results is what media are purchasing seat polling for, then they're wasting their money on achieving a worse result.
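The uniform swing comparison described above can be sketched in a few lines of Python. This is only an illustration: the seat figures below are invented for the example (they are not the 2016 seat polls), and the function names and the 3-point swing are assumptions chosen for readability.

```python
from statistics import mean


def uniform_swing_prediction(previous_2pp: float, national_swing: float) -> float:
    """Apply the same national swing against the government to a seat's previous 2PP."""
    return previous_2pp - national_swing


def mean_absolute_error(predictions, results) -> float:
    """Average size of the miss, ignoring which direction each miss went in."""
    return mean(abs(p - r) for p, r in zip(predictions, results))


# Hypothetical seats (illustrative only): government 2PP at the previous
# election, in a seat poll, and at the election being predicted.
seats = [
    {"previous": 54.0, "poll": 53.0, "actual": 50.5},
    {"previous": 53.0, "poll": 48.0, "actual": 50.0},
    {"previous": 51.5, "poll": 52.0, "actual": 47.5},
    {"previous": 56.0, "poll": 51.0, "actual": 53.5},
]

NATIONAL_SWING = 3.0  # assumed swing against the government, in points

actual = [s["actual"] for s in seats]
poll_error = mean_absolute_error([s["poll"] for s in seats], actual)
swing_error = mean_absolute_error(
    [uniform_swing_prediction(s["previous"], NATIONAL_SWING) for s in seats],
    actual,
)

print(f"seat poll average error:     {poll_error:.2f} points")
print(f"uniform swing average error: {swing_error:.2f} points")
```

Re-running the last step over a range of `NATIONAL_SWING` values is one way to find the band of swing estimates for which the uniform swing model still beats the polls.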
(And lest anyone think 2016 was especially bad for seat polling, 2013 was much worse.)

Moreover, a uniform swing model is one of the more primitive models freely available. On average, it in turn will be outperformed by models that also include retirement and sophomore effects for sitting members, and possibly statewide federal polling breakdowns (though I haven't tested the latter over multiple elections).

I don't think media sources that commission polling really care that much about this. I think they mainly commission polls to have something interesting to report, and that news sources might even prefer an inaccurate but startling result to an accurate but boring one. Some media (or activist groups) might well even be happy if their seat polls tended to skew to one side or other by a point or two. But if news sources do care about using seat polls to do forecasting then they need to think about using them correctly.

How Seat Polls Should Be Used

Seat polls are unreliable data, but that doesn't mean they are useless data. In the case of the 2016 election, it's easy to test this by comparing the predictions of (i) the seat polls and (ii) the uniform swing model with (iii) a hybrid model that is a weighted average of both.

So, plugging in the actual 2016 swing of 3.13 points, we already know the uniform swing model beats the seat polls when it comes to average error. However, the hybrid model using each given seat poll in turn beats the uniform swing model provided that the weighting given to the seat polls is not more than 77%. The best result in that case (an average error of 2.85 points) would have come from giving the seat polls a 52% weighting and the national swing model a 48% weighting.
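As a rough sketch of the hybrid model, the weighted average and the search for the lowest-error weighting might look like the following. The rows are made-up numbers rather than the 2016 data, and `hybrid_prediction` and `average_error` are names invented for this example.

```python
from statistics import mean


def hybrid_prediction(poll_2pp: float, swing_2pp: float, poll_weight: float) -> float:
    """Weighted average of a seat poll and a uniform swing prediction."""
    return poll_weight * poll_2pp + (1.0 - poll_weight) * swing_2pp


def average_error(rows, poll_weight: float) -> float:
    """Mean absolute 2PP error of the hybrid model at a given seat-poll weighting."""
    return mean(
        abs(hybrid_prediction(poll, swing, poll_weight) - actual)
        for poll, swing, actual in rows
    )


# Hypothetical rows (illustrative only):
# (seat poll 2PP, uniform swing 2PP, actual 2PP), all for the same party.
rows = [
    (53.0, 51.0, 51.5),
    (48.0, 50.0, 49.0),
    (52.0, 48.5, 49.0),
    (51.0, 53.0, 52.5),
]

# Try every seat-poll weighting from 0% to 100% in 1% steps.
weights = [w / 100 for w in range(0, 101)]
errors = {w: average_error(rows, w) for w in weights}
best_weight = min(errors, key=errors.get)

print(f"uniform swing only: {errors[0.0]:.3f}")
print(f"seat polls only:    {errors[1.0]:.3f}")
print(f"best weighting:     {best_weight:.0%} -> {errors[best_weight]:.3f}")
```

A weighting found this way is fitted to the same results it is scored on, so it describes the past rather than recommending a value for the next election.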
By playing around with both the national swing and the seat poll weighting it's possible to get the average error down a little further (eg a 4.4-point swing with a 41% weighting for the seat polls gets it down to 2.77 points).

However the exact "best" assumption set for an election in retrospect doesn't mean anything (and using a particular election to find what would have been the best value for that election, then applying it in the future, creates a big danger of overfitting). The point is that using two sources of imperfect data (seat polls and some kind of national or state swing based model, hopefully a better one than just uniform swing) is likely, with any reasonable set of assumptions, to work better than just using one.

So, for instance, supposing that modelling done without seat polling suggests Party X is likely to win 54:46, but a seat poll shows Party Y leading 51:49, a news report on the seat poll shouldn't say that the seat poll shows Party Y headed for victory. Rather, it should say that the seat poll result, while inconclusive, raises a question about how secure the seat is for Party X.

On the other hand, if the existing modelling suggests Party X is expected to win 51:49 but a seat poll shows Party Y ahead 58:42, a news report on the seat can say that Party Y appears to be headed for a much stronger than expected result in a seat which established modelling does not provide much of a guide to.

And if the existing modelling says, say, 52.5:47.5, but the seat poll is 52:48 the other way, then the correct reading of the two in combination may be that the seat is anyone's guess.

Is Repeat Seat Polling More Accurate?

One might also normally expect that where there are multiple seat polls of the same seat, averaging their samples will provide a more accurate result than otherwise. Therefore if we had a result that was not what modelling off the national polls expected, that result would be weighted more highly if it was the average of five or six different seat polls rather than just one.
This is what I assumed in my own 2016 modelling of individual seat results, so that when a seat had had many seat polls, my prediction was mainly determined by them.

Unfortunately 2016 just didn't support that view. Bass was polled four times and the average of the four 2PPs was wrong by 7.6 points. Macarthur was polled three times and the average was wrong by 8 points. Dobell was polled four times and the average was wrong by 4.06 points. Lindsay was polled six times and five of the six polls had the wrong winner, with the average still being wrong by 2.6 points (which may not sound like much but is much higher than would be randomly expected). Overall there was no difference in 2016 between the error for single seat polls in seats polled only once (2.83 points) and the error for the average of multiple polls in seats polled more than once (2.82 points). Which is another way of saying that individual polls in seats that received more attention tended to be worse than those in less heavily polled seats!

Cases Where Seat Polling Should Be More Useful

Seat polling should in theory be more useful than otherwise in cases where the modelling otherwise available is worse. The first example of this is non-classic contests. It is very difficult to model Coalition vs Independent or Labor vs Green contests in particular seats from state or national polling. Sometimes, as with the Nick Xenophon Team in 2016 or One Nation in the 2017 Queensland state election, one can have a go at it by using a combination of polling and the results of some other election. Unfortunately, here the track record of seat polling seems to be even worse than for classic seats. Independent challengers sometimes surge as polling day approaches, their profile increases and the feeling of an upset grows. Labor vs Green (and for that matter Liberal vs Green) seats are often inner-city seats with high levels of enrolment churn and a lot of uncontactable voters.

The other one, since the matter is so topical, ought to be by-elections.
So for instance a current challenge is to try to predict Longman and Braddon, which are both opposition-held seats where both government and opposition are contesting. However, the standard deviation of 2PP swings in such contests (historically) is six points, so the margin of error on the average swing to oppositions goes into double digits. Even taking out factors that explain some of the variation (such as whether the federal government is polling well or poorly at the time), a seat poll taken reasonably close to the by-election should still be more accurate than such a model.

While ReachTEL's disaster in Darling Range is getting a lot of bad social media press at the moment, it is worth bearing in mind that the company's poll of Canning after the removal of Tony Abbott had a 2PP error of less than two points, and this was also true of both its polls in Bennelong.

The Mayo by-election is interesting because it is both a by-election and a non-classic contest, making it completely unmodellable by normal means. The best one could attempt by way of a model was to assume that the Centre Alliance vote might decline in accord with what happened in the SA state election (on which basis Rebekha Sharkie could have been in trouble). However multiple polls showing very large leads for Sharkie are enough evidence to destroy this narrative and replace it with a view that Sharkie should easily hold the seat.
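To put rough numbers on the by-election comparison above, the sketch below sets the margin of error implied by a six-point standard deviation of by-election swings against that of a 750-sample seat poll treated as a sample of 125. It assumes normally distributed errors, a 95% interval and a 50-50 contest, and it ignores any systematic skew in the poll, so it is a back-of-envelope check rather than a proper model. The function names are invented for the example.

```python
import math

Z_95 = 1.96  # multiplier for a 95% interval under a normal approximation


def poll_margin_of_error(sample_size: int, share: float = 0.5) -> float:
    """95% margin of error, in points, for a simple random sample of this size."""
    return 100 * Z_95 * math.sqrt(share * (1 - share) / sample_size)


def swing_model_margin_of_error(swing_sd: float) -> float:
    """95% margin of error, in points, when swings have this standard deviation."""
    return Z_95 * swing_sd


nominal = poll_margin_of_error(750)               # the poll taken at face value
effective = poll_margin_of_error(125)             # the same poll at one-sixth the sample
by_election = swing_model_margin_of_error(6.0)    # historical by-election swing model

print(f"seat poll, n=750 at face value: +/- {nominal:.1f} points")
print(f"seat poll, treated as n=125:    +/- {effective:.1f} points")
print(f"by-election swing model:        +/- {by_election:.1f} points")
```

Under these assumptions even the heavily discounted seat poll has a narrower margin than the swing model, which is in line with the reasoning above about Longman and Braddon.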