



Nice 2PP. Shame it's for the other side ...





"I have always believed in miracles" said re-elected Prime Minister Scott Morrison very late on Saturday night. But many (not all) of us who study national Australian polls and use them to try to forecast elections have believed in a miracle for one election too many. The reason we believed in this miracle was that it kept delivering. While polls failed to forecast Brexit, Trump and two UK elections in a row (among other high profile failures) Australian national polls continued to churn out highly accurate final results. The two-party preferred results in final Newspolls from 2007 to 2016 are an example of this: 52 (result 52.7), 50.2 (result 50.1), 54 (result 53.5), 50.5 (result 50.4).





Predicting federal elections pretty accurately has long been as simple as aggregating the polls, adjusting for obvious house effects and personal votes, and applying probability models (not just the simple pendulum); do that and you will generally not be more than 5-6 seats wrong on the totals. While overseas observers like Nate Silver pour scorn on our little polling failure as a modest example of the genre, and blast our media for failing to anticipate it, they do so apparently unfamiliar with just how good our national polling has been since the mid-1980s compared to polling overseas. As a predictor of final results, aggregation of at least the final polls has survived the decline of landlines, volatile campaigns following leadership changes or major events, suspected preferencing shifts that frequently barely appeared, and herding with the finish line in sight, and has come up trumps many elections in a row. This has been put down to many things, not least that compulsory voting makes polling easier by removing the problem of trying to work out who will actually vote (another possibility is the quality of our public demographic data). But perhaps it was just lucky.
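For readers unfamiliar with that workflow, here is a minimal sketch of the aggregation step - a house-effect-adjusted, sample-weighted average of final polls. The poll figures, sample sizes and house effects below are illustrative placeholders, not this site's published aggregate.

```python
# Minimal sketch of a poll aggregate: subtract each pollster's estimated
# house effect (its historical lean relative to the industry average),
# then weight by sample size. All numbers here are illustrative.
polls = [
    # (pollster, Labor 2PP, sample size, estimated pro-Labor house effect)
    ("Newspoll",  51.5, 3038, +0.0),
    ("Galaxy",    51.0, 1844, +0.0),
    ("Ipsos",     51.0, 1842, -0.3),
    ("Essential", 51.5, 1080, +0.4),
]

num = sum((labor_2pp - house) * n for _, labor_2pp, n, house in polls)
den = sum(n for _, _, n, _ in polls)
print(f"house-adjusted aggregate 2PP for Labor: {num / den:.1f}")
```

A real model adds personal-vote corrections seat by seat and converts the aggregate 2PP into seat probabilities; the averaging shown here is only the first step.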





And so, when warning signs appeared, in the form of both the Coalition moving within historical striking distance and then a ridiculous level of herding first called out in an excellent post by Mark the Ballot, I went through the motions of warning that there was a realistic if slim chance that the polls were all baloney. Based on historic projections, there was maybe a 25% chance that the Coalition would win anyway (a la Trump vs Clinton). But after so often flogging the poll-failure-risk horse and having my fears proven groundless, my heart wasn't totally in it. After seeing blowouts compared with final polling in five of the last six state elections around the country, and in seat polling where clusters of 50-50s and 51-49s were often followed by lopsided margins, it looked more likely that Labor would win by more against a government that nine months ago had set itself on fire. To the extent that this was a widespread view (especially after the death of Bob Hawke), it was another instance of Nate's First Rule: polls can be wrong, but trying to second-guess which way they will be wrong nearly always backfires.





A Failure In Two Parts

To outline the nature of the failure briefly, especially for unfamiliar audiences overseas, this is how the final polled primary votes and two-party preferred (2PP) compare with the current counting towards the final result.





(* Average excludes Essential, which did not provide a breakdown for UAP.)





The primary vote for the Liberal-National government was underestimated by about three points and that of the opposition Labor Party was overestimated by two points. Errors on the other parties were minor. The two-party preferred (2PP) figure on the far right-hand side, however, is the key figure in Australian polling, because whichever of the Coalition and Labor wins the nationwide two-party-preferred vote will usually form government (with some exceptions when it is very close). The 51.6% is my estimate based on swings in the ABC's election-night projections; the live count currently sits at 50.9%, but it excludes several seats where the Government and Opposition are not the final two candidates, or where they were wrongly expected not to be.





Depending on exactly where the 2PP ends up, a small part of the circa three-point miss on the 2PP may be caused by modelling error regarding how the preferences of the minor parties would flow. Nearly all of it, however, has been caused by getting the primary votes for the major parties wrong. (The exception is the Ipsos poll, which has a long-unfixed habit of getting its Greens primary a few points too high, and also tends to get Labor low compared to others - which largely cancels out, as 82% of Green preferences flow to Labor.)
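For overseas readers, a 2PP estimate is conventionally built from the primary votes by distributing minor-party support according to preference flows from the previous election. A minimal sketch follows; the primaries are illustrative, the 82% Greens-to-Labor flow is from the text above, and the One Nation and "others" flows are my own ballpark assumptions.

```python
# Build a last-election-preferences 2PP estimate from primary votes.
# Primaries are illustrative; the Greens flow (82% to Labor) is cited in
# the post, while the One Nation and others flows are assumptions.
primaries = {"Coalition": 41.4, "Labor": 33.3, "Greens": 10.4,
             "One Nation": 3.1, "Others": 11.8}
flows_to_labor = {"Greens": 0.82, "One Nation": 0.35, "Others": 0.50}

labor_2pp = primaries["Labor"] + sum(
    primaries[party] * flow for party, flow in flows_to_labor.items())
print(f"Labor 2PP: {labor_2pp:.1f}, Coalition 2PP: {100 - labor_2pp:.1f}")
```

This is why a two-point overestimate of Labor's primary and a three-point underestimate of the Coalition's translate almost directly into the 2PP miss: the flow assumptions only redistribute the minor-party share.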





The failure is amplified because not a single poll in the entire term of government showed the government leading, with the exception of five obviously dodgy respondent-allocated 2PPs in the short-lived YouGov-Fifty Acres series. The government hadn't even tied a poll, except a single Ipsos on respondent preferences, since 2016. The old saw about the only poll that counts being the one on election day held true, except that in Australia now, election day goes for three weeks, robbing the pollsters of the usual excuse for this sort of thing (that people simply changed their minds).

So that's the first part of the error: on average, pollsters had the 2PP wrong by about three points. This by itself might be dismissed as part of the "house effect" of polling at this election. But there is more: the final seventeen published polls (and the Galaxy exit poll as well) all had a rounded 2PP for the government of 48, 48.5 or 49. The sample sizes of these polls varied from several hundred to around 3000, so if the average had really been around 48.5 the smaller polls would have had about a one in three chance of landing in this band randomly, while the larger polls would have done so about 60% of the time. The chance of all seventeen doing so - if they were independent and purely random samples - is a little under 1 in 200,000. In fact polls aren't purely random, and the use of weighting should increase their margin of error and make the streak even less likely. As Professor Brian Schmidt pointed out in the Guardian, the mathematics do not lie: some of the polls were not pure samples, some were not independent of the same pollster's other polls, or some were not independent of each other.
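A minimal sketch of that calculation, under the idealised assumption of independent pure random samples. The individual sample sizes below are hypothetical stand-ins (the post only says "several hundred to around 3000"), so the joint figure is indicative rather than an exact reproduction of the 1-in-200,000 estimate.

```python
from math import sqrt, erf

def prob_in_band(n, true_pct=48.5, lo=47.75, hi=49.25):
    # Chance that a pure random sample of size n returns a 2PP that
    # rounds to 48, 48.5 or 49 (i.e. lands between 47.75 and 49.25)
    # when the true value is 48.5, using the normal approximation.
    p = true_pct / 100
    se = sqrt(p * (1 - p) / n) * 100           # SE in percentage points
    def cdf(x):
        return 0.5 * (1 + erf((x - true_pct) / (se * sqrt(2))))
    return cdf(hi) - cdf(lo)

# Hypothetical sample sizes spanning "several hundred to around 3000".
samples = [600, 800, 1000, 1000, 1200, 1500, 1500, 1600, 1800,
           2000, 2000, 2200, 2500, 2500, 2800, 3000, 3000]

joint = 1.0
for n in samples:
    joint *= prob_in_band(n)

print(f"smallest poll: {prob_in_band(600):.2f}")   # about 1 in 3
print(f"largest poll:  {prob_in_band(3000):.2f}")  # about 0.6
print(f"all seventeen: 1 in {1 / joint:,.0f}")
```

Whatever sample sizes are assumed, the joint probability of the streak is vanishingly small; weighting, which inflates each poll's true margin of error, only makes it smaller.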





Is this like Trump or Brexit?









Recent major failings of polling like Trump and Brexit have created an impression that polling is getting much worse, when actually, worldwide, it isn't - it's as mediocre as it always was. It just happens that there have been failures on some particularly momentous and close contests that have cast polling in a bad light.

But the failure that has happened here is actually worse than the Trump failure. In the Trump case, national polls were actually quite accurate - they projected that Donald Trump would lose the popular vote, which he did, though not by quite as much as they projected. That is the equivalent of a one-point 2PP failure in Australia, which in this instance would probably have seen Bill Shorten in the Lodge, but without a floor majority. The main failure in the USA was in the polling in a few particular states crucial to Trump's win in the Electoral College.

It is also slightly worse than Brexit. The average error in the Brexit case was very similar, but the polls were not herded - two pollsters' final polls had Leave winning, albeit by less than it did.





Why were the polls, on average, wrong?





One of the problems in saying why the polls were, on average, so wrong is that Australian pollsters, especially YouGov-Galaxy (which also administers Newspoll), don't tell us very much about how they do their polling. This frustrating opacity is a big contrast to many UK pollsters, whose released results come with lengthy and detailed reports (example). For instance, we know that when Galaxy took over Newspoll it began augmenting phone polling with online samples, but the breakdown of the two methods isn't published. Another example: in late 2017 the pollster changed the way it dealt with One Nation preferences in constructing its 2PP estimate. The change was well justified, but the pollster did not make it publicly known until psephologists had gradually detected the issue over the following five months, during which time The Australian had continued to claim that the poll was using last-election preferences.





Getting accurate samples in polling is increasingly difficult. No major Australian pollster still uses purely landline-based sampling, nor has since 2015, but one still often sees claims that they do. Nowhere near everyone has a landline, answers it, or takes a recorded-voice call. Live phone calls are expensive and prone to social-desirability bias (as is face-to-face polling), a possible source of Ipsos' constant inflation of the Greens vote. Pollsters lack access to a complete database of mobile phone numbers, so some people can be reached by mobile phone and some can't. Online polling is another solution, but not everybody likes spending their time filling out surveys on a computer for trivial returns. This particular failure has cut across all of these polling modes.





Here is one explanation that has been offered that is definitely false:





* Margin of error: People casually familiar with margin of error are claiming that the failure was within the margin of error of the polls. But margin of error applies to the results of a single poll, not a string of them. One poll might be wrong at the outer edges of its margin of error, but if even two polls in a row by the same company do this in the same direction then there is already a problem. (This is a variant of error 3 in my list of margin of error myths.) Also, the failure was outside the margin of error of the largest polls taken. The 17 clustered polls collectively had a sample of about 23,600, on which the margin of error would be 0.6%. Four or five times that margin puts you nine standard deviations from the expected sample if the polls were correct - in other words, extremely improbable (see the sketch below).
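A minimal sketch of that arithmetic, treating the pooled sample as one simple random sample (weighting in real polls would make the effective margin of error larger still):

```python
from math import sqrt

# The 17 clustered final polls pooled: ~23,600 respondents at about
# 48.5 for the Coalition, against a result of roughly 51.5.
n = 23_600
p = 0.485

se = sqrt(p * (1 - p) / n) * 100   # standard error: ~0.33 points
moe = 1.96 * se                    # 95% margin of error: ~0.6 points
z = (51.5 - 48.5) / se             # a 3-point miss: ~9 standard deviations

print(f"SE {se:.2f} pts, MOE {moe:.2f} pts, miss = {z:.1f} SDs")
```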





Here are some explanations that have been advanced that in my view are non-starters:





* Late swing: the idea here is that those making up their minds on the day swung to the Coalition, but made that decision too late to be included in the sample. The problem with this is that prepoll voting was going on more or less throughout the period of these wrong polls, and prepolls have actually shown a greater swing to the Coalition (2 points 2PP) than election-day booth voting (0.8 points) - figures as of Sunday morning. Of course, the prepoll voting mix has changed a lot with the massive increase in prepolling, but even so, expecting that to make four points of difference the other way seems a bit much.



* Rolling late swing: the idea here is that some voters were intending to vote Labor but chickened out because of scare campaigns on the way to the ballot box (or post box) and voted Coalition instead, whether they voted before polling day or on it. This theory avoids the problem with the prepoll and booth swings mentioned above. However, if this was the case the polls should have shifted to the Coalition through the campaign by about a point as these voters reported back their actual vote. This didn't happen. All else being equal, it should also have led to a larger gap than last time between the votes of those who had already voted and those voting on the day, in the few polls that provided a breakdown of this. In the one I saw, Ipsos, it didn't.





* "Shy Tory effect": the idea here is that conservative voters are afraid of telling the pollster they vote Liberal because they think the interviewer will think they are a bigot. But unless a respondent is very paranoid, they're hardly likely to care about admitting they vote Coalition to a robopoll or an online survey. Also, there is no systematic skew to Labor in recent Australian polling - the Victorian election with its 3.3 point skew to Coalition in final polls being an example. If ever Tories had a campaign to be shy about, that one, with its "African Gangs" beatup, would surely be it.





Here are some explanations that in my view are plausible (at least as partial explanations):









* As with overseas polling failures, pollsters may have been oversampling voters who are politically engaged or highly educated (often the same thing).

* Connected to the above, some pollsters may have been underweighting (or failing to set quotas for) other important information. There is too little information available about what Australian pollsters actually adjust their samples for, but what is available often refers just to age, gender and location. Pollsters should not be expected to declare their exact formulae, but a list of everything a pollster weights for would be useful in picking cases where the polls might be missing something. (A sketch of how such weighting works follows this list.)





* There could well have been an unusual skew to the Coalition among politically disengaged voters who are basically unreachable by pollsters by any method, but are still required to vote (a way in which compulsory voting can make polling more difficult, not less). Pollsters just have to trust that the unreachables behave like the reachable voters with the same demographics.



I may add others as I see them mentioned.
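Because weighting features in two of the plausible explanations above, here is a minimal sketch of the standard reweighting technique, raking (iterative proportional fitting), on toy data. The categories and targets are hypothetical; actual Australian pollster weighting schemes are, as noted, unpublished.

```python
import numpy as np

# Toy raking: reweight a sample so its marginal distributions match
# population targets. Respondents have two attributes; all categories
# and targets below are illustrative only.
rng = np.random.default_rng(0)
n = 1000
age_group = rng.integers(0, 3, n)     # 0: 18-34, 1: 35-54, 2: 55+
engaged = rng.random(n) < 0.7         # oversampled politically engaged

age_targets = np.array([0.30, 0.35, 0.35])   # hypothetical census shares
engaged_targets = np.array([0.55, 0.45])     # hypothetical engaged / not

w = np.ones(n)
for _ in range(20):                   # alternate margins until converged
    for g in range(3):                # match the age margins
        mask = age_group == g
        w[mask] *= age_targets[g] / (w[mask].sum() / w.sum())
    for e, target in enumerate(engaged_targets):
        mask = engaged == (e == 0)    # e=0 is the engaged group
        w[mask] *= target / (w[mask].sum() / w.sum())

print("weighted engaged share:", (w[engaged].sum() / w.sum()).round(3))
```

The point of the exercise: raking can only correct for attributes the pollster measures and targets. If political engagement itself is never weighted for, an engagement skew in the sample passes straight through to the published figures.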





Herding, smoothing etc





Even if some explanation can be found for the average skew of the polls at this election, it doesn't explain the run of seventeen polls in a row with the same wrong result. This sort of thing is often referred to in polling studies as herding. Nobody wants to be the lone pollster with the completely wrong result while all the others are right (a la Morgan in 2001), so the myriad subjective choices that can be made in polling may result in struggling pollsters being more likely to get results similar to better pollsters. But if the normally good pollsters then make mistakes, the whole herd is dragged off course. In reducing the risk of being the outlying pollster, herding pollsters increase the risk that everyone is wrong. An appearance of herding has been common at recent Australian elections including the 2016 federal election, and sometimes this takes the form of one or more pollsters who have had different results to Newspoll early in a campaign saying the same thing as Newspoll at the end.





It's not clear that there was actually any herding this election. An alternative possibility is that pollsters could have been self-herding, and just coincidentally happened to do so around the same range of values as each other. This could happen if pollsters were using some form of unpublished smoothing method to stop their poll from bouncing around and producing rogue results. Galaxy has always had uncanny stability, and when it took over Newspoll we started seeing such things as, at one stage, the same Newspoll 2PP six polls in a row. Essential has also in the past been prone to get "stuck", but seems to have behaved more naturally recently. Galaxy has also displayed strange behaviour in seat polling at both of the last two elections - when it repeat-polls a seat, the difference between the two polls is on average little over a point on 2PP, about half what it should be randomly even if nothing has happened in the campaign in that seat. Also in 2016, Galaxy's seat polls had strangely underdispersed swings. An article by former Nielsen pollster John Stirton raised concerns about the strange lack of volatility in Newspoll since Galaxy took over, but the claim has been neither confirmed nor denied to my knowledge. (There's not necessarily anything wrong with it, but if a poll isn't a pure poll, this should be declared.)
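A minimal sketch of the expected repeat-poll gap under pure random sampling; the sample size of 550 is an assumption in line with typical published seat polls, not a figure from the post.

```python
from math import sqrt, pi

# Expected gap between two independent polls of the same seat if nothing
# changes: each poll's 2PP estimate has SE = sqrt(p(1-p)/n), the
# difference of two such estimates has SD sqrt(2)*SE, and a normal
# variable's mean absolute value is SD * sqrt(2/pi).
n, p = 550, 0.5                            # assumed seat-poll sample size
se = sqrt(p * (1 - p) / n) * 100           # ~2.1 points per poll
sd_diff = sqrt(2) * se                     # ~3.0 points
mean_abs_diff = sd_diff * sqrt(2 / pi)     # ~2.4 points expected

print(f"expected mean repeat-poll gap: {mean_abs_diff:.1f} pts")
```

Observed gaps averaging "little over a point" are roughly half this expectation, which is the underdispersion described above.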







It's worth noting that the primary votes do not display the same level of herding as the 2PP, for reasons including Ipsos' trademark inflation of the Greens vote, and also Essential having One Nation too high. But it's largely irrelevant because the 2PP is the figure that is used to forecast results and that would be the obvious concern for any pollster worried about their reputation. In theory, herding the 2PP could also be a factor in decisions about how to allocate preferences from minor parties. The four different pollsters (counting Galaxy and Newspoll as one) applied four different preferencing methods. Galaxy is known to have changed theirs at least twice during the term, while what Ipsos did is unclear based on published information.



Were the seat polls better?



David Briggs of YouGov-Galaxy has referred to the many seat polls that Galaxy issued in the final week as pointing to the correct picture and has complained that when these results were released they were scoffed at because they didn't fit the narrative.



I'm aware of 16 such results from the final week. In fact, on current figures (which won't change much) they were on average 3.2 points better for Labor on 2PP than what actually happened, so they were just as bad as the national polls in that regard. Across the whole campaign, the average 2PP/2CP error per seat for the 21 Coalition-vs-Labor/Greens seats polled by Galaxy/Newspoll (some seats polled more than once) was about 2.7 points. It's true that all these polls showed only two wins for Labor in Coalition seats, both of which they actually won. It's also true that these polls showed Labor behind on average in two of their own seats, both of which they lost. And of the 18 such seats where these polls showed one side ahead on average, they picked the right winner in 16, with two in doubt where if the wrong side gets up, it will be by a very slim margin. So that's impressive.



But these seat polls did not reveal anything to cast doubt on the picture in the national polls, because they had the same swing as the national polls. In the final week Galaxy had Labor at 50-50 in four Coalition seats and 49-51 in five Coalition seats. If these were broadly accurate samples Labor would have won three or four of these seats by chance, but Labor has actually won none, has lost one of its own seats (Herbert) where the final Galaxy had it at 50-50, and is in a tossup in another (Macquarie) where Galaxy had it leading 53-47. Moreover, only one of the nine 50-50 or 49-51 seats has ended up even remotely close. There were massive errors in two Queensland seats (Labor-held Herbert and LNP-held Forde) that Galaxy had at 50-50; both are now showing the LNP over 57%.



The final-week polls were also, yet again, somewhat under-dispersed for their small sample size, with a standard deviation of around two points on both the released 2PPs and the implied swings, compared to three points in the actual results and four points in the actual swings. (About three points is normal.) This repeats an issue that was seen in the last election, and raises the question of how on earth Galaxy keeps getting results at both national and seat poll level that are less variable than they should be by chance.



Were the internal polls better?



When there is an unexpected result, one usually hears that the losing party had this result in its internal polling all along, released only to a select list of senior people. This is so even though it is invariably news compared with what little could be gleaned about internal polls from pre-election spoonfeeding of the media and, in rare cases, leaks. As a general rule, any claim about internal polling that is politically convenient for the source making it should be ignored unless the full details are published.



In this case, we have got a rare glimpse under the hood of Labor's internal tracking polling, which reveals that Labor chose to contract YouGov-Galaxy for their internals even while the same company was also providing Newspoll and Galaxy polls to the Murdoch press (which was editorially hostile to Labor during the campaign). This is quite a surprise in itself, especially following the concerns about another pollster, uComms, having union ties that were not well known to clients. It raises the question of whether Australia has enough good pollsters to be able to avoid hiring someone who might be conflicted. (There's an illuminating history of other such issues by Murray Goot. It is known to me that ReachTEL often declines contracts to avoid having a conflict in the market.) YouGov-Galaxy have pointed out in the article that the two contracts were "siloed", meaning that the polling teams used for Newspoll and for the Labor research were entirely separate.



The tracking poll covered 20 seats with an average pre-election 2PP of 51.4% to Coalition. As an average through the campaign, the tracking poll showed Labor getting a 0.8% swing, enough to probably govern in minority but probably not with a majority. At the end it kicked up to a 1.4% swing, probably enough for a slim majority government and completely consistent with the late reports from an unnamed Labor source that they expected to win about 77-78 seats (but hoped for more).



The interesting thing is that the tracking poll dipped below easy-win territory (51 and above) after only five days of the campaign, and at some points even had Labor losing based on the swing in these marginals. If Labor were taking this polling seriously they should have noted that it was telling a different story to the national polls early in the campaign, and should have realised that they needed to cut back on risks. Labor's tracking poll was as wrong as everything else at the end, but it wasn't always so.



There are, however, suggestions that the Liberal Crosby-Textor internal polling was rather accurate. Again, we need to see detailed figures on this.



Where to from here?




I mentioned before that Australia's last fully-fledged polling failure in a national election was in 1980. Commenting on that case in 1983, David Butler wrote "Nowhere in the world has a debacle for the polls diminished their use in subsequent elections". But I wonder. The frequency and diversity of Australian polling (especially at state level) had already declined sharply in recent years compared to the period 2013-15, though Australian pollsters had done little to deserve that. This downturn was sometimes attributed to the Donald Trump upset casting pollsters in an unflattering light, though I think the increased dripfeeding of journalists with stories culled from internal/commissioned polling also has a bit to do with it. Where does an already weakened industry go from here?