Canadian poll aggregators suck

In March 2008, a statistician writing under a pseudonym started a blog where he made predictions about the outcome of the 2008 US primary elections. This blog — named "FiveThirtyEight" in reference to the number of electors in the US Electoral College — quickly rose to prominence, and its author (who soon revealed himself to be Nate Silver) was widely lauded for his successful predictions of electoral outcomes.

He also attracted a large number of imitators — especially in Canada, where (since FiveThirtyEight concerned itself almost exclusively with US politics) they faced no competition from the original. In some cases the imitation extended to the name: one of the first Canadian poll aggregation websites operated under the name "308" (a reference to the number of seats in the Canadian House of Commons), and a later website used the name "338" after the House of Commons expanded by 30 seats.

Unfortunately, when political nerds try to imitate what a statistical nerd has done without having any understanding of statistics, the outcome is quite predictable: They suck.

There are a few ways that Canadian polling aggregators have failed. This is not intended to be an exhaustive list, and not every polling aggregator has failed in all of these ways; but any of these is severe enough to result in significantly misleading results.

Failing to account for house effects. As I wrote about in 2008, pollsters have "house effects" which skew their polling data. These come from many sources, including polling methods (some voters are more likely to talk to a human; others are more likely to press buttons in response to automated prompts) and how questions are asked (some pollsters ask about the "party X candidate"; others about "party X"; and others about "party X led by leader Y"). Whatever the source of these house effects, ignoring them produces a highly misleading view of the field: for example, the CBC Poll Tracker shows an upward spike in support for the Conservative Party of Canada every time a new poll from Angus Reid, DART, or Forum is released. An effective poll aggregator must model house effects and compute its polling trendline from "adjusted" polls.

Considering only the most recent poll from each pollster. Polls are inherently noisy; typical reported statistical margins of error are +/- 2%, and those are theoretical ideal values which assume perfect random sampling. With pollsters releasing new polls on a weekly or even daily basis, a large shift (e.g., Nanos Research's poll ending October 3rd, which reported an overnight 3.4% jump in Liberal support) is far more likely to be due to sampling error than to an actual shift in the underlying numbers. Discarding "earlier" polls loses important information, and effective poll aggregation should avoid losing information.

Mishandling rolling polls. Speaking of Nanos Research: they and Mainstreet Research both report 3-day "rolling" polls; each day they add a new day of polling and drop the oldest day. Sites handle these polls in at least three wrong ways: ignoring all but the latest poll; including only every third poll (to avoid overlapping dates); and including all of the polls but reducing their weight by a factor of 3 to account for the reused data.
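The overlapping windows can instead be unwound into their underlying daily numbers. Here's a minimal sketch of that unwinding — all poll figures are invented, `unroll` is a hypothetical helper, and it assumes the first window's sample splits evenly across its three days (real releases don't say):

```python
# Sketch: recover per-day values from a series of 3-day rolling polls.
# All poll numbers are invented.  Assumption: the first window's sample
# is split evenly across its three days.

def unroll(windows):
    """windows: consecutive (window_sample_size, party_share) pairs,
    one per daily release; window t covers days t, t+1, t+2.
    Returns per-day (sample_size, share) estimates."""
    n0, p0 = windows[0]
    daily_n = [n0 / 3.0] * 3          # assumed even split of first window
    daily_s = [p0 * n0 / 3.0] * 3     # estimated supporter counts per day
    for t in range(1, len(windows)):
        n_t, p_t = windows[t]
        n_prev, p_prev = windows[t - 1]
        # Consecutive windows differ only by the day dropped (day t-1)
        # and the day added (day t+2), so the daily totals telescope.
        daily_n.append(n_t - n_prev + daily_n[t - 1])
        daily_s.append(p_t * n_t - p_prev * n_prev + daily_s[t - 1])
    return [(n, s / n) for n, s in zip(daily_n, daily_s)]

# With equal daily sample sizes, a 2-point move in the 3-day window
# implies a 6-point move in the single new day of polling -- which is
# why the recovered daily values carry much larger error bars.
days = unroll([(900, 0.30), (900, 0.32)])
```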
The right way for a polling aggregator to handle rolling polls is to "reverse engineer" the original daily data and use those values (which inherently have a much higher margin of error, due to the small daily sample sizes).

Ignoring the full date range of a poll. Many polls reported during the 2019 Canadian federal election campaign were conducted over a 3-day period and then reported the following day; for example, a poll conducted between October 1st and 3rd would be reported on October 4th. This is not always the case, however: there have been a handful of polls conducted in a single day, and others over the course of an entire week; and while most polls are reported the day following the final day of polling, some are reported late on the final day, and others 2-3 days later. Most polling aggregators ignore the date range and treat polls as if they were conducted on a single day — occasionally the "midpoint" day, but usually the final day. Some aggregators do even worse, and treat polls as having been conducted on the day they were reported. At times when party support levels are changing — for example, after the Justin Trudeau "blackface" scandal on September 18th (which cut Liberal party support by 1.0%), or after the Leaders' debates on October 7th and 10th (when NDP support surged by 3.2% and Bloc Quebecois support increased by 1.4%), it is essential for polling aggregators to use the correct date ranges.

Not accounting for non-sampling polling noise. As I mentioned earlier, polling "margins of error" are theoretical ideal levels based on the assumption of perfect random sampling. Guess what? Nobody is perfect. In practice, some pollsters are far "noisier" than their sample sizes would indicate; while there are some exceptions (in both directions!), the added polling variance from non-sampling error is typically around the same size as the unavoidable error from random sampling.
Good polling aggregators should estimate the excess variance for each pollster and use this to compute "corrected" error margins for polls.
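A minimal sketch of that correction, under the simplifying assumption that true support was roughly constant across the polls being compared (all numbers and function names are invented for illustration):

```python
import math

# Sketch: widen a pollster's error margins by its observed excess variance.
# Assumes true support was roughly constant across the polls compared, so
# any spread beyond sampling error is the pollster's own noise.

def sampling_var(p, n):
    """Theoretical variance of a share p under simple random sampling."""
    return p * (1.0 - p) / n

def excess_ratio(shares, n):
    """Observed poll-to-poll variance divided by the variance that
    sampling error alone would produce at sample size n."""
    mean = sum(shares) / len(shares)
    observed = sum((s - mean) ** 2 for s in shares) / (len(shares) - 1)
    return observed / sampling_var(mean, n)

def corrected_moe(p, n, ratio):
    """95% margin of error, inflated for non-sampling noise."""
    return 1.96 * math.sqrt(ratio * sampling_var(p, n))
```

With ratio = 1 (a perfectly behaved pollster), corrected_moe(0.5, 2401, 1.0) gives the familiar +/- 2%; with excess variance comparable to sampling variance (ratio near 2, as suggested above), the honest margin grows by a factor of sqrt(2).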

Can we do better? Yes we can — and with good aggregation methodology, the results make far more sense. Stay tuned for details about how I'm aggregating Canadian political opinion polls.
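As a preview of one ingredient, here is a much-simplified sketch of house-effect estimation. It pretends true support is constant over the comparison window, whereas a real aggregator would estimate the offsets jointly with a moving trendline; the pollster names and numbers are invented:

```python
from collections import defaultdict

# Sketch: estimate each pollster's house effect as its average deviation
# from the all-pollster mean, then "adjust" raw polls by subtracting it.
# Assumes true support is roughly constant over the period compared.

def house_effects(polls):
    """polls: (pollster, party_share) pairs for one party, drawn from a
    period short enough that true support is roughly constant."""
    overall = sum(share for _, share in polls) / len(polls)
    by_house = defaultdict(list)
    for house, share in polls:
        by_house[house].append(share)
    return {h: sum(v) / len(v) - overall for h, v in by_house.items()}

def adjust(polls, effects):
    """Return the polls with each pollster's house effect removed."""
    return [(h, share - effects[h]) for h, share in polls]

polls = [("HouseA", 0.34), ("HouseA", 0.36), ("HouseB", 0.30)]
effects = house_effects(polls)   # HouseA leans high, HouseB leans low
```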
