A colleague pointed me to this article by Sean Westwood, Solomon Messing, and Yphtach Lelkes, “Projecting confidence: How the probabilistic horse race confuses and demobilizes the public,” which begins:

Recent years have seen a dramatic change in horserace coverage of elections in the U.S.—shifting focus from late-breaking poll numbers to sophisticated meta-analytic forecasts that emphasize candidates’ chance of victory. Could this shift in the political information environment affect election outcomes? We use experiments to show that forecasting increases certainty about an election’s outcome, confuses many, and decreases turnout. Furthermore, we show that election forecasting has become prominent in the media, particularly in outlets with liberal audiences, and show that such coverage tends to more strongly affect the candidate who is ahead—raising questions about whether they contributed to Trump’s victory over Clinton in 2016. We bring empirical evidence to this question, using ANES data to show that Democrats and Independents expressed unusual confidence in a decisive 2016 election outcome—and that the same measure of confidence is associated with lower reported turnout.

The debate

My colleague also pointed me to this response by political analyst Nate Silver:

Not only that, but none of the evidence in the paper supports their claims. It shouldn’t have been published. The experiment finds that people are *underconfident*, not overconfident, when they see probabilistic forecasts. It directly contradicts the central tenet of their hypotheses.

Matt Grossman replied:

Reconciliation is people are bad at estimating win probabilities from vote share & vice versa. So they interpret a win probability of 75% as a landslide (even if they reduce it to 65% in their head) & see an 8 point lead as near 50/50. Turnout responds to perceptions of a toss-up.

To which Nate responded:

That’s the most generous interpretation and it still only rises the paper to the level of “it’s possible the evidence supports rather than contradicts our hypothesis if you introduce a number of assumptions we didn’t test (and ignore a lot of other problems with our experiment)”. . . . I think it’s unethical for them to make such strong claims with such weak evidence. . . . There are many other critiques I have of the experiment, including the lack of context for how election results are framed in the real world (i.e. people see poll results with headlines attached and not just numbers.) But that’s the main flaw and it’s a fatal flaw.

And Oleg Urminsky wrote:

I disagree with the main implication, based on my concurrent research with Lucy Shen . . . First, BOTH probability & margin forecasts are misunderstood. People UNDERestimate closeness of the outcome w/ probability forecasts, but OVERestimate w/ margin forecasts. . . . Second, the degree of misestimation is SMALL when the margins are small (i.e., a close election) and people really misestimate only when the margin is large. But when the margin is large, any resulting bias would have to be huge to actually change outcomes. . . . Third, we looked and found NO effect on voting intentions or other reported election behaviors. . . . So, while we think the fact that forecast framing affects people’s judgment is fascinating enough to have researched it for 3.5 years now, I disagree with the “probability forecasts depress turnout” takeaway.

Many other people participated in this thread too; you can follow the links and read the back-and-forth if you’d like.

Where do we start?

Before going in and looking at the evidence, I come into this question with a mix of contradictory prejudices.

To start with, I admire Nate’s political analysis and his willingness to accept uncertainty (most famously in the lead-up to the 2016 election). At the same time, I’m annoyed with his recent habit of drive-by criticism, where he engages in some creative trash-talking and then, when people ask for details, he disappears from the scene. Yeah, I know, he’s a journalist, and journalists are always on to the next story, they’re all about the future, not the past. But as an academic, I find his short attention span irritating. If he’s not going to engage, why do the trash-talking in the first place?

I also have mixed feelings about the sort of research being discussed. On one hand, I’m on friendly terms with Messing and Lelkes, and I think they’re serious researchers and they know what they’re doing. On the other hand, I’m generally suspicious of claims about irrational voters. On the third hand, I am on record as saying that people should be more likely to vote if an election is anticipated to be close (see section 3.4 of this article).

Getting to more of the specifics, here’s what Julia Azari and I wrote, following the 2016 election:

We continue to think that polling uncertainty could best be expressed not by speculative win probabilities but rather by using the traditional estimate and margin of error. Much confusion could’ve been avoided during the campaign had Clinton’s share in the polls simply been reported as 52 percent of the two-party vote, plus or minus 2 percentage points. That said, when the general presidential election is close, the national horse race becomes less relevant, and we need to focus more on the contests within swing states, which can be assessed using some combination of state polls and state-level results from national polls. An additional problem is the difficulty that people have in understanding probabilistic forecasts: if a prediction that Clinton has a 70% chance of winning is going to be misunderstood anyway, why not just call it 98% and get more attention?

So on the substance I’m in agreement with Westwood et al. that probabilistic forecasts are a disaster.

Consider that hypothetical forecast of 52% +/- 2%, which is the way they were reporting the polls back when I was young. Taking that 2% as one standard error, this would’ve been reported as 52% with a margin of error of 4 percentage points (the margin of error is 2 standard errors), thus a “statistical dead heat” or something like that. But convert this to a normal distribution and you’ll get an 84% probability of a (popular vote) win.

You see the issue? It’s simple mathematics. A forecast that’s 1 standard error away from a tie, thus not “statistically distinguishable” under usual rules, corresponds to a very high 84% probability. I think the problem is not merely one of perception; it’s more fundamental than that. Even someone with a perfect understanding of probability has to wrestle with this uncertainty.
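Here is that arithmetic as a short Python sketch. The function name `win_probability` is just an illustrative label, and the calculation assumes what the paragraph above assumes: a normal distribution for the two-party vote share, centered at 52% with a standard error of 2 percentage points.

```python
from math import erf, sqrt


def win_probability(share, standard_error):
    """Probability that the true two-party share exceeds 50%.

    Assumes a normal distribution centered at `share` (in percent)
    with the given standard error (in percentage points).
    """
    z = (share - 50.0) / standard_error  # distance from a tie, in standard errors
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# 52% with a standard error of 2 points is 1 standard error from a tie.
print(f"{win_probability(52.0, 2.0):.3f}")  # 0.841
```

The same forecast can be read as "within the margin of error" or as "about an 84% chance," depending only on which summary is reported.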

Where do we stand?

OK, to assess the evidence I have to read the two above-linked articles: the one by Westwood, Messing, and Lelkes, and the one by Urminsky and Shen.

So now to it.

Westwood, Messing, and Lelkes start by mentioning the rational-choice model of voting: “if P is the (perceived) probability of casting the decisive vote, B is the expected benefit of winning, D is the utility of voting or sense of ‘civic duty,’ and C is the cost of voting, then one should vote if P × B + D > C.”

One thing they don’t mention, though, is the very strong argument (in my opinion) that the “benefit term,” B, can be very large in a national election. As Edlin, Kaplan, and I discussed, voting is instrumentally rational to the extent that you are voting for a social benefit, in which case B will be proportional to the number of people affected by the vote, which is roughly proportional to the number of voters in the election. The probability P is roughly inversely proportional to the number of voters in the election (more evidence on this point here). When you multiply P × B, the factors of N cancel, so the product does not shrink as the electorate grows; hence the anticipated closeness of the election is relevant, even for a large election.
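To see the cancellation in numbers, here is a toy Python sketch. Everything in it is an assumption made for illustration: the function name `expected_benefit`, the `closeness` constant (standing in for how near to a tie the election is expected to be), and the per-person benefit of 1. None of these values come from the article with Edlin and Kaplan.

```python
def expected_benefit(n_voters, closeness=10.0, benefit_per_person=1.0):
    """Toy version of the P x B term with a social (not personal) benefit.

    Assumptions (hypothetical, for illustration only):
      P = closeness / n_voters           -- chance of being decisive
      B = benefit_per_person * n_voters  -- benefit summed over everyone affected
    """
    p_decisive = closeness / n_voters
    social_benefit = benefit_per_person * n_voters
    return p_decisive * social_benefit


# The product is essentially the same whatever the size of the electorate...
for n in (1e4, 1e6, 1e8):
    print(f"N = {n:>12,.0f}   P x B = {expected_benefit(n):.2f}")

# ...but it does move with the anticipated closeness of the election.
print(expected_benefit(1e8, closeness=20.0) > expected_benefit(1e8, closeness=10.0))
```

Under these assumptions the size of the electorate drops out, and what is left is the closeness term, which is the point of the argument.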

We also discuss how anticipated closeness can affect turnout indirectly. If an election is anticipated to be close, we can expect more people to be talking about it, so voting will be more appealing as a way of participating in this talked-about communal event. In addition, if an election is anticipated to be close, we can expect more intensive campaigning, so as a voter you will get more encouragement to vote.

So, lots of reasons to expect higher turnout in a close election. It does seem plausible that a forecast such as “Clinton is at 52% in the polls, with a margin of error of 4 percentage points” gives more of a sense of uncertainty than a forecast such as “Clinton has an 84% chance of winning.” Both these are oversimplifications because they ignore the electoral college, but they give the basic idea.

In their paper, Westwood et al. give evidence that many people underestimate the closeness of an election when they are given a probabilistic forecast. I can’t quite see why Nate says that their experiments reveal that people are “underconfident, not overconfident, when they see probabilistic forecasts.” There’s a lot in that paper, so maybe that’s somewhere, but I didn’t see it. Too bad Nate didn’t point to any specifics. Then again, he never pointed to any specifics when he was criticizing MRP, either.

Now to the paper by Urminsky and Shen, “High Chances and Close Margins: How Equivalent Forecasts Yield Different Beliefs,” which begins:

Statistical forecasts are increasingly prevalent. How do forecasts affect people’s beliefs about corresponding future events? This research proposes that the format in which the forecast is communicated biases its interpretation. We contrast two common forecast formats: chance (the forecasted probability that an outcome will occur; e.g., the likelihood that a political candidate or a sports team will win) versus margin (the forecasted amount by which an outcome will occur; e.g., by how many points the favored political candidate or sports team will win). Across six studies (total N = 2,995; plus 12 replication and generalization studies with an additional total N = 3,459), we document a robust chance-margin discrepancy: chance forecasts lead to more extreme beliefs about outcome occurrences than do margin forecasts. This discrepancy persists over time in the interpretation of publicly available forecasts about real-world events (e.g., the 2016 U.S. presidential election), replicates even when the forecasts are strictly statistically equivalent, and has downstream consequences for attitudes toward election candidates and sports betting decisions. The findings in this research have important societal implications for how forecasts are communicated and for how people use forecast information to make decisions.

Ummm . . . wait a second! Urminsky and Shen are not in contradiction with Westwood et al.! Both papers say that if you give people probabilities, they’ll have more extreme beliefs.

After reading Urminsky’s tweets (quoted above in this post), I was all ready to read the two papers and figure out why they come to opposite conclusions. But now I’m off the hook: the papers are in agreement.

As the button says, That was easy.

One thing I will say, comparing the papers, is that Westwood et al. have a bunch of excellent graphs. Urminsky and Shen have some good graphs—I like the scatterplots! Graphs are great. Always do more graphs.

So now that I’ve looked at the two papers, let me return to Urminsky’s remark that “the degree of misestimation is SMALL when the margins are small (i.e., a close election) and people really misestimate only when the margin is large. . . . we looked and found NO effect on voting intentions or other reported election behaviors. . . . I [Urminsky] disagree with the “probability forecasts depress turnout” takeaway.”

He’s making two points here: (a) misperceptions deriving from probabilistic forecasts are small, (b) there were no effects on turnout. I’ll now reread his paper with these two issues in mind.

Let’s start with Study 1 of the Urminsky and Shen paper, which they conducted during the 2016 election. It was a study of 225 people on Mechanical Turk. First, “Participants in the chance-forecast-displayed condition had more extreme reactions to the state of the election conveyed by the forecast than participants in the margin-forecast-displayed condition . . . these results suggest that chance forecasts are seen as conveying a stronger lead than margin forecasts convey. . . . We replicated this finding in an additional study (Study A1 in Appendix 1) . . .” So far, no evidence that the misperceptions were small.

On to Study 2, this time based on 1163 Mechanical Turk participants. Here’s what they found: “participants in the chance-forecast-displayed condition . . . overestimated the margin forecast (60.5% estimated vote share for Clinton vs. 52.6% actual forecasted vote share . . .). Participants in the margin-forecast-displayed condition . . . underestimated the chance forecast . . .” Thinking Clinton was going to get 60% of the vote: that seems like a large misestimation, so I’m not sure why Urminsky labeled it as “SMALL” in his tweet.

What about the results on voting intentions? Studies 3 and 4 of the paper are about sports betting, study 5 is about movie ratings, and study 6 is about drawing balls from an urn. That’s it. But I kept reading, and in the discussion section I found this:

When only one forecast format is widely available, our findings suggest that a systematic bias may result. Could this bias affect election results? As we demonstrated that the forecast format can affect attitudes, it could plausibly affect intention to vote, as well. In particular, if chance (vs. margin) forecasts leave readers with a stronger sense that the election has already been decided, showing chance forecasts might demotivate voters (Westwood, Messing & Lelkes, 2018). However, a presidential election is a high-profile event involving substantial news coverage, personal conversations, and other sources of information and preferences beyond forecasts. As a result, many people are likely to have formed behavioral intentions about whether or not they will vote (as well as about other election-related actions) prior to viewing forecasts, so they should have limited sensitivity to manipulated cues . . .

That makes sense, but it’s just theory, not data. They continue:

In considering the potential impact on elections, it is important to take into account that across our studies, the forecast-format bias was weakest when the margin was narrow (e.g., forecasted election results in Study 2). . . .

But I didn’t think that bias was so small! Thinking Clinton would get 60% of the vote—that’s a big number, it’s a Reagan-in-1984-sized landslide.

But then they come to the data:

More generally, we tested the potential impact of forecast format on intended election behaviors directly in Study 1, in additional election studies in which people were presented with information about changes in chance or margin forecasts over time . . . Forecast format yielded only a non-significant difference in the self-reported likelihood of voting . . . Furthermore, we did not observe a stronger effect of format on behavioral intentions (including voting) among participants living in states where the state-level presidential election was closer, and respondents’ votes therefore were more likely to be pivotal . . .

But, just cos something’s not statistically significant, that don’t mean it’s zero. Also, interactions are notoriously difficult to estimate, so you really really can’t learn anything from the non-statistical-significance of the interaction.

This study was based on only 198 survey respondents. You can learn a lot from 198 people, but not so much in a between-person comparison of a highly variable measure.

To their credit, Urminsky and Shen continue:

These results do not rule out the possibility that our sample size was not large enough to detect a small but real effect of forecast format on voting intention.

The question is, what is “small”? For example, if the people who followed probabilistic forecasts were 5% less likely to vote, and if 2/3 of these people were Democrats, that could make a difference.
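A rough calculation shows how noisy such a comparison is. This is only a sketch under assumptions of my own, not a reanalysis of the Urminsky and Shen data: I treat voting intention as a yes/no outcome (their measure was a self-reported likelihood), guess a baseline rate of 80%, split the 198 respondents evenly into two groups of 99, and read "5% less likely" as 5 percentage points. The function name `se_difference` is illustrative.

```python
from math import sqrt


def se_difference(p, n_per_group):
    """Standard error of a difference between two independent proportions.

    Assumes both groups share the same underlying rate `p` and the same size.
    """
    return sqrt(2.0 * p * (1.0 - p) / n_per_group)


# Hypothetical inputs: 80% baseline intention, 99 respondents per condition.
se = se_difference(p=0.8, n_per_group=99)
effect = 0.05  # assumed true effect: 5 percentage points

print(f"standard error of the difference: {se:.3f}")
print(f"assumed effect in standard errors: {effect / se:.2f}")
```

With these made-up inputs the standard error is close to 6 percentage points, so a true effect of 5 points would be less than one standard error from zero. A non-significant result is about what you would expect to see whether or not the effect is there.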

Summary

1. The two studies (by Westwood/Messing/Lelkes and Urminsky/Shen) agree that if you give people a probabilistic forecast of the election, they will, on average, forecast a vote margin that is much more extreme than is reasonable.

2. I see no evidence for Nate Silver’s claim that people are “underconfident, not overconfident, when they see probabilistic forecasts.” Nor do I agree with him that “none of the evidence in the [Westwood et al.] paper supports their claims” or that “it shouldn’t have been published.” It makes me lose some respect for Nate that he said this and then didn’t follow up when he was called on it. But it’s not too late for him to either retract this statement or justify it more clearly.

3. Regarding the larger question of whether this sort of thing can swing an election: I don’t know. Like Urminsky, I’m generally skeptical about statistical claims of large effects on behavior from small interventions. First, I’ve just seen too many such claims; second, there really are a lot of things affecting our voting behavior, and it’s hard to see how this one little thing could have such a big effect. On the other hand, in a close election it wouldn’t take much. And I don’t think that Urminsky and Shen’s null finding on voting behavior tells us anything; it’s just too noisy an estimate. So, again, I just don’t know.

4. From a journalistic point of view, the story is just too delicious: the irony that a bunch of political junkies could depress their own turnout by following the news too closely. I’ve argued many times that probabilistic forecasts are overprecise and can lead to loud disputes that are essentially meaningless, and I’ve also discussed the perverse incentives that push probabilistic forecasters to make their numbers jump around so that they can keep presenting news. So, yeah, if all that artificial news can convince political junkies not to go out and vote . . . that’s the sort of counterintuitive story that can get headlines.

P.S. I agree with this post by Palko that news media horserace coverage is a joke: the partisan media is obviously spewing bias, but much of the nonpartisan media is also printing attention-grabbing crap. So the big picture is that it’s not clear how the voter is supposed to handle this information. Vote margins or win probabilities are the least of the issues.