Introduction

The New York Times and Siena College have partnered to create some really cool “live polls” of key races in the midterm elections. Results are updated in real time as polling participants are dialed, allowing viewers to see just how much the results move around during the polling process.

Early results have shown that races rated as Toss-Ups by Cook Political Report are mostly polling as almost exact ties between the candidates. I recently found myself shouting irrationally at my phone as I watched veteran, viral video star, and all-around baller Democrat Amy McGrath lose her lead to incumbent Republican Andy Barr in the KY-6 poll.

And then I remembered something interesting from my first-year Stats classes: can the Ballot Theorem be applied to this problem?

Ballot Theorem

Bertrand’s Ballot Theorem applies to a simplified election scenario in which there are only two candidates (and we ignore things like blank ballots, overvotes, etc.). We suppose ballots are counted in a random order, one at a time. We keep a running tally of the margin of votes between the candidates.

The Theorem tells us that the probability that the winning candidate leads throughout the entirety of the counting process is given precisely by the winning candidate’s final margin, expressed as a share of all votes counted. So if a Democrat beats a Republican by 5% in the final tally, there is a 5% chance the Democrat will lead throughout the entirety of the ballot counting process.
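A quick way to convince yourself of this is to shuffle a pile of ballots many times and see how often the winner never gives up the lead. The sketch below is only an illustration under the simplified two-candidate setup described above; the function names, vote counts, and trial count are my own choices, not anything from the NYT/Siena polls.

```python
import random


def leads_throughout(ballots):
    """Return True if the running margin is strictly positive after every ballot."""
    margin = 0
    for ballot in ballots:
        margin += ballot  # +1 is a vote for A, -1 is a vote for B
        if margin <= 0:
            return False
    return True


def simulate_ballot_theorem(votes_a, votes_b, trials=20000, seed=0):
    """Estimate P(A leads throughout) by counting the ballots in random order."""
    rng = random.Random(seed)
    ballots = [1] * votes_a + [-1] * votes_b
    hits = 0
    for _ in range(trials):
        rng.shuffle(ballots)
        hits += leads_throughout(ballots)
    return hits / trials


if __name__ == "__main__":
    votes_a, votes_b = 105, 95  # hypothetical election: 52.5% to 47.5%
    estimate = simulate_ballot_theorem(votes_a, votes_b)
    margin = (votes_a - votes_b) / (votes_a + votes_b)
    print(f"simulated: {estimate:.3f}, final margin: {margin:.3f}")
```

With enough trials the simulated frequency should land close to the final margin, up to Monte Carlo noise.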

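A really nice proof can be found in Chapter 4 of Rick Durrett’s Probability: Theory and Examples. The basic idea builds on the theory of simple random walks: random processes defined by a sequence of independent variables $X_1, X_2, \ldots$ such that $P(X_i = 1) = P(X_i = -1) = 1/2$. If we define $S_n = X_1 + \cdots + X_n$, then the points $(n, S_n)$ will trace out a path starting from $(0, 0)$ such as the one below:

[Figure: a sample path of a simple random walk]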

Here is a sketch of the proof:

Denote the candidates as A and B, and define $x$ as A’s final vote margin over B and $n$ as the total number of votes. Any ballot counting order in which A leads throughout can be represented as a path from $(0, 0)$ to $(n, x)$ which never touches 0 again. The path must go through $(1, 1)$, so we can equivalently state this quantity as the number of paths from $(1, 1)$ to $(n, x)$ without ever touching 0.

We make use of the Reflection Principle, which tells us that if $x, y > 0$, then the number of paths from $(0, x)$ to $(n, y)$ that are 0 at some time is equal to the number of paths from $(0, -x)$ to $(n, y)$. This implies the number of paths from $(1, 1)$ to $(n, x)$ which are 0 at some point equals the number of paths from $(1, -1)$ to $(n, x)$.

Some combinatorics tells us that the number of paths from $(0, 0)$ to $(n, x)$ is given by $N_{n,x} = \binom{n}{a}$ where $a = (n + x)/2$.

Putting this together, we see that the number of paths from $(1, 1)$ to $(n, x)$ that are never 0 is given by $N_{n-1,x-1} - N_{n-1,x+1}$. Some algebra shows us that:

$$N_{n-1,x-1} - N_{n-1,x+1} = \binom{n-1}{a-1} - \binom{n-1}{a} = \frac{x}{n} \binom{n}{a} = \frac{x}{n} N_{n,x}$$

where the final term $N_{n,x}$ is the total number of paths from $(0, 0)$ to $(n, x)$. Since every counting order is equally likely, dividing by $N_{n,x}$ gives a probability of $x/n$, which is exactly A’s final margin. Thus, the proof is complete.
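For small elections the counting argument can be checked directly. The sketch below enumerates every counting order for a total of n votes and a final margin of x, counts the ones in which A always leads, and compares that with the reflection-principle count and with x/n times the total number of paths. The function names are mine, and n + x is assumed to be even so that such paths exist.

```python
from itertools import combinations
from math import comb


def count_always_positive(n, x):
    """Brute force: count +/-1 paths of n steps from 0 to x that stay above 0."""
    ups = (n + x) // 2  # number of votes for A
    count = 0
    for up_positions in combinations(range(n), ups):
        up_set = set(up_positions)
        height = 0
        stays_positive = True
        for step in range(n):
            height += 1 if step in up_set else -1
            if height <= 0:
                stays_positive = False
                break
        count += stays_positive
    return count


def reflection_count(n, x):
    """Paths from (1, 1) to (n, x) minus paths from (1, -1) to (n, x)."""
    a = (n + x) // 2
    return comb(n - 1, a - 1) - comb(n - 1, a)


if __name__ == "__main__":
    n, x = 12, 4  # hypothetical: 8 votes for A, 4 votes for B
    total_paths = comb(n, (n + x) // 2)
    print(count_always_positive(n, x), reflection_count(n, x), total_paths * x // n)
```

For n = 12 and x = 4 all three numbers come out to 165, out of 495 possible counting orders, which is the 4/12 predicted by the theorem.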

Modifying the Ballot Theorem

Now, in the context of the NYT/Siena Polls, we make a few observations and simplifications:

The random ordering assumption inherent in the Ballot Theorem is more appropriate in this context. In an election, votes are tallied by precinct, inducing a very non-random ordering to the reporting. But in this case, because the polls are conducted by dialing voters at random, the assumption is essentially correct.

To get to basic results, we’ll ignore several complexities: Respondents are allowed to state that they are undecided, but we ignore this and pretend each response is either for the Democratic or the Republican candidate. Poll results are weighted so as to match the demographic profiles of the expected voter population in November. We ignore this and assume the reported results are simply based on proportions of respondents who say they will vote for each candidate. We ignore any issues induced by non-response and time series dependence (highlighted by Professor Gary King on Twitter).



With these simplifications, we can adapt the Ballot Theorem to these polls. In particular, results are first reported when 150 responses are tallied, and the polls terminate at roughly 500 responses. We want to answer the question: if the Dem is leading by $x_1$ responses with $n_1$ votes tallied, and ultimately leads by $x_2$ responses with $n_2$ votes tallied, what is the probability that she remains in the lead for the full tally between $n_1$ and $n_2$ responses?

Again, we can count paths. We are interested in the number of paths from $(n_1, x_1)$ to $(n_2, x_2)$ that are never 0.

Intuitively, this is equal to the number of paths from $(0, 0)$ to $(n_2 - n_1, x_2 - x_1)$ that never hit $-x_1$. By the Reflection Principle, the number of paths from $(0, 0)$ to $(n_2 - n_1, x_2 - x_1)$ that do hit $-x_1$ is equal to the number of paths from $(0, -2x_1)$ to $(n_2 - n_1, x_2 - x_1)$.

So our total count of relevant paths is given by:

$$N_{n_2-n_1,\,x_2-x_1} - N_{n_2-n_1,\,x_1+x_2}$$

where, as in the proof of the Ballot Theorem, $N_{m,y} = \binom{m}{(m+y)/2}$ is the number of paths from $(0, 0)$ to $(m, y)$.

Dividing this by $N_{n_2-n_1,\,x_2-x_1}$ gives us the proportion of paths (and thus the probability of occurrence) in which the Democrat remains in the lead from $n_1$ to $n_2$ responses. This simplifies a bit to:

Prob(Dem always leads after $n_1$ votes) = $1 - \binom{n_2-n_1}{(n_2-n_1+x_1+x_2)/2} \Big/ \binom{n_2-n_1}{(n_2-n_1+x_2-x_1)/2}$
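Here is a minimal sketch of that calculation, with a brute-force check on a tiny example. The notation is mine: the Democrat leads by x1 responses after n1 calls and by x2 responses after n2 calls, with both leads positive. The function names are hypothetical too, and the brute-force helper is only practical when n2 - n1 is small.

```python
from itertools import product
from math import comb


def n_paths(steps, displacement):
    """Number of +/-1 paths with this many steps and this net displacement."""
    if (steps + displacement) % 2 or abs(displacement) > steps:
        return 0  # parity or length makes the endpoint unreachable
    return comb(steps, (steps + displacement) // 2)


def prob_lead_held(n1, x1, n2, x2):
    """P(margin stays above 0 from n1 to n2), given margin x1 at n1 and x2 at n2."""
    if x1 <= 0 or x2 <= 0:
        raise ValueError("both margins must be positive")
    m = n2 - n1
    total = n_paths(m, x2 - x1)
    if total == 0:
        raise ValueError("no path connects these two poll states")
    # Reflection: paths that touch 0 correspond to paths with displacement x1 + x2.
    return 1 - n_paths(m, x1 + x2) / total


def brute_force(n1, x1, n2, x2):
    """Enumerate every response sequence between n1 and n2 (small cases only)."""
    m = n2 - n1
    good = total = 0
    for steps in product((1, -1), repeat=m):
        if sum(steps) != x2 - x1:
            continue
        total += 1
        height = x1
        for step in steps:
            height += step
            if height <= 0:
                break
        else:
            good += 1
    return good / total


if __name__ == "__main__":
    # Hypothetical tiny poll: lead of 2 after 4 responses and after 12 responses.
    print(prob_lead_held(4, 2, 12, 2), brute_force(4, 2, 12, 2))
```

In the tiny example both approaches give 0.6 (42 of the 70 possible paths never touch 0). For the live polls one would call something like `prob_lead_held(150, x1, 500, x2)`; note that under these simplifications x1 and x2 must both be even, since 150 and 500 are.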

Results

We can now apply these results with $n_1 = 150$ (the first reported tally) and $n_2 = 500$ (the final tally) to see the effect on the NYT/Siena polls. Below, we plot the probability that the poll shows the Democrat consistently in the lead for a variety of potential margins at 150 and 500 responses. Note that we’re only looking at positive values on both axes, since if the winner flipped between 150 and 500 responses, the result would obviously have to cross 0.

A key takeaway from these results: even if the Dem has a pretty big lead at both 150 responses and 500 responses, it’s still plausible that she will lose the lead at some point in the counting. If she leads by just over 5% at both times, for example, then there is only a 56% chance she’ll continually hold the lead throughout (and thus a 44% chance she’ll lose it at some point). And with most of the races polling within a point or two in the end, it’s exceedingly likely that the leader will change throughout the counting.

Conclusions

There are a lot of simplifications here, so the above plot is emphatically not a perfect representation of these probabilities. But even in a simplified setting, we see that the polling leader is very likely to change even after 150 responses — especially in the tight races being polled. So if your preferred candidate is losing her lead, take a deep breath and check back in when the poll is completed.