Sorry, not that nation state…

This is an account of how I re-calculated the results of the 2016 Australian Senate election from the anonymised ballot papers.

You see, calculating who gets elected to the senate in Australia is quite a tricky business, and the software used by the Australian Electoral Commission (AEC) was kept secret despite attempts to have it released via a freedom of information request.

This is a thoroughly unsatisfactory state of affairs for a country that purports to be a modern and functional democracy.

Most software is utter trash, and we have no reason to believe that the AEC is more capable of producing error-free code than anyone else, particularly as, shortly after the election, the government’s code for the online census suffered an absolute meltdown (see: #censusfail).

With this in mind, on July 11 — a week after the election itself — I started hacking away on some code to compute the election result from the CSV files published on the AEC’s website.

STV what now?

The Australian senate uses a proportional representation system called single transferable vote (STV) which allows voters to specify several candidates they’d like to vote for in order of least to most evil. If your first choice doesn’t get enough votes to be elected, your vote is transferred to your second, and so on, until either you run out of people you can bring yourself to vote for, or all the available spots in the senate are filled.

The option to give an incomplete ordering of candidates was a recent addition to the senate voting system, and would have required significant changes to the AEC’s secret sauce, which is even more reason to be sceptical about its accuracy.

How it Works

These are the steps my code takes to bring you a teensy bit of open democracy. To cut to the chase, you can check out the full source code on GitHub.

1. Download the complete set of ballots for each state from the AEC’s website.

2. Download an ordered list of candidates from my own personal site… this step is kinda dodgy, but my hand was forced when the AEC silently took their copy of this vital file offline…

3. Parse the data into a big old list of ballots, some candidates, and some groups that those candidates belong to. The ballot parsing is the most nuanced, as we have to enforce rules about the number of candidates each voter has to specify, and decide which vote to count if someone has voted both above the line (for groups of candidates) and below the line (for individual candidates). It also doesn’t help that the AEC’s ballot data occasionally contains junk data like ‘*’ instead of numbers.

4. For each state, bucket the ballots by candidate, and iteratively elect candidates who have greater than or equal to a full quota of votes. The quota is just a fraction of the total number of votes so that no more than n senators can be elected if there are only n spots to fill. Ballots get moved around between buckets when their preferred candidate is either elected, or knocked out due to ranking last.

5. Once the iterations of step 4 have elected enough senators to fill all the available spots, stop and print ’em out!
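To give a feel for the parsing in step 3, here is a minimal sketch in Rust. It is not the code from the repository: the function names and the one-cell-per-candidate row layout are assumptions made for illustration, and it simply ignores non-numeric marks like ‘*’ rather than deciding what they ought to mean. It also leaves out the minimum-number-of-preferences rule and the above/below the line choice.

```rust
/// Parse one preference cell. Blank cells and non-numeric marks such as "*"
/// yield None (an assumption: this sketch simply ignores them).
fn parse_pref(cell: &str) -> Option<u32> {
    cell.trim().parse::<u32>().ok()
}

/// Turn a row of cells (one per candidate, in ballot-paper order) into a list
/// of candidate indices ordered by preference. Counting stops at the first
/// missing or repeated number.
fn ordered_preferences(cells: &[&str]) -> Vec<usize> {
    // Collect (preference number, candidate index) pairs for the usable cells.
    let mut marked: Vec<(u32, usize)> = cells
        .iter()
        .enumerate()
        .filter_map(|(idx, cell)| parse_pref(cell).map(|p| (p, idx)))
        .collect();
    marked.sort();

    let mut result = Vec::new();
    for (i, &(pref, idx)) in marked.iter().enumerate() {
        let expected = result.len() as u32 + 1;
        let repeated = marked.get(i + 1).map_or(false, |&(next, _)| next == pref);
        if pref != expected || repeated {
            break;
        }
        result.push(idx);
    }
    result
}

fn main() {
    // The candidate at index 2 is ranked first, then index 0, then index 4.
    let row = ["2", "", "1", "*", "3"];
    println!("{:?}", ordered_preferences(&row));
}
```

Stopping at the first gap or repeat is one common way to treat a partially valid ballot; whether it matches the exact formality rules is something to check against the legislation.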

Step 4 is really where the interesting stuff happens, and I’ve glossed over the details here. The vote_map.rs and voting.rs files contain my implementation, and you can read more about the algorithm on the AEC’s site, in the legislation and on Wikipedia.
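As a rough illustration of the quota and the very first pass of step 4, the sketch below uses the Droop quota (votes divided by one more than the number of seats, rounded down, plus one) and buckets ballots by first preference. The function names and the representation of a ballot as a list of candidate ids are hypothetical, and the transfer and exclusion logic that makes the real thing interesting is left out entirely.

```rust
use std::collections::HashMap;

/// Droop quota: small enough to be reachable, large enough that no more than
/// `seats` candidates can reach it at once.
fn droop_quota(total_votes: u64, seats: u64) -> u64 {
    total_votes / (seats + 1) + 1
}

/// Bucket ballots by first preference and count each bucket.
/// A ballot is a list of candidate ids in preference order (hypothetical layout).
fn first_preference_counts(ballots: &[Vec<u32>]) -> HashMap<u32, u64> {
    let mut counts = HashMap::new();
    for ballot in ballots {
        if let Some(&first) = ballot.first() {
            *counts.entry(first).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let ballots = vec![
        vec![1, 2],
        vec![1, 3],
        vec![1],
        vec![2, 1],
        vec![3, 2],
        vec![1, 2, 3],
    ];
    let quota = droop_quota(ballots.len() as u64, 2);
    let counts = first_preference_counts(&ballots);

    // Anyone at or above the quota is elected straight away.
    let mut elected = Vec::new();
    for (&candidate, &votes) in &counts {
        if votes >= quota {
            elected.push(candidate);
        }
    }
    elected.sort();
    println!("quota = {}, elected on first preferences: {:?}", quota, elected);
}
```

With 6 ballots and 2 seats the quota is 3, so only candidate 1 (with 4 first preferences) gets over the line in this toy example.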

Challenges

When the algorithm elects a senator, all their ballots are transferred with a fractional value. Imagine we have a potential senator “Scott Ludlam” who has received 1.5 quotas worth of votes. This is enough to get him elected (yay), but we still have to account for the remaining 0.5 of a quota. To do this, all the ballots in Scott’s pile are transferred to their next favourite candidate, but with a reduced value of 1/3, because 2/3 of their value was “used up” electing Scott.
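The arithmetic in that example can be written down as a small sketch. The `Ratio` type and `transfer_value` function are illustrative names, not the ones used in the repository, and the sketch only covers the simple case where every ballot in the pile still has its full value.

```rust
/// A tiny exact fraction, always stored in lowest terms (hypothetical helper).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ratio {
    num: u64,
    den: u64,
}

fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

impl Ratio {
    fn new(num: u64, den: u64) -> Ratio {
        assert!(den != 0, "denominator must be non-zero");
        let g = gcd(num, den);
        Ratio { num: num / g, den: den / g }
    }
}

/// Transfer value for an elected candidate: surplus votes over total votes.
fn transfer_value(votes: u64, quota: u64) -> Ratio {
    assert!(votes >= quota && votes > 0);
    Ratio::new(votes - quota, votes)
}

fn main() {
    // 1.5 quotas worth of votes: a quota of 1000 and 1500 votes received.
    let tv = transfer_value(1500, 1000);
    println!("transfer value = {}/{}", tv.num, tv.den);
}
```

The surplus is 500 votes out of 1500, which reduces to the 1/3 from the example above.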

Although it’s tempting to represent these fractions as floating point numbers, this will result in an inevitable loss of accuracy (read more here), which could throw the whole election result off. Instead, we need to use a rational type that contains separate integers for the numerator and denominator.
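A classic small demonstration of the problem, using only the standard library: adding 0.1 and 0.2 as 64-bit floats does not give exactly 0.3, while the same sum done with numerator/denominator pairs is exact. The `add` helper on plain tuples is purely illustrative; a real implementation would want a proper rational type with arbitrary-precision integers.

```rust
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Add two fractions given as (numerator, denominator) pairs and reduce the
/// result to lowest terms.
fn add(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    let num = a.0 * b.1 + b.0 * a.1;
    let den = a.1 * b.1;
    let g = gcd(num, den);
    (num / g, den / g)
}

fn main() {
    // Floating point: the sum is very close to 0.3, but not equal to it.
    let float_sum = 0.1_f64 + 0.2_f64;
    println!("float: 0.1 + 0.2 == 0.3 ? {}", float_sum == 0.3);

    // Exact fractions: 1/10 + 2/10 is exactly 3/10.
    let exact = add((1, 10), (2, 10));
    println!("exact: {}/{}", exact.0, exact.1);
}
```

An error that small rarely matters, but a count that hinges on whether a candidate has reached the quota is exactly the sort of place where it could.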

Without any rounding, these rationals can get really big over the course of the algorithm, as we have to consider fractions of fractions of fractions, and so on, and the fractions are complicated things like 129,038/10,353,123, not 1/3. At one point before implementing the rounding behaviour specified in the legislation, I witnessed an irreducible fraction with around 3000 decimal digits in both numerator and denominator!

This made my program run really slowly (15 hours for NSW), and consume far too much memory (4GB+). Once I finally braved up to deciphering the legalese of the Electoral Act, I found the rounding rule and got the running time down to less than 20 seconds, with memory usage peaking at just less than 1GB (for the several million ballots).
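To show why rounding tames the numbers, here is a sketch that assumes the rule boils down to this: when a pile of ballots is transferred, multiply the number of ballots by the transfer value and throw away any fractional part. That reading is an assumption made for this example (the Electoral Act has the authoritative wording), and `votes_transferred` is a made-up name.

```rust
/// Number of whole votes passed on when `ballots` ballot papers move at a
/// transfer value of `tv_num / tv_den`. Any fraction is discarded (assumed rule).
fn votes_transferred(ballots: u64, tv_num: u64, tv_den: u64) -> u64 {
    assert!(tv_den != 0, "denominator must be non-zero");
    // Widen to u128 so the intermediate product cannot overflow.
    ((ballots as u128 * tv_num as u128) / tv_den as u128) as u64
}

fn main() {
    // 1000 ballots at a transfer value of 1/3 carry 333 whole votes.
    println!("{}", votes_transferred(1000, 1, 3));
}
```

Because every pile's tally collapses back to a whole number after each transfer, the fractions never get the chance to compound into thousand-digit monsters.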

Results

After all that work, my code calculates a result identical to the official one.

This means we have no evidence to indicate a fault in the AEC’s implementation. However, the fact that the two implementations agree on the 2016 election does not imply that they would agree on every election, and in no way rules out the possibility that both implementations are in fact incorrect. The important outcome of this work is that there is now a free and open source version of the algorithm which can be publicly scrutinised, and hopefully used to verify the results of future elections.

Interestingly, the result of the 2016 election seems reasonably stable with respect to minor alterations of the algorithm and ballot validation rules. Before I fixed the rounding behaviour, the algorithm still computed the same results (albeit more slowly), and it does likewise when the rules about ballot validity are strictly enforced (which results in thousands of ballots being thrown out).

Further Work

My implementation of the algorithm is by no means perfect, and there are still a few areas in which it is deficient. For example, the order in which candidates are elected differs slightly from the official order, and I should probably work out what the deal is with those invalid ballots in the AEC’s CSV files.

In addition to further improving the code, there are other areas of inquiry that could be interesting to explore:

Electoral fraud detection using statistical analysis. I know very little about this, but I’d like to try it.

Stability/ballot analysis. Are voting errors uniformly distributed across political allegiances? If not, could the AEC tweak the validation rules to alter the result? Does rounding favour small or large parties? Would we be better off paying the computational price to use an exact algorithm with no rounding at all?

Pretty graphs related to either of the above.

All in all, this was a fun project, and I hope you enjoyed reading about it!

Thanks,

Michael

Link to source code: https://github.com/michaelsproul/aus_senate