Happy new year everyone!

I thought I would cope with my New Year’s hangover through the traditional method: Ibuprofen, coffee and attempting to solve NP-hard combinatorial optimisation problems.

(The fact that I am now free to work on these things as my contract with Google has officially expired may have something to do with why I was looking into this today rather than waiting until I was less hungover)

As some of you might be aware I have a perverse interest in the Feedback Arc Set for Tournaments (FAST) problem. This is as follows:

Given an \(n \times n\) matrix \(A\), find an ordering of \(\{1, \ldots, n\}\), say \(\sigma_1, \ldots, \sigma_n\), which minimizes the sum of backwards weights \(w(A, \sigma) = \sum_{i < j} A_{\sigma_j,\sigma_i}\). This sum can be regarded as the degree to which the ordering gets the pairwise preferences of \(A\) “wrong”.
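For concreteness, the objective can be computed directly from the definition. Here is a sketch in Python (the function name is mine, not from the post's code):

```python
import numpy as np

def backward_weight(A, sigma):
    """Sum of A[sigma[j], sigma[i]] over all i < j: the total weight of
    pairwise preferences that the ordering sigma gets "backwards"."""
    n = len(sigma)
    return sum(A[sigma[j], sigma[i]] for i in range(n) for j in range(i + 1, n))
```

Minimizing this over all \(n!\) orderings is exactly the FAST problem.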

The classic example of why this is interesting is the Kemeny-Young voting method. There are others (I think it’s useful for graph drawing for example) but I don’t know much about them.

Anyway, solving this is super NP-hard. We’re not talking “it’s about as hard as the travelling salesman”, we’re talking “You want to solve this optimally for a hundred nodes? Ha ha. Good luck”. As a result one tends to use heuristic orderings for it.

I was wondering whether you could relate this back to voting to get a good heuristic order, and it turns out you can, and a really rather good one to boot. So this is a report on an experiment I’ve run which uses the Ranked Pairs voting algorithm to give an \(O(n^3)\) heuristic ordering for FAST that appears to produce quite a high quality result.

You can see all the code for this work here.

Evaluation

The metric I am using to evaluate methods is as follows: Let \(A\) be a matrix valued random variable and let \(f\) and \(g\) be functions which take an \(n \times n\) matrix and return a random permutation of \(\{1, \ldots, n\}\). Then \(f\) is better than \(g\) with respect to \(A\) if \(E(w(A, f(A))) < E(w(A, g(A)))\).

Note that this notion of betterness depends very much on the distribution of \(A\). \(f\) is better than \(g\) for every choice of distribution only if for every fixed \(x\), \(E(w(x, f(x))) < E(w(x, g(x)))\), which is a much stronger and harder to test condition.

I focus on one particular choice of \(A\): \(100 \times 100\) matrices are generated such that if \(i < j\) we pick \(A_{ij}\) uniformly at random from the integers between 0 and 100. If \(i > j\) we pick it uniformly from the integers between 0 and 70. We then shuffle the coordinates so as to prevent systems that are biased towards picking the coordinates in order from having an advantage. This gives us a matrix which has a rough “direction” but is really quite noisy in how well it establishes that direction.
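A sketch of how such a matrix might be generated (my own code, assuming numpy; the post's actual generator may differ in details):

```python
import numpy as np

def random_test_matrix(n=100, rng=None):
    """Matrix with a noisy 'direction': entries above the diagonal are
    uniform in [0, 100], entries below it uniform in [0, 70], and then
    the indices are relabelled by a random permutation to hide the
    built-in ordering."""
    rng = np.random.default_rng(rng)
    A = np.zeros((n, n), dtype=int)
    upper = np.triu_indices(n, k=1)
    lower = np.tril_indices(n, k=-1)
    A[upper] = rng.integers(0, 101, size=len(upper[0]))
    A[lower] = rng.integers(0, 71, size=len(lower[0]))
    p = rng.permutation(n)
    return A[np.ix_(p, p)]  # shuffle rows and columns together
```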

I then generate 100 matrices and compare two methods by using a Monte Carlo permutation test for the difference of the means when applied to these matrices. Note: I am aware this is a somewhat dubious decision due to the potential difference in the standard deviations. This step could use improving.
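As an illustration, here is one way such a test might look. This is a sketch under my own assumptions about the setup: in particular I assume the scores are paired by matrix, so the null hypothesis is tested by randomly flipping the sign of each per-matrix difference.

```python
import numpy as np

def permutation_test(xs, ys, draws=500, rng=None):
    """Paired Monte Carlo permutation test for mean(xs) < mean(ys).
    xs[i] and ys[i] are two methods' scores on the same matrix. Under
    the null the labels within each pair are exchangeable, so we flip
    the sign of each difference at random and count how often the
    permuted mean difference is at least as extreme as the observed one."""
    rng = np.random.default_rng(rng)
    diffs = np.asarray(xs, dtype=float) - np.asarray(ys, dtype=float)
    observed = diffs.mean()
    signs = rng.choice([-1.0, 1.0], size=(draws, len(diffs)))
    permuted = (signs * diffs).mean(axis=1)
    # Add one to numerator and denominator so p can never be exactly zero.
    return (1 + np.sum(permuted <= observed)) / (draws + 1)
```

Note that with 500 draws the smallest reachable p-value is \(1/501 \approx 0.0020\), which matches the values reported below.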

Because I am comparing more than two different methods there is a multiple testing problem. I use the Benjamini-Hochberg procedure to control the false discovery rate to 1%. I think this is valid because of the nature of the correlations between the tests (false discoveries should all be positively correlated) but I’m not really sure.
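For reference, the Benjamini-Hochberg procedure itself is short. A sketch (the function name is mine):

```python
def benjamini_hochberg(pvalues, q=0.01):
    """Benjamini-Hochberg step-up procedure: sort the m p-values, find
    the largest k with p_(k) <= k*q/m, and reject hypotheses 1..k.
    Returns the indices (into the original list) of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])
```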

As a calibration I tested adding duplicates of some methods to see if this would confuse the results and it did not. All the existing results remained statistically significant and no false conclusions were drawn about the relations amongst the duplicates.

Basically: The statistics here are kinda dodgy and I don’t really know enough to know how dodgy they are. There’s a reason working on my statistics is one of my plans for the next few months. However the results are all sufficiently convincing that I’m not too worried about the conclusions being wrong even if the methods are.

Methods compared

I compared six different methods, with varying expectations of some of them being any good (actually I compared more than this, but I threw some of them away because they were not any better and were annoyingly slow). These were:

1. Generate a random permutation.
2. Generate a random permutation, then apply a local kemenization procedure (basically: swapping adjacent elements in the order when this will improve the score). Reverse this and apply the local kemenization procedure again. Pick the best of these two.
3. Sort by assigning each index \(i\) the score \(\sum_j A_{ji}\) (this is basically the weight of evidence that various things are less than \(i\)) and again apply the local kemenization procedure.
4. A greedy algorithm that builds up the list one element at a time by iterating over the indices in a random order and inserting each element in the optimal location in the already built list.
5. Kwik-sort, which is essentially applying quick sort with a random pivot to the order defined by the matrix and ignoring that that order isn’t actually transitive. This is as defined in this paper. I have not tried their proposed LP version of kwik-sort because running the LP on this size of problem turns out to be really slow and I got bored of waiting for it.
6. Ranked pairs. This works by building a transitive order over the elements. We iterate over the pairs of indices in decreasing order of weight (i.e. largest first). If the order of this pair has not been established already we “lock it in” in this order and add all of the other pairs that this plus transitivity requires.
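To make the last method concrete, here is a minimal sketch of the Ranked Pairs heuristic as described above. This is my own illustration, not the post's actual code, and the names are mine:

```python
def ranked_pairs_order(A):
    """Ranked Pairs heuristic for FAST: visit pairs (i, j) in decreasing
    order of weight A[i][j]; whenever the relative order of i and j is
    not yet fixed, lock in "i before j" together with everything
    transitivity then requires. Finally read off the total order."""
    n = len(A)
    below = [set() for _ in range(n)]  # below[i] = indices locked in after i
    pairs = sorted(((A[i][j], i, j) for i in range(n) for j in range(n) if i != j),
                   reverse=True)
    for _, i, j in pairs:
        if j in below[i] or i in below[j]:
            continue  # order of this pair already established
        # Lock in i before j: everything at or above i now precedes
        # everything at or below j.
        above_i = [k for k in range(n) if i in below[k]] + [i]
        new_below = below[j] | {j}
        for k in above_i:
            below[k] |= new_below
    # Elements with more things locked in below them come earlier.
    return sorted(range(n), key=lambda i: len(below[i]), reverse=True)
```

On a matrix whose large weights already describe a consistent order this recovers that order exactly; its interesting behaviour is on the cyclic, noisy cases.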

Results

Here is the program output:

Simple shuffle: 210202.81 +/- 3381.47. 1%=204032.12, 50%=209934.00, 99%=218417.48
Improved shuffle: 204489.03 +/- 2225.88. 1%=199087.83, 50%=204575.50, 99%=209383.01
Smoothed indegree: 178446.51 +/- 1699.39. 1%=174738.72, 50%=178306.50, 99%=183069.17
Insertion order: 175407.30 +/- 1965.16. 1%=171059.98, 50%=175226.50, 99%=179597.99
Kwik Sort: 195279.10 +/- 2923.19. 1%=189704.31, 50%=195339.50, 99%=201568.30
Ranked pairs: 172922.71 +/- 1431.80. 1%=170204.94, 50%=172847.00, 99%=177180.83

Pairwise comparisons:
Improved shuffle < Simple shuffle (p=0.0020)
Insertion order < Improved shuffle (p=0.0020)
Insertion order < Kwik Sort (p=0.0020)
Insertion order < Simple shuffle (p=0.0020)
Insertion order < Smoothed indegree (p=0.0020)
Kwik Sort < Improved shuffle (p=0.0020)
Kwik Sort < Simple shuffle (p=0.0020)
Ranked pairs < Improved shuffle (p=0.0020)
Ranked pairs < Insertion order (p=0.0020)
Ranked pairs < Kwik Sort (p=0.0020)
Ranked pairs < Simple shuffle (p=0.0020)
Ranked pairs < Smoothed indegree (p=0.0020)
Smoothed indegree < Improved shuffle (p=0.0020)
Smoothed indegree < Kwik Sort (p=0.0020)
Smoothed indegree < Simple shuffle (p=0.0020)

(They all have the same p-value because that’s what you get when the permutation test never sees a result that extreme in 500 random draws.)

Ranked Pairs seems to comfortably beat out all the other heuristics by some margin, and the permutation tests agree that this is statistically significant.

As a side note, I wish I’d done this comparison a while ago, because I’m quite surprised to find out how badly kwik sort does. This may be an artifact of the shape of the data. I’m not sure.

I also ran a smaller experiment to compare Ranked Pairs with Schulze voting. It had to be smaller because Schulze voting was too slow to run on the full set of 100 matrices, so the following is a comparison between Schulze and Ranked Pairs on 10 matrices:

Schulze: 176430.30 +/- 1391.75. 1%=174276.96, 50%=176359.00, 99%=178563.24
Simple shuffle: 212209.30 +/- 3389.81. 1%=208202.43, 50%=211704.00, 99%=219189.76
Ranked pairs: 172918.60 +/- 1701.93. 1%=170901.82, 50%=172503.00, 99%=175619.76

Pairwise comparisons:
Ranked pairs < Schulze (p=0.0040)
Ranked pairs < Simple shuffle (p=0.0020)
Schulze < Simple shuffle (p=0.0020)

As Schulze appears to be both dramatically slower and a fair bit worse for this data set, it doesn’t seem worth pursuing as an alternative.