Every year after Warzone: Atlanta, I have opened up Medium to write a blog post about the five games I played. And every year, I get to about game three and realize I find that kind of content incredibly boring.

So this year, I’m not going to try. Instead, this post is a follow-up to a document I wrote before the event, where I tried to predict the battle point outcomes of the event.

It was generally well received by the community, albeit with a few messages from players I had ranked telling me I was strongly underestimating them.

Now the results are in, but before we dig into them, a congratulations to our winners:

Sigilite: Andrew Whittaker — Two-time Warzone: Atlanta Overall champion, of The Best General renown.

Warmaster: Red Powell — A relative newcomer to the competitive 40k scene, he writes a blog, The Worthiest Adversary, as a counterpoint to Adam Abramowicz’s The Best General.

Imperial Artisan: Joseph Behrend — The only winner I don’t personally know, but his painting was exceptional.

Imperial Envoy: Horton Doughton — A North Carolina local and follower of Khorne, as well as a long-time friend of both The General Staff and Warzone.

They all did very well; if you happen to talk to any of them, be sure to tell them a job very well done.

So, onto the predictions and results. Let’s begin with some stats. As a clarification of terms, a “differential” in this case means the difference between someone’s predicted rank and their actual rank. For example, I estimated myself at 3rd, and I came in 7th. That’s a differential of +4, or put another way, I overestimated my rank by 4 spots.
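The definition above can be sketched in a few lines of code (the function name is mine, not from the original spreadsheet):

```python
# A minimal sketch of the "differential" as defined above:
# differential = actual rank - predicted rank.
# A positive value means the model was too optimistic
# (it overestimated the player's finish).

def differential(predicted_rank: int, actual_rank: int) -> int:
    """Difference between a player's actual and predicted rank."""
    return actual_rank - predicted_rank

# The example from the text: predicted 3rd, finished 7th.
print(differential(3, 7))  # -> 4
```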

Warzone: Atlanta 2018 Attendance: 118

Average Differential: 3

Median Differential: 0

Correct predictions: 3

Highest differential: 92 (shout out to Michael Ralston, who was predicted at 25th and came in 117th, and in doing so, won our Ezekiel Abbadon 14th Time’s the Charm Black Crusade prize, for having the lowest battle points.)

That, on the surface, seems pretty impressive. The average differential is only 3! The model was on average 3 spots away from reality. However, let’s look at a graph of the results:

In reality, not so much. In this graph, red shows the actual results and blue the predicted ranks. The averages appear decent, but they are really just canceling each other out. Within each “rank group” (a blue line in the chart above), players both over- and underperformed wildly, such that the group average was low, but the overall predictive quality of the model was still poor.

I don’t want to just accept that, though, so I made more graphs for us to look at:

Here, we have the differential of every player versus their predicted rank. Because many people had the same predicted rank, we end up with this stacking of points around each rank.

This graph actually says something relatively interesting about my model, specifically with the trendline above. (For the non-stats nerds, a trendline is just a line that draws the average of the y values [the differentials] for any given x value [predicted ranks])
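The trendline described above can be sketched roughly like this: bucket the points by predicted rank, then average the differentials in each bucket. The data here is illustrative, not the actual event results:

```python
# Sketch of the trendline: for each predicted rank (x), average the
# differentials (y) of everyone given that rank.
from collections import defaultdict
from statistics import mean

# (predicted_rank, differential) pairs -- hypothetical sample data.
points = [(5, -2), (5, 7), (20, -2), (20, 20), (45, -15), (45, -17)]

by_rank = defaultdict(list)
for x, y in points:
    by_rank[x].append(y)

# The trendline value at each predicted rank.
trend = {x: mean(ys) for x, ys in by_rank.items()}
print(trend)
```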

The model overestimated the players that did well, and underestimated those that did poorly.

Players predicted better than 60th, on average, did worse than the model said they would, while players predicted worse than 60th, on average, did better. That’s a pretty interesting fact. I wanted to visualize this in a different way, so I took every group of predicted ranks (everyone predicted to be 5th, 20th, 45th, etc.) and summed their differentials. It looks like this:
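The grouping-and-summing step can be sketched as follows; the player data here is made up for illustration, not the real event results:

```python
# Bucket players by predicted rank and sum the signed differentials
# (actual - predicted) within each bucket.
from collections import defaultdict

# (predicted_rank, actual_rank) pairs -- hypothetical sample data.
players = [(5, 3), (5, 12), (20, 18), (20, 40), (45, 30), (45, 28)]

summed = defaultdict(int)
for predicted, actual in players:
    summed[predicted] += actual - predicted  # signed differential

for rank in sorted(summed):
    print(rank, summed[rank])
```

A positive sum means the model was, on net, too optimistic about that rank group; a negative sum means it was too pessimistic.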

Here, we see actual conclusions start to emerge. In the top ranks, the model overestimated players, but not by that much, relatively speaking. Into the mid ranks it starts to vastly overestimate players’ ability, and finally, at the bottom ranks, it very strongly underestimates players’ ability.

I think this highlights one of the major problems with the model: a player with no data, and a player with a consistent record of poor results, look the same. Because of this, the low end is oversaturated with players who are not nearly as bad as the model thinks they are, but who lack the record to prove it.

There’s one more data set I compared the predictions to: all of the intermediate results. I was curious: is the model more accurate at predicting round 4 placings than final placings? What about round 2?

This resulted in a lot of graphs. Rather than post them all here, I turned each set into a slow-moving gif. The first shows the actual results:

I, personally, don’t feel like this tells us much. It feels like the ranks are just kind of moving around without any clear pattern. So, I looked at our other data representations.

This tells us a lot. We can see the trendline getting less and less negative as the games go on, meaning the model overestimates the higher-ranked players less, and underestimates the lower-ranked players less, as more games are played. That’s very encouraging: while the model has trouble predicting individual players, on the whole it reflects player skill more accurately the more games are played. And finally, we have our most fascinating gif:

In the first few rounds, the differential stayed relatively low for the first 15 or so rank clusters. Then things got strongly shaken up in round 3, and never really recovered. Seeing as this is the round I got my teeth kicked in, that makes sense to me. Less personally, three rounds is enough for the higher-ranked players to accumulate enough battle points to start meeting each other, which necessarily means some of them will move down in the ranks.

And that’s really all I’ve got. As per usual, I’ve made the data backing all of this work public, available here. What do you think? Is this a massive waste of time? Do you find this kind of thing insightful? Is there some stat I missed that I could go back and add? Let me know!