A huge amount of discussion about the game XCOM: Enemy Unknown pertains to its random number generation. Many people claim (either seriously, or in jest because they are so frustrated with their luck) that it is broken. Because I’m completely hooked on the game... er, for science, I’ve been playing a lot of XCOM and recording my shots as I played. For every shot I actively took, I recorded the displayed chance of it hitting and whether it actually hit or missed. (I ignored overwatch shots because I couldn’t see their probabilities, and didn’t bother with rockets and other later-game non-gun weapons.) I’ve recorded over 1200 shots, and in this post I’ll examine the data to see if XCOM is fair.

The Psychology of XCOM

Keeping this record of hits and misses in XCOM taught me a lot about the psychology of playing the game. The longest streaks of hits that I noticed in the data were one with an incredible 18 hits in a row and another with 19 hits in a row, with the following percentage chances to hit:

Streak 1: 65, 93, 85, 97, 100, 100, 73, 100, 95, 73, 57, 73, 86, 89, 94, 96, 81, 82

Streak 2: 95, 63, 94, 95, 73, 58, 100, 100, 100, 86, 84, 95, 98, 85, 84, 73, 100, 90, 95

What’s interesting is how I felt while playing the missions. I wasn’t sitting there shouting “amazing!” as hit after hit piled on. Both of the above streaks came in “very difficult” terror missions. In the first streak, all the aliens appeared on one turn, and I couldn’t kill the Chrysalids faster than they were turning the civilians into zombies. I downed 20 enemies in all, and lost 3 of my 6 men because I got so overwhelmed. So even when I had amazing positive luck with my shots, I didn’t even notice until I took stock of the spreadsheet after the mission.

On the other hand, here’s a (much shorter) streak from the first mission of the third game I started, where I missed my first six shots at these percentages:

45, 45, 54, 45, 45, 45

I ended up losing two of the four men, and since it was the first mission, I restarted in disgust. You really do notice only the negative streaks, and never the positive streaks.
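To put rough numbers on these streaks, here is a minimal Python sketch that multiplies out the displayed chances. It assumes the shots are independent and that the displayed percentage is the true chance to hit, which is exactly what this post is trying to check. The function names are my own, for illustration only.

```python
from math import prod

# Displayed hit chances (in percent), copied from the streaks above.
streak_1 = [65, 93, 85, 97, 100, 100, 73, 100, 95, 73, 57, 73, 86, 89, 94, 96, 81, 82]
streak_2 = [95, 63, 94, 95, 73, 58, 100, 100, 100, 86, 84, 95, 98, 85, 84, 73, 100, 90, 95]
miss_streak = [45, 45, 54, 45, 45, 45]


def prob_all_hit(chances):
    """Chance that every shot in the list hits, assuming independent shots."""
    return prod(c / 100 for c in chances)


def prob_all_miss(chances):
    """Chance that every shot in the list misses, assuming independent shots."""
    return prod(1 - c / 100 for c in chances)


for name, p in [
    ("Streak 1 (18 hits)", prob_all_hit(streak_1)),
    ("Streak 2 (19 hits)", prob_all_hit(streak_2)),
    ("Six misses", prob_all_miss(miss_streak)),
]:
    print(f"{name}: {p:.4f}")
```

Bear in mind that this is the chance of one particular run of shots going a particular way, not the chance of seeing such a run somewhere among 1200 shots, so it is only a rough comparison.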

The Fairness

Below is the best graph I could think of to represent the fairness of the random number generator. On the X axis is the stated chance to hit: the number that popped up in the box when I took the shot. I’ve grouped these into 5% bins, and then plotted a bar for each bin, showing what proportion of those shots actually hit. In an ideal world, with infinite data, the red line on the graph would pass through the tops of all the bars. It would actually be surprising if this happened exactly: it’s called random for a reason, and with the small number of points I have in each bin (around 60), it’s likely that the proportions I observed are not perfectly in line with expectation.

That looks reasonably fair to me. I think that graph is the most comprehensible output you’ll get, but “looks about right” is not very scientific! Read on for a more precise methodology.
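For anyone wanting to rebuild the bars from their own records, the binning can be sketched in Python roughly as follows. This is a guess at a sensible layout rather than the exact spreadsheet formulas: the function name, the (percent, hit) input format, and the choice to fold 100% shots into the top bin are all assumptions.

```python
from collections import defaultdict


def bin_hit_rates(shots, bin_width=5):
    """Group shots into bins by displayed chance and compute observed hit rates.

    shots: iterable of (displayed_percent, hit) pairs, where hit is True/False.
    Returns {bin_start: (number_of_shots, observed_hit_fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [hits, total]
    for percent, hit in shots:
        # Assumption: 100% shots are folded into the top bin (95 and up).
        start = min(int(percent) // bin_width * bin_width, 100 - bin_width)
        counts[start][0] += int(hit)
        counts[start][1] += 1
    return {
        start: (total, hits / total)
        for start, (hits, total) in sorted(counts.items())
    }


# Made-up example records, purely to show the call; not my real data.
example = [(65, True), (67, False), (100, True), (95, True)]
print(bin_hit_rates(example))
```

Each bar in the graph corresponds to one observed_hit_fraction, and the red line is simply the stated chance for that bin.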

Significance Testing

The problem with determining whether something is truly random is that you can never be sure. Theoretically, any string of hits and misses is possible in XCOM (except 100% shots missing), so you can never know for sure if it was a broken random number generator or bad luck. The best you can do is collect a lot of data, and see if it’s an unlikely result, and then conclude whether you’re confident that the data came from a random generator.

Here, then, is the idea behind testing for random generation. We pick the individual to-hit percentages, e.g. 65%, for which we have the most data (at least 20 shots). We then work out what the chance was of getting a result at least as extreme as the one we observed. If this chance is low (conventionally, 5% or less), the data is unlikely to have come from a fair random generator. For example, let’s say that we had fired 25 shots at 85%, and all of them had hit. The chance of this happening is only 1.7%, which would be unlikely if the generator were truly random.
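The 25-shots-at-85% example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names are mine, and the two-sided convention used here (doubling the smaller tail) is just one common choice, not necessarily what the spreadsheet does.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent shots at hit chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(hits, n, p):
    """Probability of at least `hits` hits in n shots."""
    return sum(binom_pmf(k, n, p) for k in range(hits, n + 1))


def lower_tail(hits, n, p):
    """Probability of at most `hits` hits in n shots."""
    return sum(binom_pmf(k, n, p) for k in range(0, hits + 1))


def two_sided_p_value(hits, n, p):
    """Two-sided p-value by doubling the smaller tail (one common convention)."""
    return min(1.0, 2 * min(upper_tail(hits, n, p), lower_tail(hits, n, p)))


# 25 hits out of 25 shots at a displayed 85%.
print(f"{upper_tail(25, 25, 0.85):.3f}")  # prints 0.017
```

The one-sided figure is the 1.7% quoted above. A two-sided test, which also counts surprisingly few hits as extreme, gives a somewhat larger p-value for the same data.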

However: one complication to this method is that if we check several percentages, we are likely to find one that’s extreme. On average, if we check 20 different percentages at the 5% level, one of them will look too extreme purely by chance, even if the generator is fair. This kind of false alarm is known as a type I error (an awful name!). To control for this, we can use a procedure that controls the false discovery rate (FDR), and your eyes are probably glazing over right now, so let’s get to the result.
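For the curious whose eyes have not glazed over, the best-known FDR procedure is Benjamini-Hochberg, and a minimal Python sketch of it is below. I am assuming that variant here purely for illustration; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding test is
    declared significant while controlling the false discovery rate at q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            rejected[i] = True
    return rejected


# Made-up p-values, one per tested to-hit percentage.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

The idea is that the smallest p-value has to clear a strict threshold (q divided by the number of tests), while later ones face progressively looser thresholds, so a single borderline result among many tests is not enough to cry foul.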

The Result

The result is that, based on my data, there was no evidence (at the 95% confidence level) to suggest that the random generator is unfair. If you want to see the data and the working, it’s all available in this spreadsheet.

Caveats

There are a few caveats to note. One is that my significance testing is very underpowered: despite recording a lot of shots, I don’t have enough shots for specific percentages to be likely to spot any deviations that aren’t large. (Specifically: at 80% power, for 50 hits, I could only have spotted 20% deviations in the shot percentage.) More data would solve this problem! One alternative would be to use Bayesian methods, where I could express my prior belief that the generator is fair.
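As a rough idea of what the Bayesian route might look like, here is a minimal Python sketch using a Beta prior centred on the displayed chance. This is not something I have applied to the spreadsheet: the function names, the prior_strength parameter (how many "imaginary shots" the prior is worth), and the example counts are all hypothetical.

```python
def beta_posterior(stated_chance, hits, misses, prior_strength=20):
    """Update a Beta prior centred on the stated chance with observed shots.

    stated_chance: displayed hit chance as a fraction, e.g. 0.65.
    prior_strength: hypothetical weight of the prior, in equivalent shots.
    Returns the posterior Beta parameters (a, b).
    """
    a = stated_chance * prior_strength
    b = (1 - stated_chance) * prior_strength
    return a + hits, b + misses


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Made-up counts: 40 hits and 10 misses on shots displayed as 65%.
a, b = beta_posterior(0.65, hits=40, misses=10)
print(f"Posterior mean hit chance: {beta_mean(a, b):.3f}")
```

A stronger prior (larger prior_strength) means it takes more surprising data to pull the estimate away from the displayed chance, which is one way of encoding "I believe the game is fair until shown otherwise".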

The other caveat is a potential problem with the data, caused by my own lack of XCOM ability! In the spreadsheet, Game 1 through Game 5 are “Classic”-difficulty Iron Man games. I’d completed Classic Iron Man quite smoothly once already, and figured I’d be fine. Five failures later, I was left with the problem that I didn’t have enough data for high-percentage shots, which tend to occur later in the game when your soldiers are high-rank (but I got all mine killed!). So Game 6 was non-Iron Man, so that I could use a little reloading to carry me safely through to the later game. Shameful.

Reloading causes two problems with the data. One is that if you record some shots, reload, then record again, you’re effectively recording the same thing (due to the way the state of the random generator is saved in the game). So I was careful to wipe out any data back to the point I was reloading from. The other problem is that this can still introduce a bias to the data, because you’re more likely to have missed when you trigger a reload, and more likely to have hit when you don’t then choose to reload. I did not reload too often (and usually it was an alien’s hit that caused a reload, not my own miss), but there is that potential small bias in the data for Game 6. I also haven’t tested autocorrelation (“streakiness”), mainly because I’m not sure how to do so on this kind of data. But overall, I’m fairly confident that XCOM is indeed random.
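On the streakiness question, one possible approach (a sketch only, and not something that has been run on the spreadsheet) is to standardise each shot by its displayed chance, measure how consecutive shots move together, and compare that against simulated fair games with the same displayed chances. The function names and the choice of statistic below are my own assumptions.

```python
import random
from math import sqrt


def lag1_statistic(chances, outcomes):
    """Mean product of consecutive standardised residuals.

    chances: displayed hit chances as fractions (e.g. 0.65).
    outcomes: True for a hit, False for a miss, in the order fired.
    Shots at 0% or 100% carry no information and are dropped.
    Needs at least two informative shots.
    """
    z = [
        (hit - p) / sqrt(p * (1 - p))
        for p, hit in zip(chances, outcomes)
        if 0 < p < 1
    ]
    return sum(a * b for a, b in zip(z, z[1:])) / (len(z) - 1)


def streakiness_p_value(chances, outcomes, n_sims=2000, seed=0):
    """Monte Carlo p-value: how often does a fair generator look this streaky?"""
    rng = random.Random(seed)
    observed = abs(lag1_statistic(chances, outcomes))
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Simulate a fair game: each shot hits with its displayed chance.
        simulated = [rng.random() < p for p in chances]
        if abs(lag1_statistic(chances, simulated)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_sims + 1)


# Made-up example: twenty 50% shots that alternate hit, miss, hit, miss...
chances = [0.5] * 20
outcomes = [True, False] * 10
print(streakiness_p_value(chances, outcomes))
```

A statistic well above zero would suggest hits clustering with hits and misses with misses, while one well below zero would suggest the generator "correcting" itself after each result. Either would show up as a small p-value.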