

Author: “No Bugs” Hare Follow: Job Title: Sarcastic Architect Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,

Calling a Spade a Spade, Keeping Tongue in Cheek

Pages: 1 2

[[This is Chapter 19 from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “ Book Beta Testing “. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Random number generation is often seen as trivial by game developers (“hey, just call rand() and you’re done!”). Well, it is not really trivial. It is quite easy to make a very silly mistake, which can go unnoticed for years, and when you learn about it – the Pretty Bad Damage has already been done to your game/company/… 🙁 .

Psychological aspects of RNG

One important thing to keep in mind is that humans tend to have very selective memory when it comes to good and bad occurrences. Which translates into:

even if your RNG is statistically perfect, people will still complain 🙁

As Jeffrey Kaplan from Blizzard has put it: “We found that this had a lot of problems where players would run into streaks, and they only remembered the shitty streaks. ” [ShackNews.OnBlizzard].

“Some years ago, I made an interesting exercise and found that I myself was guilty of it too.Some years ago, I made an interesting exercise and found that I myself was guilty of it too. When I was playing Civ IV and my stronger unit got killed by a much weaker unit by AI opponent, I (as probably most of the people out there) was complaining about the whole random thing being rigged in favor of AI. Intuitively, I was perfectly sure that the thing is extremely unfair. However, I started to write down the stats of the occurrences. And found that while they didn’t pass certain spectral tests, overall averages were within expectations. In other words, while the RNG wasn’t rigged as such1 (though spectral correlations MIGHT have been somewhat unfair given the play patterns of me and AI), it certainly did feel unfair.

This might seem as an excuse to use poor RNGs, but it is not. Because

If your RNG is flawed, people will complain even more 🙁

Even more importantly, if your RNG is flawed and it is important enough for players, they may gather stats proving that you are not doing a good job. Also, you will lose an option to prove them that your RNG is good (which is a separate challenge, see discussion in “Dealing with Perception Problems” section below).

RNG Alternatives

Some people in the industry have argued removing an element of luck altogether. Examples of such techniques to avoid removing randomicity include Blizzard’s “progressive percentages” [ShackNews.OnBlizzard] and Prismata’s concept of “DeckHand” [Grant].

However, it should be noted that avoiding randomicity is certainly not a silver bullet and has its own drawbacks. As Jeffrey Kaplan has put it: “The problem was… we found that while it was great that it got rid of the bad streaks, it also got rid of all the good streaks”. “My job here is to tell how to implement RNG properly if you decided that you do need randomicityAnd Prismata, while avoiding randomicity after the game is started, still relies on random choices to start the game with (which means RNG, and also means player complaints [Grant]).

To summarise – whether you need an RNG, is a gameplay choice, and in this book I’m trying to avoid arguing pro or contra specific gameplay choices. My job here is to tell how to implement RNG properly if you decided that you do need randomicity (and for most of the games out there you will, believe me 😉 ).

Inherent Difficulties with Finding RNG Faults

Before we start discussing RNGs, let’s make one unpleasant observation:

Faulty RNG can easily sit there for years without you noticing it 🙁

Faulty RNG may easily produce a bit stream which looks good enough at the first glance (“hey, it certainly looks as a random hex string every time I look at it”), and deficiencies may surface only after serious tool-based analysis as described below.

On the first glance, such well-hidden failures may look as a non-issue, but it is not the case, because such failures can be (a) exploited and (b) proven after the fact.

For example, your players may notice the problem, will start writing about it (and even may start exploiting it – we’ll see an example of it later). Or, some day somebody may gather stats and prove that all the time your RNG wasn’t fair, and was favouring some playing style. Effects of these things surely depend on specifics of your game, but for some games it can easily be devastating, as we’ll see below.

Three RNG Spectacular Failures

“Just to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examplesJust to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examples. The first two are not directly related to games, but they are still very characteristic ones to show the scale of potential problems caused by poor RNGs. The third example is a real-world horror story of the game falling victim of its RNG.

Two Big Fat Problems with Crypto RNGs

Spectacular Failure #1. In 2006, Debian developer has made a change to RNG initialization code.2 The change effectively reduced amount of RNG seed material drastically, which in turn has lead to lots of the keys generated by Debian, being insecure. The problem was not discovered until 2008 (illustrating my point above about “sitting there for years without noticing it”), and has caused a whole lot of security problems worldwide. When your keys generated two years ago become vulnerable all of a sudden, it means that all your supposedly secure communications over last two years can be read by the attacker. Ouch!

Spectacular Failure #2 is a more recent one. In 2013, it has been found that Java class SecureRandom as it was implemented on Android, was not exactly secure and often generated the same “randoms” (much more often than with probability of 2^-n which we should expect). This, in turn, has caused a vulnerability which allows to extract private keys used to sign Bitcoin wallets (see, for example, [StackExchange] for explanation). From non-technical point of view – this bug in RNG allowed to steal all the money in Bitcoin wallets… Double Ouch!

Numerous Problems in a Real-World Game-Critical RNG

Spectacular Failure #3 is probably the most interesting one for several reasons. First of all, it is not about cryptography and is directly applied to games. In addition, it illustrates several very typical bugs in custom-made random generators.

This spectacular failure is all about the RNG which was used by Planet Poker, the company which used to run real-money(!) games at the time; while it happened around 2000, the bugs made there are still very typical these days. The story is very-well described in nauseating detail in the original article [Cigital], so I will just provide a short list of Big Fat Mistakes Planet Poker gamedevs have made with their RNG:

First and foremost, they used a pseudo-random generator initialized ONLY WITH SYSTEM TIME (they used the one with a millisecond precision). Moreover, they were doing it FOR EACH AND EVERY HAND(!!) “ attackers were able to know the cards of their opponents at the table (!!!) [Cigital]. In short – attackers were able to know the cards of their opponents at the table (!!!). The idea behind the attack was simple – as soon as we know when the RNG is seeded, and can synchronise our clock with the server clock, we have just a few hundreds of thousands of potential seeds. Then, when we see three cards on the flop (and have our own 2 cards too), we can run these few hundreds of thousands of potential seeds against the RNG, and to find the seed which would generate this flop and our own cards too. This matching seed defines all the cards of the opponents. They were using a pseudo-RNG with a merely 32-bit internal state to produce 52-card decks of cards. However, a properly shuffled 52-card deck can have 52! ~= 2^226 different outcomes. And as pseudo-RNG with a 32-bit state can possibly have only 2^32 different patterns, it means that they would never get a vast majority of the potentially available hands (!) They had a flawed shuffling algorithm which has caused a flaw in statistical shuffling distribution (see [Cigital] for explanation) And they had an off-by-one error (a silly one, but it has also caused a skew in statistical distribution, see [Cigital]). One thing AFAIR not mentioned in [Cigital] but still affecting distribution: when you have N bits, obtaining a random number in range of 0 to M where M is not 2^N -1, via taking a reminder, the random number is inherently skewed (!) – just because the number of states-you’re-converting (which is 2^N), is not exactly even for all the outcomes. Though admittedly it has MUCH milder effect than any of the flaws mentioned above, it is still a flaw. “ they DID publish their shuffling algorithm without making a closed review of it first On the other hand, it IS important to review your RNG algorithms, and MAYBE even to publish them. However, before pushing the button “upload code to your website”, you need to make sure that your RNG is really good. And you won’t be able to check it yourself (even if you’re a security pro) – just because it is YOUR code, it always feels better than it should; in other words, you “know” how it works instead of scrutinizing it. That’s why it is paramount to have a THIRD-party review. Ideally – by a different company specializing in security, though you MIGHT be able to get away with a completely different team within your own company.



BTW, if your game calls for a serious RNG, make sure to read [Cigital] – it provides a Very Good Lesson on “how NOT to implement your RNGs”.

What Qualifies as a Good RNG

“now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is.Ok, now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is. Well, unfortunately, you cannot really prove that RNG is good;3 the only thing you can really do, is to prove that RNG is bad.

In short – to perform a black-box test of your RNG, the best thing you can do is to run a series of tests looking for certain patterns within your output.

Testing bit stream – Marsaglia DieHard, NIST, and TestU01

In particular, at some stage most of RNGs out there produce supposedly completely random bit stream one way or another (even if you have an RNG which generates numbers rather than bit stream, it is usually trivial to make it generate bit stream instead – just request integers which are uniformly distributed from 0 to 2^N-1 inclusive).

“You should run BOTH of them if you’re using your RNG in an anywhere serious mannerAs soon as you have your supposedly random bit stream (a looooong one – usually in millions of bits), there is a way to get it tested for certain patterns to be present. In fact, there are two sets of tests which can indicate any problems with your bit stream. Three most useful of these tests are [Marsaglia.Diehard], [NIST], and [TestU01]. You should run AT LEAST TWO of them (better – all three) if you’re using your RNG in an anywhere serious manner (i.e. if it can affect the fairness of your games – or, if we put it in a different perspective – if your players are complaining about your RNG). Note though that interpreting results of these tests can be rather tricky (and may require certain understanding of stuff such as statistically expected outcomes and p-values).

Testing beyond bit stream – Chi Square

As we’ve seen above on the example of Planet Poker, it is very easy to make a mistake in converting your bit stream into the game events (these guys has managed to make four(!) separate mistakes on that way). In turn, it means that

even after your bit stream is good, your game events may still be skewed pretty badly

As a result, for RNG-critical games, I strongly insist on perform testing on game-level data too (i.e. after bit stream is converted to whatever-game-level-events you have – ranging from a shuffled deck to sequences of critical-hit decisions).

The simplest form of such analysis does not require a math degree, and can be done using good ol’ chi square goodness-of-fit test [Wikipedia.Chi.Square.Goodness.of.Fit]. Very briefly, the process, when applied to games and RNGs, usually goes as follows:

We’re gathering stats of our game-level events (ideally – both before production, and while our game is running in production). For example, if, like Planet, we have a game of poker, we can record all the decks dealt.

“ We’re making a hypothesis that distribution of our game-level events is as it should be according to rules of our game. st card dealt, probability of all the 52 cards is the same. 4 To test this hypothesis, we calculate “how many times each of the cards was dealt as the 1 st card of the deck”. In other words, we’re throwing our statistical events into different “bins”, and calculating the number of occurrences within of each “bin”. These numbers will be close, but almost-never will be exactly the same (neither they should). The role of chi-square test is to find out whether the observed deviations from completely-flat distribution are statistically acceptable or not.

card dealt, probability of all the 52 cards is the same. To test this hypothesis, we calculate “how many times each of the cards was dealt as the 1 card of the deck”. In other words, we’re throwing our statistical events into different “bins”, and calculating the number of occurrences within of each “bin”. These numbers will be close, but almost-never will be exactly the same (neither they should). The role of chi-square test is to find out whether the observed deviations from completely-flat distribution are statistically acceptable or not. For each card (from aces to deuces), we’re calculating a value of x = (O-E)^2/ E, where O is an observed value of number of occurrences for this card, and E – is an expected value (which, under our hypothesis of flat distribution, is the same for all the cards, and equal to the total-number-of-cards-dealt / 52). Then, we calculate the sum of all x, which becomes the chi-square value for our experiment.

Then we take a look at the table such as [Chi.Square.Critical.Values], and look for a row with an appropriate number of bins “Degrees of freedom” a.k.a. ν (for equal distribution of poker cards, we’ll be looking for 51 “Degrees of freedom”, as a number of bins = 52 minus 1, as we’re calculating one parameter – the average – from the data itself).

Therefore, if we find that for our test of equal card distribution, if our calculated chi-square value is below 38.6 – then “Probability less than the critical value” is less than 10%, i.e. our hypothesis that everything is fine, stands with probability of 90%. And if our calculated chi-square value is above 65.4 – then “Probability less than the critical value” is over 90%, i.e. our hypothesis that everything is fine, stands with probability of less than 10% 🙁 . “ In science, the “magic number” to challenge hypothesis is accepted to be 5%



Of course, it is by far NOT the only statistical test which is possible to run, but it is the one which can (and IMO SHOULD) be performed yourself. However, for Really RNG-Critical Environments I strongly suggest to hire a company which will be able to perform a much more sophisticated analysis for you.

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.