

Author: “No Bugs” Hare
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[This is Chapter 13(c) from “beta” Volume IV of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving it; for further details see “Book Beta Testing”. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Encrypting UDP

Why Encrypt??

Yes, you DO need to encrypt your UDP traffic.1 And no, using UDP is NOT a valid excuse to skip encryption. Reasons for encrypting your traffic are numerous:

The classical reason for encryption is to prevent eavesdropping and session hijacking by a third party. While it is Really Important for stock exchanges (and maybe for casino-like games), it is rarely considered a big concern for most other games out there. On the other hand, if some artifact within your game costs $20K+ in real-world dollars, you should start thinking about it seriously. In such cases, a game account becomes as important as (and for quite a few people out there, much more important than) a bank account, which carries all the security implications of a bank, including (but not limited to) encryption.

Leaving your traffic unencrypted facilitates proxy bots, and proxy bots can be made next-to-impossible to detect by other means.2

Leaving your traffic unencrypted exposes your protocol to cheaters. Hiding the internals of your traffic is admittedly security-by-obscurity (which is not real security); however, as discussed in Chapter II, in lots of cases we don’t have any better protection against cheaters 🙁 .

In short, encryption not only protects your players from classical attacks, it also protects your game against cheaters.

As a side bonus, with proper encryption you can be sure that network errors which corrupt your packets won’t go undetected. With unencrypted UDP, the 16-bit UDP checksum fails to detect roughly one out of every 65’000 in-transit corruptions, which means that with all those millions of packets you’re sending out each second, some corruptions WILL go undetected, causing all kinds of trouble.
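To see why a 16-bit checksum is not much of a guard, here is a minimal C++ sketch of the RFC 1071 “Internet checksum” (the algorithm UDP uses; the UDP pseudo-header is omitted for brevity). It demonstrates one class of corruption which this checksum can never catch: swapping two aligned 16-bit words leaves the ones’ complement sum unchanged. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// RFC 1071 Internet checksum: ones' complement of the ones' complement sum
// of big-endian 16-bit words (UDP also covers a pseudo-header, omitted here).
uint16_t internet_checksum(const std::vector<uint8_t>& data) {
    uint32_t sum = 0;
    std::size_t i = 0;
    for (; i + 1 < data.size(); i += 2)
        sum += (uint32_t(data[i]) << 8) | data[i + 1];
    if (i < data.size())
        sum += uint32_t(data[i]) << 8;        // odd length: pad the last byte with zero
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);   // fold carries back in
    return static_cast<uint16_t>(~sum & 0xFFFF);
}

// Simulates an in-transit corruption which swaps two aligned 16-bit words.
std::vector<uint8_t> swap_words(std::vector<uint8_t> data, std::size_t w1, std::size_t w2) {
    assert(2 * w1 + 1 < data.size() && 2 * w2 + 1 < data.size());
    std::swap(data[2 * w1], data[2 * w2]);
    std::swap(data[2 * w1 + 1], data[2 * w2 + 1]);
    return data;
}
```

For an 8-byte payload, flipping a single bit changes the checksum, while swapping words 0 and 2 does not; an authenticated cipher, in contrast, rejects both.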

On the other hand, you need to keep in mind that having encryption does NOT eliminate the need to sanitize your data, at the very least on the Server Side of things (even with encryption in place, the Client can be hacked to send your Server all kinds of malicious data, from garbage to fakes).

Isn’t Encryption Damn Expensive?

The next question is the following: well, we DO need to encrypt, but can we afford it? Won’t adding the encryption kill our servers CPU-wise? While we’ll discuss this issue in detail later in Chapter [[TODO]], for now we’ll need just a few very basic observations.

In short, there are two main ways to encrypt things: (a) using symmetric keys (a.k.a. “symmetric crypto”, with AES-128 and AES-256 being among the most popular ciphers), and (b) using asymmetric keys (a.k.a. “public-key crypto”, or “public crypto” for short, with RSA-2048 still being quite popular in this department).

Symmetric crypto is damn cheap; for x86, it is usually of the order of “100+ Mbytes/second per core”.3 It means that if your server is serving 1’000 players, sending a 1Kbyte-sized packet 20 times a second to each of them (i.e. quite a respectable 160Mbit/sec), symmetric crypto will cost you less than 1/5th of one CPU core. As a usual “workhorse” server (see Chapter [[TODO]] for further discussion) currently has around 8-12 such cores, the overall impact of symmetric encryption in this example scenario amounts to about 2% of additional CPU load; if you ask me, a 2% increase in number-of-servers-you-need-to-run is certainly not much to protect your players both from eavesdropping etc. and from cheaters.

On the other hand, public crypto is MUCH more expensive; fortunately, it is needed only to establish a connection (and as a result of such public-crypto-while-establishing-connection, a symmetric key is generated for subsequent symmetric crypto). Specific numbers vary greatly from one algorithm to another, but as a ballpark number, with TLS/DTLS we can take an estimate of 1’000 connections/second/x86 core.4 So, for our 1’000-player server example above, even if all of your players get disconnected and then need to reconnect, you’ll need just about 0.1 second (using all your cores) to reconnect all of them. The [QUIC.Crypto] protocol establishes connections at a significantly lower cost than TLS: very roughly, it is ~10x better, i.e. with QUIC we can get 10’000 connections/second/core. Note that while you MIGHT think that TLS’s 0.1-sec-to-reconnect-all-your-players is already good enough, we’ll see a bit later that connection establishment costs are VERY important from the DDoS point of view (see the “Resilience to Crypto-DDoS Attacks” section below).
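The back-of-the-envelope numbers from the two paragraphs above fit into a tiny C++ cost model. All the throughput figures (100 Mbytes/second/core for symmetric crypto, 1’000 handshakes/second/core for TLS/DTLS, 10’000 for QUIC) are the ballpark estimates from the text rather than measurements, and the 10-core server is an assumption picked from the 8-12 range.

```cpp
#include <cassert>

// Cores spent on symmetric encryption of outgoing game traffic.
double symmetric_crypto_cores(double players, double packet_bytes,
                              double packets_per_sec, double bytes_per_sec_per_core) {
    assert(bytes_per_sec_per_core > 0);
    const double bytes_per_sec = players * packet_bytes * packets_per_sec;
    return bytes_per_sec / bytes_per_sec_per_core;
}

// Seconds needed to re-handshake every player if ALL cores do nothing else.
double reconnect_all_seconds(double players, double handshakes_per_sec_per_core,
                             int cores) {
    assert(handshakes_per_sec_per_core > 0 && cores > 0);
    return players / (handshakes_per_sec_per_core * cores);
}
```

For the example above, symmetric_crypto_cores(1000, 1000, 20, 100e6) gives 0.2 of a core (about 2% of a 10-core server), and reconnect_all_seconds(1000, 1000, 10) gives 0.1 second for TLS/DTLS versus 0.01 second with QUIC’s 10’000 handshakes/second/core.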

Contenders for UDP encryption: DTLS and QUIC

In practice, there are two protocols which can currently be used for practical UDP encryption: DTLS (using, for example, [OpenSSL]) and QUIC (using [libquic]). While other UDP-oriented protocols (such as SNEP/SPINS, CurveCP, or MinimaLT) are described in literature, to the best of my knowledge they lack readily-available-and-supported libraries,5 and writing your own crypto-related library usually qualifies as a Pretty Bad Idea for game development.

Now, let’s compare DTLS and QUIC. I won’t go into a lengthy discussion comparing them from theoretical security perspective; much more important for our purposes is an observation that

QUIC is inherently stream-oriented, and we cannot use it “as is” for fast-paced state sync stuff 🙁

(with “fast-paced updates” defined in section “Fast-paced Updates vs Slow-Paced Ones” above).

Potential Hack: using QUIC streams to send independent packets

As QUIC supports streams, and, even more importantly, creating a new stream within the same connection seems to be pretty much zero-cost (as streams are created implicitly), we MIGHT try the following trickery to implement unordered-and-unreliable UDP-like delivery over QUIC (note that I didn’t do it myself, so it is only speculation; also, I have no idea whether the current [libquic] supports it):

On the Server Side: each new fast-sync packet goes into a new QUIC stream; each such stream MUST have a new stream ID (even and monotonically increasing, per the QUIC spec); strictly speaking, care should be taken not to exceed the 2^32 limit on stream IDs (though with 20 network ticks/second, it is going to take roughly 3.4 years to exhaust the even half of the stream ID space); on sending the next fast-sync packet (or maybe one packet later), the Server SHOULD also abnormally terminate the previous stream.

On the Client Side: on receiving such a fast-sync packet-in-a-stream, the Client processes the packet (updating the Published Game World State) and abnormally terminates the respective stream.
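If you do want to experiment with this hack, the Server-Side bookkeeping is small. Below is a minimal C++ sketch under the rules stated above (Server-initiated stream IDs are even, strictly increasing, and limited to 32 bits); starting from ID 2 is an additional assumption, and the class name FastSyncStreamIds is hypothetical. It does not talk to [libquic] at all: it only decides which stream ID to use for the next packet and which stream to abnormally terminate.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical Server-Side bookkeeping for "one fast-sync packet per QUIC stream".
class FastSyncStreamIds {
public:
    // Stream ID for the next fast-sync packet, or std::nullopt once the even
    // 32-bit IDs are exhausted (the connection then has to be re-established).
    std::optional<uint32_t> allocate() {
        if (next_ > kMaxEvenId) return std::nullopt;
        previous_ = current_;
        current_ = static_cast<uint32_t>(next_);
        next_ += 2;                             // even and strictly increasing
        return current_;
    }
    // Stream to abnormally terminate now that a newer packet has been sent.
    std::optional<uint32_t> stream_to_reset() const { return previous_; }

private:
    static constexpr uint64_t kMaxEvenId = 0xFFFFFFFEull;
    uint64_t next_ = 2;                         // assumption: first usable even ID
    std::optional<uint32_t> current_;
    std::optional<uint32_t> previous_;
};

// Years until a connection sending `packets_per_sec` packets runs out of even IDs.
double years_until_exhaustion(double packets_per_sec) {
    assert(packets_per_sec > 0);
    const double even_ids = 2147483648.0;       // ~2^31 even values below 2^32
    return even_ids / packets_per_sec / (365.25 * 24 * 3600);
}
```

At 20 packets/second, years_until_exhaustion(20) comes out at roughly 3.4 years, so in practice exhaustion only matters for extremely long-lived connections.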



Once again: I have no idea whether it will work in practice – but feel free to try :-).

Hybrid Implementations

Other alternatives to using DTLS-for-all-communications include:

Using DTLS for fast-paced state-update stuff, and QUIC for slow-paced stream-based updates. In spite of having two crypto libraries, this MAY make some sense, in light of QUIC being more than just a crypto library.

Using QUIC for slow-paced stream-based updates, and skipping encryption for fast-paced updates completely. Note that this is NOT a really safe option from the security perspective, so whenever substantial real money is involved, it SHOULD NOT be used. However, for quite a few games out there, it will work. The potential attack here is the attacker modifying the (unencrypted and unsigned) data coming to the victim’s Client, and therefore modifying the world which the victim can see; this opens the door to many nasty things. If substantial money is at stake, such attacks CAN be mounted in practice (especially in certain environments, such as a university campus). If (in spite of my advice against it) you choose to go this way, MAKE SURE, AT THE VERY LEAST, to encrypt ALL the data going from Client to Server, and to encrypt ALL the data which is not 100% public (both these things are really important, for several reasons!); if your Client-to-Server data or private data doesn’t fit well into QUIC reliable streams, tough luck: it means that you need to use DTLS.



Also keep in mind that for games such as stock exchanges, and for all credit-card processing, it is usually significantly easier to convince auditors (in the latter case, PCI DSS auditors) that you’re fine security-wise if you’re using TLS/DTLS (any other protocol will cause raised eyebrows, and in the best case, you will need to justify why you’re deviating from what is usually deemed “industry best practices”).

Resilience to Crypto-DDoS Attacks

When speaking about security, it is always about various attacks. For (properly) encrypted connections, dealing with attacks after the connection is established is usually not too difficult; however, DDoS attacks aimed at the connection handshake become even easier to mount once we add encryption 🙁 . I’m mostly speaking about “crypto-DDoS attacks” here, where the attacker sends garbage within a properly formatted crypto request message, and thus causes the server to spend lots of time establishing that the garbage is not really valid (see, for example, the Pushdo SSL DDoS attack [Lewis12]). There is one positive side to this class of attacks though: amplification attacks (including the very popular DNS amplification attacks) usually don’t apply (phew); in particular, it means that a 10GBit/s crypto-DDoS attack already counts as a “rather sizeable” one.

An Example crypto-DDoS Attack

Let’s do some example math. Let’s consider a moderately sized 10GBit/s non-amplified attack on a 100-server MOG (handling something like 100K players simultaneously). Let’s assume that our MOG system performs balancing (such as hardware Load Balancing or Front-End Servers; see Chapter VII for further discussion); also, let’s assume that the attack is performed by 10’000 PCs (each emitting 1Mbit/s on average), each PC having 4 cores on average. Let’s further assume that our ISP can handle these 10GBit/s for us.

Now let’s see what it means for our game servers. If our handshake packet is 50 bytes at IP level (which becomes 68 after adding the Ethernet header+CRC), it means that a 10GBit/s attack can cause us ~18M connection requests/second.6

As noted above, the connection request requires public crypto, and with DTLS 1.2, we can process around 1’000 such connection requests per second per core. Let’s note that we cannot really dedicate ALL our cores to handling connection requests (the game should go on even when under attack), so let’s assume that we can dedicate one core per server to DDoS handling. It means that our 100 servers will be able to handle a mere 100’000 connection requests/second (and we need about 180x more to withstand the attack).

Such attacks can be a very unpleasant thing (and limiting incoming connections per IP is rarely an easy task for UDP, so DDoS protection by providers might or might not help in this regard, as thresholds may be too low to trigger protection at that level), so let’s see what we can do about it. Even using optimized algorithms/handshakes (such as those in QUIC) would make it only 10x better for us (still leaving us 18x short).
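The arithmetic of this example can be reproduced with a couple of lines of C++. The inputs are exactly the assumptions above (68-byte frames, one handshake-dedicated core per server, 1’000 DTLS handshakes/second/core, ~10x that for QUIC); note that the frame size ignores the Ethernet preamble and inter-frame gap, just as the estimate in the text does, so the result is slightly on the pessimistic side.

```cpp
#include <cassert>

// Connection requests/second which a non-amplified flood can deliver,
// given the size of one handshake frame on the wire (bytes).
double flood_requests_per_sec(double attack_bits_per_sec, double frame_bytes) {
    assert(frame_bytes > 0);
    return attack_bits_per_sec / (frame_bytes * 8);
}

// How many times more handshake capacity we'd need to absorb the flood.
double capacity_shortfall(double attack_requests_per_sec, int servers,
                          int cores_per_server, double handshakes_per_sec_per_core) {
    const double capacity = servers * cores_per_server * handshakes_per_sec_per_core;
    assert(capacity > 0);
    return attack_requests_per_sec / capacity;
}
```

With 10e9 bits/second and 68-byte frames this gives ~18.4M requests/second; against 100 servers with one 1’000-handshakes/second core each, the shortfall is ~184x, and with QUIC-like 10’000 handshakes/second it is still ~18x.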

“Proof of Work” to the Rescue

One way to deal with it is to allow our Server to request Clients to perform some “proof of work” processing7 before we even start analyzing the Client’s connection request. Under normal operation, no “proof of work” should be requested, but if the Server is under crypto-attack (which can be detected by the time the Server spends processing connection requests), it should start requesting “proof of work” from all the Clients which try to connect.

If we can force all the Clients to do some work which takes ~0.4 seconds of CPU core time to compute, then each of the 4-core attacking PCs will be able to issue only 10 requests/second, and all 10’000 attacking PCs together will be able to make only 100’000 requests/second, allowing us to withstand the attack. Even better, we don’t even need to calculate the exact cost of the work: our Server should simply increase the amount of work requested while it is under crypto-attack, up to the point where it becomes not-so-affected by the attack. And BTW, if the attacker can see that the attack doesn’t affect you, he usually goes away fairly quickly.
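Below is a minimal C++ sketch of both halves of this paragraph: the arithmetic bounding what a puzzle-solving botnet can send, and a simple feedback loop which raises the requested amount of work while the handshake-dedicated core is overloaded and lowers it back when things calm down. It assumes the amount of work is expressed as a number of leading zero bits (as in the DTLS-based implementation below), so each step doubles the expected Client-side work. The thresholds, the one-step-per-interval policy, the cap of 32, and the class name AmountOfWorkController are illustrative assumptions, not a tuned design.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on connection requests/second from a botnet which honestly solves puzzles.
double botnet_requests_per_sec(int pcs, int cores_per_pc, double seconds_per_puzzle) {
    assert(seconds_per_puzzle > 0);
    return static_cast<double>(pcs) * cores_per_pc / seconds_per_puzzle;
}

// Hypothetical feedback controller for the requested amount of work.
class AmountOfWorkController {
public:
    // high/low: fractions (0..1) of the handshake-dedicated core's time.
    AmountOfWorkController(double high, double low) : high_(high), low_(low) {
        assert(0.0 <= low && low < high && high <= 1.0);
    }
    // Called once per measurement interval with the fraction of that interval
    // spent processing connection requests; returns the new amount of work.
    uint8_t update(double busy_fraction) {
        if (busy_fraction > high_ && amount_ < kMaxAmount)
            ++amount_;                  // under attack: ask for twice as much work
        else if (busy_fraction < low_ && amount_ > 0)
            --amount_;                  // calm again: drift back towards zero
        return amount_;                 // between thresholds: keep as is
    }

private:
    static constexpr uint8_t kMaxAmount = 32;   // arbitrary safety cap (assumption)
    double high_, low_;
    uint8_t amount_ = 0;
};
```

botnet_requests_per_sec(10000, 4, 0.4) reproduces the ~100’000 requests/second from the text. Moving one step per interval keeps the controller from overshooting into multi-minute Client delays on a short traffic spike.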

The cost we’re paying for this kind of protection is that we’re causing Clients (including legitimate ones) to connect more slowly while the Server is under attack; however, a delay of 0.4 seconds is pretty much nothing (and I would argue that even a 100x-larger 40-second delay is still better than the usual outcome of a DDoS, which is “being unable to connect for hours”).

[[TODO: proof-of-work in TLS1.3?]]

Implementing “Proof of Work” on top of DTLS

Actually, the idea of using “proof of work” to mitigate DDoS attacks to oblivion is certainly not new; it has been known at least since [Juels99] and is part of (at least) the [MinimaLT] protocol.

Below I’ll describe one of the ways of implementing “proof of work” (with an ideology similar to “puzzles” in MinimaLT) on top of the existing DTLS protocol (and on top of a 3rd-party DTLS library):

On the Server-Side, we have a completely separate secret key (let’s name it “PuzzleKey”); it MUST be independent from all the other keys (for example, generated as a crypto-quality random number) and SHOULD be regenerated from scratch at least on each server restart. An interesting detail is that this key does NOT need to be shared with any other party (so it MUST stay internal to our server).

On the Server-Side, let’s “intercept” all the datagrams sent by the Server (i.e. get the output of your DTLS library before it gets sent to the UDP socket) and find all the DTLS records known as HelloVerifyRequest (yes, this CAN be done without breaking encryption or knowing the keys, as these handshake records are not encrypted yet). Note that HelloVerifyRequest itself is intended to prevent DDoS; it does prevent a certain class of DDoS attacks, but not crypto-DDoS. If we see a HelloVerifyRequest record, we modify the datagram which carries it by adding four fields:

Current Server time (it is better to use something along the lines of std::steady_clock here).

Challenge (128 or so crypto-random bits should do nicely; 64 crypto-random bits or so should probably do too in practice, though it is a bit less obvious; I’d rather NOT go below 64 bits).

Amount-of-work (8 bits will be more than enough). The value of Amount-of-work depends on the current state of the crypto-attack on the Server: if there is no attack detected, Amount-of-work should be 0; if there is an attack which affects the Server, the Server SHOULD start incrementing it slowly. When not under attack, the Server SHOULD reduce it back (all the way down to zero).

MAC, calculated using PuzzleKey over the (Server-time, Challenge, Amount-of-work, cookie) tuple, where cookie is the one within this HelloVerifyRequest (and which the Client will echo back in its ClientHello); covering Server-time ensures that the Client cannot tamper with the timestamp. A MAC (message authentication code) is, as Wikipedia puts it, a short piece of information used to confirm that a message came from the stated sender (its authenticity) and has not been changed in transit (its integrity). For DDoS protection purposes, we need the fastest MAC possible; from my experience, HMAC is somewhat slower for messages of this size (at least on x86 platforms) than CBC-MAC (prepending the tuple length in bytes to our tuple before calculating the MAC, to make CBC-MAC secure) or CMAC/OMAC.

On the Client-Side, we “intercept” all the datagrams received by the Client right from the UDP socket (that is, before they reach our DTLS library), extract the (Server-time, Challenge, Amount-of-work, MAC) tuple out of it, and put it aside for a little while. After the extraction, we strip all the additional data from the datagram (so that the DTLS library on the Client gets the same message as was sent by the DTLS library on the Server).

When the Client DTLS library responds with a ClientHello record, we again “intercept” the datagram, adding the following fields to it: Server-time, Challenge, Amount-of-work, and MAC (all simply copied from the Server’s request), plus the Puzzle solution: a number N such that the first Amount-of-work bits of SHA-1(N||Challenge) are zeros.8 9

Again on the Server-Side, we “intercept” all the datagrams coming from the UDP socket, look for the one with a ClientHello record, and extract the (Server-time, Challenge, Amount-of-work, MAC, N) tuple. Then, before passing the datagram to the Server-Side DTLS library (which would require public crypto and therefore would incur substantial CPU costs), we:

Check Server-time for sanity: it should never be greater than our current Server time, and should be within some reasonable time window of it (in other words, if the message goes back 24 hours, something is probably wrong here). If this check fails, we send a special datagram back (on receiving such a special datagram, the Client should re-establish the connection from scratch).

Check that the MAC field extracted from the record does authenticate the (Server-time, Challenge, Amount-of-work, cookie-from-ClientHello) tuple; the check MUST be done using PuzzleKey. This is a symmetric-crypto (=“very cheap”) operation.10

Check that the number N does satisfy the “SHA-1(N||Challenge) has its first Amount-of-work bits as zeros” condition. This is a single SHA-1 operation, which is very cheap too.

Only if all the checks are OK do we strip the extra fields (so that the datagram looks exactly as it was emitted by the Client-Side DTLS library) and pass the datagram to our Server-Side DTLS library.
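The puzzle part of this scheme (leaving aside the datagram surgery and the MAC) can be sketched in C++ as follows; || denotes concatenation. To keep the sketch self-contained, it carries its own straightforward SHA-1; in real code you would use the SHA-1 of the crypto library you already link. Serializing N as 8 big-endian bytes is an assumption (the text doesn’t fix an encoding), and all the function names are hypothetical. Note the asymmetry which the whole scheme relies on: solve_puzzle() needs about 2^Amount-of-work hashes on average, while check_puzzle() needs exactly one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Digest = std::array<uint8_t, 20>;

static uint32_t rotl(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Straightforward SHA-1 (FIPS 180-4), only to keep the sketch self-contained.
Digest sha1(const std::vector<uint8_t>& msg) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    std::vector<uint8_t> m(msg);
    const uint64_t bit_len = uint64_t(msg.size()) * 8;
    m.push_back(0x80);                                 // padding: a single 1 bit, zeros,
    while (m.size() % 64 != 56) m.push_back(0);
    for (int i = 7; i >= 0; --i) m.push_back(uint8_t(bit_len >> (8 * i)));  // then length
    for (std::size_t off = 0; off < m.size(); off += 64) {
        uint32_t w[80];
        for (int t = 0; t < 16; ++t)
            w[t] = uint32_t(m[off + 4 * t]) << 24 | uint32_t(m[off + 4 * t + 1]) << 16 |
                   uint32_t(m[off + 4 * t + 2]) << 8 | uint32_t(m[off + 4 * t + 3]);
        for (int t = 16; t < 80; ++t) w[t] = rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; ++t) {
            uint32_t f, k;
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            const uint32_t tmp = rotl(a, 5) + f + e + k + w[t];
            e = d; d = c; c = rotl(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    Digest out{};
    for (int i = 0; i < 20; ++i) out[i] = uint8_t(h[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

// True if the first `bits` bits of the digest are all zeros.
bool leading_zero_bits(const Digest& d, int bits) {
    assert(0 <= bits && bits <= 160);
    for (int i = 0; i < bits; ++i)
        if (d[i / 8] & (0x80 >> (i % 8))) return false;
    return true;
}

// SHA-1(N || Challenge); N as 8 big-endian bytes is an assumed encoding.
Digest puzzle_hash(uint64_t n, const std::vector<uint8_t>& challenge) {
    std::vector<uint8_t> buf;
    for (int i = 7; i >= 0; --i) buf.push_back(uint8_t(n >> (8 * i)));
    buf.insert(buf.end(), challenge.begin(), challenge.end());
    return sha1(buf);
}

// Client side: brute force, about 2^amount_of_work hashes on average.
uint64_t solve_puzzle(const std::vector<uint8_t>& challenge, int amount_of_work) {
    for (uint64_t n = 0;; ++n)
        if (leading_zero_bits(puzzle_hash(n, challenge), amount_of_work)) return n;
}

// Server side: exactly one SHA-1, whatever amount_of_work is.
bool check_puzzle(uint64_t n, const std::vector<uint8_t>& challenge, int amount_of_work) {
    return leading_zero_bits(puzzle_hash(n, challenge), amount_of_work);
}
```

With Amount-of-work = 12, solve_puzzle() needs around 4’000 hashes on average, while check_puzzle() still costs a single hash.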



The trickery described above effectively acts as an additional DDoS-protected transport layer for DTLS; in other words, it doesn’t change anything from DTLS point of view (which means that DTLS security remains perfectly intact); it merely sends extra challenges (when Server feels that it is under attack) and filters out packets coming from those attackers who were careless enough to skip doing ‘proof-of-work’.
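To act as such a transport layer, the interceptor only needs to recognize a few plaintext handshake records. Here is a minimal C++ sketch of that recognition for DTLS 1.0/1.2, based on the RFC 6347 framing: a 13-byte record header with content type 22 for handshake, followed by a handshake header whose first byte is the message type (1 for ClientHello, 3 for HelloVerifyRequest). It only looks at epoch-0 records, which are the ones not encrypted yet; appending and stripping the extra puzzle fields is left out, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// DTLS 1.0/1.2 record header (RFC 6347), 13 bytes: content type (1),
// version (2), epoch (2), sequence number (6), length (2). A handshake
// record body starts with a handshake header whose first byte is msg_type.
constexpr uint8_t kContentTypeHandshake = 22;
constexpr uint8_t kClientHello = 1;
constexpr uint8_t kHelloVerifyRequest = 3;
constexpr std::size_t kRecordHeaderSize = 13;

// True if some epoch-0 (i.e. not yet encrypted) handshake record in the
// datagram carries `msg_type`. Truncated or malformed datagrams yield false.
bool datagram_has_handshake(const std::vector<uint8_t>& dgram, uint8_t msg_type) {
    std::size_t off = 0;
    while (off + kRecordHeaderSize <= dgram.size()) {
        const uint8_t content_type = dgram[off];
        const unsigned epoch = unsigned(dgram[off + 3]) << 8 | dgram[off + 4];
        const std::size_t len = std::size_t(dgram[off + 11]) << 8 | dgram[off + 12];
        if (off + kRecordHeaderSize + len > dgram.size()) return false;  // truncated
        if (content_type == kContentTypeHandshake && epoch == 0 && len > 0 &&
            dgram[off + kRecordHeaderSize] == msg_type)
            return true;
        off += kRecordHeaderSize + len;    // a datagram may carry several records
    }
    return false;
}
```

Anything which doesn’t match (including all epoch-1+ traffic) can be passed through untouched, which is exactly why this layer leaves DTLS security intact.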

The idea here is that while there can be two different types of crypto-DDoS attacks on such protected DTLS (those which solve the Puzzles, and those which don’t), handling either of them is much cheaper than handling an attack on unprotected DTLS.

If the attacker chooses to solve the Puzzles (and solving a Puzzle is about 2^Amount-of-work times more expensive than checking it), we’ll be able to mitigate the DDoS attack at the cost of each Client performing 0.4 sec worth of CPU core calculations (with 2016 CPUs, this very roughly corresponds to Amount-of-work = 19 or so). If the attacker decides to flood us with fake ClientHellos without solving the Puzzle, we’ll be performing only very cheap operations (such as one MAC + one SHA-1 calculation), and will be able to do (roughly) 500K such checks per second per core (or 50M checks/second using only a single core on each of our 100 servers), which is above the 18M packets/second which we need to survive our example crypto-DDoS.
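These numbers can be cross-checked with a short C++ sketch. The ~1.31M SHA-1 hashes/second/core figure is not a measurement: it is simply back-derived from the text’s “0.4 seconds corresponds to Amount-of-work = 19” (2^19 ≈ 524K hashes in 0.4 s); the 500K checks/second/core figure is the one quoted above. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected Client-side solving time: ~2^amount_of_work hashes on average.
double expected_solve_seconds(int amount_of_work, double hashes_per_sec) {
    assert(hashes_per_sec > 0);
    return std::ldexp(1.0, amount_of_work) / hashes_per_sec;
}

// Smallest Amount-of-work whose expected solving time reaches the target.
int amount_for_target(double target_seconds, double hashes_per_sec) {
    int k = 0;
    while (expected_solve_seconds(k, hashes_per_sec) < target_seconds) ++k;
    return k;
}

// Whether cheap MAC+SHA-1 checks on `cores` cores can keep up with a flood
// of fake ClientHellos which don't even try to solve the Puzzle.
bool can_filter_flood(double checks_per_sec_per_core, int cores, double flood_rps) {
    return checks_per_sec_per_core * cores >= flood_rps;
}
```

amount_for_target(0.4, 1.31e6) returns 19, and can_filter_flood(500e3, 100, 18.4e6) holds (50M checks/second against an ~18.4M requests/second flood), while 100 cores doing full DTLS handshakes at 1’000/second would not.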

As an added bonus, this kind of checks can be even offloaded to separate servers (and at least in theory – even to the servers within your DDoS-protection provider).

On the other hand, note that this additional layer is certainly not a silver bullet; for example, if all our attacking PCs have a GPU such as a GTX Titan X, they will be able to solve our Puzzles ~100x faster than a CPU, which will force us to increase Client calculation times to about 40 seconds (that’s per core). Even this would be better than not-being-able-to-connect-forever, but in reality it won’t be that grim, for two reasons:

Fortunately, not all the PCs-forming-the-botnet are that powerful.

If your game is a PC-based 3D one, you yourself can use the GPU to solve the Puzzle, reducing Client-Side connection delays by the same factor of 100x or so.

Similar Protection for QUIC

Above, we’ve discussed DDoS protection for DTLS; a very similar implementation SHOULD be doable for QUIC, though once again, I didn’t do it myself, so all kinds of monsters may be waiting to chase you along this route.

Protection from crypto-DDoS: do you really need it?

The trick to protect yourself from crypto-DDoS described above is not that complicated, but will certainly take some time to implement. As a result, a reasonable question is whether you really need to implement it in advance. Honestly, I do not have a firm answer to this question. On the one hand, when you don’t have such protection, a crypto-DDoS attack can bring your system to its knees in no time (and protection by your DDoS provider might happen to be insufficient). On the other hand, at least as of 2016, crypto-DDoS attacks are very uncommon. Whether somebody will mount a crypto-DDoS attack against your servers, well, you never know in advance.

For high-profile games, I would suggest playing it safe and implementing it somewhere around the “beta” stage of the game (as changing protocols during a “live” game is usually significantly more complicated); OTOH, chances are that you’ll never need to use this feature. Personally, I prefer to think of it as insurance, where I’m paying my premiums in the hope that my money will go to waste.11

Common Encryption-Related Notes

When implementing encryption (whether over TCP or over UDP), there are several very important things to keep in mind; while a detailed discussion on these issues will follow in Chapter [[TODO]], here I will simply summarize the most important points out of it without going into explanations:

DO check the Server-Side certificate on the Client.

To generate the Server-Side certificate, DO run your own Certificate Authority and embed the root CA certificate within your Client. DO NOT use root certificates which come installed with the Client OS. NB: this is a security-by-obscurity feature, which is not needed for non-game apps. Also, it does NOT apply to stock exchanges and similar games (more precisely, to games where there is no risk of the Client being hacked). For a detailed discussion, see Chapter [[TODO]].

DO obfuscate the root CA certificate as it is stored within your Client. NB: again, it is a security-by-obscurity feature, not necessary outside of games or for stock exchanges etc.

Regarding choosing a DTLS library: for not-so-security-critical games, I would say that it doesn’t matter too much which (D)TLS library you’re using. …12 and libraries which don’t support DTLS 1.2, and you should be fine. Personally, I have had quite good experience with OpenSSL (and no, Heartbleed did not change my positive take on OpenSSL); however, feel free to experiment with GnuTLS,13 mbed TLS (formerly PolarSSL), and Botan.14 Note that if you do NOT need UDP/DTLS, the choice of TLS libraries becomes wider; see the [[TODO]] section below for discussion.

For Really Security-Critical Games (such as stock exchanges), I would try to use double encryption (and yes, I’ve done it myself too). In particular, such double-layer encryption MAY be structured as follows:

An “outer” layer of encryption would be just your usual transport-layer encryption (covering the UDP or TCP which goes from Server to Client).

An “inner” layer of encryption would be point-to-point encryption going from a Server-Side Event-Driven Object to a Client-Side Event-Driven Object. You MAY keep this layer optional (just for most-critical messages) or use it all the time, depending on your game.

If you want to be Really Secure, use different (and Really Independent) libraries for each of these layers (and use different cipher suites too). Then, a bug in any one of the libraries won’t hit your security too much. Even better security-wise, you MAY want to use different protocols for different encryption layers (like “QUIC for the outer one and DTLS for the inner one”); in this case, even if there is a vulnerability in one of the layers, your most-critical data will still be protected. As a side bonus, you will be able to brag about your system being double-encrypted.

While we’re at it: you MUST be 200% sure that none of the keys (or, more generally, no encryption state) is ever shared between the two encryption layers. If your encryption layers are not 100% independent, you can easily end up with a completely insecure thing.

Which cipher suite to use: to a certain extent it is a matter of personal choice, but as of the beginning of 2016, I would pick something along the following lines:

For not-so-secure games: something like TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, that is, if your DTLS library supports it (and if not, try to pick something which is not too different AND, MORE IMPORTANTLY, something which doesn’t have anything marked insecure in [Wikipedia.TLS]15). If using ECDSA, you MAY settle for 160-bit keys; for RSA, for 1024-bit keys (NB: we’re still speaking about not-so-security-critical games here). Yes, 224/233-bit ECDSA and 2048-bit RSA keys are better, but you need to double-check the impact of DDoS attacks (and probably think about their mitigation) before going there.

One note about ECDSA in our context: as long as we’re using downloadable clients, we can use pretty much any elliptic curve supported by our DTLS library. However, if you’re dealing with browser clients, you DO need a “compatible” ECC curve such as P-256 (or, even better, have a fallback to good old RSA).

For Really Security-Critical games: something like TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, plus some other cipher suite of your choice (see above about double encryption). And yes, 224-bit ECDSA / 2048-bit RSA key length becomes the absolute minimum here (with 256+-bit ECDSA / 3072+-bit RSA keys recommended, and a careful choice of ECC curves recommended too).

Whatever library you’re using, DO disable everything-you-don’t-need (if possible, at compile time):

In particular, DO disable DTLS 1.0 (as you control both sides of the communication, backward compatibility is not an issue).

NEVER EVER touch with a 6-yard stick any cipher suite which uses ADH (=“Anonymous Diffie-Hellman”, a.k.a. ANON_DH).16 The same goes for AECDH (ANON_ECDH), RC4, and MD5.17

DO disable all the cipher suites which you’re not going to use.

Compile out (using #defines) whatever you can compile out, and disable in configuration whatever you don’t need but cannot compile out.

For the Client-Side, DO link your TLS/DTLS library statically; that is, do NOT use the TLS/DTLS library which comes with your OS. NB: once again, it is a security-by-obscurity feature, applicable only in game environments (which need to rely on it because no other protections are really available); moreover, in some non-game environments such a practice can be seen as detrimental to security.18

[[To Be Continued…

This concludes beta Chapter 13(c) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 13(d), describing optimizing TCP for game-like uses.]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.