

Author: “No Bugs” Hare

Job Title: Sarcastic Architect

Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

We continue our series on implementing the network part of your game engine. Today’s topic is using UDP for game engines.

Previous parts:

Part I. Client Side

Part IIa. Protocols and APIs

Part IIb. Protocols and APIs (continued)

Part IIIa. Server-Side (Store-process-and-Forward Architecture)

Part IIIb: Server Side (deployment, optimizations, and testing)

Part IV: Great TCP vs UDP Debate

After reading Part IV, you’ve hopefully decided what you need to use – TCP or UDP (or, [Stevens] forbid, both). This Part V is for those who need UDP (we’ll cover TCP specifics in the next part, Part VI).

Upcoming parts:

Part VI. TCP

Part VIIa. Security (TLS/SSL)

Part VIIb. Security (concluded)

36. DO use UDP packets as “fire and forget” – OR use existing “reliable UDP” library

As mentioned in Part IV, UDP works best either in “fire and forget” scenarios, or with an existing “reliable UDP” library.

“Fire and forget” scenarios are those where you can “fire” a packet and forget about it, not caring whether each individual packet is delivered at all. Such a scheme needs some kind of recovery from (single or multiple) packet loss, and this recovery is provided by subsequent packets. “Fire and forget” has been observed to work well in VoIP applications, and in game applications where every packet contains all the relevant information, so that even if a packet is lost, the next packet (potentially different from the previous one) corrects the loss. For example, if the state of the player can be conveyed by a pair of (X,Y) coordinates, and you’re sending packets containing X and/or Y when one of them changes – then even if one packet is lost, the next one will fix the situation and will provide the right coordinates. As long as your game can work under similar conditions – you’re fine with “fire and forget”, and are fine with using UDP (especially if you also have sub-second timing requirements, see Part IV for details). Note that this approach is quite similar to Unity3D’s “unreliable state synchronization”.

Further note that it is generally better to send both coordinates even if only one of them has changed (to avoid situations where one coordinate is updated while the other is very stale), and that it is generally better to re-send the current coordinates at least every N ms (where N depends on the specifics of your game), so that “very stale” coordinates don’t get stuck “forever”. The importance of the latter is greatly increased over the Internet, where packets are often lost in bunches: you often have several packets lost in a row, and then the situation is back to normal.
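The sending policy described above (always send both coordinates, and re-send at least every N ms) can be sketched roughly as follows. This is only an illustration: the `PositionSender` class, the 4-byte wire format, and the 100ms default are assumptions made for the example, not something prescribed by this series.

```python
import struct

# Hypothetical wire format: two signed 16-bit coordinates, network byte order.
STATE_FORMAT = "!hh"


class PositionSender:
    """Decides when a full (X, Y) snapshot is due to be sent."""

    def __init__(self, resend_interval_ms=100):
        self.resend_interval_ms = resend_interval_ms  # the "N ms" from the text
        self.last_sent_state = None
        self.last_sent_at_ms = None

    def poll(self, x, y, now_ms):
        """Return a datagram payload to send now, or None if nothing is due."""
        state = (x, y)
        changed = state != self.last_sent_state
        stale = (self.last_sent_at_ms is None
                 or now_ms - self.last_sent_at_ms >= self.resend_interval_ms)
        if not (changed or stale):
            return None
        self.last_sent_state = state
        self.last_sent_at_ms = now_ms
        # Always pack BOTH coordinates, even if only one of them changed.
        return struct.pack(STATE_FORMAT, x, y)


def decode_position(payload):
    """Receiver side: every datagram carries the complete state."""
    return struct.unpack(STATE_FORMAT, payload)
```

The caller would pass whatever `poll()` returns to its UDP socket; since each payload is a complete snapshot, a lost datagram is simply superseded by the next one.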

“Reliable UDP” is all about implementing a reliable channel over unreliable UDP. As mentioned in item #29 of Part IV, implementing reliable UDP yourself is a Big Pain in the (ahem) Neck; there are lots of caveats, and the testing is quite complicated and very time-consuming. As a result, I do not recommend doing it yourself; instead, you should use one of the existing libraries. There are three of them which are quite popular ([Enet], [UDT], and [RakNet]), and while I personally didn’t use any of them, all three look much more solid than anything any of us can write without spending at least several months on it.

So, there are these two approaches for UDP-based games, but can we use both of them? The good news is yes, you can combine the “fire and forget” and “reliable UDP” approaches, sending certain time-critical things as “fire and forget”, and sending mission-critical-but-not-time-critical things over “reliable UDP”. The not-so-good news is that such a combination requires you to be quite careful, and you should consider scenarios such as “what will happen if a ‘fire and forget’ packet sent before a ‘reliable UDP’ one arrives after it?” (which is perfectly possible with UDP, and will happen sooner rather than later in real-world over-the-Internet operation, see also item #37a below).



37. DO take Lag and Jitter into account

In general, lag and jitter are not specific to UDP, and manifest themselves for absolutely any Internet connection. However, as we’ve discussed in Part IV, the main reason for using UDP is time constraints, so for UDP-based games Internet lag usually becomes much more of a problem than for TCP-based ones. For network connections, lags/latencies are usually measured in terms of Round-Trip Time (RTT); RTT is essentially the sum of latencies in both directions (from A to B and from B to A), so if latencies in both directions are about the same (which is usually the case), one-way latency is roughly half of RTT. For example, for a trans-Atlantic connection, typical RTTs are on the order of 100ms (or more); moreover, this cannot possibly become drastically better: for the London-to-New-York connection, the distance is 3,459 miles, and even if the signal travelled at the speed of light in vacuum, without any delays on intermediate amplifiers and routers, the round trip could not possibly take less than about 37ms.

Jitter: “Packet delay variation is the difference in end-to-end one-way delay between selected packets in a flow with any lost packets being ignored. The effect is sometimes referred to as jitter, although the definition is an imprecise fit.” (Wikipedia)

Another problem to be aware of is so-called “jitter”. When you’re sending several packets separated by equal intervals (say, 50ms), the delay for each individual packet may (and often will) be different, so on the receiving side the intervals between the packets may look, for example, like (40,47,56,50,54,43). For the Internet (and without any proximity considerations), you should expect typical jitter of single-digit milliseconds, but your app shouldn’t crash even with 100ms+ jitter (which is quite rare, but happens).
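To make the example intervals above a bit more tangible, here is a small sketch that measures how far the observed inter-arrival intervals deviate from the nominal 50ms sending interval. The function names and the choice of “mean and maximum absolute deviation” as a summary are assumptions for illustration; this is one simple way to quantify jitter, not a standard definition.

```python
def interval_deviations(intervals_ms, nominal_ms):
    """Difference between each observed inter-arrival interval and the sending interval."""
    return [observed - nominal_ms for observed in intervals_ms]


def jitter_summary(intervals_ms, nominal_ms):
    """Summarize jitter as mean and maximum absolute deviation, in milliseconds."""
    magnitudes = [abs(d) for d in interval_deviations(intervals_ms, nominal_ms)]
    return {
        "mean_abs_ms": sum(magnitudes) / len(magnitudes),
        "max_abs_ms": max(magnitudes),
    }


# The receive-side intervals from the example in the text, sent every 50ms.
observed = [40, 47, 56, 50, 54, 43]
summary = jitter_summary(observed, nominal_ms=50)
print(summary)
```

For the intervals in the text this gives a mean deviation of 5ms and a worst case of 10ms, which is in line with the “single-digit milliseconds” expectation.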

So, what to do about these two Bad Things? Unfortunately, there is no good answer 🙁 . One obvious thing is to restrict the players of any single server to a local area, where these effects, while present, will be much less pronounced. Unfortunately, this solution is very far from being universal. For your usual LAN, you can get 1ms delay with sub-1ms jitter (NB: for wireless LAN, it might be significantly worse, though it depends), but having your game as LAN-only is way too restrictive for most modern games.

An intermediate solution might be to have game servers on several of the so-called “rings” or “Internet exchanges” (such as LINX in London, or AMS-IX in Amsterdam, see the list of exchanges in [WikipediaInternetExchanges]). For those exchanges which are city-local, you can expect numbers such as (very roughly) 10ms lag and 1-2ms jitter, though be prepared for sharp (like 2x) increases in lag and jitter, which may last for a few hours; these sharp increases may be either regular (such as “every evening”), or irregular. For non-city-local exchanges, the numbers will depend on the physical distance across the exchange, but will generally still be much better than for connections between the same cities via ISPs which are not on the exchange.

If you need a globally-reachable server with single-digit-ms lag for all the players, it is simply not possible 🙁 (at the very least, because of speed-of-light limitations). While you might save a few ms by having servers in several datacenters across the globe, and having a dedicated line between these datacenters, the gains will be rather small and probably not worth it (except, maybe, for stock exchanges); in any case, if you have two locations separated by D kilometers, you cannot possibly get an RTT of less than 2*D/c = D/150 milliseconds (where c ≈ 300,000 km/s is the speed of light in vacuum).
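The lower bound above is simple enough to check numerically. The sketch below (function and constant names are made up for the example) applies RTT ≥ 2*D/c to the London-to-New-York distance, converting 3,459 miles to kilometers first.

```python
SPEED_OF_LIGHT_KM_PER_S = 300_000  # rounded value, as used in the text
KM_PER_MILE = 1.609344


def min_rtt_ms(distance_km):
    """Physical lower bound on round-trip time: 2 * D / c, in milliseconds."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000


london_new_york_km = 3459 * KM_PER_MILE
print(f"{min_rtt_ms(london_new_york_km):.1f} ms")
```

The result comes out at roughly 37ms; real-world RTTs are noticeably higher, as signals in fiber travel slower than light in vacuum and routes are never a straight line.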

And last but not least: to make sure that your app handles these nasty things properly, start testing as soon as possible (including testing over trans-Atlantic connections, see item #24 in Part IIIb for details). In addition, UDP allows you to test while simulating packet loss yourself (see item #40 below), which you should also start doing ASAP.



37a. DO take Packet Reordering into account

Yet another problem that you need to care about when using UDP is that even correct packets may arrive in the wrong order. That is, it is perfectly possible (though not too likely) to send packet A, then packet B, but to receive packet B first and packet A second. With TCP, such situations are handled “under the hood” by the TCP stack (so you won’t receive the portion of data carried by packet B until packet A arrives), but with UDP, you’re on your own 🙁 .

Generally, for “fire and forget” scenarios you should consider implementing some kind of ‘counter’ field within each packet, incremented for each packet sent, so that out-of-order packets can be silently discarded on the receiving side. As the reordering window is usually quite limited, you may usually limit your ‘counter’ field to 2 bytes, though you need to handle wraparounds in this case.
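One common way to handle the wraparound of a 2-byte counter is to compare counters modulo 65536 and to treat anything within half of the counter range “ahead” as newer. The sketch below assumes exactly that rule; `is_newer` and `LatestOnlyReceiver` are hypothetical names used only for this example.

```python
SEQ_MOD = 1 << 16  # range of a 2-byte counter


def is_newer(seq, last_seq):
    """True if seq is ahead of last_seq by less than half of the counter range."""
    diff = (seq - last_seq) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


class LatestOnlyReceiver:
    """Accepts a packet only if its counter is newer than the last accepted one."""

    def __init__(self):
        self.last_seq = None

    def accept(self, seq):
        if self.last_seq is None or is_newer(seq, self.last_seq):
            self.last_seq = seq
            return True
        return False  # duplicate or out-of-order packet: silently discard


receiver = LatestOnlyReceiver()
for seq in (65534, 65535, 0, 65535, 1):
    print(seq, receiver.accept(seq))
```

Here the counter wraps from 65535 to 0, and the late copy of packet 65535 that arrives after packet 0 is discarded. The “half of the range” rule only works as long as packets are never delayed by more than 32767 counter values, which fits the assumption that the reordering window is quite limited.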



38. DO restrict your datagrams to approx. 500 bytes of payload

Unlike TCP, UDP is a datagram-oriented protocol. One of the implications is that when designing and implementing a UDP-based protocol, you need to care about datagram size, and dealing with datagram size is an annoying thing.

Actually, there are two schools of thought about maximum UDP datagram size. One school of thought says: “strictly speaking, the maximum amount of data in a UDP datagram is 65535 - 8 (UDP header) - 20 (typical IP header) = 65507 bytes (when taking into account the maximum IP header, it lowers to 65467 bytes)”. You might say: “hey, this is more than I will ever want, great!” However, in practice it is not that easy. The problem is that such large packets are likely to be above the so-called MTU (= Maximum Transmission Unit), and to transmit them, the IP stack will need to fragment and reassemble them. The problem with this is that fragmented IP packets tend to cause quite a lot of trouble: certain firewalls disallow them for security reasons, they have higher chances of being dropped in case of router overload, and they have somewhat longer travel times under certain circumstances (and as we’ve gone for UDP because of interactivity, this is not a good thing).

The second school of thought (which I tend to agree with, at least when it comes to the uncontrolled Internet) says that it is much better to make sure that the IP packets which carry your UDP datagrams are not fragmented. To achieve this, in 99.(9)% of cases it is sufficient to restrict your IP packet (that is, IP header + UDP header + payload) to 576 bytes (NB: strictly speaking, 576-byte IP packets are not guaranteed to traverse the Internet without being fragmented, but in practice it is an extremely safe number to avoid any problems). To fit into that magic 576-byte MTU and avoid fragmentation, your UDP payload should be restricted to 576-60-8=508 bytes (note that this calculation accounts for the maximum IP header size, which is 60 bytes; in practice, a typical IP header is 20 bytes, but IMHO those extra 40 bytes that you can more-or-less expect to go without fragmentation are rarely worth the risk, though your mileage may vary).
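The arithmetic behind these numbers can be written down as a tiny helper, which is also handy as a guard in your sending code. The names below are assumptions made for the example.

```python
UDP_HEADER_BYTES = 8
IP_HEADER_TYPICAL_BYTES = 20
IP_HEADER_MAX_BYTES = 60
SAFE_IP_PACKET_BYTES = 576


def max_udp_payload(ip_packet_bytes, ip_header_bytes=IP_HEADER_MAX_BYTES):
    """Payload bytes left after subtracting the IP and UDP headers."""
    return ip_packet_bytes - ip_header_bytes - UDP_HEADER_BYTES


SAFE_PAYLOAD_BYTES = max_udp_payload(SAFE_IP_PACKET_BYTES)  # 576 - 60 - 8


def fits_without_fragmentation(payload):
    """Conservative check against the 508-byte limit discussed above."""
    return len(payload) <= SAFE_PAYLOAD_BYTES


print(SAFE_PAYLOAD_BYTES)                                     # conservative limit
print(max_udp_payload(576, IP_HEADER_TYPICAL_BYTES))          # with a typical IP header
print(max_udp_payload(65535, IP_HEADER_TYPICAL_BYTES))        # theoretical maximum
```

A check like `fits_without_fragmentation()` is probably best placed in an assertion or a unit test, so that an oversized datagram is caught during development rather than silently fragmented in production.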

If you really really need more than that, you might want to try maximum datagram size of 1400 bytes or so (aiming to fit into Ethernet frame of 1500 bytes with all the possible additional headers such as ADSL headers etc.); if you’re trying it – please let me know whether it worked for you or not.



39. DON’T trust UDP checksums for Critical Data

You’ve received a UDP packet from a reliable source. Does it really mean that you’ve got the same packet which was sent by the sender? Not really; let’s see why this is the case.

Integrity of a UDP packet is kind of “guaranteed” by the UDP checksum; the problem is that the UDP checksum is only 16 bits long. It means that for every 2^16=65536 corrupted packets (and assuming that corruption randomly changes all the bits), you can expect one to go through without the corruption being detected (in practice, corruptions are not random, so it is not that bad, but corrupted-packets-with-correct-checksum do happen).

It means that if you’re transferring some Really Critical Data over UDP – you SHOULD add some kind of checksum beyond the UDP checksum. The question “what kind of checksum to use” is a bit tricky and the answer is “it depends”.

SHA: “The Secure Hash Algorithm is a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS)” (Wikipedia)

As one example, you can use one of the SHA crypto hash functions (truncated, say, to 64 bits) to make a 64-bit=8-byte checksum (bringing the probability of a corrupted packet going through down to 2^-64≈5e-20, which can be seen more or less as “never happens in your lifetime”). On the other hand, SHA functions are relatively expensive (in terms of CPU usage). If this is a concern, you may want to consider CRC-32, which is certainly better than the default 16-bit UDP checksum. However, CRC-32 lets a corrupted packet through with a probability of 2^-32≈2e-10 per corrupted packet, so with a huge number of packets (such as billions per day), a corrupted packet might still slip past CRC-32. It all depends on “how critical your data is” (i.e. what will happen if data gets corrupted – nothing? just one client crashes? the whole server crashes? you’re out of business? IRL nuclear war starts?). Fortunately, there is CRC-64, which should be good enough for most randomly corrupted packets, but see some further considerations below.
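Both options above can be sketched with the Python standard library alone. In the sketch below, SHA-256 is picked as “one of the SHA functions” and the checksum is appended at the end of the payload; both choices, as well as the function names, are assumptions made for the example. Note that a plain hash like this only guards against accidental corruption, not against somebody modifying the packet intentionally.

```python
import hashlib
import struct
import zlib

SHA_TAG_BYTES = 8  # SHA-256 truncated to 64 bits


def add_sha_tag(payload):
    """Append a 64-bit truncated SHA-256 checksum to the payload."""
    return payload + hashlib.sha256(payload).digest()[:SHA_TAG_BYTES]


def check_sha_tag(datagram):
    """Return the payload if the checksum matches, otherwise None."""
    if len(datagram) < SHA_TAG_BYTES:
        return None
    payload, tag = datagram[:-SHA_TAG_BYTES], datagram[-SHA_TAG_BYTES:]
    expected = hashlib.sha256(payload).digest()[:SHA_TAG_BYTES]
    return payload if tag == expected else None


def add_crc32(payload):
    """Cheaper alternative: append a 32-bit CRC in network byte order."""
    return payload + struct.pack("!I", zlib.crc32(payload))


def check_crc32(datagram):
    """Return the payload if the CRC matches, otherwise None."""
    if len(datagram) < 4:
        return None
    payload, (crc,) = datagram[:-4], struct.unpack("!I", datagram[-4:])
    return payload if zlib.crc32(payload) == crc else None


def flip_bit(data, index):
    """Test helper: simulate corruption of a single bit in transit."""
    return data[:index] + bytes([data[index] ^ 0x01]) + data[index + 1:]


message = b"player 42 wins the pot"
print(check_sha_tag(add_sha_tag(message)) == message)
print(check_sha_tag(flip_bit(add_sha_tag(message), 3)))
print(check_crc32(flip_bit(add_crc32(message), 3)))
```

Keep in mind that the extra 4 or 8 bytes count towards the payload size limit discussed in item #38.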

In addition to considering the random data corruption which we’ve discussed above, when speaking about really critical data, you should always consider an intentional attack. In other words, you should always think in terms of “what will happen if somebody corrupts it intentionally?” If the answer is “something really bad may happen”, then you probably need some kind of authentication, provided by cryptographic methods (for example, TLS over “reliable UDP”, or [DTLS] for “fire and forget” packets; for further details see item #56 in Part VIIa). And (as a compensation for all the authentication/encryption hassle) – as soon as you have proper authentication/encryption, it also protects you from unintentional data corruption, so you can consider it “perfectly safe” data without any additional checksums (more specifically – a typical checksum for modern protected channels is at least 128 bits long, and 2^-128 is about 3e-39, which certainly qualifies as “will never happen”; from a slightly different point of view – if the crypto community is satisfied that nobody will be able to break it even intentionally and even with NSA-scale resources – you can be reasonably sure that random corruption won’t slip in).



40. DO Test while Simulating Packet Loss, Jitter, and Packet Reordering

If UDP has another advantage over TCP besides sub-second interactivity, it is the ability to easily test your app while simulating packet loss, jitter, and packet reordering. And whenever you can test something easily with network stuff – you should do it (see also item #24 in Part IIIb).

Implementing it is quite simple – you can more or less easily write a proxy UDP server, which will merely forward the data, while introducing random packet loss, jitter, packet corruption, packet reordering and so on. While this testing doesn’t eliminate the need to test your app over the Internet (as described in item #24 in Part IIIb) – it is much simpler to test things in a simulator, and it will allow you to analyze in detail a much wider range of issues which might (and will) happen to your game engine or app.
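The heart of such a proxy is the decision of what to do with each datagram: drop it, damage it, or deliver it after some delay. Below is a rough sketch of that impairment core only (the socket plumbing is left out); the function name, parameters, and default values are assumptions for illustration. Reordering is not simulated explicitly, it simply emerges whenever jitter exceeds the spacing between packets.

```python
import random


def impair(datagrams, rng, loss=0.05, base_delay_ms=50.0,
           jitter_ms=20.0, corrupt=0.01):
    """Simulate an unreliable link.

    datagrams: iterable of (send_time_ms, payload_bytes).
    Returns a list of (arrival_time_ms, payload_bytes), in order of arrival.
    """
    delivered = []
    for send_time_ms, payload in datagrams:
        if rng.random() < loss:
            continue  # simulated packet loss
        if payload and rng.random() < corrupt:
            # Simulated corruption: flip one random bit.
            i = rng.randrange(len(payload))
            bit = 1 << rng.randrange(8)
            payload = payload[:i] + bytes([payload[i] ^ bit]) + payload[i + 1:]
        delay_ms = base_delay_ms + rng.uniform(0.0, jitter_ms)
        delivered.append((send_time_ms + delay_ms, payload))
    # Sorting by arrival time is what produces reordering under heavy jitter.
    delivered.sort(key=lambda item: item[0])
    return delivered


# Packets sent every 10ms, with up to 100ms of jitter and 10% loss.
sent = [(i * 10.0, bytes([i])) for i in range(20)]
for arrival_ms, payload in impair(sent, random.Random(7), loss=0.1, jitter_ms=100.0):
    print(f"{arrival_ms:7.1f} ms  packet #{payload[0]}")
```

Passing a seeded `random.Random` makes a failing run reproducible, which matters a lot when you’re trying to debug a problem that shows up once per several thousand packets. In a real proxy, the same decisions would be made between receiving a datagram on one socket and sending it out on the other.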

41. DO Try to Comply with RFC 5405

For a UDP application with many millions of users, it is quite easy to become a “resource hog” and to eat too much of the Internet’s resources, eventually hurting overall Internet performance. TCP deals with this itself, so for a TCP application with the same number of users it is much less of a problem than for UDP. That’s why a special RFC was written, [RFC5405], which describes DO’s and DON’Ts for a UDP app from the point of view of “not harming the rest of the Internet”.

It lists quite a few requirements (including payload size restrictions similar to those we’ve discussed in item #38 above), and there is a certain possibility that you won’t be able to comply with some of them due to your own requirements (for example, implementing PMTUD is probably way too much, using the UDP checksum doesn’t make too much sense if you already have encryption, and exponential back-off may go against the timing requirements imposed by your game).

However, you at least should read this RFC, understand what they mean, and try to comply with those rules which don’t conflict with the interests of your players. In other words:

Before breaking the rule – at the very least you should know that you’re breaking it, and why you’re breaking it.

42. DO consider Compression (especially if your traffic costs are significant)

Huffman coding: “The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol... The algorithm derives this table from the estimated probability or frequency of occurrence for each possible value of the source symbol.” (Wikipedia)

For UDP, compression is usually ruled out outright, but sometimes it might be a good idea. First of all, if you have “reliable UDP” – take a look at the discussion of compression in item #49 in the upcoming Part VI; whatever compression works for TCP will likely work for “reliable UDP” too.

For “fire and forget” UDP, as a rule of thumb, you won’t be able to use stuff such as ZIP (bzip2/…) – in general, any LZ77- or LZW-like algorithm which looks things up in the stuff-which-has-been-recently-transmitted isn’t likely to be efficient for unreliable UDP datagrams (in essence, you won’t be able to refer beyond one single datagram, as any other datagram might be unavailable on the receiving side). However, if, when taking a look at your datagrams, you can see that many of the bytes are the same (like: “there are lots of zeros there”) – using good old Huffman coding (or its close cousins, such as much slower arithmetic coding, or the faster Huffman-like encoding part from [LZHL]), with pre-defined and pre-shared frequency tables (obtained from measurements of your real-world uncompressed traffic and pre-populated to both the client and the server), might get you as much as a 15-20% gain (in theory even more, but 15-20% is a more or less realistic upper bound).

While this technique is rarely worth the trouble when starting the project, at some point down the road it might become a good thing to have. This is especially true if you’re paying per byte and your traffic costs are significant.

One further note of caution: as a rule of thumb, you SHOULD NOT rely on compression to satisfy limits on payload size (see item #38 above); all the gains from lossless compression are average gains, and pretty much any compression may increase packet size in some rare cases. Of course, if you can prove that, say, certain number of bytes in a packet are always zeros – then you may be able to derive a guaranteed gain from the compression.
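The idea of Huffman coding with a pre-shared frequency table, described in this item, can be sketched as follows. This is a deliberately naive illustration (bit strings instead of bit twiddling, and a made-up frequency table where zero bytes dominate); the names `build_code`, `compress`, and `decompress`, as well as the one-byte padding header, are assumptions made for the example.

```python
import heapq


def build_code(freqs):
    """Build a Huffman code from {byte_value: weight}; returns {byte_value: bit string}.

    Tie-breaking is deterministic, so client and server derive the same code
    from the same pre-shared table.
    """
    heap = [(weight, sym, sym) for sym, weight in sorted(freqs.items())]
    heapq.heapify(heap)
    tiebreak = 256  # unique ids for internal nodes, above all byte values
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                        # leaf: a byte value
            codes[node] = prefix
    return codes


def compress(payload, codes):
    """First output byte holds the number of padding bits; the rest is the packed code."""
    bits = "".join(codes[b] for b in payload)
    pad = (-len(bits)) % 8
    bits += "0" * pad
    body = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    return bytes([pad]) + body


def decompress(datagram, codes):
    pad, body = datagram[0], datagram[1:]
    bits = "".join(f"{byte:08b}" for byte in body)
    if pad:
        bits = bits[:-pad]
    decode = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in decode:  # the code is prefix-free, so the first match is right
            out.append(decode[current])
            current = ""
    return bytes(out)


# Hypothetical pre-shared table: zero bytes are far more frequent than anything else.
FREQS = {value: 1 for value in range(256)}
FREQS[0] = 2000
CODES = build_code(FREQS)

sparse = bytes([0] * 40 + list(range(1, 9)))  # "lots of zeros there"
dense = bytes(range(1, 49))                   # no zeros at all
print(len(sparse), "->", len(compress(sparse, CODES)))
print(len(dense), "->", len(compress(dense, CODES)))
```

The second payload comes out longer than it went in, which illustrates the caution above: with a fixed table, a datagram which doesn’t match the expected statistics will grow rather than shrink.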

To be continued…

Today we’ve discussed UDP-specific issues for game engines. Stay tuned for Part VI, TCP. Don’t worry – there are only two parts to go beyond this one 😉 .

EDIT: The series has been completed, with the following parts published:

Part VI. TCP

Part VIIa. Security (TLS/SSL)

Part VIIb. Security (concluded)

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.