

Author: “No Bugs” Hare Job Title: Sarcastic Architect Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,

Calling a Spade a Spade, Keeping Tongue in Cheek

From the people that brought you the terror of Wednesday and the horror of Friday comes a nightmare that’ll make the rest of the week good…MONDAY!!! Garfield and Friends —

It’s Monday again, and here goes another part of our multi-mega-part article on network support for game engines.

Previous parts:

Part I. Client Side

Part IIa. Protocols and APIs

Part IIb. Protocols and APIs (continued)

Part IIIa. Server-Side (Store-process-and-Forward Architecture)

Part IIIb: Server Side (deployment, optimizations, and testing)

Part IV: Great TCP vs UDP Debate

Part V. UDP

After reading Part IV, you’ve hopefully decided what you need to use for your game – TCP or UDP. The previous Part V was about UDP, and this Part VI is for those who need TCP.

Upcoming parts:

Part VIIa. Security (TLS/SSL)

Part VIIb. Security (concluded)

43. DON’T assume that you get exactly one recv() call for each of your send() calls

With UDP, whatever data you push to a single sendto() call, you’ll get in a single recvfrom() call (well, you have no guarantee that it will reach the other side at all, or it can reach the other side more than once, but whenever it reaches the other side – each recvfrom() call on the receiving side will exactly match a sendto() call made on the sending side). TCP is very different in this regard, and it can be quite confusing. In fact, it is so confusing that I would say that relying-on-TCP-doing-exactly-one-recv()-call-for-each-send()-call is one of the most popular bugs in programs using TCP.

By definition, TCP is a stream, so whatever you’ve pushed to send(), you should get from recv() (well, save for corrupted-packets-which-weren’t-detected-by-the-16-bit-checksum, see the next item).

However, what you’ve pushed via single call to send(), is not guaranteed to be received in a single call to recv(). In other words, send() call does not establish any kind of boundary, so if you’ve called send() 20 times with 80 bytes of data in each call, you (at least in theory) may get any combination of recv() calls on the other side of communication, from a single recv() call with all 1600 bytes, up to 1600 recv() calls with one byte each, and anything in between. In practice, these extreme cases are not likely to happen, but over the Internet you will certainly get sequences such as 536 bytes – 536 bytes – 48 bytes – 80 bytes – 160 bytes – 80 bytes – 160 bytes, or 80 bytes – 1460 bytes – 60 bytes, and tons and tons of others, that’s for sure.

To make things worse, if you’re doing only intra-LAN testing (or even worse – only the same-machine testing), you may observe that recv() calls do have 1-to-1 correspondence to send() calls; however, such behavior is not guaranteed in any way, and inevitably breaks badly as soon as you start to deploy your game to the Internet.

In practice, this means that if you need to pass messages over TCP, you have to provide your own information about message boundaries to make sure you can parse your stream correctly. The simplest way to implement messages over TCP (which isn’t necessarily the best one for your purposes, but usually will do as a starting point) is to say that every message starts with 2-byte message size. Then follows the message itself, then the stream contains another 2-byte message size, then goes another message, and so on.



44. DON’T trust TCP Checksums for Critical Data

Just like the UDP checksum, the TCP checksum is only 16 bits long. This carries implications very similar to those we’ve seen for UDP in item #39 in Part V. In practice, if you’re transferring a 10GB file over a relatively poor Internet connection, the chances of it arriving broken over plain TCP are substantially above zero (which means “I’ve seen it myself quite a few times”).

Hence, there is a need for another checksum if you Really need your data to be reliable. For example, it is common for open source projects to provide something like a SHA-1 checksum alongside a download; not only does it serve as a sort of protection from Bad Guys feeding you a maliciously modified executable1, but it also helps to detect broken downloads. Even better, most modern installers have a built-in checksum check before they start working.

Bottom line: if you have some critical data, don’t rely on TCP checksums – they will keep failing intermittently for some of your players. For critical data, you should add another checksum (see, for example, the ones discussed in item #39 in Part V). One exception to this rule is if you’re using TLS: TLS effectively provides at least 128-bit checksums, which do qualify as “broken packet won’t slip in, ever” (see also item #39 in Part V for a bit more detailed discussion of the probabilities).

45. DO Consider using TCP_NODELAY

To improve interactivity of TCP connections, one MAY consider the TCP_NODELAY option of the setsockopt() function.

The idea of TCP_NODELAY is to bypass one of the TCP mechanisms which hurt interactivity. By default, a TCP stream uses the so-called Nagle’s algorithm to collect data from several send() calls and to make one TCP packet out of all of them together. While doing so does help to reduce the number of TCP packets as a “defence from careless applications” [WikiNagle], it does increase delays, which may hurt interactivity (roughly, Nagle’s algorithm introduces a delay of the order of the Round-Trip Time, RTT, which can be of the order of 100–200ms for transatlantic connections).

To disable Nagle’s algorithm, you can call setsockopt() with the TCP_NODELAY parameter, instructing the TCP stack to “send whatever data is passed to send() right away”. This avoids the delay imposed by Nagle’s algorithm, which in turn tends to improve interactivity over TCP quite a bit (at least by that RTT due to Nagle’s algorithm, see above).

In addition, if TCP_NODELAY is set, it MIGHT help to mitigate the consequences of the exponential backoff algorithm in case of lost packets. By default, TCP automatically retransmits the data if the other side doesn’t acknowledge it; so far so good. However, to avoid network congestion, TCP doubles the retransmission timeout for each retransmit. This is what causes additional problems for interactivity over TCP. This doubling of the retransmission timeout (known as “exponential backoff”) means that while the first lost packet may be retransmitted in, say, 100ms,2 after just 5 packets lost in a row the timeout will grow to 3.2 seconds, which may already be too much for your game.

If you’re using TCP_NODELAY, the situation MIGHT be different with regards to exponential backoff. For example, if you’re calling send() every 50 ms for a TCP_NODELAY socket, in theory it should cause packets to be sent at least every 50ms, avoiding those over-1-second delays mentioned above even in cases of multiple lost packets. However, from what I’ve seen, this TCP_NODELAY feature (or is it a misfeature?) of avoiding-exponential-backoff doesn’t work universally on all platforms, so before relying on it – try it on all platforms of interest, and look at the resulting traffic in Wireshark.

Yet another implication of a TCP_NODELAY connection is that it (normally) sets the so-called PSH (PUSH) flag on outgoing TCP packets. This flag instructs the receiving side’s TCP stack to “push” the data to the application without waiting for more data to arrive; quite expectedly, this also tends to improve interactivity. However, on the flip side, it increases the number of calls issued by the TCP stack to the application, which might cause a CPU performance hit (which is pretty much inevitable when improving interactivity, but you still need to know that such a side effect is possible).

Bottom line: while TCP_NODELAY can be a Really Powerful Tool to improve interactivity of TCP connections, you shouldn’t make the decision about it lightly, especially on existing systems. If you’re using TCP_NODELAY – test your app thoroughly on a variety of different platforms (handling of TCP_NODELAY has been observed to differ between platforms), don’t forget about potential performance issues (see above), and eliminate multiple send() calls per single logical message (see below).

45a. DON’T use multiple send() calls per logical message, especially when using TCP_NODELAY

By default (i.e. if you’re not using TCP_NODELAY), the TCP stack is very forgiving in terms of “how you can call send()”. In other words, without TCP_NODELAY, you may get away with calling send() with two bytes, and immediately calling send() with another two bytes, and the TCP stack will combine these two calls into one single packet for you (see about Nagle’s algorithm in item #45 right above).

With TCP_NODELAY, such calling patterns quickly become disastrous: whenever you call send() for a TCP_NODELAY socket, the TCP stack sends a TCP packet right away, so those two calls to send() mean sending two TCP packets. As TCP+IP headers are at least 20+20=40 bytes, for the example above (2+2 bytes in two send() calls), when using TCP_NODELAY you’ll send 40+2+40+2=84 bytes overall instead of the 40+2+2=44 bytes you really need to send. This is about 90% of unnecessary overhead(!), and it can get much worse if your message is made by more-than-two-send()-calls.

The best way of handling it is to prepare your message in memory completely, and to send() it all at once. In fact, this is useful not only if you’re using TCP_NODELAY, but also without it: each send() call normally causes a switch to kernel mode, and switches to kernel mode are quite expensive (in the range of several hundred CPU clocks) just for the switch; doing the same trivial data combining in user space is usually significantly faster, at least for desktop and smartphone CPUs3 (even though it involves making an extra copy of your message).

However, doing all of the combining yourself is not always an option, in particular if you’re using some 3rd-party library which calls send() recklessly itself. In this case, on Linux systems you may want to use setsockopt() with the TCP_CORK parameter (see [RedHat] for a detailed explanation of TCP_CORK). While for Windows there is no TCP_CORK, an equivalent workaround technique has been reported as working there [StackOverflow].

However, both TCP_CORK and the Windows workaround mentioned above, while solving the “unnecessary packets being sent” problem, seem to cause even more switches to kernel mode, which makes them even more expensive in terms of CPU clocks. Therefore, if you have a choice, I strongly recommend preparing the whole message you need to send in memory, and then sending it via a single send() call.

46. DO Consider Keep-Alive with Your Own Timeout

I know that I will be beaten black and blue for this suggestion, especially by hard-core zealots of congestion control on the Internet. However, being a strong advocate of the position that it is the Internet which should serve users’ needs, and not the other way around, I am sure that it is our responsibility as developers to make the life of our users simpler. And while I agree that congestion control is important (though the effects of TCP exponential backoff for congestion control purposes are currently disputed [MondalEtAl]), the needs of end-users should still come first.

Now to the problem at hand. All of us see “hung” HTTP connections on a regular basis – that is, when we’ve clicked a link, and the page loading is stuck forever. And as soon as we get tired of waiting and hit “refresh” – the page is served immediately. These observations can be partially attributed to psychology: we tend to remember only those instances when “refresh” has helped, and don’t remember those when it didn’t. However, in practice, this effect of “new-TCP-connection-behaves-better-than-existing-one” does exist too, and goes well beyond the psychological effects described above. In particular, I personally tend to attribute the real part of the observed effect to the same “exponential backoff” algorithm which I’ve already mentioned quite a few times. For example, in the case of the temporary ‘black holes’ which do occur over the Internet, the “exponential backoff” algorithm, given worst-case timing, may double the time-when-the-site-is-inaccessible. In other words, if there is a connectivity problem, then due to the exponential backoff algorithm the site might be inaccessible via the old TCP connection for twice as long (!) as it is inaccessible via a new TCP connection. Additional considerations may include different processing for SYN and non-SYN packets, but they are difficult to analyze and confirm.

However, it doesn’t matter much why new TCP connections tend to behave better – what is really important is that they do behave better, which has been confirmed in practice for a hundreds-of-thousands-of-simultaneous-players game. So, what should we do to make sure that the user is not “stuck” with a not-really-functional TCP connection at a time when connectivity does exist (in particular, when it has been recently restored, while the “old” TCP connection still doesn’t have a clue that connectivity has been restored)?

One way I know to handle it is to have your own keep-alive mechanism on top of TCP, with an automated re-connect when the keep-alive timeout expires. This keep-alive mechanism would make sure that a message is sent over the TCP connection, say, every second (as a side bonus, if you’re using TCP_NODELAY, the keep-alive alone might make your system more responsive, see item #45 above). Then, on the receiving side, if there is nothing coming in for, say, 3–5 seconds – you know that the connection is in trouble, and that you’d better try establishing a new one (without dropping the existing one). If, while establishing the new connection, the old one comes back to life – all the better, keep using the old one. If you cannot establish the new connection (which is perfectly possible) – at least you’ve done everything you could under the circumstances, and your servers are indeed unreachable from this user’s place. However, if you can establish the new connection and the old one is still dead – switch to the new one; your users will appreciate being able to play instead of looking at the game being “stuck”. You have very little to lose (except for extra programming work), and you can improve your users’ experience quite a bit.

Another approach which MIGHT help in this regard is having your own keep-alives together with TCP_NODELAY. If you’re using this combination (and if TCP_NODELAY does ignore “exponential backoff”, see item #45 for details), it MIGHT happen that you won’t need to re-establish the connection; but you need to test this approach very thoroughly on all the relevant client platforms before relying on it.

The third approach MIGHT be to try adjusting the timeouts of the built-in TCP KEEPALIVE function. While the default values for KEEPALIVE are set to 2 hours (making it perfectly useless for games, except maybe for offline chess), there seems to be a (non-standard) way to adjust KEEPALIVE parameters on a per-connection basis (look for SIO_KEEPALIVE_VALS for Windows, and TCP_KEEPIDLE etc. for Linux, though the latter won’t allow you to go below 1-second delays). While I have no idea how it will behave in practice with intervals that much smaller than the default ones, it might be worth a try (especially if combined with TCP_NODELAY). Once again – before relying on such non-standard stuff, make sure to test it thoroughly on all of your target platforms.

And if you’re too lazy to do any of these – at the very least provide your users with a “refresh connection” button similar to what you have in a browser. While it is much less user-friendly than the solutions described above (and goes against item #4 in Part I, which says “DON’T use User as a Freebie Error Handler”) – it is still much better than forcing the user to exit your game to solve this “connection being stuck” problem.



47. DO use a Single Connection to a Front-End Server

If your game is successful, you will most likely need to have many servers to serve all your players. And as described in item #20 in Part IIIb, it is usually a Good Idea to separate your servers into ‘front-end’ cheap-and-easily-replaceable servers which do nothing but handling connections and traffic, and ‘back-end’ servers which run the real logic.

After doing it this way – you will realize that your client is able to have one single TCP connection to a front-end server, and the front-end server will then pass your connections to the back-end servers as necessary.

This approach has its cost, as you’ll need to implement support for a kind of ‘virtual connections’ over single TCP connection (which in turn will require an application-level firewall, which will be discussed in item #58 in Part VIIb).

However, if you do spend time on implementing it – you will get quite a few benefits. First of all, you won’t need to handle situations when the client is only partially connected (is connected to one server, but has lost the connection to another one) – the mere thought of performing such analysis makes me dizzy. And you won’t need to explain to the end-user what is going on in such “partially connected” scenarios, which is a Big Headache to say the least. And of course, you’ll get some performance benefits too – the server-side CPU power necessary to process TCP will drop significantly (by about Nx, where N is the average number of back-end servers the user is connected to), and traffic will drop quite significantly too (due to the reduced number of packets, and to better compression); in practice up to 30–50% reduction in traffic has been observed, but this is not the limit.

Yet another benefit of these ‘virtual connections’ over a single TCP connection is that you’ll be able to implement prioritization relatively easily (see item #17c in Part IIb for details). When you have several different TCP connections, prioritizing between them is extremely difficult (QoS doesn’t work over the Internet, see item #17b in Part IIb), and if one of your TCP connections is for playing, and another is for DLC download, there is quite a big chance that the DLC one will eat up enough bandwidth to make playing impossible. When both playing and DLC download are going over the same TCP connection, pretty much everything is under your control, and while TCP outgoing buffers may and will cause some delays – for games where over-1-second delays are acceptable, gameplay should be fine even in the presence of DLC downloads over the same connection.

Bottom line: if you’re using TCP, don’t be lazy and do implement these virtual connections over a single TCP connection. They will provide enough benefits to justify development time.

48. DO Consider using SO_LINGER with l_linger=0

This is another suggestion for which I will be trashed (this time by network admins, who tend to consider RST packets being a Big Problem per se).

The issue here is about TCP graceful shutdown, which is used by default. In other words, whenever you’re closing a TCP socket, it actually stays open until all the communication is gracefully terminated. While this is a Good Thing for file transfers and similar exchanges, for highly interactive games this waiting for an undetermined time until the message is delivered is not really something you want to rely on. More importantly, this default approach tends to cause some of the sockets (the number of such sockets depends on Internet connectivity at the given moment) to get stuck on the server side in not-so-obvious states, notably TIME_WAIT. As the server “knows” about them (you can see them on the server using something like the netstat command), sockets in the TIME_WAIT state tie up server resources until a certain timeout expires, which in turn may cause your server to refuse connections for reasons which are not apparent to you and, worse, are completely unclear to your players (in practice, I’ve seen it become quite a Big Problem in cases of mass disconnects/reconnects caused by a whole-ISP-such-as-Comcast going down and then coming back in the 1–2 minutes after a typical BGP convergence).

To avoid it, you can switch your sockets to a so-called “abortive shutdown”, using the SO_LINGER option of the setsockopt() function with l_onoff = 1 and l_linger = 0. It will solve the above problem of TIME_WAIT states, at the cost of routinely terminating your TCP connections with an RST packet (instead of the usual 4-way shutdown using FIN packets), which is often considered “bad network practice”. However, this l_linger=0 option is perfectly legal (it exists for all the Berkeley sockets implementations I know about, and RSTs DO happen regardless of this option, though much less frequently), and has been observed to cause fewer problems than it solves. In practice, the problems observed were mostly due to existing monitoring software considering RST a Bad Thing and complaining too much about the high rate of RSTs – which is indeed unpleasant, but is not comparable to the-server-being-unable-to-handle-incoming-users-due-to-those-pointless-connections-in-TIME_WAIT-state; the problems with ‘stray packets’ which are often mentioned as a Supposedly Good Reason for this TIME_WAIT delay weren’t observed to cause anything worse than the effects caused by a corrupted-packet-which-slipped-past-the-TCP-checksum (which needs to be handled in any case, see item #44 above; and those additional-checksums on top of TCP did help against these ‘stray packets’ too). While your mileage may vary, I would certainly include this l_linger=0 option into consideration.



49. DO Compress your Data

As discussed in item #42 in Part V, some methods of compression might work even for UDP packets. Data compression for TCP streams tends to be significantly more efficient than compression for UDP, especially as LZ77-like algorithms, which don’t really work for UDP, have the potential to work for TCP. However, for interactive communications over TCP (i.e. with compressor.flush() being called for the compression algorithm every 100–200 bytes, so you can start transferring your message right away), algorithms such as ZIP are still quite inefficient (see, for example, [Ignatchenko98]).

On the other hand, there are algorithms out there which are optimized specifically for interactive communications over TCP (or, more generally, for reliable stream-based protocols); one such algorithm is LZHL, written back in 1998 by my esteemed translator [Ignatchenko98]. It has been used in large-scale deployments with hundreds of thousands of simultaneous connections, and has been observed to be highly efficient. Of course, the compression efficiency depends on the nature of the data you’re compressing, but just to give an idea of what you might expect: for game and game-like traffic, I’ve personally observed as much as a 2x improvement due to LZHL compression.

In addition to obvious reduction in traffic costs, one of the less obvious benefits of the compression is to keep marshalling simple (without thinking too much about optimizing network traffic at the marshalling layer and above), while allowing to keep network traffic minimal.

Note that when you have both encryption and compression, they SHOULD be applied in the following order: plain-text => compressed-text => encrypted-text (if you do the encryption first, then no compression can possibly optimize the properly encrypted data). One interesting observation about such compression+encryption schemes is that adding compression (a highly efficient one, such as LZHL) to the protocol stack before the encryption may actually decrease the overall CPU load [Ignatchenko98].



50. DO listen() on port 443 (among other things)

As briefly mentioned in Part IV, not all TCP ports are always available to your players, especially if they’re on a journey and need to use a hotel’s Wi-Fi to access the Internet. To deal with it, the very simple approach of listening on port 443 (and including it in the list of ports-to-try if everything else fails) has been found quite efficient.

Port 443 is the standard port for HTTPS, and works pretty well almost everywhere. It means that if your user is in a hotel which has blocked everything but HTTP and HTTPS (which happens all the time, as lots of hotel ISPs have a quite peculiar approach and answer the question of “what can qualify as an Internet connection?” with “the very bare minimum to avoid us being sued for false advertisement”) – your app is still likely to connect via port 443. I can assure you, your users will appreciate this ability (or they won’t appreciate a competitor who didn’t do it, which is pretty much the same thing). As far as I know, there are no apparent downsides to providing such a port, at least for TLS connections. If you’re using TLS – then all the traffic beyond the initial handshake is encrypted, and it is not possible for anybody in between to distinguish HTTPS from your traffic (except, maybe, for a session length and/or TLS record patterns which are specific to HTTPS, which nobody is supposed to look at). Hence, passing your TLS-but-non-HTTPS traffic over port 443 shouldn’t break anything (at least unless the ISP is trying to detect and prevent such things).

On the other hand, I would recommend against using port 443 as the-port-for-all-the-communications; instead, I would suggest using some other port as a default, and use port 443 only as a “last resort”. There is a convention that port 443 is for HTTPS, so while I’m open to using it for other purposes when everything else fails (i.e. I see it as user needs trumping all the not-so-carved-in-stone conventions), I’d rather stay a “reasonably good Internet citizen” when the user can be made happy without going against such conventions.

One downside of listening on port 443 is that you’ll get a lot of connections from hacker bots trying to break the webserver-which-they-hope-is-sitting-behind-your-port-443. Fortunately, as you don’t have any webserver there, blocking these attacks doesn’t require any special work beyond the usual “be careful with any data coming in”; in other words, if you have an open port – be ready to handle attacks on it, and consider those bots trying to GET /…/…/isapi.dll over your port 443 as free testers checking whether you have implemented very basic hacking prevention measures and basic error handling properly.

To be continued…

Today we’ve discussed TCP-specific issues for game engines. Stay tuned for the last part in this network-for-game-engines series: Part VII, Security (it is going to be a large one, as there are lots of ways to implement security in a wrong way, and only a few ones to implement it correctly).

EDIT: The series has been concluded, with the following parts published:

Part VIIa. Security (TLS/SSL)

Part VIIb. Security (concluded)

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.