

Author: “No Bugs” Hare
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek


[[This is Chapter 13(d) from “beta” Volume IV of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help improve it; for further details see “Book Beta Testing”. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

As we’ve (sort of) described problems and benefits resulting from using UDP, let’s see what can be done with TCP in the context of games. In most aspects, Websockets are very similar to plain TCP (see also “Websockets: TCP in Disguise” section below).

The Most Popular Bug when using TCP

Assumption is a mother of all {disasters|screw-ups|…} — attributions vary —

Before we start with game specifics, I want to mention one bug which, in my experience, is by far The Most Popular Bug in TCP-related code (whether it is game code or not). Just one example: at some point in my programming career, I joined a Really Big Company; the project I joined was three days before release, and on my very first day in their team I found this very bug in their code.

The problem (or maybe misunderstanding?) which a lot of developers have with TCP (and other stream-oriented mechanisms such as pipes) is that they erroneously assume that each call to send() (or equivalent) results in exactly one call to recv() (or equivalent) on the receiving side. During local-machine testing (and sometimes LAN testing) this assumption MIGHT hold, though even there it is not guaranteed in any way. It falls apart pretty much the same second you move to the Internet.

To avoid running into this problem, it is necessary to remember that

TCP is a byte stream, the whole byte stream and nothing but the byte stream 1

In particular, a call to send() does NOT put any markers into the byte stream. If you want to pass messages over TCP, you need to put those markers into the stream yourself (effectively creating your own sub-protocol on top of TCP).

For example, we may decide to pass messages over TCP so that our TCP stream is a sequence of messages. Each message consists of a 2-byte message-size (do NOT forget to agree on whether it is going to be big-endian or little-endian), followed by message-size bytes of the message body.
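A minimal sketch of the sending side of such a format follows. It assumes a big-endian (“network byte order”) message-size, and `append_framed_message()` is a hypothetical helper, not part of any library:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: appends one framed message (2-byte big-endian
// message-size followed by the message body) to an output buffer.
void append_framed_message(std::vector<uint8_t>& out,
                           const uint8_t* body, size_t body_size) {
  if (body_size > 0xFFFF)
    throw std::length_error("message does not fit into 2-byte message-size");
  out.push_back(static_cast<uint8_t>(body_size >> 8));    // high byte first
  out.push_back(static_cast<uint8_t>(body_size & 0xFF));  // then low byte
  out.insert(out.end(), body, body + body_size);          // message body
}
```

Several messages can be appended to the same buffer before it is handed to a single send() call.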

Note that on the receiving side we will need two loops (each with recv() inside) to parse this format. The first kinda-loop reads the message-size (yes, it is possible that recv() will return only one byte). The second loop reads the message body until message-size bytes have been read. Such parsing is certainly not rocket science, but it does require attention to detail.
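A minimal sketch of such a receiver for a blocking POSIX socket follows. `recv_exact()` and `recv_message()` are hypothetical helpers. A real game server would more likely use non-blocking sockets and keep partially-received data between calls:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads exactly n bytes from a stream socket, calling recv() as many times
// as necessary (each call may return fewer bytes than requested).
// Returns false on error or if the peer closed the connection.
bool recv_exact(int sock, uint8_t* buf, size_t n) {
  size_t got = 0;
  while (got < n) {
    ssize_t rc = recv(sock, buf + got, n - got, 0);
    if (rc < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
    if (rc <= 0) return false;               // error or orderly shutdown
    got += static_cast<size_t>(rc);
  }
  return true;
}

// Hypothetical helper: reads one "2-byte big-endian message-size + body"
// message into msg.
bool recv_message(int sock, std::vector<uint8_t>& msg) {
  uint8_t hdr[2];
  if (!recv_exact(sock, hdr, 2)) return false;  // first loop: message-size
  size_t sz = (static_cast<size_t>(hdr[0]) << 8) | hdr[1];
  msg.resize(sz);
  return sz == 0 || recv_exact(sock, msg.data(), sz);  // second loop: body
}
```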

While we’re at it, let’s also discuss message sizes. Quite a few people might think that a 2-byte message-size is not enough for their messages, and expand it to 4 bytes (and these days even to 8 bytes). While doing it, keep in mind that accepting arbitrarily large message-sizes creates potential for a DoS attack. If an attacker can force you to allocate 1000G of RAM (and it doesn’t matter much whether you’re allocating it all-at-once or in chunks), you end up in Big Trouble. It means that:

You DO need a limit on your message sizes.

While it IS possible to deal with Really Large Messages, in practice handling them is too error-prone to do it on a regular basis. In other words – for file transfer services with only 5 different messages and inherently Large Files, handling Large Messages in an ad-hoc manner might be a viable option, but for games it Very Rarely qualifies as a good idea.

The smaller this limit is, generally the better; going above a few megabytes is rarely a good idea (and if you can fit into 64K, it is even better).

Therefore:

8-byte message-size is almost universally a Bad Idea.

4-byte message-sizes are ok, but usually require an additional check that the size does not exceed a pre-defined threshold (such as a few megabytes).

If you can fit your messages into a 2-byte message-size, it is even better.

Also note that the limit has to cover whatever you assemble before processing. Suppose you’re transferring fragments of at most 64K each, but assembling up to 64K of those fragments on the receiving side. From the DoS point of view, that is no better than having one fragment of 4G size.
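Both kinds of limits can be illustrated with a 4-byte big-endian message-size. The limit values, `parse_message_size()`, and `FragmentAssembler` below are all hypothetical and need to be tuned for your own protocol:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical limits; pick values which fit your own protocol.
constexpr uint32_t kMaxMessageSize = 1u << 20;  // 1M per message/fragment
constexpr size_t kMaxAssembledSize = 4u << 20;  // 4M per assembled message

// Parses a 4-byte big-endian message-size; returns false if it exceeds
// kMaxMessageSize, so that the caller can drop the connection BEFORE
// allocating anything.
bool parse_message_size(const uint8_t hdr[4], uint32_t& sz) {
  sz = (uint32_t(hdr[0]) << 24) | (uint32_t(hdr[1]) << 16) |
       (uint32_t(hdr[2]) << 8) | uint32_t(hdr[3]);
  return sz <= kMaxMessageSize;
}

// Accumulates fragments while enforcing a limit on the TOTAL assembled size
// (per-fragment limits alone do not protect against DoS).
class FragmentAssembler {
 public:
  // Returns false if adding this fragment would exceed the total limit.
  bool add(const uint8_t* data, size_t n) {
    if (n > kMaxAssembledSize - buf_.size()) return false;
    buf_.insert(buf_.end(), data, data + n);
    return true;
  }
  const std::vector<uint8_t>& data() const { return buf_; }
  void reset() { buf_.clear(); }

 private:
  std::vector<uint8_t> buf_;  // invariant: size() <= kMaxAssembledSize
};
```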

TCP: Reducing Latencies

Nagle algorithm and TCP_NODELAY

When using TCP, take into account that quite a few TCP features affect interactivity (usually for the worse 🙁 ). One of them is the so-called “Nagle Algorithm”, which is enabled by default on a TCP connection. When enabled, Nagle algorithm restricts the connection to having only one small “packet in transit”. While there is unacknowledged data, further small writes are held back until either an ACK arrives or the outgoing TCP buffer has accumulated a full packet of MSS size (one typical MSS value is 1460 bytes). I won’t argue too much whether having Nagle as a part of TCP specification is a Good Idea,2 but we need to work with TCP-which-we-have.

The effects of Nagle algorithm on games are often devastating. For example, suppose Nagle is enabled on a TCP connection with RTT=100ms. Over 5 seconds we’re trying to send 100 updates of 50 bytes each (at 20 network ticks/second). In reality, packets will be sent only every 100ms, as soon as the previous packet gets acknowledged. By that time the whole outstanding packet will be only around 100 bytes (two 50-byte updates), which is much less than a typical MSS. So most of the time, it is an acknowledgement which triggers sending a new packet, and not a packet becoming full.
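This effect can be reproduced with a deliberately simplified model of Nagle algorithm. `simulate_nagle()` is a hypothetical helper which ignores delayed ACKs, packet loss, and header overhead:

```cpp
#include <cstddef>
#include <vector>

// Simplified, hypothetical model of Nagle algorithm: a small segment may be
// sent only if there is no unacknowledged data in flight (or if a full MSS
// has accumulated). Ignores delayed ACKs, loss, and header overhead.
// Returns payload sizes of the packets sent while updates were produced.
std::vector<size_t> simulate_nagle(int updates, int tick_ms,
                                   size_t update_bytes, int rtt_ms,
                                   size_t mss = 1460) {
  std::vector<size_t> sent;
  size_t buffered = 0;
  bool in_flight = false;
  int ack_at_ms = 0;
  for (int i = 0; i < updates; ++i) {
    int now = i * tick_ms;
    if (in_flight && now >= ack_at_ms) in_flight = false;  // ACK arrived
    buffered += update_bytes;                              // app calls send()
    if (!in_flight || buffered >= mss) {
      size_t seg = buffered < mss ? buffered : mss;
      sent.push_back(seg);
      buffered -= seg;
      in_flight = true;
      ack_at_ms = now + rtt_ms;
    }
  }
  return sent;
}
```

With the numbers from the example, `simulate_nagle(100, 50, 50, 100)` reports 50 packets while the updates are being produced. The very first update goes out alone, and each of the remaining packets carries two 50-byte updates. That is one packet per RTT instead of one per network tick.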

To disable Nagle algorithm, call setsockopt() with the TCP_NODELAY option.3
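A minimal POSIX sketch follows. On Windows, the same setsockopt() call works on a SOCKET, with optval passed as const char*. `disable_nagle()` is a hypothetical helper name:

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>   // setsockopt()

// Disables Nagle algorithm on a TCP socket; returns true on success.
bool disable_nagle(int sock) {
  int one = 1;
  return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == 0;
}
```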

TCP with TCP_NODELAY: minor caveat

One thing to remember when using a TCP stream with the TCP_NODELAY flag on is to

call send() ONLY when the whole packet is ready

With TCP_NODELAY, each call to send() usually causes a separate TCP packet to be sent. The stream is still correct and your program will still work, but it will cause significant (and unnecessary) overhead. For example, suppose you implement the protocol mentioned above (2-byte-message-size + message-body-of-message-size) in the following manner:

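```cpp
#include <sys/socket.h>  // send()
#include <arpa/inet.h>   // htons()
#include <cstdint>

// DON'T do this with TCP_NODELAY: two send() calls => two packets.
void MsgSender::send_msg(const Message& msg) {
  uint16_t sz = htons(msg.sz);     // 2-byte message-size, big-endian
  send(sock, &sz, 2, 0);           //(*)
  send(sock, msg.buf, msg.sz, 0);  //(**)
}
```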

– then it will work more-or-less ok without TCP_NODELAY,4 but with TCP_NODELAY it will cause two packets to be sent (the first one in line (*), and the second one in line (**)), causing an additional 40+ bytes of overhead (20 bytes for IP header, another 20 for TCP header, and that’s not counting Ethernet headers).

Things become even worse if you’re constructing your packet with more than two send() calls. Suppose you write each field with a separate send(), and your message consists of six 4-byte fields. Then you’ll get 5 extra packets, i.e. an overhead of 5*(40+) = 200+ bytes for an otherwise (40+24) = 64-byte packet, ouch!
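One way to avoid this is to assemble the whole message in a single buffer and call send() once. The sketch below assumes a blocking POSIX socket and the 2-byte big-endian message-size format discussed above; `send_framed()` is a hypothetical name:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sends one "2-byte big-endian message-size + body" message from a single
// buffer, so that with TCP_NODELAY it normally leaves as one packet.
bool send_framed(int sock, const uint8_t* body, uint16_t sz) {
  std::vector<uint8_t> buf(2 + size_t(sz));
  buf[0] = uint8_t(sz >> 8);
  buf[1] = uint8_t(sz & 0xFF);
  if (sz) std::memcpy(buf.data() + 2, body, sz);
  size_t off = 0;
  while (off < buf.size()) {  // send() may accept only part of the buffer
    ssize_t rc = send(sock, buf.data() + off, buf.size() - off, 0);
    if (rc < 0 && errno == EINTR) continue;  // interrupted: retry
    if (rc < 0) return false;
    off += size_t(rc);
  }
  return true;
}
```

The per-message allocation is kept for simplicity. In production code, a reusable buffer would usually be preferable.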

Bottom line:

If you’re using TCP_NODELAY, avoid multiple send() calls for the same logical message at all costs

If you Really Really cannot combine your send() calls together – use TCP_CORK (or a reportedly equivalent workaround for Windows described in [StackOverflow]); while these options will incur additional costs CPU-wise (due to multiple kernel-level calls), they will still save your traffic.
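A Linux-only sketch of the TCP_CORK approach follows (TCP_CORK does not exist on Windows). `set_cork()` and `send_corked()` are hypothetical helpers:

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_CORK (Linux-specific)
#include <sys/types.h>
#include <sys/socket.h>
#include <cstddef>

// While TCP_CORK is set, the kernel holds back partial segments;
// clearing it flushes whatever has been queued.
bool set_cork(int sock, bool on) {
  int v = on ? 1 : 0;
  return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &v, sizeof(v)) == 0;
}

// Sends header and body with two send() calls, but (with TCP_CORK) lets the
// kernel coalesce them into one packet. Error handling is simplified:
// short writes are treated as errors.
bool send_corked(int sock, const void* hdr, size_t hdr_len,
                 const void* body, size_t body_len) {
  if (!set_cork(sock, true)) return false;
  bool ok = send(sock, hdr, hdr_len, 0) == ssize_t(hdr_len) &&
            send(sock, body, body_len, 0) == ssize_t(body_len);
  return set_cork(sock, false) && ok;  // uncork even if a send() failed
}
```

Note the three extra kernel calls (two setsockopt() calls plus the second send()); these are the CPU-wise costs mentioned above.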

And while we’re at it: even if you’re NOT using TCP_NODELAY, combining calls to send() is usually a good idea, to avoid doing a relatively expensive kernel call more than once. Still, “avoiding at all costs” might be a bit of overkill in some cases. Memory copying and especially allocations can easily outweigh the gains from avoiding one kernel call (a kernel call is usually in the range of 300-500 CPU clocks, so one extra call is not that much). Ideally, CPU-performance-wise, to combine buffers residing in different places in memory, you should aim to use some kind of “vectored I/O” (also known as “scatter/gather”), such as sendmsg() on Linux or WSASendMsg() on Windows.
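A sketch of the vectored-I/O approach using POSIX sendmsg() follows. On Windows, WSASendMsg() or WSASend() with several WSABUFs would play a similar role. `send_framed_vectored()` is a hypothetical name, and short writes are simply treated as errors:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>  // struct iovec
#include <cstddef>
#include <cstdint>

// Sends a 2-byte big-endian message-size and a body residing elsewhere in
// memory with ONE kernel call and without copying them into a common buffer.
bool send_framed_vectored(int sock, const uint8_t* body, uint16_t sz) {
  uint8_t hdr[2] = {uint8_t(sz >> 8), uint8_t(sz & 0xFF)};
  iovec iov[2];
  iov[0].iov_base = hdr;  // "gather" piece #1: message-size
  iov[0].iov_len = sizeof(hdr);
  iov[1].iov_base = const_cast<uint8_t*>(body);  // piece #2: message body
  iov[1].iov_len = sz;
  msghdr mh{};  // no destination address, no control data
  mh.msg_iov = iov;
  mh.msg_iovlen = 2;
  ssize_t rc = sendmsg(sock, &mh, 0);
  return rc == ssize_t(sizeof(hdr) + sz);
}
```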

TCP with TCP_NODELAY: still not a match for UDP with fast-paced sync algorithm, BUT might be necessary at least for TCP fallback

Now let’s compare pushing fast-paced updates via TCP-with-TCP_NODELAY with the UDP algorithm aimed at fast-paced sync (the one from the “Fast-paced Updates: Compression without built-in reliability” section above).

Actually,

if we’re using TCP_NODELAY, and there is no packet loss whatsoever, the difference between TCP and UDP is negligible.

For each TCP-with-TCP_NODELAY send() call there is a TCP packet sent, and for each UDP sendto() call there is a UDP packet sent. If all packets reach the Client, there isn’t that much difference between the two approaches. Yes, TCP has a bit more overhead. On the other hand, with TCP we can use compression-compared-to-previous-packet, while with UDP we’re bound to use compression-compared-to-last-acknowledged-packet (see the “Fast-paced Updates: Compression without built-in reliability” section for discussion). In any case, the difference will be pretty much negligible in the Grand Scheme of Things (that is, as long as there is no packet loss(!)).

However, the very first lost packet will show the difference. With UDP, we won’t care about it and will keep sending packets to the other side, so just one network tick later the situation will be corrected (that is, unless two packets are lost in a row). With TCP, however, the situation will be quite different and much more complicated.

Usually, with TCP_NODELAY enabled, our next call to send() (on the next network tick) will cause the sending-side TCP stack to send our second packet to the Client. However,

on the receiving side this second packet WILL NOT be delivered to our program for quite a while

This happens because TCP is a stream and part of this stream (corresponding to the lost packet) is missing. So the receiving-side TCP stack will not let the application access the second packet until the first one (originally lost) arrives. Drats, drats, and double-drats! We do have all the information we need on the receiving computer, but we cannot access it!

The sending-side TCP stack will retransmit the lost packet, but this will happen about 2*RTT later.5 A typical RTT for a multiplayer game is within 50-100ms. So for TCP with TCP_NODELAY we’re speaking about a “lag spike” of about 100-200ms (for UDP state-sync at 20 network ticks/sec, it is only 50ms, and even less if network ticks are more frequent).

Therefore:

If your game is sensitive to 100ms-or-so delays, you ARE better off with UDP. However, as UDP is not as universally available as TCP (see discussion on it in [[TODO]] section above), I still suggest implementing TCP as a fallback.

On the other hand, if you’re ok with 1+ second delays – you MIGHT be able to get away with TCP-only (though keep reading for further related trickery below)

“Hanged” TCP connections

One Big Problem when it comes to using TCP for games is related to “hanged” TCP connections. Perhaps you’ve seen some page in your browser get stuck, then hit refresh, and bingo! – it is there in no time. If so, chances are that you’ve just seen such a “hanged” TCP connection. Yes, quite a few of these situations can be attributed to coincidences and human “selective memory” (we tend to remember these things better than the opposite ones). But in the real world “hanged” TCP does happen, and for a player it can be Very Annoying – especially as usually there is no “refresh” button in games.

My First Guess – Exponential Backoff

I’ve seen a LOT of these “hanged” connections in games (both as a player and as a developer), and tend to attribute them primarily to TCP’s “exponential backoff” algorithm (see, for example, [RFC6298]). Whenever the sender’s TCP stack doesn’t get an ACK for a sent packet within a reasonable time, it keeps retransmitting the packet, but

Doubles retransmission time after each attempt

Now let’s consider a connection which has a not-too-high-by-modern-standards 5% end-to-end packet loss rate (you can be sure that this will happen for quite a few of your players). Let’s also assume for the time being that packet loss is completely random.6 Let’s assume that RTT is 100ms, and that the first retransmit happens 200ms after the packet was originally sent. Then the second retransmit will happen 400ms after the first one, the third one 800ms after the second one, and so on. The sixth one will happen 6.4 seconds after the fifth one (i.e. 12.6 seconds after the original packet), which is already too much for quite a few games out there.
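This schedule can be reproduced with a few lines of code. `retransmit_times_ms()` is a hypothetical helper, and unlike real TCP stacks, it does not cap the retransmission timeout:

```cpp
#include <vector>

// Returns the time (ms since the original transmission) of each retransmit,
// assuming the first retransmission timeout is initial_rto_ms and that the
// timeout doubles after each attempt (exponential backoff, no upper cap).
std::vector<long> retransmit_times_ms(long initial_rto_ms, int retransmits) {
  std::vector<long> times;
  long t = 0, rto = initial_rto_ms;
  for (int i = 0; i < retransmits; ++i) {
    t += rto;           // next retransmit fires one timeout later
    times.push_back(t);
    rto *= 2;           // double the timeout after each attempt
  }
  return times;
}
```

`retransmit_times_ms(200, 6)` yields 200, 600, 1400, 3000, 6200, and 12600 ms. The sixth retransmit therefore happens 12.6 seconds after the original packet.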

With a completely random distribution, the chance of getting 6 packets lost in a row is 0.05^6 ≈ 1.6e-8, which may seem like “it is not going to happen”. However, if you’re sending 20 packets per second (about 1.7 million packets per day), it means that each of your players will experience the “hanged” connection problem roughly once per five weeks of continuous play, which is not that good. In practice, it will be several orders of magnitude more frequent because of correlations between packet losses. In particular, modern “active queue management” algorithms are becoming widespread, so the chances of the packet loss rate going to 20+% for a few seconds are rather high (and actually, these situations are to be expected).

Sudden IP Change – the Curse of Mobile

On mobile devices (phones and tablets, AND on WAN-connected PCs too) there is a rather well-known issue which arises when you’re moving: your IP address CAN change. What happens in this scenario is quite bad for TCP :-(.

Whenever your IP address changes while a TCP connection is open, packets sent by the Server over that TCP connection won’t reach your device anymore. Your TCP connection on the Client Side will then stay in a “hanged” state until your Client sends some TCP packet which reaches the Server, and the Server (most likely) issues an RST in response. However, there are numerous guys on the way who will be willing to drop this RST (or even your original packet, as it lacks valid TCP context). So this RST may never reach your device, leaving your TCP “perfectly hanged forever” :-(.

OS features which switch providers automagically (such as WiFi Assist) are generally good for end-user experience. However, they tend to cause more frequent IP switches and exacerbate this problem (that is, unless you’re actively fighting it according to this very book ;-)).

Other Possibilities

I’ve heard quite a few alternative plausible theories explaining “hanged” TCP connections,7 ranging from PMTUD (mis)implementations causing persistent packet loss in case of route changes, to different handling of SYN packets by routers and especially firewalls.

However, it doesn’t really matter what exactly causes those “hanged” connections.8 Whatever the reason for them, the only thing which really matters from our purely pragmatic perspective is a very practical observation that

If your TCP connection “hangs” for several seconds, there is a 30% to 70% chance that a new connection will be ready to transmit data before the “hanged” connection goes back to life (YMMV, batteries not included)

Dealing with “Hanged” connections – Opportunistic Re-Establish

Given the observation above, the most obvious way to deal with “hanged” TCP (that is, besides “let’s drop TCP completely” ;-)) goes along the following lines:

Detect that connection got “hanged” (how to do it is a separate story described below)

Try to establish a second TCP connection; if during this process the original connection springs back to life – drop the second one and resume working over the first one

As soon as the second one is ready to be used (this usually includes the TLS session, if applicable) – drop the first connection and switch to the second one.
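The steps above can be sketched as a small transport-agnostic state machine. Everything here (`OpportunisticReconnector`, `Action`, and the event names) is hypothetical. The caller is assumed to own the actual sockets, TLS sessions, and “hanged” detection:

```cpp
// Hypothetical actions the caller should perform in response to events.
enum class Action { None, StartSecondConnection, DropSecond, SwitchToSecond };

class OpportunisticReconnector {
 public:
  // Called when the "hanged" detector fires for the current connection.
  Action on_hang_detected() {
    if (reconnecting_) return Action::None;  // already trying
    reconnecting_ = true;
    return Action::StartSecondConnection;
  }
  // Called when any data arrives over the ORIGINAL connection.
  Action on_original_alive() {
    if (!reconnecting_) return Action::None;
    reconnecting_ = false;  // original came back: keep it, drop the second
    return Action::DropSecond;
  }
  // Called when the second connection (including TLS, if any) is ready.
  Action on_second_ready() {
    if (!reconnecting_) return Action::None;  // second was already dropped
    reconnecting_ = false;  // caller drops the original, second becomes primary
    return Action::SwitchToSecond;
  }
  bool reconnecting() const { return reconnecting_; }

 private:
  bool reconnecting_ = false;
};
```

Note that no latency-wise risk is taken: the original connection is dropped only once the second one is ready.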

This schema has been seen to work reasonably well for a not-so-fast major game (with acceptable delays around 5 seconds). Applying it to faster games, however, faces significant problems, mostly due to “hanged” detection taking too much time; we’ll discuss alternatives for fast sim-based games later, in “Dealing with “Hanged” Connections – Dual TCP” section.

One important property of the algorithm above is that we don’t really take any risks latency-wise – if the original connection goes back to life while we’re establishing the second one, we just go back to the original connection without losing anything latency-wise.

Detecting “Hanged” connection – app-level Keep-Alives

One way to detect those “hanged” connections is application-level Keep-Alives. For example, every second or so (if there was no other traffic), the transport layer of our Server will send a special “Keep-Alive” message over TCP. If the Client didn’t see any messages (Keep-Alive or not) for 5 seconds on its side, the process of establishing a new connection (described above) is started. On the Server side, if there is no activity for 15 seconds, the Server can simply drop the connection to release resources (leaving it to the Client to re-establish the connection).

This thing was seen to work reasonably well, and IMHO at least once it has made a significant contribution to the player perception of “these guys have better connectivity than the competition”.
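The timing rules from this example can be sketched as follows. `KeepAlivePolicy` and `ConnTimers` are hypothetical, and times are assumed to come from a monotonic clock in milliseconds. Server-side “activity” is assumed to mean “anything received from the Client”:

```cpp
#include <cstdint>

// Numbers from the example: Server sends a Keep-Alive after 1s of silence,
// Client starts re-establishing after 5s without ANY incoming message,
// Server drops the connection after 15s without incoming activity.
struct KeepAlivePolicy {
  int64_t send_keepalive_after_ms = 1000;
  int64_t client_hang_after_ms = 5000;
  int64_t server_drop_after_ms = 15000;
};

// Per-connection timestamps (ms, monotonic clock assumed).
struct ConnTimers {
  int64_t last_sent_ms = 0;      // last time we sent anything
  int64_t last_received_ms = 0;  // last time we received anything

  void on_sent(int64_t now) { last_sent_ms = now; }
  void on_received(int64_t now) { last_received_ms = now; }

  // Server side: is it time to send a Keep-Alive?
  bool server_should_send_keepalive(int64_t now,
                                    const KeepAlivePolicy& p) const {
    return now - last_sent_ms >= p.send_keepalive_after_ms;
  }
  // Client side: should the connection be considered "hanged"?
  bool client_considers_hanged(int64_t now, const KeepAlivePolicy& p) const {
    return now - last_received_ms >= p.client_hang_after_ms;
  }
  // Server side: should the connection be dropped to release resources?
  bool server_should_drop(int64_t now, const KeepAlivePolicy& p) const {
    return now - last_received_ms >= p.server_drop_after_ms;
  }
};
```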

Detecting “Hanged” connection – TCP-level Keep-Alives

I am a scientist. I don’t take any risks! — Scientist from 'Garfield and Friends' —

An alternative to app-level Keep-Alives is to use TCP-level Keep-Alives. TCP does have its own Keep-Alive mechanism, but until recent years there was no API to control TCP Keep-Alive times. With the default being 2 hours, it was rather useless for games. However, with the relatively recent addition of TCP_KEEPINTVL/TCP_KEEPCNT/TCP_KEEPIDLE on Linux (and SIO_KEEPALIVE_VALS on Windows), it became possible to control Keep-Alive times via a simple setsockopt() call (WSAIoctl() on Windows).
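A Linux-only sketch of configuring TCP-level Keep-Alives follows. On Windows, the SIO_KEEPALIVE_VALS equivalent is set via WSAIoctl() and is not shown here. `enable_tcp_keepalive()` is a hypothetical helper. Keep in mind that Keep-Alive probes are sent only while the connection is idle:

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT
#include <sys/socket.h>   // setsockopt(), SO_KEEPALIVE

// Enables TCP-level Keep-Alives: after idle_s seconds of silence, probes are
// sent every interval_s seconds, and after `count` unanswered probes the
// connection is dropped (i.e. roughly idle_s + interval_s * count seconds).
bool enable_tcp_keepalive(int sock, int idle_s, int interval_s, int count) {
  int on = 1;
  return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
         setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
                    &idle_s, sizeof(idle_s)) == 0 &&
         setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL,
                    &interval_s, sizeof(interval_s)) == 0 &&
         setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,
                    &count, sizeof(count)) == 0;
}
```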

That being said, I still prefer application-level Keep-Alives to TCP-level ones. There are two reasons for it (that’s besides me being a DIY guy in general ;-)):

Controlling Keep-Alive times is not a standard feature, and it is often simpler to implement app-level Keep-Alives than to check that it is available on all the Client platforms

When using TCP-level Keep-Alives, we’re bound to take the risk that the connection was about to restore connectivity at the very moment it was broken by TCP Keep-Alive

Dealing with “Hanged” Connections – Dual TCP

Of course, if your game sends packets every 50ms no matter what, you don’t need a special Keep-Alive. On the other hand, if you’re doing it, chances are that the detect-then-re-establish way of handling a “hanged” TCP connection will take too much time for your game 🙁 .

For such cases, dual TCP connections can be used. The Client keeps two TCP connections to the same Server, and the Server sends each message to both TCP connections pretty much simultaneously (even better – shifting one message by half of your network tick or so). On the Client side, whichever copy of the message arrives first gets processed, and the subsequent duplicate is silently ignored. If one of the connections gets too far “behind” the other one, it is considered a “hanged” connection and is dropped-and-re-established.
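A sketch of the Client-side part of this schema follows. It assumes (hypothetically) that the Server stamps each message with a sequence number starting from 1, and that “behind too much” is measured in messages. `DualTcpReceiver` is not from any library:

```cpp
#include <cstdint>

// Client-side de-duplication and "hanged" detection for Dual TCP.
class DualTcpReceiver {
 public:
  explicit DualTcpReceiver(uint64_t max_lag) : max_lag_(max_lag) {}

  // `conn` is 0 or 1; `seq` is the Server-assigned sequence number.
  // Returns true if the message should be processed (first copy seen).
  bool on_message(int conn, uint64_t seq) {
    last_seq_[conn] = seq;
    if (seq <= processed_) return false;  // duplicate: already processed
    processed_ = seq;
    return true;
  }

  // True if connection `conn` lags behind the most recent message by more
  // than max_lag messages, i.e. should be dropped and re-established.
  bool is_hanged(int conn) const {
    return processed_ - last_seq_[conn] > max_lag_;
  }

 private:
  uint64_t max_lag_;
  uint64_t processed_ = 0;         // highest sequence number processed
  uint64_t last_seq_[2] = {0, 0};  // last sequence number seen per connection
};
```

Each TCP connection delivers messages in order, so the first copy of message N to arrive on either connection is always the next one to process. A single “highest processed” counter is therefore enough for de-duplication.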

For the data flying in the opposite direction (from Client to Server) it works pretty much the same. However, Servers don’t re-establish connections themselves. Instead, they drop the offending one, and MAY also signal the Client over the remaining connection to re-establish the “hanged” one.

Honestly, I didn’t use this schema myself, BUT I’ve heard rather good things about it (and I think it has Really Good potential). Feel free to try it, but don’t hit me too hard if it doesn’t work 😉 .

As a side bonus, different parts of such Dual TCP can be bound to different interfaces on the Client side, to achieve redundancy over two connections (and get packets delivered with a minimally possible jitter too).9

The only downsides of Dual TCP are a bit more resources needed on the Server-Side and, more importantly, twice as much traffic coming from the Server. Those TCP connections tend to eat RAM, but fortunately RAM of the order of 32K per player is not-that-precious these days. The traffic matters more: as you’re normally paying for outgoing traffic, it can be rather important 🙁 . On the other hand, it can be seen as a monetization opportunity too. Dual TCP (or Dual UDP for that matter) can become a paid option, or VIP option, or whatever-else-your-marketing-guys-decide. While I didn’t see it implemented in games myself, I know quite a few players in different genres who would readily pay for such a feature. Alternatively, you may want to enable Dual TCP only for Important Tournaments.