

Author: “No Bugs” Hare

Job Title: Sarcastic Architect

Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[This is Chapter 13(e) from “beta” Volume IV of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing”. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Socket peculiarities

Most of the time, to work with the network (both TCP and UDP), you will be using so-called Berkeley Sockets. I won’t go into a detailed discussion of them (you can find pretty much everything you need on the “how to use Berkeley Sockets” subject in [Stevens]). However, there are several socket-related things which are not that well-known, and this is what I’ll try to discuss here.

To IPv6 or not to IPv6?

One common question which arises these days is “do we need to have our Clients and Servers support IPv6?” – or, in a stronger version, “maybe we can support ONLY IPv6?”

In short – these days ALMOST-ALL players’ devices support IPv4, and only about 10% support IPv6 (though the number is growing) [Google]. On the other hand, there are reports of some ISPs using IPv6-only within their networks (and converting it to IPv4 via NAT64/DNS64); the number of such setups is expected to grow as exhaustion of the IPv4 address space goes ahead.

When applied to the games, it (rather counterintuitively) means the following:

You MUST support IPv4 both on Server and Client

You SHOULD support IPv6 on the Client. This is necessary to deal with those IPv6-only player ISPs. It MAY be achieved by simply taking an IPv6 address from getaddrinfo() (though with a fallback to other IPs, including IPv4, to account for potential misconfigurations)

You MAY support IPv6 on the Server if you like it. However, even if your Server doesn’t support IPv6, pretty much all the real-world Clients-supporting-IPv6 will be able to connect to it anyway via NAT64/DNS64 (a bit more on it will be discussed below). Supporting IPv6 on Server can be generally done either via listening on two sockets (one IPv4, and another IPv6 with IPV6_V6ONLY option), or via single IPv6 socket with IPV6_V6ONLY turned off; there are, however, some OS-specific peculiarities in this regard, see [StackOverflow.BothIPv4IPv6] for details.
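As a rough illustration of the single-socket variant mentioned above, here is a minimal Python sketch. The helper name open_listeners() is made up for this example, and the fallback-to-IPv4-only behaviour is my own assumption about what a cautious Server might want; as noted, OS-specific peculiarities do exist, so treat it as a starting point rather than as The Way to do it.

```python
import socket

def open_listeners(port, backlog=128):
    """Listen on 'port' for IPv4 and, where the OS allows it, IPv6 too."""
    try:
        s6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        s6 = None                      # no IPv6 support on this box
    if s6 is not None:
        try:
            s6.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            # IPV6_V6ONLY off: IPv4 clients show up as ::ffff:a.b.c.d
            s6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            s6.bind(("::", port))
            s6.listen(backlog)
            return [s6]
        except (OSError, AttributeError):
            s6.close()                 # dual-stack refused: fall back to IPv4 only
    s4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s4.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s4.bind(("", port))
    s4.listen(backlog)
    return [s4]
```

The two-socket variant would instead create one AF_INET socket plus one AF_INET6 socket with IPV6_V6ONLY set to 1, and return both from the same function.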



Horrors of gethostbyname(): using getaddrinfo() instead, or NOT using anything at all

On the client, we will often need to convert DNS name (like us-server.ourgamedomain.com) into an IP address. This process is known as “name resolution”.

To DNS or Not to DNS?

Even before we start discussing “HOW to do name resolution”, we need to think a bit on “WHETHER we need name resolution at all?” Unlike, say, a web browser, games CAN have IP addresses of their servers embedded into their Clients and avoid name resolution altogether.

Whether it is a good idea to embed IPs into the Client – is arguable. Each and every admin-who-knows-his-stuff will ask “Are You Crazy?” on the very mention of such an option. But let’s see what the real reasons behind using DNS names (as opposed to IPs) are.

In the context of games, one reason to have your IP addresses obtained via DNS rather than embedded into your Client, is that IP addresses can change. Granted, it does happen, but for server IPs a change is a Very Infrequent occurrence (like “from once per several years to never ever”); plus, if your hosting ISP happens to force an IP change, you will know about it at least 2 months in advance, etc. etc. As most of the games out there are routinely updated much more frequently than that, it is not a real problem.

Another reason for using DNS is to allow fast adding/removal of servers to/from the pool of active servers. Depending on the way you’re balancing your servers, this IS one valid reason to use DNS, though its efficiency is limited because of DNS propagation times being of the order of hours.

On the other hand, using DNS has been seen to cause problems for some players in some cases. Yes, failing DNS is not really a problem of your game, but if, whenever DNS server of player’s ISP is down for 15 minutes, your game is accessible while competitor’s one is not – well, you DO get a bit of competitive advantage (that’s for free BTW); as a side bonus, it also greatly reduces incentives to mount DDoS against your DNS.

As a result, my advice with regards to “To DNS or not to DNS” goes as follows:

DO have BOTH DNS names and IP addresses in the list-of-servers stored within your Client

Your Client should try all of them one by one (when you get a list of IPs when resolving your DNS-name via getaddrinfo(), you MUST try all the addresses you get from there)

Make sure to have a timeout on the Client side, in case you did connect but didn’t receive anything from the server side for a while



This way, you’ll be fine BOTH if IP address has changed, and if player’s DNS server cannot resolve your DNS name for whatever reason (which can range from player-ISP’s DNS server failure to DDoS on your DNS provider infrastructure).
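To make this advice a bit more tangible, here is a minimal Python sketch of a Client-side list which mixes DNS names and literal IPs. The function names expand_endpoints() and connect_first(), the injectable resolve parameter, and the 5-second timeout are all assumptions made for this example; the point is only that a failed name lookup is skipped rather than treated as fatal, and that every address we get is tried.

```python
import socket

def expand_endpoints(endpoints, resolve=socket.getaddrinfo):
    """Turn a mixed list of (host, port) entries (DNS names and literal IPs)
    into a de-duplicated list of (family, sockaddr) candidates."""
    candidates = []
    seen = set()
    for host, port in endpoints:
        try:
            infos = resolve(host, port, type=socket.SOCK_STREAM)
        except OSError:        # socket.gaierror is a subclass of OSError
            continue           # DNS is down or name unknown: rely on other entries
        for family, _type, _proto, _canon, sockaddr in infos:
            if sockaddr not in seen:
                seen.add(sockaddr)
                candidates.append((family, sockaddr))
    return candidates

def connect_first(candidates, timeout=5.0):
    """Try every candidate in order; return a connected socket or None."""
    for family, sockaddr in candidates:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError:
            continue           # address family not usable on this device
        sock.settimeout(timeout)   # covers connect() and later recv() calls
        try:
            sock.connect(sockaddr)
            return sock
        except OSError:
            sock.close()
    return None
```

As the same timeout stays on the returned socket, a later recv() which gets nothing from the server side will raise a timeout error too, at which point the Client can move on to the next candidate.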

gethostbyname() vs getaddrinfo()

Since time immemorial, The Way to do DNS name resolution was via function gethostbyname(). Unfortunately, this function is riddled with numerous problems:

It returns a pointer to a static variable, making it non-thread-safe (ouch!)

It is blocking

It doesn’t support IPv6

Most (all?) of the time, gethostbyname() returned only one IP address from all the IP addresses advertised by DNS (we’ll see why it is important, in a moment).

Not surprisingly, with all the problems of gethostbyname(), there is a newer and better replacement: the getaddrinfo() function. I don’t know of any cases when you’ll need to use gethostbyname() these days (well, maybe save for some Really Obscure Platform which still lives in the 1980’s and doesn’t implement getaddrinfo()). In short:

Use getaddrinfo() and forget about gethostbyname()

However, even despite several major improvements, getaddrinfo() is still a blocking function. While non-blocking alternatives do exist (such as getaddrinfo_a() on Linux and GetAddrInfoEx()-with-OVERLAPPED on Windows), they’re not too universal 🙁 . Fortunately enough:

you don’t normally need getaddrinfo() on the server side. While we’re at it – DO NOT use reverse-DNS-lookup on your production servers; in other words – when logging Client’s IP – DO NOT try to log it as DNS name, settle for a plain IP.

for Clients, it is usually a Good Idea to have a separate thread which does nothing but receives DNS-resolution requests, calls blocking getaddrinfo(), and sends the results back.
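The Client-side resolver thread can be sketched in Python along the following lines. The class name ResolverThread and its submit()/stop() methods are invented for this example, and using a queue to send the results back is just one possible choice; in a real Client the results would probably go to whatever event queue the rest of the Client already uses.

```python
import queue
import socket
import threading

class ResolverThread:
    """Runs blocking getaddrinfo() calls on a dedicated thread."""

    def __init__(self, resolve=socket.getaddrinfo):
        self._resolve = resolve
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, host, port, results):
        """Queue a request; the outcome is later put() onto 'results'."""
        self._requests.put((host, port, results))

    def stop(self):
        self._requests.put(None)       # sentinel: tells the thread to exit
        self._thread.join()

    def _run(self):
        while True:
            request = self._requests.get()
            if request is None:
                return
            host, port, results = request
            try:
                infos = self._resolve(host, port, type=socket.SOCK_STREAM)
                results.put((host, [info[4] for info in infos], None))
            except OSError as exc:     # includes socket.gaierror
                results.put((host, [], exc))
```

The calling thread never blocks on DNS: it calls submit() and picks up a (host, addresses, error) tuple from its own queue whenever it is convenient.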

When it comes to multiple addresses returned by getaddrinfo(), let’s recall that usually I am arguing for using Client-Side Random Load Balancing (as was described in Chapter VII), as opposed to DNS Round-Robin. For the rationale, see Chapter VII, but in short – while they DO look very similar, DNS round-robin is subject to MUCH more severe imbalances due to DNS caching (and Client-Side Random Load Balancing is not affected by caching much). To implement Client-Side Random Load Balancing via getaddrinfo(), you can do the following:

1. Get the list of server IPs (either from getaddrinfo(), or from embedded list within the Client)

2. Choose one IP address at random (DO use something better than time-based srand(time(0)) for randomness)

3. Try connecting there

4. If not successful – take this IP out of the list and take another IP address at random from the remaining IPs

5. Rinse and repeat (starting from step #2)

6. If still not successful – rinse and repeat (starting from step #1)
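The procedure above can be sketched in Python as follows. The function name connect_random(), the two callbacks, and the max_rounds cap are assumptions made for this example (the text itself doesn’t put a limit on the number of retries); random.SystemRandom is used as one readily available source of randomness which is not seeded from the clock.

```python
import random

_rng = random.SystemRandom()   # OS-provided entropy, not a time-based seed

def connect_random(get_addresses, try_connect, max_rounds=3, rng=_rng):
    """Pick server addresses at random until one of them connects.

    get_addresses() returns a fresh list of addresses (from getaddrinfo()
    or from the embedded list); try_connect(addr) returns a connection
    object, or None on failure. Both are supplied by the caller.
    """
    for _ in range(max_rounds):
        remaining = list(get_addresses())      # get the list of server IPs
        while remaining:
            addr = rng.choice(remaining)       # choose one at random
            conn = try_connect(addr)           # try connecting there
            if conn is not None:
                return conn
            remaining.remove(addr)             # failed: take it out, pick another
        # the whole list is exhausted: start over with a fresh list
    return None
```

Passing rng=random.Random(some_seed) makes the choice repeatable, which is handy for tests but is exactly what you do NOT want in production.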

Yet another peculiarity in this regard is related to IPv6. In presence of so-called DNS64, even if you don’t have an IPv6 address in your Zone File, your Client still MAY get an IPv6 address. As a rule of thumb, you should just use this “synthetic” IPv6 address – it is rarely malicious and it will allow your Client to work over those IPv6-only networks which sit behind NAT64/DNS64.
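If you want to see in your logs whether an address you’ve got is such a “synthetic” one, a small check like the one below MAY help. It is only a sketch: it recognizes just the well-known NAT64 prefix 64:ff9b::/96 from RFC 6052, while an ISP is free to use its own network-specific prefix, which this code will not detect. The function name embedded_ipv4() is made up for this example; the Client should still simply connect to whatever address it has got.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; networks may use other prefixes.
NAT64_WELL_KNOWN = ipaddress.ip_network("64:ff9b::/96")

def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a well-known-prefix NAT64
    address, or None if 'addr' is not such an address."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip in NAT64_WELL_KNOWN:
        # the low 32 bits of the IPv6 address carry the original IPv4 address
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None
```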

Scalability Issues

select vs epoll vs kqueue vs Completion Ports

The very first question which usually arises at the beginning of a discussion about scalability and sockets, is almost-universally a religious-war-like question of “what is better – epoll or kqueue or Completion Ports?” As with quite a few things ;-), I have my own answer to this question (and I do know that I will be hit hard for articulating it). My take on it goes as follows:

In the context of games, there is little difference between different non-blocking network APIs

Yes, it also means that select() is going to work reasonably well too (though YMMV). By all means, try to experiment (I mean on the Server-Side), but don’t expect miracles from platform-specific APIs. As one example, in two major works comparing select()/poll()/epoll() ([GammoEtAl] and [Libenzi]) we can see that for workloads-without-idle-connections, performance of select()/poll()/epoll() is more or less on par, and only when the number of idle connections goes high, epoll() starts to take a significant lead. However, as it is very common (and recommended) for game servers to drop idle connections after a very brief period of inactivity, this advantage of epoll() doesn’t really manifest itself in games.

What is Really Important, however, is to make your calls non-blocking and process more-than-one connection per thread

Indeed, with 1’000 (10’000 if we’re speaking about front-end servers, see Chapter VII for discussion on front-end servers) players per server and one connection-per-thread we’ll have 1K-10K threads (running over only 10 or so CPU cores), which will cause too much otherwise-unnecessary context switching if run simultaneously.

On limitations of select()

That being said, select() (being the oldest one from the bunch) has a rather nasty limitation. There is a limit of 1024 file handles for select(). It might not seem like a big deal, but unfortunately it is NOT a limit on “number of file handles which you’re waiting for in select() call”; on Linux and other Unix-like systems it is a limit on the numeric value of the file handle, which effectively makes it a limit on “number of overall file handles within the process”(!!). While this limit can be raised (on Linux – via __FD_SETSIZE and ulimit, see, for example, [StackOverflow.Over1024]), this is a rather nasty property of select(), and if you’re hitting this limit, you MAY be better off using alternatives such as poll() or epoll()/kqueue() (and the change from select() at least to poll() is usually a very simple one).
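To give an idea of how small the change from select() to poll() usually is, here is a minimal Python sketch of a wait-for-readable helper built on poll(). The function name wait_readable() is invented for this example, and select.poll() is assumed to be available (which is the case on Linux, but not on Windows).

```python
import select

def wait_readable(socks, timeout_s):
    """Return those sockets from 'socks' which are readable (or hung up),
    waiting for up to timeout_s seconds."""
    by_fd = {s.fileno(): s for s in socks}
    poller = select.poll()             # no FD_SETSIZE-style limit on fd values
    for fd in by_fd:
        poller.register(fd, select.POLLIN)
    events = poller.poll(int(timeout_s * 1000))   # poll() takes milliseconds
    return [by_fd[fd] for fd, _mask in events]
```

A production version would keep the poll object around between calls instead of rebuilding it every time; it is recreated here only to keep the sketch short.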

TCP: multiple sockets per thread

Personally, for TCP connections in the game-like contexts I’ve had very good experience with the following rather simple architecture:

There is a fixed number of maximum TCP sockets-per-thread (in practice – between 32 and 256)

Each of these threads has an input queue of data-to-be-sent for all associated sockets

Each thread is using some-kind-of-non-blocking-IO and single wait (select/poll/WaitForMultipleObjects/epoll/kqueue) for all these sockets (plus the input queue!), and processes all the input/output from them as needed (this processing includes encryption)

Of course, there is also an additional thread which handles accept()’s on the listen()-ing socket, but it works only when we have a new connection, so it is not really loaded. I didn’t see one single thread handling all accept()s running into any performance problems (that is, as long as it does nothing but accept()-then-push-accepted-socket-to-the-input-queue-of-some-thread), but if you ever run into it, you MAY be able to have more than one such accept()-ing thread.

As we can see, within this architecture the number of handles per select()/WaitForMultipleObjects() call is quite limited, so most of the problems related to having-too-many-handles-in-one-call are gone. On the other hand, it reduces the number of concurrent threads by 1.5-2.5 orders of magnitude, bringing the number of threads down to around 40 for 256 connections/thread and 10K connections; this is not that different from optimum for a typical-for-server-side 12-core-server-with-HT.
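One of these multiple-sockets-per-thread workers can be sketched in Python roughly as follows. This is by no means the implementation referred to above; the class name SocketWorker, the message kinds ("add", "send", "stop"), and the socketpair-based wake-up (which is what lets one single wait cover both the sockets and the input queue) are all assumptions made for this example, and encryption is left out entirely.

```python
import queue
import selectors
import socket
import threading

class SocketWorker:
    """One thread serving up to max_sockets sockets plus its input queue."""

    def __init__(self, on_data, max_sockets=64):
        self._on_data = on_data            # callback(sock, data) for received bytes
        self._max_sockets = max_sockets
        self._pending = {}                 # sock -> bytearray of not-yet-sent bytes
        self._inbox = queue.Queue()        # (kind, sock, data) messages
        self._wake_r, self._wake_w = socket.socketpair()
        self._wake_r.setblocking(False)
        self._sel = selectors.DefaultSelector()
        self._sel.register(self._wake_r, selectors.EVENT_READ)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, kind, sock=None, data=b""):
        self._inbox.put((kind, sock, data))
        self._wake_w.send(b"\0")           # makes the queue visible to the wait

    def join(self):
        self._thread.join()

    def _run(self):
        running = True
        while running:
            for key, events in self._sel.select():     # the single wait
                sock = key.fileobj
                if sock is self._wake_r:
                    running = self._drain_inbox() and running
                    continue
                if events & selectors.EVENT_READ and sock in self._pending:
                    self._read(sock)
                if events & selectors.EVENT_WRITE and sock in self._pending:
                    self._flush(sock)
        for sock in list(self._pending):
            self._drop(sock)
        self._sel.close()
        self._wake_r.close()
        self._wake_w.close()

    def _drain_inbox(self):
        try:
            self._wake_r.recv(4096)        # swallow wake-up bytes
        except BlockingIOError:
            pass
        while True:
            try:
                kind, sock, data = self._inbox.get_nowait()
            except queue.Empty:
                return True
            if kind == "stop":
                return False
            if kind == "add" and len(self._pending) < self._max_sockets:
                sock.setblocking(False)
                self._pending[sock] = bytearray()
                self._sel.register(sock, selectors.EVENT_READ)
            elif kind == "send" and sock in self._pending:
                self._pending[sock] += data
                self._flush(sock)

    def _flush(self, sock):
        buf = self._pending[sock]
        try:
            if buf:
                del buf[:sock.send(buf)]   # drop whatever the kernel accepted
        except BlockingIOError:
            pass                           # send buffer full: wait for EVENT_WRITE
        except OSError:
            return self._drop(sock)
        want = selectors.EVENT_READ | (selectors.EVENT_WRITE if buf else 0)
        self._sel.modify(sock, want)

    def _read(self, sock):
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return
        except OSError:
            data = b""
        if data:
            self._on_data(sock, data)
        else:
            self._drop(sock)               # peer closed, or the connection failed

    def _drop(self, sock):
        self._sel.unregister(sock)
        del self._pending[sock]
        sock.close()
```

In this sketch an "add" which arrives when the worker is already full is silently ignored; in practice the accept()-ing thread would be expected to pick a worker which still has room.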

I am not saying that this architecture is the only viable one, but it does work for TCP for sure. And it performs reasonably well too; for example, for one specific game, this architecture has been compared to a Completion-Port-based one (in production), and the performance differences were found to be negligible.

UDP: Only-One-Socket Problem

With UDP, achieving scalability becomes significantly more complicated 🙁 . In particular, with all-UDP-traffic-going-over-one-single-port we have only one socket for all the 1’000-10’000 connections. It means that our thread-reading-on-UDP-socket is rather likely to become overloaded :-(. In practice, there are at least three different architectures aiming to address this problem.

The very first (naïve) approach is to give each of your Clients its own UDP port number, which allows you to have a socket for each of them too. Then – you can do pretty much what we’ve done for TCP (multiple-sockets-per-thread stuff). This approach does work, but also limits the number of clients-per-IP to the number-of-UDP-ports-you-can-have (which is usually in the range of several hundred, and often is not enough).

The second take would be to modify the model above slightly to have one-UDP-port-and-one-UDP-socket-per-thread instead of one-UDP-port-and-socket-per-player. This one will work (as noted above, even for 10K players and 32 sockets/thread there are only 300 threads or so), but still has some (admittedly rather minor) drawbacks; in particular, it is usually considered a not-so-good-idea to show relations within your system to the outside world (as pretty-much-everything-you-reveal MIGHT be used to attack your server); also, exposing the port doesn’t allow for easy movement of your users across the threads (which in turn can affect load balancing between the threads).

The third option is to have your UDP-reading thread do only a very basic job of determining-which-processing-thread-incoming-packet-belongs-to, and dispatch it there (using some kind of a non-blocking queue, more on queues in Chapter [[TODO]]). In this case it is rather unlikely that your UDP-reading thread will become overloaded, and processing threads will do their job exactly like in a TCP case. And in the unlikely event that your UDP-reading thread becomes overloaded just by receiving-and-dispatching – you can have more than one thread reading the same socket; and as all the UDP-reading threads are only dispatching – it won’t matter too much where the packet arrives, though occasional packet reorderings will happen. See also discussion on implementing this UDP threading architecture on top of Reactors, in Chapter VII.
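A minimal Python sketch of this third option is shown below. All the names (worker_index(), dispatch(), reader_loop()) are made up for the example. Choosing the processing thread by hashing the peer address is my simplifying assumption; a real game would more likely look up a session/connection ID in a table, which also allows moving players between threads. Python’s queue.Queue merely stands in here for the non-blocking queue mentioned above.

```python
import queue
import socket
import zlib

def worker_index(addr, n_workers):
    """Map a peer (ip, port) to a processing thread; stable across calls."""
    key = f"{addr[0]}:{addr[1]}".encode()
    return zlib.crc32(key) % n_workers

def dispatch(packet, addr, worker_queues):
    """Hand one datagram to the queue of the thread which owns this peer."""
    q = worker_queues[worker_index(addr, len(worker_queues))]
    try:
        q.put_nowait((addr, packet))
        return True
    except queue.Full:
        return False       # overloaded worker: drop it, this is UDP after all

def reader_loop(sock, worker_queues, stop_event):
    """The UDP-reading thread: receive, decide, dispatch, and nothing else."""
    sock.settimeout(0.5)   # lets the loop notice stop_event regularly
    while not stop_event.is_set():
        try:
            packet, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue
        dispatch(packet, addr, worker_queues)
```

As reader_loop() keeps no state of its own, several such threads can run over the same socket if one ever becomes a bottleneck.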

Which option to choose – it depends. I would stay away from option #1, but both options #2 and #3 do have their own merits. Option #2 is usually a bit faster (there is less data passed around), which is more prominent especially on server boxes which are pretty much all NUMA these days. Option #3, on the other hand, encapsulates your server-side implementation better (and hides more implementation details from the view of potential attacker).

IMNSHO, the most important thing in this regard is to avoid tying all of your code to one specific option right away, but rather to have your UDP-threading architecture completely isolated from the rest of your code, so that when you come to the point when this choice becomes Really Important – you can change it without any changes to your Game Logic.

The whole task of optimizing performance beyond, say, 20-50K packets/second per box tends to be Quite Elaborate, and involves quite a few things which are platform- and hardware-dependent. Chances are that you won’t need to go further than that, but if you do – make sure to read an interesting exercise described in [CloudFlare]; while merely receiving the packets (as described in [CloudFlare]) is different from receiving-and-processing them (as we need for a game server), if you want absolutely-best performance, you MIGHT need to play with stuff such as RX queues as described there (see also discussion on Receive Side Scaling a.k.a. RSS, and Receive Packet Steering, a.k.a. RPS, in one of the following sources: [Balode], [kernel.org], and [MSDN]).

[[TODO: recvmmsg() – referenced in Chapter VII on Server-Side Architecture]]

[[TODO: RSS/RPS/RFS and netmap/DPDK/RIO, see also Chapter VII on Server-Side Architecture]]

Testing

When implementing network protocols, you DO need to test your implementation very thoroughly (even more so if you’re developing your own protocol). As Glenn Fiedler has put it in [GafferOnGames.PacketFragmentation]:

“Writing a custom network protocol is hard. So hard that I’ve done this from scratch at least 10 times but each time I manage to fuck it up in a new and interesting ways. You’d think I’d learn eventually but this stuff is complicated. You can’t just write the code and expect it to work. You have to test it!” – Glenn Fiedler

and I can subscribe to each and every one of these words. Now let’s see how such testing can/should be done.

Wireshark

One of the main tools you will need to use when debugging your network protocol (or your implementation of an existing protocol) is [Wireshark]. It is even more true if you need to debug your over-TCP protocol.

While debugging and testing your own network protocol, just install Wireshark on your development machine and monitor all the packets going between your Client and your Server; I am sure you will learn quite a few new things about your protocol even if you previously thought it worked perfectly; this applies regardless of you using TCP or UDP.

Wireshark and encryption

One of the many things which Wireshark can do, is decrypting TLS traffic (seems also to apply to DTLS, though I’ve never used DTLS decrypting myself). Of course, it is not possible to decrypt traffic without a key, but there is a way to supply your server’s private key to Wireshark (see [Wireshark.SSL] for details).

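Note that NOT all the cipher suites are supported by Wireshark (in particular, the server’s private key alone is not enough to decrypt cipher suites with forward secrecy, such as DHE/ECDHE-based ones), so you MAY need to adjust your ciphersuite-of-choice to be able to decrypt your traffic with Wireshark.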

tcpdump+Wireshark in Production

One interesting (and not-too-well-known) feature of Wireshark is that you can use it to analyze production communications without installing Wireshark on your production server (that is, at least if you’re running your servers on Linux). The usual sequence in this case goes as follows:

You need to analyze what is going on with a specific player

You find out her IP address

You run tcpdump (easily available for all Linuxes) on your server to get the traffic (into a “capture file”), filtering for that IP address (using a tcpdump filter expression such as “src host <IP> or dst host <IP>”). While you’re at it, make sure to use tcpdump option “-n” to avoid reverse DNS lookups

You download that capture file to your development environment

You run Wireshark to see the capture file in a parsed format (feeding your server’s private key to Wireshark to decrypt traffic as described above if applicable)

Bingo! You can see what has happened with that unfortunate player, and maybe even fix the bug affecting hundreds of others.

While this option should be considered as a “last resort”, I’ve seen it used in production to solve issues which were otherwise-next-to-impossible-to-identify.

“Soak Testing”, Simulation, and Replay

In [GafferOnGames.PacketFragmentation], Glenn Fiedler mentions “soak testing”. While I myself didn’t name this thing “soak testing” before, I’ve done LOTS of it (and I like the term too 😉 ). The idea (when applied to network/distributed testing) is to make a test run of your implementation with more-or-less random data, and as-random-as-possible usage patterns; typical running time for “soak test” is “overnight”, and “soak test” is considered passed if the next morning there are no apparent problems (like core dumps / asserts / hung connections / etc.).

As noted above, this kind of “soak testing” is pretty well-known among network developers. However, I tend to add two (IMHO very important) things to it.

Network Problems Simulation

First of all, I suggest running “soak tests” while simulating problems at network layer. The rationale for it is trivial: in LAN (and even worse on your local machine) chances are that you will never face packet loss (especially “two in a row” packet loss), large and inconsistent latencies, reordered packets, corrupted packets, etc. To make sure that your program does work in the real world outside of LAN, you do need to test it in presence of network problems such as those described above.

To get a “bad network connection” in your lab, I know three different approaches:

Write your own Bad-Network simulator at UDP level. As UDP is a packet-level protocol, you can just write your own wrapper around sendto() (and/or around recvfrom()) and insert all-the-nasty-stuff-you-want there. This does work (and when dealing with UDP, I prefer this way personally), but it is restricted to UDP only (for TCP, you don’t have control over packets, so simulating packet loss is not really feasible at app-level)

Have your test traffic routed via a “latency simulator”, such as Linux-box-with-netem. This is a bit less flexible than Option #1 (you’re restricted to whatever-your-latency-simulator-can-do), but in practice, it is usually enough for in-lab testing, and it works both for UDP and for TCP. One variation includes running Linux-with-netem inside a VM; this way you’re able to run all the tests on one single developer machine

Test over a real-world Really Bad connection. However thorough our simulations are, we cannot imagine what kind of weird stuff real-world-Internet can throw at us. While you’re at it, and if you’re using TCP, make sure to use all kinds of your Client platforms over this Really Bad connection. TCP stacks are notorious for having quite different implementations, and those differences have potential to hit you in pretty bad ways.
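For the first approach (a wrapper around sendto()), a minimal Python sketch might look as follows. The class name BadNetwork, its parameters, and the default probabilities are all invented for this example; it simulates loss, corruption, duplication and reordering, while latency simulation (which needs timers) is left out to keep it short.

```python
import random

class BadNetwork:
    """Wraps a sendto-like callable and misbehaves on purpose."""

    def __init__(self, sendto, loss=0.05, dup=0.01, reorder=0.02,
                 corrupt=0.01, seed=None):
        self._sendto = sendto              # e.g. the sendto method of a UDP socket
        self._loss, self._dup = loss, dup
        self._reorder, self._corrupt = reorder, corrupt
        self._rng = random.Random(seed)    # seeded, so a failing run can be repeated
        self._held = None                  # packet held back in order to reorder it

    def sendto(self, data, addr):
        rng = self._rng
        if rng.random() < self._loss:
            return len(data)               # silently dropped; caller sees success
        if rng.random() < self._corrupt and data:
            i = rng.randrange(len(data))   # flip all the bits of one random byte
            data = data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]
        if self._held is None and rng.random() < self._reorder:
            self._held = (data, addr)      # goes out after the next packet
            return len(data)
        self._sendto(data, addr)
        if rng.random() < self._dup:
            self._sendto(data, addr)       # duplicate
        if self._held is not None:
            held, self._held = self._held, None
            self._sendto(*held)            # release the reordered packet
        return len(data)
```

In the code under test, it should be enough to replace calls like sock.sendto(data, addr) with net.sendto(data, addr), where net = BadNetwork(sock.sendto, seed=...).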

Replay Testing as a Big Helper for “Soak Testing”

One problem with “soak testing” is that most of the time we’re actually hunting for those elusive valid-packets-coming-in-an-unusual-order patterns (see discussion in Chapter V). And when the problem hits – usual response is “add more logging – run again – hope-that-the-same-problem-will-occur-this-time”.

This approach does work, BUT it tends to take LOTS of trial-and-error. I was using it myself for years – that is, until I figured out that deterministic-record-and-replay helps with debugging deterministic systems (including network protocols) A LOT. Let me elaborate on it a bit.

First, let’s note that most of the time, pretty much any network protocol is described in terms of a state machine (and if by any chance it is not – protocol description can be trivially rewritten this way).

Then, let’s observe that protocol state machines are an ideal fit for “Reactors” (such as those described in Chapter V, and also known as event-driven programs or ad-hoc state machines) – and are often implemented as such. And now, if we just add determinism (as described in Chapter V) – then bingo! We’ve just got an ideal way to test our implementation in a post-mortem of a failed-soak-test.

Normally, if we have our network protocol implemented as a deterministic Reactor (a.k.a. deterministic state machine), development process goes as follows:

We run “soak test” while recording all the messages/packets coming to each of the sides of our conversation, into an input-log

If/when “soak test” fails, we can easily reproduce the whole sequence of events which has led to the problem. For example, we can go as follows:

We can usually run 10-hours-before-last-2-seconds-before-the-failure in just 20 minutes (for fully deterministic stuff, we are not bound to run it at the same pace, and usually replay runs MUCH faster than original stuff)

We can make a snapshot (of the state of our Reactor) at this point to be able to run it from this point pretty much instantly

We can launch the debugger and execute our protocol handler exactly as it behaved under these conditions, showing exactly how the bug has brewed and unfolded
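To show the record-and-replay idea in its barest form, here is a toy Python sketch. The Reactor in it (SeqReactor) is a deliberately trivial made-up protocol handler, and the helper names run_recorded() and replay(), as well as the JSON-lines log format, are assumptions made for this example; the only thing which matters is that ALL the inputs go through react() and are written to the input-log.

```python
import json

class SeqReactor:
    """Toy deterministic protocol handler: state changes only in react()."""

    def __init__(self):
        self.expected_seq = 0
        self.out_of_order = 0

    def react(self, event):
        if event["seq"] == self.expected_seq:
            self.expected_seq += 1
        else:
            self.out_of_order += 1

    def snapshot(self):
        return dict(vars(self))

def run_recorded(reactor, events, log):
    """Production/soak-test mode: log every input, then process it."""
    for event in events:
        log.append(json.dumps(event))   # recorded BEFORE processing, so that
        reactor.react(event)            # the event which causes a crash is logged

def replay(reactor_factory, log, upto=None):
    """Debug mode: feed the first 'upto' logged inputs to a fresh reactor."""
    reactor = reactor_factory()
    for line in log[:upto]:
        reactor.react(json.loads(line))
    return reactor
```

Replaying with upto set to just-before-the-failure gives the same state the live Reactor had at that point, from which you can single-step in a debugger; snapshot() is where a real Reactor would serialize its state to skip the long replay next time.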



Honestly, after spending a substantial portion of my life on debugging of network stuff in a usual (non-replay) manner, I can say that for a complicated network protocol, replayable debugging can reduce debugging time by as much as an order of magnitude(!). BTW, most (though admittedly not all) of my fellow network protocol developers loved this replay technique too. In addition, it was observed that this replay technique tends to improve quality of the resulting protocol/implementation; with replay in place, we can say that we can identify and fix each and every failure which has happened during our “soak testing”; this statement doesn’t stand when using usual trial-and-error-based fixes during “soak testing”.

Of course, it IS possible to debug network protocols (and implementations) in a traditional trial-and-error style, but I’ve tried both, and I strongly prefer the replay-based one.

[[TODO: big provider down. handling massive connectivity problems]]

[[To Be Continued…

This concludes beta Chapter 13(e) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 14, describing marshalling and encodings.]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.