

Author: “No Bugs” Hare. Job Title: Sarcastic Architect. Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

This post continues an article about implementing network support for game engines (this includes game engines intended for one single app). We analyse all the genres and platforms, except for browser-based games.

Previous parts:

Part I. Client Side.

The present Part IIa starts the discussion of the specifics of protocols and APIs – i.e. how to organize interactions between different nodes, and between different layers. Due to the size of the Protocols and APIs topic, we’ve split it into Part IIa and Part IIb, so Part IIa covers roughly half of the material on it.

Upcoming parts include:

8. DO Ensure that All the Data is Marshalled

“Marshalling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission” (Wikipedia)

If written in C/C++, network applications MUST NOT send C structures over the network. Yes, I really mean it. Avoid it from the very beginning, or you will find yourself in a lot of trouble later. Moreover, any kind of network engine MUST provide a way for the game app to marshal its data.

While the absence of marshalling doesn’t affect the end-user directly, it is still one of the largest mistakes network developers can make. The problem with sending C structures over the network without any marshalling is that when the game developer wants to port their game to another platform (or even to switch to another compiler), it will stop working all of a sudden, because structure layout (field sizes, padding, byte order) is not guaranteed to be the same (and I haven’t even started discussing issues related to #pragma pack and so on).

Marshalling is not difficult if done from the very beginning, and all the apps simply MUST marshal data before sending it over the network, no exceptions. In particular, it means you SHOULD encourage the game developers who’re using your engine to do the same; this is best done by exposing your marshalling mechanism to the game developers.

In general, marshalling can be either provided by your programming language, or (esp. for C/C++) you’ll need to implement it yourself.

8a. DIY Marshalling: DO use “simple streaming” APIs

DIY (=”Do It Yourself”) marshalling can be implemented in many ways. One simple and efficient way is to provide “simple streaming” compose/parse functions such as OutputMessage& compose_uint16(OutputMessage&, uint16_t) / uint16_t parse_uint16(Parser&) – for all the data types you need to pass over the network. In this case, OutputMessage is a class/struct which encapsulates the concept of a “message which grows when something is added”, and Parser is an object created from an input message, which has a pointer/reference to the input message as well as the offset at which parsing currently stands.

This asymmetry between composing and parsing (composing is done directly to the message, and parsing requires creating separate Parser object) is not exactly mandatory, but in practice it is usually a good thing (in particular, it allows you to avoid storing parsing information within the message, allows repeated parsing, makes parsing formally constant with respect to the message – which is what really happens, and so on). In general, this simple approach is workable even in fairly large-scale environments, but requires quite an effort on the game development side to keep things consistent between composers and parsers.

Then, an example of composing might look as follows:

    uint16_t abc, def; // initialized with some meaningful values
    OutputMessage msg;
    msg.compose_uint16(abc).compose_uint16(def);

with a corresponding example of parsing being:

    // 'incoming' stands for a valid incoming message obtained elsewhere
    InputMessage& msg = incoming;
    Parser parser(msg);
    uint16_t abc = parser.parse_uint16();
    uint16_t def = parser.parse_uint16();
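To make the two snippets above concrete, here is a minimal sketch of what OutputMessage, InputMessage and Parser might look like. The class names come from the text; everything inside them (the std::vector storage, the bytes() accessor, little-endian order on the wire, throwing on a too-short message) is an assumption made for illustration, not the only reasonable design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical OutputMessage: a byte buffer that grows as fields are added.
class OutputMessage {
public:
    // Appends a 16-bit value byte by byte (low byte first on the wire).
    OutputMessage& compose_uint16(uint16_t v) {
        buf_.push_back(static_cast<uint8_t>(v & 0xFF));
        buf_.push_back(static_cast<uint8_t>(v >> 8));
        return *this;  // returning *this is what allows chained calls
    }
    const std::vector<uint8_t>& bytes() const { return buf_; }

private:
    std::vector<uint8_t> buf_;
};

// Hypothetical InputMessage: just the bytes received from the network.
struct InputMessage {
    std::vector<uint8_t> bytes;
};

// Parser holds a reference to the message plus the current offset, so the
// message itself stays const and can be parsed more than once.
class Parser {
public:
    explicit Parser(const InputMessage& msg) : msg_(msg), offset_(0) {}

    uint16_t parse_uint16() {
        assert(offset_ <= msg_.bytes.size());  // internal invariant
        if (msg_.bytes.size() - offset_ < 2)
            throw std::runtime_error("parse_uint16: message too short");
        const uint16_t low = msg_.bytes[offset_];
        const uint16_t high = msg_.bytes[offset_ + 1];
        offset_ += 2;
        return static_cast<uint16_t>(low | (high << 8));
    }

private:
    const InputMessage& msg_;
    std::size_t offset_;
};
```

Keeping the offset in Parser rather than in the message is the asymmetry discussed above: two Parser objects can walk the same InputMessage independently.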

VLQ: “A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer.” (Wikipedia)

One of the advantages of this kind of “simple streaming” compose/parse API (and of anything written on top of it, such as the IDL described below, and unlike compose/parse APIs which are based on explicit size handling) is that it doesn’t really matter which format you use – fixed-size or variable-size (i.e. encodings such as VLQ, and/or null-terminated string encodings, are perfectly feasible). At the same time, its performance is very difficult to rival, even by approaches where the caller knows the message size in advance, provided that you add a function such as void reserve(OutputMessage&, size_t max_sz); .
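As an illustration of a variable-size encoding behind the same streaming style, here is a sketch of VLQ composing and parsing for 32-bit values: 7 bits per byte, most significant group first, with the high bit set on every byte except the last. The function names compose_vlq / parse_vlq and the use of a plain std::vector plus an offset are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends v to 'out' as a VLQ (1 to 5 bytes for a 32-bit value).
void compose_vlq(std::vector<uint8_t>& out, uint32_t v) {
    uint8_t groups[5];
    int n = 0;
    do {  // split into 7-bit groups, least significant first
        assert(n < 5);  // a 32-bit value never needs more than 5 groups
        groups[n++] = static_cast<uint8_t>(v & 0x7F);
        v >>= 7;
    } while (v != 0);
    // Emit most significant group first; all but the last byte carry 0x80.
    while (n > 1)
        out.push_back(static_cast<uint8_t>(groups[--n] | 0x80));
    out.push_back(groups[0]);
}

// Reads one VLQ starting at 'offset' and advances 'offset' past it.
uint32_t parse_vlq(const std::vector<uint8_t>& in, std::size_t& offset) {
    uint32_t v = 0;
    for (int i = 0; i < 5; ++i) {
        if (offset >= in.size())
            throw std::runtime_error("parse_vlq: truncated input");
        const uint8_t b = in[offset++];
        if (v > (UINT32_MAX >> 7))  // the next shift would lose bits
            throw std::runtime_error("parse_vlq: value does not fit in 32 bits");
        v = (v << 7) | (b & 0x7Fu);
        if ((b & 0x80) == 0)
            return v;  // last byte of this quantity
    }
    throw std::runtime_error("parse_vlq: encoding longer than 5 bytes");
}
```

Small values cost one byte and only large ones grow, which is why the caller neither knows nor needs to know the encoded size in advance.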



8b. DIY Marshalling: DO provide some IDL with an IDL-to-code compiler

An improvement over the plain compose/parse described above is to have some kind of declarative way to describe messages (some kind of Interface Definition Language – IDL), and to have it compiled into a sequence of those compose_uint16() / parse_uint16() calls. For example, a declaration might look like the following XML:

    <struct name="XYZ">
      <field name="abc" type="uint16" />
      <field name="def" type="uint16" />
    </struct>
    <message name="ZZZ">
      <field name="abc" type="uint16" />
      <field name="zzz" type="XYZ" />
    </message>

Then, you need to provide a compiler which takes the declaration above and produces something like the following:

    struct idl_struct_XYZ {
      uint16_t abc;
      uint16_t def;

      void compose(OutputMessage& msg) {
        msg.compose_uint16(abc);
        msg.compose_uint16(def);
      }
      void parse(Parser& parser) {
        abc = parser.parse_uint16();
        def = parser.parse_uint16();
      }
    };

    struct idl_message_ZZZ {
      uint16_t abc;
      idl_struct_XYZ zzz;

      void compose(OutputMessage& msg) {
        msg.compose_uint16(abc);
        zzz.compose(msg);
      }
      void parse(Parser& parser) {
        abc = parser.parse_uint16();
        zzz.parse(parser);
      }
    };

Writing such a compiler is trivial (at most a few days for a somewhat experienced developer; BTW, it can be written more easily in a language such as Python – with Python I estimate that writing this compiler would take half a day).
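To give a feel for the code-generation half of such a compiler, here is a rough sketch that takes an already-parsed declaration and emits text shaped like the generated structs above. Reading the XML is left out entirely; the FieldDecl / StructDecl types, the generate() function and the assumption that uint16 is the only primitive type are all made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form of one <field .../> element.
struct FieldDecl {
    std::string name;
    std::string type;  // "uint16" or the name of a previously declared struct
};

// Hypothetical in-memory form of one <struct> or <message> element.
struct StructDecl {
    std::string kind;  // "struct" or "message"
    std::string name;
    std::vector<FieldDecl> fields;
};

// Assumption of this sketch: uint16 is the only primitive type supported.
static bool is_primitive(const std::string& type) { return type == "uint16"; }

// Emits C++ source text for one declaration.
std::string generate(const StructDecl& d) {
    assert(!d.name.empty());
    std::string members, compose, parse;
    for (const FieldDecl& f : d.fields) {
        if (is_primitive(f.type)) {
            members += "  " + f.type + "_t " + f.name + ";\n";
            compose += "    msg.compose_" + f.type + "(" + f.name + ");\n";
            parse += "    " + f.name + " = parser.parse_" + f.type + "();\n";
        } else {  // nested struct: delegate to its own compose()/parse()
            members += "  idl_struct_" + f.type + " " + f.name + ";\n";
            compose += "    " + f.name + ".compose(msg);\n";
            parse += "    " + f.name + ".parse(parser);\n";
        }
    }
    return "struct idl_" + d.kind + "_" + d.name + " {\n" + members +
           "  void compose(OutputMessage& msg) {\n" + compose + "  }\n" +
           "  void parse(Parser& parser) {\n" + parse + "  }\n};\n";
}
```

A real compiler would also validate that referenced struct types exist and would support more primitives, but the generation step stays this mechanical.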

One thing to note is that your Interface Definition Language itself doesn’t necessarily need to be XML – for example, for those developers familiar with YACC, parsing the same example IDL rewritten in C-style

    struct XYZ {
      uint16 abc;
      uint16 def;
    };
    message struct ZZZ {
      uint16 abc;
      struct XYZ zzz;
    };

won’t be any more difficult (once again, the whole compiler shouldn’t take more than a few days – that is, if you’ve already worked with YACC/Bison and Lex/Flex at least a little bit).

One other way to implement marshalling is via RPC calls; in this case, the RPC function prototype is one’s IDL. It should be noted, however, that blocking RPC calls are generally a Bad Thing for Internet applications (this will be discussed in item #12 in Part IIb); on the other hand, the reasoning behind item #13 does not apply to Unity 3D-style no-return-non-blocking RPCs, which are ok, though personally I still prefer to have structures mapped to the messages, as it explains more explicitly what is going on.

8c. 3rd-party Marshalling: DO use platform- and language-neutral formats

For the non-C programming languages, the marshalling problem is usually not “whether to marshal”, but “what to use for marshalling”. In theory, any serialization mechanism will do, but in practice, platform- and language-neutral serialization/marshalling mechanisms (such as JSON) should be preferred over platform- or language-specific ones (such as Python pickle).

8d. DO use binary formats for frequent intra-game interactions

For data formats, there is a very strong not-so-recent trend to use text-based formats (such as XML) over binary formats (such as VLQ or ASN.1 BER). For games, it may or may not make sense. While text formats do simplify debugging and provide generally better interoperability, they’re inherently larger (usually this manifests itself even after compression), and take more processing time, which will hit you as soon as the game becomes popular (both in terms of traffic and CPU time on the server). My experience shows that for highly interactive processing in games binary formats are generally preferable (though exceptions are possible depending on specifics such as volumes, frequency of exchanges etc.)

For binary formats, to simplify debugging and improve interoperability, a standalone program able to parse messages according to the IDL and print them in text format comes in quite handy. Even better is to have the same thing also as a library to be used for logging/debugging purposes.
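A very small sketch of the library flavour of such a tool is shown below. It assumes the message description has already been reduced to an ordered list of field names, that every field is a uint16 stored low byte first, and that trailing bytes beyond the described fields are simply ignored; the function name dump_uint16_fields is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Renders a binary message as "name=value name=value ..." for logging.
std::string dump_uint16_fields(const std::vector<std::string>& field_names,
                               const std::vector<uint8_t>& msg) {
    if (msg.size() < field_names.size() * 2)
        throw std::runtime_error("dump: message shorter than its description");
    std::string text;
    for (std::size_t i = 0; i < field_names.size(); ++i) {
        assert(2 * i + 1 < msg.size());  // guaranteed by the size check above
        const unsigned value = msg[2 * i] | (msg[2 * i + 1] << 8);
        if (!text.empty())
            text += ' ';
        text += field_names[i] + "=" + std::to_string(value);
    }
    return text;
}
```

Wrapping this function in a command-line program that reads a captured message from a file gives the standalone variant.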

8e. DO use text formats for infrequent external interactions

Unlike intra-game interactions, external interactions such as payments, are usually text- (XML-) based. This is generally a Good Thing. For infrequent external interactions, all the arguments against text formats become much less pronounced (due to infrequent nature), but debugging/interoperability becomes much more important.

8f. DO consider ASN.1 before throwing it away

“ASN.1 is a standard... that describes rules and structures for... transmitting... data in telecommunications ...independent of machine-specific encoding... Software tools can be used for the validation” (Wikipedia)

One of the binary formats which you certainly should take a look at is ASN.1 (NB: strictly speaking, ASN.1 can also generate and parse XML via XER). It allows for generic marshalling, has its own IDL, is used in communications (the most common use of ASN.1 on the Internet is as the underlying format for X.509 certificates), and at first glance looks like “just the ticket” for binary marshalling. On second look, you’ll either fall in love with it or detest it because of the associated complexity, but you’ll never know until you try.

Personally, I’m not a big fan of ASN.1 (it is IMHO too bulky, and performance of a home-grown streaming-like API is substantially higher – at least unless you compile ASN.1 into code), but I don’t think that this feeling of mine applies universally to all the games out there. So, I think that you SHOULD take a look at ASN.1 and at the available libraries (especially at the open-source ASN.1 compiler [asn1c]), and see if you like it for your specific project.

With asn1c compiler, performance-wise ASN.1 will come closer to homegrown streaming parsing described above, though I have my doubts about ASN.1 being able to match simple streaming (mostly due to significantly higher number of allocations necessary to perform ASN.1 parsing); however, if anybody ever makes a benchmark comparing the two – please let me know, with asn1c it is not that obvious. Also if performance difference is small in the grand scheme of things (and even 2x performance difference in marshalling isn’t likely to be too visible in overall performance), other considerations such as development time, become more important, and here ASN.1 might (or might not) be a good choice depending on details of your project. One note of caution though: when speaking about development times, game developer’s time is more important than network-engine-developer time, so you’ll need to think what kind of IDL your game developers will prefer – the one like described above, or ASN.1’s (BTW, if they prefer custom-made simple IDL, you still can use ASN.1 underneath, providing compiler from your own IDL to ASN.1, which shouldn’t be too complicated).

Bottom line: while I personally don’t exactly like ASN.1, it might work for you (as long as you take into account considerations above).

8g. DO remember about Little-Endian/Big-Endian caveat

Endianness: “Big-endian systems store the most significant byte of a word in the smallest address... Little-endian systems, in contrast, store the least significant byte in the smallest address.” (Wikipedia)

When implementing your compose_*() / parse_*() functions (those which deal with multi-byte representations) in C/C++, you need to keep in mind that on different platforms the same integer is represented by a different sequence of bytes. For example, on so-called “little-endian” systems (notably x86), you’ll have (uint16_t)1234 represented as bytes 0xD2, 0x04. On so-called “big-endian” systems (mostly big iron such as AIX etc.), the same (uint16_t)1234 will be represented as bytes 0x04, 0xD2. That’s why if you simply write uint16_t x = 1234; send(socket,&x,2); , it will send different data on little-endian and big-endian platforms.

In practice, for games it is rarely a real problem, as most of the CPUs you’ll need to deal with, are little-endian (x86 is little endian, and while ARM can be either little-endian or big-endian, both iOS and Android currently make it little endian). However, for the sake of the correctness, it is better to keep it in mind and do one of the following:

(1) marshal data byte by byte (i.e. sending first x>>8, and then x&0xFF – this will produce the same results regardless of endianness).

(2) use #ifdef BIG_ENDIAN (or #ifdef __i386 etc.) stuff to produce different versions for different machines. NB: Strictly speaking, the BIG_ENDIAN macro is not sufficient to perform cast-based marshalling; on some architectures (notably SPARC) you cannot read unaligned data, so it won’t work. For ARMv7 and up CPUs the situation is even more complicated: while technically not all the instructions support misalignment, for marshalling code compilers tend to generate code which uses misalignment-safe instructions, so cast-based parsing works; still, I’d rather not use this trickery for ARMs at the moment.

(3) use functions such as htons() / ntohs() . NB: these functions produce so-called “network byte order”, which is big-endian (it just so happened).

The last option is usually recommended everywhere in the literature, but in practice it is not that obvious: as you have all your marshalling encapsulated, the second option ( #ifdef BIG_ENDIAN ) isn’t really too bad (and may save you a few clocks as you can keep it little-endian on 99% of your target machines). On the other hand, it is very unlikely that you’ll ever see any observable performance difference. What matters is to keep it in mind; the exact implementation doesn’t matter much.

Personally, when I am concerned about performance, I prefer the following approach: to have “generic” byte-by-byte version (the one which works everywhere regardless of endianness and doesn’t rely on ability to read unaligned data), and then to have cast-based specializations for platforms of specific interest (such as X86), for example:

    uint16_t parse_uint16(byte*& ptr) {
      // assuming little-endian order on the wire
    #if defined(__i386) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
      // cast-based specialization for x86/x64
      uint16_t ret = *(uint16_t*)ptr;
      ptr += 2;
      return ret;
    #else
      // generic byte-by-byte version, works regardless of endianness and alignment
      byte low = *ptr++;
      return low | ((uint16_t)(*ptr++)) << 8;
    #endif
    }

This way, you have a very reliable version which works everywhere (under “#else”), and another specialized slightly-faster-for-free version working on your platform of the most interest.
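For completeness, here is a sketch of the matching compose function plus a variation on the specialized branch. Instead of the pointer cast it uses std::memcpy, which is a substitution made for this example rather than the approach described above: memcpy is well-defined for any alignment, and optimizing compilers commonly reduce a fixed two-byte memcpy to a single load. The names parse_uint16_portable and check_parse_uint16 are made up, and neither function checks buffer bounds, so the caller has to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef unsigned char byte;

// Writes v low byte first (little-endian on the wire) on any host.
void compose_uint16(byte*& ptr, uint16_t v) {
    *ptr++ = static_cast<byte>(v & 0xFF);
    *ptr++ = static_cast<byte>(v >> 8);
}

// Generic version: works regardless of host endianness and alignment.
uint16_t parse_uint16_portable(const byte*& ptr) {
    const uint16_t low = *ptr++;
    const uint16_t high = *ptr++;
    return static_cast<uint16_t>(low | (high << 8));
}

// Specialized version for little-endian x86/x64, generic elsewhere.
uint16_t parse_uint16(const byte*& ptr) {
#if defined(__i386) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
    uint16_t ret;
    std::memcpy(&ret, ptr, sizeof ret);  // host order equals wire order here
    ptr += sizeof ret;
    return ret;
#else
    return parse_uint16_portable(ptr);
#endif
}

// Self-check: whichever branch got compiled must agree with the generic one.
inline void check_parse_uint16() {
    byte buf[2];
    byte* w = buf;
    compose_uint16(w, 0x1234);
    const byte* a = buf;
    const byte* b = buf;
    assert(parse_uint16(a) == parse_uint16_portable(b));
    assert(a == b);
}
```

Running a check like check_parse_uint16() once at startup or in unit tests is a cheap way to catch a specialization that was enabled on the wrong platform.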

As for the other programming languages (such as Java): while the underlying CPUs are still little-endian or big-endian, languages such as Java don’t let you observe the difference between the two, so this problem is non-existent there.

8h. DO Remember about Buffer Overwrites and Buffer Overreads

When implementing your parsing routines, make sure that they are not prone to malformed-packet attacks (i.e. that a malformed packet cannot cause a buffer overflow). For details, please see item #57 in Part VIIb. One additional thing to remember is that it is not only buffer overwrites which are dangerous: buffer overreads (for example, calling strlen() on a supposedly null-terminated string within the packet, when the string is in fact not null-terminated) can cause a core dump (0xC0000005 exception on Windows), most likely crashing your program.
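As an illustration of avoiding the strlen() overread, the sketch below parses a null-terminated string while never looking past the end of the packet. The function name parse_cstring and the choice to report a malformed packet by throwing are assumptions of this example; the key point is only that the search for the terminator is bounded by the packet size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Parses a null-terminated string from [ptr, end) and advances ptr past the
// terminator. Never reads at or beyond 'end'.
std::string parse_cstring(const unsigned char*& ptr, const unsigned char* end) {
    assert(ptr <= end);
    const std::size_t available = static_cast<std::size_t>(end - ptr);
    // Unlike strlen(), memchr() inspects at most 'available' bytes.
    const void* nul = std::memchr(ptr, 0, available);
    if (nul == nullptr)
        throw std::runtime_error("parse_cstring: string is not null-terminated");
    const unsigned char* term = static_cast<const unsigned char*>(nul);
    std::string result(reinterpret_cast<const char*>(ptr),
                       static_cast<std::size_t>(term - ptr));
    ptr = term + 1;  // skip the terminator
    return result;
}
```

The same pattern (check what is available before touching it) applies to every parse_*() function, fixed-size ones included.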

9. DO have a Separate Network Layer with a Well-Defined Interface

Whatever you’re doing with the network – it certainly deserves a separate library (within or alongside the rest of your game engine) to encapsulate all the network stuff you need. Even if this library is rudimentary now – it will likely evolve into something more complicated later. And your library should be separated enough from the rest of your engine. It means: “don’t mix 3D and networking together; the farther you can separate them – the better“. In general, your networking library SHOULD NOT depend on your graphics library, and vice versa. NB: for those who think that nobody will ever write a graphics engine which is tightly coupled to the networking engine – take a look at Gecko/Mozilla, you’ll be quite surprised.

One note of caution: the interface of your networking library needs to be tailored to the needs of your application (and MUST NOT blindly mimic TCP sockets or whatever other system-level API you’re using). The task you have in the game app is usually to send/receive a message (with or without guaranteed delivery), and the API of your library should reflect it. One example of a good (though not universal) abstraction is Unity 3D: their network API provides either message delivery or non-guaranteed state synchronization, both of which are good abstraction choices for the task they have in real-time games.

What else (beyond encapsulating system calls into your abstract API) belongs to the network layer? There is more than one way to skin this cat, but I usually include all the stuff which is related to delivering network messages into the “main thread” (see item #1 in Part I. Client Side), and to processing them there. Marshalling/unmarshalling (see item #8 above) IMHO also belongs to the network layer.

Oh, and it should go without saying that any system-level network calls may only appear within your network layer, and should never ever be used outside of it. The whole idea is to encapsulate your network layer and provide clean separation of concerns, isolating application-level from ever bothering with irrelevant communication details.
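To illustrate what a message-oriented (rather than socket-shaped) interface might look like, here is a deliberately tiny sketch. Every name in it (Message, Delivery, NetworkLayer, LoopbackNetwork, send, poll) is invented for this example, and a real engine would need addressing, connection state and error reporting on top. The point is only that the game code sees messages and delivery guarantees, never sockets.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Message = std::vector<uint8_t>;

// What the game asks for, not how the transport achieves it.
enum class Delivery { Guaranteed, BestEffort };

// Hypothetical interface exposed to the rest of the engine.
class NetworkLayer {
public:
    virtual ~NetworkLayer() = default;
    virtual void send(const Message& msg, Delivery mode) = 0;
    // Moves the next received message into 'out'; returns false if none is
    // pending. Intended to be called from the "main thread".
    virtual bool poll(Message& out) = 0;
};

// In-process loopback implementation: everything sent is received back, in
// order. Handy for unit-testing game logic without touching any system-level
// network API.
class LoopbackNetwork : public NetworkLayer {
public:
    void send(const Message& msg, Delivery) override {
        assert(!msg.empty());  // this sketch treats empty messages as a caller bug
        queue_.push_back(msg);
    }
    bool poll(Message& out) override {
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::deque<Message> queue_;
};
```

Because the game only ever talks to NetworkLayer, swapping the loopback for a TCP- or UDP-based implementation does not touch application code.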

10. DO understand what is going on “under the hood”

When developing your network engine, it looks very tempting (at least at first glance) to just use some kind of framework (such as TCP sockets), which will do things automagically for you, and to forget about it. However, if you want to keep players happy, it becomes a bit more complicated. In short: while using a framework is fine, forgetting about it is not. In practice it means that as soon as your team is over 2 people, it generally should have a dedicated network developer who knows how this magic really works.

In addition, the overall project architect MUST understand at least the most general limitations imposed by the Internet (such as the inherently non-guaranteed nature of IP packets, how guaranteed delivery is achieved over non-guaranteed delivery, typical round-trip times, etc.), and all the team members MUST understand that there are messages out there, going over the network, and that the messages can be arbitrarily delayed (in case of guaranteed message delivery) or lost (in case of non-guaranteed delivery).

It can be summarized in the following table:

| Team Member | Knowledge |
|---|---|
| Dedicated Network Developer | Everything about the library and underlying mechanisms |
| Overall Project Architect | General Limitations of the Network |
| All other team members | Messages-over-the-network and either potential delays or potential loss |

11. DON’T assume that all Users have the Same App Version (a.k.a. DO provide a way to Expand Game Protocol)

Despite having an auto-update (within your network library or otherwise), keep in mind that there are users who haven’t updated the app yet. Even if you’re forcing the auto-update every time the app starts, there are still users who were online at the moment of the update, and those who found a way to bypass game updates (the reasons for such bypassing can vary, from general dislike of updates to modding).

Two common ways to deal with it are:

(1) provide a mechanism for the app developer to embed an app protocol version into the app, check it on the server, and send those users with outdated clients away, hard-forcing them to update.

(2) provide a way to handle protocol differences in a “graceful degradation” style, disabling only that functionality which is not available with the client’s version of the protocol.

Going way (2) is generally more difficult, but provides some additional comfort for the end-users (that is, if it is done carefully). Generally, you should provide both these mechanisms within your engine, so the app developer can choose what she needs right now (and in the long run, they will often need both even during the life span of one single app).
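One way to picture how the two mechanisms can coexist on the server side is the sketch below. It assumes protocol versions are plain increasing integers and that the server knows both the oldest version it is still willing to talk to and the version it speaks natively; the type and function names are invented for this example. Treating a client that is newer than the server as fully served is also just an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class VersionDecision {
    ForceUpdate,  // way (1): send the user away to update
    Degraded,     // way (2): serve, but with newer functionality disabled
    FullService
};

struct ProtocolPolicy {
    uint16_t min_supported;  // oldest client version still accepted
    uint16_t current;        // version the server speaks natively
};

// Decides how to treat a client that reported 'client_version' at login.
VersionDecision check_client_version(const ProtocolPolicy& policy,
                                     uint16_t client_version) {
    assert(policy.min_supported <= policy.current);
    if (client_version < policy.min_supported)
        return VersionDecision::ForceUpdate;
    if (client_version < policy.current)
        return VersionDecision::Degraded;
    return VersionDecision::FullService;
}
```

Raising min_supported is then the knob for a hard-forced update, while leaving it alone keeps older clients working in degraded mode.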

One way to handle approach (2) is based on an observation that most of the protocol changes in a more or less mature app, are related to adding new fields to the protocol. It means that you can provide a generic function such as end_of_parsing_reached() for your marshalling layer, and then app developer can add these new fields to the end of the message, and have

    if( parser.end_of_parsing_reached() )
      additional_field = 1;
    else
      additional_field = parser.parse_int();

at the point of parsing of her potentially modified message. If using your own IDL (see item #8b above), it might look as

    <struct name="XYZ">
      <field name="abc" type="uint16" />
      <field name="def" type="uint16" />
      <field name="additional_field" type="uint16" default="1" />
    </struct>

with corresponding changes to compose() / parse() generation.
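A self-contained sketch of this “optional trailing field” technique could look as follows. The Parser here is a stripped-down stand-in (it works directly on a byte vector, low byte first), and parse_xyz() plays the role of the generated parse() for the XYZ struct with additional_field defaulting to 1. All of these details are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in parser: a reference to the bytes plus the current offset.
class Parser {
public:
    explicit Parser(const std::vector<uint8_t>& msg) : msg_(msg), offset_(0) {}

    // True once every byte of the message has been consumed.
    bool end_of_parsing_reached() const { return offset_ == msg_.size(); }

    uint16_t parse_uint16() {
        assert(offset_ <= msg_.size());  // internal invariant
        if (msg_.size() - offset_ < 2)
            throw std::runtime_error("parse_uint16: message too short");
        const uint16_t low = msg_[offset_];
        const uint16_t high = msg_[offset_ + 1];
        offset_ += 2;
        return static_cast<uint16_t>(low | (high << 8));
    }

private:
    const std::vector<uint8_t>& msg_;
    std::size_t offset_;
};

struct XYZ {
    uint16_t abc;
    uint16_t def;
    uint16_t additional_field;  // added in a later protocol version
};

// Accepts both the old (2-field) and the new (3-field) layout.
XYZ parse_xyz(Parser& parser) {
    XYZ v;
    v.abc = parser.parse_uint16();
    v.def = parser.parse_uint16();
    // An old sender stops here; fall back to the declared default of 1.
    v.additional_field = parser.end_of_parsing_reached()
                             ? uint16_t{1}
                             : parser.parse_uint16();
    return v;
}
```

Note that this only works when new fields are appended at the very end of the message, which is exactly the discipline game developers need to keep.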

This simple approach, with an “optional field” being added to the end of the message, works surprisingly well, though it requires game developers to be careful with how exactly the protocol is extended. Of course, not all the protocol changes can be handled in this manner, but if the app developer can handle 90% of her protocol updates in this manner, and reduce the number of hard-forced (1)-style updates ten-fold, her users might (or might not – depending on the amount of trouble careless updates have caused) appreciate her efforts.

To be continued…

Apparently, Part II became so large that it itself needed to be split. Stay tuned – Part IIb, which advances discussion about protocols and APIs beyond formats and marshalling, is coming next Monday.

EDIT: The series has been completed, with the following parts published:

Part IIb. Protocols and APIs (continued)

Part IIIa. Server-Side (Store-Process-and-Forward Architecture)

Part IIIb. Server-Side (deployment, optimizations, and testing)

Part IV. Great TCP-vs-UDP Debate

Part V. UDP

Part VI. TCP

Part VIIa. Security (TLS/SSL)

Part VIIb. Security (concluded)

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.