Picturing WebSocket Protocol Packets


I recently wrote a WebSocket server in Erlang. I've grown fond of splitting even desktop apps into two programs, one to handle the graphics and interface and one for the core logic, which communicate over a local socket. These days it makes sense to use a browser for the first of these, with a WebSocket connecting it to the external program. The only WebSocket code I could find for Erlang depended on existing web server packages, which is why I wrote my own.

The WebSocket spec (RFC 6455) contains this diagram describing the messages between the client and server:

 0               1               2               3
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 4               5               6               7
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
 8               9               10              11
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
 12              13              14              15
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

This is a confusing diagram for a number of reasons. The ASCII art, for example, makes it hard to see which lines contain data and which are byte numbers. When I first looked at it, it made me think there was more overhead than there actually is. That's unfortunate, because WebSocket protocol packets have a simplicity that's hard to extract from the above image, and that simplicity is what I want to demonstrate.

Here's the fixed part of the header, the 16 bits that are always present. It is followed by extra length and masking-key fields, if needed, and then the data itself. The number of bits is shown below each field. Keep coming back to this for reference.

[Figure: the fixed 16-bit header, with field widths F (1), three reserved bits (3), Op (4), M (1), Len (7).]

F = 1 means this is a complete, self-contained packet. Assume it's always 1 for now. The main use of the opcode (Op) is to specify whether the data is UTF-8 text (opcode 1) or binary (opcode 2). M = 1 signals that the data needs to be exclusive-ORed with a 32-bit mask. The length (Len) has three different encodings depending on how much data there is.
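Pulling those fields out of the two fixed bytes is just shifting and masking. Here is a minimal Python sketch, assuming the two bytes have already been read off the socket; `parse_fixed_header` and `apply_mask` are hypothetical names for illustration, not taken from the Erlang server described above.

```python
# Hypothetical helpers; a sketch of the fixed header layout, not a full parser.

def parse_fixed_header(b0: int, b1: int) -> dict:
    """Split the two always-present header bytes into their fields."""
    return {
        "fin": b0 >> 7,             # 1 bit: 1 = complete (final) packet
        "rsv": (b0 >> 4) & 0b111,   # 3 reserved bits, 0 unless an extension defines them
        "opcode": b0 & 0x0F,        # 4 bits: 1 = UTF-8 text, 2 = binary
        "mask": b1 >> 7,            # 1 bit: 1 = payload is XORed with a 32-bit key
        "len": b1 & 0x7F,           # 7 bits: the length, or escape value 126/127
    }


def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; applying it twice restores the data."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))
```

For example, `parse_fixed_header(0x81, 0xBC)` describes a masked 60-byte text packet: F = 1, Op = 1, M = 1, Len = 60. Because XOR is its own inverse, the same function both masks and unmasks.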

Messages to the server are required to have a mask, so here's what packets look like for each of the three length encodings.

[Figure: masked client-to-server packets for payloads of 60, 14,075, and 18,000,000 bytes.]

The first has a length of 60 bytes, the second 14,075, and the third 18,000,000. Lengths up to 125 fit directly in the 7-bit Len field; the escape values 126 and 127 indicate that an additional 16-bit or 64-bit length field follows.
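The three length encodings fit in a few lines. This is a minimal Python sketch under my own naming (`encode_length` and `client_header_size` are hypothetical); it computes the header size of a masked client packet for the three lengths mentioned above, counting the 4-byte masking key.

```python
# Hypothetical helpers illustrating the three Len encodings.

def encode_length(n: int, masked: bool) -> bytes:
    """Return the second header byte (M + Len) plus any extended length bytes."""
    m = 0x80 if masked else 0x00
    if n <= 125:
        return bytes([m | n])                           # fits in the 7-bit Len field
    if n <= 0xFFFF:
        return bytes([m | 126]) + n.to_bytes(2, "big")  # escape 126: 16-bit length follows
    return bytes([m | 127]) + n.to_bytes(8, "big")      # escape 127: 64-bit length follows


def client_header_size(n: int) -> int:
    """Header bytes for a masked client-to-server packet carrying n bytes."""
    # 1 byte for F/RSV/Op, then the length encoding, then the 4-byte masking key.
    return 1 + len(encode_length(n, masked=True)) + 4


for n in (60, 14_075, 18_000_000):
    print(n, client_header_size(n))   # header sizes: 6, 8, 14
```

So even an 18 MB message costs only 14 bytes of header, which is the low overhead the spec diagram obscures.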

Packets from the server to the client aren't masked, so the headers are four bytes shorter. Again, for the same three data lengths:

[Figure: unmasked server-to-client packets for the same three payload lengths.]
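Building an unmasked server-to-client packet is the mirror image. A minimal Python sketch, assuming the whole payload is already in memory; `encode_server_frame` is a hypothetical name, and the RSV bits are simply left at 0.

```python
# Hypothetical helper: build one unmasked (server-to-client) packet.

def encode_server_frame(payload: bytes, opcode: int = 0x1, fin: bool = True) -> bytes:
    """Return header + payload; M is 0 and no masking key is sent."""
    b0 = (0x80 if fin else 0x00) | (opcode & 0x0F)   # F bit, RSV = 0, opcode
    n = len(payload)
    if n <= 125:
        header = bytes([b0, n])                                # 2-byte header
    elif n <= 0xFFFF:
        header = bytes([b0, 126]) + n.to_bytes(2, "big")       # 4-byte header
    else:
        header = bytes([b0, 127]) + n.to_bytes(8, "big")       # 10-byte header
    return header + payload


for n in (60, 14_075, 18_000_000):
    frame = encode_server_frame(bytes(n), opcode=0x2)
    print(n, len(frame) - n)   # header sizes: 2, 4, 10
```

For a short text message the whole header is two bytes: `encode_server_frame(b"hi")` yields `b"\x81\x02hi"`.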

The remaining piece is what fragmented messages look like. The F bit is 1 only for the Final packet. The initial packet carries the opcode; the others have 0 in the opcode field.

[Figure: one 8256-byte message sent as three packets of 4096, 4096, and 64 bytes.]

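This message is 8256 bytes in total: two packets of 4096 bytes and one of 64. Notice how different length encodings are used, just like in the earlier examples: each 4096-byte packet needs the 16-bit extended length, while the 64-byte packet fits in Len itself.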
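The fragmentation rules (opcode only in the first packet, F = 1 only in the last) can be sketched in Python as follows. This assumes unmasked server-to-client packets and a fixed chunk size of 4096 chosen to reproduce the 8256-byte example; `encode_frame` and `fragment` are hypothetical names, and the binary opcode (2) is an arbitrary choice.

```python
from typing import List

# Hypothetical helpers; unmasked packets, fixed chunk size.

def encode_frame(payload: bytes, opcode: int, fin: bool) -> bytes:
    """One unmasked packet, using the same three length encodings."""
    b0 = (0x80 if fin else 0x00) | (opcode & 0x0F)
    n = len(payload)
    if n <= 125:
        header = bytes([b0, n])
    elif n <= 0xFFFF:
        header = bytes([b0, 126]) + n.to_bytes(2, "big")
    else:
        header = bytes([b0, 127]) + n.to_bytes(8, "big")
    return header + payload


def fragment(payload: bytes, opcode: int, chunk_size: int) -> List[bytes]:
    """Split one message: the first packet carries the opcode, the rest use 0
    (continuation), and only the last packet has F = 1."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    chunks = chunks or [b""]   # an empty message is still one packet
    last = len(chunks) - 1
    return [
        encode_frame(chunk, opcode if i == 0 else 0x0, i == last)
        for i, chunk in enumerate(chunks)
    ]


frames = fragment(bytes(8256), opcode=0x2, chunk_size=4096)
for f in frames:
    print(hex(f[0]), len(f))
# 0x2 4100
# 0x0 4100
# 0x80 66
```

The first byte of each packet shows the rules at a glance: 0x02 (F = 0, Op = 2), then 0x00 (F = 0, continuation), then 0x80 (F = 1, continuation).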

(If you liked this, you might enjoy Exploring Audio Files with Erlang.)

November 14, 2016
