Over the past year, prior to leaving 343, I spent a large amount of time working with the WebSocket protocol and upgrading the Halo Services to support it. To solidify my knowledge and provide a handy refresher for when this information inevitably gets context switched out of my brain, I decided to write a primer on WebSockets. Hopefully other people will find this introduction to the protocol useful as well.

Overview

In December 2011 the IETF standardized the WebSocket protocol as RFC 6455. Unlike the typical request/response messaging pattern provided by HTTP, this network protocol provides a full-duplex communication channel between a server and a client over a single TCP connection. This enables server-sent events, reactive user experiences, and real-time components.

The WebSocket protocol provides some advantages over traditional HTTP. Once the connection has been established, there is a point-to-point channel over which both endpoints can send to one another simultaneously. This enables server-sent events without a workaround like Comet or Long Polling. While those techniques work well, they carry the overhead of HTTP, whereas WebSocket frames have a wire-level overhead of as little as two bytes per frame. The full-duplex communication and low per-frame overhead make it an ideal protocol for real-time, low-latency experiences.

An important note: the WebSocket protocol is not layered on top of HTTP, nor is it an extension of the HTTP protocol. It is a lightweight protocol layered on top of TCP. The only part HTTP plays is in establishing a WebSocket connection via the HTTP Upgrade request. The HTTP Upgrade request is also not specific to WebSockets; it can be used to support other handshakes or upgrade mechanisms that reuse the underlying TCP connection.

Open a WebSocket Connection

A client can establish a WebSocket connection by initiating a client handshake request. As mentioned above the HTTP Upgrade request is used to initiate a WebSocket connection.

    GET /chat HTTP/1.1
    Host: server.example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
    Origin: http://example.com
    Sec-WebSocket-Protocol: chat, superchat
    Sec-WebSocket-Version: 13

If all goes well on the server and the request can be accepted, the server handshake is returned.

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

If an error occurs and the server cannot accept the request, then it should return an appropriate HTTP error status (for example 400 Bad Request, or 500 for a server-side failure). Any status code other than 101 indicates that the handshake has failed and that the protocol is still HTTP.

Once the client/server handshake is complete, the TCP connection used to make the initial HTTP request has been upgraded to a WebSocket connection. Messages can now be sent in either direction, from the client to the server or from the server to the client.
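The Sec-WebSocket-Accept value in the server handshake is derived from the Sec-WebSocket-Key the client sent: the server appends a fixed GUID defined by RFC 6455, hashes the result with SHA-1, and Base64-encodes the hash. Libraries normally do this for you, so the following is only a minimal sketch for illustration; the class and method names (HandshakeSketch, ComputeAccept) are made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HandshakeSketch
{
    // GUID fixed by RFC 6455; every server appends this same value.
    private const string WebSocketGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
    public static string ComputeAccept(string secWebSocketKey)
    {
        byte[] input = Encoding.ASCII.GetBytes(secWebSocketKey + WebSocketGuid);
        using (SHA1 sha1 = SHA1.Create())
        {
            return Convert.ToBase64String(sha1.ComputeHash(input));
        }
    }
}
```

Passing the key from the sample request above, dGhlIHNhbXBsZSBub25jZQ==, produces the s3pPLMBiTxaQ9kYGzzhZRbK+xOo= value shown in the sample response.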

Code

For developers, most of the nuances of the WebSocket handshake are hidden away by platform-specific APIs and SDKs. In the .NET world, Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol, as did Internet Explorer 10. A variety of other platforms support WebSockets as well.

Client

Using the .NET 4.5 Framework the client code to establish a WebSocket connection in C# would look like this.

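    using System;
    using System.Net.WebSockets;
    using System.Threading;

    // Create the client socket and perform the opening handshake.
    ClientWebSocket webSocket = new ClientWebSocket();
    await webSocket.ConnectAsync(new Uri("ws://localhost/Echo"), CancellationToken.None);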

Once the connection succeeds on the client the ClientWebSocket object can be used to receive and send messages.

Server

Using the .NET 4.5 Framework on a simple server built on HttpListener, the C# code to accept a WebSocket request and complete the handshake would look like this.

    HttpListenerContext listenerContext = await httpListener.GetContextAsync();
    if (listenerContext.Request.IsWebSocketRequest)
    {
        // Completes the server handshake; null means no sub-protocol is selected.
        WebSocketContext webSocketContext = await listenerContext.AcceptWebSocketAsync(subProtocol: null);
        WebSocket webSocket = webSocketContext.WebSocket;
    }
    else
    {
        // Return a 426 - Upgrade Required status code
        listenerContext.Response.StatusCode = 426;
        listenerContext.Response.Close();
    }

The call to AcceptWebSocketAsync completes after the server handshake has been returned to the client. At this point the WebSocket object can be used to send and receive messages.

WebSocket Messages

WebSocket messages are transmitted in “frames.” Each frame has a header containing an opcode and a payload length, followed by the payload data. The header is between 2 and 14 bytes long, which is much smaller than text-based HTTP headers.

Headers

Bit layout of the frame header, one 16-bit row per line (rows that do not apply are omitted from the frame):

    Row   Bits 0-F
    1     Final (bit 0), Reserved Bits (bits 1-3), OpCode (bits 4-7), Mask (bit 8), Payload Indicator (bits 9-F)
    2     Extended payload length (present if payload is longer than 125 bytes)
    3-5   Extended payload length (present if payload length is >= 2^16)
    6-7   MaskingKey (present if masking bit is set)

The first 9 bits sent in every WebSocket frame are defined as follows:

Final Bit (1 bit) – Indicates whether the frame is the final fragment of a message, as a large message can be broken up and sent over multiple frames. A message that is one frame long would also set this bit to 1.

Reserved (3 bits) – These must be 0, and are currently reserved for extensions.

OpCodes (4 bits) – Opcodes define how the payload data should be interpreted.

Masking (1 bit) – Indicates if the payload data is masked. The WebSocket protocol specifies that all messages sent from a client to a server must be XOR masked.
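To make the bit positions concrete, here is a small sketch that decodes the first two bytes of a frame into the fields listed above, plus the 7 payload indicator bits that follow the masking bit. The FramePrefix type and its member names are hypothetical and not part of any .NET API; real applications would normally leave this parsing to a WebSocket library.

```csharp
using System;

public sealed class FramePrefix
{
    public bool Final;            // 1 bit: last fragment of the message
    public int Reserved;          // 3 bits: expected to be 0 unless an extension is in use
    public int OpCode;            // 4 bits: how to interpret the payload
    public bool Masked;           // 1 bit: payload is XOR masked
    public int PayloadIndicator;  // 7 bits: 0-125, 126, or 127

    // Splits the first two header bytes into their individual fields.
    public static FramePrefix Parse(byte first, byte second)
    {
        return new FramePrefix
        {
            Final = (first & 0x80) != 0,
            Reserved = (first >> 4) & 0x07,
            OpCode = first & 0x0F,
            Masked = (second & 0x80) != 0,
            PayloadIndicator = second & 0x7F
        };
    }
}
```

For example, FramePrefix.Parse(0x81, 0x85) describes a final, masked text frame (OpCode 0x1) with a payload indicator of 5.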

The variable length of a WebSocket header is based on the size of the payload and the masking key:

Payload Length (7 bits, 7 + 16 bits, 7 + 64 bits) – Bits 10-16 of the header are the payload indicator bits. The number of bits used to encode the payload length varies based on the size of the payload data.

    0-125 bytes: the payload length is encoded in the payload indicator bits.
    126-65,535 bytes: the payload indicator bits are set to 126, and the next two bytes are used to encode the payload length.
    >65,535 bytes: 127 is encoded in the payload indicator bits, and the next 8 bytes are used to specify the payload length.

Masking-key (0 or 32 bits) – If the masking bit is set, then the 32-bit value used to mask the payload is specified in this field. If the masking bit is not set, then this field is omitted.
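The length rules and masking described above can be sketched in a few lines of C#. This is an illustrative sketch rather than production code: FrameEncodingSketch, BuildHeader, and ApplyMask are hypothetical names, and the sketch only builds single, final frames. It relies on two details from RFC 6455 that are not spelled out above: extended lengths are written in network byte order (most significant byte first), and byte i of the payload is XORed with byte i mod 4 of the masking key.

```csharp
using System;
using System.Collections.Generic;

public static class FrameEncodingSketch
{
    // Builds the header for a single, final frame. Pass null as maskingKey
    // for an unmasked (server to client) frame, or 4 bytes for a masked one.
    public static byte[] BuildHeader(int opCode, long payloadLength, byte[] maskingKey)
    {
        var header = new List<byte>();
        header.Add((byte)(0x80 | (opCode & 0x0F)));  // Final bit set, reserved bits 0
        int maskBit = maskingKey != null ? 0x80 : 0x00;

        if (payloadLength <= 125)
        {
            header.Add((byte)(maskBit | (int)payloadLength));
        }
        else if (payloadLength <= 65535)
        {
            header.Add((byte)(maskBit | 126));
            header.Add((byte)((payloadLength >> 8) & 0xFF));
            header.Add((byte)(payloadLength & 0xFF));
        }
        else
        {
            header.Add((byte)(maskBit | 127));
            for (int shift = 56; shift >= 0; shift -= 8)
            {
                header.Add((byte)((payloadLength >> shift) & 0xFF));
            }
        }

        if (maskingKey != null)
        {
            header.AddRange(maskingKey);
        }
        return header.ToArray();
    }

    // XORs each payload byte with the masking key. Applying the same key
    // a second time restores the original bytes.
    public static byte[] ApplyMask(byte[] payload, byte[] maskingKey)
    {
        var result = new byte[payload.Length];
        for (int i = 0; i < payload.Length; i++)
        {
            result[i] = (byte)(payload[i] ^ maskingKey[i % 4]);
        }
        return result;
    }
}
```

As a usage example, BuildHeader(0x2, 256, null) yields the four bytes 0x82 0x7E 0x01 0x00, while a masked frame with a payload of 65,536 bytes or more reaches the maximum header size of 14 bytes.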

OpCodes

The following table defines the WebSocket frame OpCodes. Applications should only set the Text or Binary OpCodes to specify how the payload data in the frame is interpreted.

    Code   Meaning                  Description
    0x0    Continuation Frame       The payload in this frame is a continuation of the message sent in a previous frame that did not have its final bit set
    0x1    Text Frame               Application Specific – The payload is encoded in UTF-8
    0x2    Binary Frame             Application Specific – The payload is a binary blob
    0x8    Close Connection Frame   Specifies that the WebSocket connection should be closed
    0x9    Ping Frame               Protocol Specific – sent to check that the other endpoint is still available
    0xA    Pong Frame               Protocol Specific – response sent after receiving a ping frame. Unsolicited pong messages can also be sent.
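If you ever need to work with raw opcodes, for example when reading a packet capture, the table maps naturally onto an enum. The sketch below is only illustrative; FrameOpCode and IsControl are hypothetical names, not .NET types. It uses the fact from RFC 6455 that the protocol-specific (control) opcodes are the ones with the most significant bit of the 4-bit opcode set.

```csharp
using System;

public enum FrameOpCode : byte
{
    Continuation = 0x0,
    Text = 0x1,
    Binary = 0x2,
    Close = 0x8,
    Ping = 0x9,
    Pong = 0xA
}

public static class FrameOpCodeExtensions
{
    // Control frames (Close, Ping, Pong) have the high bit of the opcode set;
    // data frames (Continuation, Text, Binary) do not.
    public static bool IsControl(this FrameOpCode opCode)
    {
        return ((byte)opCode & 0x8) != 0;
    }
}
```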

Code

Sending and receiving WebSocket messages is easy using the .NET Framework APIs.

Receiving a Message

    byte[] receiveBuffer = new byte[receiveBufferLength];
    while (webSocket.State == WebSocketState.Open)
    {
        // Each call fills receiveBuffer with the payload bytes that were read.
        WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(
            new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
    }

The WebSocketReceiveResult object describes the data read by one call to ReceiveAsync: the message type (MessageType, which reflects the OpCode), whether the final bit was set (EndOfMessage), the number of payload bytes written to the buffer (Count), and the CloseStatus and CloseStatusDescription if it's a Close Connection Frame. The receiveBuffer will be populated with the data sent in the payload.
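Because a message can span several frames, and a single receive call may not return the whole message, a receiver usually keeps reading until EndOfMessage is set. The following is a minimal sketch of that loop using the System.Net.WebSockets types; the helper name ReceiveFullMessageAsync and the choice to return null when the peer sends a Close frame are assumptions made for this example, not part of the framework.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class ReceiveSketch
{
    // Reads until EndOfMessage is set and returns the assembled payload,
    // or null if the peer sent a Close frame instead of data.
    public static async Task<byte[]> ReceiveFullMessageAsync(
        WebSocket webSocket, int bufferSize, CancellationToken cancellationToken)
    {
        var buffer = new byte[bufferSize];
        using (var message = new MemoryStream())
        {
            WebSocketReceiveResult result;
            do
            {
                result = await webSocket.ReceiveAsync(
                    new ArraySegment<byte>(buffer), cancellationToken);

                if (result.MessageType == WebSocketMessageType.Close)
                {
                    // The caller decides how to complete the closing handshake.
                    return null;
                }

                // Only the first result.Count bytes of the buffer are valid.
                message.Write(buffer, 0, result.Count);
            }
            while (!result.EndOfMessage);

            return message.ToArray();
        }
    }
}
```

Buffering the whole message in memory keeps the sketch short; a server that accepts large messages would likely want to cap the total size and close with a Message Too Big status instead.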

Sending a Message

Sending a message is also simple, and an async method is provided in the .NET 4.5 Framework. The code below echoes the message received back over the channel. The data, Message Type, and Final Bit are specified in the parameter list.

    await webSocket.SendAsync(
        new ArraySegment<byte>(receiveBuffer, 0, receiveResult.Count),
        WebSocketMessageType.Binary,
        receiveResult.EndOfMessage,
        CancellationToken.None);

Close a WebSocket Connection

Either endpoint can close the WebSocket connection. To do this, the endpoint starts the WebSocket Closing Handshake. The initiating endpoint sends a WebSocket frame with its OpCode set to the Close Connection Frame (0x8), carrying a closing status code and an optional close reason (text). The other endpoint replies with a Close frame of its own, after which the WebSocket connection is closed by closing the underlying TCP connection.

As an application developer, it is important to note that either endpoint, server or client, can initiate the closing handshake. Practically, this means both endpoints need to handle receiving the close frame. It also means that some messages may not be delivered if the connection is closed while they are in transit.

Connection Close Code

Connection Close frames should include a status code, which indicates the reason the WebSocket connection was closed. These are somewhat analogous to HTTP Status Codes.

    Code        Definition             Description
    1000        Normal Closure         The purpose for which the connection was established has been fulfilled
    1001        Endpoint Unavailable   A server is going down, or a browser has navigated away from a page
    1002        Protocol Error         The endpoint received a frame that violated the WebSocket protocol
    1003        Invalid Message Type   The endpoint has received data that it does not understand. Endpoints which only understand text may send this if they receive a binary message, and vice versa
    1004-1006   Reserved               Reserved; 1004 is set aside for future use, and 1005 and 1006 are never sent in a Close frame
    1007        Invalid Payload Data   The payload contained data that was not consistent with the type of message
    1008        Policy Violation       Endpoint received a message that violates its policy
    1009        Message Too Big        Endpoint received a message that is too big for it to process
    1010        Mandatory Extension    The client is terminating the connection because it expected the server to negotiate one or more extensions
    1011        Internal Error         The server is terminating the connection because it encountered an unexpected error
    1015        TLS Handshake          Used to designate that the connection closed because the TLS handshake failed

Connection Close Code Ranges

    Code        Definition
    0-999       Not used
    1000-2999   Reserved for use by the protocol definition
    3000-3999   Reserved for use by libraries, frameworks, and applications. These should be registered with IANA
    4000-4999   Reserved for private use and can’t be registered
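When logging or validating close codes, it can help to know which of these ranges a code falls into. The sketch below simply encodes the ranges from the table; CloseCodeRange and CloseCodeSketch are hypothetical names introduced for this example, and treating anything outside 0-4999 as OutOfRange is an assumption of the sketch.

```csharp
using System;

public enum CloseCodeRange
{
    NotUsed,          // 0-999
    ProtocolDefined,  // 1000-2999
    Registered,       // 3000-3999, registered with IANA
    PrivateUse,       // 4000-4999
    OutOfRange        // anything else (assumption of this sketch)
}

public static class CloseCodeSketch
{
    // Maps a close status code to the range it belongs to.
    public static CloseCodeRange Classify(int code)
    {
        if (code < 0 || code > 4999) return CloseCodeRange.OutOfRange;
        if (code <= 999) return CloseCodeRange.NotUsed;
        if (code <= 2999) return CloseCodeRange.ProtocolDefined;
        if (code <= 3999) return CloseCodeRange.Registered;
        return CloseCodeRange.PrivateUse;
    }
}
```

An application that defines its own close codes would pick them from the 4000-4999 range, which Classify reports as PrivateUse.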

Code

Once again, most of the details are handled by the WebSocket library in your framework of choice. Application developers must decide when the connection should be closed, should set the appropriate connection close code, and may also set a connection close reason.

The .NET Framework makes this very easy by providing an asynchronous method that takes the connection close code and close reason as parameters.

await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Normal Closure", CancellationToken.None);

Microsoft WebSocket Implementations

As mentioned before, Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol. Because the Xbox One runs a variant of the Windows 8 operating system, it also has built-in support for WebSockets.

.NET 4.5

Version 4.5 of the .NET Framework introduced support for WebSockets through the System.Net.WebSockets namespace. The underlying connection passes through HTTP.sys in the kernel, so timeout settings in the HTTP.sys layer might still apply.

WinRT

WinRT only exposes APIs for creating a WebSocket client connection. There are two classes to do this in the Windows.Networking.Sockets namespace, MessageWebSocket & StreamWebSocket.

Win32 (WinHTTP)

The WinRT API is also available to C++ developers. For developers that want more control, WinHTTP provides a set of APIs for sending WebSocket upgrade requests, and for sending and receiving data on WebSocket connections.

JavaScript

The latest versions of all common browsers, with the exception of the Android browser, support the WebSocket protocol and the WebSocket API as defined by the W3C.

SignalR

The ASP.NET team has built a high-level bi-directional communication API called SignalR. Under the hood, SignalR picks the best protocol to use based on the capabilities of the clients. If WebSockets are available it prefers that protocol; otherwise it falls back to other HTTP techniques like Comet and Long Polling. SignalR has support for multiple platforms, including .NET, JavaScript, and iOS and Android via Xamarin. It is an open source project on GitHub.

Conclusion

WebSockets are a great new protocol for powering real-time applications and reactive user experiences, thanks to their lightweight headers and bi-directional communication. They are also a great fit for implementing Pub/Sub messaging patterns between servers and clients. However, WebSockets are not a silver bullet for networked communications; they are incredibly powerful but have their drawbacks. For instance, because WebSockets require a persistent connection, they consume resources on the server and require the server to manage state. HTTP and RESTful APIs are still incredibly useful and valid in many scenarios, and developers should consider how their APIs and applications will be used when choosing which protocol to use.
