Implementing the Binary Memcached Protocol with OCaml and Bitstring

I’ve known OCaml for a while, but I never really put it to use outside of academic work back in university. After reading Real World Ocaml my excitement for Ocaml reappeared and I wanted to try out Core and Async. At the back of my mind, I also remembered bitstring, an awesome Ocaml library for creating and pattern matching on bitstrings. To cover all three, I decided to write a Memcached client using the binary protocol.

I’ll try to cover my experience in a number of blog posts. In this first post, I’ll be talking about using bitstring for parsing and constructing binary data for the Memcached protocol.

Constructing Requests and Parsing Responses

I started out by studying the Memcached binary protocol. The general format of a packet is a fixed size header followed by three optional variable size components (command-specific extras, the key and the value):

Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0/ HEADER / / / / / / / +---------------+---------------+---------------+---------------+ 24/ Command-specific extras (as needed) / +/ (note length in the extras length header field) / +---------------+---------------+---------------+---------------+ m/ Key (as needed) / +/ (note length in key length header field) / +---------------+---------------+---------------+---------------+ n/ Value (as needed) / +/ (note length is total body length header field, minus / +/ sum of the extras and key length body fields) / +---------------+---------------+---------------+---------------+

The header is almost similar for requests and responses and can be described in the following way:

Request header: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Reserved | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+ Total 24 bytes Response header: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key Length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Status | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+ Total 24 bytes

Let’s go through the header fields one by one:

Magic is a magic byte which distinguishes requests from responses. It must be 0x80 for requests and 0x81 for responses.

is a magic byte which distinguishes requests from responses. It must be for requests and for responses. Opcode specifies the type of command, e.g. GET is 0 and SET is 1 .

specifies the type of command, e.g. is and is . Key Length specifies the size in bytes of the Key -part of the packet.

specifies the size in bytes of the -part of the packet. Extras length specifies the size in bytes of the Command-specific extras -part of the packet.

specifies the size in bytes of the -part of the packet. Data type is currenly not used and should always be 0 .

is currenly not used and should always be . Bytes 7-8 is either Reserved in the request or Status in the response. The status corresponds to an error code or 0 if successful.

in the request or in the response. The status corresponds to an error code or if successful. Total body length is the total size of the body in bytes, i.e. Total body length = Key length + Key length + Value length . The value length is not given explicitly in the header, but can be calculated as Total body length - Key length - Extras length .

is the total size of the body in bytes, i.e. . The value length is not given explicitly in the header, but can be calculated as . Opaque is 4 bytes which are passed along in a request and passed back unmodified in the response.

is 4 bytes which are passed along in a request and passed back unmodified in the response. CAS is used for data version checking. When storing a value you can optionally set its CAS value. Future modifications will be rejected if a matching CAS value is not provided.

As you can see, the only real difference between a request and a response header is that bytes 7-8 are reserved in the request header and used for status in the response header. Ignoring this difference, we can define a unified type for Memcached headers:

type header = { opcode : int ; (* maps to Memcached commands, e.g. GET is 0 *) key_length : int ; (* length of the key component *) extras_length : int ; (* length of the extras component *) status : int ; (* should not be used in requests *) body_length : int32 ; (* total length of the body *) opaque : int32 ; (* is returned unmodified in the response *) cas : int64 ; (* used for data version checking *) }

I’ve omitted data type, since it’s currently not used as noted above.

Now that we have a header type, we just need to figure out how to marshal and unmarshal it from binary data. We’ll use the bitstring library for this purpose. It provides a very convenient way to construct binary data and pattern match on binary data, reminiscent of bit syntax in Erlang. Given a header, we can construct a bitstring like so:

(* header_to_bitstring : header -> bitstring *) let header_to_bitstring h = BITSTRING { 0x80 : 8 ; h . opcode : 8 ; h . key_length : 16 ; h . extras_length : 8 ; 0 : 8 ; h . status : 16 ; h . body_length : 32 ; h . opaque : 32 ; h . cas : 64 }

BITSTRING { ... } will return a value of type bitstring . Each line inside the braces has the form value : size . The first line thus says the first 8 bits should hold the value 0x80 (the magic byte for requests).

Given a bitstring we can similarly parse it to a header in the following manner using the bitmatch operator:

(* header_of_bistring : bitstring -> header option *) let header_of_bitstring bits = bitmatch bits with | { 0x81 : 8 ; opcode : 8 ; key_length : 16 : unsigned ; extras_length : 8 : unsigned ; 0 : 8 ; status : 16 ; body_length : 32 : unsigned ; opaque : 32 ; cas : 64 } -> Some { opcode ; key_length ; extras_length ; status ; body_length ; opaque ; cas ; } | { _ } -> None

bitmatch bits with [patterns] will match bits of type bitstring against each pattern, which follows the same syntax as for BITSTRING . As you can tell, it’s much simpler to build and parse binary strings with bitstring compared to manual bit fiddling!

Both of these functions work with the type bitstring defined in the bitstring library. So how do we read a bitstring from IO or write a bitstring to IO? The bitstring library defines a number of functions which interact with channels (channels are the IO abstraction in the standard library):

(* read all of a chan to a bitstring *) bitstring_of_chan : in_channel -> bitstring (* read at most n bytes of a chan to a bitstring *) bitstring_of_chan_max : in_channel -> int -> bitstring (* write a bitstring to a chan *) bitstring_to_chan : bitstring -> out_channel -> unit

Given those, it should be simple to parse a header from a channel:

(* read_header : in_channel -> header *) let read_header channel = let bits = Bitstring . bitstring_of_chan_max channel 24 in header_of_bitstring bits (* write_header : out_channel -> header -> unit *) let write_header channel header = let bits = header_to_bitstring header in Bitstring . bitstring_to_chan bits channel

At this point we can read and write Memcached headers to and from channels. To handle complete Memcached packets, we need to consider the variable sized body too. Let’s define a packet type:

type packet = { header : header ; extras : bitstring ; key : string ; value : string ; }

The key and value are strings, while the extras contain binary data (a bitstring). We’re now ready to read and write entire packets from channels. Here’s the code:

(* make_request_packet : int -> string -> string -> bitstring -> packet *) let make_request_packet opcode key value extras = let extras_length = ( Bitstring . bitstring_length extras ) / 8 in let key_length = String . length key in let value_length = String . length value in let body_length = Int32 . of_int ( extras_length + key_length + value_length ) in let header = { opcode ; key_length ; extras_length = extras_length ; status = 0 ; body_length = body_length ; opaque = 0 l ; cas = 0 L ; } in { header ; extras ; key ; value } (* write_packet : out_channel -> packet -> unit *) let write_packet channel packet = let header_bits = header_to_bitstring packet . header in Bitstring . bitstring_to_chan header_bits channel ; Bitstring . bitstring_to_chan packet . extras channel ; output_string channel packet . key ; output_string channel packet . value (* read_body : header -> in_channel -> packet option *) let read_body header channel = let body_length = Int32 . to_int header . body_length in let value_length = body_length - header . extras_length - header . key_length in let bits = Bitstring . bitstring_of_chan_max channel body_length in bitmatch bits with | { extras : 8 * header . extras_length : bitstring ; key : 8 * header . key_length : string ; value : 8 * value_length : string } -> Some { header ; extras ; key ; value ; } | { _ } -> None (* read_response_packet : in_channel -> packet option *) let read_response_packet channel = match read_header channel with | Some header -> read_body header channel | None -> None

As a trivial example, we can now send a GET request for the key foo to a Memcached server running on localhost port 11211:

let localhost = Unix . inet_addr_of_string "127.0.0.1" in let addr = Unix . ADDR_INET ( localhost , 11211 ) in let in_chan , out_chan = Unix . open_connection addr in let req = make_request_packet 0 "foo" "" Bitstring . empty_bitstring in write_packet out_chan req ; flush out_chan ; match read_response_packet in_chan with | Some packet when packet . header . status = 0 -> Format . printf "Found value: %s

" packet . value | Some packet -> Format . printf "Key not found" | None -> print_endline "Request failed" ; Unix . shutdown_connection in_chan

Obvious this is still a very low level interface, but it’s easy to build on top of the building blocks we’ve implements. Defining functions with a simpler signature to get and set keys should be easy, but I’ll leave that as an exercise for the reader.

Next time…

In this blog post we’ve developed a primitive OCaml library for talking to Memcached with the binary protocol using bitstring. The code is based on the standard library, which is honest not that great, and all IO is synchronous. In the next installment we’ll try using Core and Async for asynchronous IO. Stay tuned!