I used bitstring to reverse engineer the Windows registry “hive” format. I know that bitstring is my own program, but coming back to it two years after I wrote it and using it again for this, I really think it is a brilliant tool. (Bitstring wasn’t my idea; it was inspired by the bitstring manipulation feature in Erlang.)

C is supposed to be a natural programming language for dealing with bits and bytes, right? Yet the ocaml-bitstring program, which analyzes hive files in far more detail than the C program, is half the size and just as fast.

As an example, here’s how we load the hive file and analyze the first part of the header:

    let bits = bitstring_of_file filename

    (* Split into header + data at the 4KB boundary. *)
    let header, data = takebits (4096 * 8) bits, dropbits (4096 * 8) bits

    let () =
      bitmatch header with
      { "regf" : 4*8 : string;
        seq1 : 4*8 : littleendian;
        seq2 : 4*8 : littleendian, check (seq1 = seq2);
        last_modified : 64 : littleendian, bind (nt_to_time_t last_modified);
        1_l (* major *) : 4*8 : littleendian;
        minor : 4*8 : littleendian } ->
          (* ... *)

The bitmatch statement elegantly matches the file. It rejects the file if the first four bytes aren’t “regf” (the file magic number), if the two sequence numbers differ, or if the major version number is not 1. It then unpacks the remaining fields, converting from the file’s little-endian ordering to host ordering, converting the NT timestamp into a time_t, and so on.
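For comparison, here is roughly what that match has to do when written by hand with only the OCaml standard library. This is a sketch, not the code bitstring generates. The `header` record, `parse_header`, and this particular `nt_to_time_t` are illustrative names and definitions of mine. The timestamp conversion assumes NT timestamps count 100-nanosecond ticks since 1601-01-01, and the byte offsets simply follow the field widths in the pattern.

```ocaml
(* Hand-written counterpart of the header match, using only the standard
   library. Bytes.get_int32_le and friends need OCaml 4.08 or later. *)

type header = {
  seq : int32;            (* sequence number, after checking seq1 = seq2 *)
  last_modified : int64;  (* seconds since the Unix epoch *)
  minor : int32;          (* minor version number *)
}

(* Assumed conversion: 100ns ticks since 1601-01-01, which lies
   11644473600 seconds before the Unix epoch. *)
let nt_to_time_t (nt : int64) : int64 =
  Int64.sub (Int64.div nt 10_000_000L) 11_644_473_600L

let parse_header (buf : bytes) : header option =
  if Bytes.length buf < 28 then None                      (* too short *)
  else if Bytes.sub_string buf 0 4 <> "regf" then None    (* bad magic *)
  else
    let seq1 = Bytes.get_int32_le buf 4 in
    let seq2 = Bytes.get_int32_le buf 8 in
    let last_modified = Bytes.get_int64_le buf 12 in
    let major = Bytes.get_int32_le buf 20 in
    let minor = Bytes.get_int32_le buf 24 in
    (* Same rejections as the pattern: mismatched sequence numbers
       or a major version other than 1. *)
    if seq1 <> seq2 || major <> 1l then None
    else Some { seq = seq1; last_modified = nt_to_time_t last_modified; minor }
```

Every offset and every failure case has to be tracked manually here, which is exactly the bookkeeping the bitmatch pattern hides.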

Although not shown here, bitstring will also work just fine on arbitrary bit boundaries, albeit more slowly because the generated code is able to make fewer optimizations.
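To give a feel for why unaligned fields cost more, here is a small plain-OCaml sketch of reading a field that starts at an arbitrary bit offset. It is only an illustration, not how bitstring does it internally. `extract_bits` is a hypothetical helper, and I am assuming bits are numbered from the most significant bit of the first byte.

```ocaml
(* Read [len] bits (1 to 30) starting at bit offset [off], where bit 0 is
   the most significant bit of the first byte. The result is the field
   interpreted as an unsigned big-endian integer. *)
let extract_bits (buf : bytes) (off : int) (len : int) : int =
  if len < 1 || len > 30 || off < 0 || off + len > 8 * Bytes.length buf then
    invalid_arg "extract_bits";
  let acc = ref 0 in
  for i = off to off + len - 1 do
    let byte = Char.code (Bytes.get buf (i / 8)) in
    (* Pick out bit [i mod 8] of this byte, counting from the top. *)
    let bit = (byte lsr (7 - i mod 8)) land 1 in
    acc := (!acc lsl 1) lor bit
  done;
  !acc
```

A byte-aligned 32-bit field can be fetched with a single load, whereas a field like this needs a shift and a mask for every bit (or, in a smarter version, for every byte it straddles).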

Even though the Windows hive file format is moronic, I successfully used bitstring to reverse engineer it in about 3 days, with some help from the contradictory and often incorrect public documentation out there.