Generating a Thumb2 Disassembler From the Specification

For me, the fun in writing a disassembler drops off sharply after the first one; after that it is just plain tedious. For the Thumb2 disassembler, we looked for a more interesting way to construct one, and found benefits beyond reducing boredom: serviceability and accuracy.

Spoiler: we generate the disassembler from the ARM specification (spec) itself. This involves some intermediate steps and parsing, which we discuss below.

Step 1: Easily Parsable Spec

The idea is to take the very verbose specification and convert it to easily processed plain text. We avoided parsing the PDF itself because of time constraints and because the PDF is relatively immutable: when a correction is needed, a text file is far easier to edit.

We consider just the instruction encoding section of the spec, specifically the table and instruction entries. A sample portion of a table entry follows:

The text form of this is:

    thumb32:
    extract32 111,op1.2,op2.7,xxxx,op.1
    on op1,op2,op
    01,00xx0xx,x load_store_multiple
    01,00xx1xx,x load_store_dual_exclusive_table_branch
    01,01xxxxx,x data_processing_shifted_register
    01,1xxxxxx,x coprocessor_instructions

Note that the text is nearly identical to the spec, but tweaked slightly in form to be more easily consumed by the code generator. The thumb32: names the table. The extract32 command declares what constant bits and variable bits the disassembler might find. And finally the patterns like 01,00xx0xx,x tell where the disassembler should go next should the pattern match.

The travel between tables eventually stops at an instruction entry that looks like this:

And the text form is:

    sxtab16:
    Encoding T1 ARMv6T2, ARMv7
    fmt SXTAB16<c> <Rd>,<Rn>,<Rm>{,<rotation>}
    extract32 11111,010,0,010,Rn.4,1111,Rd.4,1,(0),rotate.2,Rm.4
    pcode if Rn == '1111' then SEE SXTB16; \
          d = UInt(Rd); \
          n = UInt(Rn); \
          m = UInt(Rm); \
          rotation = UInt(rotate:'000'); \
          if BadReg(d) || n == 13 || BadReg(m) then UNPREDICTABLE;

Here the sxtab16: names the instruction. The code generator discerns between table entries and instruction entries by seeing if an extract or Encoding statement follows. Note that the pseudocode (pcode) specifies important details that we generate code for too.
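As a rough illustration of that table-versus-instruction check, here is a minimal Python sketch. It assumes the text file puts each statement on its own line under a `name:` header. The helper names `split_entries` and `classify` are made up for this example and are not taken from the real generator.

```python
# Assumed layout: a "name:" header line followed by one statement per line.
SPEC_TEXT = """\
thumb32:
extract32 111,op1.2,op2.7,xxxx,op.1
on op1,op2,op
01,00xx0xx,x load_store_multiple
01,1xxxxxx,x coprocessor_instructions

sxtab16:
Encoding T1 ARMv6T2, ARMv7
fmt SXTAB16<c> <Rd>,<Rn>,<Rm>{,<rotation>}
extract32 11111,010,0,010,Rn.4,1111,Rd.4,1,(0),rotate.2,Rm.4
"""


def split_entries(text):
    """Group the non-empty lines under the most recent 'name:' header."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and " " not in line:
            current = line[:-1]
            entries[current] = []
        elif current is not None:
            entries[current].append(line)
    return entries


def classify(body):
    """'instruction' if Encoding follows the name, 'table' if an extract does."""
    first = body[0].split()[0] if body else ""
    if first == "Encoding":
        return "instruction"
    if first.startswith("extract"):
        return "table"
    return "unknown"


entries = split_entries(SPEC_TEXT)
kinds = {name: classify(body) for name, body in entries.items()}
```

With the sample text above, `kinds` maps `thumb32` to `"table"` and `sxtab16` to `"instruction"`.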

Aside: A Graph Perspective

The initial table serves as the root node. The tables to which this root table refers are its branch nodes. Those may branch too, and so on. The instruction entries are the terminal nodes.

There are almost 300 nodes total, representing the tables and the instructions. Here is a closer zoom where some of the named nodes are readable:

A fun perspective to adopt is that the disassembler is a driver on this road network and the instruction words are the directions. The driver first considers the bits that will direct coarse grain turns, toward a cardinal direction or neighborhood. Later, additional bits are considered, directing fine grain turns to the right street and house. An undefined instruction is simply a bad address.
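The driving analogy can be sketched in a few lines of Python. The graph here is a toy, not taken from the ARM spec: 8-bit instruction words, one root table, one sub-table, and invented leaf names. The names `TABLES`, `matches`, and `walk` are also hypothetical.

```python
# Toy graph (NOT from the ARM spec): each table lists the bit fields it
# inspects as (name, shift, width) and rows of (patterns, next node).
TABLES = {
    "root": {
        "fields": [("op", 6, 2)],
        "rows": [(("00",), "alu"),      # coarse turn: on to the "alu" table
                 (("01",), "LOAD"),
                 (("1x",), "STORE")],
    },
    "alu": {
        "fields": [("sub", 4, 2)],
        "rows": [(("00",), "ADD"),      # fine turn: the final "house"
                 (("01",), "SUB")],
    },
}


def matches(pattern, value, width):
    """True if value, written as width bits, fits a pattern of 0, 1 and x."""
    bits = format(value, "0%db" % width)
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def walk(word, node="root"):
    """Follow the instruction word through the tables; return the path taken."""
    path = [node]
    while node in TABLES:
        table = TABLES[node]
        values = [((word >> shift) & ((1 << width) - 1), width)
                  for _, shift, width in table["fields"]]
        for patterns, target in table["rows"]:
            if all(matches(p, v, w) for p, (v, w) in zip(patterns, values)):
                node = target
                break
        else:
            # No row matched: a bad address, i.e. an undefined instruction.
            path.append("UNDEFINED")
            return path
        path.append(node)
    return path
```

For example, `walk(0b00010000)` travels root, alu, SUB, while `walk(0b00110000)` reaches the alu table and then falls off the map as UNDEFINED.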

Step 2: Parsing Pcode, Generating

The pseudocode can contain many things, like the conditions under which an instruction is undefined or when to consult another table. Here is the pseudocode for sxtab16:

    pcode if Rn == '1111' then SEE SXTB16;
          d = UInt(Rd);
          n = UInt(Rn);
          m = UInt(Rm);
          rotation = UInt(rotate:'000');
          if BadReg(d) || n == 13 || BadReg(m) then UNPREDICTABLE;

Here UInt() is defined in the spec, and rotate is a subfield of bits within the instruction word.

No matter how many times I make a recursive descent parser, I blow a lot of time looking stuff up and doing LALR conversions and the like, so this time I outsourced the job to the Grako parser generator. A starting portion of our grammar is:

    start = statement [';'] $;

    statement = 'if' expr0 'then' statement |
                "UNPREDICTABLE" |
                "UNDEFINED" |
                "NOT_PERMITTED" |
                "NOP" |
                "nop" |
                "SEE" whatever |
                tuple '=' expr0 |
                expr0 '=' expr0;

    tuple = '(' ('-'|expr0) { ',' ('-'|expr0) }+ ')';
    ...

The generated code is nearly a line by line translation, and can use variable names like Rn that were named earlier in the bit regions of the instruction word:

    /* pcode: if Rn == '1111' then SEE SXTB16 */
    if((res->elements["Rn"]) == (0xF)) {
        /* wipe state, transfer to sxtb16 */
    }
    /* pcode: d = UInt(Rd) */
    res->elements["d"] = (unsigned int)(res->elements["Rd"]);
    /* pcode: n = UInt(Rn) */
    res->elements["n"] = (unsigned int)(res->elements["Rn"]);
    /* pcode: m = UInt(Rm) */
    res->elements["m"] = (unsigned int)(res->elements["Rm"]);
    /* pcode: rotation = UInt(rotate:'000') */
    res->elements["rotation"] = (unsigned int)((res->elements["rotate"]<<3)|(0x0));
    /* pcode: if BadReg(d) || n == 13 || BadReg(m) then UNPREDICTABLE */
    if(((BadReg(res->elements["d"])) || ((res->elements["n"]) == (13))) || (BadReg(res->elements["m"]))) {
        res->flags |= FLAG_UNPREDICTABLE;
    }

This is a deliberately simple example: the code for wiping state and transferring is omitted, and there are redundant stores into the disassembler result res. BadReg() is a C++ function defined in the disassembler as the spec describes it, and res->flags speaks for itself.
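To give a feel for the translation step, here is a minimal Python sketch that handles only the `dst = UInt(src)` statements. It uses regular expressions as a stand-in for the Grako-generated parser, which walks a parse tree instead. The function names and the `widths` dictionary (field name to bit width) are assumptions made for this example.

```python
import re


def operand_to_cpp(token, widths):
    """Return (C++ expression, bit width or None) for one pcode operand."""
    token = token.strip()
    literal = re.fullmatch(r"'([01]+)'", token)
    if literal:
        bits = literal.group(1)
        return "(0x%X)" % int(bits, 2), len(bits)
    return 'res->elements["%s"]' % token, widths.get(token)


def concat_to_cpp(expr, widths):
    """Translate a ':' bit concatenation into shifts and ORs."""
    parts = [operand_to_cpp(tok, widths) for tok in expr.split(":")]
    code, _ = parts[0]
    for index, (nxt, width) in enumerate(parts[1:]):
        if width is None:
            raise ValueError("unknown width in " + expr)
        if index > 0:
            code = "(" + code + ")"  # keep the earlier shifts grouped
        code = "(%s<<%d)|%s" % (code, width, nxt)
    return code


def assign_to_cpp(stmt, widths=None):
    """Translate 'dst = UInt(src)' into one generated C++ statement."""
    m = re.fullmatch(r"\s*(\w+)\s*=\s*UInt\((.+)\)\s*;?\s*", stmt)
    if not m:
        raise ValueError("unsupported statement: " + stmt)
    dst, src = m.groups()
    return 'res->elements["%s"] = (unsigned int)(%s);' % (
        dst, concat_to_cpp(src, widths or {}))


line_d = assign_to_cpp("d = UInt(Rd);")
line_rotation = assign_to_cpp("rotation = UInt(rotate:'000');")
```

Concatenating with the 3-bit literal `'000'` turns into a left shift by 3, which is where the `<<3` in the generated code comes from.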

Step 3: Travelling the Graph, Generating Table Code

Referring back to the 32-bit Thumb instruction encoding table from step 1, the generated code is very straightforward: it masks off the named fields and then tests them for values that lead the disassembler to the next table:

    int thumb32(struct decomp_request *req, struct decomp_result *res)
    {
        int rc = -1;
        uint32_t instr = ((*(uint16_t *)req->instrStream)<<16) | *(uint16_t *)(req->instrStream + 2);
        uint32_t op1 = (instr & 0x18000000)>>27;
        uint32_t op2 = (instr & 0x7F00000)>>20;
        uint32_t op = (instr & 0x8000)>>15;
        if(((op1 & 0x3)==0x1) && ((op2 & 0x64)==0x0) && 1)
            return load_store_multiple(req, res);
        if(((op1 & 0x3)==0x1) && ((op2 & 0x64)==0x4) && 1)
            return load_store_dual_exclusive_table_branch(req, res);
        if(((op1 & 0x3)==0x3) && ((op2 & 0x70)==0x20) && 1)
            return data_processing_register(req, res);
        if(((op1 & 0x3)==0x1) && ((op2 & 0x40)==0x40) && 1)
            return coprocessor_instructions(req, res);
        ...
        res->status |= STATUS_UNDEFINED;
        return 0;
    }
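The masks and tests in that function fall out mechanically from the text form. Here is a minimal Python sketch of how such lines could be produced. It assumes that `extract32` fields are laid out from bit 31 downward and that an all-`x` pattern becomes the constant `1`. The helper names are invented for this example.

```python
def parse_fields(extract):
    """Lay the items out from bit 31 downward; return {name: (mask, shift)}."""
    fields = {}
    pos = 32
    for item in extract.split(","):
        if "." in item:                      # named field such as op2.7
            name, width = item.split(".")
            width = int(width)
            pos -= width
            fields[name] = (((1 << width) - 1) << pos, pos)
        else:                                # constant or don't-care bits
            pos -= len(item.strip("()"))
    return fields


def field_line(name, mask, shift):
    """Emit the C statement that masks one field out of the instruction."""
    return "uint32_t %s = (instr & 0x%X)>>%d;" % (name, mask, shift)


def pattern_test(name, pattern):
    """Turn a 0/1/x pattern into a C mask-and-compare expression."""
    mask = int("".join("0" if c == "x" else "1" for c in pattern), 2)
    value = int("".join("1" if c == "1" else "0" for c in pattern), 2)
    if mask == 0:
        return "1"                           # nothing to test
    return "((%s & 0x%X)==0x%X)" % (name, mask, value)


def dispatch_line(names, row):
    """Turn one table row into an if statement that jumps to the next table."""
    patterns, target = row.split()
    tests = [pattern_test(n, p) for n, p in zip(names, patterns.split(","))]
    return "if(%s) return %s(req, res);" % (" && ".join(tests), target)


fields = parse_fields("111,op1.2,op2.7,xxxx,op.1")
names = ["op1", "op2", "op"]
first_row = dispatch_line(names, "01,00xx0xx,x load_store_multiple")
```

This also explains the odd-looking `&& 1` in the generated code: it is what remains of a column whose pattern is all don't-care bits.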

Step 4: Ending the Trip, Generating Instruction Code

Returning to our sxtab16 example, the extract statement is:

    extract32 11111,010,0,010,Rn.4,1111,Rd.4,1,(0),rotate.2,Rm.4

There are many constant bits, followed by a 4-bit field named "Rn", four more constant bits, a 4-bit field named "Rd", a 1, an expected 0 (the instruction is unpredictable otherwise), a 2-bit field named "rotate", and a 4-bit field named "Rm". The code is not too surprising:

    if(((instr & 0xFFF0F080)==0xFA20F080)) {
        res->instrSize = 32;
        if(!((instr & 0x40)==0x0)) {
            res->flags |= FLAG_UNPREDICTABLE;
        }
        if(!(req->arch & ARCH_ARMv6T2) && !(req->arch & ARCH_ARMv7)) {
            res->status |= STATUS_ARCH_UNSUPPORTED;
        }
        res->elements["c"] = COND_AL;
        res->elements["Rn"] = (instr & 0xF0000)>>16;
        char Rn_width = 4;
        res->elements["Rd"] = (instr & 0xF00)>>8;
        char Rd_width = 4;
        res->elements["rotate"] = (instr & 0x30)>>4;
        char rotate_width = 2;
        res->elements["Rm"] = instr & 0xF;
        char Rm_width = 4;
        res->formats.push_back("SXTAB16<c> <Rd>,<Rn>,<Rm>{,<rotation>}");
        res->mnem = "SXTAB16";

The outer if masks and checks the constant bits given in the pattern. The next if checks for that expected (0) bit and marks the result unpredictable if not found. If the disassembler request "req" does not name one of the supported architectures, we still give our best effort but mark the disassembly result "res" as unsupported. The Rn , Rd , Rm , and rotate fields are extracted with bit masking and even the spec supplied format for the disassembly is saved. You may notice that each of the variables in the format will be looked up in res->elements when a string is finally produced.
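The constants in the outer if can be derived directly from the bit pattern. Here is a minimal Python sketch, assuming the pattern is read from bit 31 downward and handling only the `(0)` style of expected bit. The function name `constant_bits` and the sample instruction word are made up for this example.

```python
def constant_bits(extract):
    """Return (mask, value, should_be_zero_mask) for an extract32 pattern."""
    mask = value = sbz = 0
    pos = 32
    for item in extract.split(","):
        if "." in item:                      # named field: skip its bits
            pos -= int(item.split(".")[1])
        elif item.startswith("("):           # expected bits such as (0)
            for bit in item.strip("()"):
                pos -= 1
                if bit == "0":
                    sbz |= 1 << pos
        else:                                # constant 0/1 bits, or x
            for bit in item:
                pos -= 1
                if bit in "01":
                    mask |= 1 << pos
                    value |= int(bit) << pos
    return mask, value, sbz


PATTERN = "11111,010,0,010,Rn.4,1111,Rd.4,1,(0),rotate.2,Rm.4"
mask, value, sbz = constant_bits(PATTERN)

# Hypothetical instruction word: the constant bits plus Rn=1, Rd=2, Rm=3.
instr = 0xFA20F080 | (1 << 16) | (2 << 8) | 3
is_sxtab16 = (instr & mask) == value
is_unpredictable = (instr & sbz) != 0
```

For this pattern the sketch yields the mask `0xFFF0F080`, the value `0xFA20F080`, and `0x40` for the expected-zero bit, the same constants that appear in the generated code.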

Conclusion

This short tour is based on a proof-of-concept (POC) version that still contained many inefficiencies, but it illustrates the main ideas well. The version released in Binary Ninja runs hundreds of times faster.

On the claim of serviceability, a correction in this disassembler will take place in the text file that captured the important bits of the specification. Fix bits, save, regenerate code. On the claim of accuracy, this disassembler is as accurate as the specification. We're happy to defer to ARM's documents and blame them if something doesn't turn out right :)

A hypothetical debate about the merits of a human written disassembler versus this generated one is fun to ponder. The human surely would have found clever ways to decide between decoding paths, or come up with optimizations like table lookups when the instruction allowed. This generated one, however, has long lines of brute bit masking and testing and repetitive generated code. Certainly the human could have produced a smaller disassembler. But if size is not nearly the concern it was in years past, and if the compiler optimizes well, is it worth boring a human for days or weeks?