The Parser section

The parser takes a line of assembly and builds an abstract representation of it. Let’s see how it works with an example:

0x00 Input

Let’s say we get the following input:

" mov rax , rbx ; loads 0x59 into rax "

0x01 Clean

To make the analysis easier, we first clean the input so that only the parts we need remain. Comments, extra spaces and the like are cut out.

"mov rax,rbx"

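The cleaning step can be sketched in a few lines of Go. This is only an illustration, not the tool's actual code: the function name clean is made up, and it assumes that ';' always starts a comment (so a ';' inside a string or character literal is not handled).

```go
package main

import "strings"

// clean is a hypothetical helper: it strips a trailing ';' comment,
// trims the line, collapses runs of whitespace and removes the spaces
// around commas.
func clean(line string) string {
	// Assumption: everything from the first ';' onwards is a comment.
	if i := strings.IndexByte(line, ';'); i >= 0 {
		line = line[:i]
	}
	// strings.Fields splits on any run of whitespace, so re-joining with
	// a single space both trims the ends and collapses the gaps.
	line = strings.Join(strings.Fields(line), " ")
	// Normalise "rax , rbx" to "rax,rbx".
	line = strings.ReplaceAll(line, " ,", ",")
	line = strings.ReplaceAll(line, ", ", ",")
	return line
}
```

Called with the input from step 0x00, clean returns the string shown above.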
0x02 Lexical analysis

In this stage, we split the line on delimiters such as the comma and the space, and work out what each piece means. Every line has one operation (the mnemonic, stored in the Operand field below) followed by 0, 1 or 2 values, each of which is a register or an immediate value. To store the result I used this Go struct (the comments show the values for the example we are working through):

type lex struct {
    Operand                string   // "mov"
    Values                 []string // ["rax", "rbx"]
    AbstractRepresentation string   // "mov $1 $2"
    OriginalString         string   // "mov rax,rbx"
}
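
As a rough sketch of how such a struct might be filled in, the function below splits a cleaned line and numbers its values. The name tokenize and the numbering rule (a repeated value reuses its placeholder, so "xor rax,rax" becomes "xor $1 $1") are assumptions made for illustration. Memory operands such as "[rbx + 8]" are not handled.

```go
package main

import (
	"strconv"
	"strings"
)

// lex mirrors the struct from the text.
type lex struct {
	Operand                string
	Values                 []string
	AbstractRepresentation string
	OriginalString         string
}

// tokenize is a hypothetical helper. It expects an already cleaned
// line such as "mov rax,rbx".
func tokenize(line string) lex {
	// Split on both delimiters: space and comma.
	parts := strings.FieldsFunc(line, func(r rune) bool {
		return r == ' ' || r == ','
	})
	l := lex{OriginalString: line}
	if len(parts) == 0 {
		return l // empty line: nothing to analyse
	}
	l.Operand = parts[0]
	l.Values = parts[1:]

	// Assumed numbering rule: each distinct value gets the next
	// placeholder, and a repeated value reuses the one it already has.
	abstract := []string{l.Operand}
	seen := map[string]int{}
	for _, v := range l.Values {
		n, ok := seen[v]
		if !ok {
			n = len(seen) + 1
			seen[v] = n
		}
		abstract = append(abstract, "$"+strconv.Itoa(n))
	}
	l.AbstractRepresentation = strings.Join(abstract, " ")
	return l
}
```

For "mov rax,rbx" this yields the field values listed in the struct comments.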

The Morph section

Given an abstract representation, this section is responsible for returning assembly that is equivalent to the original line. To do that, it uses a map from abstract representations to lists of equivalences, which looks like this:

map[string][]equivalence{
    "xor $1 $1": []equivalence{
        equivalence{
            "mov $1, -1",
            "inc $1",
        },
        equivalence{
            "sub $1,$1",
        },
        // ... some more equivalences for xor $1 $1
    },
    "mov $1 $2": []equivalence{
        equivalence{
            "push $2",
            "pop $1",
        },
        equivalence{
            "xchg $1, $2",
            "push $1",
            "pop $2",
        },
        // ... some more equivalences for mov $1 $2
    },
    // ... some more abstract representations for common instructions
}

If the abstract representation matches a key in the map, one of its equivalences is picked at random and returned as the new assembly, with $1 and $2 standing for the original values.
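
The lookup can be sketched as follows. This is a minimal illustration under a few assumptions: equivalence is taken to be a slice of strings, the table holds only the entries shown above, and the names equivalences and morph are invented here. What the real tool does when no match is found is not covered by the text, so the sketch simply reports it with a boolean.

```go
package main

import (
	"math/rand"
	"strconv"
	"strings"
)

// Assumption: an equivalence is a list of assembly lines.
type equivalence []string

// Only the entries shown in the text are included here.
var equivalences = map[string][]equivalence{
	"xor $1 $1": {
		{"mov $1, -1", "inc $1"},
		{"sub $1,$1"},
	},
	"mov $1 $2": {
		{"push $2", "pop $1"},
		{"xchg $1, $2", "push $1", "pop $2"},
	},
}

// morph is a hypothetical helper. It returns lines equivalent to the
// instruction described by abstract and values, or false when the
// table has no entry for that abstract representation.
func morph(abstract string, values []string) ([]string, bool) {
	options, ok := equivalences[abstract]
	if !ok || len(options) == 0 {
		return nil, false
	}
	chosen := options[rand.Intn(len(options))]

	// Map "$1", "$2", ... to the distinct values in order of first
	// appearance, matching keys such as "xor $1 $1".
	seen := map[string]bool{}
	var pairs []string
	for _, v := range values {
		if seen[v] {
			continue
		}
		seen[v] = true
		pairs = append(pairs, "$"+strconv.Itoa(len(seen)), v)
	}
	r := strings.NewReplacer(pairs...)

	out := make([]string, len(chosen))
	for i, line := range chosen {
		out[i] = r.Replace(line)
	}
	return out, true
}
```

For "mov $1 $2" with the values rax and rbx, the result is either the push/pop pair or the xchg/push/pop sequence with the registers filled in.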