Let's Learn x86-64 Assembly! Part 1 - Metaprogramming in Flat Assembler

This post is a part of a series on x86-64 assembly programming that I'm writing. Check out part 0.

In the previous part we covered a lot of ground in a short time - from a general introduction and a description of how registers and memory work, to actually calling Windows API functions. In this one, I want to introduce some important features of Flat Assembler: macros, assembly-time variables, conditional assembly and a couple of others. We'll use them to make calling Windows API functions a bit less troublesome, and write our first "Hello, World!" app.

Why Macros?

Coming from C++, you might be having horrific flashbacks whenever you hear the words "macro" or "preprocessor". Indeed, the wise guidance is to avoid using the preprocessor as long as there's a way to achieve what you're trying to do with just the language itself.

The situation is quite different in assembly land. There is not much of a "language" to speak of, and there are lots of recurring patterns (we've seen one example - calling conventions). Programming in assembly can get quite tedious. One might say this is why high-level languages were invented after all, which is fair enough. But if you're going to be writing assembly manually for whatever reason, you might as well give yourself tools to create some semblance of high-level constructs. That's why most assemblers have evolved quite elaborate macro systems and even metaprogramming facilities, which are way more powerful than the C++ preprocessor, as we will see.

The FASM Macro System

FASM allows us to define our own custom instructions (which the manual refers to as "macroinstructions"). In this text, we will cover the essentials of what we'll need for our purposes. As always, read the docs if you're interested in details.

A new macroinstruction is defined using the macro directive. Let's experiment with it. Open up FASMW.EXE, and paste in the following code:

    use64

    macro add5 target {
        add target, 5
    }

    add5 rax

The use64 directive at the top tells FASM to allow using 64-bit instructions and registers. The macro directive is followed by the name of the macro, then a list of parameters, then the macro's body enclosed in curly braces. In our case, the macro's name is add5, and it has just a single parameter - the thing to add 5 to. The last line invokes the macro; it will be replaced by add rax, 5 during preprocessing. If you press Ctrl+F9 at this point, FASM will assemble the given code and even produce a binary file. However, the produced binary can't actually be run by your OS - it's not in an executable format like PE or ELF. In fact, FASM doesn't require the output to be in any particular format - by default it emits just a stream of bytes, in this case raw machine code, the binary representation of the assembly instructions from the input file.
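To make the "raw machine code" point a bit more concrete, here is a small sketch that writes out by hand the bytes I'd expect FASM to produce for add rax, 5. It assumes the assembler picks the short 8-bit-immediate encoding, which is the standard smallest form of this instruction. Assembling it should give the same four-byte file as the macro version above.

```asm
; hand-encoded equivalent of `add rax, 5' (illustration only)
db 0x48        ; REX.W prefix: 64-bit operand size
db 0x83        ; opcode: arithmetic group, r/m64 with a sign-extended 8-bit immediate
db 0xC0        ; ModRM: register-direct, /0 selects add, r/m selects rax
db 0x05        ; the immediate operand, 5
```

Comparing the two output files in a hex viewer is a quick way to convince yourself that there is nothing else in there: no headers, no metadata, just instructions.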

Let's enhance our macro a bit. By default, FASM lets you skip some arguments when invoking a macro. Putting an asterisk next to the corresponding parameter name in the macro definition marks it as required:

    use64

    macro add5 target* {
        add target, 5
    }

    add5 rax

If you now try invoking add5 without any arguments, it will trigger an "invalid macro arguments" error. (If you tried that without marking the parameter as required, there would still be an error, but a less helpful one, resulting from trying to parse the line add , 5.)

In C, you may define macros that have an arbitrary number of arguments (they're called variadic macros). The FASM preprocessor provides a powerful way to deal with such macros. Let's see how to define a macro with an arbitrary number of arguments.

    macro foo a, [b, c] {
    common ; we'll explain what "common" means later
        db a, b
        db a, c
    }

    foo 0, 1, 2, 3, 4

This is equivalent to:

    db 0, 1, 3
    db 0, 2, 4

The "variadic" part of the parameters is enclosed in square brackets at the end of the macro parameter list. The square brackets signify that this group of parameters can be repeated an arbitrary number of times. In the case above, a corresponds to the very first argument, while b and c take turns matching the arguments that follow it (b gets 1 and 3, c gets 2 and 4). In other words, the portion in the square brackets is repeatedly "matched" with the rest of the argument list. These bracketed arguments are referred to as group arguments, or groupargs in FASM lingo.

Now, let's turn our attention to the mysterious common directive.

Every FASM macro is a sequence of code blocks, where each block has one of the following types: common , forward and reverse . By default, when you start a macro, and it doesn't have any group arguments, it's considered to be in the common mode. On the other hand, macros with group arguments start out in the forward mode. We can end the current block and start a new one by invoking common , forward or reverse directive within the macro body.

The types of blocks differ in how they handle references to groupargs.

When the name of a grouparg is encountered within a common block, it is replaced by all arguments that correspond to it. Therefore, something like this:

    macro WinStr labelName*, [args*] {
    common
        labelName db args, 0x0D, 0x0A
    }

    WinStr foo, 1, 2, 3

Would be replaced with:

    foo db 1, 2, 3, 0x0D, 0x0A

forward blocks work a bit differently. A forward block is repeated N/M times, where N is the total number of arguments (not counting those that could be matched with non-bracketed parameters), and M is the number of parameters within the square brackets. For each repeated instance, the references to the grouparg within the block are replaced with the value of the corresponding argument.

Thus, given something like this:

    macro foo a, [b, c] {
    forward
        db b, a, c
    }

    foo 0, 1, 2, 3, 4

The preprocessor would emit:

    ; 0 is assigned to a.
    ; 1 and 2 are assigned to b and c respectively for the first instance of the `forward' block.
    ; 3 and 4 are assigned to b and c respectively for the second instance of the `forward' block.
    db 1, 0, 2
    db 3, 0, 4

The reverse blocks work exactly the same way as forward , except the order of arguments is, as the name suggests, reversed.
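A classic use for reverse is restoring registers in the opposite order from the one they were saved in. The sketch below uses two hypothetical macros, pushall and popall (my names, not something FASM provides), to show how the same argument list is walked in both directions.

```asm
use64

; push each listed register, in the order given
macro pushall [reg] {
forward
    push reg
}

; pop the same list of registers, walking it backwards
macro popall [reg] {
reverse
    pop reg
}

pushall rax, rbx, rcx   ; expands to: push rax / push rbx / push rcx
popall  rax, rbx, rcx   ; expands to: pop rcx / pop rbx / pop rax
```

Because both invocations take the registers in the same order, it's hard to get the save and restore sequences out of sync.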

Now we can write a macro that sums up a bunch of values and puts the result into a register:

    use64

    macro accumulate target*, [term*] {
    common
        xor target, target
    forward
        add target, term
    }

    accumulate rax, 1, 2, 4

It sets the target to 0 (by xor-ing it with itself), then generates a corresponding add instruction for each of the values passed in as subsequent arguments.
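For reference, this is what I'd expect the accumulate rax, 1, 2, 4 invocation to turn into after preprocessing: one instruction from the common block, followed by one instance of the forward block per term.

```asm
use64

xor rax, rax    ; from the common block: clear the target
add rax, 1      ; forward block, first term
add rax, 2      ; forward block, second term
add rax, 4      ; forward block, third term (rax now holds 7)
```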

Metaprogramming

Assembly-Time Variables

When you write something like foo = 3 in FASM, it introduces a new assembly-time variable. Despite the fact that the official FASM manual refers to them as "constants" sometimes (apparently, for historical reasons), they're actually variables: you can change them. Check this out:

    arg_count = 0

    macro count_args [arg] {
        arg_count = arg_count + 1
    }

    count_args a, b, c, d, e, f, g

    db arg_count

If you assemble this file, and open the resulting binary, you will see that it consists of a single byte with the value 7!
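Here is an even smaller sketch of the same idea without any macros. Each db picks up whatever value x has at that point in the source, so the output should be the three bytes 1, 2 and 20.

```asm
x = 1
db x            ; emits 1

x = x + 1       ; reassign using the previous value
db x            ; emits 2

x = x * 10
db x            ; emits 20
```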

Conditional Assembly and Loops

Assembly-time variables can be used in conditions for assembly-time if-else constructs. These let us process or skip a block of code depending on a condition. For example, we can add an upper limit to the number of arguments passed to the previous macro:

    arg_count = 0

    macro count_args [arg] {
        if arg_count > 9
            display "Too many arguments!"
            err
        end if
        arg_count = arg_count + 1
    }

    count_args a, b, c, d, e, f, g

    db arg_count

The example above introduces a couple of useful directives - display shows a log message during assembly, and err causes the assembler to abort with an error.
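Conditions can also have else if and else branches. As a minimal sketch (the variable name value is just something I made up), this picks the smallest data directive that can hold a given unsigned number:

```asm
value = 300

if value < 0x100
    db value        ; fits in one byte
else if value < 0x10000
    dw value        ; fits in two bytes (this branch is taken for 300)
else
    dd value        ; fall back to four bytes
end if
```

With value set to 300, only the dw line is assembled, so the output is the two bytes 0x2C, 0x01.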

It doesn't stop with conditions: you can also write assembly-time loops!

    ; shows bits of a value in right-to-left order
    macro showbits_rtl arg {
        val = arg
        while val > 0
            display (val and 1) + '0'
            val = val shr 1
        end while
    }

    showbits_rtl 13
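For 13 (binary 1101) the macro above displays 1011, since it starts from the lowest bit. If you'd rather see the bits in the usual order, one possible variation is to count a bit position down from 7 instead of shifting the value. This is just a sketch: showbits_ltr and pos are names I picked, and it assumes the value fits in 8 bits.

```asm
; shows the low 8 bits of a value in the usual left-to-right order
macro showbits_ltr arg {
    pos = 7
    while pos >= 0
        ; isolate bit number `pos' and turn it into the character '0' or '1'
        display (((arg) shr pos) and 1) + '0'
        pos = pos - 1
    end while
}

showbits_ltr 13     ; displays 00001101
```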

Comparing Expression Types

FASM has a built-in eqtype operator, which takes two assembly expressions and produces a "true" value if both expressions yield the same type and "false" otherwise. We can use it in assembly-time conditions, of course. Let's modify our accumulation macro from earlier to enforce the target to always be a register:

    use64

    macro accumulate target*, [term*] {
        if ~ (target eqtype rax)
            display "accumulate: target must be a register (was ", `target, ")"
            err
        end if
    common
        xor target, target
    forward
        add target, term
    }

    accumulate rcx, 1, 2, 4

Note that rcx eqtype rax yields true since both expressions have the same type. The backtick operator we're using in the error report turns the macro argument into a string literal.
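eqtype is not limited to registers. Another common pattern is telling quoted strings apart from everything else by comparing against an empty string literal. Here's a small sketch with a hypothetical put_value macro (my own name) that stores strings zero-terminated and anything else as a 64-bit value:

```asm
macro put_value arg {
    if arg eqtype ""
        db arg, 0       ; a quoted string: store it with a terminating zero
    else
        dq arg          ; anything else: store it as a 64-bit value
    end if
}

put_value "hi"      ; emits 'h', 'i', 0
put_value 42        ; emits 42 as an 8-byte little-endian value
```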

The match Directive

FASM allows you to introduce a certain amount of new syntax using a special match directive, which causes a block of code to be processed when an input sequence matches a certain pattern.

    match a .. b, 5 .. 10 {
        i = a
        while i <= b
            db i
            i = i + 1
        end while
    }

We can wrap it into a macro that allows us to define a sequential range of bytes:

    macro byterange rng {
        ; note that FASM requires us to escape curly braces within the macro body
        match a .. b, rng \{
            i = a
            while i <= b
                db i
                i = i + 1
            end while
        \}
    }

    byterange 0 .. 7
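As another sketch of the same technique, here is a hypothetical setreg macro (not part of FASM) that accepts an assignment-like syntax. As far as I understand the manual, the = character has a special meaning inside match patterns, so a literal equals sign has to be written as ==. When the argument doesn't fit the pattern, the block is simply skipped and nothing is emitted.

```asm
use64

macro setreg stmt {
    ; the pattern reads: <something> = <something>
    match reg == val, stmt \{
        mov reg, val
    \}
}

setreg rax = 5      ; matches: emits mov rax, 5
setreg rcx = rax    ; matches: emits mov rcx, rax
setreg rdx          ; does not match the pattern, so nothing is emitted
```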

Forward-Referencing Assembly-Time Variables

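The final thing I want to mention is a surprising and counter-intuitive property of assembly-time variables. FASM allows referencing them before they are defined (this is called forward-referencing). As far as I can tell from the manual, this only works for names that are assigned exactly once in the source; a name that gets reassigned, like arg_count earlier, can only be read after its definition. The following code is completely OK and will emit a binary consisting of a single byte with a value of 2: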

    db foo
    foo = 2
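A more practical use of the same trick is a length-prefixed string, where the length byte comes before the data it describes. In this sketch, msg and msg_len are just names I picked for illustration:

```asm
db msg_len              ; forward reference: msg_len is only defined below
msg db "Hello"
msg_len = $ - msg       ; current address minus the start of the string = 5
```

The output should be the byte 5 followed by the five characters of the string.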

Now this may seem straightforward, if a bit odd, but things can get much weirder:

    db x, y
    x = 6 - y
    y = 2 * x

Yes, this code will successfully assemble and produce a binary with two bytes: 2 and 4! Fascinatingly, FASM can work out the values of x and y that satisfy both their definitions. This is actually by design.

The roots of this peculiar feature, as far as I can tell, lie in the fact that FASM is very obsessive about generating the smallest possible form of machine code.

On x86 there might be different possible sequences of bytes corresponding to the same human-readable assembly instruction. In particular, jmp labelname can be represented differently, depending on the address of the target label. Furthermore, since jumps in assembly can be not only backward, but also forward, we have to be able to reference labels that haven't been defined yet. We wouldn't know which form of jump is optimal when we encounter a forward jump: at that time, the target address still hasn't been determined! So what FASM does is assume that the shortest possible form can be emitted for everything on the first pass. If that doesn't work, a second pass is performed (fixing certain "short" instructions and correcting addresses), and then another one etc., until either everything works or FASM decides to give up. This multipass resolving applies not only to addresses of labels, but to the values of assembly-time variables as well (in fact, I'm pretty sure FASM treats labels and assembly-time variables the same way). Whatever algorithm is used for predicting the values of variables is "smart" enough to solve systems of simple linear equations (I haven't tried more complicated ones though).
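To see the jump-size selection in action, here is a small sketch you can assemble as a raw binary. The label names and the nop padding (0x90) are arbitrary; the only thing that matters is how far away each target ends up.

```asm
use64

    jmp near_target         ; target is close by: the 2-byte short form (EB rel8) is enough
    db 100 dup (0x90)       ; 100 bytes of padding
near_target:
    jmp far_target          ; target is more than 127 bytes away: needs the 5-byte form (E9 rel32)
    db 200 dup (0x90)       ; 200 bytes of padding
far_target:
    ret
```

The resulting file should be 2 + 100 + 5 + 200 + 1 = 308 bytes long. Note that both jumps are forward references, so neither target address is known at the point where the jump is first encountered.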

Simplifying Function Calls

Armed with these new tools, we can make importing and calling Windows API functions a lot less cumbersome. Actually, FASM provides a handy library of macros that do exactly that, but I promised to be going as "from scratch" as possible, so no libraries for us :)

Generating the PE Import Tables

Let's start by writing a couple of macros that simplify writing the import section of our PE files. This one generates the Import Directory Table:

    macro import_directory_table [lib] {
    ; for each lib, define an IDT entry
    forward
        ; note that IAT and ILT are the same.
        ; IAT is defined using the import_functions macro.
        dd rva IAT__#lib
        dd 0
        dd 0
        dd rva NAME__#lib ; ptr into the library name table.
        dd rva IAT__#lib

    ; terminate IDT with an all-zero entry.
    common
        dd 5 dup(0)

    ; table of library name strings.
    forward
        NAME__#lib db `lib, ".DLL", 0
    }

Pretty straightforward. Note that # does token pasting, similar to ## in the C preprocessor (i.e. it pastes the value of the argument directly into the body of the macro).
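If the combination of # and the backtick operator looks cryptic, here is a stripped-down sketch that uses both without any of the PE-specific parts. The named_string macro and the STR__ prefix are made up for this illustration.

```asm
macro named_string [name] {
forward
    ; `#' glues the argument onto the label prefix,
    ; the backtick turns the same argument into a string literal
    STR__#name db `name, 0
}

named_string hello, world

; the invocation above should expand to:
;   STR__hello db "hello", 0
;   STR__world db "world", 0
```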

Next, a macro for generating the import address and hint/name tables for imported functions:

    macro import_functions libname, [funcnames] {
    ; define the hint/name table
    forward
        ; ensure entries are aligned on even address.
        ; ("and" is the bitwise AND; "&" is a logical AND in fasm conditions)
        if rva $ and 1
            db 0
        end if
        IMPORTNAME__#funcnames dw 0
                               db `funcnames, 0

    ; IAT definition
    common
        IAT__#libname:

    ; each entry is a ptr into the previously defined hint/name table.
    ; entries shall be overwritten by actual function addresses at runtime.
    forward
        funcnames dq rva IMPORTNAME__#funcnames

    ; terminate the IAT with a null entry.
    common
        dq 0
    }

With these two macros, we can rewrite our PE import section:

    section '.idata' import readable writeable

    ; generate the import directory table, include KERNEL32.DLL and USER32.DLL
    import_directory_table KERNEL32, USER32

    ; Functions to import from KERNEL32
    import_functions KERNEL32, ExitProcess

    ; Functions to import from USER32
    import_functions USER32, MessageBoxA

Implementing the 64-bit Windows Calling Convention

In the previous part of this series, we discussed the Windows 64-bit calling convention. It has quite a few rules, and ensuring all of them are met before a function call is quite cumbersome. Let's write a reusable macro that will do the job for us.

Note that we won't be implementing the calling convention fully (for example, I won't bother with supporting floating point arguments), but just enough for the purposes of these tutorials. Basically, we'll just make sure the stack is properly aligned and has enough space for all the arguments, and we'll place the arguments into the appropriate registers or places on the stack.

We'll start with a small helper macro that moves its first argument into the register identified by the second argument, unless both arguments refer to the same register.

    macro call64_putreg param*, reg* {
        if ~ (reg eqtype rax)
            display "target must be a register"
            err
        end if
        if ~ param eq reg
            mov reg, param
        end if
    }
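Here is a quick sketch of how I'd expect the helper to behave for a few different inputs. The macro is repeated so that the snippet can be assembled on its own; the invocations at the bottom are the interesting part.

```asm
use64

macro call64_putreg param*, reg* {
    if ~ (reg eqtype rax)
        display "target must be a register"
        err
    end if
    if ~ param eq reg
        mov reg, param
    end if
}

call64_putreg 5, rcx            ; emits mov rcx, 5
call64_putreg rdx, rdx          ; source and target are the same: nothing is emitted
call64_putreg [rsp + 8], r8     ; emits mov r8, [rsp + 8]
```

Skipping the redundant mov is a small thing, but it's convenient when a value you want to pass already happens to be in the right register.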

And here's the main macro that implements the calling convention. See the comments inline for an explanation of how it works:

    ; (Partial) implementation of the Win64 calling convention
    macro call64 fn*, [arg] {
    common
        ; The `local' directive declares the following names
        ; as local to the macro - this is done so that each
        ; macro invocation gets its very own instance of those variables.
        local nargs, arg_idx, stack_space

        ; nargs is the number of arguments passed to the function.
        ; note that below we're simply forward-referencing nargs and relying
        ; on fasm to infer the actual value (see the section above on forward-referencing).

        ; align the stack on 16-byte boundary, and reserve space for arguments.
        ; we make the assumption that at the time of macro invocation, the stack
        ; is "16+8"-aligned (due to the return address having been pushed onto it
        ; by the current function's caller).
        if nargs <= 4
            ; even when the number of arguments is less than 4,
            ; the calling convention mandates that we reserve the
            ; so-called "shadow space" on the stack for 4 parameters.
            stack_space = 5 * 8 ; subtracting 40 from rsp will make it 16-byte aligned
        else if nargs and 1
            ; ("and" is the bitwise AND; "&" is a logical AND in fasm conditions)
            ; if we have an odd number of arguments, reserve just enough space for them,
            ; and the stack will become 16-byte aligned:
            ; rsp_old = 16 * K - 8
            ; rsp_new = 16 * K - 8 - 8 * (2 * Q + 1) = 16 * K - 16 * Q - 16 = 16 * (K - Q - 1)
            stack_space = nargs * 8
        else
            ; if we have an even number of arguments, we need 8 more bytes of padding to
            ; achieve alignment.
            stack_space = (nargs + 1) * 8
        end if

        if stack_space
            ; allocate space on the stack.
            sub rsp, stack_space
        end if

        arg_idx = 0

    forward
        match ,arg \{
            ; this matches an empty argument list.
            ; unfortunately, when no variadic arguments are provided at the macro
            ; invocation site, the forward blocks are still processed once (with all groupargs empty).
        \}
        match any,arg \{
            ; pass the first 4 arguments in registers and the rest on the stack.
            arg_idx = arg_idx + 1
            if arg_idx = 1
                call64_putreg arg, rcx
            else if arg_idx = 2
                call64_putreg arg, rdx
            else if arg_idx = 3
                call64_putreg arg, r8
            else if arg_idx = 4
                call64_putreg arg, r9
            else
                mov qword [rsp + (arg_idx-1)*8], arg
            end if
        \}

    common
        ; set value of the nargs assembly-time variable (and fasm will magically know to use
        ; this value up above...)
        nargs = arg_idx

        ; perform the call
        call fn

        ; clean up the stack as required by the calling convention
        if stack_space
            add rsp, stack_space
        end if
    }
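To check our understanding, here is what I'd expect call64 [MessageBoxA], 0, HelloStr, HelloStr, 0 to expand into. With four arguments, nargs <= 4 holds, so stack_space is 40 and all arguments go into registers. The two data definitions at the top are stand-ins I added so that the fragment assembles on its own as a raw binary; in a real program MessageBoxA would be an import table entry.

```asm
use64

; stand-ins for illustration only (hypothetical, not the real import table)
MessageBoxA dq 0
HelloStr    db "Hello, World!", 0

    sub rsp, 40             ; 32 bytes of shadow space + 8 bytes to restore 16-byte alignment
    mov rcx, 0              ; 1st argument
    mov rdx, HelloStr       ; 2nd argument
    mov r8, HelloStr        ; 3rd argument
    mov r9, 0               ; 4th argument
    call [MessageBoxA]      ; indirect call through the pointer
    add rsp, 40             ; release the reserved stack space
```

A fifth argument, if there was one, would not get a register; it would be stored with mov qword [rsp + 32], ... right above the shadow space.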

Let's see that Hello World!

Finally, we can use our new macro to call WinAPI functions in a more convenient way. By now, you have all the pieces to build it yourself, but for completeness' sake, here is a full listing of a program that calls MessageBox and then exits:

    format PE64 NX GUI 6.0
    entry start

    macro import_directory_table [lib] {
    forward
        dd rva IAT__#lib
        dd 0
        dd 0
        dd rva NAME__#lib
        dd rva IAT__#lib
    common
        dd 5 dup(0)
    forward
        NAME__#lib db `lib, ".DLL", 0
    }

    macro import_functions libname, [funcnames] {
    forward
        if rva $ and 1
            db 0
        end if
        IMPORTNAME__#funcnames dw 0
                               db `funcnames, 0
    common
        IAT__#libname:
    forward
        funcnames dq rva IMPORTNAME__#funcnames
    common
        dq 0
    }

    macro call64_putreg param*, reg* {
        if ~ (reg eqtype rax)
            display "target must be a register"
            err
        end if
        if ~ param eq reg
            mov reg, param
        end if
    }

    macro call64 fn*, [arg] {
    common
        local nargs, arg_idx, stack_space
        if nargs <= 4
            stack_space = 5 * 8
        else if nargs and 1
            stack_space = nargs * 8
        else
            stack_space = (nargs + 1) * 8
        end if
        if stack_space
            sub rsp, stack_space
        end if
        arg_idx = 0
    forward
        match ,arg \{
        \}
        match any,arg \{
            arg_idx = arg_idx + 1
            if arg_idx = 1
                call64_putreg arg, rcx
            else if arg_idx = 2
                call64_putreg arg, rdx
            else if arg_idx = 3
                call64_putreg arg, r8
            else if arg_idx = 4
                call64_putreg arg, r9
            else
                mov qword [rsp + (arg_idx-1)*8], arg
            end if
        \}
    common
        nargs = arg_idx
        call fn
        if stack_space
            add rsp, stack_space
        end if
    }

    section '.text' code readable executable

    HelloStr db "Hello, World!", 0

    start:
        call64 [MessageBoxA], 0, HelloStr, HelloStr, 0
        call64 [ExitProcess], 0

    section '.idata' import readable writeable

    import_directory_table KERNEL32, USER32
    import_functions KERNEL32, ExitProcess
    import_functions USER32, MessageBoxA

That's it for this part. Hopefully it wasn't too boring - personally I find the metaprogramming tools provided by FASM quite impressive compared to some other languages. We'll be using much of this stuff in subsequent parts.
