Introduction

I have always been fascinated by computers. As a child, I heard that they run "ones and zeros", and was curious as to what that actually meant. Eventually, I learned a bit of x86 assembly language. I think it was from an early version of Randall Hyde's The Art of Assembly Language and the now long-out-of-print Vitaly Maljugin et al., Revolutionary Guide to Assembly Language. For example, this x86 instruction moves the value 42 into the eax register:

movl $42 , %eax

However, that is still a textual, human-readable, representation of the instruction. Where are the ones and zeros that the machine actually runs?

This post explores how to write machine code with 1's and 0's and run it on a computer. It assumes some familiarity with x86 assembly language, C, binary and hexadecimal numbers, how to use a compiler, etc. The examples have been tested on Linux and Mac OS X.

Update: Part 2 shows how to make executable files.

Instruction Representation

Just like all other data in a computer, machine instructions are represented by combining binary numbers. An instruction typically starts with an opcode, which is a number that determines the instruction type. For example, the ret instruction (the variant that doesn't take an operand) has opcode 195, or in binary:

11000011

If an instruction takes operands, those are encoded as binary numbers too. For example, each register has a number:

Register eax ecx edx ebx esp ebp esi edi Number 000 001 010 011 100 101 110 111

Regular numbers, such as 42, are encoded in binary as they are. Note however that x86 is a little-endian system, which means that in a multi-byte number, the least significant ("little end") byte should come first in memory. For example, movl $42, %eax is encoded as:

1011 1 000 00101010 00000000 00000000 00000000 Opcode Width Register 42

The instruction format, as well as everything else about x86, is documented in the Intel manuals. The instruction format is explained in Section 2.1 of Volume 2A, and there is a handy overview of all instruction encodings in Appendix B of Volume 2C.

Running the Code

Given some binary encoded machine code, how do we get the machine to execute it? The two example instructions above are enough to write a very simple function:

movl $42 , %eax ret

This simply puts 42 in the eax register and returns. It would be cool if we could call this function from a C program.

The first question is: how does our C program know to expect the return value to be in eax? The Application Binary Interface (ABI) defines how to call functions and return values, etc. In the case of Linux, this is specified in the System V ABI (gABI) and its processor-specific supplements, which for x86 means System V Application Binary Interface Intel386 Architecture Processor Supplement (psABI). The latter specifies (Section 3-9) that integral or pointer return values appear in eax. (It's the same on Mac.)

Our machine code has to be stored in memory. We cannot simply put it in the stack, heap or static data section of our program: the operating system will typically refuse to execute code in those areas for security reasons. The code has to be placed in a memory page with execution privileges. To allocate memory in such a page, we use mmap(2) and pass the PROT_EXEC flag:

#include <sys/mman.h> /* Allocate size bytes of executable memory. */ unsigned char * alloc_exec_mem ( size_t size ) { void * ptr ; ptr = mmap ( 0 , size , PROT_READ | PROT_WRITE | PROT_EXEC , MAP_PRIVATE | MAP_ANON , - 1 , 0 ); if ( ptr == MAP_FAILED ) { perror ( "mmap" ); exit ( 1 ); } return ptr ; }

To read ones and zeros from the user and store them as bytes in memory, we use this function:

/* Read up to buffer_size bytes, encoded as 1's and 0's, into buffer. */ void read_ones_and_zeros ( unsigned char * buffer , size_t buffer_size ) { unsigned char byte = 0 ; int bit_index = 0 ; int c ; while (( c = getchar ()) != EOF ) { if ( isspace ( c )) { continue ; } else if ( c != '0' && c != '1' ) { fprintf ( stderr , "error: expected 1 or 0!

" ); exit ( 1 ); } byte = ( byte << 1 ) | ( c == '1' ); bit_index ++ ; if ( bit_index == 8 ) { if ( buffer_size == 0 ) { fprintf ( stderr , "error: buffer full!

" ); exit ( 1 ); } * buffer ++ = byte ; -- buffer_size ; byte = 0 ; bit_index = 0 ; } } if ( bit_index != 0 ) { fprintf ( stderr , "error: left-over bits!

" ); exit ( 1 ); } }

And here is main to tie it together:

int main () { typedef int ( * func_ptr_t )( void ); func_ptr_t func ; unsigned char * mem ; int x ; mem = alloc_exec_mem ( 1024 ); func = ( func_ptr_t ) mem ; read_ones_and_zeros ( mem , 1024 ); x = ( * func )(); printf ( "function returned %d

" , x ); return 0 ; }

Note how the program takes the pointer to the executable memory (mem), casts that to a pointer to a function (func) and then calls it.

Compiling and running the program (the code can be downloaded as ones-and-zeros_42.c), with the ones and zeros for the movl and ret instructions copied from above yields the expected result: (input is terminated by ctrl-d)

$ gcc -m32 ones-and-zeros_42.c $ ./a.out 1011100000101010000000000000000000000000 11000011 function returned 42

Look! We just ran some 1's and 0's on our computer :-)

Note: To be able to compile a 32-bit program on a 64-bit machine, we use the -m32 flag, which tells GCC to compile in 32-bit mode. This is not necessary if you are using a 32-bit machine. If you are on a Debian or Ubuntu system, you may need to sudo apt-get install gcc-multilib to make this work.

Exercise: The code that reads 1's and 0's into executable memory is essentialy a very primitive loader. Modify the loader to work with hexadecimal numbers instead, turning it into a hex loader; it might come in handy.

Exercise: This first example actually works in 64-bit mode too, but the following examples will not. Why?

Fibonacci Numbers

Let's try a slightly larger example. The C function below computes the n'th Fibonacci number:

int fib ( int n ) { int x = 0 ; int y = 1 ; int z ; while ( n -- ) { z = x + y ; x = y ; y = z ; } return x ; }

It can be implemented in assembly like this:

movl 4 ( %esp ), %ecx xorl %eax , %eax movl $1 , %edx testl %ecx , %ecx jz end loop: xchgl %eax , %edx addl %eax , %edx loop loop end: ret

There are a number of subtleties in that code:

The ABI specifies that function arguments are passed on the stack, with the first one at offset 4 from esp , so thats where we load n from.

from , so thats where we load from. We choose to store n in the ecx register, because that allows us to use the convenient loop instruction, which decrements ecx and jumps if it's non-zero.

in the register, because that allows us to use the convenient instruction, which decrements and jumps if it's non-zero. We avoid the use of ebx , since the ABI specifies that this is a callee-saved register, i.e. the original value has to be preserved by the function (psABI 3-11).

, since the ABI specifies that this is a callee-saved register, i.e. the original value has to be preserved by the function (psABI 3-11). The result is always kept in eax so that it's easy to return.

so that it's easy to return. Clearing a register with xor is a classic x86 idiom. It encodes efficiently (two bytes as opposed to five bytes for movl ) and is fast to execute. (See the optimization manual, Section 3.5.1.8)

The instructions are encoded like this:

movl 4(%esp), %ecx 1000 101 1 01 001 100 00 100 100 00000100 4 bytes Opcode Width Mod Reg (ecx) R/M SS Index Base (esp) Disp (4)

(The x86 memory operand encoding is complicated. See Section 2.1.5 in Volume 2A of the Intel manual.)

xorl %eax, %eax 0011 000 1 11 000 000 2 bytes Opcode Width Opcode ctd. Reg1 (eax) Reg2 (eax)

movl $1, %edx 1011 1 010 00000001 00000000 00000000 00000000 5 bytes Opcode Width Register (edx) Immediate (1)

testl %ecx, %ecx 1000 010 1 11 001 001 2 bytes Opcode Width Opcode ctd. Reg1 (ecx) Reg2 (ecx)

jz end 0111 0100 00000101 2 bytes Opcode Conditional test (z) 8-bit offset (5)

(The offset 5 is the number of bytes we need to jump past to arrive at the ret instruction, i.e. the sum of the sizes for xchg, add, and loop below.)

xchgl %eax, %edx 1001 0 010 1 byte Opcode Reg2 (edx)

(Note that xchg has a specific opcode for exchanging eax with another register.)

addl %eax, %edx 0000 000 1 11 000 010 2 bytes Opcode Width Opcode ctd. Reg1 (eax) Reg2 (edx)

loop 1110 0010 11111011 2 bytes Opcode 8-bit offset (-5)

ret 1100 0011 1 byte Opcode

We can modify the main function from the program above and try this out: (download ones-and-zeros_fib.c)

int main () { typedef int ( * func_ptr_t )( int ); func_ptr_t func ; unsigned char * mem ; int i , x ; mem = alloc_exec_mem ( 1024 ); func = ( func_ptr_t ) mem ; read_ones_and_zeros ( mem , 1024 ); for ( i = 0 ; i < 10 ; i ++ ) { x = ( * func )( i ); printf ( "fib(%d) = %d

" , i , x ); } return 0 ; }

$ gcc -m32 ones-and-zeros_fib.c $ ./a.out 10001011010011000010010000000100 0011000111000000 1011101000000001000000000000000000000000 1000010111001001 0111010000000101 10010010 0000000111000010 1110001011111011 11000011 fib(0) = 0 fib(1) = 1 fib(2) = 1 fib(3) = 2 fib(4) = 3 fib(5) = 5 fib(6) = 8 fib(7) = 13 fib(8) = 21 fib(9) = 34

Exercise: The fibonacci program is 21 bytes long. Can you make it shorter? Faster?

A Simple Assembler

Writing instructions using 1's and 0's by hand is an interesting exercise and very exciting when it works, but also extremely impractical. It would be much easier if a program could do it for us. Such a program is called an assembler, and here is a simple variant that can be used to assemble our fibonacci example:

typedef struct { unsigned char code ; } reg_t ; static const reg_t EAX = { 0x0 }; static const reg_t ECX = { 0x1 }; static const reg_t EDX = { 0x2 }; static const reg_t EBX = { 0x3 }; static const reg_t ESP = { 0x4 }; static const reg_t EBP = { 0x5 }; static const reg_t ESI = { 0x6 }; static const reg_t EDI = { 0x7 };

(Wrapping the register number in a struct like this gives us some type safety. For example, it makes it impossible to conflate an integer and a register argument when calling a function. This technique is used in V8, for example.)

typedef struct asm_buf_t asm_buf_t ; struct asm_buf_t { unsigned char * data ; ssize_t size ; ssize_t capacity ; }; void emit_byte ( asm_buf_t * buffer , int32_t byte ) { assert (( uint32_t ) byte <= UINT8_MAX ); assert ( buffer -> size < buffer -> capacity ); buffer -> data [ buffer -> size ++ ] = ( unsigned char ) byte ; } void emit_int32 ( asm_buf_t * buffer , int32_t value ) { /* Emit value in little-endian order. */ emit_byte ( buffer , ( value >> 0 ) & 0xff ); emit_byte ( buffer , ( value >> 8 ) & 0xff ); emit_byte ( buffer , ( value >> 16 ) & 0xff ); emit_byte ( buffer , ( value >> 24 ) & 0xff ); } void asm_ret ( asm_buf_t * buffer ) { /* 1100 0011 */ emit_byte ( buffer , 0xC3 ); } void asm_xor ( asm_buf_t * buffer , reg_t src , reg_t dst ) { /* 0011 000 | w=1 | 11 | src(3) | dst(3) */ emit_byte ( buffer , 0x31 ); emit_byte ( buffer , ( 0x3 << 6 ) | ( src . code << 3 ) | dst . code ); } void asm_mov_imm32 ( asm_buf_t * buffer , reg_t dst , int32_t value ) { if ( value == 0 ) { asm_xor ( buffer , dst , dst ); return ; } /* 1011 | w=1 | dst(3) | imm32 */ emit_byte ( buffer , 0xB8 | dst . code ); emit_int32 ( buffer , value ); } void asm_mov_mem_reg ( asm_buf_t * buffer , reg_t src , int32_t disp , reg_t dst ) { unsigned char mod , reg , rm ; /* 1000 101 | w=1 | mod(2) | reg(3) | rm(3) | [sib(8)] | [disp(8/32)] */ emit_byte ( buffer , 0x8B ); if ( disp == 0 ) { mod = 0x0 ; } else if ( disp <= INT8_MAX ) { mod = 0x1 ; } else { mod = 0x2 ; } rm = src . code ; reg = dst . code ; emit_byte ( buffer , ( mod << 6 ) | ( reg << 3 ) | rm ); if ( src . code == ESP . code ) { /* Emit SIB (Scaled Index Byte). */ /* ss=00 | index=100 | base=esp(3) */ emit_byte ( buffer , ( 0x0 << 6 ) | ( 0x4 << 3 ) | ESP . code ); } if ( mod == 0x1 ) { emit_byte ( buffer , ( unsigned char ) disp ); } else if ( mod == 0x2 ) { emit_int32 ( buffer , disp ); } } void asm_test ( asm_buf_t * buffer , reg_t reg1 , reg_t reg2 ) { /* 1000 010 | w=1 | 11 | reg1(3) | reg2(3) */ emit_byte ( buffer , 0x85 ); emit_byte ( buffer , ( 0x3 << 6 ) | ( reg1 . code << 3 ) | reg2 . code ); } void asm_xchg ( asm_buf_t * buffer , reg_t reg1 , reg_t reg2 ) { if ( reg1 . code == EAX . code ) { /* 1001 | 0 | reg2 */ emit_byte ( buffer , ( 0x9 << 4 ) | ( 0x0 << 3 ) | reg2 . code ); return ; } if ( reg2 . code == EAX . code ) { asm_xchg ( buffer , reg2 , reg1 ); return ; } /* 1000 011 | w=1 | 11 | reg1(3) | reg2(3) */ emit_byte ( buffer , 0x87 ); emit_byte ( buffer , ( 0x3 << 6 ) | ( reg1 . code << 3 ) | reg2 . code ); } void asm_add ( asm_buf_t * buffer , reg_t src , reg_t dst ) { /* 0000 000 | w=1 | 11 | src(3) | dst(3) */ emit_byte ( buffer , 0x01 ); emit_byte ( buffer , ( 0x3 << 6 ) | ( src . code << 3 ) | dst . code ); } typedef struct label_t label_t ; struct label_t { ssize_t target_addr ; ssize_t instr_addr ; bool has_target ; bool has_instr ; }; void asm_loop ( asm_buf_t * buffer , label_t * label ) { /* 1110 0010 | displacement(8) */ ssize_t my_addr = buffer -> size ; ssize_t disp = 0 ; if ( label -> has_target ) { disp = label -> target_addr - ( my_addr + 2 ); } else { assert ( ! label -> has_instr && "Label already used!" ); label -> instr_addr = my_addr ; label -> has_instr = true ; } assert ( disp >= INT8_MIN && disp <= INT8_MAX ); emit_byte ( buffer , 0xe2 ); emit_byte ( buffer , ( unsigned char ) disp ); } typedef struct { unsigned char code ; } cc_t ; static const cc_t CC_E , CC_Z = { 0x4 }; void asm_jcc ( asm_buf_t * buffer , cc_t cc , label_t * label ) { /* 0000 1111 1000 | cc(4) | disp(32) */ ssize_t my_addr = buffer -> size ; ssize_t disp = 0 ; if ( label -> has_target ) { disp = label -> target_addr - ( my_addr + 6 ); } else { assert ( ! label -> has_instr && "Label already used!" ); label -> instr_addr = my_addr ; label -> has_instr = true ; } assert ( disp >= INT32_MIN && disp <= INT32_MAX ); emit_byte ( buffer , 0x0f ); emit_byte ( buffer , ( 0x8 << 4 ) | cc . code ); emit_int32 ( buffer , ( int32_t ) disp ); }

(As seen in the hand-translated example above, there is a two-byte version of jcc that can be used with 8-bit displacements. However, since the assembler doesn't know beforehand how long the jump will be, it goes for the safe option and uses the 32-bit displacement variant. A fancier assembler should of course pick the most efficient encoding automatically, a process called relaxation.)

/* Bind label to the current address and update any instruction that uses it. */ void bind_label ( asm_buf_t * buffer , label_t * label ) { ssize_t disp ; ssize_t orig_buf_size ; ssize_t addr ; assert ( ! label -> has_target && "Label already bound!" ); addr = buffer -> size ; label -> target_addr = addr ; label -> has_target = true ; if ( label -> has_instr ) { orig_buf_size = buffer -> size ; /* Update the jump instruction with the displacement. */ switch ( buffer -> data [ label -> instr_addr ]) { case 0xe2 : /* loop */ disp = addr - ( label -> instr_addr + 2 ); assert ( disp >= INT8_MIN && disp <= INT8_MAX ); buffer -> size = label -> instr_addr + 1 ; emit_byte ( buffer , ( unsigned char ) disp ); break ; case 0x0f : /* jcc */ disp = addr - ( label -> instr_addr + 6 ); assert ( disp >= INT32_MIN && disp <= INT32_MAX ); buffer -> size = label -> instr_addr + 2 ; emit_int32 ( buffer , ( int32_t ) disp ); break ; default : assert ( 0 && "Binding label to unknown jump." ); } buffer -> size = orig_buf_size ; } }

And now, let's use this assembler to generate our fibonacci function:

void generate_fib_function ( asm_buf_t * buf ) { label_t end = { 0 , 0 , 0 , 0 }; label_t loop = { 0 , 0 , 0 , 0 }; asm_mov_mem_reg ( buf , ESP , 4 , ECX ); asm_mov_imm32 ( buf , EAX , 0 ); asm_mov_imm32 ( buf , EDX , 1 ); asm_test ( buf , ECX , ECX ); asm_jcc ( buf , CC_Z , & end ); bind_label ( buf , & loop ); asm_xchg ( buf , EAX , EDX ); asm_add ( buf , EAX , EDX ); asm_loop ( buf , & loop ); bind_label ( buf , & end ); asm_ret ( buf ); } int main () { typedef int ( * func_ptr_t )( int ); asm_buf_t buf ; int i , x ; func_ptr_t func ; buf . data = alloc_exec_mem ( 1024 ); buf . size = 0 ; buf . capacity = 1024 ; generate_fib_function ( & buf ); func = ( func_ptr_t ) buf . data ; printf ( "Code: " ); for ( i = 0 ; i < buf . size ; i ++ ) { printf ( "%02x" , buf . data [ i ]); } printf ( "

" ); for ( i = 0 ; i < 10 ; i ++ ) { x = ( * func )( i ); printf ( "fib(%d) = %d

" , i , x ); } return 0 ; }

Verifying that it works: (source available as ones-and-zeros_fib_asm.c)

$ gcc -m32 ones-and-zeros_fib_asm.c $ ./a.out Code: 8b4c240431c0ba0100000085c90f84050000009201c2e2fbc3 fib(0) = 0 fib(1) = 1 fib(2) = 1 fib(3) = 2 fib(4) = 3 fib(5) = 5 fib(6) = 8 fib(7) = 13 fib(8) = 21 fib(9) = 34

To verify that the assembler is doing the right thing, we can convert the hex string above to binary and disassemble it, for example using xxd(1) and objdump(1):

$ echo 8b4c240431c0ba0100000085c90f84050000009201c2e2fbc3 | xxd -p -r > /tmp/obj $ objdump -D -b binary -m i386 /tmp/obj /tmp/obj: file format binary Disassembly of section .data: 00000000 <.data>: 0: 8b 4c 24 04 mov 0x4(%esp),%ecx 4: 31 c0 xor %eax,%eax 6: ba 01 00 00 00 mov $0x1,%edx b: 85 c9 test %ecx,%ecx d: 0f 84 05 00 00 00 je 0x18 13: 92 xchg %eax,%edx 14: 01 c2 add %eax,%edx 16: e2 fb loop 0x13 18: c3 ret

Further Reading

Usborne Introduction to Machine Code for Beginners is a wonderful children's book about machine code from 1983.