BOLO: Reverse Engineering — Part 2 (Advanced Programming Concepts)

Preface

Throughout this article we will be breaking down the following programming concepts and analyzing the decompiled assembly versions of each instruction:

Arrays Pointers Dynamic Memory Allocation Socket Programming (Network Programming) Threading

For the Part 1 of the BOLO: Reverse Engineering series, please click here.

Please note: While this article uses IDA Pro to disassemble the compiled code, many of the features of IDA Pro (i.e. graphing, pseudocode translation, etc.) can be found in plugins and builds for other free disassemblers such as radare2. Furthermore, while preparing for this article I took the liberty of changing some variable names in the disassembled code from IDA presets like “v20” to what they correspond to in the C code. This was done to make each portion easier to understand. Finally, please note that this C code was compiled into a 64 bit executable and disassembled with IDA Pro’s 64 bit version. This can be especially seen when calculating array sizes, as the 32 bit registers (i.e. eax) are often doubled in size and transformed into 64 bit registers (i.e rax).

Ok, Let’s begin!

While Part 1 broke down and described basic programming concepts like loops and IF statements, this article is meant to explain more advanced topics that you would have to decipher when reverse engineering.

Arrays

Let’s begin with Arrays, First, let’s take a look at the code as a whole:

Basic Arrays — Code

Now, let’s take a look at the decompiled assembly as a whole:

Basic Arrays — Decompiled assembly overview

As you can see, the 12 lines of code turned into quite a large block of code. But don’t be intimidated! Remember, all we’re doing here is setting up arrays!

Let’s break it down bit by bit:

Declaring an array with a literal — code

Declaring an array with a literal — disassembled

When initializing an array with an integer literal, the compiler simply initializes the length through a local variable.

EDIT: The above photo labeled “Declaring an array with a literal — disassembled” is actually labeled incorrectly. While yes, when initializing an array with an integer literal the compiler does first initialize the length through a local variable, the above screenshot is actually the initialization of a stack canary. Stack Canaries are used to detect overflow attacks that may, if unmitigated, lead to execution of malicious code. During compilation the compiler allocated enough space for the only litArray element that would be used, litArray[0] (see photo below labeled “local variables — Arrays” — as you can see, the only litArray element that was allocated for is litArray[0]). Compiler optimization can significantly enhance the speed of applications.

Sorry for the confusion!

local variables — Arrays

Declaring an array with a variable — code

Declaring an array with a variable — assembly

First, the array length is saved to a local variable (ArraySize), then the maximum and minimum index values, as well as the overall length of the array are calculated and used to then calculate the base location of the array in memory.

declaring an array with pre-defined objects — code

declaring an array with pre-defined objects — assembly

When declaring an array with pre-defined index definitions the compiler simply saves each pre-defined object into its own variable which represents the index within the array (i.e. objArray4 = objArray[4])

initializing an array index — code

initializing an array index — assembly

Much like declaring an array with pre-defined index definitions, when initializing (or setting) an index in an array, the compiler creates a new variable for said index.

retrieving an item from an array — code

retrieving an item from an array — assembly

When retrieving items from arrays, the item is taken from the index within the array and set to the desired variable.

creating a matrix with variables — code

Creating a matrix with variables — assembly

When creating a matrix, first the row and column sizes are set to their row and col variables. Next, the maximum and minimum indexes for the rows and columns are calculated and used to calculate the base location / overall size of the matrix in memory.

inputting to a matrix — code

inputting to a matrix — assembly

When inputting into a matrix, first the location of desired index is calculated using the matrix’s base location. Next, the contents of said index location is set to the desired input (i.e. 1337).

Retrieving from a matrix — code

Retrieving from a matrix — assembly

When retrieving from a matrix the same calculation as performed during the input sequence for the matrix index is performed again but instead of inputting something into the index, the index’s contents are retrieved and set to a desired variable (i.e. MatrixLeet).

Pointers

Now that we understand how arrays are used / look in assembly, let’s move on to pointers.

Pointers — Code

Let’s break the assembly down now:

int num = 10 in assembly

First we set int num to 10.

pointer = &num

Next we set the contents of the num variable (i.e. 10) to the contents of the pointer variable.

printf num — assembly

We print out the num variable.

printf *pointer — assembly

We print out the pointer variable.

printf address of num — assembly

We print out the address of the num variable by using the lea (load effective address) opcode instead of mov.

printf address of num using pointer variable — assembly

We print the address of the num variable through the pointer variable.

printf address of pointer — assembly

we print the address of the pointer variable using the lea (load effective address) opcode instead of mov.

Dynamic Memory Allocation

The next item on our list is dynamic memory allocation. In this tutorial I will break down memory allocation using:

malloc calloc realloc

malloc — dynamic memory allocation

First, let’s take a look at the code:

Dynamic memory allocation using malloc — code

In this function we allocate 11 characters using malloc and then copy “Hello World” into the allocated memory space.

Now, let’s take a look at the assembly:

Please note: Throughout the assembly you may see ‘nop’ instructions. these instructions were specifically placed by me during the preparation stage for this article so that I could easily navigate and comment throughout the assembly code.

dynamic memory allocation using malloc — assembly

When using malloc, first the size of the allocated memory (0x0B) is first moved into the edi register. Next, the _malloc system function is called to allocate memory. The allocated memory area is then stored in the ptr variable. Next, the “Hello World” string is broken down into “Hello Wo” and “rld” as it is copied into the allocated memory space. Finally, the newly copied “Hello World” string is printed out and the allocated memory is freed using the _free system function.

calloc — dynamic memory allocation

First, let’s take a look at the code:

dynamic memory allocation using calloc — code

Much like in the malloc technique, space for 11 characters is allocated and then the “Hello World” string is copied into said space. Then, the newly relocated “Hello World” is printed out and the allocated memory space is freed.

dynamic memory allocation using calloc — assembly

Dynamic memory allocation through calloc looks nearly identical to dynamic memory allocation through malloc when broken down into assembly.

First, space for 11 characters (0x0B) is allocated using the _calloc system function. Then, the “Hello World” string is broken down into “Hello Wo” and “rld” as it is copied into the newly allocated memory area. Next, the newly relocated “Hello World” string is printed out and the allocated memory area is freed using the _free system function.

realloc — dynamic memory allocation

First, let’s look at the code:

dynamic memory allocation using realloc — code

In this function, space for 11 characters is allocated using malloc. Then, “Hello World” is copied into the newly allocated memory space before said memory location is reallocated to fit 21 characters by using realloc. Finally, “1337 h4x0r @nonymoose” is copied into the newly reallocated space. Finally, after printing, the memory is freed.

Now, let’s take a look at the assembly:

Please note: Throughout the assembly you may see ‘nop’ instructions. these instructions were specifically placed by me during the preparation stage for this article so that I could easily navigate and comment throughout the assembly code.

dynamic memory allocation using realloc — assembly

First, memory is allocated using malloc precisely as it was in the above “malloc — dynamic memory allocation” section. Then, after printing out the newly relocated “Hello World” string, realloc (_realloc system call) is called on the ptr variable (that represents the mem_alloc variable in the code) and a size of 0x15 (21 in decimal) is passed in as well. Next, “1337 h4x0r @nonymoose” is broken down into “1337 h4x”, “0r @nony”, “moos”, and “e” as it is copied into the newly re-allocated memory space. Finally, the space is freed using the _free system call

Socket Programming

Next, we’ll cover socket programming by breaking down a very basic TCP client-server chat system.

Before we begin breaking down the server / client code, it is important to point out the following line of code at the top of the file:

define the Port number

This line defines the PORT variable as 1337. This variable will be used in both the client and the server as the network port used to create the connection.

Server

First, let’s look at the code:

Server — Code

First, the socket file descriptor ‘server’ is created with the AF_INET domain, the SOCK_STREAM type, and protocol code 0. Next, the socket options and the address is configured. Then, the socket is bound to the network address / port and the server begins to listen on said server with a maximum queue length of 3. Upon receiving a connection, the server accepts it into the sock variable and reads the transmitted value into the value variable. Finally, the server sends the serverhello string over the connection before the function returns.

Now, let’s break it down into assembly:

initiating the server variables

First, the server variables are created and initialized.

server = socket(…) — assembly

Next, the socket file descriptor ‘server’ is created by calling the _socket system function with the protocol, type, and domain settings passed through the edx, esi, and edi registers respectively.

setockopt(…) — assembly

Then, setsockopt is called to set the socket options on the ‘server’ socket file descriptor.

address initialization — assembly

Next, the server’s address is initialized through adress.sin_family, address.sin_addr.s_addr, and address.sin_port.

bind(…) — assembly

Upon address and socket configuration, the server is bound to the network address using the _bind system call.

listen(…) — assembly

Once bound, the server listens on the socket by passing in the ‘server’ socket file descriptor and a max queue length of 3.

sock = accept(…) — assembly

Once a connection is made, the server accepts the socket connection into the sock variable.

value = read(…) — assembly

The server then reads the transmitted message into the value variable using the _read system call.

send(…) — assembly

Finally, the server sends the serverhello message through the s variable (which represents serverhello in the code).

Client

First, let’s look at the code:

Client — code

first, the socket file descriptor ‘sock’ is created with the AF_INET domain, SOCK_STREAM type, and protocol code 0. Next, memset is used to fill the memory area of server_addr with ‘0’s before address information is set using server_addr.sin_family and server_addr.sin_port. Next, the address information is converted from text to binary format using inet_pton before the client connects to the server. Upon connection, the client sends it’s helloclient string and then reads the server’s response into the value variable. Finally, the value variable is printed out and the function returns.

Now, let’s break down the assembly:

Client variable initialization — assembly

First, the client’s local variables are initialized.

sock = socket(…) — assembly

The ‘sock’ socket file descriptor is created by calling the _socket system function and passing in the protocol, type, and domain information through the edx, esi, and edi registers respectively.

memset(…) — assembly

Next, the server_address variable (represented as ‘s’ in assembly) is filled with ‘0’s (0x30) using the _memset system call.

Client — address configuration — assembly

Then, the address information for the server is configured.

inet_pton(…) — assembly

Next, the address is translated from text to binary format using the _inet_pton system call. Please note that since no address was explicitly defined in the code, localhost (127.0.0.1) is assumed.

connect(…) — assembly

The client connects to the server using the _connect system call.

send(…) — assembly

Upon connecting, the client sends the helloClient string to the server.

value = read(…)

Finally, the client reads the server’s reply into the value variable using the _read system call.

Threading

Finally, we’ll cover the basics of threading in C.

First, let’s look at the code:

Threading — Code

As you can see, the program first prints “This is before the thread”, then creates a new thread that points to the *mythread function using the pthread_create function. Upon completion of the *mythread function (after sleeping for 1 second and printing “Hello from mythread”), the new thread is joined back the main thread using the pthread_join function and “This is after the thread” is printed.

Now, let’s break down the assembly:

printf “This is before the thread” — assembly

First, the program prints “This is before the thread”.

Creating a new thread — assembly

Next, a new thread is created with the _pthread_create system call. This thread points to mythread as it’s start routine.

The mythread function — assembly

As you can see, the mythread function simply sleeps for one second before printing “Hello from mythread”.

Please note: In the mythread function you will see two ‘nop’s. These were specifically placed for easier navigation during the preparation stage of this article.

joining the mythread function’s thread back to the main thread — assembly

Upon returning from the mythread function, the new thread is joined with the main thread using the _pthread_join function.

printf “This is after the thread” — assembly

Finally, “This is after the thread” is printed out and the function returns.

Closing Statements

I hope this article was able to shed some light on some more advanced programming concepts and their underlying assembly code. Now that we’ve covered all the major programming concepts, the next few articles in the BOLO: Reverse Engineering series will be dedicated to different types of attacks and vulnerable code so that you may be able to more quickly identify vulnerabilities and attacks within closed source programs through static analysis.