Binary Exploitation ELI5 — Part 1

0x00 — Preface

In this article series I will be going over different types of binary exploits in detail, explaining what they are, how they work, the technologies behind them, and some defenses against them. Throughout this series I will do my best to explain these attacks, defenses, technologies, and concepts in a way that anyone, from beginner to 1337 h4x0r, can understand.

Please note: While I will be adding some key Prerequisite Knowledge sections in hopes of making the more technical explanations of these attacks easier to understand, this article series will not cover all of the information, concepts, and technologies necessary to be proficient in the field of binary exploitation.

In this article, we’ll be covering:

0x01. Prerequisite Knowledge: Application Memory

0x02. Prerequisite Knowledge: The Stack

0x03. Prerequisite Knowledge: Function Calls and Returns

0x04. Attack: Stack Buffer Overflows

0x05. Attack: Return-to-libc (ret2libc) attacks

Click below for the next part of this series:

Binary Exploitation ELI5 — Part 2

0x01 — Prerequisite Knowledge: Application Memory

When executed, applications are loaded into memory. However, as we all know, computers have a finite amount of memory, so they have to be extremely careful when loading things into it so as not to overwrite any other application. To do this, computers use a concept called Virtual Memory, which can be neatly summed up using a scene from the 2000s TV show Drake & Josh, in which Drake and Josh take a job organizing sushi into containers:

In the scene, Drake and Josh get a job where they take sushi coming down a conveyor belt and organize the pieces into containers. While all the sushi containers look exactly the same, it is crucial that each container holds only one type of sushi.

So, let’s break the analogy down and relate it to the concept of Virtual Memory:

The Sushi Conveyor Belt: As I said above, computers have to be very careful and precise about where they put application data in memory so that nothing is overwritten. Although a computer could simply place applications directly into physical memory, this would eventually cause problems, as application fragments would quickly fill up the entire space. In the above example, the individual sushi pieces can be seen as application fragments or chunks of memory allocated by the application, while the entire set of sushi (6 per container) can be seen as the application itself.

Drake and Josh: To avoid filling up the conveyor belt with loose pieces of sushi, Drake and Josh organized them into individual containers, which were then allowed to move down the conveyor belt. Much like Drake and Josh, your computer organizes applications into containers as well, called Virtual Address Spaces. A Virtual Address Space allows the application to believe it has full control over the entire range of memory. However, when an application accesses a location or tries to allocate memory within its Virtual Address Space, it is not granted access to arbitrary physical memory. Instead, a small but extremely important piece of hardware in your computer’s CPU (Central Processing Unit) called the MMU (Memory Management Unit) maps the application’s virtual address to a specific region of physical memory and facilitates any memory manipulation. This memory mapping, which the operating system maintains in lookup tables called page tables, allows computers to organize and run multiple applications with dynamic memory requirements.

An ASCII Diagram of the Virtual Memory Process
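To make the lookup idea a bit more concrete, here is a minimal sketch of address translation in C. It is a toy model, not how any real MMU is implemented: the names (PageTable, translate, NOT_MAPPED), the 4096-byte page size, and the four-page address space are all assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE  4096u       /* bytes per page (a common size, assumed here) */
#define NUM_PAGES  4u          /* tiny address space, for illustration only */
#define NOT_MAPPED UINT32_MAX  /* marker for "no physical frame assigned" */

/* One table per application: index = virtual page, value = physical frame. */
typedef struct {
    uint32_t frame_of_page[NUM_PAGES];
} PageTable;

/* Translate a virtual address into a physical one, the way an MMU does
 * conceptually. Returns NOT_MAPPED if the page has no frame assigned. */
uint32_t translate(const PageTable *table, uint32_t virtual_address)
{
    uint32_t page   = virtual_address / PAGE_SIZE;  /* which container */
    uint32_t offset = virtual_address % PAGE_SIZE;  /* position inside it */

    if (page >= NUM_PAGES || table->frame_of_page[page] == NOT_MAPPED) {
        return NOT_MAPPED;
    }
    return table->frame_of_page[page] * PAGE_SIZE + offset;
}
```

With two tables that map virtual page 0 to different frames, the same virtual address translates to two different physical addresses, which is why two applications can both believe they own address 0x10 without stepping on each other.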

It is also important to note that while all of an application’s own code is contained within its executable file, applications often use dynamically linked libraries (DLLs on Windows, shared objects on Linux) such as libc or kernel32. These libraries are simply external code (not stored within the application’s own executable), provided by the system or by other developers, that the program imports functions from. When the program runs, the libraries it needs are mapped into its virtual address space. Take the below code for example:

A Basic C function
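A minimal sketch of this kind of code is shown here. The function name say_hello is a hypothetical stand-in (a complete program would do the same thing from main), and the only library call used is the standard printf.

```c
#include <stdio.h>

int say_hello(void)
{
    return printf("Hello World\n");  /* printf itself is defined in libc */
}
```

The include line only tells the compiler what printf looks like. The code that actually does the printing lives in the library.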

As you can see, nowhere in this 6-line program do I actually define what printf is. However, this program will still run without issue and print out “Hello World”. This is because printf is defined in libc, the standard C library. During the build, the linker records libc as a dependency of the executable, and the library itself is loaded alongside the program when it runs. On a Linux system, you can view a program’s shared library dependencies using the ldd command.

Displaying a program’s shared library dependencies with ldd

If you’re looking at the above screenshot and wondering what in the world 0xb7e99000 is, well, that’s the address at which libc is loaded in memory. Memory addresses are conventionally written in hexadecimal (base 16) format. Please click here to get some more information on the hexadecimal number system.
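If hexadecimal is new to you, a small sketch may help: the helper below (parse_hex_address is a name made up for this example) uses the standard strtoull function to turn a hexadecimal string into an ordinary number.

```c
#include <stdlib.h>

/* Convert a hexadecimal string such as "0xb7e99000" into a number.
 * With base 16, strtoull accepts an optional "0x" prefix. */
unsigned long long parse_hex_address(const char *text)
{
    return strtoull(text, NULL, 16);
}
```

Passing it "0xb7e99000" gives 3085537280 in decimal: same value, just written in a different base.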

0x02 — Prerequisite Knowledge: The Stack

The Stack is simply a large data structure that is used to store application information and data during runtime. The stack’s functionality can be simply explained through the following analogy:

Bob is a dishwasher at a fancy restaurant. Each night, Bob has a stack of plates to wash, and throughout the night more plates may be added to the top of Bob’s stack whenever a table is cleared off. If Bob takes a plate from anywhere but the top of the stack, all the ones above it will fall and break.

Now, instead of Bob and a stack of plates, simply imagine a computer and a stack of data objects. Whenever something is pushed onto the stack, it is added to the top of the stack, and whenever something is popped off the stack, it is removed from the top of the stack, making it a Last In First Out (LIFO) mechanism.
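The push and pop behaviour can be sketched in a few lines of C. This is a simplified, array-backed stack for illustration only (the Stack type, the function names, and the capacity of 8 are assumptions for this example), not the actual stack your program runs on.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8

typedef struct {
    int    items[STACK_CAPACITY];
    size_t count;               /* number of items currently stored */
} Stack;

/* Put a value on top of the stack. Returns false if the stack is full. */
bool stack_push(Stack *stack, int value)
{
    if (stack->count == STACK_CAPACITY) {
        return false;
    }
    stack->items[stack->count++] = value;
    return true;
}

/* Take the top value off the stack. Returns false if the stack is empty. */
bool stack_pop(Stack *stack, int *out_value)
{
    if (stack->count == 0) {
        return false;
    }
    *out_value = stack->items[--stack->count];
    return true;
}
```

Pushing 1, 2, and 3 and then popping three times hands the values back as 3, 2, 1: the last plate added is the first one Bob washes.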

The stack is used by programs to hold all sorts of things, such as return addresses (the location in memory where execution should continue after a function finishes), function arguments, and local variables.

0x03 — Prerequisite Knowledge: Function Calls and Returns

Take a look at the below code:

A basic C program

In this code snippet, we see that the function add takes 2 integer arguments called A and B. In the main function, we can see that we’ve called add with the number 1 for the argument A and the number 2 for the argument B. If we break this code down into its underlying assembly instructions we see:

Calling the add function with 2 arguments

As you can see, when calling a function with parameters, the program first pushes both parameters onto the stack and then executes a call instruction. (Passing arguments on the stack like this is typical of 32-bit x86 programs; other platforms often pass them in registers.) The call instruction redirects the program’s instruction pointer to the address of the function being called. An instruction pointer is like the little pencil you use to keep track of which word you’re reading: it always points to the instruction that’s about to be executed (the word that’s about to be read). However, before jumping to the called function, the call instruction pushes the address of the instruction that follows it onto the stack, so that when the add function returns, the program will know where to continue processing from. This saved address is called the function’s return pointer (or return address).
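The sequence described above (push the arguments, save the return address, jump, then return) can be imitated with a toy model in C. Everything here is a simplification made up for illustration: ToyCpu, call_add, the instruction addresses, and the idea that the next instruction is simply "current + 1" are assumptions, and there is no bounds checking on the toy stack.

```c
#include <stdint.h>
#include <stddef.h>

#define SLOTS 16

typedef struct {
    uint32_t stack[SLOTS];
    size_t   depth;                 /* slots in use; top is stack[depth - 1] */
    uint32_t instruction_pointer;   /* address of the instruction being run */
} ToyCpu;

static void push(ToyCpu *cpu, uint32_t value) { cpu->stack[cpu->depth++] = value; }
static uint32_t pop(ToyCpu *cpu)              { return cpu->stack[--cpu->depth]; }

/* Mimic "push b; push a; call add" followed by the callee's "ret".
 * add_address is a made-up address standing in for the add function. */
uint32_t call_add(ToyCpu *cpu, uint32_t a, uint32_t b, uint32_t add_address)
{
    uint32_t next_instruction = cpu->instruction_pointer + 1;

    push(cpu, b);                             /* arguments go on first */
    push(cpu, a);
    push(cpu, next_instruction);              /* call saves the return address */
    cpu->instruction_pointer = add_address;   /* ...and jumps into the function */

    /* Inside add: the arguments sit just below the return address. */
    uint32_t sum = cpu->stack[cpu->depth - 2] + cpu->stack[cpu->depth - 3];

    cpu->instruction_pointer = pop(cpu);      /* ret: resume at the saved address */
    cpu->depth -= 2;                          /* caller discards the arguments */
    return sum;
}
```

Starting with the instruction pointer at 100 and calling call_add with 1 and 2, the toy CPU ends up back at 101 with an empty stack and a result of 3. The important detail is that the place to return to was read from the stack, which is exactly what the attacks later in this article abuse.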

0x04 — Attack: Stack Buffer Overflows

Before going into technical detail about what Stack Buffer Overflows are and how they work, let’s look at a quick, easy-to-understand, analogy:

Alice and Bob used to date, but Alice ended up breaking up with Bob. As time went on, Alice moved on but Bob never really got over the heartbreak. Now, Alice is getting married to Robert Hackerman, Bob’s arch-nemesis. Bob, being a creepy weirdo, spied on all of Alice’s wedding plans through his secret access to Alice’s email account. Bob saw that Alice hired a famous wedding cake designer who wanted Alice to edit parts of his recipe to suit her flavor preferences. The designer gave Alice a recommended list of ingredients to add but said he would do whatever she wanted, precisely. Bob opened up the document attached to the designer’s email and saw that the recipe’s custom lines looked like:

… Then, we’ll add flavor to the frosting by adding ______. After that, we’ll add some chocolate ….

Bob noticed that if you entered “banana” into the line, which fits the six-character blank exactly, the text would look like:

… Then, we’ll add flavor to the frosting by adding banana. After that, we’ll add some chocolate …

But if Bob entered “strawberry”, which is four characters too long for the blank, the text would look like:

… Then, we’ll add flavor to the frosting by adding strawberryter that, we’ll add some chocolate …

Bob realized that this would be the perfect way to ruin Alice’s wedding. All he had to do was overwrite the rest of the recipe with his own, disgusting, version! On Alice’s wedding day, the designer finally revealed the cake he had made: it was covered in bugs and made out of frozen mayonnaise!

A stack buffer overflow, much like Bob’s attack, overwrites data that the developer didn’t intend to have overwritten, which can give an attacker control over the program and its output(s).

So, now let’s see it in the real world. Take a look at the following piece of code from exploit-exercises.com:

Exploit-Exercises.com Protostar Stack0 Code

In the above function, we see that a character array called buffer is created with a size of 64. Then, we see that the modified variable is set to 0 and the gets function is called with the buffer variable as an argument. Finally, we see an if statement that checks whether modified is not 0. Nowhere in this application is the modified variable set to anything other than 0, so how are we going to change it?

Well, let’s first take a look at the gets function documentation:

gets function defined

gets function bugs section

As you can see, the gets function simply takes in user input. However, the function does not check whether the user input actually fits into the data structure we’re storing it in (in this case, buffer), and thus we’re able to overflow the data structure and affect other variables / data on the stack. Furthermore, since we know that local variables such as buffer and modified are stored on the stack, and that modified sits right next to buffer, all we have to do is enter more than 64 characters of input to overwrite the modified variable. Let’s take a look at a diagram:

an ASCII diagram of a stack buffer overflow

As you can see, if a malicious user simply enters too much text, they can overwrite the modified variable and anything else that follows it on the stack, including return pointers. This means that if a malicious agent is able to take control of a program’s stack, they are effectively able to take control of the entire program and make it do whatever they want. They could simply overwrite a function’s return pointer on the stack with a custom one that points at some malicious payload.
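The overwrite can be reproduced safely with a simulation. The sketch below does not overflow a real stack (that would be undefined behaviour in C). Instead it models the frame as one block of bytes in which the 4 bytes playing the role of modified sit directly after the 64-byte buffer. The Frame type and function names are invented for this example, and a real compiler may lay out or pad local variables differently.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 64

/* A simulated stack frame: bytes 0..63 are `buffer`,
 * the following 4 bytes play the role of `modified`. */
typedef struct {
    uint8_t bytes[BUFFER_SIZE + sizeof(int32_t)];
} Frame;

/* Read back the 4 bytes that stand in for `modified`. */
int32_t frame_modified(const Frame *frame)
{
    int32_t value;
    memcpy(&value, frame->bytes + BUFFER_SIZE, sizeof value);
    return value;
}

/* Like gets: copies the input without comparing its length to BUFFER_SIZE.
 * (Clamped to the simulated frame so the demo itself stays well defined.) */
void copy_unchecked(Frame *frame, const char *input, size_t length)
{
    if (length > sizeof frame->bytes) {
        length = sizeof frame->bytes;
    }
    memcpy(frame->bytes, input, length);
}

/* The defensive version: never writes past the 64-byte buffer. */
void copy_checked(Frame *frame, const char *input, size_t length)
{
    if (length > BUFFER_SIZE) {
        length = BUFFER_SIZE;
    }
    memcpy(frame->bytes, input, length);
}
```

With a zeroed frame, copying 10 bytes leaves modified at 0, while copying 65 bytes through copy_unchecked spills one byte into it and makes it non-zero. The same 65 bytes through copy_checked leave it untouched. In real code, the usual fix is to replace gets with a length-limited read such as fgets(buffer, sizeof buffer, stdin).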

0x05 — Attack: ret2libc

Before we talk about Return-to-libc (ret2libc) attacks, let’s take a moment to discuss libc a little bit deeper.

As we know (from section 0x01), libc is the standard C library. This means that it contains the standard functions that C programs rely on, such as printf. Now, what if a malicious user was able to take control of the program to execute some of these functions?

Well, that’s pretty much exactly what ret2libc is. One perfect analogy for ret2libc’s consequences could be the Matrix series. Think back to the classic “Guns, lots of guns” scene. Tank, the operator, was able to completely bypass and reprogram the matrix to make A TON of guns just appear out of nowhere.

You can sort of think of return-to-libc like that: we’re able to take control of the matrix (the standard C library) and make it do whatever we want.

At its base, a ret2libc attack is actually built on a stack buffer overflow. Think back to what I said at the end of section 0x04: if a malicious agent can overwrite data on the stack, they can simply overwrite the return pointer to point to a specific function within libc and pass it whatever arguments are necessary to deliver a payload.

One of the most common functions to use for ret2libc attacks is the system function. Let’s take a look at its documentation:

the system command’s documentation

As you can see, the system function simply executes shell commands (the shell is the Linux command line). Furthermore, if we read into the description, we can see that system simply executes /bin/sh -c <command> (/bin/sh is the actual shell program), and the command is passed into the function through an argument.
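For a feel of what system does when it is used legitimately, here is a small sketch. The wrapper name run_shell_command is made up for this example. It assumes a POSIX system (such as Linux) where a shell is available, and it relies on the standard system function plus the WIFEXITED and WEXITSTATUS macros from sys/wait.h.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command line through the shell and return its exit status,
 * or -1 if the shell could not be started or the command did not exit normally. */
int run_shell_command(const char *command_line)
{
    int status = system(command_line);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Whatever string is handed to system gets interpreted by the shell, which is precisely why the function is so attractive to an attacker who can choose its argument.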

So, all we have to do to gain command-line access to the machine that the vulnerable application is running on is place the address of the string “/bin/sh” on the stack as an argument, then replace a return pointer with the memory address of the system function, so that the function is called with “/bin/sh” as its argument. This starts up a shell and grants us access to the system with the privileges of the vulnerable program.
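To visualise what ends up on the stack, the sketch below builds the byte layout typically described for a 32-bit x86 ret2libc overwrite: padding up to the saved return pointer, then the address of system, then the address system will return to, then the address of the “/bin/sh” string. The function names are invented for this example and every address is a placeholder supplied by the caller. The padding length and real addresses depend entirely on the target program, and this is an illustration of the layout rather than a working exploit.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_u32_le(uint8_t *dest, uint32_t value)
{
    dest[0] = (uint8_t)(value & 0xff);
    dest[1] = (uint8_t)((value >> 8) & 0xff);
    dest[2] = (uint8_t)((value >> 16) & 0xff);
    dest[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Layout: [padding up to the saved return pointer]
 *         [address of system]          <- overwrites the return pointer
 *         [address system returns to]
 *         [address of "/bin/sh"]       <- system's first argument
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t build_ret2libc_layout(uint8_t *out, size_t out_size,
                             size_t padding_length,
                             uint32_t system_address,
                             uint32_t after_system_address,
                             uint32_t binsh_string_address)
{
    if (padding_length > out_size || out_size - padding_length < 12) {
        return 0;
    }
    memset(out, 'A', padding_length);
    put_u32_le(out + padding_length,     system_address);
    put_u32_le(out + padding_length + 4, after_system_address);
    put_u32_le(out + padding_length + 8, binsh_string_address);
    return padding_length + 12;
}
```

Notice that the argument is a pointer to the string, not the string itself, and that from the point of view of system the stack looks just like it would after an ordinary call: a return address on top, followed by the first argument.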

Exploits, lots of exploits.

0x06 — Part 1 Conclusion

In this article we covered:

0x01. Virtual memory and how applications are processed in memory

0x02. Dynamically Linked Libraries and libc

0x03. The Stack

0x04. How functions are called and how returning from a function works

0x05. Stack Buffer Overflows

0x06. Return-to-libc (ret2libc) attacks

I hope this article was helpful. Click below to continue on to Part 2 of this series:

Binary Exploitation ELI5 — Part 2

Also, if you’re interested in reverse engineering, please check out my BOLO: Reverse Engineering article series:

BOLO: Reverse Engineering — Part 1 (Basic Programming Concepts)

BOLO: Reverse Engineering — Part 2 (Advanced Programming Concepts)

And, if you’re looking for more ELI5 content, check out my Explain Spectre and Meltdown Like I’m 5 article.

push “Thanks”

push “for”

call Reading