Hello, my name is Nikos Naziridis and I am a security researcher at CENSUS. In this post, I will present how SystemTap, and kernel instrumentation in general, can be used to help determine the exploitability of unbounded memory overflows and to detect thread race condition bugs.

Introduction

For readers who are not familiar with SystemTap and the concepts of kernel instrumentation, I will give a short introduction; for more details, please refer to the References section. This post covers Linux kernel versions >= 2.6 and SystemTap versions >= 2.2. If you are already comfortable with SystemTap or with how kernel instrumentation works, you may skip to the next section.

Kernel instrumentation is a set of techniques that allows a user to monitor and trace the execution of a kernel. One popular implementation of kernel instrumentation is the Kprobes (Kernel Probes) system. Kprobes allows a user to develop a kernel module that hotpatches specific instructions in the kernel code with trampoline functions. When such a patched instruction is executed, the execution flow jumps into one of these functions, runs user-provided code, and then returns to the original instruction.

SystemTap automates the process of developing such a kernel module and hides the kernel specifics from the user. It does this by exposing a variety of probe points for Kprobes and by providing utility functions in its own scripting language. When a script in this language is run through SystemTap, it is translated to C code, compiled as a kernel module, and loaded into the running kernel automatically.

An example

Imagine a heap buffer overflow vulnerability caused by an integer underflow in the value passed as the size parameter of a memcpy() call. The nature of the bug, while common enough, complicates its exploitation. The following snippet, though unrealistic, is sufficient to showcase the bug:

    int one = ...;
    int two = ...;
    ...
    char stack[] = "...";
    char *heap = (char *)malloc(sizeof(one));
    ...
    memcpy(heap, stack, (one - two));

It is apparent that if two is larger than one, the result of (one - two) is negative. The size parameter that memcpy() expects, however, is of type size_t, which is unsigned. The negative value is therefore implicitly converted and interpreted as a very large number. Unable to detect this, memcpy() tries to copy that many bytes from stack to heap until it triggers an access violation. So, determining the exploitability of a vulnerability such as this boils down to whether the attacker can control the size of the overflow.
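
To make the conversion concrete, here is a minimal sketch in C. The helper name effective_copy_length is hypothetical and is not part of the original snippet; it only returns the length that memcpy() would receive for a given pair of values, without performing any copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original snippet): returns the
 * length memcpy() would actually receive when it is handed the signed
 * expression (one - two) as its size argument. */
size_t effective_copy_length(int one, int two)
{
    /* The subtraction is carried out in int. The conversion to the
     * unsigned size_t then reduces the result modulo SIZE_MAX + 1,
     * so a negative difference becomes a huge positive length. */
    return (size_t)(one - two);
}
```

With one = 16 and two = 32 the difference is -16, which converts to SIZE_MAX - 15, far more than any heap chunk can hold. With the values swapped, the helper simply returns 16.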

Possible solutions

One solution in situations like the above is to supply a value for one or two that makes the subtraction itself wrap around to a size that suits the attack scenario. There are many occasions where this is not possible, however, so let’s assume this is the case.
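
For illustration only, the wrap-around idea can be sketched in C as follows. The helpers wrapped_difference and pick_two_for_target are hypothetical names, and the sketch assumes a 32-bit int. It models the wrap with unsigned arithmetic, because overflowing a signed subtraction is undefined behaviour in C and a real compiler is free to treat it differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Models the 32-bit wrap-around of (one - two) using unsigned
 * arithmetic, which is well defined. The final conversion back to
 * int32_t is implementation-defined; GCC and Clang reduce it
 * modulo 2^32. */
int32_t wrapped_difference(int32_t one, int32_t two)
{
    uint32_t wrapped = (uint32_t)one - (uint32_t)two;
    return (int32_t)wrapped;
}

/* Hypothetical helper: for a fixed `one`, returns the value of `two`
 * for which the wrapped difference equals `target`. */
int32_t pick_two_for_target(int32_t one, int32_t target)
{
    uint32_t two = (uint32_t)one - (uint32_t)target;
    return (int32_t)two;
}
```

For example, with one = INT32_MIN + 10 and a desired length of 64, the second helper yields two = 2147483594, and the wrapped difference is exactly 64. Whether an attacker can actually place such values in both variables depends entirely on the target program.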

Since inducing an underflow is out of the question and there are no arithmetic operations that could provide control over the overflow size, another solution would be to look at the implementation of memcpy(). Gobbles’ apache-scalp exploit for BSD systems in 2002 solved a similar case by abusing the fact that the BSD memcpy() stores the number of remaining bytes to be written in a stack variable. By overwriting this value, an attacker could dynamically control the overflow size. In this case, though, the overflow copies data from the stack to the heap, so the copy never overwrites that stack variable and this approach would not work either.

If there is no “conventional” way to control the overflow size, then how about trying to stop or delay it? Imagine that the snippet above was part of a large threaded program. Then, in theory, there could be a thread race condition (not necessarily a bug) that would allow us to use a portion of the overflowed memory from one thread before the preempted thread that performs the memcpy() call reaches a protected or unmapped area. But even if such a thing was possible, how would we debug it?

Enter SystemTap

The Linux kernel uses a scheduler with dynamic priorities that supports preemption. This means that at any given moment the thread being executed can be replaced by another thread that the scheduler considers more important.

Since the scheduling happens in the kernel, it should be “accessible” from a kernel module. So, by using SystemTap we should be able to monitor the preemption. Indeed, we could use the scheduler.cpu_off probe to do something like:

    global goflag = 0, interesting_pid = 0
    # probe scheduler every time a task is switched
    probe scheduler.cpu_off
    {
        # if execution reached the interesting point
        if (goflag)
        {
            # find the pid and base (dynamic) priority of
            # the tasks involved
            prevpid = task_pid(task_prev);
            nextpid = task_pid(task_next);
            prevprio = task_prio(task_prev);
            nextprio = task_prio(task_next);
            # inform the user
            printf(
                "switched %lu (p: %lu) to %lu (p: %lu)\n",
                prevpid, prevprio, nextpid, nextprio
            );
            print_regs();
            print_ubacktrace();
        }
    }

This adds a probe point (called tap in SystemTap’s lingo) that is called every time a thread is scheduled off a CPU core. Run on its own with SystemTap, it would produce very noisy output covering every thread preemption occurring in the system.

To produce output that is relevant to the problem at hand, we need our code to run from just before the memcpy() call until a SIGSEGV or other terminating condition occurs. Fortunately, SystemTap implements many different probe points that can help with this, namely signal.send and process().statement. Therefore, we can use something like the following to start monitoring:

    # set a probe for the interesting point
    probe process("/path/to/lib/lib.so").statement("*@whatever.c:1337")
    {
        # use some globals to enable probing when
        # the execution reaches whatever.c:1337 (file:line_number)
        # store the current pid (interesting pid)
        currtask = task_current();
        interesting_pid = task_pid(currtask);
        # set the go flag
        goflag = 1;
        printf("reached interesting point; starting...\n");
    }

To define an ending condition, we can add:

    # if you detect an access violation (SIGSEGV == 11)
    probe signal.send
    {
        # check if it is intended for the interesting task
        currtask = task_current();
        currpid = task_pid(currtask);
        if (interesting_pid == currpid)
        {
            if (sig == 11) # SIGSEGV
            {
                # inform and die
                printf(
                    "detected SIGSEGV to process %lu; stopping...\n",
                    interesting_pid
                );
                exit();
            }
        }
    }

If no debugging symbols are available for the target application, we could use process().statement(ADDRESS).absolute and provide an absolute virtual address for the probe. With the start and end conditions in place, the above script only shows the threads that are switched during the critical time window.

Conclusion

Using SystemTap and Kprobes we have implemented a way to examine the threads that preempt the thread that performs the vulnerable memcpy() call. We can also put our target application under stress (for example, by forcing it to process large amounts of user input) in order to see whether the memcpy() thread can indeed be preempted by some other thread. If there is such a thread, we can study it carefully to see whether it accesses the partially overflowed memory, or any variables overwritten by it, and determine whether there are exploitable conditions.

References

Check the following links for more details on the subject discussed:

http://en.wikipedia.org/wiki/Instrumentation

https://sourceware.org/systemtap/wiki

https://sourceware.org/systemtap/tapsets/

https://sourceware.org/systemtap/kprobes/

