A couple of months ago there was an ACM article on the SAGE whitebox fuzzing system from Microsoft Research. SAGE is one of the most interesting products of research on automated program testing in recent years and, according to Microsoft, has been used to find a massive amount of bugs in their various file parsers.

At its core, SAGE contains a symbolic emulator for executing instruction traces over symbolic data. As well as whitebox fuzzing, symbolic emulators are fairly useful things for a variety of reverse engineering, vulnerability discovery and program analysis tasks. Essentially, a symbolic emulator is a CPU emulator that not only supports operations on concrete numeric values but also on abstract values that may represent a range of concrete values.

In this series of posts I’m going to give an overview of a Python-based symbolic emulator for x86 that I’ve been working on (called JESTER) and show how it can be applied to the problem of whitebox fuzzing. Hopefully this will give an idea of how symbolic emulation works (it’s fairly simple) and also provide some insight into how systems like SAGE operate.

Consider the x86 instruction add eax, ebx . Operating over concrete values an emulator will do the obvious thing of taking the value in EAX, adding it to EBX and then storing the result back in EAX. It will also update the various flags that are affected. Over symbolic values however the result is a bit more interesting. Lets assume that EAX contains the abstract value V1 which represents an unconstrained 32-bit variable, and EBX contains the concrete value 0x10. In this case the emulator will create a new abstract value V2 which represents the addition of V1 and 0x10 and store that back in EAX. Diagrammatically, we can see that EAX now contains something that is a function rather than a single value.

v1 10 \ / EAX: +

A slightly more complex diagram shows what the Zero Flag would hold after the above instruction.

v1 10 \ / + 0 \ / == 1 0 \ | / ZF: if-then-else

I purposefully used the word ‘function’ because what we end up with, in registers and memory, are expression trees that map from a given set of inputs to an output. As more instructions are emulated these trees get bigger and more difficult to reason about so people usually take the approach of exporting them to a SMT solver and querying their models that way. The obvious applications being input crafting, tracking user-influenced data and checking security properties. This is fairly well documented in previous posts and in a decade worth of academic literature so I won’t delve into the details.

The point of this post is instead to look at the overall architecture of a symbolic emulator with the aim of illuminating some of the components involved, more directly than is typically done in formal descriptions. I also want to give people an idea of how much or how little effort is involved in building these tools. In order to demonstrate the use of a symbolic emulator I’ll apply it to the problem of whitebox fuzzing i.e. using a symbolic emulator in combination with a SMT solver to generate inputs for a program guaranteed to force execution down a new path.

While writing this series of posts Rolf Rolles posted a great video/blog entry on the topic of input crafting using an SMT solver. Taking a leaf out of his book I’ve decided to accompany these with a video that demonstrates the tools described in operation and should ideally give some insight into their construction. The video is linked at the end but the following wall of text will give some context and might be worth glancing over. This isn’t the most entertaining of entries in the series and is mostly for completeness so if you’re bored out of your mind I accept minimal responsibility =)

1. Trace generation

An emulator needs some way to know what instructions execute and it also needs a starting memory and thread context. There are a few different approaches to getting such information. The Bitblaze/BAP research groups modified Qemu and hook in directly there, the guys working on S2E do something similar and I previously wrote a C++ library that was used as part of a Pintool at run time. There are a couple of problems with tying your emulation directly into the run time environment of the tool however. Firstly, it’s a lot more annoying to debug an extension to Qemu or PIN than n separate emulator and secondly, it prevents you from doing the emulation on a separate machine to the tracing. The second issue is probably the most important in the long run as to really scale whitebox fuzzing to the point where it is useful requires parallelism.

The approach I took this time around is directly inspired by the work of MSR on their Nirvana/iDNA tool, but much more simplistic. Instead of using the Pintool to do the emulation I use a lightweight one to just trace the instructions executed and other interesting events, like image loads/unloads, system calls and debugging info. If you’ve used PIN before then most of what I’m about to describe will be obvious and fairly boring so you might want to skip on to part 2 of this series of entries.

The trace format is uncompressed and unoptimised and to date I’ve not had any problems with that. A typical segment just looks as follows ( L denotes an image load, I an instruction execution and C provides debugging information as discussed below):

L;4;/lib32/libc.so.6;f5c54000;157244 L;5;/lib32/libm.so.6;f5c2e000;24790 C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291f8;EBP:ffb291f8 ... I;0;8048fc5 C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291f0;EBP:ffb291f8 ... I;0;8048fc8 C;0;EAX:ffb292a4;EBX:f5da9ff4;ECX:53f78923;EDX:5;ESP:ffb291ec;EBP:ffb291f8 ...

In the early stages of the project I worried about this and thought I’d have to come up with some compression method but that hasn’t been the case. Most file parsers generate traces that can be measured in 10s of millions of instructions and things of that scale easily fit in a few gigabytes of storage.

1.1 Debugging Assistance

Writing an emulator of any kind can be tedious work. It’s easy to make mistakes and get the semantics of an instruction slightly wrong or set a flag incorrectly. Initially I tried to counter this by writing unit-tests but it quickly became obvious that 1) These were never going to be exhaustive and 2) They were as likely to have mistakes as the emulator code itself. Instead, I added a debug mode to the tracer that logs the register values after each instruction (The lines starting with a “C” above). This then allows the emulator to compare its register values to the ones we know it should have and highlight any discrepancies. Tracing everything in /usr/bin/ and checking these values is a hell of a lot more exhaustive than any unit-testing I would have done! The only reason I’m mentioning this is that I’d recommend it to anyone writing something of this nature. The best tests are by far those you can extract from real binaries.

1.2 Handling system calls

One of the disadvantages of using a user-land tracer is that you miss out on any updates to memory that happens within the kernel. The only real way to handle this correctly is to define per-system-call handlers that know which memory addresses a system call will update based on its arguments or return value. In PIN this is fairly straightforward, you register a syscall entry and exit handler, get the syscall args or return value and then log whatever you need on exit.

int main(int argc, char *argv[]) { ... PIN_AddSyscallEntryFunction(SyscallEntry, 0); ... } VOID SyscallEntry(THREADID tid, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v) { ADDRINT syscall_num = PIN_GetSyscallNumber(ctxt, std); ... #ifdef LINUX // Handle Linux syscalls switch (syscall_num) { case SYS_READ: { ADDRINT read_buf_ptr = PIN_GetSyscallArgument(ctxt, std, 1); size_t read_sz = PIN_GetSyscallArgument(ctxt, std, 2); t_data->log_addrs[read_buf_ptr] = read_sz; break; } ... }

Handling each and every system call might seem like an onerous task but if you’re working on particular types of software (e.g. file parsers) then you can get away with a minimal subset e.g. open, read, lseek, mmap and a few others. My general approach is to just add them as necessary. You’ll encounter many more along the way but it turns out not a whole lot end up having any interaction with the user controlled data you’re interested in.

In the trace log format I included support for events other than those shown in the above snippet.). For syscalls as just discussed there is the M event which looks like as follows and tells the emulator to update the memory address given with the contents of a file.

M;0;f5f97000:syscall_c0_f5f97000_1000_1

There is also the ‘R’ event which tells the emulator to update a register with a particular value. This is useful for instructions you can’t handle for whatever reason. Other than that there isn’t really anything to capturing a trace. The only thing I haven’t mentioned is that on starting tracing, either at a given address or the programs entry point, you also need to log the programs memory and thread contexts at that point in order to give your emulator starting values. This is fairly straightforward though and PIN provides all the required API calls.

(You probably want to click the “Watch on YouTube” option on the bottom right of the video and set the quality to 720p. The tools discussed are not publicly available but that may change in the future.)