TL;DR: We build some LLVM passes which ‘deoptimize’ code generated by LLVM to increase code coverage with AFL (and potentially other feedback driven fuzzers, e.g. libFuzzer). Get the code here.

Introduction

If you haven’t been living under a rock, you have probably already heard of American Fuzzy Lop (AFL). In case you have not, here is a quick recap: AFL is a very fast and robust fuzzer which uses code coverage as a metric to decide which generated inputs should be discarded and which should be kept for further mutations.

To obtain the code coverage information, AFL adds instrumentation code to each basic block. This added code will notify AFL that a conditional branch was executed. Furthermore, it will identify the taken destination by a random ID. If AFL has never before seen this ID, it knows that the current input reached new (i.e. never before executed) code and is therefore worthy of additional mutation cycles in the future.
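As a rough illustration of this mechanism, the following sketch follows the edge-counting scheme described in AFL's technical documentation. The names coverage_map, on_basic_block and edges_seen are made up for this example, and the block IDs passed in stand for the random IDs that the real instrumentation picks at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 65536  /* 64 KiB map, as in AFL's default configuration */

static uint8_t  coverage_map[MAP_SIZE];  /* stands in for AFL's shared memory */
static uint16_t prev_location;           /* derived from the previous block's ID */

/* Code injected at the start of every basic block. `cur_location` is the
 * block's random ID (hypothetical values in this sketch). */
static void on_basic_block(uint16_t cur_location) {
    coverage_map[cur_location ^ prev_location]++;  /* count the edge prev -> cur */
    prev_location = cur_location >> 1;             /* keeps A->B distinct from B->A */
}

/* Number of distinct map entries (edges) hit so far. */
static size_t edges_seen(void) {
    size_t n = 0;
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (coverage_map[i])
            n++;
    return n;
}
```

An input is interesting to the fuzzer whenever running it sets a map entry that no earlier input has set.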

This approach is very robust, efficient and elegant. However, there are certain cases which are hard to overcome with this technique. The following code snippet shows a typical example:

if (input == 0xabad1dea ) { /* terribly buggy code */ } else { /* secure code */ }

We can see that the buggy code is only executed if the variable input holds the value 0xabad1dea . This in turn is very unlikely because the value in input is randomly generated by the fuzzer. So, whenever AFL generates a new input which is not 0xabad1dea it will see that the else part of the conditional statement is executed. Even if the input is 0xabad1dee , and therefore almost correct, it will be treated as any other ‘wrong’ input. Hence, AFL will not even notice that it almost guessed the correct input and will happily continue generating incorrect inputs.

Now, what if we could hint to AFL that some guesses are better than others…

Treats for the Rabbit

To overcome this issue we (along with several others) had the idea to split comparisons into multiple smaller ones, which should ‘guide’ AFL towards the correct value.

The approach is simple. Instead of making one four-byte comparison (as in the snippet given previously), we split the comparison into four one-byte comparisons and cascade them (i.e. link them with a logical AND).

    if (input >> 24 == 0xab) {
        if ((input & 0xff0000) >> 16 == 0xad) {
            if ((input & 0xff00) >> 8 == 0x1d) {
                if ((input & 0xff) == 0xea) {
                    /* terrible code */
                    goto end;
                }
            }
        }
    }
    /* good code */
    end:

Now, as soon as AFL guesses the first byte correctly, it will execute the nested if-condition and therefore discover a new path. This, in turn, signals to AFL that the current input should be used again in further fuzzing attempts. The same happens for the other if-statements. This drastically reduces the expected number of guesses needed to find the correct value: roughly 2^7 per byte, so about 2^9 in total, instead of about 2^31 for the whole 32-bit value (2^9 << 2^31). (Edit: @esesci pointed out a bug in the code snippet above. Fixed it. Thank you!)
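To make the effect concrete, here is a small sketch of an idealized coverage-guided search. It is not how AFL actually mutates inputs: the hypothetical guided_search simply tries every value for one byte and keeps a candidate as soon as it reaches a deeper nested check, which is the feedback the split comparison provides. The names depth_reached and guided_search are invented for this example.

```c
#include <stdint.h>

#define TARGET 0xabad1deaU

/* Number of nested byte checks that `input` passes (0..4). Each extra
 * level corresponds to a new basic block the fuzzer could observe. */
static int depth_reached(uint32_t input) {
    int depth = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        if (((input >> shift) & 0xff) != ((TARGET >> shift) & 0xff))
            break;
        depth++;
    }
    return depth;
}

/* Idealized fuzzer: keeps any input that reaches a deeper check, then
 * moves on to the next byte. Returns the number of inputs tried. */
static unsigned guided_search(uint32_t *found) {
    uint32_t best = 0;
    unsigned tries = 0;
    int level = 0;
    for (int shift = 24; shift >= 0; shift -= 8, level++) {
        for (uint32_t v = 0; v < 256; v++) {
            uint32_t candidate = (best & ~(0xffu << shift)) | (v << shift);
            tries++;
            if (depth_reached(candidate) > level) {
                best = candidate;   /* new path: keep this input */
                break;
            }
        }
    }
    *found = best;
    return tries;
}
```

With this feedback the search needs at most 4 × 256 = 1024 inputs, whereas without it a candidate is only recognized as useful once all 32 bits match.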

We implemented this approach in an LLVM pass which integrates easily with AFL. According to similar principles we implemented two additional passes to deconstruct other problematic constructs: switch-statements and comparison functions (memcmp, strcmp …)

LLVM Passes

We implemented three LLVM passes which allow AFL to overcome tricky conditional statements more easily. The purpose of all three plugins is to let AFL discover more paths than with its integrated feedback mechanism alone. Comparisons are rewritten in the way described above. The code for this is in the split-compares-pass. Besides those comparisons, we rewrite strcmp and memcmp calls in the compare-transform-pass and switch statements in the split-switches-pass.

The split-compares-pass

In essence, the split-compares-pass works exactly as explained before, so we are not going to explain it again. However, there are a few additional things implemented which we will talk about briefly.

Obviously, there are not just equal (i.e. ==) operations within if-statements. There are also not equal (!=), less than (<), greater than (>), and their companion equal variants (<=, >=). Additionally, there are signed versions of the last four. To increase the chance for additional coverage, we wanted to ‘split’ comparisons which use these predicates as well. The pass achieves this by first replacing each <= and >= with two comparisons: one == comparison and one < (or > respectively). Next, all signed compares are split into a comparison of the sign bit, the unsigned equivalent of the predicate, and some logic to make all of this work correctly.

After these two stages there are only four types of comparisons remaining (<, >, ==, !=, all unsigned). These are then split into bytes and are compared as discussed previously.
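The two rewriting stages can be sketched in plain C as follows. This is only an illustration of the idea, assuming 32-bit two's complement values; the function names are invented here, and the loop in ult_bytewise is a compact stand-in for the nested byte comparisons the pass would emit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned <, decided one byte at a time from the most significant byte. */
static bool ult_bytewise(uint32_t a, uint32_t b) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        uint8_t ba = (uint8_t)(a >> shift);
        uint8_t bb = (uint8_t)(b >> shift);
        if (ba != bb)
            return ba < bb;      /* the first differing byte decides */
    }
    return false;                /* all bytes equal: not less-than */
}

/* Signed <, reduced to a sign-bit check plus an unsigned compare. */
static bool slt_split(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    bool a_neg = (ua >> 31) != 0;
    bool b_neg = (ub >> 31) != 0;
    if (a_neg != b_neg)
        return a_neg;            /* a negative value is below any non-negative one */
    return ult_bytewise(ua, ub); /* same sign: unsigned order equals signed order */
}

/* Signed <=, split into an equality test and a strict less-than. */
static bool sle_split(int32_t a, int32_t b) {
    if (a == b)
        return true;
    return slt_split(a, b);
}
```

Each early return is a separate branch, so a fuzzer gets feedback on the sign bit and on every byte individually.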

The compare-transform-pass

The second pass deals with AFL’s difficulty in generating input which causes the strcmp calls in lines 14 and 17 to return 0 (see the Driller code in the evaluation section). In consequence, execution of line 15 or 18 is unlikely to occur within any reasonable time, which prevents AFL from reaching the programbug() function. To overcome this issue, the compare-transform-pass rewrites the code in a way that lets AFL discover the strings “crashstring” and “setoption” in an incremental fashion. The best way to illustrate the effect of the pass is a code snippet. Consider the code below:

    if (!strcmp(directive, "crash")) {
        programbug();
    }

All generated strings which end up in the variable directive cause the same feedback in the instrumentation output, unless it is exactly the string “crash”. In particular, “crasi” doesn’t cause a different instrumentation output than “total_wrong_string”. In the spirit of the split-compares-pass, the purpose of this plugin is to let AFL discover the proper string in an incremental fashion. The rewritten form of the code snippet above looks like this:

    if (directive[0] == 'c') {
        if (directive[1] == 'r') {
            if (directive[2] == 'a') {
                if (directive[3] == 's') {
                    if (directive[4] == 'h') {
                        if (directive[5] == 0) {
                            programbug();
                        }
                    }
                }
            }
        }
    }

This new form introduces an explicit control flow transfer, which in turn is instrumented by the compiler pass provided by AFL. This allows AFL to detect changes in the control flow as soon as the first byte of directive is ‘c’. Since this newly found test case later becomes the base of AFL’s mutation, sooner or later the second byte is set to ‘r’, again revealing another character of the string. One after another, the complete string is unveiled, ultimately leading to programbug() without having to guess the whole string at once. The pass covers not only strcmp but memcmp as well.

Limitations

The compiler pass can only be applied if one of the strings is a literal, so that the string itself and its length are known at compile time. Otherwise, it’s impossible to know which bytes to compare against. For our purpose this is not a severe limitation, since many parsers embed strings directly in their source code. In the case of memcmp, the size parameter of the function call must also be known at compile time.
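For memcmp the same idea applies once both the literal and the size are constants. The sketch below compares a 4-byte chunk tag against the literal "IHDR"; the tag and the function names are chosen purely for illustration, and the caller is assumed to pass at least 4 readable bytes.

```c
#include <stdbool.h>
#include <string.h>

/* Original form: one opaque library call, one branch for the fuzzer. */
static bool is_ihdr_memcmp(const unsigned char *chunk) {
    return memcmp(chunk, "IHDR", 4) == 0;
}

/* Unrolled form: possible because literal and size are compile-time constants.
 * Each nested check is a separate branch the fuzzer can discover. */
static bool is_ihdr_unrolled(const unsigned char *chunk) {
    if (chunk[0] == 'I') {
        if (chunk[1] == 'H') {
            if (chunk[2] == 'D') {
                if (chunk[3] == 'R') {
                    return true;
                }
            }
        }
    }
    return false;
}
```

Both functions return the same result for every input; only the shape of the control flow differs.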

The split-switches-pass

The split-switches-pass is the most complicated to understand, so bear with us. Switch tables generated by the compiler are usually not in the optimal format for AFL’s path discovery. In general, the individual cases of a switch expression are found by either a table-based lookup or a binary tree. The former is used if the individual case constants are dense, whereas the latter is preferred if the constants are sparse. Given the split-compares-pass, it is trivial to rewrite each switch statement into a long list of if, else if, else statements and let the split-compares-pass do the rest. Even though this approach stands to reason, the pass would generate sub-optimal code (in terms of AFL’s ability to find paths). Consider the switch statement below with two cases, 0x11ff and 0x22ff.

    int x = userinput();
    switch (x) {
        case 0x11ff:
            /* handle case 0x11ff */
            break;
        case 0x22ff:
            /* handle case 0x22ff */
            break;
        default:
            /* handle default */
    }

Rewriting this code to an if-else-if construct and applying the split-compares-pass generates the following code:

    if (x >> 24 == 0x00) {
        if ((x & 0xff0000) >> 16 == 0x00) {
            if ((x & 0xff00) >> 8 == 0x11) {
                if ((x & 0xff) == 0xff) {
                    /* case 0x11ff */
                    goto after_switch;
                }
            }
        }
    }
    /* first comparison failed: try the next case */
    if (x >> 24 == 0x00) {
        if ((x & 0xff0000) >> 16 == 0x00) {
            if ((x & 0xff00) >> 8 == 0x22) {
                if ((x & 0xff) == 0xff) {
                    /* case 0x22ff */
                    goto after_switch;
                }
            }
        }
    }
    default_case:
    /* default case */
    after_switch:

The feedback mechanism allows AFL to incrementally discover that the 2 leftmost bytes of x must be 0. Then AFL discovers that the third byte is either 0x11 or 0x22. The important observation is that AFL has to discover the value of 0xff for the last byte two times, once for the value 0x11ff and once for 0x22ff. The split-switches-pass generates less redundant code and more importantly requires AFL to discover the value of 0xff of the last byte only a single time:

    int x = userinput();
    if (x >> 24 == 0) {
        if ((x & 0xff0000) >> 16 == 0x00) {
            if ((x & 0xff) == 0xff) {
                if ((x & 0xff00) >> 8 == 0x11) {
                    /* handle case 0x11ff */
                    goto after_switch;
                } else if ((x & 0xff00) >> 8 == 0x22) {
                    /* handle case 0x22ff */
                    goto after_switch;
                } else {
                    goto default_case;
                }
            }
            goto default_case;
        }
        goto default_case;
    }
    default_case:
    /* handle default */
    after_switch:

The code above, generated by the split-switches-pass, requires AFL to discover 0 for the two leftmost bytes. Now, instead of targeting the third byte, the condition in line 4 checks the fourth byte. Only after this condition is fulfilled is the third byte checked. In consequence, the value 0xff of the fourth byte needs to be discovered only once instead of twice. Compared to a list of if-else-if constructs, the benefit of this plugin grows with the number of cases the switch statement contains.
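One way to decide which bytes can be checked once for all cases is to look for byte positions on which every case constant agrees. The helper below is a hypothetical sketch of that idea and not the algorithm the pass necessarily uses; shared_byte_mask is a name invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a 4-bit mask; bit i is set if byte i (0 = least significant)
 * has the same value in every case constant. */
static unsigned shared_byte_mask(const uint32_t *cases, size_t n) {
    unsigned mask = 0;
    for (int byte = 0; byte < 4; byte++) {
        int shift = byte * 8;
        int same = 1;
        for (size_t i = 1; i < n; i++) {
            if (((cases[i] >> shift) & 0xff) != ((cases[0] >> shift) & 0xff)) {
                same = 0;
                break;
            }
        }
        if (same)
            mask |= 1u << byte;
    }
    return mask;
}
```

For the constants 0x11ff and 0x22ff, bytes 0, 2 and 3 are shared and only byte 1 differs, which matches the order of checks in the generated code: shared bytes first, the distinguishing byte last.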

Evaluation

To compare the effect of the additional passes on AFL’s ability to reach new code paths, we tested the plugins with a pathological test case and the two real-world libraries libpng and harfbuzz. Even though no security-critical bug was found in either of the libraries, the passes increase the code coverage significantly in libpng and slightly in harfbuzz, and uncover a read of an uninitialized variable in an error handling path of libpng.

Comparison to Driller

The first test case is a copy of Listing 1 from the paper Driller: Augmenting Fuzzing Through Selective Symbolic Execution. The snippet illustrates the motivation of the paper’s authors to enhance fuzzing with symbolic execution. In particular, lines 7, 14, and 17 pose a challenge to AFL for the reasons discussed above.

     1 int main(void) {
     2     config_t *config = readconfig();
     3     if (config == NULL) {
     4         puts("Configuration syntax error");
     5         return 1;
     6     }
     7     if (config->magic != MAGICNUMBER) {
     8         puts("Bad magic number");
     9         return 2;
    10     }
    11     initialize(config);
    12
    13     char *directive = config->directives[0];
    14     if (!strcmp(directive, "crashstring")) {
    15         programbug();
    16     }
    17     else if (!strcmp(directive, "setoption")) {
    18         setoption(config->directives[1]);
    19     }
    20     else {
    21         _default();
    22     }
    23
    24     return 0;
    25 }

If the code is compiled with the passes described previously, AFL should be able to find the value MAGICNUMBER and the strings “crashstring” and “setoption” on its own. And indeed, it took AFL a minute to get past the check in line 7. From there on, it took AFL 60 minutes to generate the string “crashstring”. Unfortunately, the string “setoption” was not found; for some unclear reason AFL never explored the corresponding path. Without the passes, AFL doesn’t make it past line 9 within any reasonable time, not to mention finding the two strings guarding the non-default case.

libpng

Setup for evaluating the plugins

Measuring the effect of the plugins on fuzzing real-world libraries is more difficult than for the test case discussed above. In general, we are interested in how the code coverage changes and how the number of paths differs. However, one has to be cautious when comparing the number of paths. The plugins add a significant number of spurious paths through the binary.

For example, a conditional jump based on a 32-bit comparison normally introduces two edges in the control flow graph. With the split-compares-pass, a single 32-bit comparison is transformed into 4 consecutive 8-bit comparisons, resulting in 5 edges in the control flow graph. Thus, we expect to see many more paths through the binaries with the modified instrumentation. To make a comparison based on the number of paths feasible again, our setup looked as follows:

Group A:

1x normal instrumentation, master mode

3x normal instrumentation, slave mode

Group B:

1x normal instrumentation, master mode

1x normal instrumentation, slave mode

1x enhanced instrumentation, master mode

1x enhanced instrumentation, slave mode

Group A simply runs 4 AFL instances, creating the baseline of the evaluation. Group B seems like an unorthodox choice, but is key to enabling comparisons based on the number of paths found. Since two instances of AFL in group B work on exactly the same binaries as used in group A, their respective numbers of paths are comparable. For the same reason, namely working on exactly the same binaries, all those instances are expected to get stuck at the same obstacles (e.g. a 32-bit magic value). This is where the 2 binaries with the enhanced instrumentation (in group B) contribute their strength. They find the 32-bit magic value incrementally, ultimately generating a test case with the whole 32-bit magic value at the right position. Eventually those test cases are picked up by the normally instrumented binaries in group B (through the regular syncing mechanism of AFL). In consequence, the normally instrumented binaries in group B profit from the enhanced instrumentation without falsifying their number of found paths.

Results

The fuzzing with groups A and B as described above ran for nearly 52 hours before we terminated the session. A general observation from the run is that in the beginning, group A was faster in discovering paths than group B. However, after a while, group B caught up and surpassed group A. In the end, group A found 1459 paths, whereas group B found 2318 paths. In terms of code coverage (measured with lcov, see the full report here), group A hit 2186 lines of libpng, whereas group B hit 2707, a noticeable increase of 23%. The overall code coverage of group B is still only 26.2% of libpng. This is mainly caused by the fuzzing stub we used, which only targets a subset of the library’s functionality.

A quick look at the lcov reports shows the effect of the split-compares-pass in pngread.c, starting at line 166. The function png_read_info reads a 4-byte value from the png file and compares the value against a set of predefined constants.

In group A, without our pass, only a small subset of all constants is found. In consequence, the functions png_handle_cHRM, png_handle_hIST, png_handle_oFFs, png_handle_pCAL, png_handle_sBIT, png_handle_sCAL, png_handle_sRGB, and png_handle_tIME are never executed. With our plugin, all constants are found, causing execution of all those functions.

The fuzzing didn’t yield a crash in libpng; however we found a minor msan violation.

harfbuzz

The setup for the test on harfbuzz was identical to the one for libpng; however, the test ran for only 24 hours. At the end of the test, group A found 2070 paths, whereas group B found 2150 paths. In terms of code coverage, group A hit 3358 lines, whereas group B hit 3474 lines, an increase of 3.5%. The full lcov report is available here. Especially interesting is the file hb-ot-shape-complex-hebrew.cc, since it highlights the effect of the plugin pretty well. The lcov report without the plugin shows the following coverage:

Whereas with the enhanced instrumentation the report looks as follows:

The test on harfbuzz shows that the effect of the passes is highly dependent on the code structure of the fuzzing target.

Caveats

There are a few caveats with the plugins and the rewriting techniques they implement. Most of those magic values and strings could be found with a proper dictionary as well. Hence, a dictionary is an alternative to the split-compares-pass with regard to magic values. Nevertheless, the pass could still be useful for finding the possible values for compares of variables. (Note: AFL recently added libtokencap, which tries to find magic values. We have not tested this feature yet, so we cannot compare our pass to it.)

AFL provides a different approach to cope with strcmp and memcmp functions. An example of the approach can be found in AFL’s instrumented_cmp.c file. This code redefines the functions as a straightforward loop with a similar goal in mind, namely giving AFL feedback on individual bytes. We think that our approach is slightly better because we don’t need a loop structure (since we are aware of the string length). With a loop structure, AFL cannot distinguish between a test case with the first 10 characters matching a string literal and a test case with the first 11 characters matching the string literal (because of AFL’s bucketing mechanism).
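The bucketing effect can be seen with a small sketch. The ranges below are the hit-count buckets given in AFL's technical documentation (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+); the function hit_bucket and its return values are invented for this illustration and do not mirror AFL's internal encoding.

```c
/* Maps a raw hit count of one edge to a bucket index. Two runs look the
 * same to the fuzzer if every edge falls into the same bucket. */
static int hit_bucket(unsigned hits) {
    if (hits == 0)   return 0;
    if (hits <= 3)   return (int)hits;  /* 1, 2 and 3 each get their own bucket */
    if (hits <= 7)   return 4;
    if (hits <= 15)  return 5;
    if (hits <= 31)  return 6;
    if (hits <= 127) return 7;
    return 8;
}
```

A loop body executed 10 times and one executed 11 times both land in the 8-15 bucket, so the extra matching character produces no new feedback. With unrolled comparisons, the 11th character has its own branch.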

We didn’t evaluate the overall performance impact of the plugins in terms of executions per second. Because the amount of executed code increases, we expect a noticeable but acceptable slowdown. Our evaluation for libpng and harfbuzz indicates that the benefits outweigh the cost, but for other targets this might be different.

Further ideas

As discussed on this blog, there are bug classes such as divide by 0 which afl is unlikely to find. Given the code:

    printf("%ld\n", 3 / (n + 1000000));

where n is a user-defined value, AFL is unlikely to set n to -1000000, causing AFL to miss the crash. For special bug classes like this, another pass could be implemented which takes the denominator of every division and precedes the division with a comparison of that denominator against 0. Combined with the split-compares-pass, the value of n causing the division by 0 should be found quickly.
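A hand-written version of what such a hypothetical pass might emit could look like the sketch below. The function guarded_division is invented for this example; unlike a real pass, which would let execution continue into the crashing division, the sketch returns early so that it can be run safely.

```c
#include <stdbool.h>

/* Explicit zero check in front of `3 / (n + 1000000)`.
 * Returns true if the zero-denominator branch was taken. */
static bool guarded_division(long n, long *result) {
    long denom = n + 1000000;
    if (denom == 0) {
        /* New branch: the fuzzer sees this edge and keeps the input.
         * A real pass would fall through to the division and crash. */
        return true;
    }
    *result = 3 / denom;
    return false;
}
```

Because the comparison is an ordinary equality check against a constant, the split-compares-pass could then break it into byte-wise checks like any other comparison.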

Another small improvement could be achieved if we rearrange the byte-wise comparisons with equal predicate. As described earlier, we split these compares into single bytes and nest the comparisons. This requires the fuzzer to find the first byte before the other bytes can be checked. We can now rearrange these compares in such a manner that all bytes are compared (independent of the result of the first compare). The approach is illustrated in the following code snippet:

    int acc = 0;
    if (input >> 24 == 0xab) acc += 1;
    if ((input & 0xff0000) >> 16 == 0xad) acc += 1;
    if ((input & 0xff00) >> 8 == 0x1d) acc += 1;
    if ((input & 0xff) == 0xea) acc += 1;
    if (acc == 4) {
        /* terrible code */
    } else {
        /* good code */
    }

This could improve our results as the fuzzer is not required to guess the first byte correctly before it is able to make guesses for any other bytes.

Conclusion

Are you still reading? Just get the code and try it yourself!

Maybe give some feedback 🙂