Fuzz testing (or fuzzing) is an increasingly popular technique to find security and other bugs in programs. For user space, american fuzzy lop (AFL) has been used successfully to find many bugs (as noted in an LWN article in September 2015). On the kernel side, projects like the Trinity system-call fuzzer and syzkaller have been used effectively. But there is now another fuzzing option for the kernel. Vegard Nossum and Quentin Casasnovas gave a presentation at Vault 2016 on porting AFL to work on the kernel, with filesystems as the target. Last year's Vault conference also had a presentation on filesystem fuzzing using different techniques.

They began with a chart (slides [PDF]) showing the amount of time it took to find the first bug in various filesystems using three AFL instances running in parallel, which ranged from five seconds to two hours. As a demonstration, they had half a dozen USB sticks with various broken filesystems found by AFL. Nossum inserted one at random into his laptop, mounted the GFS2 filesystem, which seemed to mount just fine, then removed the USB stick. At that point, the laptop hung and was completely unresponsive. A bug in the GFS2 code, which was embodied in the filesystem image that AFL found, had evidently caused enough kernel corruption to hang the system.

AFL basics

Casasnovas then introduced fuzzing and AFL to the audience. The idea behind fuzzing is to use semi-random inputs to a subsystem or program to try to "trigger interesting behavior". AFL is a "genetic fuzzer" that uses branch instrumentation to find new paths through the program. It is "amazingly good" at finding deep and obscure paths through the code.

He showed a simple "lottery" program that would fail only once per 2^72 runs, so it would take that many tries in the worst case to tickle the "bug". He calculated that brute force would take up to 124 billion CPU years. With AFL, the branch information is used to find new paths through the code. The inputs that generate a new path are saved and then "mutated" to find even more paths through the code. The net result is that AFL takes only 2034 iterations in the worst case, which is just a few seconds of CPU time.
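The lottery program itself is not reproduced in the slides excerpted here, but the idea can be modeled with a short Python toy (all names invented). A bug guarded by a 9-byte comparison is a one-in-2^72 event for blind guessing, yet if each matched byte is a separate branch, coverage feedback rewards partial progress and a mutating fuzzer can solve it byte by byte:

```python
import random

SECRET = b"JACKPOT42"            # hypothetical key: one input out of 2**72

def lottery(data):
    """Toy stand-in for the slides' lottery program.  Each matched byte is
    a separate branch, so coverage feedback rewards partial progress."""
    edges = set()
    for i, want in enumerate(SECRET):
        if i >= len(data) or data[i] != want:
            return edges, False
        edges.add(i)             # one new edge per matched prefix byte
    return edges, True           # the "bug" fires on the single winning input

def fuzz(target, seed, max_iters=1_000_000, rng=None):
    """Minimal model of AFL's genetic loop: mutate queued inputs and keep
    any mutant that reaches a branch not seen before."""
    rng = rng or random.Random(1)
    queue = [bytearray(seed)]
    seen, _ = target(bytes(seed))
    for _ in range(max_iters):
        child = bytearray(rng.choice(queue))
        child[rng.randrange(len(child))] = rng.randrange(256)
        edges, crashed = target(bytes(child))
        if crashed:
            return bytes(child)              # found the one-in-2**72 input
        if not edges <= seen:                # new coverage: keep the mutant
            seen |= edges
            queue.append(child)
    return None
```

Here `fuzz(lottery, b"\x00" * 9)` finds the key after on the order of a hundred thousand iterations rather than 2^72 tries, which is the same collapse in search space that the speakers described.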

AFL uses a huge buffer of shared memory between the afl-fuzz program and its target. Each branch operation changes a value in the shared memory in such a way that a branch from A to B can be distinguished from a branch from B to A. At the end of the run, a checksum for the shared memory region is calculated to see if a new path has been generated.
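AFL's own documentation describes the update: each basic block is assigned a random id, and entering a block bumps the byte at the XOR of its id with a one-bit-shifted copy of the previous block's id. A toy Python model of that map update:

```python
MAP_SIZE = 1 << 16               # AFL's default map: 64 KB of byte counters

class CoverageMap:
    """Toy model of the shared-memory update.  Entering block B after
    block A bumps the counter at id(B) ^ (id(A) >> 1); the one-bit shift
    is what makes A->B land on a different slot than B->A (a plain XOR
    would be symmetric and collapse the two directions)."""

    def __init__(self):
        self.counts = [0] * MAP_SIZE
        self.prev = 0

    def enter_block(self, block_id):
        self.counts[(block_id ^ self.prev) & (MAP_SIZE - 1)] += 1
        self.prev = block_id >> 1
```

After the run, afl-fuzz checksums the whole `counts` array; two runs that took the same branches in opposite directions produce different checksums, so they register as different paths.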

Porting AFL to the kernel

For user-space programs, AFL requires a special compiler pass that wraps all conditional jumps in the generated assembly code with a stub that writes the branch-taken information into the shared memory region. The first approach for the kernel was similar, but there were some downsides. For one thing, patching the assembly code was architecture dependent. In addition, all registers needed to be saved by the stub since the generated assembly code does not contain enough information about the register use.

The second approach used the GCC patch written by Dmitry Vyukov for syzkaller. That patch runs after the GIMPLE intermediate representation has been generated, which is after any optimizations have been done, and adds a stub call at the beginning of each basic block. That is an architecture-independent solution and, since GCC knows the register allocations, there is no need to save all of the registers on each call.

The afl_stub() that is called does not take any arguments; it uses the return address to calculate an index into the shared memory. Only the lower bytes of the address are used, which could cause collisions, but "worked well enough" in practice. The index is calculated by XORing the return address and the previous return address, which is what allows AFL to detect the direction of the branch. The value at the index location is then simply incremented.

In order to support shared memory between the user-space afl-fuzz program and the kernel, a /dev/afl device was created. It supports mmap() so the user-space program can map the buffer into its address space.
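On the user-space side that amounts to an open() plus mmap(). A sketch in Python, where the buffer size is an assumption and a regular file of the right size can stand in for the device when experimenting without the patched kernel:

```python
import mmap
import os

MAP_SIZE = 1 << 16               # assumed size of the kernel's coverage buffer

def map_coverage(path):
    """Map the coverage buffer exposed by a device such as /dev/afl into
    this process, so branch counts can be read directly after each run.
    A regular file of MAP_SIZE bytes works as a stand-in for testing."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, MAP_SIZE, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)             # the mapping outlives the descriptor
```

With the real device, the fuzzer would call `map_coverage("/dev/afl")` once at startup and then inspect the buffer after every test case.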

Multiple AFL fuzzers can be run in parallel, each with their own shared memory. The changes that were made are fairly generic, so they could be applied to other parts of the kernel (e.g. USB). Casasnovas and Nossum targeted filesystems.

Applying AFL to filesystems

Nossum then took over to talk about how this all applies to filesystems. There are a few ingredients needed for AFL to fuzz a specific filesystem. The source directory in the kernel (e.g. fs/ext4) and configuration options to enable the filesystem (e.g. CONFIG_EXT4_FS=y) are needed. Then a stub needs to be written to be called from afl-fuzz. There is also a need for a set of initial filesystem images.

The user-space stub is needed to set up the loopback device and mount point. It then needs to expand a sparse filesystem image to the full image and mount it. Then it needs to do some filesystem activity (open and read/write files, change extended attributes, and so on).
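The stub's code was not shown in the talk. As a hedged sketch of just the activity step, in Python with an invented probe-file name (the real stub also sets up the loop device, expands the image, and performs the mount):

```python
import os

def exercise(mnt):
    """Sketch of a stub's activity phase: walk the mounted tree and poke
    at it to push the filesystem code down as many paths as possible."""
    for root, dirs, files in os.walk(mnt):
        for name in files:
            path = os.path.join(root, name)
            try:
                if hasattr(os, "listxattr"):
                    os.listxattr(path)       # read extended attributes (Linux)
                with open(path, "rb") as f:
                    f.read(4096)             # read a little of each file
                os.stat(path)
            except OSError:
                pass                         # corrupt images will error; keep going
    probe = os.path.join(mnt, "afl-probe")   # hypothetical file name
    try:
        with open(probe, "wb") as f:         # hit the write paths too
            f.write(b"fuzz")
        os.unlink(probe)
    except OSError:
        pass
```

The `except OSError: pass` pattern matters here: a deliberately corrupted image will make many operations fail, and the stub should keep exercising the rest of the filesystem rather than give up on the first error.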

The filesystem images are needed to "seed" the process. AFL wants a test case where everything works as a starting point. It can then change things in the filesystem image to find new paths. Those images can also help drive the fuzzing in certain directions. For example, creating images with UTF-8 filenames would point AFL toward the Unicode support.

Nossum wanted to "emphasize that running a fuzzer is really easy". There is a top-level config.yml that needs to be changed to point at the AFL and kernel Git trees and possibly to a specific GCC version. From there, building and running AFL and the kernel is simply a matter of using a start script that is part of the code they will be releasing soon.
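The released format of that file had not been published at the time of the talk; based on the description, it might look something like the following, with all keys invented for illustration:

```yaml
# Hypothetical sketch of config.yml; the real released schema may differ.
afl: /path/to/afl.git        # AFL Git tree
kernel: /path/to/linux.git   # kernel Git tree to build and fuzz
gcc: gcc-5                   # optional: pin a specific compiler version
```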

There are some challenges to fuzzing filesystems, however. Large filesystem images pose a problem because AFL works best with small input files (preferably less than 1MB). Many filesystems have minimum size requirements larger than that, though. So sparse images are used, with the "all-zero" areas removed, since those probably represent unused space. Filesystem-specific compression could also be done to remove "uninteresting" parts of the image.
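The talk did not specify the sparse format. One plausible encoding, sketched in Python (the block size and record layout are assumptions), keeps only the non-zero blocks along with their offsets, so AFL mutates a small file while the stub rebuilds the full-size image before mounting:

```python
import struct

BLOCK = 4096                     # assumed granularity for detecting holes

def shrink(image):
    """Keep only the non-zero blocks of an image as (offset, data) records,
    preceded by an 8-byte header holding the original size."""
    out = bytearray(struct.pack("<Q", len(image)))
    for off in range(0, len(image), BLOCK):
        block = image[off:off + BLOCK]
        if block.count(0) != len(block):             # skip all-zero blocks
            out += struct.pack("<Q", off)
            out += struct.pack("<I", len(block))
            out += block
    return bytes(out)

def expand(sparse):
    """Rebuild the full image: zero-fill, then drop the stored blocks in."""
    full_size, = struct.unpack_from("<Q", sparse, 0)
    image = bytearray(full_size)                     # holes read back as zeroes
    pos = 8
    while pos < len(sparse):
        off, = struct.unpack_from("<Q", sparse, pos)
        n, = struct.unpack_from("<I", sparse, pos + 8)
        image[off:off + n] = sparse[pos + 12:pos + 12 + n]
        pos += 12 + n
    return bytes(image)
```

For a mostly-empty image, `shrink()` output is dominated by the handful of metadata blocks, which is exactly the part of the input AFL can usefully mutate.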

Internal filesystem checksums also pose a challenge. The fuzzer will change things in the image, but those values won't be reflected in the checksums. One possibility would be to comment out the checksum-verification code in the filesystem, though that could lead to introducing other bugs. It also means that the test-case images may no longer work on a stock kernel. A better idea is to calculate the correct checksums and modify the image before it gets mounted. Figuring out how and where to do that can take a fair amount of work, however.
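A fix-up of that kind might look like the following Python sketch. The superblock layout and checksum placement here are invented, and a real fixer would need the target filesystem's actual on-disk format, but the shape of the work is the same: after AFL mutates the image, recompute the checksum so verification passes and the fuzzer can reach the code behind it:

```python
import struct
import zlib

CSUM_OFF = 60    # hypothetical: CRC32 field at byte 60 of a 64-byte superblock

def fix_superblock_csum(image):
    """Recompute the (toy) superblock checksum over the bytes that precede
    the checksum field, and patch the stored value before mounting."""
    sb = bytearray(image[:64])
    crc = zlib.crc32(bytes(sb[:CSUM_OFF]))
    struct.pack_into("<I", sb, CSUM_OFF, crc)
    return bytes(sb) + image[64:]

def csum_ok(image):
    """What the (toy) filesystem's verification code would check at mount."""
    stored, = struct.unpack_from("<I", image, CSUM_OFF)
    return stored == zlib.crc32(bytes(image[:CSUM_OFF]))
```

Running the fixer in the stub, just before the mount, keeps the mutated test-case images valid on a stock kernel, unlike the alternative of commenting out the verification code.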

The overhead of virtualization was another problem area. When using KVM, they could only run roughly 30 tests per second. So they turned to User-Mode Linux (UML), which allows running the kernel as a regular user-space program. The result was that they could run 60x more tests per second.

Running in the kernel environment can make each execution of the test slightly different. Ideally, each run should be deterministic and independent, but things like interrupts can alter that. In particular, interrupts during the mount process were clobbering the feedback buffer, so they ended up disabling the instrumentation for interrupt routines.

The rate limiting that is done for printk() caused some state to bleed over between successive runs. They found that either disabling rate limiting or disabling printk() itself would produce more deterministic runs. In addition, disabling symmetric multi-processing (SMP) and preemption both helped make things more deterministic.

Next steps

One of the next steps would be to create a regression test suite using the images created by running AFL. Since these images trigger distinct code paths, they will be good tests as changes are made. For example, one could use 2000 images created by AFL and know that many paths are being tested.

They suggested that filesystem developers should keep track of images found by AFL. They can be used for regression testing or to generate coverage reports for the filesystem's code. Much of the work to do all of that has already been done.

Some other ideas are to do fault injection (for out-of-memory conditions, for example) to see what new paths are taken. The coverage reports can also be used to add new operations into the user-space stub. Nossum noticed that extended attributes were not getting any coverage at one point, so he added get and set operations for extended attributes, which resulted in "way more coverage".

There were suggestions from the audience that other test suites (xfstests or fsstress) might make good additions. Fast tests are desired, though there may be code snippets of use in those, Casasnovas said. So far, there has been no real need to go beyond the 20-30 system calls in the user-space stub, as bugs are still found quickly with what they have.

This work is all meant to be open source, Nossum said, but isn't yet. They are working on a release of the code and will announce it on various mailing lists (including linux-fsdevel, as suggested by Ted Ts'o) when that is done.

[ Thanks to the Linux Foundation for supporting my travel to Raleigh for Vault. ]
