Fuzz testing the Hat backup system

For Hat’s first fuzz test, I wanted to test something interesting with a large scope: something I could not expect quick-check to figure out in a few seconds, but where a fuzz test might stand a chance if run over many days.

I went with a high-level end-to-end test of Hat’s snapshot mechanism. I am testing the code that stores a file’s metadata inside the backup system, as well as the code that recovers it. Here is the test:

```rust
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::path;
// Hat-specific imports (models, key, vfs, Filesystem, setup_family) are not shown.

fn metadata_test(info: models::FileInfo) {
    if !info.name.is_empty() {
        // Convert fuzzer input to an insertable file entry.
        // The entry contains metadata like the modified timestamp.
        let entry = key::Entry::new_from_model(None, key::Data::FilePlaceholder, info);

        // Set up a testing Hat.
        let (_backend, mut hat, mut fam) = setup_family();

        // Back up the file entry with no data contents.
        fam.snapshot_direct(entry.clone(), false, None).unwrap();

        // Complete a full snapshot.
        hat.commit(&mut fam, None).unwrap();
        hat.meta_commit().unwrap();
        hat.data_flush().unwrap();

        // Set up a virtual file-system and verify the snapshot.
        let mut fs = Filesystem::new(hat);
        if let vfs::fs::List::Dir(files) = fs
            .ls(&path::PathBuf::from("familyname/1"))
            .unwrap()
            .expect("no files found")
        {
            assert_eq!(files.len(), 1);
            let mut want = entry.info;
            // The snapshot timestamp comes from the snapshot itself, so copy it before comparing.
            want.snapshot_ts_utc = files[0].0.info.snapshot_ts_utc;
            assert_eq!(want, files[0].0.info);
        } else {
            panic!("familyname/1 is not a directory");
        }
    }
}

fn metadata_test_bincode(data: &[u8]) {
    // Inputs that do not decode as a FileInfo are simply ignored.
    if let Ok(info) = bincode::deserialize(data) {
        metadata_test(info);
    }
}

fuzz_target!(|data: &[u8]| { metadata_test_bincode(data) });
```

I can build and start it with cargo-fuzz like so:

```
cargo fuzz run metadata_test_bincode
```

This gives output roughly like the following:

```
INFO: Seed: 3527004481
INFO: Loaded 1 modules (588119 guards): 588119
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 881 ft: 877 corp: 1/1b exec/s: 0
#7 NEW cov: 1400 ft: 1472 corp: 2/54b exec/s: 0
#8 NEW cov: 2255 ft: 2743 corp: 3/89b exec/s: 0
#9 NEW cov: 2773 ft: 3749 corp: 4/159b exec/s: 0
#10 NEW cov: 2899 ft: 4179 corp: 5/195b exec/s: 0
#11 NEW cov: 4201 ft: 5638 corp: 6/4291b exec/s: 0
#19 REDUCE cov: 4201 ft: 5638 corp: 6/3226b exec/s: 0
#20 REDUCE cov: 4201 ft: 5638 corp: 6/3208b exec/s: 0
#27 REDUCE cov: 4201 ft: 5638 corp: 6/2696b exec/s: 0
#33 NEW cov: 4242 ft: 5842 corp: 7/2752b exec/s: 0
#40 REDUCE cov: 103425 ft: 103773 corp: 8/5257b exec/s: 0
#47 NEW cov: 103591 ft: 104604 corp: 9/7762b exec/s: 47
#53 NEW cov: 103619 ft: 104634 corp: 10/7808b exec/s: 53
```

(For longer runs, build in Rust’s release mode for better performance.)

In the first iteration of this test, I did not check that the filename was non-empty. As a result, the fuzz test succeeded in inserting a file with an empty name. Hat’s internal APIs have no check for this, and adding one is now on the TODO list.

As it turns out, Hat can insert files with empty names just fine. The difficult part would be restoring them on a file-system later :-)
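A name check of the kind that is now on the TODO list could be as small as the following sketch. `validate_name` and `NameError` are hypothetical names, not Hat’s API. Besides empty names, the sketch also rejects names containing `/` or NUL, since neither can appear inside a single file name on a POSIX file-system, so such entries could not be restored there either.

```rust
/// Hypothetical error type for the sketch below; not part of Hat.
#[derive(Debug, PartialEq)]
enum NameError {
    Empty,
    ContainsSlash,
    ContainsNul,
}

/// Hypothetical check run before inserting an entry. Rejects names that
/// could not be restored as a single path component on a POSIX file-system.
fn validate_name(name: &str) -> Result<(), NameError> {
    if name.is_empty() {
        Err(NameError::Empty)
    } else if name.contains('/') {
        Err(NameError::ContainsSlash)
    } else if name.contains('\0') {
        Err(NameError::ContainsNul)
    } else {
        Ok(())
    }
}

fn main() {
    for name in ["", "notes.txt", "a/b", "\u{0}z"].iter() {
        println!("{:?} -> {:?}", name, validate_name(name));
    }
}
```

Note that the NUL-filled name the fuzzer found later in this post would pass the original non-empty check but fail this stricter one.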

So what is going on in this fuzz test?

In each iteration, the fuzz test produces an input vector of bytes. The test then tries to parse it as a Rust struct representing file metadata, using the Rust serialization framework Serde with the bincode data format. Because Serde makes the serialization format easy to swap, I use bincode for the fuzz test even though Hat uses CBOR internally. Bincode seems less restrictive to me, and I am guessing the fuzz test will find it easier to produce valid inputs in bincode than in CBOR. (To verify that assumption later, I added CBOR and JSON variants; I will go over the results in a later post.)
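To make the parsing step more concrete, here is a hand-written, std-only approximation of the layout bincode 1.x’s top-level functions use, as I understand it: fixed-width little-endian integers, a `u64` length prefix before string bytes, and a one-byte tag (0 or 1) for an `Option`. `MiniInfo` is a hypothetical, cut-down stand-in for `models::FileInfo`, whose real field types are not shown in this post, and the real test calls `bincode::deserialize` rather than anything like `decode`. The point is to show why most random byte strings are rejected: an oversized length prefix, invalid UTF-8, or an `Option` tag other than 0 or 1 all fail.

```rust
/// Hypothetical, cut-down stand-in for `models::FileInfo`.
#[derive(Debug, PartialEq)]
struct MiniInfo {
    name: String,
    modified_ts: i64,
    owner: Option<u64>,
}

/// Cursor over the raw fuzzer bytes.
struct Reader<'a> {
    data: &'a [u8],
}

impl<'a> Reader<'a> {
    /// Take exactly `n` bytes, or fail if the input is too short.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        if self.data.len() < n {
            return None;
        }
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        Some(head)
    }

    /// Fixed-width little-endian u64.
    fn read_u64(&mut self) -> Option<u64> {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(self.take(8)?);
        Some(u64::from_le_bytes(buf))
    }

    /// Same 8 bytes, reinterpreted as a signed integer.
    fn read_i64(&mut self) -> Option<i64> {
        self.read_u64().map(|v| v as i64)
    }

    /// u64 length prefix followed by that many UTF-8 bytes.
    fn read_string(&mut self) -> Option<String> {
        let len = self.read_u64()?;
        if len > self.data.len() as u64 {
            return None; // e.g. a prefix of 0xffff_ffff_ffff_ffff
        }
        let bytes = self.take(len as usize)?;
        std::str::from_utf8(bytes).ok().map(|s| s.to_string())
    }

    /// One tag byte: 0 = None, 1 = Some(value), anything else is invalid.
    fn read_option_u64(&mut self) -> Option<Option<u64>> {
        let tag = self.take(1)?[0];
        match tag {
            0 => Some(None),
            1 => Some(Some(self.read_u64()?)),
            _ => None,
        }
    }
}

/// Decode fields in declaration order. This sketch ignores trailing bytes.
fn decode(data: &[u8]) -> Option<MiniInfo> {
    let mut r = Reader { data };
    let name = r.read_string()?;
    let modified_ts = r.read_i64()?;
    let owner = r.read_option_u64()?;
    Some(MiniInfo { name, modified_ts, owner })
}

fn main() {
    // A run of zero bytes: empty name, zero timestamp, None.
    println!("{:?}", decode(&[0u8; 17]));
    // 34 bytes of 0xff plus "\n\n": the length prefix alone is u64::MAX.
    let mut rejected = vec![0xffu8; 34];
    rejected.extend_from_slice(b"\n\n");
    println!("{:?}", decode(&rejected));
}
```

Under this assumed layout, a run of zero bytes decodes to an empty name with all-zero fields and `None`, which resembles the first struct the fuzzer managed to parse, while leading `0xff` bytes like those in the first corpus entry encode an impossible string length.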

If the fuzz test’s input data parses as the wanted struct, the test checks whether the filename is non-empty. If it is, the metadata is valid enough, and the test simulates a snapshot of a virtual file with the given metadata, does a basic checkout of the file and verifies that the metadata matches my expectations.

I am hoping this test will eventually find some strange combination of file metadata that somehow breaks the system in an interesting way :-)

What to expect

To start with, the inputs will be as random as those of a quick-check test. Once the fuzz test finds an input that parses correctly, that input triggers new coverage, so the fuzz test remembers it and uses it as a basis for future guesses. When the running fuzz test outputs NEW, the input tested in that iteration has reached a previously unseen location in the code. A REDUCE line means the fuzz test found a smaller input that covers the same features and kept that one instead.
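The keep-what-is-new loop can be sketched in a few lines. This is a toy model, not libFuzzer: coverage is simulated by a hypothetical `target` function that reports which of its nested branches an input reached, mutation overwrites a single random byte, and a small xorshift generator stands in for a real random source. An input joins the corpus only if it reaches a branch that no earlier input reached, which is what a NEW line reports.

```rust
use std::collections::HashSet;

/// Hypothetical target: returns the IDs of the branches an input reached.
fn target(data: &[u8]) -> Vec<u32> {
    let mut hit = vec![0];
    if data.first() == Some(&b'H') {
        hit.push(1);
        if data.get(1) == Some(&b'A') {
            hit.push(2);
            if data.get(2) == Some(&b'T') {
                hit.push(3);
            }
        }
    }
    hit
}

/// Minimal xorshift64 generator (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Coverage-guided loop: mutate a corpus entry and keep the result only
/// if it reaches a branch no earlier input reached.
fn fuzz(seed: u64, max_iters: usize) -> Vec<Vec<u8>> {
    let mut rng = XorShift(seed);
    let mut corpus: Vec<Vec<u8>> = vec![vec![0; 3]];
    let mut seen: HashSet<u32> = target(&corpus[0]).into_iter().collect();
    for _ in 0..max_iters {
        // Pick a corpus entry and overwrite one random byte.
        let idx = (rng.next_u64() % corpus.len() as u64) as usize;
        let mut input = corpus[idx].clone();
        let pos = (rng.next_u64() % input.len() as u64) as usize;
        input[pos] = (rng.next_u64() >> 56) as u8;
        // Keep the input only if it produced new coverage ("NEW").
        let new: Vec<u32> = target(&input)
            .into_iter()
            .filter(|id| !seen.contains(id))
            .collect();
        if !new.is_empty() {
            seen.extend(new);
            corpus.push(input);
        }
        if seen.contains(&3) {
            break; // deepest branch reached
        }
    }
    corpus
}

fn main() {
    for entry in fuzz(0x2545_F491_4F6C_DD1D, 200_000) {
        println!("{:?}", String::from_utf8_lossy(&entry));
    }
}
```

Guessing `HAT` by pure chance would take around 256^3 (about 16.8 million) tries, whereas the guided loop solves it one byte at a time, in a few thousand iterations on average, because each partial match is kept as a stepping stone.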

The space of all possible metadata objects is large. While the most interesting values for something like the modification time are likely few (min, 0 and max), there could be interesting combinations of values that break something, or interesting individual values I have not yet thought of.
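To illustrate why min, 0 and max are worth trying, here is a hypothetical conversion from a seconds timestamp to milliseconds, assuming the timestamps are `i64` seconds (the post does not show Hat’s actual types or conversions). With `checked_mul`, the extremes, and also one large value the fuzzer produced later, come back as `None` instead of overflowing; a plain `*` would panic in a debug build and silently wrap in a release build.

```rust
/// Hypothetical helper, assuming timestamps are `i64` seconds: convert to
/// milliseconds, returning `None` instead of overflowing.
fn secs_to_millis(secs: i64) -> Option<i64> {
    secs.checked_mul(1000)
}

fn main() {
    // Boundary values worth trying, plus a large value the fuzzer produced.
    let interesting = [i64::MIN, -1, 0, 1, i64::MAX, 7_307_217_257_065_611_264];
    for &ts in interesting.iter() {
        println!("{:>20} s -> {:?} ms", ts, secs_to_millis(ts));
    }
}
```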

I will check back in a couple of weeks to see which parts of the code this test was able to exercise and whether it found something.

Example inputs

Here are some examples of inputs this test produced, to give you an idea of what the fuzz test is doing.

This is the very first input that the fuzz test chose to keep:

```
> hexdump fuzz/corpus/metadata_test_bincode/483ceba1...
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000020 ffff 0a0a
0000024
```

This does not deserialize into a valid struct, but from the fuzz test’s perspective, getting rejected is interesting too: the rejected input still produced coverage the fuzz test had not seen before.

The first input that parsed correctly into a struct gave this result:

```
FileInfo {
    name: "",
    created_ts: 0,
    modified_ts: 0,
    accessed_ts: 0,
    byte_length: 0,
    owner: None,
    permissions: None,
    snapshot_ts_utc: 0
}
```

This struct has an empty name, so it is only one step away from exercising the actual test. I let the test run for a couple hundred more iterations, and it found an input that does exercise the test as intended:

```
FileInfo {
    name: "\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}"
          "\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}"
          "\u{4}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{4}\u{0}"
          "z\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}",
    created_ts: 0,
    modified_ts: 0,
    accessed_ts: 0,
    byte_length: 0,
    owner: None,
    permissions: None,
    snapshot_ts_utc: 0
}
```

This is definitely not an everyday filename, but it passed the check, and this input took the fuzz test from a coverage of 4'857 to 107'869 coverage points.

After running a bit longer, the fuzz test has found 177 interesting inputs. It has not yet managed to flip one of those None values to a Some value, but it has found non-trivial inputs like:

```
FileInfo {
    name: "\u{0}\u{0}\u{0}\u{0}\u{0}#...",
    created_ts: 7307217257065611264,
    modified_ts: 7310874267742461811,
    accessed_ts: 1933205832,
    byte_length: 3439329280,
    owner: None,
    permissions: None,
    snapshot_ts_utc: 0
}
```

This input provides an incorrect file size hint (byte_length) of 3'439'329'280 bytes. The file size hint does not have to be accurate, since the file could change while it is being read anyway. That is an interesting case to verify :-)
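One way to use a size hint without trusting it is to cap how much memory the hint may pre-allocate and then read until end-of-file regardless. A minimal sketch, with a hypothetical `read_with_hint` function and an arbitrary 1 MiB cap; Hat’s actual read path is not shown in this post:

```rust
use std::io::{self, Read};

/// Hypothetical cap on how much memory a size hint may pre-allocate.
const MAX_PREALLOC: u64 = 1 << 20; // 1 MiB

/// Read everything from `reader`, treating `byte_length` as a hint only.
fn read_with_hint<R: Read>(mut reader: R, byte_length: u64) -> io::Result<Vec<u8>> {
    // Never let a bogus hint trigger a huge allocation up front.
    let capacity = byte_length.min(MAX_PREALLOC) as usize;
    let mut buf: Vec<u8> = Vec::with_capacity(capacity);
    // read_to_end keeps reading until EOF, however many bytes that is.
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let contents = b"hello, hat";
    // A hint far too large (the value the fuzzer produced) and one far too small.
    let big = read_with_hint(&contents[..], 3_439_329_280)?;
    let small = read_with_hint(&contents[..], 0)?;
    println!("{} {}", big.len(), small.len());
    Ok(())
}
```

With this approach, the bogus 3'439'329'280-byte hint costs at most 1 MiB of up-front allocation, and a hint that is too small only means the buffer grows as needed.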