Fuzz!

Automatically generating large numbers of oracles, sometimes randomly, sometimes with constraints, has a name: fuzz testing. There are many ways to make and work with fuzz:

Cheat → artillery-plugin-fuzzer Fuzz blindly → zzuf , radamsa , sinkdweller , fuzz-me-maybe Evolve your fuzz → afl-fuzz , js-fuzz Fuzz with a grammar → dharma , octo , fuzzur , juzz Fuzz properties → test-check , babel-plugin-transform-flow-to-gen Fuzz models → fast-check Go meta! Fuzz test test suites → stryker

We shall consider these options in turn. For links to the libraries, please refer back to this list.

1. Cheat

The simplest way to get started with fuzzing, and possibly the most likely to yield a good return on investment (since the investment is so small) is to cheat.

Cheating means not generating oracles automatically but using a BIG LIST OF NAUGHTY STRINGS, crowdsourced test cases which are known to cause issues for insufficiently robust systems, for example:

1;DROP TABLE users 1'; DROP TABLE users —

Ṱ̺̺̕o͞ ̷i̲̬͇̪͙n̝̗͕v̟̜̘̦͟o̶̙̰̠kè͚̮̺̪̹̱̤ ̖t̝͕̳̣̻̪͞h̼͓̲̦̳̘̲e͇̣̰̦̬͎ ̢̼̻̱̘h͚͎͙̜̣̲ͅi̦̲̣̰̤v̻͍e̺̭̳̪̰-m̢iͅn̖̺̞̲̯̰d̵̼̟͙̩̼̘̳ ̞̥̱̳̭r̛̗̘e͙p͠r̼̞̻̭̗e̺̠̣͟s̘͇̳͍̝͉e͉̥̯̞̲͚̬͜ǹ̬͎͎̟̖͇̤t͍̬̤͓̼̭͘ͅi̪̱n͠g̴͉ ͏͉ͅc̬̟h͡a̫̻̯͘o̫̟̖͍̙̝͉s̗̦̲.̨̹͈̣

Install artillery and artillery-plugin-fuzzer from NPM for a quick way to fire naughty strings at your application. Artillery expects a piece of configuration like the following:

This tells Artillery to fire a username and password at the session endpoint, taking the username from the list of naughty strings. Artillery will automatically do this for every string on the list.

Simply verifying that your server does not crash should provide some additional confidence. However, the best results with this type of fuzzing (and the next) are obtained if the code under test contains runtime assertions. Runtime assertions can check things like preconditions (does the state required for the current operation exist?), postconditions (did the operation do what was expected?) and invariants (some things that should never change). With runtime assertions in place, fuzzy tests will catch not just unhandled exceptions, but also broken business logic.

2. Fuzz blindly

For truly machine-generated oracles, turn to the command line utilities zzuf or radamsa . They are mutational fuzzers, which means they accept some input and output a fuzzy (randomly altered) version of it. They pay no heed to datastructures or encoding, which has benefits and downsides. The benefit is that they are more likely to produce edgier edge cases; the downside is that because most inputs will be rejected right away, they will need to run much longer before they catch anything (sometimes weeks…).

The following script will produce 100 variants of the string “hello”:

Note that each time we run the program we pass it a seed. Most fuzzers generate their randomisations in a deterministic fashion based on a given seed. That means if we find some output that triggers a bug we can easily recreate it using the seed.

Radamsa works similarly to zzuf but has a different fuzzing style. It tends to produce more wildly varying output more quickly. You can also find a Node wrapper for radamsa on NPM ( sinkweller ), together with a handy module ( fuzz-me-maybe ) which allows you define an insertion point for the fuzzed data other than your program’s main entrypoint:

Data passed into output will now be fuzzed before being passed to stream.write . Using this library, fuzzing can be turned on and off with environment variables.

Again, remember that this sort of fuzzing benefits greatly from being combined with runtime assertions.

3. Evolve your fuzz

What if instead of fuzzing blindly we had some way of automatically looking into our program and figuring out what types of input are most likely to cause interesting behaviours. We do, and its very simple: coverage analysis. A group of fuzzers sometimes known as “genetic fuzzers” keep track of the code coverage generated by each input and select the inputs which increase code coverage as material for further mutations. The oldest and most famous of these is American Fuzzy Lop (AFL), whose algorithm is sketched below:

Schematic of AFL algorithm

Because AFL relies on code instrumentation to track coverage, every language requires its own port of the program. The original AFL was written for C applications. It is possible to use this on JavaScript applications today by compiling Node from scratch with the AFL tools for C++, but this approach is not recommended. Fully functioning AFL ports exist for Go, Rust and Python: why not JavaScript?

Popularity of AFL ports in different languages

The best I have been able to find in JavaScript is connor4312/js-fuzz . This library is complete enough that it may function for some situations, but it needs more work and you are likely to experience bugs. Now would be a great time to get stuck in!

4. Fuzz with a grammar

You can pursue a more targeted fuzzing strategy by using a grammar to define precisely the shape of the fuzz that you want to see. Mozilla’s dharma is the simplest I have found. It is generative rather than mutational: you write a specification for the kind of data that you wish to produce and dharma will spew out data to satisfy the specification. It does not need any input data to “mutate”.

Writing a proper specification for an input type can be a little tricky but luckily various grammars already exist. Here is one for URLs, for example, that runs to 326 lines. Dharma can also be used to generate valid strings of JavaScript that you can eval in your application context. Check out this example of using dharma to generate valid invocations of a JavaScript function.

Now, that might seem like a long winded way to generate native JavaScript datatypes, and it is. Why not generate them directly inside the JavaScript context? Another library from Mozilla, octo does in fact include some in-JavaScript generators for things like URLs but is rather limited in scope. On the other hand, if the extent of your desire for grammar is to restrict output to certain specific JavaScript native types, and you are not concerned about things like forming valid URLs, you have a few more options. The simplest is fuzzur which will just take a JavaScript object and mutate it, preserving the types. More elaborate and potentially more useful is juzz , which promises to generate objects conforming to joi specifications. It works well for many Joi types but it is incomplete and not in active development. Why not pick it up? (Notice a pattern here!?)

Another group of testing libraries takes the idea of generating native JavaScript types much further, providing more fine-grained tools for doing so and including a harness for doing something called “property-based testing”. Let’s take a look at them now.

5. Fuzz properties

In this section we will look at property-based testing or generative testing (the two are synonymous), which sits somewhere between the fuzzing techniques described above and normal unit testing.

What is property-based testing? It is easiest to understand by looking at an example. Let’s take test-check . Here we are testing a function that is supposed to return the sum of two numbers or ten, whichever is higher. The developer accidentally commented out the functionality to make this work:

The test has two arguments: an array of generators and an assertion. Here we ask for two randomly generated integers with gen.int , use them to call our function and assert on the result. Plugins exist for integrating such tests into different testing libraries. Using Jest, for example, we can write the same test like this:

A wide variety of generators are available corresponding to different general types such as string , number , boolean , array , but also more specific types such as posInt , negInt , asciiString , JSON . It is also possible to define custom generators using combinations of these or any other generation tool (you could combine it with dharma , for example).

The output of this test looks like this:

The key fail records that the test failed with input [21, 8] . What about the key shrunk ? Here we approach what makes property-based testing really powerful. A common feature of these libraries called “shrinking” means that when a failure is detected a search is performed to find the smallest value that will trigger the failure. In this case [3, 8] was found. Having a minimal test-case often makes it much easier to locate the source of the problem.

A potentially annoying feature of property-based testing is the need to define the input types for the functions under test. If you are using Flow or Typescript, this is particularly vexing as the types have already been defined once. Some headway has been made to tackle this problem. If you are using Flow, try babel-plugin-transform-flow-to-gen on NPM: it will enable you to convert your types into generators for test-check , although, be warned, the library is not actively developed. I am not aware of anything comparable for Typescript (again, if you are looking for a project…). If you are using Joi you can easily create a generator to do property-based testing of your endpoints with juzz .

The really difficult part of property-based testing, which takes practice, is finding good assertions. In particular, it is important to avoid simply reimplementing the code under test. The following are some categories of useful assertion that you can use for inspiration:

Boundaries: test that a result always falls within certain known bounds. This was our example with addTenMax .

. Complex optimisations: if some code does something simple in a very complicated way to make it fast, you can include the simple implementation in your test and check the outputs match.

Complementary pairs: if you have pairs of functions such as encode and decode you can check that encoding then decoding reproduces the original input.

and you can check that encoding then decoding reproduces the original input. Idempotency: you can check that calling a function more than once does not alter its output (unless it is supposed to).

Commutativity: you can check that the order of arguments does not alter the result. For example, sum functions should behave this way.

functions should behave this way. Invariants: you can check invariants peculiar to your business logic.

If you are considering trying fuzzy testing for the first time, property-based testing is an excellent place to start. The feedback loop is much faster than for higher-level fuzzing and the JavaScript tooling is in a better state. Property based testing has also achieved a relatively high adoption rate in other ecosystems, notably Haskell and Clojure which include property based testing tools in their core libraries. Python, too, has a property-based testing library ( hypothesis ) that outstrips JavaScript’s offerings in terms of popularity and polish:

All the indications are that property-based testing has a lot to offer. The following claim from the Clojure documentation puts the case fairly bluntly:

Property-based, generative testing, as implemented for Clojure in test.check, has proved to be far more powerful than manually written tests.

Proved to be far more powerful? Have we been missing a trick in JavaScript land?

You will note that fast-check is the most recently updated JavaScript library, although the least popular. It is probably the most promising option at the moment and is particularly notable for including an exciting feature called “model-based testing”, which we shall look at now.

6. Model-based testing

Model-based testing is a form of generative testing designed for applications with complicated state, such as UIs. Many bugs in such systems require a complicated series of steps to be repeated in order to get the application into a faulty state. For example, a login page may function correctly most of the time, but may fail if the user takes a specific, complicated path through the application. To cover such scenarios, instead of generating inputs to functions, we generate operation sequences. We then check that the operations work as expected no matter what position in a sequence they hold.

Concretely, this approach requires us to define some operations we can do in the application along with their preconditions and postconditions. For example, the operation ‘click login’ might have a precondition ‘is on the login page’ and a postcondition, ‘is logged in’. The testing framework generates a valid random sequence of operations ensuring that the ‘click login’ operation always follows another operation whose postcondition is ‘is on the login page’. If any postcondition fails, the test has detected a bug.

Model-based testing algorithm (diagram by Nicolas Dubien)

Next time you are building user journeys with TestCafe or Cypress, spare a thought for fast-check , which can generate these for you!

7. Fuzz test your tests

The major obstacle standing against the adoption of all of these forms of oracle-problem-busting test techniques is the general lack of experience with them in JavaScript community. Without examples of successful applications of the techniques on JavaScript projects, and without battle-hardened tools, we can hardly be sure they are worth the time. Companies need to invest in both R and D in this area.

That is not all: the difficulty of knowing whether tests provide value actually cuts much deeper than the particular issue with fuzz testing.

The popularity of test coverage monitoring indicates a widespread desire to monitor how well an application is tested, yet coverage is a notoriously poor indicator of test quality. Coverage measures how much code was executed, but ignores whether that code was actually asserted against. The constant argy-bargy over what constitutes good testing, over when testing is too much, or not enough, or too integration-y, or too unit-y, reflects a widespread uncertainty concerning the impact any tests really have. Wouldn’t it be nice if we had some way to objectively measure — or test — the quality of our tests? A classic oracle problem…

Enter “mutation testing”. To explain what mutation testing is I cannot do better than quote the introduction on the website for stryker , currently the only available mutation testing toolkit for JavaScript:

Bugs, or mutants, are automatically inserted into your production code. Your tests are ran for each mutant. If your tests fail then the mutant is killed. If your tests passed, the mutant survived. The higher the percentage of mutants killed, the more effective your tests are…It’s really that simple.

This might sound a bit odd. Stryker will deliberately insert errors into your code, change return values, change one operator to a different one, etc…Think of it as mutational fuzz testing, but the input is your code! The idea — and it is a great one — is that if your tests continue to pass after your code is mutated, they are bad tests.

Stryker is, again, not a mature library, although it is being actively developed and has fairly considerable weight behind it. It is well worth a try.