Property Based Testing And Riak

This is a blog about the development of Yeller, The Exception Tracker with Answers Read more about Yeller here

Yeller stores exception data entirely in riak; the availability properties are very nice, and exception data fits into an eventually consistent world rather well. Riak’s eventually consistent nature means you have to write a few bits of application code not found when using traditional storage: a merge function per datatype, and often, serialization logic. These two pieces can be rather tricky to get right, if your merge function is incorrect you can throw away user data arbitrarily, corrupt your data in weird ways, and generally mess up. If your serialization logic is messed up, it can get even worse, completely corrupt/unreadable data can occur.

Siblings and the merge function

The way I prefer using riak, is with allow_mult=true . This means that whenever you have causally conflicting writes to a key, riak will store all of them, and upon your next read of that key you have to resolve them yourself. Designing your datatypes such that you can merge them is a huge topic, and an area of active research. However, even once you have a merge strategy worked out, how can you be sure that your reasoning is good? The merge functions you use have to obey a few properties: they have to be commutative, idempotent and associative, or you’ll mess things up when you have conflicts:

Commutativity

merge a b == merge b a

Commutativity says that it doesn’t matter which order you supply the arguments to the merge function, you should always get the same values.

Associativity

merge a (merge b c) == merge (merge a b) c

Associativity says that the order your merge functions are applied to your data, doesn’t matter, so long as the overall order of data elements is the same. This is a tricky property, but very powerful, especially when combined with commutativity. Together, they mean that you can take any set of siblings and merge them together in any order, and always result in the same value.

Idempotency

merge a a == a

This says that your merge function, applied to the same value twice, has to result in the original value. Riak siblings can actually be duplicated data. Imagine you submit two writes that modify the stored data in the same way at the same time, riak’s sibling detection will spit out both siblings and ask you to resolve them upon your next read.

Enter property based testing

The best method I’ve found for testing these properties of a merge function is to use property based testing. The basic idea of property based testing is:

write a generalized property of your code, that takes in a certain number of inputs, and specifies how to generate inputs randomly generate inputs and check that the property holds (optional) upon failure, try to minimize the inputs to find the smallest failing case

If you want much more detail on property based testing, John Hughes talk from Clojure West is most excellent.

It’s pretty easy to write property based for merge functions. Here are worked examples, using clojure’s test.check library (here, I’m assuming we have a test.check generator called gen-my-data that generates random samples of the data type you want to test):

Commutivity

(defspec test-my-data-merge-is-commutative (prop/for-all [a gen-my-data b gen-my-data] (= (my-merge a b) (my-merge b a))))

You can see that this maps directly to the mathmatical property, and gives you pretty reasonable confidence that your merge function is correct in this aspect.

Associativity

(defspec test-my-data-merge-is-commutative (prop/for-all [a gen-my-data b gen-my-data c gen-my-data] (= (my-merge a (my-merge b c)) (my-merge (my-merge a b) c))))

Likewise, this checks for associativity, looks exactly like the mathmatical property expressed as code, and gives you great coverage.

Idempotence

(defspec test-my-data-merge-is-commutative (prop/for-all [a gen-my-data] (= (my-merge a a) a)))

All in all, property based testing is, in for the current tool set, perfect for testing riak’s merge function. It’s not difficult to write a random generator for your data type, and writing these properties is trivial. You gain pretty great confidence, and your assumptions about the design of the merge function will be well tested.

Serialization

Serialization is another thing you have to test when using riak. Yeller uses fressian for serialization, and serializes a number of custom data types (I’ll do some more detail on fressian in a future post, with Yeller’s actual benchmarks that compelled a switch). As such, I have a reasonable amount of custom serialization code (314 lines, at the time of writing). Serialization is another case where property based testing is amazing - because it lets you roundtrip a bunch of samples through bolth serialization and deserialization, checking for equality against the original value.

An example test.check test might look like this:

(defspec test-my-data-serialization-roundtrips (prop/for-all [a gen-my-data] (= (deserialize (serialize a)) a)))

This lets you easily check that you can at least roundtrip correctly, and if your generator produces good enough examples, you can be pretty sure that your serialization logic works.

That’s about all I do at the moment with property based testing and riak. I’d love to hear about your use of property based testing, or how you test code that interacts with riak, you can mail me at tcrayford at yellerapp dot com , or toot at me: @t_crayford.

This is a blog about the development of Yeller, the Exception Tracker with Answers. Read more about Yeller here