Testing Against Datomic

This is a blog about the development of Yeller, the Exception Tracker with Answers. Read more about Yeller here.

Yeller uses datomic for things like user account data, billing info, etc. Basically anything where consistency is required, and arbitrary write scalability/availability isn’t.

Recently somebody in the #datomic irc channel asked about testing strategies with datomic. I've been very happy with how well datomic's design fits the way I write tests, and the way I've been doing software design on my own projects for the past year or so, so I thought I'd write it up.

This isn’t an introduction to datomic or clojure at all, so there’s a bunch of assumed knowledge here.

So, generally there are two main kinds of interaction one wants to test with any database:

getting data out (query)

putting data in (transactions)

Testing these two is inherently coupled by the stateful nature of databases - what's the point of testing that you can put something in without testing that you can get it back out again? Likewise, most queries require some data in the database to be usefully tested.

However, datomic has three features that let you test it in a manner that's more powerful and dramatically faster, but still fully compliant with a real database:

You can do speculative transactions that return a new database value, so teardown is done by the JVM's GC

You can run a fully compliant database in memory, with an in-memory hashtable as the "storage"

Transactions are just data, so your code that makes transaction data is pure
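That last point is worth dwelling on: a transaction is just ordinary Clojure data - a vector of maps (or lists) - so you can build and inspect it with no database in sight. A minimal sketch (the :user/* attributes and the fn name here are made up for illustration, not Yeller's real schema):

```clojure
;; Transaction data is a plain vector of maps; building it requires no I/O.
;; The :user/* attributes are hypothetical - substitute your own schema.
(defn new-user-tx [tempid {:keys [fullname email]}]
  [{:db/id      tempid
    :user/name  fullname
    :user/email email}])

(new-user-tx -1 {:fullname "Joe" :email "joe@example.com"})
;; => [{:db/id -1, :user/name "Joe", :user/email "joe@example.com"}]
```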

So a typical test that involves datomic for yeller looks something like this:

Generate some transaction data

Speculatively apply it to the database with datomic.api/with

Query against the resulting database

All of these steps run fully in memory, but do full schema checking and use the real production queries. This means these tests run incredibly fast - the typical time I see is around 1.5ms per test. This enables a very fast feedback loop - in fact, I can run all of yeller's datomic-facing tests in under 300ms, so I don't even have time to think after starting a run (see Gary Bernhardt's post on feedback cycles and their impact on the day-to-day work of writing software).

So what does that actually look like?

So the first thing is that datomic transactions are just data. As such, you can build transaction data in completely pure code - there’s no need for I/O there at all. To get the testing workflow outlined above, you have to separate constructing transaction data from submitting it. Once you’ve split the two bits apart like that, you can test transactions speculatively, without mutating a database at all. This also has some nice impacts on your application - for example you can use concat to compose two bits of transaction data into a single larger transaction.
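Because transaction data is just a sequence, composing independently built pieces into one transaction is ordinary sequence manipulation. A sketch of that composition (both helper fns and all attribute names are invented for illustration):

```clojure
;; Two pure functions, each producing its own piece of transaction data.
;; All attribute names here are illustrative, not Yeller's real schema.
(defn user-tx [user-tempid email]
  [{:db/id user-tempid :user/email email}])

(defn project-tx [user-tempid project-tempid project-name]
  [{:db/id         project-tempid
    :project/name  project-name
    :project/owner user-tempid}])

;; concat composes the two pieces into a single larger transaction,
;; which can then be submitted (or speculated) atomically as one unit.
(def signup-tx
  (concat (user-tx -1 "joe@example.com")
          (project-tx -1 -2 "first-project")))

(count signup-tx)
;; => 2
```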

Yeller has a helper fn that sets up an in-memory db with the schema and returns a connection. It's fast enough to run in unit tests: https://gist.github.com/tcrayford/9162211 (97.5th percentile is 267.4 microseconds; there's an outlier because the very first time you add the schema takes a bit longer).
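I haven't reproduced the gist here, but the shape of such a helper is roughly this - a hypothetical sketch, assuming the datomic peer library is on the classpath and that schema holds your schema transaction data:

```clojure
(require '[datomic.api :as d])

;; Illustrative stand-in; the real schema transaction data goes here.
(def schema [])

(defn scratch-conn
  "Creates a fresh in-memory database with the schema transacted,
   and returns a connection to it. gensym gives each test run its
   own uniquely named database."
  []
  (let [uri (str "datomic:mem://" (gensym "test"))]
    (d/create-database uri)
    (let [conn (d/connect uri)]
      @(d/transact conn schema)
      conn)))
```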

Once you’ve done that, run the test, assert against the results. Pretty simple. An example from yeller’s tests:

(deftest creating-a-user-saves-their-email-and-full-name
  (is (= "Joe"
         (:user/name
          (find-by-email
           (speculate
            (create-new-user-transaction
             -1 ; arbitrary tempid
             {:fullname "Joe"
              :email "joe@example.com"
              :encrypted-password "encrypted-pass"}))
           "joe@example.com")))))

;; speculate is just a simple helper function
;; that wraps datomic.api/with:
(defn speculate [t]
  (:db-after (d/with (d/db (empty-db)) t)))

That’s a reasonable amount of code, but something you can see just from the shape - this test is purely functional: it creates a speculative value of the database using some transaction data (created by production code), and then uses a production query against that and asserts that a value is true.

Compared to a traditional relational database (postgres, mysql et al)

I’ve been writing tests against relational dbs for 4 years or so (nearly all of them in Ruby on Rails, but a few in Haskell). Compared to testing against datomic, tests against a relational db lose in the following ways:

they require vastly more setup/teardown. Often this is automated away by the testing tool you're using, but with datomic my only setup is ensuring I have the schema set up correctly, which I do in code anyway (and it's trivial to do). I've wasted a lot of time with rails by forgetting to run rake db:test:prepare after performing a migration, for example.

they require me to run an extra process and interact with it over the network (yes, you can use sqlite in memory, but I really don't want to waste time worrying about the differences between postgres and sqlite [assuming production is postgres])

they are dramatically slower - typically in the 10s-20s of milliseconds per test (for very simple tests!). This means my full suite would take tens of seconds, as opposed to the sub-1s Yeller gets right now

parallelizing the tests requires sharding runners across multiple databases. This isn't something I'm worried about right now, but I know that my datomic-facing tests are trivially parallel - they only do purely functional data manipulation - so I'm not worried about it if I ever do need to parallelize for speed reasons.

Overall, I’ve been very happy with testing against datomic - it’s much easier than testing against a typical relational database, orders of magnitude faster, and significantly easier to reason about as well.

Many thanks to Bobby Calderwood and August Lilleaas for reading drafts and suggesting improvements.
