Snapshotting: A neat problem and solution in Haskell

I solved an interesting problem recently in Haskell.

Now, I’m no expert at Haskell, and this may not even be a very good solution. I got where I did through a lot of weird side paths, a lot of help from more experienced Haskellers on IRC, but I also made a lot of my own mistakes. It is, though, definitely an interesting solution. So I’ll describe it here.

Of course, this all came about to solve a particular problem. In this case, the problem was a multi-user dungeon (MUD). These programs are text-based network applications that allow users to connect and interact with a common virtual world.

I chose to represent the “world” using a nifty module called “software transactional memory” in the Haskell core library. STM is a great thing, and you should learn about it if you don’t know about it already. I found Simon Peyton-Jones’ paper at http://research.microsoft.com/~simonpj/tmp/beautiful.ps very useful. The general idea, though, is that mutable variables are encapsulated with a type constructor TVar , and can only be modified in a transaction.

I wanted to preserve a little of the pure functional flavor, minimize the use of TVars, and generally follow the philosophy of avoiding premature optimization; so the (incomplete) beginnings of the world look something like this:

data World = World { worldStartRoom :: TVar RoomId, worldNextRoomId :: TVar RoomId, worldNextPlayerId :: TVar PlayerId, worldNextItemId :: TVar ItemId, worldRooms :: TVar (M.Map RoomId Room), worldPlayers :: TVar (M.Map PlayerId Player), worldItems :: TVar (M.Map ItemId Item), worldPlayerNames :: TVar (M.Map String Player) } deriving Eq

Now, occasionally, it is also necessary to occasionally save the state of the world out to disk. That data is being constantly accessed and updated by bazillions of threads, and I need an internally consistent state. Great, because STM solves exactly that problem! This transaction will read the whole world, so it will certainly have its costs — but at least it’ll be correct, right?

Here’s where it gets ugly: STM transactions may not perform I/O operations. In other words, the code will need to first do all its STM stuff, save it off in memory somewhere, and only then begin writing to a file. After defining a bunch of data structures containing TVars (okay, only one for now, but that might change), it now appears that I would need to define an identical set of data structures that provide access to a copy of the world from outside of an STM transaction. In other words, it now looks like I need a SWorld , as follows:

data SWorld = SWorld { sworldStartRoom :: RoomId, sworldNextRoomId :: RoomId, sworldNextPlayerId :: PlayerId, sworldNextItemId :: ItemId, sworldRooms :: (M.Map RoomId Room), sworldPlayers :: (M.Map PlayerId Player), sworldItems :: (M.Map ItemId Item), sworldPlayerNames :: (M.Map String Player) } deriving Eq

I immediately don’t like this, because it looks identical to the definition of the earlier World, and in fact it is a requirement of the application that the two remain identical. That’s a lot of manual maintenance. Fortunately, we get type polymorphism to the rescue!

data World a = World { worldStartRoom' :: a RoomId, worldNextRoomId' :: a RoomId, worldNextPlayerId' :: a PlayerId, worldNextItemId' :: a ItemId, worldRooms' :: a (M.Map RoomId Room), worldPlayers' :: a (M.Map PlayerId Player), worldItems' :: a (M.Map ItemId Item), worldPlayerNames' :: a (M.Map String Player) } deriving Eq

So now World is a parametric type, with a type parameter a . But a is not just any old type parameter! It has a kind of (* -> *) . The earlier World type is now just World TVar . The snapshot type is a little uglier, but not much. It would be nice to have a simple identity type constructor, but such a thing doesn’t exist. (In the course of looking for it, I did come across a discussion about adding it into a forthcoming release of Haskell on the haskell-prime mailing list; but there are substantial arguments against it.) So in lieu of a true identity type constructor, I can just use the Identity monad. The snapshot type is now World Identity . Instead of having to make sure they both have the same fields, I know they do because they are the same type!

(One interesting change is that if I want to derive the Eq type class for World , I have to enable undecidable instances in GHC. I don’t really understand why, but it doesn’t confuse me; I’m doing some fairly bizarre stuff here.)

Now it is convenient to have a way to put things in and out of a TVar or Identity without worrying too much about which kind of World I’m using. A type class does the trick! Because dealing with TVar values must be done in STM, the operations are defined in the STM monad.

class Wrapper a where extract :: a b -> STM b inject :: b -> STM (a b) instance Wrapper TVar where extract = readTVar inject = newTVar instance Wrapper Identity where extract = return . runIdentity inject = return . return

Admittedly, that last instance for Identity looks odd. In inject , for instance, I want to take a value, wrap that in the Identity monad, and then wrap that in an STM block (not because it’s necessary, but because it’s part of the type signature for the class.) The extract function uses runIdentity to get the value out of the Identity monad, and then return s is in the STM monad to satisfy the type class. A little more boilerplate…

mkWorld a b c d e f g h = do a' <- inject a b' <- inject b c' <- inject c d' <- inject d e' <- inject e f' <- inject f g' <- inject g h' <- inject h return (World a' b' c' d' e' f' g' h') convertWorld (World a b c d e f g h) = do a' <- extract a b' <- extract b c' <- extract c d' <- extract d e' <- extract e f' <- extract f g' <- extract g h' <- extract h mkWorld a' b' c' d' e' f' g' h' worldStartRoom = extract . worldStartRoom' worldNextRoomId = extract . worldNextRoomId' worldNextPlayerId = extract . worldNextPlayerId' worldNextItemId = extract . worldNextItemId' worldRooms = extract . worldRooms' worldPlayers = extract . worldPlayers' worldItems = extract . worldItems' worldPlayerNames = extract . worldPlayerNames'

These functions handle the grunt work of dealing with various kinds of worlds. The first one defines a constructor that takes simple values, and automatically wraps them up in TVar s or the Identity monad. The second gets these things out of one kind of world and into another. The third through tenth extract the values of fields automatically, so that code isn’t littered with readTVar and runIdentity statements all willy nilly. (It may look like this overhead is worse than the cure. That would be wrong, because that last block of code would have to have been written even if I’d used a completely separate type for the snapshot. The only real added overhead is the class and instances for Wrapper, and that’s independent of type! That means if I change Player to have some TVar fields later on, there’s no extra code to write.)

So this accomplished my goal. And sure enough, it basically does what I want. In the end, though, the boilerplate was too much for me. I ended up turning to Template Haskell to handle it for me. The code now looks like this:

data World a = World { worldStartRoom' :: a RoomId, worldNextRoomId' :: a RoomId, worldNextPlayerId' :: a PlayerId, worldNextItemId' :: a ItemId, worldRooms' :: a (M.Map RoomId Room), worldPlayers' :: a (M.Map PlayerId Player), worldItems' :: a (M.Map ItemId Item), worldPlayerNames' :: a (M.Map String Player) } deriving Eq $(snapshottingType ''World)

And that looks to me like an elegant, maintainable piece of code.