



Distributed Systems in Haskell

Assorted Tips and Tricks

I recently completed UT Austin's Distributed Computing class, taught by Lorenzo Alvisi. My project partner Pato and I decided to give it a go in Haskell, and ended up using Haskell for all project assignments (Chandy-Lamport, Paxos, and Bayou).

This turned out to be an excellent idea. We put in a fraction of the time most implementations (in Java, Python, or C++) required.

This article represents a summary of what I learned over the course of the class, as well as an example program applying these principles. Some of this is Haskell-specific, and some is more general.

TL;DR:

- Only ever block if there are no messages whatsoever waiting for your server.
- Don't use interrupt-based timeouts.
- Separate your server logic from any networking.
- Try to have pure server logic.
- Use Monads to simplify your code as it gets bigger.
- Use Cloud Haskell, Lenses, and other nice libraries to simplify your life and your code.

Haskell-nonspecific Advice

1: Do Not Block

Every time we used multiple blocking reads from the network, it came back to haunt us. For example, we would send a Ping and wait for a Pong before continuing. This leads to all sorts of bad behavior. What happens if both servers Ping each other at the same time? Deadlock. Even blocking reads that seemed innocuous at first usually led to confusion and race conditions later on.

Instead, use an asynchronous architecture. Your program should block in exactly one place. Each node in the system should have a "superloop" that performs blocking reads from the network using an efficient epoll-like mechanism and then dispatches messages appropriately.
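As a sketch, here's what that single blocking point might look like. The Msg type and the dispatch function are made up for illustration; a real server would dispatch on real message types:

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever)

-- A hypothetical message type, just for illustration.
data Msg = Ping | Pong deriving (Show, Eq)

-- Handle exactly one message. readChan is the only blocking call.
serveOne :: Chan Msg -> (Msg -> IO ()) -> IO ()
serveOne inbox dispatch = do
  m <- readChan inbox  -- the single blocking point
  dispatch m           -- handlers must return without blocking

-- The "superloop" is just serveOne, forever.
superloop :: Chan Msg -> (Msg -> IO ()) -> IO ()
superloop inbox dispatch = forever (serveOne inbox dispatch)
```

The key property: as long as every handler returns promptly, the server is back at its one blocking read the moment there's nothing left to do.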

Why?

It may seem like this architecture introduces unnecessary logical complexity compared to a bit of blocking sprinkled throughout the code, but in every instance we came across, blocking in exactly one place turned out to be much easier in practice. We eliminated all the race conditions we came across and maximized performance by eliminating unnecessary delays. A server that only blocks in one place is guaranteed to process any waiting message the moment it has free CPU cycles.

2: Use Asynchronous Timing

Some algorithms (especially probabilistic ones) rely on things like timeouts.

In our experience, implementing timeouts and other time-based behavior as blocking reads with timeouts is a recipe for confusion.

Instead, we found that the best approach was to spawn, for each node, a separate "tick generator" thread. This "tick generator" simply sends its parent thread an empty Tick message at a given frequency.
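A tick generator is only a few lines. In this sketch (the Tick type and channel-based inbox are illustrative), the generator drops a Tick into the parent's inbox at a fixed interval, where it's picked up by the same single blocking read as every other message:

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever)

data Tick = Tick deriving (Show, Eq)

-- Spawn a "tick generator": every intervalMicros microseconds it sends
-- the parent an empty Tick message via the parent's inbox.
spawnTicker :: Int -> Chan Tick -> IO ThreadId
spawnTicker intervalMicros inbox = forkIO . forever $ do
  threadDelay intervalMicros
  writeChan inbox Tick
```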

Why?

There are several advantages to this approach.

- All timeout logic is handled with standard run-of-the-mill server logic. There are no interrupts, timers, or exceptions to deal with. You just process Tick messages like any other message.
- If you need to keep track of separate timeouts for separate servers, this approach vastly simplifies doing so. You just keep a map from servers to the number of Ticks since you last heard from them. If this number gets too high, it's a timeout. Compare this to any solution that involves reads with timeouts or checking system time; in our experience, this was much simpler.
- This architecture unifies two different timeout scenarios. One scenario is a complete loss of incoming messages, which normally has to be dealt with by using a blocking read with a timeout. The other scenario is when the server still receives messages on a regular basis, but doesn't receive messages from a particular server for a long time. Blocking reads with timeouts won't cover the latter failure case. Ticks cleanly handle both cases.
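The map-of-Ticks bookkeeping can be entirely pure. In this sketch (ServerId and the function names are made up for illustration), any message from a peer resets its counter, and each Tick bumps every counter and reports which peers have gone quiet too long:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

type ServerId = String  -- illustrative; any Ord key works

-- For each peer: how many Ticks since we last heard from it.
type Quiet = Map ServerId Int

-- Reset a peer's counter whenever any message arrives from it.
heardFrom :: ServerId -> Quiet -> Quiet
heardFrom peer = Map.insert peer 0

-- On every Tick, bump all counters and report peers over the limit.
onTick :: Int -> Quiet -> (Quiet, [ServerId])
onTick limit quiet = (bumped, Map.keys (Map.filter (> limit) bumped))
  where bumped = Map.map (+ 1) quiet
```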

3: Separate Networking and Logic

This is a somewhat specific architectural recommendation; no doubt there are algorithms where this advice does not apply. However, in all our use cases, this approach worked very well (and we tried quite a few approaches).

Distributed systems papers are often as poorly written as they are clever. The included code rarely works properly, if at all. One bad habit that these papers tend to have is the thorough mixing of network operations and algorithm logic. It's pretty common to see things along the lines of

send(server, msg);
x = receive_msg();
y = process(x);
send(server, y);

but with a lot more junk thrown in.

It turns out that this is not conducive to clean, understandable code. There's a lot of implicit state being offloaded to the network when you structure things like this, and it makes it a lot harder to recover from things like network interruptions or servers going offline. You end up using timeouts and all sorts of ugly constructs to make things work in practice.

Instead, you should completely separate your server logic and your network functionality. Again, this might sound like a lot of work, but it's almost guaranteed to save you more time in the long run.

In Haskell terms, your server logic will (ideally) have a type like this:

serverStep :: Config -> State -> Message -> (State, [Message])

In prose, the server logic takes three arguments:

The server's configuration, which does not change (Hostname, directory, etc.)

The server's previous state

A message received from the network

The server logic then returns

The new server state

A list of messages to send

Then you just have to write a simple wrapper around this function that receives messages from the network, feeds them into the function, and sends the responses out to the network.
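Such a wrapper might look like this sketch, where recv and send stand in for whatever transport you use (for a local test they can simply be Chan reads and writes); all names here are illustrative:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- A minimal wrapper around a pure step function.
runLoop :: (config -> state -> msg -> (state, [msg]))  -- pure server logic
        -> config
        -> state
        -> IO msg          -- blocking receive: the only blocking call
        -> (msg -> IO ())  -- send one message
        -> IO ()
runLoop step config state recv send = do
  m <- recv
  let (state', outgoing) = step config state m
  mapM_ send outgoing
  runLoop step config state' recv send
```

Note that this wrapper automatically has the shape suggestion #1 asks for: one blocking receive, then pure work, then non-blocking sends.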

With a bit of work, any program requiring a sequence of sends and receives can be transformed into this form (a single receive followed by arbitrarily many sends), so even the most stubbornly ugly distributed systems paper can be adapted to this form.

Why?

- This form guarantees that you meet suggestion #1 and get all the advantages of doing so. In particular, your server will never block unless there is nothing in the incoming message queue. Therefore, your server will process any incoming messages the instant it has free CPU cycles.
- Network code is simpler. There's just one place you send and receive messages, and it's very straightforward to implement.
- Testing is much easier. When your server logic is a pure function as described above, server behavior is entirely deterministic and much more amenable to testing. It's easy to build a test harness that "simulates" the network: all you have to do is keep a list of all your servers' states and a queue of messages waiting to be delivered.

A test harness looks like this:

while queue is not empty:
    pop msg off queue
    (new_state, new_msgs) = serverStep configs[msg.dest] states[msg.dest] msg
    states[msg.dest] = new_state
    put new_msgs into queue

If you want to test things like out-of-order message delivery, you just mix up your queue instead of putting messages in in order. You have complete control!
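In Haskell, that harness pseudocode might look like this sketch (the Int server IDs and all names are illustrative; dest extracts a message's recipient):

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- A pure, single-threaded network simulator for serverStep-style logic.
simulate :: (config -> state -> msg -> (state, [msg]))
         -> (msg -> Int)       -- recipient of a message
         -> Map Int config     -- per-server configs
         -> Map Int state      -- per-server states
         -> [msg]              -- queue of undelivered messages
         -> Map Int state      -- states once the queue drains
simulate _    _    _       states []          = states
simulate step dest configs states (m : queue) =
  simulate step dest configs states' (queue ++ newMsgs)
  where i              = dest m
        (st', newMsgs) = step (configs Map.! i) (states Map.! i) m
        states'        = Map.insert i st' states
```

To simulate out-of-order delivery, shuffle the queue instead of appending; to check invariants, inspect the state map between steps.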

When we implemented Paxos (with some pedagogical shortcuts), this approach saved us a lot of trouble. We would run the simulation and check the Paxos invariants at each step. We found a lot of bugs this way! This would be very difficult on a real network.

Haskell-specific Advice

1. Monad It

I suggested earlier that we use this type for server logic.

serverStep :: Config -> State -> Message -> (State, [Message])

If you want to compose multiple serverStep-style functions, you have to feed the Config to both, feed the State output of one function into the next, and concatenate their [Message] outputs. This is super boring and easy to mess up by typing something wrong. Well, check this out...

The Config -> ... behavior is described by the MonadReader typeclass.

The ... State -> ... -> (State, ...) behavior is described by the MonadState typeclass.

The ... -> (..., [Message]) behavior is described by the MonadWriter typeclass.

Basically, for those of you who haven't used Monad transformers a lot, the act of chaining together functions like serverStep is super predictable and boring and syntactically noisy. Because Haskell is really flexible, we can write stuff that does this chaining automatically so we don't have to. In this case, the chaining behavior we want is described by MonadReader (all functions get the same config), MonadWriter (concatenate all output message lists), and MonadState (pass the state output of one function into the state argument of the next function).

We can easily transform between these two types:

serverStep :: Config -> State -> Message -> (State, [Message])

type ServerMonad m = (MonadReader Config m, MonadWriter [Message] m, MonadState State m)

serverStep :: ServerMonad m => Message -> m ()

If you can't have a pure server function (e.g. because you need database access), you can also translate between e.g.

serverStep :: Config -> State -> Message -> IO (State, [Message])

type ServerMonad m = (MonadIO m, MonadReader Config m, MonadWriter [Message] m, MonadState State m)

serverStep :: ServerMonad m => Message -> m ()

You get all of these MonadWhatever instances for free from RWS Config [Message] State (pure) and RWST Config [Message] State IO (impure), so we don't actually have to do any work here. RWS(T) is short for "Reader/Writer/State (Transformer)".
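As a tiny self-contained illustration, with toy Config and Counter types standing in for real server types, here's a step written against RWS and how execRWS recovers the plain-function shape:

```haskell
import Control.Monad.RWS

-- Toy types, purely for illustration.
type Config  = Int
type Counter = Int

-- The Monad-shaped step: read the config, update state, emit output.
stepM :: String -> RWS Config [String] Counter ()
stepM m = do
  cfg <- ask
  modify (+ cfg)
  tell ["handled " ++ m]

-- Recover the plain-function shape with execRWS.
step :: Config -> Counter -> String -> (Counter, [String])
step cfg st m = execRWS (stepM m) cfg st
```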

Here's a quick bidirectional translation showing that these two type signatures (the function and the Monad) are semantically equivalent:

logic :: Config -> State -> Message -> IO (State, [Message])
logic = \cfg state msg -> execRWST (logicM msg) cfg state

logicM :: Message -> RWST Config [Message] State IO ()
logicM = \msg -> RWST (\cfg state -> logic cfg state msg >>= (\(s, w) -> return ((), s, w)))

In other words, one can always be implemented in terms of the other.

Why?

This one is harder to explain if you're less familiar with Haskell. The question here is why

serverStep :: Config -> State -> Message -> (State, [Message])

is worse than

type ServerMonad m = (MonadReader Config m, MonadState State m, MonadWriter [Message] m)

serverStep :: ServerMonad m => Message -> m ()

in many cases.

First, let's do a quick comparison. Both of these do the same thing.

sendMsg1 :: Config -> State -> Message -> (State, [Message])
sendMsg1 cfg state msg = (state, [ForwardToAlice msg])

sendMsg2 :: Config -> State -> Message -> (State, [Message])
sendMsg2 cfg state msg = (state, [ForwardToBob msg])

sendBoth :: Config -> State -> Message -> (State, [Message])
sendBoth cfg state msg = (state'', output1 ++ output2)
  where (state',  output1) = sendMsg1 cfg state  msg
        (state'', output2) = sendMsg2 cfg state' msg

sendMsg1 :: ServerMonad m => Message -> m ()
sendMsg1 msg = tell [ForwardToAlice msg]

sendMsg2 :: ServerMonad m => Message -> m ()
sendMsg2 msg = tell [ForwardToBob msg]

sendBoth :: ServerMonad m => Message -> m ()
sendBoth msg = do
  sendMsg1 msg
  sendMsg2 msg

As you can see,

- After the initial syntactic cost of defining ServerMonad and making an implementation that satisfies it (the easiest is using RWS(T), which already does the work for us), our type signatures are much shorter and there's less syntactic noise from composing server actions.
- Besides requiring less code, composing server actions is also prone to fewer errors. Imagine how easy it would be to accidentally write output1 ++ output1 or state' instead of state''. sendBoth is much clearer in the Monad version.

As with many things Haskell, there's a somewhat higher upfront cost, but it makes your life a lot easier as your programs scale up.

Side note: If you need high performance, you will want to use a slightly different approach. The W part of RWS(T) is relatively slow. See this thread. Basically just use State(T) instead of RWS(T) for the best performance.
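One way to do that (the names here are illustrative, not from any library) is to keep the outgoing messages in the state itself: prepend each message in O(1) and reverse the list once at the end, instead of paying Writer's mappend costs:

```haskell
import Control.Monad.State (State, modify, runState)

-- The server state wrapped together with an outbox of pending messages.
data WithOutbox msg st = WithOutbox { outbox :: [msg], innerState :: st }

-- The State-based replacement for MonadWriter's tell.
emit :: msg -> State (WithOutbox msg st) ()
emit m = modify (\w -> w { outbox = m : outbox w })

-- Run an action, returning its result, final state, and messages in
-- the order they were emitted.
runWithOutbox :: State (WithOutbox msg st) a -> st -> (a, st, [msg])
runWithOutbox act st0 = (a, innerState w, reverse (outbox w))
  where (a, w) = runState act (WithOutbox [] st0)
```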

2. Use Cloud Haskell

Cloud Haskell is a library designed to make distributed systems development in Haskell obscenely easy. It takes advantage of Haskell's tremendous multithreading support and very powerful type system to allow for safe, fast, and simple message passing and control over a network. It does all the hard work of building a messaging layer, and it does it well. It basically took all of Erlang's features and ported them to Haskell.

Even type-safe serializing and deserializing is pretty much automatic with Haskell + Cloud Haskell these days. You tell the compiler what your intentions are ("I want to send values of this type over the network.") and it does all the work for you. I've written a demo app that uses Cloud Haskell at the bottom of this post.

3. Use Lenses

Many distributed systems algorithms are described in a very imperative way. It turns out that it's very easy to write imperative-style programs in Haskell in a very disciplined way.

Lenses are a Haskell concept that are reasonably well described as setters and getters on steroids. They are objects that describe how to pull information out of and put information into data structures. For example, I could make a lens that describes how to get the second value out of a tuple and how to put something else in its place. (Turns out that this exists and is called _2 .)
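To show the idea without pulling in the lens package (whose actual encoding is cleverer), here's a hand-rolled getter/setter pair and a stand-in for _2; all names here are mine, not the library's:

```haskell
-- A lens really is just a getter paired with a setter.
data Lens s a = Lens { view :: s -> a, set :: a -> s -> s }

-- The analogue of the library's _2: focus on the second tuple slot.
_2' :: Lens (x, a) a
_2' = Lens snd (\a (x, _) -> (x, a))

-- Lenses compose, which is what makes them interesting.
(|>) :: Lens s a -> Lens a b -> Lens s b
outer |> inner =
  Lens (view inner . view outer)
       (\b s -> set outer (set inner b (view outer s)) s)

-- Modify through a lens.
over :: Lens s a -> (a -> a) -> s -> s
over l f s = set l (f (view l s)) s
```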

Because Lenses are so well structured, one can programmatically compose and manipulate lenses in very interesting ways.

One of the interesting things the lens package exports is a series of operators that interface with MonadState and allow us to write code that looks just like regular imperative code. For example, if our State had a field called counter that held an Int , we could write

incrementAndDouble :: ServerMonad m => m ()
incrementAndDouble = do
  counter += 1
  counter *= 2

Example

Let's write a simple distributed application. We'll write some servers that send each other Bings every once in a while, and if they get a Bing from someone, they send back a Bong. Each server counts the number of Bings and Bongs it's received thus far.

The complete code is here.

I'll skip over all the imports and stuff, but you can check them out at the link above.

First, let's write out the types.

BingBong is the type of a Bing or a Bong. We'll tell GHC to make a Show instance for BingBong so we can turn it into a string for printing. We'll tell GHC to make BingBong Typeable so Cloud Haskell can safely send it over the network. We'll also tell GHC to make BingBong Generic so it can automatically write code to serialize BingBongs.

data BingBong = Bing | Bong deriving (Show, Generic, Typeable)

A Message is what gets sent from server to server. It contains the sender, receiver, and message content (a BingBong ).

data Message = Message
  { senderOf    :: ProcessId
  , recipientOf :: ProcessId
  , msg         :: BingBong }
  deriving (Show, Generic, Typeable)

A Tick is what each server's tick generator sends it.

data Tick = Tick deriving (Show, Generic, Typeable)

We'll have GHC automatically generate serialization code for BingBong, Message, and Tick. We could do it ourselves, but there's really no reason unless we need to hand-tune performance or something.

instance Binary BingBong
instance Binary Message
instance Binary Tick

The ServerState is what it says on the tin. It has the bing and bong counts and a random number generator state.

Notice how we put underscores before the field names and then used makeLenses to generate Lenses for bingCount, bongCount, and randomGen.

Note that, for testing purposes, we can use pre-determined random generator seeds! This means that we only get "true" randomness (i.e. actual pseudorandomness) when we want it, but we're fully deterministic when we want to be (like for testing). If we'd used side-effectful randomness (like C's random() or reading from /dev/urandom), we wouldn't get that.

data ServerState = ServerState
  { _bingCount :: Int
  , _bongCount :: Int
  , _randomGen :: StdGen }
  deriving (Show)

makeLenses ''ServerState

ServerConfig is just the server's ID as well as a list of the IDs of all servers on the network.

data ServerConfig = ServerConfig
  { myId  :: ProcessId
  , peers :: [ProcessId] }
  deriving (Show)

ServerAction is a custom Monad that gives us all the behavior we want (reading a config, outputting messages, and updating state). It's really just a wrapper around RWS , so we don't really have to do anything. We just tell the compiler which features we want copied from RWS , such as its Monad behavior.

newtype ServerAction a = ServerAction { runAction :: RWS ServerConfig [Message] ServerState a }
  deriving ( Functor, Applicative, Monad, MonadState ServerState
           , MonadWriter [Message], MonadReader ServerConfig )

Now, let's write out our server logic.

tickHandler :: Tick -> ServerAction ()
tickHandler Tick = do
  ServerConfig myPid peers <- ask
  random <- randomWithin (0, length peers - 1)
  let peer = peers !! random
  sendBingBongTo peer Bing

msgHandler :: Message -> ServerAction ()
msgHandler (Message sender recipient Bing) = do
  bingCount += 1
  sendBingBongTo sender Bong
msgHandler (Message sender recipient Bong) = do
  bongCount += 1

sendBingBongTo :: ProcessId -> BingBong -> ServerAction ()
sendBingBongTo recipient bingbong = do
  ServerConfig myId _ <- ask
  tell [Message myId recipient bingbong]

randomWithin :: Random r => (r, r) -> ServerAction r
randomWithin bounds = randomGen %%= randomR bounds

tickHandler processes Ticks. It randomly chooses a peer and sends that peer a Bing.

msgHandler processes Messages. It responds to Bings with Bongs and increments the counters when appropriate.

sendBingBongTo is a helper function that creates a message annotated with the sender and receiver and then outputs the message (using tell, which lets us write to the MonadWriter output).

randomWithin, given an upper and lower bound, picks a random element in those bounds. It also updates the server's random number generator state. You probably won't recognize that %%= operator unless you use Lenses a lot, but it's one of many useful State Monad-oriented operators from the Lens library.

Let's write out the network stack (i.e. the necessarily impure part of our code).

runServer :: ServerConfig -> ServerState -> Process ()
runServer config state = do
  let run handler msg = return $ execRWS (runAction $ handler msg) config state
  (state', outputMessages) <- receiveWait
    [ match $ run msgHandler
    , match $ run tickHandler ]
  say $ "Current state: " ++ show state'
  mapM_ (\msg -> send (recipientOf msg) msg) outputMessages
  runServer config state'

This takes a server's config and initial state, then:

1. Wait for a message.
2. Depending on which type of message we receive (a Tick or a regular Message), receiveWait runs the appropriate handler. Either handler returns a new state and an output message list, so we get both of those.
3. Use Cloud Haskell's say function to send a debug message to the logger process (which, by default, just prints to stderr).
4. Send any output messages to their intended recipients.
5. Repeat the process with the new state.

Now let's write the initialization code.

spawnServer :: Process ProcessId
spawnServer = spawnLocal $ do
  myPid <- getSelfPid
  otherPids <- expect
  spawnLocal $ forever $ do
    liftIO $ threadDelay (10 ^ 6)
    send myPid Tick
  randomGen <- liftIO newStdGen
  runServer (ServerConfig myPid otherPids) (ServerState 0 0 randomGen)

spawnServers :: Int -> Process ()
spawnServers count = do
  pids <- replicateM count spawnServer
  mapM_ (`send` pids) pids

spawnServer spawns a new process which does the following:

1. Get my PID.
2. Wait for someone to send me everyone's PIDs.
3. Spawn a ticker process that sends me a Tick every second (1 million microseconds).
4. Create a random number generator seed.
5. Create the appropriate ServerConfig and initial ServerState and call runServer.

spawnServers (plural) simply spawns count servers using spawnServer, collects all their PIDs, and sends the list of PIDs to each server.

And for our main :

main :: IO ()
main = do
  Right transport <- createTransport "localhost" "0" defaultTCPParameters
  backendNode <- newLocalNode transport initRemoteTable
  runProcess backendNode (spawnServers 10)
  putStrLn "Push enter to exit"
  _ <- getLine
  return ()

First, we create a network transport endpoint. This is how Cloud Haskell actually talks over the network. (We don't use it here since everything is local, but we could trivially spread this demo app over multiple machines.)

Next, we create a local node (which manages all the processes on this machine) and attach it to the network transport.

Next, we run spawnServers 10 on our local node. If you recall, this spawns 10 communicating processes.

Finally, we wait for the user to push enter before exiting.

And we're done! Try running the code yourself.

Again, completed code is here.