Programming with distributed functional futures

The dataflow model of computation is a programming language abstraction exploited for pipelined parallelism over infinite streams, and task parallel execution over data-independent tasks. There are numerous Haskell libraries for dataflow-style programming, some that focus on stream processing and others that focus on task parallel scheduling on (shared memory) multicore CPUs and on (distributed memory) clusters and HPCs. Here are a few examples:

| Memory | Finite & single-assignment dataflow values | Streams |
|---|---|---|
| Shared | monad-par, lvish | conduit, pipes, iteratee |
| Distributed | CloudHaskell-Platform, HdpH | |

Dataflow task parallelism with futures

This post concentrates on dataflow task parallelism with Haskell and Erlang. The Haskell code below uses the monad-par library to construct a dataflow task graph, which is executed in parallel with runPar :

    import Control.Monad.Par

    f :: Par Int
    g, h :: Int -> Par Int
    j, k :: Int -> Int -> Par Int

    main = print $ runPar $ do
      a <- spawn f
      b <- spawn (get a >>= g)
      c <- spawn (get a >>= h)
      d <- spawn (get b >>= \x -> get c >>= \y -> j x y)
      e <- spawn (get c >>= \x -> get d >>= \y -> k x y)
      get e

This simple parallel programming abstraction can be thought of as synchronous dataflow using rendezvous futures (R. Halstead, 1985). A future is a placeholder for a value, which is filled with the real value once that value becomes known, by evaluating its corresponding spawned task.

Now let's compare some functional libraries with futures, using three properties to measure them. The contenders are monad-par (Marlow et al., 2011), Erlang Remote Procedure Call (RPC) (Cesarini, 2009), the CloudHaskell Platform (CH-P) (on GitHub), HdpH (Maier, 2014) and HdpH-RS (Stewart, 2015). The properties: does the runtime system...

- balance load between processors?
- support distributed-memory deployment?
- recover from failure (is the scheduler resilient)?

| Library | Load balancing | Fault tolerant | Distributed memory |
|---|---|---|---|
| monad-par | ✔ | n/a | ✗ |
| Erlang RPC | ✗ | (✔) | ✔ |
| CH-P | ✗ | (✔) | ✔ |
| HdpH | ✔ | ✗ | ✔ |
| HdpH-RS | ✔ | ✔ | ✔ |

The monad-par library is designed for shared-memory execution with GHC on one node, with load balancing between processor cores. Erlang's RPC library supports distribution over multiple nodes, but the Erlang VM does not support automatic load balancing. Fault tolerance only works if spawned processes are explicitly linked or monitored, hence the (✔), leaving programmatic recovery to the user. CH-P is unsurprisingly much the same as Erlang RPC, though it requires additional Template Haskell code for explicit closure creation. HdpH and HdpH-RS are designed for distributed memory, and load balancing is done with hierarchical work stealing: between sparkpools across nodes, and between threadpools across cores on each node.

Creating futures

The following table shows the primitives for creating futures.

| Placement | monad-par | HdpH(-RS) | Erlang | CH-P |
|---|---|---|---|---|
| Local only | spawn | spawn | async_call/4 | async |
| Lazily distributed | | spawn | | |
| Eagerly distributed | | spawnAt | async_call/4 | asyncSTM |

The monad-par and HdpH(-RS) spawn functions return an empty IStructure, called an IVar, which represents an empty future to be filled once execution of the spawned task is complete. In addition, spawnAt in HdpH(-RS) supports explicit node placement. The Erlang RPC function rpc:async_call/4 eagerly places a future task, returning a key. The functions rpc:yield/1 , rpc:nb_yield/1 and rpc:nb_yield/2 are used to read values via these keys.
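To make the IVar-as-future idea concrete, here is a minimal monad-par sketch (the `factorialFuture` name and the toy computation are illustrative, not from the post; only `spawn`, `get` and `runPar` are assumed from the library):

```haskell
import Control.Monad.Par

-- spawn returns the empty IVar immediately; the scheduler
-- fills it once the spawned task has completed.
factorialFuture :: Par Int
factorialFuture = do
  ivar <- spawn (return (product [1..5]))  -- IVar Int: an empty future
  get ivar                                 -- read the filled value

main :: IO ()
main = print (runPar factorialFuture)  -- prints 120
```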

Synchronising with futures

Once futures have been created asynchronously, there needs to be some way of making a synchronous, blocking request for the value of a future once it has been filled.

| Operation | monad-par | HdpH(-RS) | Erlang | CH-P |
|---|---|---|---|---|
| Blocking read | get | get | yield/1 | wait |
| Non-blocking read | | tryGet | nb_yield/1 | poll |
| Timeout read | | | nb_yield/2 | waitTimeout |
| Check fullness | | probe | | check |

In monad-par and HdpH(-RS), a blocking wait on a future is get . In Erlang RPC it is rpc:yield/1 , and in CH-P it is wait . The non-blocking version in HdpH and HdpH-RS is tryGet , in Erlang RPC it is rpc:nb_yield/1 , and in CH-P it is poll .
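In monad-par terms, the blocking read looks like this (a small sketch with illustrative toy tasks; each get blocks until the scheduler has filled the corresponding IVar):

```haskell
import Control.Monad.Par

result :: Int
result = runPar $ do
  a <- spawn (return (sum [1..10]))       -- future 1
  b <- spawn (return (length "futures"))  -- future 2
  x <- get a  -- blocking read: waits until a is filled
  y <- get b  -- blocking read: waits until b is filled
  return (x + y)

main :: IO ()
main = print result  -- prints 62
```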

Dataflow by example: Fibonacci

This is Fibonacci with the RPC library in Erlang using rpc:async_call .

    -module(fib_rpc).
    -export([fib/1, random_node/0]).

    %% Compute Fibonacci with futures
    fib(0) -> 0;
    fib(1) -> 1;
    fib(X) ->
        Key = rpc:async_call(random_node(), fib_rpc, fib, [X-1]),
        Y = fib(X-2),
        Z = rpc:yield(Key),
        Y + Z.

    %% Select random node (maybe our own)
    random_node() ->
        I = random:uniform(length(nodes()) + 1),
        Nodes = nodes() ++ [node()],
        lists:nth(I, Nodes).

This is Fibonacci with the monad-par library in Haskell using spawn .
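A sketch of how that program might look, assuming only the standard monad-par API (spawn, get, runPar); it mirrors the Erlang version above, spawning one recursive call as a future and computing the other locally:

```haskell
import Control.Monad.Par

-- Parallel Fibonacci: spawn fib (n-1) as a future, compute
-- fib (n-2) locally, then block on the future with get.
fib :: Int -> Par Int
fib n
  | n < 2     = return n
  | otherwise = do
      future <- spawn (fib (n - 1))  -- runs in parallel
      y      <- fib (n - 2)          -- computed locally
      x      <- get future           -- blocking read
      return (x + y)

main :: IO ()
main = print (runPar (fib 20))  -- prints 6765
```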