October 29, 2018

Concurrent Ruby with Tasks

Yes, you read the title correctly. And yes, Ruby still has a GIL. However, Ruby has long been able to support concurrency for IO-bound calls. A thread that is waiting on IO releases the GIL, so other threads can keep running in the meantime. In practice, this means that many of our Ruby programs (like web applications) can use Ruby’s threads to serve multiple requests concurrently.
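A quick way to see this for yourself, using nothing but the standard library, is to start a few threads that each wait on a stand-in for IO. This is only a sketch: fake_io and run_concurrently are hypothetical names, and sleep stands in for a real network or database call.

```ruby
require "benchmark"

# Hypothetical stand-in for an IO-bound call (HTTP request, database query, ...).
def fake_io(i)
  sleep 0.2
  i * 2
end

# Runs fake_io for every input on its own thread and collects the results.
def run_concurrently(inputs)
  threads = inputs.map { |i| Thread.new { fake_io(i) } }
  # Thread#value joins the thread and returns the value of its block.
  threads.map(&:value)
end

results = nil
elapsed = Benchmark.realtime { results = run_concurrently([1, 2, 3]) }

p results
puts format("took %.2fs", elapsed)
```

Run one after another, the three calls would need about 0.6 seconds. Because a sleeping thread does not hold the GIL, the threaded version should finish in roughly the time of a single call.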

Working directly with Ruby’s threading primitives can be complicated. This is a problem that the concurrent-ruby library aims to solve. This library is mature and comprehensive, but it offers a staggering number of APIs for modeling concurrency in your application.

I’d like to suggest a different path. The dry-monads library exposes the Task monad, which is built on top of concurrent-ruby. In this post I’ll explore the Task monad as well as the newly released do syntax.

Tasks Introduction

First, let’s assume you have the following in your Gemfile:

    source "https://rubygems.org"

    gem "dry-monads", "~> 1.0", require: "dry/monads/all"

We can now write a simple program that uses a Task to perform an asynchronous computation. We’ll use sleep to represent any kind of long-running IO operation (think database queries, HTTP requests, etc).

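    # Load dry-monads explicitly when running the file directly with ruby;
    # the later listings assume this line as well.
    require "dry/monads/all"

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1).value!
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go # => 1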

First, we include Dry::Monads::Task::Mixin in our class. This gives us access to the Task constant, which we can use to build our Tasks. Next, we use Task[:io] and provide it a block to start a Task. This creates a Task on the :io executor. There are three default executors: :io for IO-bound tasks, :fast for CPU-bound tasks and :immediate, which runs on the current thread (usually used for testing).

As soon as our Task is initialized, it begins running in its own thread. It will not block the main thread of execution.

Finally, we call the value! method to block until our Task returns a value.

If we save this in a file named monads.rb and run it, we’ll see it takes a bit over a second to execute.

    $ time ruby monads.rb
    1
    ruby monads.rb  0.19s user 0.08s system 21% cpu 1.255 total

Note that if we remove the value! call, our code will execute without waiting on the value of the Task. Let’s try it.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go # => Task(?)

Now let’s run it.

    $ time ruby monads.rb
    Task(?)
    ruby monads.rb  0.18s user 0.08s system 102% cpu 0.254 total

Our program finished in well under a second. It didn’t wait on the Task to complete but rather returned the Task itself. See the ? in there? It means that our program doesn’t know the value of the Task yet because it hasn’t finished running.

Let’s try updating our code to sleep a bit in order to give the Task time to finish executing.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    result = Async.new.go
    sleep 2
    p result # => Task(value=1)

    $ time ruby monads.rb
    Task(value=1)
    ruby monads.rb  0.17s user 0.09s system 11% cpu 2.257 total

Now that we’ve slept longer than the Task takes to execute, you’ll notice that our Task does indeed have a value.

Idiomatic Task Usage

Using value! isn’t great. First off, it blocks the main thread of execution while waiting on a value. Second, it doesn’t handle errors. Let’s introduce an exception in our task and see what happens.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1).value!
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          raise "boom!"
        end
      end
    end

    p Async.new.go

    $ time ruby monads.rb
    Traceback (most recent call last):
    <...>
    monads.rb:15:in `block in slow_task': boom! (RuntimeError)
    ruby monads.rb  0.17s user 0.08s system 20% cpu 1.250 total

That’s not good. What should we use instead of value!? There are two primary methods: bind and fmap. Let’s talk about bind first.

The bind method takes one argument: a block. That block will be called with the value of the Task when it successfully completes. The block provided to bind is expected to return a new Task. Let’s use bind to chain together multiple Tasks.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
          .bind { |i| slow_task(i + 1) }
          .bind { |i| slow_task(i + 2) }
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go

Ok, let’s run it.

    $ time ruby monads.rb
    Task(?)
    ruby monads.rb  0.17s user 0.10s system 103% cpu 0.254 total

Whoops, what happened? It turns out bind is non-blocking. Our program exits immediately without waiting for the result of our chain of calls. Let’s add back our handy sleep to see if we can find out what’s going on here.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
          .bind { |i| slow_task(i + 1) }
          .bind { |i| slow_task(i + 2) }
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    result = Async.new.go
    sleep 5
    p result

    $ time ruby monads.rb
    Task(value=4)
    ruby monads.rb  0.18s user 0.10s system 5% cpu 5.263 total

There we go. Our first Task produced the value 1 after sleeping for a second. Our first call to bind waited for that value to be available and then produced a new Task, adding 1 to it and sleeping for another second. Finally, our last call to bind waited for this value (2) and returned another Task, adding 2 more to it. This Task eventually returns 4.

The fmap method behaves exactly like bind except that the provided block returns a raw value rather than another Task. Let’s add one to our chain.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
          .bind { |i| slow_task(i + 1) }
          .bind { |i| slow_task(i + 2) }
          .fmap { |i| i + 3 }
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    result = Async.new.go
    sleep 5
    p result

    $ time ruby monads.rb
    Task(value=7)
    ruby monads.rb  0.17s user 0.09s system 5% cpu 5.249 total

Our last call to fmap waits for the aforementioned value of 4 to be produced and then adds 3 to it, returning the raw value of 7. The fmap method knows to re-wrap that value in a Task that returns immediately.
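To make the difference between bind and fmap concrete, here is a toy model built on plain Ruby threads. MiniTask and slow are hypothetical names, and this is not how dry-monads implements Task. It only mirrors the shape of the two methods: bind expects the block to return another task and flattens it, while fmap wraps a raw value.

```ruby
# Toy model built on plain threads. MiniTask is a hypothetical name and this is
# NOT how dry-monads implements Task; it only mirrors the shape of bind and fmap.
class MiniTask
  def initialize(&work)
    @thread = Thread.new(&work) # the work starts running right away
  end

  # Blocks until the work is done and returns its value.
  def value!
    @thread.value
  end

  # The block returns another MiniTask, so we wait on that inner task too.
  # Without the inner value! we would end up with a task of a task.
  def bind(&block)
    MiniTask.new { block.call(value!).value! }
  end

  # The block returns a raw value, which the new MiniTask wraps for us.
  def fmap(&block)
    MiniTask.new { block.call(value!) }
  end
end

# Hypothetical helper: a task that sleeps briefly and then returns i.
def slow(i)
  MiniTask.new do
    sleep 0.1
    i
  end
end

chained = slow(1).bind { |i| slow(i + 1) }.fmap { |i| i + 3 }
p chained.value! # 1, then 2, then 2 + 3
```

Neither bind nor fmap blocks the caller in this sketch. Each returns a new MiniTask right away, and only the final value! waits.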

Handling Exceptions

An exception raised inside of a Task puts it into an error state and stores the exception. Both bind and fmap are “error aware” in that they will not execute their blocks on a Task in the error state. Instead, they will ignore the provided block and simply return the errored Task.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
          .bind { |i| slow_task(i + 1) }
          .bind { |i| error_task }
          .fmap { |i| i + 3 }
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end

      def error_task
        Task[:fast] do
          raise "boom"
        end
      end
    end

    result = Async.new.go
    sleep 5
    p result

    $ time ruby monads.rb
    Task(error=#<RuntimeError: boom>)
    ruby monads.rb  0.19s user 0.09s system 5% cpu 5.261 total

In the above example, the block provided to fmap never executes because our error_task method returns a Task in the error state.
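The “error aware” behavior can be modeled with plain threads as well. The sketch below is a toy, assuming a hypothetical MiniTask class that stores either a value or an exception. It is not the dry-monads implementation, but it shows why a block chained onto a failed task never runs.

```ruby
# Toy model on plain threads. MiniTask is hypothetical and is NOT the
# dry-monads implementation; it only illustrates the "error aware" behavior.
class MiniTask
  def initialize(&work)
    @thread = Thread.new do
      begin
        [:ok, work.call]
      rescue StandardError => e
        [:error, e] # store the exception instead of letting the thread die
      end
    end
  end

  # Blocks until done; returns [:ok, value] or [:error, exception].
  def outcome
    @thread.value
  end

  def fmap(&block)
    MiniTask.new do
      status, payload = outcome
      raise payload if status == :error # pass the error along, skip the block
      block.call(payload)
    end
  end
end

# Hypothetical demo: records every call made to the fmap block.
def demo
  calls = []
  failed = MiniTask.new { raise "boom" }.fmap { |i| calls << i; i + 3 }
  status, error = failed.outcome
  [status, error.message, calls]
end

p demo
```

The calls array stays empty because the fmap block is skipped once the upstream task has failed, and the original exception is what comes out the other end.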

Parallel Execution

So far we’ve run all our Tasks in serial. Each waits for a value from the previous Task and returns a new Task. If we have 4 Tasks that sleep for a second each, the computation will take 4 seconds.

Often we will have Tasks that we prefer to execute in parallel. To achieve this, we’ll use the List monad and the traverse method.

    class Async
      include Dry::Monads::List::Mixin
      include Dry::Monads::Task::Mixin

      def go
        List::Task[slow_task(1), slow_task(2)]
          .traverse
          .bind { |a, b| slow_task(a + b) }
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go.value!

We’re creating three tasks above, but we’re running the first two in parallel. We achieve this by creating a List monad using the List::Task[...] invocation, providing our list of Tasks inside of the square brackets. Next, we call traverse. The traverse method “flips” a List monad. That is, given a List of Tasks, it will return to us a Task of a List. Said differently, traverse will wait until each Task in the List successfully completes and then it will build a new Task with a List of the provided values. Our next call to bind destructures the list into two block arguments, which we add together to make a new Task.

Importantly, the call to traverse allows the Tasks to run in parallel. Even though we’re starting two Tasks that each take a second to complete, that stage of processing should only take roughly a second. Our next call to bind creates a serial Task that also takes a second, meaning our total time should be roughly two seconds. Notice that we reintroduced the value! call to block until the Task result is available. Let’s run it.

    $ time ruby monads.rb
    3
    ruby monads.rb  0.17s user 0.11s system 12% cpu 2.271 total

As we can see, our code ran in just over two seconds. Yay, parallelism!
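The “flip” from a list of tasks to a task of a list can be sketched with plain threads. The slow and traverse helpers below are hypothetical stand-ins, not the dry-monads implementation.

```ruby
# Hypothetical helpers on plain threads; not the dry-monads implementation.
def slow(i)
  Thread.new do
    sleep 0.2
    i
  end
end

# "Flips" a list of running threads into one thread that produces a list.
def traverse(threads)
  Thread.new { threads.map(&:value) }
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Both threads are already running before traverse starts waiting on them.
a, b = traverse([slow(1), slow(2)]).value
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p a + b
puts format("took %.2fs", elapsed)
```

In this toy version the overlap comes from both threads having been started before anything waits on them. The traverse helper only gathers their results into a single list.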

The Correct Way to Block

At some point our concurrent code is going to want to return a value back to the rest of our program. We don’t want to be stuck in async land forever. In order to return a value, we’ll have to somehow block until our Task has completed. So far we’ve only seen two ways to do this: call value! or sleep. Both of these are bad.

Fortunately, because Tasks are monads, we can convert them to other well-known monadic types. Specifically, we can convert a Task into a Result by calling to_result. This will also block.

    class Async
      include Dry::Monads::Task::Mixin

      def go
        slow_task(1)
          .bind { |i| slow_task(i + 1) }
          .bind { |i| slow_task(i + 2) }
          .fmap { |i| i + 3 }
          .to_result
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go

    $ time ruby monads.rb
    Success(7)
    ruby monads.rb  0.20s user 0.16s system 10% cpu 3.342 total

Discussing Result is beyond the scope of this post, but the dry-monads documentation covers it in detail if you’re interested.
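The general idea of a blocking conversion is easy to sketch with plain threads, though: wait for the work to finish, then report the outcome as a value rather than as a raised exception. The Success and Failure structs and the to_result helper below are hypothetical stand-ins, not the dry-monads classes.

```ruby
# Hypothetical stand-ins for the two Result cases; not the dry-monads classes.
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Blocks until the thread finishes, then reports the outcome as a value
# instead of letting the exception propagate to the caller.
def to_result(thread)
  Success.new(thread.value)
rescue StandardError => e
  Failure.new(e)
end

ok = Thread.new { 7 }
bad = Thread.new do
  Thread.current.report_on_exception = false # keep stderr quiet for the demo
  raise "boom"
end

p to_result(ok)
p to_result(bad).error.message
```

The caller gets one return type for both outcomes, which is what makes this a safer way to block than calling value! and hoping nothing raised.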

Using the Do Syntax

Last but not least, we will discuss the so-called “do syntax”. Many languages that rely heavily on monads recognize that it is not always intuitive or convenient to create chains of weird looking function calls to access the values inside of the monad. Calling bind and fmap constantly can be confusing, especially if nested calls are required. The do syntax helps solve this problem.

In dry-monads, the do syntax is accomplished using the yield keyword. Let’s take a look at some example code.

    class Async
      include Dry::Monads::Task::Mixin
      include Dry::Monads::Do.for(:go)

      def go
        a = yield slow_task(1)
        b = yield slow_task(a + 1)
        c = yield slow_task(b + 2)

        Task[:immediate] do
          c + 3
        end
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go

First, you’ll notice the new line include Dry::Monads::Do.for(:go). This tells dry-monads to add the do syntax behavior to our go method. Next, within the body of go, any time we used to call bind we’re now calling our Task-returning function as normal and then passing it as an argument to yield. The do syntax has added a block argument to our go method, which we invoke by calling yield. This block checks the value of the Task to determine if it is successful. If so, it returns the value unwrapped. If it is not successful, it raises an error to stop the execution of our method. This error is caught by another wrapping helper and returned as the result of the method. Importantly, yield also blocks. Let’s see what happens when we run this code.

    $ time ruby monads.rb
    Task(value=7)
    ruby monads.rb  0.18s user 0.08s system 8% cpu 3.259 total

Note that any method using the do syntax should always return a monadic type because any time a call to yield returns an unsuccessful result, that value will be directly returned. We want to make sure our method always returns the same type in both success and error cases.
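The mechanism described above (a wrapper that hands the method a block, unwraps successful values, and halts on the first failure) can be modeled in a few lines. This is a toy, not the dry-monads code: MiniDo, Halt and Pipeline are hypothetical names, plain threads stand in for Tasks, and the wrapper is attached with prepend for simplicity.

```ruby
# Toy model of the mechanism. MiniDo is hypothetical and NOT the dry-monads
# code; plain threads stand in for Tasks.
module MiniDo
  # Carries the failed task out of the method body.
  class Halt < StandardError
    attr_reader :task

    def initialize(task)
      super("halted")
      @task = task
    end
  end

  # Builds a module that wraps the named methods and hands them a block.
  def self.for(*names)
    Module.new do
      names.each do |name|
        define_method(name) do |*args|
          begin
            super(*args) do |task|
              begin
                task.value # blocks until the thread finishes
              rescue StandardError
                raise Halt.new(task) # stop at the first failure
              end
            end
          rescue Halt => e
            e.task # the failed task becomes the method's return value
          end
        end
      end
    end
  end
end

class Pipeline
  prepend MiniDo.for(:go, :broken)

  def go
    a = yield slow(1)
    b = yield slow(a + 1)
    slow(b + 3)
  end

  def broken
    a = yield slow(1)
    b = yield boom
    slow(a + b) # never reached
  end

  def slow(i)
    Thread.new { sleep(0.1); i }
  end

  def boom
    Thread.new do
      Thread.current.report_on_exception = false # keep stderr quiet
      raise "boom"
    end
  end
end

p Pipeline.new.go.value
```

Calling Pipeline.new.broken returns the failed thread itself rather than raising, which mirrors the point above: both the success path and the early exit hand back the same kind of object.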

Let’s see how this works in an error case.

    class Async
      include Dry::Monads::Task::Mixin
      include Dry::Monads::Do.for(:go)

      def go
        a = yield slow_task(1)
        b = yield error_task
        c = yield slow_task(b + 2)

        Task[:immediate] do
          c + 3
        end
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end

      def error_task
        Task[:fast] do
          raise "boom"
        end
      end
    end

    p Async.new.go

    $ time ruby monads.rb
    Task(error=#<RuntimeError: boom>)
    ruby monads.rb  0.18s user 0.13s system 23% cpu 1.292 total

See how the code ran in just over a second? As soon as our invocation of error_task returned an unsuccessful Task, the execution of the method stopped and returned the error Task immediately.

The do syntax works with parallel execution too.

    class Async
      include Dry::Monads::List::Mixin
      include Dry::Monads::Task::Mixin
      include Dry::Monads::Do.for(:go)

      def go
        a, b = yield List::Task[slow_task(1), slow_task(2)].traverse

        Task[:immediate] do
          a + b
        end
      end

      def slow_task(i)
        Task[:io] do
          sleep 1
          i
        end
      end
    end

    p Async.new.go

    $ time ruby monads.rb
    Task(value=3)
    ruby monads.rb  0.17s user 0.11s system 22% cpu 1.269 total

Wrap Up