In the previous parts we’ve seen:

In this last part we’ll look at the final feature that makes actors stand out: error handling, supervision and the actor hierarchy.

What is this all about? Things fail all the time. Whenever you communicate with an external system, there might be a network error; the service might have a bug; requests can be malformed; servers might be down; etc.

Error handling is often a significant part of our application. Because its surface area is so large, it has a tendency to creep into each corner of our code and make it harder to read and understand. That’s why there are numerous efforts to contain the situation and separate the error handling code from the business logic.

If successful, we’ll get clear, readable business logic, but also clear and readable error handling logic. Another part of the challenge is to get a degree of certainty that our error handling actually works!

Akka borrows from Erlang’s “let it crash” philosophy. The key idea is not to try to handle all errors in a process. Firstly, this leads to error code getting tangled with the business logic. Secondly, an actor can simply lack context to be able to fix the error.

For example, if there’s an actor whose sole responsibility is to read from a queueing system, and the connection to the queue breaks, what should the actor do? Re-create the connection? But how, if that’s not the responsibility of the actor?

That’s what supervision hierarchies in Akka (and Erlang, and other actor implementations) are for. Each actor has a parent; if an error is not handled by the actor, it is propagated to the parent. The parent can decide if the child process should be resumed, restarted, stopped, or if the error should be escalated to its parent.

Parent, grandparent and further ancestor actors might have more and more context, and hence might be able to run the appropriate logic, e.g. re-creating the connection to the queue and creating a new child actor which will read from the queue.

Using supervision hierarchies we also achieve the separation of concerns that we were after: the business logic is in the actor, while the error handling logic is in the supervisor.

Akka

As in the previous parts, we will start with an Akka example and see how to implement the same logic using Akka Typed, Monix and Zio. In this example we’ll:

connect to an external queueing system through a QueueConnector trait

after obtaining a connected Queue instance, read from the queue

forward any messages to interested consumers

upon any errors, attempt to re-connect, beforehand attempting to close() the old queue connection

Here are the base traits we’ll be working with:

The traits are parametrised with a wrapper (higher-order) type F[_] , which should be capable both of representing successful and failed computations. In the Akka example, we’ll be using QueueConnector[Future] and Queue[Future] , as that’s the container type that Akka works with best.
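As a rough sketch of what these traits might look like: the method names follow the calls mentioned later in the text (connect, read(), close()), while the exact signatures and the in-memory connector used for illustration are assumptions.

```scala
import scala.concurrent.Future

// Assumed shape of the base traits; only the method names come from the text.
trait Queue[F[_]] {
  def read(): F[String] // the next message from the queue
  def close(): F[Unit]  // releases the connection
}

trait QueueConnector[F[_]] {
  def connect: F[Queue[F]] // a connected queue, or a failed F
}

// Hypothetical in-memory connector, useful only for experiments and tests:
// every read yields the same message.
object InMemoryConnector extends QueueConnector[Future] {
  def connect: Future[Queue[Future]] = Future.successful(new Queue[Future] {
    def read(): Future[String] = Future.successful("hello")
    def close(): Future[Unit] = Future.successful(())
  })
}
```

Because both traits are abstract in F, the same definitions can be reused with Future, Task or IO.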

We’ll implement a pattern that’s also known as “error kernel”. We’ll keep the important state safe & protected in a parent actor: here the state will be the set of registered message consumers (to which the messages read from the queue should be forwarded). The risky operations, which might fail: connecting to and consuming from the queue, will be delegated to a child actor. That way even if the child actor fails, the state will not be lost.

The parent actor will receive two types of messages:

Subscribe, to add an actor to the set of consumers interested in receiving messages

Received, sent by the child actor when a new message has been received from the queue

The subscribe and received-message handling logic is quite straightforward:
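A minimal sketch of such a parent actor, assuming that consumers are plain ActorRefs and that messages are Strings (the message names come from the text, the payload types are guesses):

```scala
import akka.actor.{Actor, ActorRef}

// Payload types are assumptions made for this sketch.
case class Subscribe(consumer: ActorRef)
case class Received(msg: String)

class BroadcastActor extends Actor {
  // The state we want to protect: consumers interested in queue messages.
  private var consumers: Set[ActorRef] = Set.empty

  override def receive: Receive = {
    case Subscribe(consumer) => consumers += consumer
    case Received(msg)       => consumers.foreach(_ ! msg)
  }
}
```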

But with this definition alone, nothing will really happen, as we never try to connect to the queue. That’s why when the broadcast (parent) actor starts, we’ll spawn a child actor:

What does the child actor do? Its internal state will consist of the currently connected Queue instance (if any). Once again we’ll use the preStart callback to try to connect to the queue immediately after the actor starts. As this is an asynchronous operation, we’ll pipeTo the result to the actor. That way the result of the connect operation will be sent as a message to the actor:

Once the connected queue is received, we can start reading messages from it. Each message, once available, will be forwarded to the parent actor, wrapped in Received . After a message is received, we can receive the next one by sending the queue to self (the current actor):

But what if there’s an error? If either connector.connect.pipeTo(self) or queue.read().pipeTo(self) fails, the actor will receive a Failure(e) message. We don’t really know what to do with that, so we are taking the easiest route: re-throwing the error — which will cause the actor to fail — and hence propagating the error to the parent.

Whatever the reason for the child actor to be stopped (either failure or a regular shutdown of the application), we make one last effort to clean up in the postStop method:

If there’s any connected Queue instance (there might not be, if connecting failed), we try to invoke its close() method. As this is an asynchronous process, and the postStop method is synchronous, we have no other choice but to use Await.result .
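Putting the pieces of the child actor together, a sketch might look as follows. The Connected wrapper message, the actor name and the one-minute close timeout are assumptions introduced for this example; the flow itself (connect in preStart, pipe results to self, rethrow failures, close in postStop) follows the description above.

```scala
import akka.actor.{Actor, Status}
import akka.pattern.pipe
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

trait Queue[F[_]] { def read(): F[String]; def close(): F[Unit] }
trait QueueConnector[F[_]] { def connect: F[Queue[F]] }

case class Received(msg: String)           // sent to the parent
case class Connected(queue: Queue[Future]) // hypothetical wrapper message

class ConsumeQueueActor(connector: QueueConnector[Future]) extends Actor {
  import context.dispatcher // ExecutionContext needed by map and pipeTo

  private var currentQueue: Option[Queue[Future]] = None

  // Connect right after the actor starts; the result arrives as a message.
  override def preStart(): Unit =
    connector.connect.map(Connected(_)).pipeTo(self)

  override def receive: Receive = {
    case Connected(queue) =>
      currentQueue = Some(queue)
      queue.read().pipeTo(self)
    case msg: String =>
      context.parent ! Received(msg)
      // Trigger the next read by sending the queue to ourselves.
      currentQueue.foreach(queue => self ! Connected(queue))
    case Status.Failure(e) =>
      throw e // fail the actor, let the parent decide what to do
  }

  // Last-chance cleanup; postStop is synchronous, hence the Await.
  override def postStop(): Unit =
    currentQueue.foreach(queue => Await.result(queue.close(), 1.minute))
}
```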

And that’s all there is to the child actor; notice that there’s almost no error handling code at all (except for re-throwing any exceptions).

What will the parent do once a child fails? That depends on the supervision strategy. The default one is to Restart a child on “normal” exceptions. The strategy is defined in the parent actor as an overridable method:
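For illustration, here is a sketch of a custom strategy. The retry limits and the choice of which exceptions restart or escalate are example values, not something the original example necessarily uses.

```scala
import akka.actor.{Actor, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart}
import scala.concurrent.duration._

object QueueSupervision {
  // Illustrative limits: at most 10 restarts within one minute.
  val strategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: RuntimeException => Restart  // e.g. a broken connection
      case _: Exception        => Escalate // let our own parent decide
    }
}

class SupervisingParent extends Actor {
  // Overrides the default strategy for all children of this actor.
  override val supervisorStrategy: SupervisorStrategy = QueueSupervision.strategy

  override def receive: Receive = Actor.emptyBehavior
}
```

Keeping the strategy in a separate value makes the decision logic easy to check on its own, without starting any actors.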

Here we have a simple hierarchy with one child actor, but in more complex examples, besides restarting the actor (one-for-one), there is also the possibility of restarting all child actors if only one fails (all-for-one).

In addition, there’s also some flexibility in how the child actor is restarted. One option is to use backoff, that is not to restart the child actor immediately, but after a (growing) delay. If a system is down, it’s quite possible that it will be down if we try again right after failure. But if we wait a bit, it has a higher chance of getting back to shape. This is possible by wrapping the child actor in a BackoffSupervisor .

The example above is available in the GitHub repository, together with tests which simulate failures at various stages of the application. There’s quite a lot of logging going on, so you can observe what happens at each moment, when and if the actors are created and restarted.

Akka Typed

The Akka Typed implementation is slightly different in two aspects. First of all, failure handling is not tied to the parent actor. Instead, it’s a wrapper for a behavior, which gives us more flexibility. Failure handling can either be defined in the parent, or come pre-defined with the child actor behavior.

Secondly, if a parent actor spawns multiple child actors, each of them can have different supervisor handling, unlike the “global” configuration of the supervision strategy in the “traditional” Akka approach.

To implement our example we’ll define broadcastBehavior which will describe how the parent actor should behave. It will handle the same two types of messages as before, but because we need to parametrize the behavior with a single type, we introduce a common trait:

The message handling logic won’t have any mutable state. Instead, once again it will be a method parametrised with the state —the set of consumer actors — which is called recursively:
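A sketch of how the common trait and the recursive, state-parametrised handler might be written. The name handleMessage and the ActorRef[String] consumer type are assumptions.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors

sealed trait BroadcastActorMessage
case class Subscribe(consumer: ActorRef[String]) extends BroadcastActorMessage
case class Received(msg: String) extends BroadcastActorMessage

object TypedBroadcast {
  // No mutable state: the set of consumers is a parameter of the behavior.
  def handleMessage(consumers: Set[ActorRef[String]]): Behavior[BroadcastActorMessage] =
    Behaviors.receiveMessage[BroadcastActorMessage] {
      case Subscribe(consumer) =>
        handleMessage(consumers + consumer) // "recursive call" with new state
      case Received(msg) =>
        consumers.foreach(_ ! msg)
        Behaviors.same
    }
}
```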

But, before handling any BroadcastActorMessage , we should try to connect to the queue and start receiving messages. We’ll do that in a separate actor, spawned when the broadcast behavior is first created:

We’re using Behaviors.supervise to wrap the child actor behavior ( connectToQueueBehavior , which we’ll define next) so that whenever a RuntimeException happens, the actor will be restarted. Note that supervise is a wrapper for any Behavior , yielding a new Behavior . We could have defined it completely separately and outside of the parent actor. Depending on the use-case, it might be more logical to define it inside, or outside of the supervisor.

Even easier than before, we can also use delayed restarts with a backoff by using SupervisorStrategy.restartWithBackoff (and others), instead of SupervisorStrategy.restart as in this case.

There’s an important difference between “traditional” Akka and Akka Typed. In the previous approach, we’ve seen that the default supervisor strategy for “normal” exceptions is to restart the child actor. In Akka Typed, the default is to stop the child actor. That’s why we need to explicitly specify what to do on child failures using onFailure .
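To make the wrapper concrete, here is a small self-contained sketch. The risky behavior is a made-up stand-in for the queue-connecting behavior, and the backoff values are illustrative only.

```scala
import akka.actor.typed.{Behavior, SupervisorStrategy}
import akka.actor.typed.scaladsl.Behaviors
import scala.concurrent.duration._

object SupervisedChild {
  // Hypothetical risky behavior: fails on "boom", passes anything else to sink.
  def risky(sink: String => Unit): Behavior[String] =
    Behaviors.receiveMessage[String] {
      case "boom" => throw new RuntimeException("simulated failure")
      case other =>
        sink(other)
        Behaviors.same
    }

  // Without the wrapper, the actor would be stopped on failure (the default).
  def restarted(sink: String => Unit): Behavior[String] =
    Behaviors.supervise(risky(sink)).onFailure[RuntimeException](SupervisorStrategy.restart)

  // Same idea, but restarts are delayed with a growing backoff.
  def restartedWithBackoff(sink: String => Unit): Behavior[String] =
    Behaviors.supervise(risky(sink)).onFailure[RuntimeException](
      SupervisorStrategy.restartWithBackoff(
        minBackoff = 1.second, maxBackoff = 30.seconds, randomFactor = 0.2))
}
```

An actor running the restarted behavior keeps processing messages after a "boom", since the failure only triggers a restart of the wrapped behavior.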

The second difference from the previous implementation is that the child actor will in fact consist of two actors: one for connecting to the queue, the other for a connected queue. The reason why we need not only two behaviors but also two actors is that both of them will handle different types of messages. That’s the small price we’ll need to pay for type safety.

We won’t be sending any messages from the parent actor to the child actor, hence its type, as viewed by the parent actor, will be Behavior[Nothing] . Inside the actor, however, we are sending a message containing the connected queue, so we’ll need to create a behavior which accepts a Try[Queue[Future]] and then hide that fact from the parent using narrow :

Using the self-reference from the context, we are sending a message to self once the queue is connected ( connector.connect.andThen { case result => ctx.self ! result } ). If it’s a failure, we rethrow the error which will cause the supervisor in the parent to be invoked. If it’s a success, we spawn a child actor with the queue-consuming behavior ( consumeQueueBehavior , defined below).

Note that instead of the preStart callback, in Akka Typed we simply create a behavior which runs the desired code when the actor is set up (using Behaviors.setup), and then returns the “proper” behavior. There’s no looping in this actor; it only ever receives one message.

But that’s not the end. If the queue-consuming actor fails, we need to propagate that error to the parent. That’s not done automatically: we need to watch the new child actor (using ctx.watch). Then, the only thing left to do in the actor is to wait for the child’s termination signal (when things go wrong), and propagate that to the parent.

Termination signals are sent through a different channel than normal actor messages, hence the dedicated behavior factory (Behaviors.receiveSignal, instead of the usual Behaviors.receiveMessage).

Finally, we get to the behavior of the queue consumer:

Similarly to the “traditional” Akka implementation, we invoke reading from the queue and once the message is ready, we forward it to self ( queue.read().andThen { case result => ctx.self ! result } ). Once a message is received, we send it to the sink (that will be the parent actor) and recursively call the same behavior.

If it’s a failure, we simply throw the exception. That will cause the queue-connecting actor to be notified, which will in turn notify the parent actor.

What about closing the queue before the queue-consumer actor finishes (for whatever reason)? There’s no postStop method to override here like before. Instead, we modify the created behavior adding a receiveSignal handler. If we get a PostStop signal, we try to close the queue. Again, we need to synchronously return a new behavior, but the closing action is asynchronous — hence the need for the Await .
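The signal handler can be sketched in isolation as follows. Here the close function stands in for queue.close(), and the "stop" message is a hypothetical way to end the actor in this example.

```scala
import akka.actor.typed.{Behavior, PostStop}
import akka.actor.typed.scaladsl.Behaviors
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object ClosingOnStop {
  // `close` is a stand-in for queue.close(); message type is an assumption.
  def consuming(close: () => Future[Unit]): Behavior[String] =
    Behaviors
      .receiveMessage[String] {
        case "stop" => Behaviors.stopped
        case _      => Behaviors.same
      }
      .receiveSignal {
        // Runs when the actor stops, whether normally or after a failure.
        case (_, PostStop) =>
          Await.result(close(), 1.minute) // the handler must return synchronously
          Behaviors.same
      }
}
```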

It’s important to note here that once again we are leveraging the fact that Behaviors, just like Monix’s Tasks and Zio’s IOs, are lazy. This allows modifying the (recursive) behavior by adding additional handlers or meta-data. Here, we are modifying Behaviors.receiveMessage[Try[String]] so that the signal handler is installed as well. If the behaviors were eagerly executed, the receiveSignal would never be called.

One more case where separating description of a computation from its interpretation is beneficial.

Monix

Let’s start examining the Monix implementation from the end, that is from the description of the task which will connect to the queue, consume messages from it and close it in the end (either due to normal termination or an error).

Instead of using lifecycle hooks ( preStart , postStop in Akka), we’ll simply define a process which performs the connect-consume-close steps in sequence.

As in the previous parts, to communicate with the parent process we’ll use an MVar (a bounded, 1-element queue) which will store elements of type BroadcastMessage :

Next, we’ll define three separate tasks which connect to the queue, consume elements from the queue and finally close it:

The task definitions are pretty straightforward: they simply invoke the appropriate methods on the connector or a connected queue instance and perform some additional logging. Note that consumeQueue will never end normally, as after reading a single message and sending it to the parent process (using inbox.put(Received(msg))), it’s always restarted to read another message (restartUntil(_ => false)).

Task[Queue[Task]] might look weird, but well … it’s a task which, when run, creates a Queue which in turn, wraps the results of its method in a Task .

How to combine these three tasks into a whole? We’ll use bracket :

Note that we are using inbox as the name for the communication channel between the consume and broadcast processes to avoid name clashes, as Queue is already taken by our domain class.

bracket is an operator that forms one of the basic building blocks of error handling in Monix (and ZIO as well). It’s equivalent to the well known try ... catch ... finally construct from Java/Scala. The connect task should allocate the resources; then the first bracket parameter is the resource usage. Regardless of the way the resource usage part ends (either the task completes, there is an error or the fiber is cancelled), it’s guaranteed that the third task, to release the resources, will be evaluated as well.

And that’s exactly what we need! Using bracket, we can ensure that an attempt to close the queue will be made, however the queue consumption ends.

It’s not quite clear what will happen when both resource-usage and resource-release parts throw an error. Which error will the user get? It’s the first one, and the second will be discarded. An important detail to keep in mind.
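A sketch of the connect-consume-close combination, assuming the Task.bracket operator from Monix 3. To keep the example independent of the MVar API, the forward function stands in for inbox.put(Received(msg)); the object and parameter names are made up, and the cancellation boundary is left out.

```scala
import monix.eval.Task

trait Queue[F[_]] { def read(): F[String]; def close(): F[Unit] }
trait QueueConnector[F[_]] { def connect: F[Queue[F]] }

object ConsumeWithBracket {
  // `forward` is a stand-in for inbox.put(Received(msg)).
  def consume(connector: QueueConnector[Task], forward: String => Task[Unit]): Task[Unit] = {
    // Acquire: a task which, when run, yields a connected queue.
    val connect: Task[Queue[Task]] = connector.connect

    // Use: read one message, forward it, and start over; never ends normally.
    def consumeQueue(queue: Queue[Task]): Task[Unit] =
      queue.read().flatMap(forward).restartUntil(_ => false)

    // Release: evaluated however the use part ends.
    def releaseQueue(queue: Queue[Task]): Task[Unit] =
      queue.close()

    connect.bracket(consumeQueue)(releaseQueue)
  }
}
```

If read() fails, the resulting task fails with that error, but only after releaseQueue has been evaluated.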

The above guarantees proper behavior when an error happens. But what if we just want to end the process gracefully? We might be no longer interested in consuming the queue. With Akka it was enough to stop the actor. Here we have to use cancellation.

In the previous parts we’ve used Fiber.cancel as well, to end a forked (asynchronous) process. Here the consumption logic will also be run asynchronously (as we’ll see below soon). If the user decides that queue consumption should stop, cancellation is the only hope to break the infinite consumption loop.

However, there’s a catch: by default a lot of things aren’t cancellable. For example, the infinite flatMap chain in consumeQueue (if we unfold the recursive invocations) will never be cancelled. That’s why we need to add a cancellation boundary using cancelable. This will cause the flat-map chain to allow stopping mid-way.

What cancelable does, in essence, is to instruct the interpreter of the task that when it receives a cancellation request for a fiber (lightweight thread), and there’s an opportunity to stop executing the task (for example because the interpreter just finished one flatMap operation and is about to start another one), the task will be cancelled.

So far we’ve talked only about connecting to the queue. What about the rest? We still need to define the message-broadcasting process which will send the read messages to interested consumers. For that, we create a task which will handle both Subscribe and Received messages:

Nothing out of the ordinary that we haven’t seen before. We describe a never-ending process which reads messages from a queue, maps them to the appropriate tasks (updating the internal state, the set of consumers, if necessary) and recursively calls itself.

We still need to define how and when the consume process should be restarted:

That part of the broadcast process definition corresponds to the supervisor strategy. When a consume task fails, which can only happen due to an error, we have to decide what to do. Here we simply log the result and restart the process, just like the supervisor’s restart.

While not built-in, that’s the place where we might use backoff or a limited retry mechanism; however, we’d have to code that by hand.
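As an illustration of what hand-coded backoff could look like (this is not part of the example in the repository; the helper name, the doubling policy and the log format are all assumptions):

```scala
import monix.eval.Task
import scala.concurrent.duration._

object Restarts {
  // Re-runs `task` after each failure, doubling the delay up to maxDelay.
  def restartWithBackoff[A](task: Task[A], delay: FiniteDuration, maxDelay: FiniteDuration): Task[A] =
    task.onErrorHandleWith { e =>
      Task(println(s"[restart] failed with: ${e.getMessage}, retrying in $delay"))
        .flatMap { _ =>
          restartWithBackoff(task, (delay * 2).min(maxDelay), maxDelay)
            .delayExecution(delay)
        }
    }
}
```

Since the consume task never completes normally, wrapping it this way would give a process that keeps re-connecting with growing pauses. A limited retry could be built the same way by adding a counter parameter.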

We’ve also managed to maintain the separation between the business and error-handling logic, however here it’s not enforced through a special mechanism. Instead, we are separating the Task description into a “single” consume task and a task which manages the restarts. Creating fine-grained, single-responsibility task descriptions is one way of creating readable, maintainable code when using Monix.

Finally, we need to tie all the parts together and kick-start the background processes:

To start the broadcast, we start two asynchronous processes: one consuming from the queue in a loop, the other processing messages. The two processes communicate through the inbox MVar .

The return type of the method consists of both the inbox — so that external clients have the possibility to subscribe new consumers, and of a task which, when run, will cancel the whole process. Note how the fact that Task is lazy allows us to simply create the description of the cancellation logic: f1.cancel *> f2.cancel ( *> flat-maps the two tasks, discarding the result of the first), without fear of running the cancellation prematurely.

Cancelling f1 will invoke the bracket’s release, while cancelling f2 will cause messages to no longer be read from the inbox .

ZIO

Finally, let’s see how ZIO handles errors. As expected, the implementation is quite similar to Monix, however this might be deceptive at times: there are some very important differences, especially in the cancellation model.

However, the overall structure of the solution is the same as before. We’ll be using the same two messages to communicate with the broadcast process:

With the difference that in Subscribe , the consumer results in an IO instead of a Task or a Future . Following the same order as in the previous section, the description of how queue consumption should work looks familiar:

The bracket operator works the same way as in Monix (though the release-resource and use-resource arguments are reversed): it guarantees that, if the connect action succeeds, releaseQueue will be evaluated (closing an open queue connection), both when consumeQueue finishes normally or due to an error.

There are two important differences in the code, though. First of all, in consumeQueue you might notice that in the Monix version we had to explicitly mark the flatMap -chain as cancelable so that it’s possible to stop queue consumption from the outside. Here, that’s not needed: flatMap chains are by default auto-cancellable.

Secondly, the release-resource part in bracket must handle all errors, and that’s enforced through the type system, as the type of the release-resource parameter is A => IO[Void, Unit]. That’s why there’s no question of what to do in case the release action results in an error: normal errors aren’t possible (as the type states), and if the action does throw an exception (which is always possible), this is considered a programming defect and will be reported to the fiber’s supervisor and/or logged.

The broadcast process implementation corresponds directly to the Monix implementation without significant differences:

To reiterate on the previous description: we create two processes, one which tries to connect to the queue and consume messages from it ( consumerForever ), restarting the whole procedure if necessary. The second one ( processMessages ) maintains the state — the set of current subscribers (and hence implements the Error Kernel pattern). As a result of the whole action, we return:

a queue to which new subscribers can be sent

a way to stop the whole process

Stopping the process involves, as before, interrupting the fiber which tries to connect to the queue and consumes messages from it, and another fiber which broadcasts the incoming messages.

Interruption in ZIO and cancellation in Monix

The way interruption and cancellation work in ZIO and Monix is one of their distinguishing differences, so it might make sense to compare them side-by-side.

Creating cancelable actions

In Monix, cancelable actions can be created using:

Task.create, where the user needs to provide a Cancelable instance which should stop (or try to stop) the asynchronous computation. Upon cancellation, this callback might run concurrently with the cancelled action

cancelable operator, which causes flatMap chains in the Task to become cancelable (by default they are not)

In ZIO we have:

IO.async0 , where the user needs to provide a Canceler which will be run when the action is cancelled. The canceller might be run concurrently with the cancelled action

flatMap chains are cancellable by default, no need to explicitly mark them as such

Both libraries offer operators, uncancelable (Monix) and uninterruptibly (ZIO), which prevent the described action from being cancelled, even if it’s built out of cancellable operations.

Neither library attempts to interrupt or cancel atomic actions (such as a single flatMap step, or wrapped synchronous code), e.g. using Thread.interrupt.

Cancelling fibers

The way fiber cancellation/interruption tasks work is another important difference. In Monix, fibers can be interrupted by evaluating a task returned by the Fiber.cancel: Task[Unit] method. This task will complete once the cancellation is sent.

In ZIO, we have the Fiber.interrupt(t: Throwable): IO[E, Unit] method. It’s similar, as when evaluated, it will interrupt the target fiber. But it’s also different in two aspects. First, we can specify a specific interruption reason (an exception). That reason will then be reported to any action that attempts to join the interrupted fiber, or to the fiber’s supervisor, allowing logging or restarts.

The second important difference is that the action returned by interrupt will only complete once the interruption is successful or the fiber has ended. If we need the interrupt-and-forget semantics from Monix, this can be achieved by forking the fiber interruption into a fiber (.interrupt(...).fork).

What can be cancelled

When can cancellation be invoked? Both Monix and ZIO provide a way to cancel/interrupt a running fiber, as described above.

Additionally, when a Monix task is run asynchronously e.g. using runAsync, it returns a CancelableFuture. That’s an extension to the regular Future which can additionally cancel a running computation through the side-effecting cancel() method.

ZIO doesn’t have such possibilities, however the same effect might be achieved by forking the IO action to a fiber and obtaining (through the synchronous unsafePerformIO ) a Fiber instance, which can then be interrupted.

Cleaning up

Both Monix and ZIO have a bracket operator which works the same way: when applied to a resource-create action, it guarantees that a resource-release action will be run once the resource-use action completes successfully, with an error, or is cancelled.

ZIO also has some handy aliases, like ensuring (corresponds to finally ) and bracketOnError .

Cancellation callbacks

Is it possible to find out that an action has been cancelled within the action itself?

Monix has two such operators. Firstly, doOnCancel(cb: Task[Unit]) runs the given task when cancellation occurs (there’s also a counterpart which runs when the task ends normally, doOnFinish ). Hence, it’s a “partial bracket”.

The second, onCancelRaiseError , causes the action to fail with the given exception, instead of becoming non-terminating on cancel. There’s no way to specify the cancellation reason from the cancelling fiber, but it’s possible to specify it in the cancelled fiber. On the other hand, in ZIO it’s only possible to specify the reason in the interrupting fiber, and the interrupted fiber is always terminating — with that exception.

ZIO has no operators which would make it possible to find out, in the interrupted fiber, that an interruption happened. Instead, interruption will be reported to any actions that attempt to join the fiber that is being interrupted.

Fiber supervisors

ZIO has two additional mechanisms for fiber supervision which have no counterpart in Monix.

The first is fiber supervisors. When forking an IO to a fiber it’s possible to specify a handler which will be invoked on any exceptions not handled by the fiber: fork0[E2](handler: Throwable => Infallible[Unit]). If this resembles supervisors in actors, it should!

If no supervisor is specified, a default one is used which logs the exception.

The second mechanism is the IO.supervised(t: Throwable) method which causes any fibers forked as part of evaluation of the given action to be interrupted with the given exception, once this action completes. Again, this is similar to all child actors being stopped when the parent actor is stopped, however here it’s optional, not mandatory.

We’ve seen an example of using this feature in part 2, where the worker fibers were automatically interrupted once the crawler finishes.

Summary

Is Monix or ZIO an alternative to Akka actors? Yes: state encapsulation, communication and error handling/supervision can all be implemented using Task s or IO actions, without much effort, at the same time keeping the code readable and maintainable, in a more type-safe way.

However, Akka is not lagging behind, as there’s an alternative to “traditional” Akka actors in Akka itself: Akka Typed, which is definitely a viable alternative as well.

Whichever approach we choose, as we have seen in the examples presented in the 3 parts of the series, the overall structure of solutions written using the four approaches is the same:

all of them use asynchronous message passing

all of them communicate using queues: implicit actor mailboxes or explicit queues

all of them use concurrently running, independent light-weight processes: actors or fibers

However, as the saying goes, the devil is in the details: the level of type-safety, the model of evaluation, supervision, cancelling and error handling differs significantly. Below is a summary of the various features that we have covered in the series (also available in textual format on Google Sheets):