Supervision is one of the core responsibilities of an Actor. Handling errors is not always easy in classic object-oriented programming, where exceptions are embedded in the normal execution flow and can be difficult to predict. In the Akka Actor Model, errors are handled in a well-structured, isolated execution flow: not only does this make exception handling more predictable, it also forces developers to design a proper fault-recovery system. This article describes how to use Actor Supervisors to handle errors and recover from them.

Actor Supervision: Overview

Actors have a well-structured tree hierarchy built according to specific rules:

– Your Parent (i.e. the Actor that created you) is your Supervisor.

– Every Actor has a Supervisor, apart from the actor at the root of the tree. The Guardian Actor ( /user ), created by the system at startup, is the parent of all the top-level Actors you create.

– Your Children (i.e. the Actors you have created) follow your destiny: if you are restarted/stopped/resumed, they are restarted/stopped/resumed as well.

– If unable to handle an exception, escalate it to your Supervisor.

– If the Guardian Actor is unable to handle an exception, the system will shut down.
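The hierarchy rules above can be observed directly from actor paths. The following is a minimal sketch, not part of the original tutorial: Parent, Child and HierarchyDemo are hypothetical names, and the call to system.terminate() assumes Akka 2.4 or later.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical actors, used only to illustrate the hierarchy rules.
class Child extends Actor {
  // context.parent is the actor that created this one, i.e. its supervisor
  override def preStart(): Unit =
    println(s"${self.path} is supervised by ${context.parent.path}")

  override def receive: Receive = Actor.emptyBehavior
}

class Parent extends Actor {
  // Actors created with context.actorOf become children of this actor
  context.actorOf(Props(new Child), "child")

  override def preStart(): Unit =
    println(s"${self.path} is supervised by ${context.parent.path}")

  override def receive: Receive = Actor.emptyBehavior
}

object HierarchyDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("hierarchy-demo")
    // Actors created with system.actorOf sit directly under the /user guardian
    system.actorOf(Props(new Parent), "parent")
    Thread.sleep(500) // crude pause so both actors can start before shutdown
    Await.result(system.terminate(), 5.seconds) // terminate() assumes Akka 2.4+
  }
}
```

The parent should report a path ending in /user/parent with /user as its supervisor, and the child a path ending in /user/parent/child with the parent as its supervisor. The two lines may appear in either order, since each actor starts independently.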

Akka provides two categories of supervision strategy:

– OneForOneStrategy, where the strategy is applied only to the child actor that failed.

– AllForOneStrategy, where the strategy is applied to all the child actors when one of them fails.

Akka also provides two predefined failure-recovery strategies, called defaultStrategy and stoppingStrategy, but most of the time we need to define our own: this can be easily done as shown in the following tutorial.
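As a rough illustration of the difference between the two categories, here is a sketch that builds one strategy of each kind from the same decision rules, and a supervisor that simply reuses the predefined stoppingStrategy. The names ExampleStrategies and StopOnFailureSupervisor, the retry limits and the choice of exceptions are all assumptions made for this example.

```scala
import akka.actor.{Actor, AllForOneStrategy, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

object ExampleStrategies {
  // Only the failing child is affected; allow at most 3 restarts
  // within one minute, after which the child is stopped
  val oneForOne: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
      case _: Exception             => Stop
    }

  // Same decisions, but applied to every child when any one of them fails
  val allForOne: SupervisorStrategy =
    AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
      case _: Exception             => Stop
    }
}

// Hypothetical supervisor that reuses one of the predefined strategies
class StopOnFailureSupervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    SupervisorStrategy.stoppingStrategy

  override def receive: Receive = Actor.emptyBehavior
}
```

AllForOneStrategy tends to make sense when the children depend on each other so closely that one failing leaves the others in an unusable state; otherwise OneForOneStrategy is usually the less disruptive choice.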

Actor Supervision in Practice!

In this tutorial we want to trigger an actor supervision operation when a specific word is contained in the received message:

– if the message contains the word “restart”, the child actor is restarted

– if the message contains the word “resume”, the child actor is resumed after the failure

– if the message contains the word “stop”, the child actor is stopped…FOREVER! 😈

– if the message contains the word “secret”, we throw an unhandled exception that forces the Guardian Actor to shut down the system

First of all, let’s define our protocol and exceptions:

// file protocol.scala
package com.danielasfregola

object PrinterProtocol {
  case class Message(msg: String)
}

class RestartMeException extends Exception("RESTART")
class ResumeMeException extends Exception("RESUME")
class StopMeException extends Exception("STOP")

Then we define the behaviour of our Actor and when we are going to throw the exceptions. Note that we have also added some utility methods to better observe the life cycle of our Actors.

// file PrinterActor.scala
package com.danielasfregola

import akka.actor.Actor

class PrinterActor extends Actor {

  import PrinterProtocol._

  override def preRestart(reason: Throwable, message: Option[Any]) = {
    println("Yo, I am restarting...")
    super.preRestart(reason, message)
  }

  override def postRestart(reason: Throwable) = {
    println("...restart completed!")
    super.postRestart(reason)
  }

  override def preStart() = println("Yo, I am alive!")
  override def postStop() = println("Goodbye world!")

  override def receive: Receive = {
    case Message(msg) if containsRestart(msg) =>
      println(msg); throw new RestartMeException
    case Message(msg) if containsResume(msg) =>
      println(msg); throw new ResumeMeException
    case Message(msg) if containsStop(msg) =>
      println(msg); throw new StopMeException
    case Message(msg) if containsSecret(msg) =>
      println(msg); throw new Throwable
    case Message(msg) => println(msg)
  }

  private def containsRestart = containsWordCaseInsensitive("restart")_
  private def containsResume = containsWordCaseInsensitive("resume")_
  private def containsStop = containsWordCaseInsensitive("stop")_
  private def containsSecret = containsWordCaseInsensitive("secret")_

  private def containsWordCaseInsensitive(word: String)(msg: String) = msg matches s".*(?i)$word.*"

}
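The keyword check used by PrinterActor can be tried out on its own, without any actors. The sketch below copies the same regular expression into a standalone object and, purely as a hypothetical alternative, adds a plain substring search next to it; KeywordMatching and containsIgnoringCase are names invented for this example.

```scala
object KeywordMatching {
  // Same check as in PrinterActor: (?i) makes the rest of the pattern
  // case insensitive, and the surrounding .* allow any text around the word
  def containsWordCaseInsensitive(word: String)(msg: String): Boolean =
    msg.matches(s".*(?i)$word.*")

  // Hypothetical alternative, not from the article: plain substring search
  def containsIgnoringCase(word: String)(msg: String): Boolean =
    msg.toLowerCase.contains(word.toLowerCase)
}
```

Two properties are worth keeping in mind. The check matches substrings rather than whole words, so a message such as "unstoppable" also counts as containing "stop". And because the regular expression's dot does not match line breaks by default, the regex version does not find a keyword in a multi-line message, whereas the substring version does. When a message contains several keywords, the first matching case in receive wins.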

Finally, the Supervisor just needs to create the actor and define the failure-recovery logic:

// file PrinterActorSupervisor.scala
package com.danielasfregola

import akka.actor.SupervisorStrategy._
import akka.actor.{Actor, OneForOneStrategy, Props}

class PrinterActorSupervisor extends Actor {

  override def preStart() = println("The Supervisor is ready to supervise")
  override def postStop() = println("Bye Bye from the Supervisor")

  override def supervisorStrategy = OneForOneStrategy() {
    case _: RestartMeException => Restart
    case _: ResumeMeException => Resume
    case _: StopMeException => Stop
  }

  val printer = context.actorOf(Props(new PrinterActor), "printer-actor")

  override def receive: Receive = {
    case msg => printer forward msg
  }

}
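The block passed to OneForOneStrategy is a partial function from the failure to a directive, and any failure it does not cover is escalated to the next supervisor up the tree. The sketch below makes that fallback explicit and shows one possible way of falling back to Akka's default decisions instead. To keep it self-contained it uses standard JDK exceptions in place of the tutorial's own; the Deciders object and its member names are assumptions made for this example.

```scala
import akka.actor.SupervisorStrategy
import akka.actor.SupervisorStrategy.{Decider, Escalate, Restart, Resume, Stop}

object Deciders {
  // A Decider is a PartialFunction[Throwable, Directive]
  val explicit: Decider = {
    case _: IllegalStateException         => Restart
    case _: IllegalArgumentException      => Resume
    case _: UnsupportedOperationException => Stop
    case _                                => Escalate // spelled out; leaving it out behaves the same
  }

  private val custom: Decider = {
    case _: IllegalArgumentException => Resume
  }

  // Anything not covered by `custom` is decided by Akka's default decider,
  // which restarts the child on ordinary exceptions
  val withDefaults: Decider = custom.orElse(SupervisorStrategy.defaultDecider)
}
```

Either decider can be handed to a strategy, for example OneForOneStrategy()(Deciders.withDefaults). Note that the default decider is only defined for exceptions, so a bare Throwable like the one thrown for the “secret” keyword would still be escalated.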

That’s it! Now we just need to have fun with our buddies 🙂

When initialising our Actor system, all the Actors are created and automatically started:

import akka.actor.{ActorSystem, Props}
import PrinterProtocol._

implicit val system = ActorSystem("printer-service")
val printerSupervisor = system.actorOf(Props(new PrinterActorSupervisor), "printer-supervisor")
// "The Supervisor is ready to supervise"
// "Yo, I am alive!"

If no special keyword is sent, nothing happens to our actors:

printerSupervisor ! Message("...please, print me...")
// ...please, print me...
printerSupervisor ! Message("...another message to print, nothing should happen...")
// ...another message to print, nothing should happen...

When our actor is restarted, it is stopped and replaced by a brand new instance. The event is also recorded in the logs.

printerSupervisor ! Message("...why don't you restart?!")
// ...why don't you restart?!
// Yo, I am restarting...
// Goodbye world!
// ...restart completed!
// Yo, I am alive!

// From the logs:
// ERROR [OneForOneStrategy]: RESTART
// com.danielasfregola.RestartMeException: RESTART
// at com.danielasfregola.PrinterActor$$anonfun$receive$1.applyOrElse(PrinterActor.scala:24) ~[classes/:na]
// ...

When resuming, nothing happens but a nice warning is in the logs for us:

printerSupervisor ! Message("...fell free to resume!")
// ...fell free to resume!

// From the logs:
// WARN [OneForOneStrategy]: RESUME

When stopping, the behaviour is similar to the restart scenario, except that the actor is not replaced:

printerSupervisor ! Message("...you can STOP now!")
// ...you can STOP now!
// Goodbye world!

// From the logs:
// ERROR [OneForOneStrategy]: STOP
// com.danielasfregola.StopMeException: STOP
// at com.danielasfregola.PrinterActor$$anonfun$receive$1.applyOrElse(PrinterActor.scala:28) ~[classes/:na]
// ...

Finally, let’s see what happens with an exception that is not handled. Note that both PrinterActor and PrinterActorSupervisor are killed as the whole system is shut down by the Guardian Actor.

printerSupervisor ! Message("...this is going to be our little secret...")
// ...this is going to be our little secret...
// Goodbye world!
// Bye Bye from the Supervisor

// From the logs:
// ERROR [LocalActorRefProvider(akka://printer-service)]: guardian failed, shutting down system
// java.lang.Throwable: null
// at com.danielasfregola.PrinterActor$$anonfun$receive$1.applyOrElse(PrinterActor.scala:30) ~[classes/:na]
// ...

Summary

The Akka Actor Model allows the creation of failure-recovery systems thanks to its well-structured hierarchy of Actor Supervisors. This article has provided a tutorial on how supervision can be used to control the life cycle of Actors in order to handle and recover from errors.