There are plenty of technologies to organize parallel computing. One of the most promising and simple ones is the model of actors. It allows to partially get rid of such major problems of parallelism as race conditions, locking waits of operations termination, endless mutual exclusions and synchronizations, and so on and so forth. This approach also simplifies the code paralleling.

We are going to review it by the example of akka framework using Java language. As for akka, it is written in Scala.

Theory and Principles

Actor is an isolated object, meaning that there are no direct references to an instance. It deals with asynchronous processing of incoming messages. A message is any invariable object implementing Serializable interface. They are stored in a queue, and when the system passes control to the actor (we’ll talk more about it soon), objects are extracted from the queue according to FIFO principle. They are processed according to the internal logic which we should set explicitly. The actors form a tree-like hierarchy. Each of them has an actor (or a system of actors) as the creator. In order to send a message to an actor, we should have a reference to it (do not mix it up with the object reference). It can be of two types: ActorRef and ActorSelection. The reference has .tell(message, sender) method that sends the message along the specified path. Sender is also a reference of ActorRef type. A receiver-actor will receive it when calling getSender() method during the message processing.

ActorRef is a reference to a certain, guaranteed to be an existing (at the moment of the reference creation) actor that is the analog of StrongReference. We can get ActorRef explicitly by calling getSender() method either when processing the messages either after having got a direct reference to the sender, or when creating children actors. To guarantee message delivery, we can use either a reply-message, or an agent. We are going to discuss it below.

ActorSelection is a weak reference. Its key distinction is the ability of creating ActorSelection, knowing just a name and a path to the actor. But the reverse side of the coin is that there are no guarantees that there is something existing on the specified path. We can send a message, but no one can guarantee its delivery. Therefore, if we need to send some data with the guarantee, but there is no direct reference to the necessary actor, we can send a random message to it and get the strong ActorRef reference from the reply.

That’s the base minimum, after which we can continue studying all of it in practice. In order to avoid manual enabling of dependencies, we should also use all libraries provided in akka (we can also utilize maven/etc).

Some Practice

The asynchronous world begins with a system of actors. Let’s create it. Usually, there’s no need to have more than one actor system within the application.

ActorSystem system = ActorSystem.create("learning2hard"); The system has plenty of settings we should specify before execution. It is active right after the creation. We can either set it in the external config file, or pass by the second argument to the constructor-method. You can read more about it at the developer’s website. We don’t need it for the moment, but it’s better to know about it anyway.

Now, let’s create the kernel actor. All the future logic will be the child one to it. Technically, the actor is an object inheriting UntypedActor. Let’s do it.

public class Kernel extends UntypedActor { @Override public void onReceive(Object arg0) throws Exception { unhandled(arg0); } }

The actor class is ready. unhandled(arg0) method marks the incoming object as the unprocessed one and writes about it in the log. Let’s add the kernel instance to the system of actors:

final ActorRef kernel = system.actorOf(Props.create(Kernel.class), "kernel"); We have created a child actor of the Kernel class with regard to the system and named it “kernel”. The system is the root of a hierarchic tree of actors. String names and paths are commonly used in akka. If it is necessary to pass some objects to the constructor of Kernel, we can do it with the help of Props.create(Kernel.class, obj1, obj2, obj3…).

We have made a programming standard meaning a program that does nothing useful. Let’s add some functional. For example, processing commands that are read from the console. Within the context of an actor, it is not allowed to use calls/operations that are locking the flow. Waiting for the command input to the console is one of them (Actually, there is a way to place them there. We are going to review the mysterious Future monads a bit later). Therefore, it is advisable to remove and avoid locking calls in the world of actors. Let’s apply the first method and read commands from the console outside the system of actors and then send them in the form of messages to the kernel.

Scanner sc = new Scanner(System.in); while (sc.hasNext()) { String f = sc.nextLine(); if (f.equals("exit")) break; kernel.tell(f, ActorRef.noSender()); } system.shutdown();

The only thing to mention is that we are sending a message from “the outer world” to the world of actors. There’s obviously no reference to the sender. That’s why we are going to use a stub.

Let’s write processing of the received string to kernel class and make a simple echo.

@Override public void onReceive(Object arg0) throws Exception { if (arg0 instanceof String) { String s = (String) arg0; System.out.println(s); return; } unhandled(arg0); }

That’s how the logic of any actor look like and the list of locks with instanceof condition and return + unhandled(arg0) is finished, after all the locks, in case something has not been processed. We have used ActorRef.noSender() as the sender. It’s time to see how the address looks like:

The sender address: " + getSender()); We get Actor[akka://learning2hard/deadLetters] and see that the address exists and it is not null. It points to the stub of the basket that accepts objects and does nothing with them. Such an address type is called local and absolute. It is right only within the limits of the current system of actors. We’ll discuss addresses in details below.

But one is not enough. So, lets create more actors and think up a simple distributed task and check, whether a number is prime. We are going to search them through without optimizations. Enter a number from the console, send it to the kernel. The latter will estimate the number of divisors and send work to the operating ones. Each of them will search out its own range of divisors. When at least one of them announces that a number is composite, it will write a result. If all of them have finished their work and found nothing, the number is prime.

To tell the truth, it’s easier to write than to tell about it. So, let’s create a worker with the clear logic.

public class PrimeWorker extends UntypedActor { @Override public void onReceive(Object arg0) throws Exception { if (arg0 instanceof Job) { Job task = (Job) arg0; for (int i = task.from; i < task.to; i++) if (task.number % i == 0) { //The number is composite. getSender().tell(new JobResult(task.jobID,false), getSelf()); return; } getSender().tell(new JobResult(task.jobID,true), getSelf()); } unhandled(arg0); } public static class Job implements Serializable { private static final long serialVersionUID = 5095931000566324969L; public final int jobID; //The task identifier. It is used to aggregate results public final int number; //The number under consideration public final int from; //The lower limit of the search range public final int to; //The upper limit of the search range public Job(int jobID, int number, int from, int to) { this.jobID = jobID; this.number = number; this.from = from; this.to = to; } } public static class JobResult implements Serializable { private static final long serialVersionUID = -1788069759380966076L; public final int jobID; public final boolean isPrime; public JobResult(int jobID, boolean isPrime) { this.jobID = jobID; this.isPrime = isPrime; } } }

There should be nothing complicated about it. Now, let’s get down to the kernel code. It distributes the work and joins results.

public class Kernel extends UntypedActor { //work ID -> (currentWorkerCount, isPrime) private TreeMap<Integer, Pair<Integer, Boolean>> jobs = new TreeMap<Integer, Pair<Integer, Boolean>>(); private int job_id_counter = 0; @Override public void onReceive(Object arg0) throws Exception { if (arg0 instanceof String) { int i = Integer.valueOf((String) arg0); for (int j = 2; j < i; j++) getContext().actorOf(Props.create(PrimeWorker.class)).tell(new PrimeWorker.Job(job_id_counter, i, j, j+1), getSelf()); jobs.put(job_id_counter, Pair.get(i - 2, true)); job_id_counter++; return; } if (arg0 instanceof JobResult) { JobResult jr = (JobResult) arg0; Pair<Integer, Boolean> task = jobs.get(jr.jobID); if (!jr.isPrime) task.second = false; task.first--; if (task.first < 1){ System.out.println("Number " + jr.number + (task.second ? " prime" : " composite")); jobs.remove(jr.jobID); } getSender().tell(PoisonPill.getInstance(), getSelf()); //The actor has done his job. Send the command of deletion to it. } unhandled(arg0); } public static class Pair<A,B> { public A first; public B second; public static <C,D> Pair<C,D> get(C a, D b) { Pair<C,D> p = new Pair(); p.first = a; p.second = b; return p; } } }

Let’s not worry about the absence of checks, that was not our goal. What we can notice is some wastefulness in a way that we create actors for one operation and then delete them. Of course, we can place references to them in the array and call them from it in turns. But, fortunately, it’s all done. Let’s try to improve the code by applying off-the-shelf solutions.

Router is a specialized object that passes incoming messages to actors by using a certain strategy of choosing them. It can be of two types: a pool and a group. A group chooses actors according to the strategy in the specified path. A pool creates them itself. A router is rarely used as an independent object. As a rule, it is encapsulated to the actor. We are going to deal with such actor-routers exactly. There are plenty of choosing strategies. We can read about them at the website of the framework developers. The most multipurpose ones are SmallestMailboxRouter(SM) and BroadcastRouter. The first one chooses the least loaded actor from the set (according to the mailbox size). The second one sends out messages to all of them. Let’s declare a router with a pool of actors and SM strategies.

ActorRef router = getContext().actorOf(new SmallestMailboxPool(5).props(Props.create(PrimeWorker.class)), "workers"); In this case we assign the pool size statically. We can also create dynamic changing pools, or even write our own logic for changes. But all of that is beyond this article.

Let’s modify the code of sending the kernel tasks (and remove sending the message to get rid of the actor):

if (arg0 instanceof String) { int i = Integer.valueOf((String) arg0); for (int j = 2; j < i; j++) router.tell(new PrimeWorker.Job(job_id_counter, i, j, j+1), getSelf()); jobs.put(job_id_counter, Pair.get(i - 2, true)); job_id_counter++; return; }

It’s done. Now the router will take care of tasks distribution. We’ve reviewed the basic moments. It’s time to move on to more subtle aspects.

Execution Context. Locking Operations. State Transitions.

It has been already mentioned that we pass control to actors, but did not specify the way we do it. The execution context is in charge of it. We indicate it explicitly in the configuration file. fork-join executor is used by default. It is multi-purpose and the most efficient one for common tasks. Create some number of flows that pass control to actors by choosing them in n flows from the list. The criterion of switching to another actor is either an empty queue of messages, or processing k messages in a row, provided that nothing else is specified. It is obvious that when an actor tries to call a locking operation, the flow is locked and other actors begin being processed much slower. If the number of locked actors becomes equal to the number of flows, the system dies out.

Fortunately, there is a way out from even such an awkward situation. Its name is Future. They are not the Futures we are used to in Java, but a functional good thing from Scala. To begin with, let’s talk about Future in Java. It’s an object that can contain the result of executing an asynchronous operation. The main distinction of this future is callback, which is the possibility of performing some actions after the task termination, absolutely asynchronously. The lion’s share of the framework’s “strength” lies in futures. How does it look like?

Future<String> f = future(new Callable<String>() { @Override public String call() throws Exception { //Some blocking stuff return "hello habr"; } }, getContext().dispatcher()); f.onComplete(new OnComplete<String>(){ @Override public void onComplete(Throwable arg0, String arg1) throws Throwable { System.out.println(arg0 != null ? arg0 : arg1); } }, getContext().dispatcher());

In addition to Callable (the analog of Runnable with the return value), we also pass some dispatcher during creation. That is what the actor executes. But if we make the actor context execute a locking action, will the latter freeze? The thing is in akka having two pools of flows. One of them is for actors and we can not lock it. The other one is for some trifles like Future. This pool of flows is of a variable size. Both of them are limited. But when calling a locking function, we almost (within reasonable limits) do not risk slowing down the basic set of actors.

Getting back to the mentioned above code, we can see Callback. They have left a lasting impression on the author. Those being familiar with functional languages can notice that Future behaves like a monad. Actually, it is a monad. This means that we can make compositions of them and partially apply functions. All of that is the Futures module.

Even if we do not go into details of the things that are far from the classic Java, it is a possibility of creating asynchronous operations in the code, without creating flows and thinking up the architecture. We can also input by organizing chains of “read a file, ask a user and, at the same time, process the data; save the result depending on the respond” type. All of that should be done quickly, simply and compactly.

Of course, a fly in the ointment in this situation is debugging. The code is extremely reactive. Akka executor helps you in many ways. For instance, exceptions are not derived inside a future. At all. If something has happened there, you will learn about it when noticing that OnComplete call is absent. And you will never know, what exactly happened. There’s a dirty hack for the more handy debugging — recover. Akka is famous for its failsafe policy. Futures are no exception. Therefore, they have inbuilt tools to remove an incorrect result of a future. It looks like the following:

Future<String> f = future(new Callable<String>() { @Override public String call() throws Exception { //Some blocking stuff return "hello habr"; } }, getContext().dispatcher()); f.recover(new Recover<String>() { @Override public String recover(Throwable arg0) throws Throwable { arg0.printStackTrace(); return null; } }, getContext().dispatcher()); f.onComplete(new OnComplete<String>(){ @Override public void onComplete(Throwable arg0, String arg1) throws Throwable { System.out.println(arg0 != null ? arg0 : arg1); } }, getContext().dispatcher());

We can see that recover does not fix the error in the given example and returns null to onComplete. At least, it writes about it. As a result, it is clear that Future is an extremely powerful tool. It allows to implement asynchronous branching right amid the asynchronous code and not be afraid of locks. But it does require some caution.

There is a useful and widespread template for the world of actors. It is based on Future and its name is ask.

Future<Object> ask = Patterns.ask(ActorRef/ActorSelection, Object, timeout) It sends the Object message to ActorRef/ActorSelection actor and waits for the timeout reply. If it has not awaited it, we will get an exception. If otherwise, it calls the OnComplete method. We will delete the temporary actor after execution. Future does not use a pool of flows for actors. Therefore, we should not be afraid of locking operations. As well as regular futures, asks can be inbuilt without limit. A similar template allows not to create unnecessary actors for the execution of simple tasks and organize event-driven sequences of actions.

Another useful derivative from Future is the agent that is the carrier of the asynchronous state transition.

Agent is a serialized object. All its copies are bound with a serial data bus. The state of each instance of an agent is available instantly. In addition to that, there is an ability to update asynchronously the state of all agents. It works like this:

Let’s create an agent

Agent<Config> agent = Agent.create(new Config(), getContext()) We can send it in the form of a message, or even pass it to another machine (more about it soon). We can extract the instance of Config from it at any moment with the help of .get() method. But its most useful feature is send(config) method. It sends an argument as a state to all instances of the agent regardless of their location. I guess, that’s all about the basic tools. It’s time to go to the network.

TCP/IP

Even if we look at the list of libraries in the supply, it becomes clear that the network is operating on netty. As for the framework, we have to deal with a much bigger level of abstraction.

Let’s consider a real case of parsing the incoming data by quite a real example. We are going to review the case of parsing arbitrary TCP packets. It’s good form to distinguish packet parsing from their logic. Therefore, we’re going to need a codec and a handler. The raw data in the form of ByteString (more about it below) comes to a codec-actor. There it forms ready packets. After that, the data is sent to the handler-actor, where the logic affects it. Let’s create an actor receiving incoming connections. It will also create a codec + handler chain.

public class TCPListener extends UntypedActor { private ActorRef net; @Override public void preStart() throws Exception { net = Tcp.get(getContext().system()).manager(); net.tell(TcpMessage.bind(getSelf(), InetSocketAddress.createUnresolved("127.0.0.1", 90), 100), getSelf()); } @Override public void onReceive(Object arg0) throws Exception { if (arg0 instanceof Connected) { Connected msg = (Connected) arg0; ActorRef handler = getContext().actorOf(Props.create(PacketHandler.class)); ActorRef codec = getContext().actorOf(Props.create(TCPCodec.class, getSender(), handler)); //The Actor codec accepts the handler by the constructor argument getSender().tell(TcpMessage.register(codec), getSelf()); //The raw data will go to the codec net.tell(msg, getSelf()); } unhandled(arg0); } }

Now, let’s take a look at the codec logic. As for the handler logic, it is quite trivial and depends on the type of data you are parsing and things you need from it.

public class TCPCodec extends UntypedActor { private ActorRef connection, target; private ByteString buffer = ByteStrings.empty(); public TCPCodec(ActorRef conn, ActorRef target) { this.connection = conn; this.target = target; } @Override public void onReceive(Object arg0) throws Exception { if (arg0 instanceof Received) { buffer = buffer.concat(decrypt(((Received) arg0).data())); return; } }

Well, it turns out to be a sort of a “zen-code” that is similar to the current one, but does nothing. We have received the incoming data and written it to the previous one. Ok, — you might say, — the incoming flow of bytes is put into buffer. But how do we extract real strings, numbers and arrays from it? Akka does not know the data type you are going to transfer. Therefore, we will have to write the methods of operation ourselves. Something like this:

public class ReadableProtocolBuffer implements DataInput { public static final Charset CHARSET = StandardCharsets.UTF_8; private ByteString string; private int index; public ReadableProtocolBuffer(ByteString string) { this.string = string; this.index = 0; } @Override public void readFully(byte[] b) { readFully(b, 0, b.length); } @Override public boolean readBoolean() { return readByte() != 0; } @Override public byte readByte() { return string.apply(index++); } @Override public int readUnsignedByte() { return readByte() & 0xFF; } @Override public short readShort() { return (short) readUnsignedShort(); } @Override public int readUnsignedShort() { int h = readUnsignedByte(), l = readUnsignedByte(); return (h<<8)+l; }

I guess, that’s it. Since the case was general, it looks quite low level. By analogy with the provided above example, we can write reading from the ByteString data of an arbitrary structure. If we need to send data to a client:

connection.tell(TcpMessage.write(ByteString data, getSelf()); In general terms, that’s the way akka operates. When connection is closed, the code receives Tcp.PeerClosed$ message. In this variant, the actors will not die after the close. That’s why we should kill them by sending PoisonPill killer-message. After receiving it, the actor terminates.

Clusterization and Addressing

Finally, we have made it to the main framework advantage. It’s the efficient clusterization out-of-the-box, with no efforts on the programmer’s side. But we should deal with actors addressing we have already mentioned before. There are three types of addressing:

Local relative: «kernel/core/worker»; Local absolute: “/learning2hard/user/kernel/core/worker” — the address begins with a slash; Network absolute: «akka.tcp://learning2hard@testhost.com:100500/user/kernel/core/worker».

Knowing the network address, we can send a message to a remote actor, in case it enters the same cluster as the current system of actors.

ActorSelection remove = getContext().actorSelection( "akka.tcp://learning2hard@testhost.com/user/kernel/core/worker"); Clusterization requires some config modifications, but we can also do it right from the code:

ActorSystem system = ActorSystem.create("learning2hard", ConfigFactory.parseString( "akka {

" + " actor.provider = \"akka.cluster.ClusterActorRefProvider\"

"+ " remote.netty.tcp {

"+ " hostname = \"127.0.0.1\"

"+ " port = 0" + " }

"+ " cluster.seed-nodes = [\"akka.tcp://learning2hardt@testhost.com+\"]

"+ "}"));

We can see that _remote_module is used here. But we have not mentioned it yet. It is exactly what provides communication between actors by means of the network. Clusterization is a wrapper over remote, but we are not going to dwell on it. port=0 parameter will be replaced with any other.

At this stage, the cluster is ready and can perform its functions. We are going to talk about them below.

All important events (connecting new nodes, disconnecting them, as well as problems with connection, metrics) are available to the cluster bus subscribers. Let’s take a look at the way it operates:

Cluster cluster = Cluster.get(getContext().system()); //the bus is initialized cluster.subscribe(getSelf(), ClusterEvent.initialStateAsEvents(), MemberEvent.class, UnreachableMember.class); //subscribed to receive events we’re interested in.

There’s also a class of such buses in akka. Its name is EventBus. There are three basic methods: publish, subscribe, unsubscribe. They are quite handy, but have nothing unique when comparing them with similar ones.

To balance the load, we should take off the metrics and the data about memory, the processor and the network load from each machine. Akka supports several information collectors. The most advanced and accurate one is sigar. To make akka use it, it’s enough to add sigar.jar to the list of the imported libraries. We should also add the necessary natives. All of that is googleable at the developers’ website. Metrics messages are regularly sent to the cluster bus.

Strange as it may sound, that’s it about the cluster theory. There is no doubt that there are plenty of fine points, such as the role of nodes, messages about reaching a certain size, node life cycles (as well as actor life cycles; we have not considered this question though). But you can learn about them on your own. Moreover, they are not so important at the beginning. As an example of working with a bus, let’s review a typical situation – sending a job to the newly-connected nodes.

Cluster cluster = Cluster.get(getContext().system()); cluster.subscribe(getSelf(), MemberUp.class); @Override public void onReceive(Object message) { if (message instanceof MemberUp) { if (member.hasRole("lazyslave")) getContext().actorSelection(member.address() + "/user/kernel").tell("hey, slave", getSelf()); return; } unhandled(message); }

We can see the way remote nodes are addressed. There is something new, though. It’s a role. In the given case its «lazyslave», which is just a parameter of a certain node, or rather a list of strings. It is defined in akka.cluster.roles configs.

We cannot but mention the specific routers for clusters. They do not differ from the mentioned above ones, except for the fact that they place (or search) actors on the local machine only, but on all (or with a certain role) nodes of the cluster. Here’s a common example:

int totalInstances = 100; int maxInstancesPerNode = 3; boolean allowLocalRoutees = false; String useRole = "compute"; ActorRef workerRouter = getContext().actorOf(new ClusterRouterPool(new ConsistentHashingPool(0), new ClusterRouterPoolSettings(totalInstances, maxInstancesPerNode, allowLocalRoutees, useRole)).props(Props.create(Worker.class)), "workerRouter");

I guess, that’s about it. Akka is a good and efficient framework for parallel computations. I could even call it the best in its class. It also has low barriers to entry, in contrast to the “manual” parallelism, synchronizations, mutual exceptions, race conditions and other things.

I am providing the list of answers to the questions that seemed interesting and nontrivial. You are most welcome to write any additions and requests.

FAQs

— You have mentioned failsafe. But I have not met it anywhere, except for the futures that are “silent” regarding exceptions. Why?

Not to waste time of readers, the strategies of actors’ behavior in case of an exception appearance have been omitted. Simply put, an actor has a supervising strategy that informs him what to do in case if any of his child actors throws an exception. There are 4 basic actions:

kill the child actor;

restart the child actor;

do nothing about it;

throw an exception itself.

To learn more about it, you should visit the framework website.

— You have suggested creating chains of Futures that have been acquired by means of Patterns.ask. They create a temporal actor for each instance. Then the actor is killed. Is it resource-consuming? Moreover, you have called the method of creating actors and then killing them after the task termination as inefficient.

Akka actors are very light and consume few resources. You could google the results of benchmarks. As for the temporal actors created by ask, they consume even less resources. They are removed as soon as the timeout finishes. Bringing no supervising strategies, they are much lighter. Their lifecycle is not controlled. Therefore, we should avoid creating redundant actors, but we should not be afraid of it.

— What is the optimum number of actors in a system?

Whatever you need. As for business solutions, it is suggested to create one actor for each set of stored states. Sometimes their amount is more than 4 or 5 million per node. Nevertheless, it operates successfully.

— Is searching actors by name efficient?

It is optimized well and uses Red-black trees. Thus, it’s quite efficient. By the way, it is quite obvious that the search speed is quite high on certain routes.

Actors and Java 8

Its worth noting that actors supporting lambda calculations are experimental. Therefore, it is not recommended to use them in production.

We can use lambda-actors together with regular ones. Their basic distinction is the inherited class. In the given case its AbstractActor. The argument of receive method is of PartialFunction type, which shows what and how the actor can process. But we should not be afraid of bulky types. There’s ReceiveBuilder constructor of the partial function for that purpose. A typical declaring an actor looks like the following:

public class EchoActor extends AbstractActor { private final LoggingAdapter log = Logging.getLogger(context().system(), this); public EchoActor() { receive(ReceiveBuilder. match(String.class, s -> { log.info("Echo string: {}", s); }). match(Integer.class, i -> { log.info("Input integer: {}",i); }). matchAny(o -> log.info("Unknown messsage")).build() ); }

This provides the more handy (when getting used to it) syntax. But it is known that the lambda calculus is ill-optimized in Java. It is the syntactic sugar, which has a negative impact on performance. Akka developers warn about it.