Introducing elastic actors, a distributed, persistent actor system

So what’s the alternative? What if your cache were updated at the same time as your database? What if you had a framework that not only took care of that for you, but also made sure that the system is load balanced and horizontally scalable? That’s where the actor system comes in for BUX. We use a distributed, persistent actor system called elasticactors. It’s an open source framework, developed by Joost, CTO and co-founder of BUX. You can take a look at all the framework code here:

Simply put, the state of each user in the system is represented in JSON format:

Again, in real life things are a bit more complex with regard to the data structure. But the above snippet is representative of what we actually store about a user at BUX.

This JSON is deserialized to (serialized from) a simple POJO:

The Portfolio object contains a list of positions and the cash balance object. The only indicator that this class is more than a POJO is the JacksonActorState class it extends.
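To make the shape of the state concrete, here is a minimal sketch of what such a set of POJOs could look like. All field names (userId, securityId, quantity, currency, amount) are assumptions made for illustration, and the sketch leaves out the JacksonActorState base class so that it compiles without the framework on the classpath.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// One open position; the field names are illustrative assumptions.
final class Position {
    private final String securityId;
    private final BigDecimal quantity;

    Position(String securityId, BigDecimal quantity) {
        this.securityId = securityId;
        this.quantity = quantity;
    }

    String getSecurityId() { return securityId; }
    BigDecimal getQuantity() { return quantity; }
}

// The cash a user holds, in a single currency (an assumption of this sketch).
final class CashBalance {
    private final String currency;
    private final BigDecimal amount;

    CashBalance(String currency, BigDecimal amount) {
        this.currency = currency;
        this.amount = amount;
    }

    String getCurrency() { return currency; }
    BigDecimal getAmount() { return amount; }
}

// The portfolio: a list of positions plus the cash balance object.
final class Portfolio {
    private final List<Position> positions = new ArrayList<>();
    private final CashBalance cashBalance;

    Portfolio(CashBalance cashBalance) { this.cashBalance = cashBalance; }

    List<Position> getPositions() { return positions; }
    CashBalance getCashBalance() { return cashBalance; }
}

// In the article's code base this class extends JacksonActorState; that base
// class is omitted here so the sketch stands on its own.
final class UserState {
    private final String userId;
    private final Portfolio portfolio;

    UserState(String userId, Portfolio portfolio) {
        this.userId = userId;
        this.portfolio = portfolio;
    }

    String getUserId() { return userId; }
    Portfolio getPortfolio() { return portfolio; }
}
```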

Accessing the actor state

We call the above Java class (or the JSON representation of it) the “actor state”. In order to access the above state we have to “ask” the user actor for its own state. The user actor itself is the only one that has access to that state. We can do that from a simple REST endpoint. And using Spring 5 WebFlux functionality, we can keep the boilerplate code to a minimum:

In a single line of code, I’m “asking” the actor for its state (or a part thereof), by sending it a message called PositionsRequest. And I’m expecting a response of the class “PositionsResponse”. For completeness, this is how these classes might look:

The PositionsRequest is an empty class used as an indicator to instruct the actor to perform a certain action. Note the “@Message” annotation.

The PositionsResponse contains the list of positions and is forwarded to the requesting client. Note, again, the “@Message” annotation.
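The request/response round trip can be illustrated without the framework. The following is a rough sketch, not elasticactors code: ToyUserActor is a hypothetical stand-in whose "mailbox" is a single-threaded executor, positions are simplified to strings, and the “@Message” annotation is left out because it belongs to the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Marker message: it carries no data, it only tells the actor what is wanted.
final class PositionsRequest { }

// Reply message: holds a copy of the positions taken from the actor state.
final class PositionsResponse {
    private final List<String> positions;

    PositionsResponse(List<String> positions) {
        this.positions = new ArrayList<>(positions); // defensive copy
    }

    List<String> getPositions() { return positions; }
}

// Hypothetical stand-in for an actor: the state is private and is only read
// on the mailbox thread, one message at a time.
final class ToyUserActor implements AutoCloseable {
    private final List<String> positions = new ArrayList<>();
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    ToyUserActor(List<String> initialPositions) {
        positions.addAll(initialPositions);
    }

    // "ask": enqueue the request and return a future that completes with the reply.
    CompletableFuture<PositionsResponse> ask(PositionsRequest request) {
        return CompletableFuture.supplyAsync(() -> new PositionsResponse(positions), mailbox);
    }

    @Override
    public void close() { mailbox.shutdown(); }
}
```

A caller would obtain the reply with something like `actor.ask(new PositionsRequest()).join()`; in a reactive controller the future would be handed to the web layer instead of being blocked on.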

What’s an actor?

I’m sure you’re seeing a pattern by now. Almost everything in our framework is a POJO. Some annotations here and there, maybe an interface to implement. But by having simple building blocks, everything is straightforward, easy to follow, understand and code against. Once a new developer learns a couple of conventions we have in our code base, they’re good to go almost from day one.

But what is the “ActorRef” object I’m creating and referencing in the controller? As the name says, it’s the “actor reference”, the class that represents the actual actor in our system. An actor always has two components: the actor state and the actual actor which is responsible for all the business logic.

The actor handles the PositionsRequest class by answering with a PositionsResponse to whoever sent the original message (sender.tell method call).

I’m overriding the “onReceive” method that handles all incoming messages. Right now, I’m only handling a single message, but you can imagine that as an actor grows, its responsibilities grow as well. It will end up handling more messages, and an endless if/else chain or a big switch statement is not easy to maintain. Fortunately, the framework allows for separate message handlers for each individual message. All we have to do is create a method with the appropriate annotation and we’re good to go:

This gives exactly the same functionality as overriding the onReceive method, but achieved in a more elegant way.
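How a framework might route messages to annotated handler methods can be sketched with plain reflection. Everything below is hypothetical: the @Handles annotation, the DispatchingActor base class and the CashBalanceRequest message are stand-ins invented for this illustration and are not the elasticactors API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for handler methods; not the framework's own annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Handles { }

final class PositionsRequest { }
final class CashBalanceRequest { } // hypothetical second message, for illustration

// Routes each message to the annotated method whose single parameter type matches.
abstract class DispatchingActor {
    final Object onReceive(Object message) {
        for (Method method : getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(Handles.class)
                    && method.getParameterCount() == 1
                    && method.getParameterTypes()[0].isInstance(message)) {
                try {
                    method.setAccessible(true);
                    return method.invoke(this, message);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("Handler failed: " + method.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException("No handler for " + message.getClass().getName());
    }
}

// One small method per message type instead of a growing if/else chain.
final class SketchUserActor extends DispatchingActor {
    private final List<String> positions = new ArrayList<>();
    private final BigDecimal cash;

    SketchUserActor(List<String> positions, BigDecimal cash) {
        this.positions.addAll(positions);
        this.cash = cash;
    }

    @Handles
    List<String> handle(PositionsRequest request) {
        return new ArrayList<>(positions);
    }

    @Handles
    BigDecimal handle(CashBalanceRequest request) {
        return cash;
    }
}
```

Adding support for a new message then means adding one annotated method; the dispatch loop itself does not change.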

Under the hood

You’re probably wondering, by this point, how the actor receives the message and how the controller receives the response. It’s all done by the actor framework: through RabbitMQ (or any other messaging bus, if desired) the actors listen to messages and can also send replies on the same messaging bus. The framework hides a lot of this complexity, obviously. For example, in the controller, when the ask method is called, a lot of things happen in the background for the controller to be able to listen to a possible response from the actor it’s “asking” for positions.

Changing actor state

Of course, queries are only half the picture. What about actually changing the actor state? Well, same as a query, everything is done by sending the actor a message. The actor will have to know how to handle that message. We can build a new message class that will change the actor’s state, let’s say by adding a position to the user’s portfolio.

In the controller we can again have a very simple endpoint that sends this message to the actor:

A POST method that sends the above-defined message to the user actor.

In this case, I’m not waiting for a reply, so I’m not “asking” the actor anything. I can then use the “tell” method. The actor will handle the message in a very similar way to the PositionsRequest message:

Business logic definitely has a place in a message handler. Note that because no response is required, none is sent from the handler method.

The next time we ask the actor for its state, we will receive the updated positions view.

Actor system data storage

And that’s it! The actor is now able to answer our request and “tell” us what its state is and we are able to change said state. But how does the actor come to be? Where is its state stored, and how is it retrieved in the first place? When handling a message, the actor has access to the state that is loaded in memory and can modify it. The next time it handles a message, any modifications will be visible, because the modified state simply stays in memory. But what about when the server restarts? What if the actor state is not loaded in memory, where does it come from? Well, the answer to that is simple: from the persistent storage!

The actor framework has a simple FIFO cache in which it keeps the actor states. Whenever an actor is accessed and its state is not yet in the cache, the framework loads the JSON state from the persistent storage. At BUX we persist all our actors in a Cassandra database, but this can be replaced with any type of storage. We chose Cassandra because it’s right for our use case: it has excellent write performance, it can scale and it’s highly available. When an actor state changes, it is persisted both in the cache and in Cassandra. By default, after handling every message, an actor will persist its own state. The developer does not need to worry about that; it’s handled automatically by the framework. Of course, it’s a waste to persist the actor state if it has not changed (when handling a query, for example). We are in control of which messages trigger a state persist, with a simple annotation:
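The read path described above (look in the cache first, fall back to storage on a miss, evict the oldest entry when full) can be sketched with the JDK alone. This is not the framework's cache implementation: FifoStateCache is a hypothetical class, the loader function stands in for a read from Cassandra, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// FIFO cache sketch: a LinkedHashMap in its default insertion order evicts
// the entry that was inserted first, regardless of how recently it was read.
final class FifoStateCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stand-in for a read from persistent storage

    FifoStateCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        this.cache = new LinkedHashMap<K, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached state, loading it from storage on a cache miss.
    V get(K actorId) {
        V state = cache.get(actorId);
        if (state == null) {
            state = loader.apply(actorId);
            cache.put(actorId, state);
        }
        return state;
    }

    boolean isCached(K actorId) { return cache.containsKey(actorId); }
}
```

With a capacity of 2, loading actors a, b and then c evicts a, even if a was read again just before c was loaded; that is the difference between FIFO and an LRU policy.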

Note the “@PersistenceConfig” annotation. We are purposely excluding the PositionsRequest message. This means that after handling that message the actor state will not be persisted to the actor storage. Any changes made in handling of that message will be lost with a server restart. For all other messages, state is persisted.
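The effect of such an exclusion can be illustrated with a small stand-alone sketch. It does not use the “@PersistenceConfig” annotation itself: PersistingUserActor is a hypothetical class, the excluded message types are passed to its constructor, an in-memory map stands in for Cassandra, and positions are simplified to strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PositionsRequest { }

final class AddPosition {
    final String securityId;
    AddPosition(String securityId) { this.securityId = securityId; }
}

// Hypothetical actor that writes its state to storage after every message,
// except for message types that were explicitly excluded.
final class PersistingUserActor {
    private final String actorId;
    private final Map<String, List<String>> storage; // stand-in for Cassandra
    private final Set<Class<?>> excluded;             // messages that never trigger a write
    private final List<String> positions = new ArrayList<>();
    private int writes = 0;

    PersistingUserActor(String actorId, Map<String, List<String>> storage, Class<?>... excluded) {
        this.actorId = actorId;
        this.storage = storage;
        this.excluded = new HashSet<>(Arrays.asList(excluded));
    }

    Object receive(Object message) {
        Object reply = null;
        if (message instanceof AddPosition) {
            positions.add(((AddPosition) message).securityId);
        } else if (message instanceof PositionsRequest) {
            reply = new ArrayList<>(positions);
        }
        // Persist unless this message type is excluded (typically pure queries).
        if (!excluded.contains(message.getClass())) {
            storage.put(actorId, new ArrayList<>(positions));
            writes++;
        }
        return reply;
    }

    int getWrites() { return writes; }
}
```

Constructed with `PositionsRequest.class` as the excluded type, the actor writes once for an AddPosition message and not at all for a PositionsRequest.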

OK, but I hear you say, what’s so special about this? This whole thing could have been accomplished with a simple database and a straightforward write to cache first, database second. I agree. That works fine, for a single server. But what happens when you want to scale out? Your cache will only be up-to-date on the server that handled the write request.

But will it scale?

We discussed what an actor system is and by this point I believe you have a pretty good idea about how our actor system stores and retrieves state. But what about the distributed part? Well, that’s where the “elastic” name comes in. The elasticactors system is partitioned into shards. When the system is bootstrapped for the first time, it’s configured with a certain number of shards. To give you an idea about scale, our production systems usually contain 256 shards. Each actor created in elasticactors is assigned to a single shard, by using a hashing algorithm on the actor id. That means that for as long as that actor exists, it will always be assigned to the same shard.
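The idea of mapping an actor id to a fixed shard fits in a few lines. This is a sketch under assumptions: the hash function elasticactors actually uses is not shown in this article, so String.hashCode serves as a stand-in, and ShardAssignment is a hypothetical class name.

```java
// Maps an actor id to one of a fixed number of shards.
final class ShardAssignment {
    private final int shardCount;

    ShardAssignment(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // String.hashCode is specified by the Java language, so the same id gives
    // the same value on every JVM; floorMod keeps the result non-negative.
    int shardFor(String actorId) {
        return Math.floorMod(actorId.hashCode(), shardCount);
    }
}
```

With this kind of scheme the mapping stays stable only while the shard count stays the same, which fits with the shard count being configured once, when the system is first bootstrapped.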

OK, so now we have distributed all actors across shards. The actual “distributed” part comes in when we add servers to the cluster. Each shard is assigned to a server through, you guessed it, a hashing algorithm. I’m not going to go into the details, but if the number of machines in a cluster stays the same, a shard is always guaranteed to live on the same server. Every time. So, now we can scale out our cluster.
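Since the details are skipped here, the following sketch uses rendezvous (highest-random-weight) hashing as a stand-in technique to show how a deterministic shard-to-server mapping can work. It is not necessarily what elasticactors does; ShardPlacement and the server names are hypothetical.

```java
import java.util.List;

// Rendezvous hashing sketch: every (server, shard) pair gets a score, and a
// shard lives on the server with the highest score for that shard.
final class ShardPlacement {
    private ShardPlacement() { }

    // SplitMix64 finalizer, used here only to spread the scores evenly.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    private static long score(String server, int shard) {
        return mix(((long) server.hashCode() << 32) ^ shard);
    }

    // Same server list in, same owner out: the result depends only on the inputs.
    static String serverFor(int shard, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers in the cluster");
        }
        String best = null;
        long bestScore = 0;
        for (String server : servers) {
            long s = score(server, shard);
            // Ties are broken by name so the result does not depend on list order.
            if (best == null || s > bestScore || (s == bestScore && server.compareTo(best) < 0)) {
                best = server;
                bestScore = s;
            }
        }
        return best;
    }
}
```

As long as the list of servers does not change, every call returns the same owner for a given shard. When a server is added, a shard either stays where it was or moves to the new server, so existing servers never exchange shards among themselves.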

Actors are represented by smiley faces. Each shard will hold more or less the same number of actors.

Load balancing

The load balancer is not aware of where a request needs to be handled, but in practice that’s not a big problem. A controller on any of the servers in the cluster gets the REST request. It then forwards a request to an actor. Does that actor live on a shard assigned to the same server? Great, the actor framework just forwards the message to the actor with one fewer network hop. If it does not, no problem: the message is simply sent on the messaging bus and reaches the actor anyway. Because the actor state is kept in memory, queries are really fast. And because the actors are distributed across the whole cluster and the framework itself makes sure that the right actor receives the right message, developers don’t have to worry about keeping the cache in sync. And, finally, because elasticactors is designed to scale, servers can just be added to the cluster without any problems. When a new server joins the cluster, shards from existing servers are redistributed as equally as possible across all the machines. And when a shard moves to a new machine, so does every actor that is assigned to that shard.

By scaling horizontally, not only do we reduce the number of queries each server needs to perform, but we also create more memory space for our actors. You can imagine that as an app grows, so does its user base. If yesterday you could fit most of your user actors in memory on 3 servers, tomorrow 5 might barely be enough. For an actor system to be as efficient as possible, it needs to have as many as possible (if not all) of the regularly queried actors available in memory.

The main difference between having a single database and a distributed actor system is how you think about your data. An actor becomes both data (through the actor state) and business logic (through the message handlers). Obviously, in certain cases, this is not desired. But for many of our use cases our actor system gives us great performance out of the box with scalability included for free. It’s also a very simple model to develop against, making for a really shallow learning curve for new hires.

In conclusion

Now, should you start using actors for all your business problems? As the old saying goes, if you only have a hammer, everything looks like a nail. There are limitations to the actor model. The biggest one is that, by default, you simply cannot ask the system a question such as “how many users have fewer than 2 positions?”. This would be a really simple query in many databases. But in our actor system you can only get the state of a single actor at a time. Of course, there are ways around that, but they usually involve relying on a separate database for queries like this.

I mentioned in the beginning that an actor system is not a silver bullet. I hope that now that we’ve reached the end of this article, you have a better understanding of what an actor system can do for you and, more importantly, where the caveats are when using one.