Akka references serialization with Protobufs (up to Akka 2.5)

or what type should the SinkRef be of?


Until version 2.6, Akka used Java serialization by default, which is known to be less than perfect for various reasons that I’m not gonna point out here. So if your class is merely Serializable, it’ll be serialized that way. If you use Akka remoting with Protobuf and your messages inherit from Protobuf’s GeneratedMessage, then the “proto” serializer kicks in. The whole serialization mechanism is flexible and fully customizable, so you can configure what gets serialized and how.
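To make the "what and how" part concrete, here is a minimal sketch of binding a message class to the “proto” serializer. The class name com.example.proto.SubscribeToControlData is a hypothetical placeholder, and the sketch assumes akka-remote (which ships akka.remote.serialization.ProtobufSerializer) is on the classpath.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object SerializationConfig {
  // Overrides that route one message class to the Protobuf serializer.
  // "com.example.proto.SubscribeToControlData" is a hypothetical class name;
  // replace it with your own generated message type.
  val overrides: Config = ConfigFactory.parseString(
    """
      |akka.actor {
      |  serializers {
      |    proto = "akka.remote.serialization.ProtobufSerializer"
      |  }
      |  serialization-bindings {
      |    "com.example.proto.SubscribeToControlData" = proto
      |  }
      |}
      |""".stripMargin)

  // Layer the overrides on top of application.conf and reference.conf.
  val config: Config = overrides.withFallback(ConfigFactory.load())
}
```

The resulting config can then be passed when creating the actor system, e.g. ActorSystem("app", SerializationConfig.config).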

For this post let’s assume there is an Akka-clustered application set up and we need to send messages to some actors. As actors are location-transparent, we don’t know whether the receiver is remote or local. Additionally, these messages may contain Akka-specific references such as ActorRef, or StreamRefs such as SourceRef or SinkRef, e.g. for establishing streams remotely. We also want these messages to be serialized with the Protobuf serializer, not the default Java one.

Let’s say we have a message with the following structure to send:

case class SubscribeToControlData(
  config: StreamConfig,
  sink: SinkRef[ControlData]
)

First we need a Protobuf message definition, so that we can build an actual Protobuf object to serialize when the message goes over the wire. So let’s get to that.

First question: what type should the sink field have? There is no SinkRef type available to us in Protobuf. Actually, if you search through the Akka Streams sources you can find some definitions (https://github.com/akka/akka/blob/master/akka-stream/src/main/protobuf/StreamRefMessages.proto), but they aren’t published in the Akka artifacts, so we cannot really use them directly.

In order to serialize it correctly we need to use the Akka serialization infrastructure (look up and use a specific serializer from the pool of configured ones, etc.). Taking this into account, the final version of our message looks like this:

message SubscribeToControlData {
  StreamConfig config = 1;
  SerializedAkkaPayload sink = 2;
}

// Serialized Akka object, e.g. SinkRef.
message SerializedAkkaPayload {
  bytes payload = 1;
  string manifest = 2 [(scalapb.field).no_box = true];
  int64 serializer_id = 3 [(scalapb.field).no_box = true];
}

The serializer id and the string manifest are needed so that the receiving side has all the information required to deserialize the payload to the correct type. Let’s write code to serialize/deserialize SubscribeToControlData (or actually any other message that contains any of the Akka references: ActorRef, SinkRef, SourceRef etc.).

Akka refs serialization

Next, let’s go through this code. First, we need to grab the serialization extension from the actor system at hand so that the serialization infrastructure is available to us. Next, we grab the serializer that is configured for the given class instance (in our case SinkRef). This is one of the serializers preconfigured out of the box in Akka. Then we need a sort of hint (the manifest) telling the deserializer what type to expect, as a single deserializer can potentially handle various classes. Lastly, we take the actual ref and turn it into a byte sequence. With all the required information in hand, we can build a SerializedAkkaPayload that can be safely sent over the wire.

Deserialization is a bit simpler: it is just a one-liner that, given the serializer id, manifest and byte sequence, turns the bytes into an instance of the desired class (with a little help from asInstanceOf).
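The steps described above can be sketched roughly as follows. This is an illustration rather than the original listing: SerializedAkkaPayload here is a plain case class standing in for the ScalaPB-generated one (which would carry a Protobuf ByteString rather than a byte array), and the AkkaRefSerialization object and its method names are made up for this sketch. It assumes Akka 2.5.

```scala
import akka.actor.ActorSystem
import akka.serialization.{SerializationExtension, Serializers}

// Stand-in for the ScalaPB-generated message (assumption, for self-containment).
final case class SerializedAkkaPayload(
  payload: Array[Byte],
  manifest: String,
  serializerId: Long
)

object AkkaRefSerialization {

  // Turns any Akka reference (ActorRef, SinkRef, SourceRef...) into a payload.
  def serialize(ref: AnyRef)(implicit system: ActorSystem): SerializedAkkaPayload = {
    // 1. Grab the serialization extension from the actor system.
    val serialization = SerializationExtension(system)
    // 2. Look up the serializer configured for this instance's class.
    val serializer = serialization.findSerializerFor(ref)
    // 3. Compute the manifest, i.e. the type hint for the deserializer.
    val manifest = Serializers.manifestFor(serializer, ref)
    // 4. Turn the ref into bytes.
    val bytes = serializer.toBinary(ref)
    SerializedAkkaPayload(bytes, manifest, serializer.identifier.toLong)
  }

  // Turns the payload back into an instance of the expected type.
  def deserialize[T](p: SerializedAkkaPayload)(implicit system: ActorSystem): T =
    SerializationExtension(system)
      .deserialize(p.payload, p.serializerId.toInt, p.manifest)
      .get
      .asInstanceOf[T]
}
```

Note that deserialize returns a Try, so calling .get rethrows any deserialization failure; in production code you may prefer to handle the failure explicitly.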

With the setup above we should be good to go, but it turns out the stream binding cannot be established. This is because in a remote environment we are missing one important detail: the system’s correct remote transport address. Without it, the ref gets serialized (and later deserialized) on the assumption of a local address, which results in a non-existent actor and, consequently, in the inability to set up our stream properly. Without the address included, the deserialized SinkRef looks like this:

SinkRefImpl(Actor[akka://app/user/app-actor-supervisor/app-actor/StreamSupervisor-12/$$c-SourceRef-1#336780572])

As you can see, “locality” is assumed in the actor path: it starts with akka://app and carries no host or port.
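To illustrate the difference between the two forms, here is a small sketch that renders the same actor path with and without an address. The path, the akka.tcp protocol, the host and the port are all hypothetical values chosen for the example.

```scala
import akka.actor.{ActorPath, Address}

object PathLocality {
  // A path as seen locally: no host or port (hypothetical example path).
  val localPath: ActorPath = ActorPath.fromString("akka://app/user/app-actor")

  // Hypothetical remote transport address of the system owning the actor.
  val remoteAddress: Address = Address("akka.tcp", "app", "10.0.0.1", 2552)

  // Serialized without transport information: stays "local".
  val withoutAddress: String = localPath.toSerializationFormat

  // Serialized with the address filled in: resolvable from another node.
  val withAddress: String = localPath.toSerializationFormatWithAddress(remoteAddress)

  def main(args: Array[String]): Unit = {
    println(withoutAddress) // akka://app/user/app-actor
    println(withAddress)    // akka.tcp://app@10.0.0.1:2552/user/app-actor
  }
}
```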

The easiest way to include the correct remote address in the serialized message is to wrap the actual serialization with Serialization.withTransportInformation.

Now the ultimate version of our serialization toolbox looks as follows:

Akka refs serialization with correct system address included
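A minimal sketch of what such a toolbox might look like is shown below. As before, SerializedAkkaPayload is a plain stand-in for the ScalaPB-generated class, and the object and method names are invented for illustration. The sketch also assumes that the ActorSystem at hand can be cast to ExtendedActorSystem, which Serialization.withTransportInformation requires in Akka 2.5.

```scala
import akka.actor.{ActorSystem, ExtendedActorSystem}
import akka.serialization.{Serialization, SerializationExtension, Serializers}

// Stand-in for the ScalaPB-generated message (assumption, for self-containment).
final case class SerializedAkkaPayload(
  payload: Array[Byte],
  manifest: String,
  serializerId: Long
)

object AkkaRefSerialization {

  def serialize(ref: AnyRef)(implicit system: ActorSystem): SerializedAkkaPayload = {
    val serialization = SerializationExtension(system)
    val serializer = serialization.findSerializerFor(ref)
    val manifest = Serializers.manifestFor(serializer, ref)
    // Run the actual serialization with transport information in scope, so
    // that actor paths are written with the system's remote address.
    val bytes = Serialization.withTransportInformation(
      system.asInstanceOf[ExtendedActorSystem]
    ) { () =>
      serializer.toBinary(ref)
    }
    SerializedAkkaPayload(bytes, manifest, serializer.identifier.toLong)
  }

  // Deserialization is unchanged: the address now travels inside the payload.
  def deserialize[T](p: SerializedAkkaPayload)(implicit system: ActorSystem): T =
    SerializationExtension(system)
      .deserialize(p.payload, p.serializerId.toInt, p.manifest)
      .get
      .asInstanceOf[T]
}
```

Only the toBinary call needs to run inside the wrapper, since that is where the ref's actor path is turned into bytes.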

With this in place we’re done. Our deserialized SinkRef instance turns into the following, correct form (with address included) and our remote streaming gets up and running as expected: