As a continuation of my friend’s great post about schema evolution, I would like to focus on combining schema evolution (via Schema Registry) with a topic that carries multiple message types. Currently, Schema Registry doesn’t support multiple schemas for one topic. OK, to be precise: it does support multiple schemas, but without any backward/forward compatibility checks. That is not much use to me, because I want to manage schema compatibility for each particular message type.

Fortunately, Apache Avro provides something called unions. If you don’t fully understand the linked documentation, don’t worry: it will become clear once you see a basic example. Let’s say we want to produce Email, Sms and PushNotification messages to a single topic. To do that, we need to wrap those messages in a single message type, e.g. MessageToSend.

public class MessageToSend {
    String type;
    String correlationId;
    Object payload;
}

The payload field is basically a union: a field that can hold an Email, an Sms or a PushNotification.

Yep, the Object type requires nasty things like casting. But this is the only way to represent unrelated types in Java. In Scala it is a little better: with avro4s and a bit of shapeless magic, the compiler will verify that you read and write only valid objects in this field. Sooner or later, though, you will see constructions like this:

case Inr(Inr(Inr(Inl(emailPayload)))) => ...

Just hide them behind some nice methods and don't spread shapeless dependencies across all classes.
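The same advice works on the Java side, where the union is a plain Object. Below is a minimal sketch of how the casts could be hidden behind typed accessors. The payload classes here are hypothetical, trimmed-down stand-ins for the real ones, and the accessor names (`sms()`, `email()`, `pushNotification()`) as well as `PayloadAccessDemo` are my own invention for illustration.

```java
import java.util.Optional;

// Hypothetical stand-ins for the real payload types (fields trimmed for brevity).
final class Sms {
    final String phoneNumber;
    Sms(String phoneNumber) { this.phoneNumber = phoneNumber; }
}

final class Email {
    final String addressTo;
    Email(String addressTo) { this.addressTo = addressTo; }
}

final class PushNotification {
    final String arn;
    PushNotification(String arn) { this.arn = arn; }
}

final class MessageToSend {
    final String type;
    final String correlationId;
    final Object payload; // the union: Sms, Email or PushNotification

    MessageToSend(String type, String correlationId, Object payload) {
        this.type = type;
        this.correlationId = correlationId;
        this.payload = payload;
    }

    // Typed accessors: the only place where type checks and casts appear.
    Optional<Sms> sms() { return as(Sms.class); }
    Optional<Email> email() { return as(Email.class); }
    Optional<PushNotification> pushNotification() { return as(PushNotification.class); }

    private <T> Optional<T> as(Class<T> cls) {
        return cls.isInstance(payload) ? Optional.of(cls.cast(payload)) : Optional.empty();
    }
}

class PayloadAccessDemo {
    // Callers work with typed values and never touch Object directly.
    static String describe(MessageToSend message) {
        return message.email().map(e -> "email to " + e.addressTo)
            .orElseGet(() -> message.sms().map(s -> "sms to " + s.phoneNumber)
            .orElseGet(() -> message.pushNotification().map(p -> "push to " + p.arn)
            .orElse("unknown payload")));
    }

    public static void main(String[] args) {
        MessageToSend message = new MessageToSend("email", "corr-1", new Email("john@example.com"));
        System.out.println(describe(message)); // prints: email to john@example.com
    }
}
```

The rest of the code then asks for `message.email()` and gets an Optional back, so a wrong assumption about the payload type shows up as an empty value rather than a ClassCastException.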

The schema definition for such a message could look like this:

{
  "type": "record",
  "name": "MessageToSend",
  "namespace": "com.ludwikowski.schema",
  "fields": [
    { "name": "type", "type": "string" },
    { "name": "correlationId", "type": "string" },
    {
      "name": "payload",
      "type": [
        {
          "type": "record",
          "name": "Sms",
          "fields": [
            { "name": "phoneNumber", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        },
        {
          "type": "record",
          "name": "Email",
          "fields": [
            { "name": "addressTo", "type": "string" },
            { "name": "title", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        },
        {
          "type": "record",
          "name": "PushNotification",
          "fields": [
            { "name": "arn", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        }
      ]
    }
  ]
}

Now you should feel how useful unions are. Thanks to them, we have a single schema for multiple message types and we can use Avro schema evolution. Let’s say we need to add a new field, addressFrom, to the Email class, while staying backward compatible with previously created messages. That means the new field must have a default value.

The new schema could look like this:

{
  "type": "record",
  "name": "MessageToSendV2",
  "namespace": "com.ludwikowski.schema",
  "fields": [
    { "name": "type", "type": "string" },
    { "name": "correlationId", "type": "string" },
    {
      "name": "payload",
      "type": [
        {
          "type": "record",
          "name": "Sms",
          "fields": [
            { "name": "phoneNumber", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        },
        {
          "type": "record",
          "name": "EmailV2",
          "aliases": ["Email"],
          "fields": [
            { "name": "addressFrom", "type": "string", "default": "andrzej@test.pl" },
            { "name": "addressTo", "type": "string" },
            { "name": "title", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        },
        {
          "type": "record",
          "name": "PushNotification",
          "fields": [
            { "name": "arn", "type": "string" },
            { "name": "text", "type": "string" }
          ]
        }
      ]
    }
  ]
}

For the sake of the demonstration (and the Java compiler), I added those V2 suffixes to the names. They are redundant and you should usually avoid them. As you can see in this test, writing a MessageToSend and reading it as a MessageToSendV2 ends with addressFrom filled with the default value.
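To make the rule behind that result concrete, here is a simplified model of it in plain Java. It is not Avro’s API: `ReaderField`, `DefaultResolutionSketch` and `resolve` are hypothetical names, and records are reduced to string maps. It only mimics how a reader treats record fields: take the written value if the writer had the field, otherwise use the reader’s default, and fail if there is neither.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for one field of the reader's record schema.
final class ReaderField {
    final String name;
    final String defaultValue; // null means "no default declared" in this sketch

    ReaderField(String name, String defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

final class DefaultResolutionSketch {
    // Reader's view of EmailV2: only addressFrom declares a default.
    static final List<ReaderField> EMAIL_V2 = Arrays.asList(
        new ReaderField("addressFrom", "andrzej@test.pl"),
        new ReaderField("addressTo", null),
        new ReaderField("title", null),
        new ReaderField("text", null));

    // For each reader field: use the written value, else the default, else fail.
    static Map<String, String> resolve(Map<String, String> written, List<ReaderField> readerFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.defaultValue != null) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // An Email written with the old schema, which has no addressFrom field.
        Map<String, String> oldEmail = new LinkedHashMap<>();
        oldEmail.put("addressTo", "john@example.com");
        oldEmail.put("title", "Hi");
        oldEmail.put("text", "Hello");

        // Prints: {addressFrom=andrzej@test.pl, addressTo=john@example.com, title=Hi, text=Hello}
        System.out.println(resolve(oldEmail, EMAIL_V2));
    }
}
```

This is also why the default matters: had addressFrom been added without one, old Email messages could not be read with the new schema at all.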

The same thing will happen when we use Schema Registry. Your Kafka consumer will be able to correctly read both versions of MessageToSend. Unfortunately, until this PR is merged, it is not possible to simulate this in a single unit test. That’s why I implemented 2 separate modules to verify the Schema Registry behaviour; please follow the instructions on how to run them.

Finally, if you’re wondering how to generate those schemas, just look at this test.