Creating a Blueprint for Microservices and Event Sourcing on AWS

Using Kinesis, Lambda, S3, DynamoDB, API Gateway and Kubernetes

This is the third installment of your quest for an architectural holy grail, where you have been mercilessly strangling monoliths and courageously fighting monster architectures to finally embrace the brave new world of container orchestration with Kubernetes and microservices.

It’s time now to embark on yet another exhilarating part of this story, where you will build an Event Sourcing/CQRS/microservices architecture alongside your monolithic application, all by yourself, with just a little help from your friendly neighborhood cloud provider, Amazon Web Services.

I need to add a disclaimer here to prevent unneeded cloud wars: this is about building this architecture on the AWS platform, leveraging AWS-specific managed services. While I don’t think that every cloud provider is created equal, I also think that the other two in the top tier, Azure and GCP, will provide you with enough tools to build everything, if not today, then tomorrow for sure. I just happen to know AWS better.

Building an Event Sourcing/microservices architecture has been possible for many years already; in fact, none of these concepts is completely new, and companies with large resources have been building similar systems for a while. But it’s only today that, by leveraging fully managed cloud services, it has become simple enough to build with much more limited resources and effort.

You will use Kinesis Data Streams as the Event Stream, Kinesis Firehose to backup all the events in your S3 data lake, DynamoDB as a persistent Event Store to overcome Kinesis limits for message retention, Lambda to both subscribe to Kinesis events and implement simple microservices, API Gateway and/or CloudFront Lambda@Edge to route requests to the monolith or the microservices, and Kubernetes to run the rest of your services and your monolith happily together.

If you are feeling a little hesitant and skeptical about AWS PaaS (Platform as a Service) and FaaS (Function as a Service) offerings, and if you wonder whether leveraging all these managed services will cause you to be locked in with your cloud provider, well, then the answer is yes, of course, and your concern is valid.

In my opinion, though, if you consider the level of effort that it would take to build all of this on AWS, and then to do it again on GCP if you decide to, and then again on Azure if you have to, it would still be less than it would take if you were trying to build all of this on your own, leveraging just the IaaS (Infrastructure as a Service) offering of your cloud provider. It will simply require much less work, expertise and, in the end, money. It’s not just about leveraging the services: it’s all the monitoring, alerting and built-in security that you would otherwise have to roll on your own. It’s not for the faint of heart.

And, think about it, while you are busy phasing out your monolith while adding features to your application, the good cloud folks will be busy adding capabilities to the platform, ready for you to use.

Did you jump to the bottom of the page to write a nasty comment that I have been drinking the cloud Kool-Aid? Not yet?

Then let’s start looking at the (managed) services that you will need, how you will use them together, and what you will have to be on the lookout for.

Writing to the Event Stream

Both your Monolith and your new budding microservices will be tasked to write events to the Event Stream. Any change (mutation) in the overall state of the system will go to the stream. Remember, in Event Sourcing the application state is in the Stream, there’s no way around it.

As simple as it gets

Assuming that mutations are external, any mutation to the application state will be sent from the external source either to your monolith, or to a domain-specific microservice.

The routing, following the strangler pattern, will be implemented as an edge service, either with a CDN that supports request routing, or with the AWS API Gateway. Even better, anything that supports code running at the edge, like Fastly or CloudFront Lambda@Edge, will allow you to dynamically decide where to route the request, based on the user’s identity, region of origin, or just the result of a split test.
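As a sketch of what that edge routing could look like, here is a hypothetical CloudFront Lambda@Edge origin-request handler in Python. The hostnames, the `/claps/` path and the 10% rollout are made-up assumptions for illustration; only the event shape (`event["Records"][0]["cf"]["request"]`) comes from the Lambda@Edge contract.

```python
import hashlib

# Hypothetical origin hostnames; substitute your own.
MONOLITH_ORIGIN = "monolith.example.com"
CLAPS_SERVICE_ORIGIN = "claps.example.com"

def handler(event, context):
    """Lambda@Edge origin-request handler sketching strangler routing.

    Sends /claps/* traffic for a deterministic slice of users to the new
    microservice; everything else keeps going to the monolith.
    """
    request = event["Records"][0]["cf"]["request"]
    headers = request.setdefault("headers", {})

    # Derive a stable bucket (0-99) from the cookie, so each user
    # consistently lands on the same side of the split test.
    cookie = headers.get("cookie", [{"value": ""}])[0]["value"]
    bucket = int(hashlib.md5(cookie.encode()).hexdigest(), 16) % 100

    target = MONOLITH_ORIGIN
    if request["uri"].startswith("/claps/") and bucket < 10:  # 10% rollout
        target = CLAPS_SERVICE_ORIGIN

    # Point CloudFront at the chosen custom origin.
    request["origin"] = {
        "custom": {
            "domainName": target,
            "port": 443,
            "protocol": "https",
            "path": "",
            "sslProtocols": ["TLSv1.2"],
            "readTimeout": 30,
            "keepaliveTimeout": 5,
            "customHeaders": {},
        }
    }
    headers["host"] = [{"key": "Host", "value": target}]
    return request
```

Hashing an identity cookie, rather than picking randomly per request, keeps each user pinned to one side of the test.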

The mutation, once validated and possibly transformed into a resulting event, will be propagated to any other downstream service via the Event Stream.

The Event Stream is the core of this architecture, and on AWS the choice is obvious: Kinesis Data Streams. Which is not Kafka, even though they both start with K.

Kinesis Data Streams

“A taxi crossing an intersection in New York City by a busy street decorated with Christmas lights” by Alexander Redl on Unsplash

Κίνησις means movement, motion. Did I mention that I studied ancient Greek in school? I knew it would come in handy one day.

There are plenty of articles and resources that you can google to compare the pros and cons of Kinesis vs Kafka. And to be fair, there are also many managed Kafka offerings that you could use. But in the end it’s easy to see that Kafka is your custom-built off-road vehicle, with a top-of-the-line entertainment system on board, and Kinesis is a yellow cab sedan (with the mildly annoying taxi tv). Which one would you choose?

The latter (Kinesis) will get you where you want to go, but the driver will politely decline if you ask to take a shortcut and climb vertically up a cliff face (or at least, most would). The former (Kafka) will do all you want, but sometimes it will require substantial upkeep if, after some fun jostling with the road, your alignment is off, or your suspension is compromised from taking one bump too many, and the entertainment system breaks down.

Living in New York City, I am happy with just taking cabs, most of the time. You can also turn off the taxi tv.

Because it’s like a yellow cab, there are clearly documented service limits to what you can do with Kinesis. But those limits are meant to protect you, so that you stay within the constraints of what can be guaranteed to work, and don’t design something that will eventually take all your time and resources to keep running.

There are two types of constraints with Kinesis that you will need to work with: architectural and rate/throughput limits.

A theoretical, perfect Event Stream guarantees exactly-once delivery, ordering of messages, and unlimited message retention, but it’s not something that you can find easily.

The unforgiveness of the laws of physics explained with a simple tweet.

The architectural constraints are what make Kinesis an imperfect (but close) approximation of the ideal Event Stream. Messages can sometimes be delivered more than once, and the ordering guarantee comes with some caveats. Oh, and you get a maximum of 7 days of retention for your messages, and only if you pay extra. By default, it’s 24 hours.

The rate/throughput constraints are, well, about API request rate limits and throughput. But don’t be discouraged by them: they are well documented, and you can work within them in most cases.

As a side note, I remember once dealing with a managed datastore from a well known cloud provider, for which there was no way to have any kind of documented throughput or latency limits. The answer from the support engineers was only: “there are none”. So, if you look for something where you can suddenly push a googolplex of petabytes per second with 0ms latency, ask me which one to use. And also which bridge I have on sale.

Designing your system around these constraints is paramount, but fear not: you have control over the context, the problem domain, the nature of the messages that you are sending and receiving in the stream, and the logic used to process (or reject) them. Doesn’t it feel good to know that?

Rate/Throughput limitations

A single Kinesis stream or shard may not be enough to handle the throughput that your application needs.

A Kinesis stream increases its throughput by dividing its data into shards. Each shard supports up to 5 read transactions per second, for a maximum read rate of 2 megabytes per second. On the write side, each shard accepts up to 1,000 records per second, no single record can be bigger than 1 megabyte, and the total write throughput per shard cannot exceed 1 megabyte per second.
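To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The per-shard numbers are the limits quoted above; the function name and the idea of taking the worst of the three bounds are mine, and real capacity planning would also account for fan-out and traffic spikes.

```python
import math

# Per-shard Kinesis limits, as quoted in the text.
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/s write throughput
READ_BYTES_PER_SEC = 2_000_000    # 2 MB/s read throughput

def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Estimate the shard count a steady workload needs, taking the
    worst of the record-rate, write-byte and read-byte bounds."""
    by_records = records_per_sec / WRITE_RECORDS_PER_SEC
    by_write_bytes = (records_per_sec * avg_record_bytes) / WRITE_BYTES_PER_SEC
    by_read_bytes = (records_per_sec * avg_record_bytes) / READ_BYTES_PER_SEC
    return max(1, math.ceil(max(by_records, by_write_bytes, by_read_bytes)))
```

For example, 5,000 records/second of 500-byte records is bounded by the record-rate limit and would need 5 shards.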

In theory, you can handle terabytes of data per second on a single stream if you want. Just shard and shard and shard again. But sharding does bring some issues with it, particularly when it comes to ordering guarantees, as we will see later.

These are the current limits. I think it is likely that in the future AWS will announce less restrictive ones, but don’t bank on it: start designing your system within these constraints, otherwise you may have to wait for wormholes and quantum teleportation to be a thing before you get anything running.

Lack of uniqueness

In Kinesis there’s no guarantee about the uniqueness of messages: you will get duplicates when you don’t really expect them. This can happen because of network or application errors on either the producer or the consumer side.

This is probably the simplest constraint to address: just design the handling of your events to be idempotent. Each message should represent the full state of the entity at the moment of the event, not the delta between the previous and the current state.

Let’s go back to our Medium example from the previous post, and the claps event message.

Idempotency for the claps event

This message captures the state of the claps from user 32142459 for the post 4897249772 in its entirety. The clap button was pressed 21 times. Or maybe, this user already had clapped 20 times before, and then she came back to clap once more. It doesn’t really matter. The application that produced it (either the webapp or the mobile app) knows the full state of claps for the post and transmits it as a single event. Even if this message had been transmitted or received twice, the final count of claps from this user for this post would be the same, and no damage is done.
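A minimal sketch of this idea in Python, with field names and values reconstructed from the description above (they are illustrative, not Medium’s actual schema): the event carries the full claps count, and the consumer overwrites state instead of incrementing it, so duplicate deliveries are harmless.

```python
# A hypothetical full-state claps event, as the webapp might emit it.
claps_event = {
    "eventType": "ClapsForPostByUser",
    "memberId": "32142459",
    "postId": "4897249772",
    "claps": 21,  # the total, not a +1 delta
}

# Stand-in for the consumer's datastore: (memberId, postId) -> claps.
claps_by_user_and_post = {}

def apply_claps_event(event):
    """Idempotent handler: applying the same event twice leaves the
    store in the same state, because it replaces the full value."""
    key = (event["memberId"], event["postId"])
    claps_by_user_and_post[key] = event["claps"]  # overwrite, don't add
```

Delivering `claps_event` twice still leaves exactly one entry with 21 claps, which is the whole point.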

Think if instead each clap had been sent as an individual event. A duplicate would effectively inflate the number of claps, and nobody would want that.

In some ways, Event Streaming is a type of State Transfer (Roy, are you reading this?) between an event emitter and one or more event subscribers. You need to model your domain so that entities are granular enough that their state is not shared among different emitters. If you can do that, then you can achieve idempotency (and also strict ordering) with relative ease.

The count of claps for a post, sent by the claps service

Consider the entity TotalClapsPerPost. It belongs to the Post entity. The full state of this entity is undetermined in the browser, because the browser doesn’t know if somebody else is also clapping for the same story. Imagine what would happen if your posts service were just subscribing to the claps events, using them as partial updates, and summing them up as they come in for the corresponding post. Since the processing is not idempotent, duplicates would inevitably inflate the total claps again.

Just find, in your application domain, the agent that can generate the entire state of the entity, and listen to its messages. In this example, it’s the claps service: for each clap event for a post, the claps service is the authoritative emitter agent for the TotalClapsPerPost entity. It will query its own datastore and emit an event with the count of claps for the post anytime you clap or unclap.
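Sketched in Python, under the assumption that the claps service keeps its own store of per-user claps (the in-memory dict and `emitted` list below stand in for the real datastore and the Kinesis stream; all names are illustrative):

```python
claps_store = {}  # (memberId, postId) -> claps, the service's own datastore
emitted = []      # stand-in for records put on the Event Stream

def emit(event):
    # In a real service this would be a Kinesis put; here we just collect.
    emitted.append(event)

def on_clap_event(event):
    """Record an incoming full-state claps event, then emit the
    authoritative per-post total computed from the service's own store."""
    claps_store[(event["memberId"], event["postId"])] = event["claps"]
    total = sum(c for (m, p), c in claps_store.items() if p == event["postId"])
    emit({
        "eventType": "TotalClapsPerPost",
        "postId": event["postId"],
        "totalClaps": total,
    })
```

Because the total is recomputed from the store rather than accumulated from deltas, a redelivered clap event leaves the emitted total unchanged.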

Lack of ordering

In Kinesis, exact ordering of messages within a shard is guaranteed only if you are putting records sequentially, one at a time with the PutRecord API, and also specifying the optional argument SequenceNumberForOrdering, but not if you are using the more efficient PutRecords bulk load API, where individual records in an array could be rejected and must be retried later (hence out of order).
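A sketch of that sequential pattern, assuming a boto3 Kinesis client is passed in (`put_ordered` is a hypothetical helper name, not an AWS API): each PutRecord call passes the SequenceNumber returned by the previous call as SequenceNumberForOrdering.

```python
def put_ordered(kinesis, stream_name, partition_key, payloads):
    """Write records one at a time with PutRecord, chaining each call's
    SequenceNumberForOrdering to the SequenceNumber returned by the
    previous call. `kinesis` is assumed to be a boto3 Kinesis client.
    Returns the last sequence number written."""
    last_seq = None
    for data in payloads:
        kwargs = {
            "StreamName": stream_name,
            "PartitionKey": partition_key,
            "Data": data,
        }
        if last_seq is not None:
            # Guarantees this record is sequenced after the previous one.
            kwargs["SequenceNumberForOrdering"] = last_seq
        response = kinesis.put_record(**kwargs)
        last_seq = response["SequenceNumber"]
    return last_seq
```

The trade-off is throughput: one synchronous call per record, which is where record aggregation (discussed next) helps.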

Note that if you are using something like the Kinesis Aggregation Library, which packs multiple user records in a single, larger record, you could get by using sequential PutRecord calls with SequenceNumberForOrdering, and still maximize your throughput without losing ordering on a single shard. Please also note that using the Kinesis Producer Library alone is not sufficient to guarantee ordering.

But if you need to push more data, using multiple shards, it is quite hard to have any reliable ordering at all, since the sequenceNumber that Kinesis adds to each message is unique to the shard, and when you are reading from each shard you have no safe way to determine the proper temporal sequence: each shard is independent of the others, and one could deliver messages “faster”.

Compared to the lack of uniqueness, this “somewhat ordered” architectural constraint is more problematic, especially when you consider that we defined an Event Stream as a ledger, and ledgers are, by definition, ordered.

On the plus side, not all types of messages need to be delivered in strict order. For instance, if you and I are posting two stories, and they are coming to Medium out of sequence, it’s not necessarily a problem. Or, if we are clapping for the same story, it still doesn’t matter which message comes first, yours or mine.

So the thing to keep in mind, before panicking and giving up on Kinesis, is that in many cases the lack of ordering just doesn’t matter. You really have to think about the domain of your services, messages and application, and identify the contexts where it is a problem that you need to address.

claps, unclaps and shards

Typically, ordering matters for two messages related to the same entity. For instance, you could have clapped for this story 21 times but then quickly decided to “unclap” it.

Yes, it’s tricky but possible to do that, and really unfair to the writer. Think about it.

If Medium has two or more shards in the stream, it’s likely that your claps are going to be received out of order once sent to downstream services by your claps service. Say that the claps service sends a TotalClapsPerPost message that the posts service subscribes to. In this case, there would be two events, one with no claps, and one with 21 (assuming nobody else is clapping too). The posts service may then record 21 claps for this story, because the no-claps event came first, and the other one last. This is obviously an example where ordering matters. You want to be able to change your mind, and make it count when you unclap!

using a better partition key

You need to consider what needs to be taken into account for the ordering requirement. In this example, ordering clearly matters only for messages from the same user, or for the same post, so you can use either the PostId or the MemberId, or combine them together, as the partition key. Your claps and unclaps will go to the same shard, and ordering will still be preserved.

using a domain specific partition key

Using a domain-specific partition key, and using PutRecord with SequenceNumberForOrdering, is the simplest way to guarantee strict ordering on the records that will be retrieved from each shard in the stream. Using other things, like application-specific timestamps or the Kinesis-generated approximateArrivalTimestamp field, will just give you an approximate order, and that is probably not enough for your consistency requirements.
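The partition key itself can be as simple as a deterministic combination of the two ids; the function name and format below are illustrative.

```python
def claps_partition_key(member_id: str, post_id: str) -> str:
    """Domain-specific partition key: clap and unclap events for the same
    (member, post) pair always get the same key, so they land on the same
    shard and keep their relative order."""
    return f"{member_id}:{post_id}"
```

Any stable, deterministic encoding works; what matters is that the two events whose relative order you care about can never hash to different shards.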

I know, right now you are feeling cheated, because I made it too simple. What if the volume of state mutations for the same entity is so high that a single shard is not enough, and scaling out is not a solution? Before you start pressing and holding the clap button to see what happens (insert evil grin here), please consider that if you exceed the throughput limitations of a single shard for a single entity, maybe you are modeling the problem wrong.

You would need to have 1,000 state changes per second or 1 MB/second of mutations for a single entity before scaling out to multiple shards stops solving the problem. That would be a really hot shard. I am sure it’s a possibility, but I am also sure that you should be able to rethink the application domain where this extremely fickle entity exists and split it into many sub-entities, using different partition keys and hence shards.

Reading the Event Stream

Photo by Jilbert Ebrahimi on Unsplash

Well well, now you have some ideas on how to produce idempotent and ordered events in the Kinesis Data Stream. What about reading them? That should be simpler, right?

Wrong.

It’s not as trivial as it seems, because in Event Sourcing, when subscribing to a Stream, the reading state is stored in the subscriber: you need to keep track of the last event you read from each shard. You need to do that in a durable way, so that if your application dies in the middle of reading you can start from where you left off, and also so that you can manage dynamic shard allocations, making sure that when your stream is split into more shards, you can still subscribe to all of them.

You can use the Kinesis Client Library, which does all of that for you, but only if you run your stream-subscribing applications on EC2 (meaning on servers), and you are willing to spend some time configuring it correctly. There’s nothing wrong with that of course, and you could leverage Kubernetes to do it elegantly as a scalable deployment. Or you can just use Lambda Kinesis subscriptions, using event-source mapping.

Lambda Kinesis subscriptions

Look Mom, it’s serverless!

In this model, AWS handles basically everything for you. It will periodically invoke a Lambda function for each shard in your stream, passing an object with at most BatchSize records; it will keep track of the last record that you read in the stream, restarting from the last safe checkpoint if any error happens during processing in your function; and finally it will keep track of newly added shards and instantiate new copies of your function for each one of them.
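A minimal consumer under this model might look like the sketch below. The base64-encoded payload in `record["kinesis"]["data"]` is part of the Kinesis event-source mapping contract; the JSON payload format and the function body are my assumptions.

```python
import base64
import json

def handler(event, context):
    """Minimal Lambda consumer for a Kinesis event-source mapping.

    Lambda delivers up to BatchSize records per invocation; each record's
    payload arrives base64-encoded in record["kinesis"]["data"].
    """
    decoded = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    # Raising an exception here would make Lambda retry the whole batch
    # from the last checkpoint, which is one more reason the processing
    # must stay idempotent.
    return decoded
```

Note that a failure anywhere in the batch causes the entire batch to be retried, so partial progress should never be persisted non-idempotently.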

It’s like magic. It’s perfect. I cannot think honestly of a reason to not use it. Reading from Kinesis using Lambda is very much a code it, deploy it and forget it experience. It just works.

Until it doesn’t.

But it’s not a bug, it’s a feature. Remember when I talked about rate limits? And how 5 reads per second per shard is one of them? That means that if you have just 5 functions subscribing to the stream, and you expect to read data more frequently than once per second because you care about latency in your system, it just won’t work. Adding shards to the stream will not help either, because for each shard you will still have 5 functions reading from it, and their combined read rate will still be at most 5 reads/second per shard.

This constraint would not be so restrictive if you could define low-latency, “high priority” functions that are invoked more often, at the expense of low-priority ones that can afford a higher latency because of less stringent consistency requirements.

That’s where you would think that the BatchSize parameter that you define for your functions would become important. Because a function that still has more records to read will be invoked more often, you could set a smaller batch size for functions that have more real-time requirements, and a larger batch size for those that can tolerate a higher latency.

But, alas, it’s not enough, and it doesn’t work like that in practice. Testing this scenario with multiple functions with different batch sizes doesn’t show a clear and predictable behavior.

AWS provides a Lambda metric, IteratorAge, which measures the age of the last record in each batch of records processed. This metric is essential to track how much your function is lagging when subscribing to a stream.

The function with the smaller batch size will probably be invoked more often, but it will also be throttled more, and it may end up lagging behind, with a higher IteratorAge than the other functions. By empirical observation (I learned that from my old school buddy Galileo) it seems that contention on the shared rate limit causes the scheduler that invokes your functions to randomly succeed in executing one function but not another, and there’s very little guarantee that it will be the one that you wanted. It kind of works, but it doesn’t.

To better understand this behavior, I deployed 16 sink functions that just read from the same stream and sink the records that have been read. I assigned each one of them a batch size and started pumping records into Kinesis. I thought that, in theory, I would be able to see a pattern emerge, with the sink functions from sink00 to sink03 being invoked more often and consequently having a smaller IteratorAge compared to the other functions, such as sink12 to sink15, which have a larger batch size. I also set up a CloudWatch dashboard to keep track of the various functions and their metrics.

What I got, after 15 minutes of pushing records into the stream at a rate of 1.3k records/minute, is this beautiful piece of modern art: