As a developer, I find it very convenient to develop and deploy Lambda functions. My favorite programming language is Python, so both Serverless and Zappa are great.

When scaling a serverless application into a real architecture composed of many resources and elements, it becomes hard to design it properly. Questions such as “how many functions should I have?” or “how should my functions communicate with each other?” often remain unanswered for most of us.

In this post, I will shed some light on how to distribute messages between AWS Lambda functions, taking into consideration service decoupling, end-to-end performance, troubleshooting, and more.

Inter-Function Communication

When building a distributed architecture, communication between processes (or in our case, functions) is critical. Luckily, if you are using Lambda, AWS provides several ways to do it.

Let’s explore the options using two Lambda functions (A)→(B):

AWS SDK (AKA boto3 in Python)

The AWS SDK allows us to manage and use our AWS resources. In our case, we want to invoke Lambda B, with a payload from Lambda A:

Invoking a Lambda

This method runs Lambda B synchronously. Before diving into the details, let’s look at an asynchronous alternative:

Invoking a Lambda asynchronously

The differences between the methods are:

Synchronous: Lambda A waits for the response from Lambda B. You pay for both Lambdas (even though Lambda A sits idle while waiting).

Asynchronous: Lambda A invokes Lambda B and immediately continues.

The only reason to use synchronous invocation is if you depend on Lambda B’s response. In that case, consider making changes to decouple your functions.

Message Queue #1 — SNS

SNS is a simple, fully managed messaging service (Pub/Sub model) that integrates seamlessly with Lambda. With a simple setup, we create an SNS topic and subscribe Lambda B to it, so that every published message triggers Lambda B (with the payload).

Publishing message to SNS

Message Queue #2 — Kinesis Data Streams

Kinesis Data Streams offers a real-time streaming service, dedicated to ingesting and processing massive amounts of data (such as video streams or data from users).

Setting up Kinesis is as simple as SNS, with a built-in integration that triggers Lambda B for batches of new records.

Putting a record in Kinesis

The downsides of distributed architectures

So far, we have only gone through the possibilities, without considering the implications of developing a distributed architecture.

Troubleshooting, for example, becomes much more difficult. If an exception is raised in Lambda B, we would like to trace back: which message triggered the Lambda (via SNS, Kinesis, or Lambda A)? What happened in Lambda A?

It gets even more complex when we have hundreds of Lambda functions (that run real code). Using distributed tracing, we can understand the behavior of such applications and troubleshoot issues.

Additionally, analyzing the end-to-end performance of asynchronous events in serverless is still a complex task. Most solutions require developers to manually log everything. Yet end-to-end performance analysis is critical in serverless for understanding the impact on our end users.

These problems are not new, but they are amplified when using serverless resources.

End-to-end performance analysis

Let’s get back to inter-function communication and analyze our end-to-end performance results. The overall code:

At Epsagon, we are developing a fully automated solution for end-to-end tracing of serverless applications. Let’s see the visual results: