July 24, 2019 Corentin Doue 16 min read

At first glance, serverless and web real-time don't seem to go together easily. The aim of serverless is to have a very short-lived backend, whereas web real-time requires keeping an open connection with this backend. At Theodo, we were curious to understand to what extent it was possible to add a real-time feature to our serverless web applications.

I was in charge of migrating one of our web apps with a real-time feature to AWS Serverless. To do so, I benchmarked how to implement web real-time with a serverless architecture on AWS.

Before building a web real-time infrastructure, it is important to understand how AWS implements serverless and what modules they provide for such an architecture. I invite you to read my previous articles Understand AWS serverless architecture in 10 minutes and Store your data in AWS Serverless architecture if you are not yet familiar with the Serverless Architecture in AWS.

In this article, I will present the four solutions I explored to implement real-time in serverless:

1. IoT / MQTT

“MQTT is a machine-to-machine (M2M)/”Internet of Things” connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport.” — Source: MQTT

The core of MQTT is a broker which stores the subscriptions to topics and handles the publication in those topics by returning the payload to each subscriber. It seems appropriate to implement real-time in serverless because it outsources the PubSub handler.
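To make the broker idea concrete, here is a toy in-memory sketch of what a broker does: it remembers which callbacks subscribed to which topic and hands each published payload to them. It is not MQTT itself (no network, no wildcards, no quality-of-service levels), and the `Broker` class and the topic names are invented for the illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class Broker:
    """Toy in-memory broker: remembers who subscribed to which topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every subscriber of the topic; return the count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


received = []
broker = Broker()
broker.subscribe("chat/room-1", lambda topic, payload: received.append((topic, payload)))
broker.publish("chat/room-1", "hello")    # delivered to the subscriber above
broker.publish("chat/room-2", "ignored")  # nobody subscribed to this topic
```

With a managed broker, this bookkeeping is exactly the part you no longer have to host yourself.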

Defining a Thing in AWS IoT creates an associated MQTT broker. It is scalable and has on-demand pricing, so it is compatible with a serverless infrastructure.

How does it work

AWS Cognito is recommended to access AWS IoT with a web client.

Use AWS Amplify or aws-iot-device-sdk to subscribe to some topics of the IoT Thing MQTT broker. The available topics are defined in the policies linked to the IoT Thing. It is possible to restrict access to only certain actions such as subscribing. With a more complete integration of AWS Cognito, it is possible to define specific policies depending on the authenticated user.

HTTP requests are sent to the URL of the backend.

The API Gateway is the endpoint of the backend requests and it triggers a Lambda for each request.

In the backend Lambda, the request is processed and produces a payload that is sent to each subscriber.

aws-iot-device-sdk makes it possible to connect to the AWS IoT Thing and to publish payloads in topics. AWS Cognito is not mandatory and the connection can be established with certificates. Those certificates are attached to policies that define the authorized actions and topics.

The IoT MQTT broker sends the payload back to each subscriber through MQTT.
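As a rough sketch of the backend side of this flow, the handler below processes an HTTP request and publishes the result on a topic. Note the swap: the article relies on aws-iot-device-sdk, whereas this Python sketch assumes the `publish` call of boto3's `iot-data` client, which publishes over HTTPS rather than holding an MQTT connection. The `rooms/<id>` topic naming and the request body shape are hypothetical, and a recording stand-in replaces the real client so the example runs locally.

```python
import json
from typing import Any, Callable, Dict


class RecordingIotClient:
    """Local stand-in that records calls instead of talking to AWS IoT."""

    def __init__(self) -> None:
        self.calls = []

    def publish(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


def make_handler(iot_data_client: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler that publishes each processed request on a topic."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        message = json.loads(event["body"])  # body of the HTTP request
        topic = f"rooms/{message['room']}"  # hypothetical topic naming scheme
        iot_data_client.publish(
            topic=topic,
            qos=1,  # at-least-once delivery
            payload=json.dumps(message).encode("utf-8"),
        )
        return {"statusCode": 200, "body": json.dumps({"published": topic})}

    return handler


# Local dry run; in a deployed Lambda you would pass boto3.client("iot-data").
iot_client = RecordingIotClient()
handler = make_handler(iot_client)
response = handler({"body": json.dumps({"room": "42", "text": "hi"})}, None)
```

Injecting the client keeps the handler testable without AWS credentials.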

Advantage

AWS IoT provides a broker out of the box, with an SDK. It only requires a bit of configuration before you can publish and subscribe.

Limits

The frontend authentication is only optimized for Cognito, and custom authorizers do not work with a web client. The workaround consists of requesting temporary credentials with a custom policy from STS through the backend every hour.
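The workaround can be sketched as follows, assuming the backend calls `assume_role` on a boto3 STS client with a session policy limited to the topics one user may read. The role ARN, region, account id, client id and topic names are placeholders, and `FakeSts` only exists so the sketch runs without AWS; the credentials it returns are dummy values.

```python
import json
from typing import Any, Dict, List


def build_iot_policy(region: str, account_id: str, client_id: str, topics: List[str]) -> Dict[str, Any]:
    """Session policy letting one web client connect and read the given topics only."""
    prefix = f"arn:aws:iot:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "iot:Connect",
             "Resource": [f"{prefix}:client/{client_id}"]},
            {"Effect": "Allow", "Action": "iot:Subscribe",
             "Resource": [f"{prefix}:topicfilter/{t}" for t in topics]},
            {"Effect": "Allow", "Action": "iot:Receive",
             "Resource": [f"{prefix}:topic/{t}" for t in topics]},
        ],
    }


def issue_credentials(sts_client: Any, role_arn: str, user_id: str, policy: Dict[str, Any]) -> Dict[str, Any]:
    """Ask STS for one-hour credentials restricted by the session policy."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"web-{user_id}",
        Policy=json.dumps(policy),
        DurationSeconds=3600,  # the client has to ask again after an hour
    )
    return response["Credentials"]


class FakeSts:
    """Local stand-in for boto3.client("sts"); returns dummy credentials."""

    def assume_role(self, **kwargs: Any) -> Dict[str, Any]:
        self.last_call = kwargs
        return {"Credentials": {"AccessKeyId": "dummy", "SecretAccessKey": "dummy", "SessionToken": "dummy"}}


policy = build_iot_policy("eu-west-1", "123456789012", "web-client-7", ["rooms/42"])
sts = FakeSts()
credentials = issue_credentials(sts, "arn:aws:iam::123456789012:role/iot-web-client", "7", policy)
```

The effective permissions are the intersection of the role's own policy and the session policy, so the role itself still has to allow these IoT actions.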

Although AWS IoT is not designed for the web, it can be used to implement web real-time, but doing so twists the service away from its original purpose.

How to implement

There are several articles that use IoT to implement their real-time serverless app. I created a POC following those:

Use case

You can use IoT to implement the real-time feature of your application if you want to use Cognito or if you don’t need to restrict access to the subscription channels according to your client profile. If the channels your client should have access to depend on business roles, I recommend using one of the other solutions in this article.

Also keep in mind that AWS IoT is not primarily designed for the web. Using it to implement a real-time feature means twisting it, with the risks that this entails.

Other interesting sources

2. AppSync

AWS AppSync is a high-level service that provides a serverless GraphQL API. One of its main features is support for GraphQL subscriptions. Unlike IoT, it’s a service designed to implement real-time in serverless.

How does it work

AWS provides a diagram to explain an AppSync architecture:

AppSync automatically creates a GraphQL endpoint from the schema you provide. You can then configure GraphQL resolvers to use several data providers:

DynamoDB, Aurora RDS or Elasticsearch: It directly links AppSync to a data storage. There is no code to provide. AppSync generates migrations from the schema to keep your data storage architecture up to date with the schema. Then it will resolve the GraphQL queries, mutations, and subscriptions directly in the data storage. If your business rules are implemented in your frontend and you only need your backend to store and retrieve data, I would recommend this implementation.

Lambda: It enables you to create completely custom resolvers. The Lambda is called with a message which is customized according to the corresponding resolver configuration file. It generally includes at least the name of the GraphQL operation and the payload. Then you can implement a resolver which processes the data according to your business rules and/or stores your data in services not natively supported by AppSync.

HTTP: It enables you to call other APIs to resolve your GraphQL actions.

None (or Local PubSub): It’s used to implement a PubSub without persisting data. It’s useful if you have some information you want to share in real-time with all your clients but that doesn’t need to be stored.

All these data providers can be used in the same application thanks to separate resolver configuration files. For example, you can choose to use DynamoDB to directly resolve your simplest routes and implement a Lambda to process the data of a more complicated route. AppSync is very flexible.
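As an illustration of the Lambda data provider, here is a minimal sketch of a single Lambda serving several GraphQL fields. It assumes the resolver configuration forwards an event shaped like `{"field": ..., "arguments": ...}`; that shape is a convention you define yourself in the resolver configuration, not something AppSync imposes. The field names and the in-memory `MESSAGES` list are hypothetical.

```python
from typing import Any, Callable, Dict, List

MESSAGES: List[Dict[str, str]] = []  # in-memory stand-in for a real data store


def list_messages(arguments: Dict[str, Any]) -> List[Dict[str, str]]:
    return [m for m in MESSAGES if m["room"] == arguments["room"]]


def send_message(arguments: Dict[str, Any]) -> Dict[str, str]:
    message = {"room": arguments["room"], "text": arguments["text"]}
    MESSAGES.append(message)  # business rules would go here
    return message


# One entry per GraphQL field handled by this Lambda (hypothetical names).
RESOLVERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "listMessages": list_messages,
    "sendMessage": send_message,
}


def handler(event: Dict[str, Any], context: Any) -> Any:
    """Dispatch on the operation name forwarded by the resolver configuration."""
    resolver = RESOLVERS.get(event["field"])
    if resolver is None:
        raise ValueError(f"Unknown field: {event['field']}")
    return resolver(event.get("arguments", {}))


sent = handler({"field": "sendMessage", "arguments": {"room": "42", "text": "hi"}}, None)
listed = handler({"field": "listMessages", "arguments": {"room": "42"}}, None)
```

The value returned by the handler is what the resolver maps back to the GraphQL response.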

Advantages

It’s a serverless solution designed to build real-time web apps with GraphQL. On the client side, you can use Amplify or Apollo to create your real-time app with GraphQL subscriptions.

It’s designed to be easy to use: providing a GraphQL schema and resolver templates is enough to quickly generate a working GraphQL backend that supports GraphQL subscriptions.

You can also implement complex backends with Lambdas, which enable you to write custom resolvers.

Limits

The authorization types handled by AppSync are API key, AWS IAM, OpenID Connect and AWS Cognito. There is no proper way to implement custom authorizers. The workaround is to create a pipeline of resolvers where the first resolver is a Lambda which checks the authentication.

Moreover, it’s a high-level service and completely dependent on AWS solutions. It creates vendor lock-in, and there are probably features other than authentication that cannot be customized.
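The first Lambda of such a pipeline could look like the sketch below. Everything about the token is an assumption made for the example: it is a `<user id>.<HMAC signature>` string that the resolver configuration is assumed to forward as `event["token"]`, and `SECRET` stands in for a key you would load from a secret store. A real project would more likely verify a token issued by its own authentication service.

```python
import hashlib
import hmac
from typing import Any, Dict

SECRET = b"change-me"  # hypothetical shared secret; load it from a secret store in practice


def sign(user_id: str) -> str:
    """Build a token of the form '<user_id>.<hex signature>'."""
    signature = hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{signature}"


def authorize(event: Dict[str, Any], context: Any = None) -> Dict[str, str]:
    """First function of the pipeline: reject the request or pass the user on."""
    token = event.get("token", "")
    user_id, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched
    if not user_id or not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise PermissionError("Unauthorized")
    return {"userId": user_id}


user = authorize({"token": sign("alice")})
```

Raising an error stops the pipeline, so the following resolvers only run for authenticated requests.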

How to implement

These articles explain how to implement a real-time app with AppSync:

Use case

I think AppSync is the best way to use GraphQL in serverless, and I think GraphQL subscriptions are a very easy and effective way to implement real-time in web applications. So I really recommend using it if you can use Cognito as the authentication provider and you are not afraid to dive into a fully AWS way of developing.

Other interesting sources

3. WebSocket API Gateway

It’s also possible to create our own broker through WebSockets using an API Gateway V2, some Lambdas and a storage service.

The Gateway keeps the WebSocket connection open with the web client and forwards the messages to Lambdas. The Lambdas then process the messages and send messages back to the client over the WebSocket, using the API Gateway SDK and the client's WebSocket connection ID.

To do so, the subscriptions, each composed of a WebSocket connection ID and a subscribed topic, must be stored in a storage service such as DynamoDB.
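A minimal sketch of the Lambda that records a subscription could look like this. It reads the connection ID from `event["requestContext"]["connectionId"]`, which is where API Gateway puts it for WebSocket routes. The message body shape (`{"topic": ...}`) and the table layout are assumptions, and `InMemoryTable` mimics only the `put_item` call of a boto3 DynamoDB Table so the example runs locally.

```python
import json
from typing import Any, Callable, Dict, List


class InMemoryTable:
    """Local stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self) -> None:
        self.items: List[Dict[str, str]] = []

    def put_item(self, Item: Dict[str, str]) -> None:
        self.items.append(Item)


def make_subscribe_handler(table: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build the handler that stores one (topic, connection ID) pair per subscription."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        connection_id = event["requestContext"]["connectionId"]
        body = json.loads(event.get("body") or "{}")
        topic = body["topic"]  # hypothetical message format chosen for this sketch
        table.put_item(Item={"topic": topic, "connectionId": connection_id})
        return {"statusCode": 200, "body": "subscribed"}

    return handler


table = InMemoryTable()
subscribe = make_subscribe_handler(table)
result = subscribe(
    {"requestContext": {"connectionId": "abc123="}, "body": json.dumps({"topic": "rooms/42"})},
    None,
)
```

Keying the items by topic keeps the later lookup of all subscribers of a topic a single query.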

There are basically two ways to send a message back to the client: directly when processing the request or through a dedicated Lambda.

3.a HTTP request + publisher Lambda

This solution creates an autonomous broker composed of a subscriber Lambda, which saves the subscription data in a DynamoDB table, and a publisher Lambda, which sends a message to the subscribers of a topic.

How does it work

0. First, the clients establish a WebSocket connection with the API Gateway

1.a The client subscribes to some topics by sending a message through the WebSocket (or directly when establishing the WebSocket connection via the $connect event if no payload is needed to choose the topic)

1.b A subscription Lambda is triggered on the $default WebSocket route (or $connect) and creates the subscription: the WebSocket connection ID of the client and the topic it wants to subscribe to

1.c The subscriptions are stored in a DynamoDB table

2.a Then a client makes a request which affects other clients

2.b This request passes through the API and triggers the corresponding Lambda

2.c The Lambda processes the request (e.g. stores some entities in a database) and produces a result payload which should be sent back to multiple clients

2.d The Lambda uses a service to trigger the publisher Lambda with a message composed of the payload and the topic matching the clients that need to be updated.

2.e The available services are:

SNS: the message is published to an SNS topic (not the topic the client subscribed to, but an internal one). The publisher Lambda subscribes to this topic and is triggered by the message.

SQS: the message is sent to a queue and consumed by the publisher Lambda. The Lambda can consume a batch of messages.

DynamoDB streams: the publisher Lambda listens to the stream of a DynamoDB table and is triggered when an item is inserted, modified or deleted. This is very convenient if processing the request involves storing some data: the publisher Lambda can be triggered by the same action, and the topic can be deduced from the table name and operation.

2.f The publisher Lambda retrieves the subscriptions corresponding to the topic

2.g For each subscriber, the publisher Lambda tells the API Gateway to send the payload through the corresponding WebSocket

2.h The clients receive the message and update.
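Steps 2.d to 2.g can be sketched as follows for the SNS variant. The envelope built by `build_message` (a topic plus a payload) is a format invented for this example, and `find_connection_ids` is a hypothetical helper standing in for the DynamoDB lookup. The publisher calls `post_to_connection`, which is the method exposed by boto3's `apigatewaymanagementapi` client; a recording stand-in is used here so the code runs without AWS.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple


def build_message(topic: str, payload: Dict[str, Any]) -> str:
    """Envelope the request Lambda would pass to sns.publish(TopicArn=..., Message=...)."""
    return json.dumps({"topic": topic, "payload": payload})


class RecordingGateway:
    """Local stand-in for the API Gateway management client."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes]] = []

    def post_to_connection(self, ConnectionId: str, Data: bytes) -> None:
        self.sent.append((ConnectionId, Data))


def make_publisher_handler(
    find_connection_ids: Callable[[str], Iterable[str]], gateway_client: Any
) -> Callable[[Dict[str, Any], Any], Dict[str, int]]:
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
        sent = 0
        for record in event["Records"]:  # one record per SNS notification
            message = json.loads(record["Sns"]["Message"])
            data = json.dumps(message["payload"]).encode("utf-8")
            for connection_id in find_connection_ids(message["topic"]):
                gateway_client.post_to_connection(ConnectionId=connection_id, Data=data)
                sent += 1
        return {"sent": sent}

    return handler


SUBSCRIPTIONS = {"rooms/42": ["conn-1", "conn-2"]}  # stand-in for the DynamoDB table
gateway = RecordingGateway()
publisher = make_publisher_handler(lambda topic: SUBSCRIPTIONS.get(topic, []), gateway)
sns_event = {"Records": [{"Sns": {"Message": build_message("rooms/42", {"text": "hi"})}}]}
result = publisher(sns_event, None)
```

Because the publisher only depends on the envelope, any Lambda of the backend can trigger a notification without knowing about WebSockets.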

PS: It doesn't appear on the diagram, but you should add an unsubscriber Lambda which listens to the $disconnect event and removes all the subscriptions linked to the disconnected WebSocket from the database.
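Such an unsubscriber Lambda could be sketched like this, assuming subscriptions are stored as (topic, connection ID) items. `find_topics` is a hypothetical lookup of the topics of one connection (with DynamoDB it would typically be a query on a secondary index), and `InMemoryTable` mimics only the `delete_item` call of a boto3 Table.

```python
from typing import Any, Callable, Dict, Iterable, List


class InMemoryTable:
    """Local stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self, items: List[Dict[str, str]]) -> None:
        self.items = list(items)

    def delete_item(self, Key: Dict[str, str]) -> None:
        self.items = [item for item in self.items if item != Key]


def make_disconnect_handler(
    table: Any, find_topics: Callable[[str], Iterable[str]]
) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        connection_id = event["requestContext"]["connectionId"]
        for topic in find_topics(connection_id):
            table.delete_item(Key={"topic": topic, "connectionId": connection_id})
        return {"statusCode": 200, "body": "disconnected"}

    return handler


table = InMemoryTable([
    {"topic": "rooms/42", "connectionId": "conn-1"},
    {"topic": "rooms/7", "connectionId": "conn-1"},
    {"topic": "rooms/42", "connectionId": "conn-2"},
])
# The lookup builds a fresh list, so deleting while looping is safe.
find_topics = lambda cid: [i["topic"] for i in table.items if i["connectionId"] == cid]
disconnect = make_disconnect_handler(table, find_topics)
disconnect({"requestContext": {"connectionId": "conn-1"}}, None)
```

Without this cleanup, the publisher would keep trying to send payloads to connections that no longer exist.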

Advantages

It’s a bespoke solution that is adaptable to each project.

It’s possible to implement custom subscription rules based on a custom authentication process.

The publishing process is factored out to avoid code duplication.

Limits

The communication process between two Lambdas through SNS, SQS or the DynamoDB stream is slow (SNS and SQS: 200ms, DynamoDB stream: 400ms). If you need to notify your clients instantly, use the solution below (3.b).

If you put your Lambdas in a VPC to communicate with an RDS or an ElastiCache instance, you will need to set up a VPC endpoint to send messages to SNS, SQS or DynamoDB. This adds some complexity to your architecture and creates some further latency.

How to implement

With a DynamoDB stream, you can follow this article: How to build real-time applications using WebSockets with AWS API Gateway and Lambda

With a DynamoDB stream and GraphQL, there is a Node.js package with an example

With SNS and GraphQL: coming soon, I’m working on it.

Use case

If you are working with a complex architecture with many real-time features, I recommend using this solution to externalize the broker as a service. The data flow will be clearer and it will be easier to add other real-time features.

If you have a simple architecture (API + Lambdas), adding a transition service (SNS, SQS or DynamoDB stream) will add some unnecessary complexity to your backend. In that case, I recommend using the next solution.

Other interesting sources

Variant: if you don’t need to be synchronously notified when your request has just been processed (if asynchronously receiving the payload back from the subscription is enough), you can get rid of the HTTP API Gateway and only use WebSocket messages in your architecture:

3.b Publish in HTTP processor Lambda

This solution sends the payload back to each subscriber directly from the Lambda which produces it. Compared to solution 3.a, no other service is needed to communicate between Lambdas, because everything needed to process a request is contained in a single Lambda.

How does it work

0. First, the client establishes a WebSocket connection with the API Gateway

1.a The client subscribes to some topics by sending a message through the WebSocket (or directly when establishing the WebSocket connection via the $connect event if no payload is needed to choose the topic)

1.b A subscription Lambda is triggered on the $default WebSocket route (or $connect) and creates the subscription: the WebSocket connection ID of the client and the topic it wants to subscribe to

1.c The subscriptions are stored in a DynamoDB table

2.a Then a client makes a request which will affect other clients

2.b This request passes through the API and triggers the corresponding Lambda

2.c The Lambda processes the request (e.g. stores some entities in a database) and produces a result payload which should be sent back to multiple clients

2.d The Lambda retrieves the subscriptions corresponding to the topic

2.e For each subscriber, the Lambda tells the API Gateway to send the payload through the corresponding WebSocket

2.f The clients receive the message and update.
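Steps 2.b to 2.e fit in a single handler, as in the sketch below. The `save` and `find_connection_ids` helpers and the request body shape are hypothetical. `post_to_connection` is the method of boto3's `apigatewaymanagementapi` client, replaced here by a recording stand-in so the example runs locally.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple


class RecordingGateway:
    """Local stand-in for the API Gateway management client."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes]] = []

    def post_to_connection(self, ConnectionId: str, Data: bytes) -> None:
        self.sent.append((ConnectionId, Data))


def make_request_handler(
    save: Callable[[Dict[str, Any]], None],
    find_connection_ids: Callable[[str], Iterable[str]],
    gateway_client: Any,
) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        message = json.loads(event["body"])
        save(message)  # 2.c: process the request
        data = json.dumps(message).encode("utf-8")
        for connection_id in find_connection_ids(message["topic"]):  # 2.d
            gateway_client.post_to_connection(ConnectionId=connection_id, Data=data)  # 2.e
        # The HTTP response is only returned once every subscriber has been notified.
        return {"statusCode": 200, "body": json.dumps(message)}

    return handler


stored: List[Dict[str, Any]] = []
gateway = RecordingGateway()
handle_request = make_request_handler(
    stored.append, lambda topic: {"rooms/42": ["conn-1", "conn-2"]}.get(topic, []), gateway
)
response = handle_request({"body": json.dumps({"topic": "rooms/42", "text": "hi"})}, None)
```

The return statement sits after the loop, which is exactly why this variant is synchronous.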

PS: It doesn't appear on the diagram, but you should add an unsubscriber Lambda which listens to the $disconnect event and removes all the subscriptions linked to the disconnected WebSocket from the database.

Advantages

Bespoke solution, adaptable to each project.

Possible to implement custom subscription rules based on a custom authentication process.

Quick notification of your subscribers: compared to solution 3.a, the notification is sent to each subscriber immediately, without the latency of services such as SNS, SQS or DynamoDB streams.

Limits

The process is synchronous: the HTTP response waits for each subscriber to be notified before being returned. If you have a lot of subscribers and if your client waits for the HTTP response to go on, it could affect the user experience.

It doesn't scale with the number of subscribers: the Lambda executes a for loop over these subscribers. If you have too many subscribers, the Lambda will time out. You can configure this timeout, but only manually.

If you have several real-time features, you need to share or duplicate code between your Lambdas. I know that some serverless frameworks implement sharing content between Lambdas (for example Architect) but it’s not widespread yet.

It doesn’t follow the “your Lambda should do one thing” best practice. At high scale, it’s probably more efficient to use solution 3.a.
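One possible way to soften this limit, sketched here as an idea rather than something the solutions above require, is to watch the time budget inside the loop. The Lambda context object exposes `get_remaining_time_in_millis()`; the `fan_out` helper, the safety margin and the `FakeContext` used for the dry run are all hypothetical.

```python
from collections import deque
from typing import Any, Callable, Iterable, List


class FakeContext:
    """Stand-in for the Lambda context: every check "costs" 400 ms of budget."""

    def __init__(self, remaining_ms: int) -> None:
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        self.remaining_ms -= 400
        return self.remaining_ms


def fan_out(
    send: Callable[[str], None],
    connection_ids: Iterable[str],
    context: Any,
    safety_margin_ms: int = 1000,
) -> List[str]:
    """Notify subscribers until the time budget runs low; return those not reached."""
    pending = deque(connection_ids)
    while pending and context.get_remaining_time_in_millis() > safety_margin_ms:
        send(pending.popleft())
    return list(pending)


delivered: List[str] = []
left_over = fan_out(delivered.append, ["c1", "c2", "c3", "c4"], FakeContext(2000))
```

The subscribers that were not reached are returned to the caller, which can then decide what to do with them, for instance hand them to another invocation.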

How to implement

Use case

If you are working with a simple architecture (API + Lambdas), I recommend implementing this solution, which quickly delivers your real-time messages to your clients without complicating your architecture.

However, be careful: if you have a lot of subscribers to a single topic, the Lambdas could time out if the timeout is not well configured.

Variant: if you don’t need to be synchronously notified when your request has just been processed, the same variant as for 3.a is possible. You can get rid of the HTTP API Gateway and only use the WebSocket flow in your architecture:

How to choose

Here are my recommendations to choose between these solutions:

Please keep in mind that AWS IoT is not primarily designed for the web. Using it to implement a real-time feature means twisting it.

Some other articles which could help you to choose:

To sum up, I recommend using AppSync to implement real-time in serverless. But because it is a high-level solution, you can’t do everything you want with it. If you need more freedom, use the WebSocket API to create a completely custom real-time architecture.

I have only explored solutions on AWS, which is the leader in serverless, but the other cloud providers also have their serverless platforms and probably some solutions to implement real-time. You should also consider external solutions (such as Pusher or PubNub) to handle your real-time feature alongside your serverless architecture.

Serverless is very new and the architecture possibilities evolve as quickly as AWS adds serverless features to its services. Feel free to suggest updates or challenges in the comments!