“You have memories with Veera and 4 others to look back on today.”

Apart from the name, we’re all familiar with this sentence. To us, it’s a reminder of the memories we made on Facebook last year, but in the technical realm, it’s simply a notification. This is just one type among a sea of others that every app uses. Some are time-sensitive, like the OTP messages we receive; some announce events or discount codes; others remind us of inactivity or incomplete actions in an app. Whatever the type, every notification plays a crucial role in the engagement and usability of a product.

At Toppr, we send millions of notifications every day (no spam, promise) to our Android and iOS app users with the help of the Firebase Cloud Messaging (FCM) service. A notification might inform users about new comments or likes on their activity, or carry messages sent by our tutors on our Doubts on Chat app. While the other notifications are not time-critical, it’s crucial for chat message notifications to be delivered on a priority basis.

Over the last few months, our user base has been increasing at a faster pace than anticipated. Due to this, the push notifications to our users were sometimes delivered with a slight delay. This delay was affecting user experience, so we knew it was time to put on our thinking caps and find the problem.

So far, we had primarily been using two methods of sending push notifications:

Asynchronously via SQS: In this approach, our internal APIs enqueue the notification payload to an AWS SQS queue, and our SQS workers consume the messages and deliver them to FCM. As with any queue-based system, this method isn’t reliable for real-time delivery of notifications: messages can get stuck in the queue if the workers consume them more slowly than the notifications are pushed in.
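The asynchronous flow above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the payload field names are hypothetical, and the `sqs_client` would in practice be a boto3 SQS client.

```python
import json

def build_notification(user_id, title, body):
    """Assemble a notification payload for the queue.
    Field names here are illustrative, not Toppr's actual schema."""
    return {"user_id": user_id, "title": title, "body": body}

def enqueue_notification(sqs_client, queue_url, notification):
    """Enqueue the payload; SQS workers consume it later and
    forward it to FCM, so delivery time depends on worker throughput."""
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(notification),
    )
```

Because delivery happens in a separate worker process, the API request returns quickly, but nothing bounds how long a message waits in the queue.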

Synchronously within our API requests: The SQS-based approach couldn’t always deliver chat messages in a reasonable amount of time. So, for priority messages, we skipped SQS and sent the notification payload to FCM directly, within the lifecycle of our internal APIs. While this ensured timely delivery, it added overhead to the request lifecycle and increased the average response time of our APIs. The higher response time meant we had to run additional application instances to meet our API throughput demands, which required more machines and added infrastructure costs. Another drawback was that we could not reliably handle FCM request failures, as retrying FCM requests would further increase our API response time.
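To make the latency cost concrete, here is a rough sketch of a synchronous send against FCM’s legacy HTTP endpoint (assuming a server key; the helper names are our own, not part of any FCM SDK). The HTTP round trip happens inside the API request, so its latency, and any retries, add directly to the API’s response time.

```python
import json
import urllib.request

FCM_URL = "https://fcm.googleapis.com/fcm/send"  # FCM legacy HTTP endpoint

def build_fcm_request(server_key, device_token, title, body):
    """Build the headers and JSON body for a single-device FCM send."""
    headers = {
        "Authorization": f"key={server_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "to": device_token,
        "notification": {"title": title, "body": body},
    }
    return headers, json.dumps(payload).encode("utf-8")

def send_push_sync(server_key, device_token, title, body, timeout=2.0):
    """Send one notification inside the API request lifecycle.
    The whole round trip blocks the caller until FCM responds."""
    headers, data = build_fcm_request(server_key, device_token, title, body)
    request = urllib.request.Request(FCM_URL, data=data, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

A failed call here forces an unpleasant choice: retry and block the user’s request even longer, or drop the notification.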

Since neither approach was scalable, we decided to rethink notification delivery. After a few discussions, we came up with a list of requirements that our new notification delivery service must meet:

Distributed and fault tolerant: A single unit of the service should process only one notification object at a time, and failures in one unit should not impact the working of the other units within the service.

Scalable: As we expect the number of notifications to increase over time, the system should be able to scale to meet our growing requirements.

Soft real-time: The system should deliver notifications as quickly as possible, and the delivery time should not be impacted by the number of notifications.

Flexible and extensible: The system should be able to handle any kind of notification and should allow us to easily add support for new destination services, like a websocket.

After a thorough evaluation of various solutions, we realized that an event-driven application platform like AWS Lambda, Google Cloud Functions, or Azure Functions would be the best candidate for running our service, as they meet all of the above requirements. We picked AWS Lambda, as most of our existing applications already run on AWS infrastructure.
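The shape of such a Lambda-based service can be sketched as follows. This is a hypothetical handler, assuming the function is triggered by SQS events (so each Lambda invocation processes one small batch independently, giving the per-unit isolation described above); the `deliver` callable stands in for the actual FCM call.

```python
import json

def parse_records(event):
    """Extract notification payloads from an SQS-triggered Lambda event.
    Each record's body is the JSON payload our APIs enqueued."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]

def handler(event, context, deliver=None):
    """AWS Lambda entry point. Each invocation handles its own batch,
    so a failure in one invocation does not affect the others, and
    AWS scales out concurrent invocations as the queue grows."""
    notifications = parse_records(event)
    if deliver is not None:
        for notification in notifications:
            deliver(notification)  # e.g. an HTTP call out to FCM
    return {"processed": len(notifications)}
```

Adding a new destination service, say a websocket gateway, then amounts to plugging in another delivery callable rather than reworking the pipeline.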