Within a week of releasing a new version of my mobile app on iTunes, I had to submit the dreaded bug-fix update to Apple for review and approval.

The latest version of the app included a great new search capability, but its slow response time made for a poor customer experience. Unfortunately, I had made some bad assumptions about cold-starting AWS Lambda functions.

In the hope of saving you — and your users — some time, here’s a quick review of what I encountered, and a solution that improved response times.

Keeping it simple with serverless

A serverless architecture seemed to be the quickest way to validate the viability of the product idea. There were a few reasons I went this route.

Speed to the market was the most critical.

I didn’t want to spend precious time on operational overhead.

AWS Lambda is cheaper, since you pay per invocation instead of for dedicated hardware.

I wanted the back-end to be secure, so using Firebase as an identity provider prevented anyone outside of the app from using the API.

The following diagram is how I initially planned for the app to work:

The back-end architecture diagram of the Meal Prep app

Here’s the general flow of the process:

1. The app authenticates with Firebase Authentication and receives a JSON Web Token (JWT).

2. When the user initiates a search, the app sends a request to Amazon API Gateway containing the JWT.

3. Amazon API Gateway validates the token using the Firebase Auth Lambda function.

4. Upon successful authentication, the request is proxied to the GraphQL Lambda function.

5. The GraphQL Lambda function queries either the Elasticsearch cluster or DynamoDB, depending on the query.
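To make the authentication step concrete, here is a minimal sketch of a token-based API Gateway Lambda authorizer. The event shape (`authorizationToken`, `methodArn`) and the returned IAM policy follow API Gateway's custom-authorizer contract; `verifyFirebaseToken` is a hypothetical stand-in for the Firebase Admin SDK's `admin.auth().verifyIdToken` call.

```typescript
// Sketch of the Firebase Auth Lambda authorizer. API Gateway invokes
// this with the JWT and expects an IAM policy document back.
type AuthorizerEvent = { authorizationToken: string; methodArn: string };

// Hypothetical stand-in for admin.auth().verifyIdToken; resolves to
// the authenticated user's uid, or rejects for an invalid token.
async function verifyFirebaseToken(token: string): Promise<string> {
  if (!token) throw new Error("missing token");
  return "demo-uid"; // the real SDK returns the decoded token's uid
}

// Build the IAM policy document API Gateway expects from an authorizer.
function buildPolicy(principalId: string, effect: "Allow" | "Deny", resource: string) {
  return {
    principalId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [{ Action: "execute-api:Invoke", Effect: effect, Resource: resource }],
    },
  };
}

// Exported as the Lambda handler in a real project.
async function handler(event: AuthorizerEvent) {
  const token = event.authorizationToken.replace(/^Bearer /, "");
  try {
    const uid = await verifyFirebaseToken(token);
    return buildPolicy(uid, "Allow", event.methodArn);
  } catch {
    return buildPolicy("anonymous", "Deny", event.methodArn);
  }
}
```

The heavy Firebase Admin SDK import that the real version of this function needs is exactly what comes back to bite later in this story.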

Performance issues

Everything was magical after the initial release. Conversions went up, and 5-star reviews were handed out like candy. If only it were that easy.

The following diagram shows the performance problems.

1.9.0 Release Performance

90% of users experienced traces longer than 0.88s; in the absolute worst case, as long as 2.8 seconds. No one, including me, will wait that long for a mobile app to do something. By the way, if you aren't using a tool like Firebase Performance, you are missing out on some valuable data.

While I encountered some performance issues while testing, the slow response was only associated with the first request. After researching AWS Lambda cold starts, I figured user traffic would make this a non-issue. That was a bad assumption.

A cold start occurs when an AWS Lambda function is invoked after sitting idle for an extended period, resulting in increased invocation latency.

Solution #1 — Trim the code

I started looking at the metrics and noticed that the authentication Lambda function was adding significant overhead: its bundled asset was 60MB. That is a lot of code for a Lambda function to load into memory on every cold start.

I decided to trim it down. Most of the bulk came from the Firebase Admin SDK, and I didn't want to rewrite their logic. Since it's open source, I forked it and gutted everything I didn't need. You can find that code here and on NPM.
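Before (and after) trimming, it's worth measuring how much of the startup cost is actually module loading, since on a cold start Lambda has to read and parse the entire bundle. A minimal sketch, using Node's built-in `path` module as a tiny stand-in for the heavy `firebase-admin` import:

```typescript
// Time how long a module takes to load at startup. On a cold start,
// this load happens before the handler can run, so a 60MB dependency
// tree shows up directly as invocation latency.
async function timeImport(moduleName: string): Promise<number> {
  const start = Date.now();
  await import(moduleName); // e.g. "firebase-admin" in the real function
  return Date.now() - start;
}

// "path" is a small stdlib stand-in here; substitute your heavy module.
timeImport("path").then((ms) => {
  console.log(`loading took ${ms} ms`);
});
```

Comparing this number for the original SDK versus the gutted fork tells you whether the trim is paying off.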

Result: Failure! The app was still super slow. Keeping the back-end authentication wasn't helping the user experience, so I disabled it in the short term.

Solution #2 — Allocate more memory

Throw money at the problem. I ramped up the memory allocated to the Lambda function, which also increases the CPU share AWS assigns to it. This was easy using the following configuration:

Lambda Settings
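In case the settings screenshot doesn't render, the same change is a one-line edit in a Serverless Framework config. This is a hypothetical fragment; the function name and values are illustrative:

```yaml
# Hypothetical serverless.yml fragment (names and values illustrative).
functions:
  graphql:
    handler: src/graphql.handler
    memorySize: 1024   # MB; default is 128. More memory also means more CPU.
    timeout: 10        # seconds
```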

Result: Failure! This didn’t overcome the cold start issue. Back to the drawing board.

Solution #3 — Warm the function

The only way to resolve this was to update the app. I had been avoiding this at all costs, since Apple's review process would delay the fix.

I decided to add an OPTIONS request to the app and server. This request would occur in the background as the user navigates through the app. The background request essentially warms the Lambda function so it’s ready for user interaction.
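The server side of this trick is tiny: the Lambda answers OPTIONS immediately, before doing any real work. A sketch, assuming the API Gateway proxy-integration event shape (`httpMethod`, `statusCode`, and `body` follow that integration; everything else here is illustrative):

```typescript
// Minimal handler shape for the warming trick.
type ApiEvent = { httpMethod: string; body?: string };
type ApiResult = { statusCode: number; body: string };

async function handler(event: ApiEvent): Promise<ApiResult> {
  if (event.httpMethod === "OPTIONS") {
    // Warming ping from the app: return right away, skipping any
    // expensive work, so the container stays initialized.
    return { statusCode: 204, body: "" };
  }
  // Normal GraphQL POST handling would go here.
  return { statusCode: 200, body: JSON.stringify({ data: null }) };
}

// The app fires this in the background while the user navigates,
// before they ever tap search.
handler({ httpMethod: "OPTIONS" }).then((res) => {
  console.log(`warm-up responded with ${res.statusCode}`);
});
```

The key design point is that the OPTIONS branch runs before any heavy initialization, so the warm-up request is cheap even on a cold start.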

I packaged this release and sent it to Apple for review. After waiting a few days for approval and gathering data — voila!

Result: Success! The response times improved drastically.

1.9.1 vs 1.9.0 Release Performance

Note that these metrics do not include the OPTIONS request since it’s a background process and doesn’t have any impact on perceived performance.

New solutions = new problems

A serverless architecture is a great way to get to market quickly. However, new solutions add new problems that need to be solved. In the case of AWS Lambda, be aware of the impact of cold starts on user response times.

My assumption that natural user traffic would make cold starts a non-issue was just wrong. You cannot rely on traffic patterns to deliver a good user experience.

I've learned to take the worst case into consideration. Unfortunately, because of that incorrect assumption about traffic patterns and cold starts, the worst case was exactly what the majority of users experienced.

If you’ve encountered a similar problem or have a different solution, I’d be interested in your feedback in the comments section below — or reach out to me directly on Twitter.