Use of serverless computing services has continued to grow since our last post on Amazon Web Services Lambda in early 2017. Despite the name, serverless technology still involves plenty of servers, containers, and Linux—as a developer, you just don’t have to think about those details. In this new environment, New Relic continues to look at data and best practices around how to measure performance.

In our previous investigation of AWS Lambda performance, we found that cold starts—the increased latency that can occur when a function runs for the first time or hasn’t been invoked recently—were not very significant for a small, simple function using the Node.js runtime. Using a new data source available from AWS, however, we can now better understand the Lambda service architecture and use that data to gather new metrics for a deeper performance investigation of a larger, more complex function.

This post provides some additional background on Lambda cold starts and describes some performance and pricing lessons learned in writing serverless functions.

Understanding cold starts with traces

A key benefit of serverless compute architectures is that developers can focus on code instead of where or how that code runs.

When it comes to Lambda, the lack of familiar metrics found in traditional applications or instances can make it challenging to understand why the code you write might take unexpectedly long to execute. Is the bottleneck the internal Lambda service that’s responsible for scheduling and executing your function? Is it your function initialization code or your function handler code? New trace data from AWS X-Ray, which launched in April for Lambda, provides some clues.

If X-Ray debugging is enabled for a function (it’s an option under “Advanced Settings” in the AWS Console), you can now see the amount of time spent in the internal Lambda service alongside the time spent in your function. For a function that is experiencing a cold start (in this example named “lambdium”), a successful 7.9s invocation from the AWS Console might look something like this:

Notice how the function code, listed under AWS::Lambda::Function, starts executing almost 2 seconds after the internal Lambda service, listed under AWS::Lambda, receives the request. Before the function handler code executes, there is also an additional 200ms of “Initialization”—that’s the time it takes the function to load its dependencies and run code outside of the handler function.

If we compare with the same function being invoked a few moments later, the results are different—and much faster:

In the second “warm start” case, the function handler code starts almost immediately after the service receives the request to invoke the function—and no initialization occurs. Minimizing time spent in the Lambda service and avoiding function initialization on requests can have significant performance implications.

Converting traces to actionable metrics and alerts

Traces from X-Ray and log data from CloudWatch can be useful for debugging individual function runs, but they’re not ideal for understanding how functions perform over time. In particular, trends in function initialization time or time spent waiting for the Lambda service can be very useful operational and alerting metrics. For cold starts, it’s also useful to know how often they are happening.

As of this writing, neither function initialization time nor internal Lambda service latency is available natively as a CloudWatch metric, so we wrote some code designed to parse the raw X-Ray trace data in near real-time and pull out interesting numbers for analysis. Naturally, the architecture that pulls data from the X-Ray API is itself serverless. The design of this prototype trace collector function (available on GitHub) looks like this:
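The core of that parsing step can be sketched in a few lines of Python. This is a simplified illustration, not the collector’s actual code: the sample trace below is hypothetical, and real segment documents come back as JSON strings from the X-Ray API with more fields than shown here.

```python
import json

# Hypothetical, simplified X-Ray trace for one Lambda invocation.
# Field names mirror X-Ray segment documents; the values are illustrative.
SAMPLE_TRACE = {
    "Segments": [
        {"Document": json.dumps({
            "origin": "AWS::Lambda",            # the internal Lambda service
            "start_time": 1000.00, "end_time": 1007.90})},
        {"Document": json.dumps({
            "origin": "AWS::Lambda::Function",  # our function's execution
            "start_time": 1001.95, "end_time": 1007.85,
            "subsegments": [
                {"name": "Initialization",      # cold-start init, if present
                 "start_time": 1001.95, "end_time": 1002.15}]})},
    ]
}

def extract_metrics(trace):
    """Pull service latency, function duration, and init time (in seconds)."""
    docs = [json.loads(s["Document"]) for s in trace["Segments"]]
    service = next(d for d in docs if d["origin"] == "AWS::Lambda")
    function = next(d for d in docs if d["origin"] == "AWS::Lambda::Function")
    metrics = {
        # Gap between the service receiving the request and the function starting:
        "service_latency": function["start_time"] - service["start_time"],
        "function_duration": function["end_time"] - function["start_time"],
        "initialization": 0.0,
    }
    for sub in function.get("subsegments", []):
        if sub["name"] == "Initialization":
            metrics["initialization"] = sub["end_time"] - sub["start_time"]
    return metrics

print(extract_metrics(SAMPLE_TRACE))
# service_latency ≈ 1.95s, initialization ≈ 0.2s for this sample
```

Each resulting metrics dictionary can then be forwarded as a custom event for aggregation.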

Finding bottlenecks over many function invocations

Using the method described above, we collected traces over a 24-hour period from a large Node.js function that ran twice every 60 seconds. The raw traces were sent to New Relic Insights as custom events for further analysis. By creating a histogram of service and function latency using these events in Insights, it’s now possible to see where invocation time is being spent:

Not surprisingly, the internal Lambda service that calls the function is often very fast—most of the invocation latency occurs during function execution. Focusing on that function, we notice how it occasionally takes more than 4 seconds to execute. Could the delay be caused by slow function initialization? Looking at the distribution of function initialization times suggests that may not be the reason:

For this function, initialization doesn’t seem to contribute much to slow invocation outliers—over a 24-hour period it added around 500ms in the worst case. Initialization also seems to happen very rarely—counting the number of traces with non-zero function initialization times revealed that less than 2% of sampled function invocations had to run initialization code:
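That cold-start frequency is just the fraction of sampled traces with a non-zero initialization time. As a sketch (the initialization times below are made up for illustration):

```python
def cold_start_rate(init_times):
    """Fraction of invocations whose trace shows non-zero initialization."""
    cold = sum(1 for t in init_times if t > 0)
    return cold / len(init_times)

# 2 cold starts out of 200 sampled invocations:
inits = [0.0] * 198 + [0.21, 0.50]
print(f"{cold_start_rate(inits):.1%}")  # 1.0% — under the 2% we observed
```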

By looking at the distributions of function invocation time and service latency, we’re able to narrow the focus of our serverless performance investigation away from cold starts and initialization and toward the code running inside the handler function. After increasing the function’s memory allocation by 50%, the function’s 95th percentile duration dropped from 3 seconds to 2.1 seconds, as visualized in this heat map:

The performance gain we achieved has interesting pricing implications because functions are billed in 100ms increments. Using the AWS Lambda pricing calculator, for the same number of requests per month, the decreased execution time pays for the 50% increase in memory—the slower function with less memory costs exactly the same as the faster function with more memory. In other words, the function was under-provisioned with memory.
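The arithmetic behind that break-even is worth making concrete. The sketch below uses illustrative round numbers (1024MB at 3.0s versus 1536MB at 2.0s, not our function’s exact figures) and the published per-GB-second compute price; because billing is memory × rounded-up duration, equal GB-seconds mean equal cost:

```python
import math

PRICE_PER_GB_SECOND = 0.0000166667  # published Lambda compute price at this writing

def invocation_cost(memory_mb, duration_s):
    """Compute cost of one invocation; duration rounds up to the next 100ms."""
    billed_s = math.ceil(duration_s * 10) / 10
    return (memory_mb / 1024) * billed_s * PRICE_PER_GB_SECOND

# Illustrative: 1.0 GB * 3.0 s and 1.5 GB * 2.0 s are both 3.0 GB-seconds.
slow = invocation_cost(1024, 3.0)
fast = invocation_cost(1536, 2.0)
print(slow == fast)  # True — 50% more memory, same bill, faster response
```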

Lambda measurement checklist

Tuning Lambda performance is very specific to individual functions, and collecting metrics over multiple days and many function invocations gives the best context. For some functions, invocation time isn’t especially important—for a low-volume automated administration task, an additional 1 second in invocation time doesn’t particularly matter. If a function needs to quickly respond to an event trigger, though—like a user-facing request coming from a mobile app or API gateway—it becomes more significant. For latency-sensitive functions like that, it’s worth asking a handful of key questions:

Where is invocation time being spent? What’s the breakdown between time in the Lambda service versus my function?

What’s the initialization time of my function? Does changing the language runtime or reducing dependencies have any effect?

How frequently is my function running initialization code? Is this a rare event (where the function is usually “warm”), or does it happen often?

If the function has a wide distribution of invocation times, does increasing the memory of the function help?

What’s the size of the deployment archive, and does decreasing the size of the Lambda deployment archive decrease time spent in the Lambda service?

For all the above, creating alerts on these metrics can be important for understanding potential internal service issues or deploys that have a negative performance impact.
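Those alerts can be as simple as threshold checks over the aggregated metrics. A minimal sketch—the threshold values here are hypothetical defaults, not recommendations:

```python
def check_alerts(p95_duration_s, cold_start_rate, service_latency_p95_s,
                 max_p95=3.0, max_cold_rate=0.05, max_service=0.5):
    """Return alert messages for metrics past (hypothetical) thresholds."""
    alerts = []
    if p95_duration_s > max_p95:
        alerts.append(f"p95 duration {p95_duration_s}s exceeds {max_p95}s")
    if cold_start_rate > max_cold_rate:
        alerts.append(f"cold-start rate {cold_start_rate:.1%} exceeds {max_cold_rate:.0%}")
    if service_latency_p95_s > max_service:
        alerts.append(f"service latency p95 {service_latency_p95_s}s exceeds {max_service}s")
    return alerts

print(check_alerts(2.1, 0.02, 0.1))  # [] — healthy after the memory increase
print(check_alerts(4.2, 0.02, 0.1))  # one alert on p95 duration
```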

The new metrics we explored in this post, function initialization and Lambda service latency, are useful for determining how previously unobservable components affect overall response time. With increasing momentum around using AWS Lambda for a variety of services and workloads, it’s important to understand how to collect, analyze, and alert on different types of built-in and custom performance data—from logs and metrics to traces—to effectively operate and design fast functions.

Additional Resources