Where Lambda falls short

Unless you’ve been living under a rock, you’ve heard about the Cloud Computing movement. I’ve made the case before that functions-as-a-service, and ‘Serverless’ in general, are the next step in the evolution of utility-based computing. But are Serverless technologies ready for prime time? Specifically, is AWS Lambda ready for mass deployment? I recently rolled out a small Python-based service on AWS Lambda, and based on that experience, I’d say Lambda still has a few shortfalls that need to be addressed.

#awslambda turns 2! Hard to believe it's been two years since we launched the AWS Lambda preview at re:Invent 2014. And much more to come! — Tim Wagner (@timallenwagner) November 14, 2016

There’s a belief in the AWS community that writing Python for Lambda is just like writing Python in any other environment. But the more I used Lambda, the more I found myself working around its ‘self-contained’ environment.

For example, consider the popular Lambda example of image processing. If you’re processing images in Python, you will eventually want to use a Python library that depends on native code. If that native code isn’t available in the default Amazon AMI that Lambda runs on, you’ll have to package it yourself. I’m even aware of projects that ship entire Python wheels because of how many static libraries they have to build and link to get their code working under Lambda. While writing this post, I discovered there is an entire project dedicated to providing various popular libraries, pre-built for use within Lambda.
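To make the manual packaging step concrete, here's a minimal sketch of the kind of build script people end up writing: after something like `pip install -t build/` (and dropping any hand-compiled `.so` files into `build/`), the whole tree is zipped flat so the handler sits at the archive root, which is how Lambda expects the bundle. The directory and file names are illustrative, not any official tooling.

```python
import pathlib
import zipfile

def make_bundle(src="build", out="function.zip"):
    """Zip everything under `src` (deps + handler) with paths relative
    to `src`, so lambda_function.py ends up at the archive root.
    Keep `out` outside of `src` to avoid zipping the archive into itself."""
    src = pathlib.Path(src)
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # arcname is relative, so imports resolve inside Lambda
                zf.write(path, str(path.relative_to(src)))
    return out
```

The flattening matters: if your dependencies end up under a `build/` prefix inside the zip, Lambda's import path won't find them.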

While native code is just one example of a Lambda shortfall (perhaps it’s a Python shortfall in general), there are others, such as the missing /dev/shm, which prevents the use of popular concurrency tools like Python’s multiprocessing module (whose Queue and Pool rely on POSIX semaphores backed by shared memory).
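A sketch of the workaround people reach for: `multiprocessing.Queue` fails on Lambda because it needs semaphores in /dev/shm, but `multiprocessing.Pipe` is built on ordinary OS pipes and still works. The function names here are my own illustration, not Lambda-specific API.

```python
from multiprocessing import Pipe, Process

def _worker(conn, n):
    # Child process: do the work and send the result back over the pipe.
    conn.send(n * n)
    conn.close()

def square_in_subprocess(n):
    # Pipe() avoids the /dev/shm-backed locks that Queue would create.
    parent_conn, child_conn = Pipe()
    proc = Process(target=_worker, args=(child_conn, n))
    proc.start()
    result = parent_conn.recv()
    proc.join()
    return result
```

You lose Queue's many-producer convenience, but one pipe per worker gets fan-out working inside a Lambda function.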

In addition to this quirky Python environment, Lambda can be quite unpredictable on failure. AWS has given you a service that runs your function with unlimited parallelism that you can’t constrain. If you subscribe a Lambda function to S3 or SNS and generate thousands of events, your function will run thousands of times, until it hits Lambda’s pre-defined limits. You can’t set any limits of your own (e.g. “run at most 5 of these at a time”). Once you hit errors (and you will, since everything but Lambda is rate-limited so heavily), you’ll read the documentation and find sentences like, “may be retried up to three times.” In my experience, the number of retries and the delay between them have been painfully unpredictable, to the point where catching all errors and implementing our own retry logic was more reliable and predictable.
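What "implementing our own retry" looked like, roughly: catch failures inside (or in front of) the handler and retry with explicit, bounded backoff, so the retry count and delay are yours rather than Lambda's opaque defaults. A minimal sketch, with illustrative names and delays:

```python
import random
import time

def call_with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff plus
    a little jitter. Unlike Lambda's built-in retry, both the count
    and the delays are explicit."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the real error
            time.sleep(base_delay * 2 ** i + random.uniform(0, 0.1))
```

Because the retries happen inside one invocation, they also count against your function's timeout, so keep `attempts` and `base_delay` small enough to fit.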

And if your code uses any other AWS APIs (such as EC2), you’ll quickly hit AWS’s API rate limits, which are applied per region but are essentially unpublished, vary considerably, and are subject to change at any time for any reason. This is on top of the API calls themselves, which (as is typical of AWS APIs) don’t offer the ability to sort or count results server-side, meaning you’re probably going to be making a lot more API calls than you really should need to.
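To illustrate the "no server-side count" problem: even something as simple as counting your EC2 instances means walking every page of results client-side. The helper below is my own sketch; in real code `pages` would come from a boto3 paginator (e.g. `client.get_paginator("describe_instances").paginate()`), each page shaped like a DescribeInstances response.

```python
def count_instances(pages):
    """Count instances across DescribeInstances-style response pages.
    Every page must be fetched (and billed against your rate limit)
    just to produce a single number."""
    return sum(
        len(reservation["Instances"])
        for page in pages
        for reservation in page.get("Reservations", [])
    )
```

Each of those page fetches is a separate API call counting against the same opaque throttle, which is how a "simple" Lambda function ends up rate-limited.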

Lambda also has a retry policy that you cannot disable or configure, which will retry your function “up to three times,” and which has left even the inventor of the boto library a bit puzzled (check out the reply linking to a forum post where Lambda features were rolled out without notice or a changelog):

Is the current retry policy in AWS Lambda documented anywhere? #awswishlist — Mitch Garnaat (@garnaat) December 4, 2015

Finally, let’s go back to packaging and deployment of Lambda code, and discuss the deployment model. Lambda functions can be versioned, but there’s currently no good way to deploy multiple Lambda functions as a group. You could easily deploy 19 functions for a new version of your application and fail to deploy the 20th, leaving you in a weird state where you’re 95% upgraded and unable to proceed due to something unexpected, from an AWS region issue to a networking problem in your own data center.
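One way to soften this, sketched below: publish new versions for every function first, and only repoint the live aliases once every publish has succeeded, so a failure at function 20 of 20 leaves the running versions untouched. `publish` and `repoint` are hypothetical callables you'd wrap around the real Lambda API (`publish_version` / `update_alias`); this is a sketch of the two-phase idea, not a finished deploy tool.

```python
def deploy_group(functions, publish, repoint):
    """Two-phase group deploy: {name: code} -> {name: new_version}.

    Phase 1 publishes every function's new version; any failure here
    raises before a single alias has moved, so traffic still hits the
    old versions. Phase 2 repoints the live aliases."""
    versions = {}
    for name, code in functions.items():
        versions[name] = publish(name, code)  # may raise; aliases untouched
    for name, version in versions.items():
        repoint(name, version)
    return versions
```

Phase 2 can still partially fail, so this isn't a transaction, but it shrinks the window from "half-uploaded code" to "half-flipped aliases," which at least leaves every function runnable.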

I think it’s important to remember that Lambda is simply giving developers a building block. There are frameworks (Serverless) and tools (kappa, lambda-uploader) already trying to address many of Lambda’s shortcomings, like deployment, packaging, and better versioning, but I think we’re still barely scratching the surface of what you’d really need to deploy an enterprise production application on Lambda. Even the brand-new Serverless conference recently had a presentation on why you might be able to build an entire serverless application without using Lambda at all.

To reiterate, this is not to say that Lambda is all bad. It’s simply too complex (native libraries), leaves too many things undefined (behavior and errors), and doesn’t offer enough tunables right now (user-defined limits). Perhaps the usual product announcements at re:Invent will improve on some of the items above; surely it would be trivial to support Python 3 and the Windows platform, too.

As Lambda turns 2 today, I’ll leave you with this quote from an amazing presentation by Charity Majors at the recent Serverless Conf events:

“You can outsource work but you can’t outsource caring” – @mipsytipsy

I can’t emphasize this point enough. If you care about error handling, timeouts, native libraries, packaging & deployment, or anything else that Lambda may not do well, you should prepare to invest a lot of time into those items, using Lambda as only the starting point.