If you’ve used AWS Lambda or another serverless solution, you may have had to interact with a database while doing so. The process is pretty routine: open a new database connection, complete one or more queries, and then close the connection. As long as you consistently close these connections, there’s little issue here beyond scaling the other components of your service as you gain more users.

If you’re new to the idea of serverless, I recommend reading Martin Fowler’s article on the subject here first; if you’re already familiar, read on.

At Stavvy, we use Lambda functions for both asynchronous (non-time-sensitive) and synchronous (time-sensitive) data processing. We have nearly 100 different functions responsible for a mixture of front-end API endpoints and asynchronous jobs such as Slack notifications, audit logs, and emails.

Serverless enables you to focus more on the business logic, rather than where your code is deployed or what hardware is running it. So what are those components mentioned before that might need scaling? In a traditional deployment model, when you encounter a bottleneck or slowdown in your service, it’s typically one of the following breaking down:

CPU — from long running tasks and high throughput demands.

Memory — expensive queries or business logic causing your service to run out of memory.

Network — internet outages or data transfer size causing slowdowns.

Software — suboptimal or slow software.

Database (Disk) — too much data for what was provisioned, or too much throughput or connection demand (the focus of this article).

Deploying your application as serverless doesn’t exempt you from the bottlenecks above, but it does simplify a few other things.

Serverless functions require less administrative overhead when it comes to deployment and scaling, but this comes at a cost: traditional server-side deployments of an application can manage database connection pools far more effectively. Establishing a new connection on every invocation (or even every query!) is going to slow down your functions.
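To make that concrete, here’s a minimal sketch of the naive per-invocation pattern, assuming a Python Lambda talking to Postgres via psycopg2 (the driver, handler name, and environment variables are illustrative, not our actual setup):

```python
import os
import psycopg2  # assumed Postgres driver


def handler(event, context):
    # A new connection is established on every invocation: simple and
    # leak-free, but the TCP/TLS/auth handshake adds latency every time.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return {"result": cur.fetchone()[0]}
    finally:
        conn.close()  # closed every time, so connections never accumulate
```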

One recommendation from the AWS forums was to create DB connections outside the handler, to enable connection reuse and improve performance. In our experience, however, this just delays the issue rather than solving it effectively:

You have less control over how many services or distinct containers will be spun up to serve requests, so it’s difficult to put a cap on connections.

There’s no good moment to close connections opened outside of the handler.

The inability to close connections consistently, even if there are fewer of them, requires you to be more careful to avoid connection leaks.

Lambda has no consistent shutdown handler to enforce that connections get closed.

And because there is no long-running server process, it’s difficult to find an ideal place to save connections for reuse.
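For comparison, here’s roughly what that outside-the-handler pattern looks like, under the same Python/psycopg2 assumptions as above. The comments call out why the reuse only delays the problem:

```python
import os
import psycopg2  # assumed driver; any client library behaves the same way

# Created once per container, outside the handler, so warm invocations
# reuse the connection and skip the handshake cost.
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)


def handler(event, context):
    # There's no good place to call conn.close(): Lambda freezes or
    # discards this container without a reliable shutdown hook, and each
    # concurrent invocation runs in its own container with its own
    # connection. A burst of traffic therefore leaves a pile of idle
    # connections open on the database.
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        return {"result": cur.fetchone()[0]}
```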

To give additional context, this was one of our testing databases, which used connections created outside of a handler or individual function:

[Chart: connections climbing on one of our testing clusters]

We were leaking connections, and a dump of connections on the database cluster revealed that they had all been left in either an idle or idle in transaction state.

[Connection dump — don’t worry about reading this too closely: essentially a bunch of idle transactions and connections leaked and left open on the database.]
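If you want to see the same picture on your own cluster, a query against Postgres’s pg_stat_activity view shows how long each connection has been sitting in its current state. Here’s one way to run it (connection details are placeholders):

```python
import os
import psycopg2

# Connect as a user allowed to read pg_stat_activity (details are placeholders).
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT pid, usename, application_name, state,
               now() - state_change AS time_in_state
        FROM pg_stat_activity
        WHERE state IN ('idle', 'idle in transaction')
        ORDER BY time_in_state DESC
        """
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```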

Interesting — so what are the options here?

Debugging the exact source of the leaks was difficult, but a couple of immediate solutions came to mind:

1. Go through and determine where the idle connections were coming from in our logic, and triage them one at a time.
2. Install a service like PG Bouncer… or set up an external scheduled process to clear or bounce idle connections (a sketch follows this list).
3. Move off of serverless to a traditional server-deployed application with a normal connection pool, or set up an external “always on” server specifically for mediating database transactions.
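As an illustration of option 2 (a sketch, not a recommendation), a cron-triggered Lambda could terminate long-idle backends with Postgres’s pg_terminate_backend; the ten-minute threshold here is an arbitrary placeholder:

```python
import os
import psycopg2

def handler(event, context):
    # Hypothetical scheduled cleanup job: bounce connections that have
    # been idle (or idle in a transaction) for more than ten minutes.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT pg_terminate_backend(pid)
            FROM pg_stat_activity
            WHERE state IN ('idle', 'idle in transaction')
              AND state_change < now() - interval '10 minutes'
              AND pid <> pg_backend_pid()  -- don't kill our own connection
            """
        )
    conn.close()
```

Note that pg_terminate_backend will also kill connections that are merely resting between queries, so too aggressive a threshold can break legitimately long-lived clients.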

These still felt a bit like bandaids. PG Bouncer might take you to some scale, but requiring a separate service to clean up open connections is not a resilient solution, and you can still max out connections over short intervals. PG Bouncer also typically isn’t an immediate option on RDS, since you don’t have a root shell on the machines running the database server and so can’t install the software the normal way.

And just finding where the leaks were occurring right now wasn’t a resilient solution for the future if this became an issue again.