Migrating to Native DynamoDB Autoscaling

As with most things, moving to a new way of doing things wasn’t quite a matter of checking a box. In the AWS console, enabling autoscaling is literally a checkbox on each table, but we have 150 tables. Our dev teams also regularly add new tables, so we want autoscaling enabled automatically for those. Oh, and we’d like to know when tables are scaling.

It turns out that while it’s a single checkbox in the AWS console, like a lot of AWS features, there’s quite a bit going on behind the curtain. DynamoDB autoscaling relies on a feature called Application Auto Scaling (also used for ECS autoscaling), which requires configuring a few things to make it work. Enabling this on 150+ tables was going to be some work, so we created a small utility to enable or disable autoscaling on groups of tables. This was a great learning exercise, as it let us see exactly how autoscaling works behind the curtain.
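To give a feel for the "few things" involved, here is a minimal sketch (not the utility mentioned above) of the Application Auto Scaling requests needed for one table. Each table or index needs a scalable target and a scaling policy for reads and again for writes. The function names, the table and index names, and the default capacities and 70% target utilization are all illustrative assumptions; the client is passed in so the request-building logic can be inspected without touching AWS.

```python
"""Sketch: build the Application Auto Scaling requests for one DynamoDB table."""

SERVICE = "dynamodb"


def scaling_requests(table, indexes=(), min_cap=5, max_cap=1000, target_pct=70.0):
    """Return (target_kwargs, policy_kwargs) pairs for a table and its indexes.

    One pair per resource (table or index) and per direction (Read, Write).
    The default capacities and target utilization are illustrative only.
    """
    resources = [("table", f"table/{table}")]
    resources += [("index", f"table/{table}/index/{name}") for name in indexes]

    requests = []
    for kind, resource_id in resources:
        for direction in ("Read", "Write"):
            dimension = f"dynamodb:{kind}:{direction}CapacityUnits"
            policy_name = direction + "AutoScaling-" + resource_id.replace("/", "-")
            target = {
                "ServiceNamespace": SERVICE,
                "ResourceId": resource_id,
                "ScalableDimension": dimension,
                "MinCapacity": min_cap,
                "MaxCapacity": max_cap,
            }
            policy = {
                "PolicyName": policy_name,  # hypothetical naming scheme
                "ServiceNamespace": SERVICE,
                "ResourceId": resource_id,
                "ScalableDimension": dimension,
                "PolicyType": "TargetTrackingScaling",
                "TargetTrackingScalingPolicyConfiguration": {
                    "TargetValue": target_pct,
                    "PredefinedMetricSpecification": {
                        "PredefinedMetricType": f"DynamoDB{direction}CapacityUtilization",
                    },
                },
            }
            requests.append((target, policy))
    return requests


def enable_autoscaling(client, table, indexes=(), **options):
    """Apply the requests using an 'application-autoscaling' boto3 client."""
    for target, policy in scaling_requests(table, indexes, **options):
        client.register_scalable_target(**target)
        client.put_scaling_policy(**policy)
```

With boto3 installed, this could be called as `enable_autoscaling(boto3.client("application-autoscaling"), "orders", indexes=["by-customer"])`. A table with one index already needs four targets and four policies, which is why doing this by hand across 150 tables is unappealing.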

Here’s what a table looks like now that we have autoscaling enabled for reads:

Always Autoscale

After enabling autoscaling on the existing tables and making sure it worked as expected, we wanted to make sure that all new tables have autoscaling enabled too. We currently create tables using CloudFormation as part of our deployment pipeline, so the first thought was to have dev teams add the autoscaling option to their templates. However, the amount of extra bits and pieces to add per table (targets, scaling policies) is pretty ornery, especially if your table has indexes. So we created a Lambda function hooked up to a CloudWatch event. It is triggered whenever a new table is added and checks whether autoscaling is enabled for the table. If not, it turns it on.

The interesting learning experience here was how granular you can get with CloudWatch API events. Take a look at the event pattern:

{
  "source": [
    "aws.dynamodb"
  ],
  "detail": {
    "eventSource": [
      "dynamodb.amazonaws.com"
    ],
    "eventName": [
      "CreateTable"
    ]
  }
}

Here, the event fires its target only if it comes from DynamoDB, and only for the CreateTable API action. When this condition is met, the Lambda function is executed and does the legwork to ensure that autoscaling is enabled on “this” table and its indexes.
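As a rough illustration of the check such a Lambda function might perform, the sketch below pulls the table and index names out of the CreateTable event and asks Application Auto Scaling which of them already have scalable targets. It is not our actual function. It assumes the event carries the CloudTrail record under `detail.requestParameters` with `tableName` and `globalSecondaryIndexes[].indexName` fields, and the function names are made up for the example.

```python
"""Sketch: find which resources from a CreateTable event lack autoscaling."""


def resource_ids_from_event(event):
    """Return Application Auto Scaling resource IDs for the table and its GSIs.

    Assumes CloudTrail-style field names (tableName, globalSecondaryIndexes).
    """
    params = event["detail"]["requestParameters"]
    table = params["tableName"]
    ids = [f"table/{table}"]
    for index in params.get("globalSecondaryIndexes") or []:
        ids.append(f"table/{table}/index/{index['indexName']}")
    return ids


def missing_autoscaling(client, resource_ids):
    """Return the resource IDs that have no scalable target registered.

    Simplification: a resource counts as covered if it has any target at
    all (read or write), and pagination of the response is ignored.
    """
    response = client.describe_scalable_targets(
        ServiceNamespace="dynamodb", ResourceIds=resource_ids
    )
    registered = {target["ResourceId"] for target in response["ScalableTargets"]}
    return [rid for rid in resource_ids if rid not in registered]


def handler(event, context, client=None):
    """Lambda entry point; the optional client argument exists for testing."""
    if client is None:
        import boto3  # provided by the AWS Lambda Python runtime

        client = boto3.client("application-autoscaling")
    todo = missing_autoscaling(client, resource_ids_from_event(event))
    for resource_id in todo:
        # This is where targets and scaling policies would be registered.
        print(f"autoscaling not enabled for {resource_id}")
    return todo
```

Keeping the event parsing separate from the AWS calls makes it easy to replay a captured event locally before wiring the function to the rule.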

See the Scaling

Our legacy table autoscaling tool used to send us an email whenever it was manipulating throughput. This was handy, really just as an “is this really working?” notification. The native DynamoDB autoscaling documentation talked about an SNS topic we could subscribe to, but after talking to support, it seems this doesn’t actually exist. And who wants lots of email these days?

We use Slack. A lot. So what if we could get those notifications into a Slack channel? It turns out we can use CloudWatch events again, with a strategy similar to the one we used to capture new table events. In this case, we want to know when autoscaling has triggered an update to a table. AWS support gave us a pointer to what the event rule would look like, which ended up being:

{
  "source": [
    "aws.dynamodb"
  ],
  "detail": {
    "eventName": [
      "UpdateTable"
    ],
    "userAgent": [
      "application-autoscaling.amazonaws.com"
    ]
  }
}

Here, the event fires its target only if it comes from DynamoDB, only for the UpdateTable API action, and only if the service triggering the event is application-autoscaling. When this condition is met, a Lambda function is executed which formats a message and posts it to Slack. The Slack output looks like

We’ve found that with a large number of tables, DynamoDB autoscaling scales a lot. So we have a dedicated Slack channel for this bot to post to, which keeps our main channels from filling up with autoscaling goodness.
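For readers who want to try something similar, here is a minimal sketch of a notification Lambda, not the one we run. It assumes the UpdateTable event exposes `tableName` and `provisionedThroughput` (with `readCapacityUnits` and `writeCapacityUnits`) under `detail.requestParameters`, that a Slack incoming webhook URL is supplied in a `SLACK_WEBHOOK_URL` environment variable (a name chosen for this example), and it only reports table-level changes, not index updates.

```python
"""Sketch: post a Slack message when autoscaling updates a table."""
import json
import os
import urllib.request


def format_message(event):
    """Turn an UpdateTable event into a one-line Slack message.

    Field names follow the assumed CloudTrail-style request parameters.
    """
    params = event["detail"]["requestParameters"]
    throughput = params.get("provisionedThroughput") or {}
    read = throughput.get("readCapacityUnits", "unchanged")
    write = throughput.get("writeCapacityUnits", "unchanged")
    return f"Autoscaling updated {params['tableName']}: read={read}, write={write}"


def post_to_slack(webhook_url, text):
    """Send the text to a Slack incoming webhook and return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event, context):
    """Lambda entry point: format the event and forward it to Slack."""
    return post_to_slack(os.environ["SLACK_WEBHOOK_URL"], format_message(event))
```

Only the standard library is used here, so the function needs no extra packaging beyond the handler file itself.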

There is…Just One More Thing

We’ve written before about how we back up a large number of DynamoDB tables using EMR. With this process, we actually “spike” the provisioned read throughput of a table before we dump the content, so we can back up the table in a reasonable time. With autoscaling enabled on the table, the spiked throughput was simply lowered again right away. So we needed a way to pause autoscaling before we back up the table. This would be a great feature in DynamoDB, but it’s not there today. What we ended up having to do was modify the scaling policy for each table before we back it up, setting the minimum throughput to our spiked value. You can see this change with this commit. It would be great if there was some simple way to pause autoscaling on a table though.
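The workaround can be sketched roughly as follows. This is not the code from the commit mentioned above, just an illustration under a few assumptions: the table already has a read scalable target registered, and `read_floor` is a name invented for this example. In Application Auto Scaling API terms the minimum is a property of the scalable target, so the sketch re-registers the target with a higher `MinCapacity` for the duration of the backup and restores the original values afterwards.

```python
"""Sketch: temporarily raise a table's autoscaling minimum during a backup."""
from contextlib import contextmanager

READ_DIMENSION = "dynamodb:table:ReadCapacityUnits"


@contextmanager
def read_floor(client, table, spiked_reads):
    """Hold the read-capacity minimum at spiked_reads inside the with-block.

    `client` is an 'application-autoscaling' boto3 client. Raises IndexError
    if the table has no read scalable target registered.
    """
    resource_id = f"table/{table}"
    response = client.describe_scalable_targets(
        ServiceNamespace="dynamodb",
        ResourceIds=[resource_id],
        ScalableDimension=READ_DIMENSION,
    )
    current = response["ScalableTargets"][0]

    def register(min_capacity):
        # Re-registering an existing target updates its limits in place.
        client.register_scalable_target(
            ServiceNamespace="dynamodb",
            ResourceId=resource_id,
            ScalableDimension=READ_DIMENSION,
            MinCapacity=min_capacity,
            # Keep the maximum at least as large as the new minimum.
            MaxCapacity=max(current["MaxCapacity"], min_capacity),
        )

    register(spiked_reads)
    try:
        yield
    finally:
        # Restore the original floor even if the backup fails.
        register(current["MinCapacity"])
```

A backup job could then wrap its export step in `with read_floor(client, "orders", 2000): ...`, where the table name and capacity are placeholders.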

Referenced Projects