At Compass, we want to start doing a better job of storing our infrastructure as code. Managing DNS records via config became our proof of concept for a few reasons.

DNS records are super simple to create, but can have some really disastrous consequences. One bad click in the AWS console can potentially take down our whole website.

We already have a mechanism for maintaining Route53 records in code, though it involves keeping lists of Route53 record objects along with logic that directly modifies the DNS records.

We are always getting requests from marketing and IT to create DNS records. We want to empower them to create these records themselves but don’t quite trust them to know everything they need to safely make changes. As such, we don’t want them to have to touch code; instead they would just have to add some ‘text’ to a config file.

It’s hard to enforce Route53 record creation and cleanup via code. This led to an inconsistent state between what actually exists in Route53 and our list of Route53 record objects.

We need to be able to easily restore infrastructure in case of a disaster.

You all already know what’s up next - CloudFormation or Terraform?

Both Terraform and CloudFormation are free. You will need to pay for the resources that they create. If you want to get more out of Terraform, there are enterprise options. Pricing isn’t listed on the HashiCorp website, but you can call in for consultation and a free trial.

CloudFormation includes a comprehensive API and UI right out of the box.

You must pay for Terraform’s enterprise options to get their full API and UI.

Different applications (or in this case hosted zones) can be easily organized and isolated by keeping them in different CloudFormation stacks.

In Terraform, you need to manage your own comprehensive directory structure — otherwise, you may accidentally update production when you want to make sure everything is working on dev first.

Terraform does its bookkeeping in a state file. By default, it’s stored locally, but you can easily configure Terraform to store the state in S3.

CloudFormation hides inner workings of the stack.

Terraform has a built-in command (terraform refresh) that will automatically update the state file with any changes to the infrastructure. Terraform runs this command prior to any operation. You will still need to update the config file, but terraform plan should tell you what changed.

Configuration drift is possible with CloudFormation. AWS recently released Drift Detection, but it’s not supported for every resource yet.

Terraform has a much nicer way to visualize changes with terraform plan than CloudFormation change sets.

CloudFormation performs automatic rollbacks and allows up to 5 rollback triggers.

Terraform doesn’t have a built in rollback system. Any failures can leave your infrastructure in a weird state where only half of your changes go through.

Both CloudFormation and Terraform can assume IAM roles, which makes it extremely easy to ensure the stack will only be used to manage Route53 records.

CloudFormation launched a tool, CloudFormer, that is capable of creating CloudFormation templates from existing AWS resources.

What we want to do

We wanted config-as-code for our DNS records, a way to preview changes before publishing, validation that those changes are safe, and finally a way to publish them, all without DevOps involvement.

Our top priority is safety, but what is ‘safe’?

Safe means any non-destructive change, such as adding new resources or making updates that do not involve deleting a record. We don’t want anyone to be able to simply delete the record for the website.
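To make that definition of "safe" concrete, here is a minimal sketch of a check that flags deletions while letting additions and updates through. It is not our actual validation code: the function name find_unsafe_changes, the (name, type) record key, and the example.com records are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: a record is identified by (name, type),
# e.g. ("www.example.com.", "A"), and maps to its list of values.
RecordKey = Tuple[str, str]
Records = Dict[RecordKey, List[str]]


def find_unsafe_changes(current: Records, proposed: Records) -> List[RecordKey]:
    """Return the records that exist today but are missing from the proposal."""
    return sorted(key for key in current if key not in proposed)


current = {
    ("www.example.com.", "A"): ["192.0.2.10"],
    ("blog.example.com.", "CNAME"): ["hosting.example.net."],
}
proposed = {
    # Updated value: allowed, the record itself still exists.
    ("www.example.com.", "A"): ["192.0.2.20"],
    # New record: allowed.
    ("shop.example.com.", "CNAME"): ["store.example.net."],
    # blog.example.com. is gone from the proposal, so it gets flagged.
}

unsafe = find_unsafe_changes(current, proposed)
if unsafe:
    print("Refusing to publish, these records would be deleted:", unsafe)
```

A check like this could run before a change set is even created, so a deletion never reaches review by accident.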

Well what did we choose?!?

CloudFormation for (nested, rollbackable) deployments.

Trials and Tribulations

Early on, I found that adding all of our DNS record sets under the same RecordSetGroup leads to unclear CloudFormation change set messaging. Any time I created a change set that added or removed DNS records, it would show up as one modify action on the whole RecordSetGroup. There’s no information about what fields or even which records were updated! To show more granular changes, I decided to make every subdomain its own RecordSetGroup.
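As a rough illustration of the one-RecordSetGroup-per-subdomain layout, the sketch below builds a template as a plain Python dict. The helper names (logical_id, build_template) and the example.com records are hypothetical; only the AWS::Route53::RecordSetGroup resource type and its HostedZoneName and RecordSets properties come from CloudFormation itself.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List


def logical_id(subdomain: str) -> str:
    """CloudFormation logical IDs must be alphanumeric, so drop everything else."""
    parts = re.split(r"[^A-Za-z0-9]+", subdomain)
    return "".join(p.capitalize() for p in parts if p) + "Records"


def build_template(zone_name: str, records: List[dict]) -> dict:
    """Emit one RecordSetGroup per subdomain instead of one for the whole zone."""
    by_name: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        by_name[record["Name"]].append(record)

    resources = {}
    for name, record_sets in sorted(by_name.items()):
        resources[logical_id(name)] = {
            "Type": "AWS::Route53::RecordSetGroup",
            "Properties": {"HostedZoneName": zone_name, "RecordSets": record_sets},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


# Made-up records for illustration only.
records = [
    {"Name": "www.example.com.", "Type": "A", "TTL": "300",
     "ResourceRecords": ["192.0.2.10"]},
    {"Name": "www.example.com.", "Type": "AAAA", "TTL": "300",
     "ResourceRecords": ["2001:db8::10"]},
    {"Name": "blog.example.com.", "Type": "CNAME", "TTL": "300",
     "ResourceRecords": ["hosting.example.net."]},
]
template = build_template("example.com.", records)
print(json.dumps(template, indent=2))
```

With this layout, a change to blog.example.com shows up in the change set as a modification of its own resource rather than of one giant group.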

Logical resource name changes fail silently. CloudFormation performs this update by adding a new CloudFormation resource and then deleting the old one. Sounds reasonable, right? Well, the add action binds the new CloudFormation resource to the pre-existing DNS record, because Route53 does not allow duplicate records in a hosted zone. Then, CloudFormation deletes the actual Route53 record when it performs the remove action. Now CloudFormation thinks the resource exists: the stack still tracks the new logical resource name, but the record no longer exists in Route53. I can still update the other resources, but any attempt to modify this one will fail, since it doesn’t exist in Route53.

We can either manually remove and then re-add the resources in the CloudFormation template, or delete the stack and then recreate it. Deleting the CloudFormation stack sounds super scary, but is actually safe if each resource’s deletion policy is set to Retain. To avoid this nightmare, we’ll enforce a naming convention for every CloudFormation resource.
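One way such a naming convention could look is to derive the logical resource name purely from the DNS record name, so nobody ever picks (or renames) it by hand. This is only a sketch of the idea, not the convention we settled on: stable_logical_id and with_retain are hypothetical helpers, and the hash suffix is my own assumption for avoiding collisions.

```python
import hashlib
import re


def stable_logical_id(record_name: str) -> str:
    """Derive the logical ID from the record name alone, so it never drifts."""
    normalized = record_name.lower().rstrip(".")
    readable = "".join(
        part.capitalize() for part in re.split(r"[^a-z0-9]+", normalized) if part
    )
    # A short hash suffix keeps names such as a-b.example.com and
    # a.b.example.com apart once the punctuation is stripped.
    suffix = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:8]
    return f"{readable}{suffix}"


def with_retain(resource: dict) -> dict:
    """Copy a resource and set DeletionPolicy to Retain, leaving the input untouched."""
    return {**resource, "DeletionPolicy": "Retain"}


resource = with_retain({"Type": "AWS::Route53::RecordSetGroup", "Properties": {}})
print(stable_logical_id("www.example.com."), resource["DeletionPolicy"])
```

Because the same record name always maps to the same logical ID, regenerating the template cannot trigger the rename-then-delete behaviour described above.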

Okay, time to create these stacks in production!

Oh no, our config files are too big…

Oh shoot, I forgot to double check AWS’s CloudFormation limits. I glanced at these limits at the beginning of this journey. All of these numbers looked perfectly reasonable back when I was just testing out CloudFormation on my own test hosted zone, but with so many records in the compass.com hosted zone we’ve soared past one limit and are hurtling towards another.

There are two ways to get around this file size limit. The first change I made was to send the template to S3 before updating the stack. This easy (but temporary) fix increases the template file limit from 51kB to 460kB.
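A small pre-flight check along these lines could have saved me the surprise. The sketch below picks a delivery method from the serialized template size. The byte values are the 51kB and 460kB figures above written out as 51,200 and 460,800 bytes, which is an assumption on my part; AWS quotas change, so check the current documentation. The function name delivery_method is hypothetical.

```python
import json

# Assumed byte values for the limits quoted in the text; verify against
# the current AWS documentation before relying on them.
INLINE_LIMIT_BYTES = 51_200   # template passed directly in the API call
S3_LIMIT_BYTES = 460_800      # template uploaded to S3 and passed by URL


def delivery_method(template: dict) -> str:
    """Return 'inline', 's3', or 'split' based on the serialized template size."""
    size = len(json.dumps(template).encode("utf-8"))
    if size <= INLINE_LIMIT_BYTES:
        return "inline"
    if size <= S3_LIMIT_BYTES:
        return "s3"
    # Too big even for S3: the template has to be broken up.
    return "split"


print(delivery_method({"Resources": {}}))
```

Running this in CI would flag a template that is creeping towards a limit well before a production deploy fails.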

A more permanent solution would be to incorporate nested stacks in your template. To create a nested stack, you’ll need to split up your resources into multiple yaml files and then declare them as CloudFormation stack resources in a parent stack.
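To sketch what that split might look like, the snippet below chunks a list of resources into groups and builds a parent template that declares each child as an AWS::CloudFormation::Stack resource pointing at a TemplateURL. The helper names, the chunk size, and the example-bucket URLs are all placeholders, and uploading the child templates to S3 is left out.

```python
import json
from typing import Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_parent(child_urls: Dict[str, str]) -> dict:
    """Declare each child template as a nested stack in a parent template."""
    resources = {
        name: {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {"TemplateURL": url},
        }
        for name, url in sorted(child_urls.items())
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


# Hypothetical subdomains, split two per child template.
groups = chunk(["blog", "shop", "www"], 2)

# Placeholder URLs; each child template would be uploaded there first.
parent = build_parent({
    f"RecordsPart{i}": f"https://s3.amazonaws.com/example-bucket/dns/part-{i}.yaml"
    for i, _ in enumerate(groups, start=1)
})
print(json.dumps(parent, indent=2))
```

Each child template is then subject to the size limits on its own, so the parent stays small no matter how many records the hosted zone grows to.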

Findings and Conclusions

I found that both CloudFormation and Terraform are wonderful, powerful tools, but decided to go ahead with CloudFormation. We in DevOps want to stop being blockers and allow anyone to deploy “safe” changes. We cannot risk any infrastructure-as-code deployment leaving the infrastructure in a weird, unexpected state. We should not have to babysit every deployment; our only responsibility should be code review.