By Scott Morris

We use Chef a lot at Opsline for config management of infrastructure deployed in the cloud. In our typical configuration, EC2 instances are launched via Autoscaling Groups and are configured by chef-client on first boot. The Chef client process registers the new instance with the Chef server — creating Chef “node” and “client” objects — and creates a user-friendly Route53 record to address the new instance.

But what happens when…

When an instance is terminated, its Chef node object, client object, and Route53 record(s) still exist. To solve this, we created a tool called Lambda-Mop. It handles deletion of the Chef node, the client, and the Route53 record(s) whenever an instance is terminated by the Autoscaling Group OR fails bootstrapping after launch.

How it works

Lambda-Mop utilizes several AWS services:

AWS Lambda — The primary component of Lambda-Mop is a Lambda python function. This function is triggered one of two ways:

1. When an Autoscaling Group terminates an existing instance, it sends the autoscaling:EC2_INSTANCE_TERMINATE signal, along with the instance ID, to the configured SNS ScalingNotifications ARN.

2. When a newly launched instance fails during bootstrap, it may never have registered its instance ID with the Chef server, which prevents looking up the node/client objects by instance ID. In this case, a small Python script (downloaded to the instance during bootstrapping) detects the bootstrap failure via cfn-signal and sends a custom event of type lambda_mop:DELETE to the SNS topic, containing the relevant Chef node name and/or IP address(es). Lambda-Mop then uses this information to delete the matching Chef objects and Route53 record(s).
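The dispatch logic might look roughly like this. This is a minimal sketch, not the actual Lambda-Mop source: the Event and EC2InstanceId fields are what Autoscaling puts in its SNS notifications, while the NodeName/IpAddresses fields of the custom event are illustrative names we've chosen here.

```python
import json

# The two event types the function acts on.
ASG_TERMINATE = "autoscaling:EC2_INSTANCE_TERMINATE"
CUSTOM_DELETE = "lambda_mop:DELETE"

def parse_sns_record(record):
    """Extract the event type and cleanup targets from one SNS record.

    Returns {"event": ..., "instance_id": ...} for ASG terminations, or
    {"event": ..., "node_name": ..., "ip_addresses": ...} for the custom
    bootstrap-failure event. Other notification types are passed through
    so the handler can ignore them.
    """
    message = json.loads(record["Sns"]["Message"])
    event = message.get("Event")
    if event == ASG_TERMINATE:
        return {"event": event, "instance_id": message["EC2InstanceId"]}
    if event == CUSTOM_DELETE:
        return {
            "event": event,
            "node_name": message.get("NodeName"),
            "ip_addresses": message.get("IpAddresses", []),
        }
    return {"event": event}

def handler(event, context):
    """Lambda entry point: fan out over all SNS records in the event."""
    for record in event["Records"]:
        parsed = parse_sns_record(record)
        if parsed["event"] == ASG_TERMINATE:
            pass  # look up Chef node/client by instance id, then delete
        elif parsed["event"] == CUSTOM_DELETE:
            pass  # delete by node name and/or IP address(es)
```

The LAUNCH and ERROR notification types also arrive on the topic (the ASG configuration below subscribes to all four), so the parser deliberately ignores anything it doesn't recognize.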

SNS — An SNS topic receives signals from the Autoscaling Group or the EC2 instance itself. Lambda-Mop is subscribed to this SNS topic.

KMS — In order to communicate with the Chef server, Lambda-Mop needs a private key already registered with the Chef server. We encrypt this private key using KMS and store it in an S3 bucket, from where it’s downloaded and decrypted by Lambda-Mop.
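That retrieval step can be sketched with boto3 (the function name and the bucket/key arguments are placeholders, not the actual Lambda-Mop code):

```python
def fetch_chef_key(bucket, s3_key):
    """Download the KMS-encrypted Chef private key from S3 and decrypt it.

    Assumes the Lambda execution role has s3:GetObject on the bucket and
    kms:Decrypt on the KMS key used to encrypt the blob.
    """
    import boto3  # available in the AWS Lambda Python runtime

    ciphertext = boto3.client("s3").get_object(
        Bucket=bucket, Key=s3_key
    )["Body"].read()
    # KMS embeds the key id in the ciphertext blob, so only the
    # ciphertext itself needs to be passed to decrypt().
    resp = boto3.client("kms").decrypt(CiphertextBlob=ciphertext)
    return resp["Plaintext"]  # PEM bytes of the Chef client key
```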

Setting up the infrastructure

1. Create an Autoscaling Group for the instances. We use CloudFormation to set up the ASG, along with the notification signals configuration.

"AutoScalingGroup": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    "MinSize": { "Ref": "AutoScalingMinSize" },
    "MaxSize": { "Ref": "AutoScalingMaxSize" },
    "DesiredCapacity": { "Ref": "AutoScalingDesiredCapacity" },
    "NotificationConfiguration": {
      "TopicARN": { "Ref": "ScalingNotificationsTopicArn" },
      "NotificationTypes": [
        "autoscaling:EC2_INSTANCE_LAUNCH",
        "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
        "autoscaling:EC2_INSTANCE_TERMINATE",
        "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
      ]
    },
    … (further ASG configuration)
  }
}

2. Create the SNS topic (referenced as ScalingNotificationsTopicArn in the ASG configuration above).

3. Create the Lambda function that is triggered by messages to the SNS topic
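Steps 2 and 3 can also live in CloudFormation. A sketch of the topic, its Lambda subscription, and the invoke permission SNS needs (LambdaMopFunction is a placeholder resource name for the function from step 3):

```json
"ScalingNotificationsTopic": {
  "Type": "AWS::SNS::Topic",
  "Properties": {
    "Subscription": [
      {
        "Protocol": "lambda",
        "Endpoint": { "Fn::GetAtt": [ "LambdaMopFunction", "Arn" ] }
      }
    ]
  }
},
"LambdaMopInvokePermission": {
  "Type": "AWS::Lambda::Permission",
  "Properties": {
    "Action": "lambda:InvokeFunction",
    "FunctionName": { "Ref": "LambdaMopFunction" },
    "Principal": "sns.amazonaws.com",
    "SourceArn": { "Ref": "ScalingNotificationsTopic" }
  }
}
```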

4. Create a KMS key for encrypting the Chef private key

5. In the Lambda function, use the Chef API to search the Chef server for the instance

6. Use the Chef API to delete the instance’s node and client objects from the Chef server
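Steps 5 and 6 could be sketched with the PyChef library. This is an illustration, not the Lambda-Mop source: it assumes the node's EC2 instance id is indexed under an attribute named instance_id, so adjust the search query to match how your cookbooks record it.

```python
def delete_chef_objects(chef_url, key_path, instance_id):
    """Find and delete the Chef node and client for a terminated instance.

    chef_url, key_path, and the "lambda-mop" client name are assumptions
    for this sketch; key_path would point at the private key decrypted
    via KMS earlier.
    """
    from chef import ChefAPI, Client, Node, Search  # pip install pychef

    with ChefAPI(chef_url, key_path, "lambda-mop"):
        for row in Search("node", "instance_id:%s" % instance_id):
            name = row["name"]
            Node(name).delete()    # remove the node object
            Client(name).delete()  # remove the matching client object
```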

7. To support the edge case of a new instance failing during bootstrapping, use a script to signal to Lambda-Mop that the instance should be removed from Chef server and Route53
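The failure-signaling script from step 7 boils down to publishing the custom event to the same SNS topic. A sketch, with the payload field names (NodeName, IpAddresses) chosen for illustration rather than taken from the Lambda-Mop protocol:

```python
import json

def build_delete_message(node_name=None, ip_addresses=None):
    """Build the custom lambda_mop:DELETE payload.

    The payload just needs to carry whatever the Lambda function is
    written to parse; these field names are illustrative.
    """
    return json.dumps({
        "Event": "lambda_mop:DELETE",
        "NodeName": node_name,
        "IpAddresses": ip_addresses or [],
    })

def signal_bootstrap_failure(topic_arn, node_name, ip_addresses=None):
    """Publish the delete event to the ScalingNotifications SNS topic.

    Called by the on-instance script when cfn-signal reports failure.
    """
    import boto3  # assumes the AWS SDK is installed on the instance
    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Message=build_delete_message(node_name, ip_addresses),
    )
```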