Background

AWS Lambda was announced at re:Invent 2014 as a service that “runs code in response to events and automatically manages the compute resources” necessary to handle potentially millions of simultaneous requests. When it was first released, Lambda was limited to either responding to specific events (objects being stored on S3, SNS notifications, etc.) or direct invocations through the AWS API. Since then, AWS has also launched Amazon API Gateway, which aims to allow Lambda to serve as the backend for APIs and standard web applications. AWS is gradually removing the need to run and manage servers, allowing developers to focus on writing code instead of managing memory, disk space, etc.

CloudSploit Scans

CloudSploit is an AWS security scanning service designed to help developers and IT teams quickly locate and fix potential risks in their AWS infrastructure. This post isn’t going to focus much on the specifics of the CloudSploit service, but the relevant part is that CloudSploit functions as a plugin-based scanning service that queries the AWS APIs and then processes the results to determine the level of infrastructure risk present (for example: open security groups are a high risk while an access key not being rotated may be a medium risk). During a typical scan, anywhere from fifty to a few hundred AWS API calls may be made, each generating scores of results that are categorized and returned. Scans can be initiated both on-demand (from the browser) as well as on a schedule (server-side in the background).

Traditional Infrastructure

When CloudSploit first launched, we ran all of our scans on EC2 instances behind a load balancer. Users would visit the site and initiate a scan which would trigger a bunch of API calls to the ELB/EC2 group, each launching a scan plugin in the background. The results were then returned to the user.

A Single Scan Makes a Lot of Requests

As you can probably imagine, this worked well for a few users, but we immediately saw huge spikes in requests and latency once the number of simultaneous users grew beyond a handful. In our case, auto-scaling did not solve the issue: by the time a new instance had launched to handle more users, it was too late. We needed a solution that could handle one request per second one minute and over a hundred per second the next. Running enough EC2 instances to continually handle the maximum number of requests was neither cost-effective nor resource-efficient.

Lambda to the Rescue

Around this time, Lambda was gaining popularity and developers were doing more and more interesting things with it. We decided to check it out as an alternative to our existing scanning setup and were immediately convinced.

Re-writing our existing scan plugins to be compatible with Lambda took only a few changes. We designed them to take the following “event” properties (the event object is the starting point for Lambda functions):

role

external_id

region

plugin
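As a rough illustration of working with these four properties, the sketch below checks that an incoming event carries all of them before any AWS call is made. The names `REQUIRED_FIELDS` and `validateEvent` are hypothetical and are not taken from the CloudSploit source; the ARN and external ID are placeholder values.

```javascript
// Hypothetical helper (not from the CloudSploit repository): reports which
// of the four expected event properties are missing or empty.
var REQUIRED_FIELDS = ['role', 'external_id', 'region', 'plugin'];

function validateEvent(event) {
    var missing = REQUIRED_FIELDS.filter(function(field) {
        return !event || typeof event[field] !== 'string' || event[field].length === 0;
    });
    return { valid: missing.length === 0, missing: missing };
}

// Placeholder event that deliberately omits "plugin"
var validation = validateEvent({
    role: 'arn:aws:iam::123456789012:role/example-scan-role',
    external_id: 'example-external-id',
    region: 'us-east-1'
});
console.log(validation);
```

Rejecting an incomplete event up front means the function can exit without spending an STS call on a request that could never run.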

The “role” and “external_id” are provided by the user and designate a cross-account IAM role. When the Lambda function first executes, it assumes a role within the external account by calling the STS API:

    var AWS = require('aws-sdk');
    var sts = new AWS.STS();

    // Parameters for assuming the user's cross-account role
    var params = {
        RoleArn: event.role,
        RoleSessionName: 'cloudsploit_scan',
        DurationSeconds: 900,    // 15 minutes, the shortest session STS allows
        ExternalId: event.external_id
    };

    sts.assumeRole(params, function(err, data) {
        // Stop if STS returned an error or an incomplete set of credentials
        if (err || !data.Credentials || !data.Credentials.AccessKeyId || !data.Credentials.SecretAccessKey || !data.Credentials.SessionToken) {
            return context.succeed(createErrorResponse('Unable to assume cross-account role'));
        }

        // Set credentials
        event.access_key = data.Credentials.AccessKeyId;
        event.secret_key = data.Credentials.SecretAccessKey;
        event.session_token = data.Credentials.SessionToken;

        callPlugin();
    });

Once the Lambda function has assumed a cross-account role, it can begin executing the plugin against the given account. A complete list of plugins, as well as the source code, can be found here: https://github.com/cloudsploit/scans.
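To make the hand-off from the temporary credentials to a plugin more concrete, here is a minimal sketch of what a dispatch step could look like. It is an assumption for illustration only: the `plugins` registry, the `exampleCheck` plugin, the `run(config, callback)` shape, and this `callPlugin(event, done)` signature are hypothetical and may differ from the real code in the repository. The configuration keys (`accessKeyId`, `secretAccessKey`, `sessionToken`, `region`) are the standard credential options accepted by AWS SDK for JavaScript clients.

```javascript
// Hypothetical plugin registry; a real plugin would query the AWS APIs.
var plugins = {
    exampleCheck: {
        run: function(config, callback) {
            callback(null, [{ region: config.region, message: 'No issues found' }]);
        }
    }
};

// Looks up the requested plugin and runs it with the temporary credentials
// that the assumeRole step stored on the event.
function callPlugin(event, done) {
    if (!Object.prototype.hasOwnProperty.call(plugins, event.plugin)) {
        return done(new Error('Unknown plugin: ' + event.plugin));
    }
    var config = {
        accessKeyId: event.access_key,
        secretAccessKey: event.secret_key,
        sessionToken: event.session_token,
        region: event.region
    };
    plugins[event.plugin].run(config, done);
}

// Example run with placeholder credentials
callPlugin({
    plugin: 'exampleCheck',
    region: 'us-east-1',
    access_key: 'EXAMPLE_KEY',
    secret_key: 'EXAMPLE_SECRET',
    session_token: 'EXAMPLE_TOKEN'
}, function(err, results) {
    console.log(err ? err.message : results);
});
```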

Connecting the API

Now that the scans have been converted to Lambda functions, the bulk of the processing work can be offloaded from our EC2 instances. However, we still needed a way for users to run a scan directly from the browser. To do this, we hooked up our Lambda functions to the new API Gateway service.

Although the service takes a bit of documentation reading to completely grasp, it does provide enough functionality that we can create an API, accept specific POST parameters, pass them to a Lambda function, and then return the results to the user.

Our API Gateway + Lambda Integration

API Gateway then generated a URL that we could use directly within our apps. Within the console we can adjust settings for throttling, access, headers, etc. It’s also possible to assign a custom domain, such as api.example.com, to the API.
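From the browser's side, calling such an endpoint could look something like the sketch below. The helper `buildScanRequest`, the `/scan` path, and the choice of a JSON request body are assumptions made for illustration; the actual parameter format depends on how the API Gateway method and its mapping are configured.

```javascript
// Hypothetical helper: builds a POST request carrying the four scan
// parameters. The URL and the JSON body format are assumptions.
function buildScanRequest(apiUrl, scan) {
    return {
        url: apiUrl,
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                role: scan.role,
                external_id: scan.external_id,
                region: scan.region,
                plugin: scan.plugin
            })
        }
    };
}

var request = buildScanRequest('https://api.example.com/scan', {
    role: 'arn:aws:iam::123456789012:role/example-scan-role',
    external_id: 'example-external-id',
    region: 'us-east-1',
    plugin: 'exampleCheck'
});
console.log(request.options.method, request.url);

// In a browser, the request could then be sent with:
// fetch(request.url, request.options).then(function(res) { return res.json(); });
```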

Server-Side Scans

Besides on-demand scans, CloudSploit also allows users to have scans run against their account in the background at regular intervals. To do this, they save the cross-account IAM role and external ID within their CloudSploit account, and the service runs the scan using these credentials every 36 hours.

Again, we faced issues where there would be drastic spikes in the number of running scans. While we could attempt to run all the scans around the same time so that we could auto-scale ahead of time, we decided to simply leverage our new Lambda service to execute the scans instead.

Now, a single micro EC2 instance periodically checks our database for accounts that haven’t been scanned within the given time range. Each time it finds some (usually 20 or 30 accounts at a time), it invokes all the Lambda functions needed to execute the approximately 300 plugin runs (20–30 accounts x 10–15 plugins each). Lambda immediately scales up to handle this demand and returns each result to a server-side callback, which saves it to the database.
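The fan-out from accounts to individual plugin invocations can be sketched as follows. This is not the scheduler's actual code: `buildInvocations` and `invokeAll` are hypothetical names, and the `invoke` argument is a stand-in so the example runs without AWS access. In a real deployment it would likely wrap the AWS SDK's `lambda.invoke` call with the payload serialized as JSON.

```javascript
// Builds one payload per (account, plugin) pair.
function buildInvocations(accounts, pluginNames) {
    var invocations = [];
    accounts.forEach(function(account) {
        pluginNames.forEach(function(plugin) {
            invocations.push({
                role: account.role,
                external_id: account.external_id,
                region: account.region,
                plugin: plugin
            });
        });
    });
    return invocations;
}

// Starts every invocation at once and collects results in input order.
// invoke(payload, callback) is supplied by the caller.
function invokeAll(invocations, invoke, callback) {
    var results = new Array(invocations.length);
    var pending = invocations.length;
    if (pending === 0) return callback(results);
    invocations.forEach(function(payload, i) {
        invoke(payload, function(err, data) {
            results[i] = err ? { error: err.message } : { data: data };
            pending -= 1;
            if (pending === 0) callback(results);
        });
    });
}

// Example with two placeholder accounts, three plugins, and a fake invoker
var accounts = [
    { role: 'arn:aws:iam::111111111111:role/scan', external_id: 'a', region: 'us-east-1' },
    { role: 'arn:aws:iam::222222222222:role/scan', external_id: 'b', region: 'us-east-1' }
];
var invocations = buildInvocations(accounts, ['pluginA', 'pluginB', 'pluginC']);
invokeAll(invocations, function(payload, cb) {
    cb(null, payload.plugin + ' finished');
}, function(results) {
    console.log(results.length + ' plugin runs completed');
});
```

Because every invocation is started without waiting for the previous one, the work arrives at Lambda as a burst, which is the spike pattern the scheduler relies on Lambda to absorb.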

Spikes in the Lambda Console When Scans are Running

By offloading the compute-intensive portions of the scan (requesting temporary tokens, signing API requests, making HTTP calls to the AWS API, waiting for responses, and processing results), we have gone from needing numerous EC2 instances to needing only one.

Open-Source

Everything discussed here can be found in CloudSploit’s open-source repository. Specifically, look for the “lambda.js” file which contains a complete example of a Lambda entry point.