Using AWS Lambda for cheap S3 content processing

Hi there! This has been deprecated in favor of an even easier way to integrate with S3. Find out more about it here.

TL;DR: If you use Amazon S3 to store user generated content, their new service, AWS Lambda, can easily be set up to intelligently post-process objects by calling out to third-party services and even handle callbacks (using API Gateway), all at a price point that's really hard to beat.

Although the example below uses the scanii.com content processing service, you could easily swap that service out for anything with a sane API. For example, you could use AWS Lambda to automatically OCR your S3 objects using Google's Cloud Vision API.

In this example we're going to:

1. Set up an AWS Lambda function that runs every time an S3 object is created/updated.
2. Wire a sample function that submits the S3 content to scanii.com safely (using signed URLs) for asynchronous processing with a callback hook.
3. Wire an API Gateway endpoint to process the callback, verify authenticity, and take action (in this case delete the content) if malware is found.

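In a nutshell, here's how things will look: an S3 upload triggers the lambda function, which submits a signed URL for the object to scanii.com; scanii.com then calls back through API Gateway, and the same function deletes the object if malware was found.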

Background

Protip: If you are familiar with AWS Lambda and S3, feel free to skip this section.

If you know me, you know that I believe S3 is probably one of the most impactful pieces of technology infrastructure of the last decade. And no, it's not due to its price point or novelty factor (NetApp built an entire business around storing blobs of data in the 90's), but because of how ubiquitous it is. Having something effectively omnipresent that you can use to store your files is just too convenient. That's why just about every company uses it, and I would venture to say that, if AWS were to publish these numbers, we would see that S3 is probably their biggest product in terms of penetration.

Inevitably, once all your files are being stored in S3, you are going to want to do something with them and, until recently, you had to write/deploy/manage your own application to do so. Not anymore: the clever folks at Amazon have a better solution for us, lambda functions.

Lambda functions are snippets of code that run based upon event triggers. These triggers can be an API call flowing through AWS API Gateway, changes to an object in S3, or many others. Under normal circumstances, AWS Lambda can also be extremely cost effective since, instead of paying per CPU-hour like a regular virtual server, you pay for the number of times your function is called and the amount of compute time it consumes, rounded up to the nearest 100 ms.

More importantly, as of this writing, AWS Lambda comes with a very generous free tier covering 1M requests and 400k GB-seconds of compute time per month, which should be more than enough for most users (details here).
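To get a feel for those numbers, here's a rough back-of-the-envelope sketch. It assumes GB-seconds are computed as configured memory times billed time, with billed time rounded up to the nearest 100 ms as described above. The function names and the sample workload (500k uploads a month, 120 ms each, 128 MB of memory) are made up for illustration.

```javascript
// Free tier limits quoted above (as of this writing).
const FREE_REQUESTS = 1000000;
const FREE_GB_SECONDS = 400000;

// Billed duration is rounded up to the nearest 100 ms.
function billedMs(durationMs) {
  return Math.ceil(durationMs / 100) * 100;
}

// GB-seconds = invocations x billed seconds x configured memory in GB.
function gbSeconds(invocations, durationMs, memoryMb) {
  return (invocations * billedMs(durationMs) * memoryMb) / (1000 * 1024);
}

// True when a month's usage stays inside the free tier.
function fitsFreeTier(invocations, durationMs, memoryMb) {
  return invocations <= FREE_REQUESTS &&
    gbSeconds(invocations, durationMs, memoryMb) <= FREE_GB_SECONDS;
}

// Hypothetical workload: 500k uploads a month, 120 ms each, 128 MB of memory.
console.log(gbSeconds(500000, 120, 128));    // 12500
console.log(fitsFreeTier(500000, 120, 128)); // true
```

In other words, a workload like this one uses only a small slice of the monthly compute allowance.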

Setting things up

For the lambda function code we will use scanii's lambda sample code (https://github.com/uvasoftware/scanii-lambda), which automatically deletes the content from S3 if malware is found and can be easily extended to perform other operations. For this example you will also need a free API key from scanii.com. If you don't have an API key yet, you can quickly create one here.

IAM Role

First we need to create an IAM role that grants our lambda function access to S3 (we'll start with read-only access for now) and basic lambda execution rights (so we can save logs to CloudWatch Logs).

1. Log in to the AWS console Identity and Access Management page
2. Click Roles
3. Select "Create New Role"
4. Give your role a name (for this example we'll use "scanii-lambda-role") and click "next step"
5. Under Select Role Type, choose "AWS Lambda" and click "next step"
6. Under Attach Policy, choose "AmazonS3ReadOnlyAccess" and "AWSLambdaBasicExecutionRole", then click "next step"
7. Finally, select "Create Role"

Lambda function

Now we need to actually set up the lambda function and its event sources.

1. Log in to the AWS console Lambda page and click "Get Started Now" (or "Create function" if you already have existing functions)
2. Under Create function, select "Author from scratch"
3. Fill in the name and description, and select Node.js 8.10 for the runtime; for this example, we're calling our function scanii-process-content
4. Under Role, leave it as "Choose an existing role" and, under Existing role, select the role we created earlier
5. Click "Create function"
6. Under Function code, copy and paste the contents of the sample function into the code window under index.js (you want to delete whatever content was there before)
7. Under Environment variables, enter SCANII_CREDS for the key, with a value equal to your scanii API token in the format KEY:SECRET
8. Under Basic settings, bump up the timeout to 30 seconds
9. Click "Save" at the top of the page to save these settings

Configuring event sources (aka triggers)

Now that we have a lambda function with an IAM role ready to go, we need to configure when and how that function should run.

1. Under Add triggers, click "S3"
2. Under Configure triggers, under Bucket, select the bucket whose objects you would like processed
3. Under Event type, select "Object Created (All)"
4. Leave prefix and suffix empty unless you have a specific need to restrict processing
5. Click "Add"
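To make the trigger less abstract, here's a minimal sketch of what a function might do with the event S3 hands it. This is not the scanii-lambda code, just an illustration: the helper names are mine, the sample event is trimmed down to the fields used, and `signedUrlFor` assumes it is given an AWS SDK for JavaScript v2 S3 client (the one bundled with the Node.js 8.10 runtime).

```javascript
// Pull bucket/key pairs out of an S3 event notification.
// Object keys arrive URL-encoded, with spaces encoded as "+".
function objectsFromS3Event(event) {
  return (event.Records || [])
    .filter((record) => record.s3)
    .map((record) => ({
      bucket: record.s3.bucket.name,
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
    }));
}

// Build a short-lived signed GET URL so a third party can fetch the object
// without the bucket being public. `s3` is assumed to be an AWS SDK v2
// S3 client, e.g. `new (require('aws-sdk')).S3()` inside Lambda.
function signedUrlFor(s3, object, expiresSeconds) {
  return s3.getSignedUrl('getObject', {
    Bucket: object.bucket,
    Key: object.key,
    Expires: expiresSeconds,
  });
}

// Trimmed-down sample event, for illustration only.
const sampleEvent = {
  Records: [
    { s3: { bucket: { name: 'scanii-test' }, object: { key: 'uploads/my+file%281%29.pdf' } } },
  ],
};

// The key is decoded to "uploads/my file(1).pdf".
console.log(objectsFromS3Event(sampleEvent));
```

Passing the client in as an argument keeps the helpers easy to try out locally with a stub before wiring them to a real bucket.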

Now we basically do the same to configure the lambda function to fire when our callback is called.

1. Under Add triggers, click "API Gateway"
2. Under Configure triggers, under API, select "Create a new API"
3. Under API name, enter scanii-process-content-api
4. Under Deployment stage, enter "Prod"
5. Under Security, select "Open" (our code will enforce authorization)
6. Click "Add" and take note of the API endpoint URL
7. Click "Save" at the top of the page to save these settings
8. Now, under API Gateway, click on "Details" and copy down the "Invoke URL" (that's our brand new API Gateway URL)
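Since the endpoint is open, the function itself has to decide whether a callback is genuine. Here's one way that could be sketched: sign the bucket and key with an HMAC when submitting the content, have the signature echoed back in the callback, and recompute it on arrival. To be clear, this is an assumption for illustration and not necessarily how the scanii-lambda sample does it. The payload shape (`metadata.bucket`, `metadata.key`, `metadata.signature`, `findings`) is hypothetical, and the event is assumed to use API Gateway's Lambda proxy format.

```javascript
const crypto = require('crypto');

// One function serves both triggers; proxy events carry an httpMethod.
function isApiGatewayEvent(event) {
  return typeof event.httpMethod === 'string';
}

// HMAC-SHA256 over "bucket/key", hex encoded.
function sign(secret, bucket, key) {
  return crypto.createHmac('sha256', secret).update(`${bucket}/${key}`).digest('hex');
}

// Constant-time comparison of the expected and received signatures.
function isAuthentic(secret, metadata) {
  const expected = Buffer.from(sign(secret, metadata.bucket, metadata.key));
  const received = Buffer.from(String(metadata.signature || ''));
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// Handle a callback event. Returns the proxy response plus the object
// to delete, if any (hypothetical payload fields, see the note above).
function handleCallback(event, secret) {
  let payload;
  try {
    payload = JSON.parse(event.body);
  } catch (err) {
    return { response: { statusCode: 400, body: 'invalid JSON' }, toDelete: null };
  }
  const metadata = (payload && payload.metadata) || {};
  if (!isAuthentic(secret, metadata)) {
    return { response: { statusCode: 403, body: 'bad signature' }, toDelete: null };
  }
  const infected = Array.isArray(payload.findings) && payload.findings.length > 0;
  return {
    response: { statusCode: 200, body: 'ok' },
    toDelete: infected ? { bucket: metadata.bucket, key: metadata.key } : null,
  };
}
```

Keeping the decision (`toDelete`) separate from the actual S3 call makes it easy to log what would be deleted while the role is still read-only.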

The last step is to configure our new lambda function to use that API Gateway URL:

1. Click on "scanii-process-content", or whatever else you called your function
2. Under Environment variables, enter CALLBACK_URL for the key, with a value equal to the API Gateway invoke URL you copied above
3. Click "Save" at the top of the page to save these settings

And you are done!
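For reference, here's a small sketch of how a function could read the two environment variables we just configured. The `loadConfig` helper is hypothetical (the sample code may parse things differently), and the values passed in below are placeholders; inside Lambda you would pass `process.env` instead.

```javascript
// Read the two environment variables configured above.
// SCANII_CREDS is expected in the form KEY:SECRET.
function loadConfig(env) {
  const creds = env.SCANII_CREDS || '';
  const separator = creds.indexOf(':');
  if (separator <= 0 || separator === creds.length - 1) {
    throw new Error('SCANII_CREDS must be in the format KEY:SECRET');
  }
  if (!env.CALLBACK_URL) {
    throw new Error('CALLBACK_URL is not set');
  }
  return {
    key: creds.slice(0, separator),
    secret: creds.slice(separator + 1),
    callbackUrl: env.CALLBACK_URL,
  };
}

// Placeholder values for illustration only.
const config = loadConfig({
  SCANII_CREDS: 'my-key:my-secret',
  CALLBACK_URL: 'https://example.invalid/Prod/scanii-process-content',
});
console.log(config.key); // my-key
```

Failing fast on a malformed value gives you a clear message in the logs instead of a confusing authentication error later on.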

Processing content

Now that you have your lambda function and API Gateway callback set up, you can start adding content to the bucket you chose as the event source, and you should see content being submitted for processing. Keep in mind:

Lambda functions send logs to CloudWatch, so you should be able to go there for troubleshooting.

You should see the processing requests showing up in your scanii.com dashboard.

Enabling S3 object deletion

You might remember that when we set up the IAM role for our lambda function we only gave it read access to S3. Although that's a good (and safe) place to start, once you are comfortable with your lambda setup you can modify the role to grant it delete rights to the bucket you've set up for processing. That way, our sample code will automatically delete objects with malware findings (details here).

1. Log in to the AWS console Identity and Access Management page
2. Click Roles
3. Click on the role you created above
4. Under Permissions, click on "Inline Policies" and create a new one
5. Under Set Permissions, select "Custom Policy" and click "Select"
6. Under Review Policy, paste the sample policy below, adjusting the bucket name accordingly, and click "Validate Policy"
7. Once the policy is validated, click on "Apply Policy"

Sample policy granting object delete rights on bucket "scanii-test"

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:DeleteObject"
          ],
          "Resource": [
            "arn:aws:s3:::scanii-test/*"
          ]
        }
      ]
    }
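If you manage more than one bucket, a tiny helper can stamp out the same policy for each of them. This is just a convenience sketch (the `deletePolicyFor` name is mine); it produces the same delete-only policy as the sample, with the bucket name swapped in.

```javascript
// Build the delete-only policy shown above for a given bucket name.
function deletePolicyFor(bucket) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['s3:DeleteObject'],
        Resource: [`arn:aws:s3:::${bucket}/*`],
      },
    ],
  };
}

// Print the policy as JSON, ready to paste into the console.
console.log(JSON.stringify(deletePolicyFor('scanii-test'), null, 2));
```

Note that the resource ends in `/*`, so the grant covers the objects in the bucket rather than the bucket itself.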

Now, for a final and glorious test, copy a known malicious file to your S3 bucket and watch it automatically disappear after a few seconds. Don't have a known malicious file handy? You can download our sample EICAR file here.

That's all, folks! I hope you enjoy everything you see here. If you have any questions or comments, please reach out to us at ping@uvasoftware.com.

Change log

2018.03.13 - Updated content to reflect node 8.10 requirement and UI changes

2016.10.12 - Updated content to account for the latest scanii-lambda code that requires node version 4

2016.9.9 - Updated content to account for AWS's UI changes around event sources

Last updated on 03/13/2018.