If you’ve ever tried to run operations on a large number of objects in S3, you might have encountered a few hurdles. Listing all the objects and running the operation on each one gets complicated and time-consuming as the number of objects grows. Many decisions have to be made: is running the operations from my personal computer fast enough? Or should I run them from a server closer to the AWS resources, benefiting from AWS’s fast internal network? If so, I’ll have to provision resources (e.g., EC2 instances, Lambda functions, containers) to run the job.
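To make the pain concrete, the manual approach looks something like this — a minimal boto3 sketch, with hypothetical bucket names, that lists every object and copies each one individually:

```python
import boto3

s3 = boto3.client("s3")

# The "old way": page through every object, then operate on each one in turn.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        # e.g. copy each object to a backup bucket, one API call at a time
        s3.copy_object(
            Bucket="my-backup-bucket",
            Key=obj["Key"],
            CopySource={"Bucket": "my-bucket", "Key": obj["Key"]},
        )
```

Every object costs at least one API round trip, which is exactly what adds up once you’re dealing with millions of objects.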

Thankfully, AWS has heard our pains and announced the AWS S3 Batch Operations preview at the last AWS re:Invent conference. This new service (which you can access by asking AWS politely) lets you easily run operations on very large numbers of S3 objects in your bucket. Curious to know how it works? Let’s get going.

Accessing the Preview

If you don’t have access to the S3 Batch Operations preview, fill in the form on this page. It took a couple of days before I got an answer from AWS, so arm yourself with patience.

Getting Started

Now that you have access to the preview, you’ll find the Batch operations tab in the sidebar of the S3 console:

Access Batch operations from the S3 console

Once you have reached the Batch operations console, let’s talk briefly about jobs.

Jobs

Central to S3 Batch Operations is the concept of a Job. In a nutshell, a Job determines:

In which buckets your objects are located

What operation to perform on the objects

Which objects to run the operation on (there’s a rough API sketch right after this list)
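If you prefer to see how those pieces fit together in code, here’s a minimal sketch using boto3’s s3control client. The account ID, bucket names, role ARN, and manifest ETag are all placeholders you’d replace with your own; we’ll do the same thing through the console shortly:

```python
import boto3

s3control = boto3.client("s3control")

response = s3control.create_job(
    AccountId="123456789012",  # your AWS account ID
    ConfirmationRequired=True,
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/batch-operations-role",
    # What operation to perform: here, copy every object to another bucket
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::my-destination-bucket",
        }
    },
    # Which objects to operate on: a CSV manifest of bucket,key pairs
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-bucket/manifest.csv",
            "ETag": "<etag-of-the-manifest-object>",
        },
    },
    # Where to write the completion report
    Report={
        "Bucket": "arn:aws:s3:::my-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
print(response["JobId"])
```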

We’ll soon create our first job. But first, let’s create a test bucket, just to experiment a little with Batch Operations.

Creating the Test Bucket

Before you create your first job, create a new bucket with a few objects. I created a new S3 bucket named “spgingras-batch-test” in which I uploaded 3 files (file1.jpg, file2.jpg, file3.jpg):
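If you’d rather script this step than click through the console, here’s the boto3 equivalent, assuming the three files exist locally and your default region is us-east-1:

```python
import boto3

s3 = boto3.client("s3")

# Bucket names are globally unique, so pick your own.
# (Outside us-east-1, create_bucket also needs a
# CreateBucketConfiguration with a LocationConstraint.)
s3.create_bucket(Bucket="spgingras-batch-test")

# Assumes file1.jpg, file2.jpg, and file3.jpg sit in the working directory
for name in ["file1.jpg", "file2.jpg", "file3.jpg"]:
    s3.upload_file(name, "spgingras-batch-test", name)
```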