If you’re familiar with the idea of multi-region replication, feel free to skip to the Overview section. If you don’t know what multi-region replication is, why it’s important, or aren’t convinced that it is, I’d like you to imagine you’ve just sat down to breakfast in a small cafe. You’ve had a long night and your body is craving some refined sugar, so you decide to order a stack of toast (obvs).

After what seems like ages, your waiter finally returns with the promised mountain of carbs. You groggily reach for a slice and, in a moment of awe, realize that the toast has a depiction of a pug emblazoned on it. A PUG! You nearly collapse into tears of joy – if only you could bring this bliss to other toast lovers across the world! But wait! You remember that once upon a time you spent 4 years at an Ivy League university learning the complex inner workings of all things computer.

Time to put that CS degree to work!

Well, first things first, you’re gonna need a place to store all your crispy carb creations. You start to look around at different offerings to see who can host the most toast.

Instagram is a non-starter. It’s saturated with exorbitantly decorated, almost unrecognizable slices of bread, and you can’t have your purist toast creations mingling around with those millennial abominations. I mean, why even is avocado toast? So next, you consider an already existent photo hosting company. But blackmail really isn’t your jam, so you keep searching.

Then finally, after much more glazing around, you discover Amazon’s S3 offering. Unlimited storage? Well defined APIs? And you can serve up your photos with a pluggable CDN? It doesn’t get much butter than that.

So finally, your photos have a home and you start spreading delicious, starchy joy to all corners of the earth. But then, the unthinkable happens! An engineer commits a typo! The seemingly innocuous mistake brings down your toast storage, and also some other, but much less important, crumbs of the internet. Luckily you’ve set up your CDN cache, so your users are still able to access some batches of your toast. However, they’re unable to see any non-cached toast, and you’re unable to upload any fresh carbs.

In order to prevent this lack of toasty goodness in the future, you decide to create a backup bucket for your photos in a different region. You read about Amazon’s cross-region replication functionality and implement a replicated bucket accordingly.

Everything seems to be going well with your new backup bucket. Your user base continues to grow, and soon you’re serving up millions of toast photos a day. However, your CDN can only cache so much data, and your users like variety in their grains. This means that many of your international users are getting cache misses, and are having to wheat for toast from your S3 bucket to travel across the world to them. You now realize that you knead to replicate your toast across multiple, international regions to ensure that any cache misses will only have to travel as far as the nearest international bucket. But as you rye to set up more replicas, you realize that you can’t daisy-chain replications or specify more than a single bucket in the native replication configuration.

So what do you dough?

Overview

You roll your own replication! (I promise I’m done with the puns now). By combining Amazon’s S3, SNS, SQS and Lambda technologies, we can create our own replica set. An overview of the system is as follows:

1. Toast is uploaded to the us-east-1 bucket
2. A “write” event trigger sends the write event to an SNS topic in us-east-1
3. Lambdas in eu-west-2 and ap-northeast-1 that are subscribed to that topic receive the write event, then copy the newly written object from us-east-1 to their respective buckets
4. If a write event fails, the acting lambda writes the failed event out to an SQS queue
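
Before diving into the templates, here’s a toy, in-memory Python sketch of that flow – plain dicts stand in for the S3 buckets, and “publishing” a write event just calls each subscribed replicator directly:

```python
# Toy, in-memory sketch of the fan-out above: dicts stand in for S3
# buckets, and "publishing" a write event simply invokes each
# subscribed regional replicator in turn.
buckets = {
    'us-east-1': {},
    'eu-west-2': {},
    'ap-northeast-1': {},
}

def replicate(dest_region, source_region, key):
    """What each regional lambda does with a write event."""
    # Only copy if the object isn't already here (prevents infinite loops)
    if key not in buckets[dest_region]:
        buckets[dest_region][key] = buckets[source_region][key]

def upload(region, key, body):
    buckets[region][key] = body
    # S3 write event -> SNS topic -> every other region's lambda
    for dest in buckets:
        if dest != region:
            replicate(dest, region, key)

upload('us-east-1', 'pug.jpg', b'toast bytes')
```

The real system replaces the direct function calls with SNS fan-out and the dicts with S3 buckets, but the copy-if-absent logic is the same.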

Now, let’s look at each of these pieces in detail. We’re going to examine each of the architectural sections via Amazon’s CloudFormation template syntax. If you’re not familiar with this, go ahead and take a minute to read up on the basics. Most of the template snippets are pretty straightforward, but it never hurts to understand exactly what’s going on.

Additionally, the following information assumes basic knowledge regarding Amazon’s S3, Lambda, SQS, SNS, and IAM offerings.

Architecture

S3

```yaml
ToastHost:
  Type: "AWS::S3::Bucket"
  Properties:
    BucketName: !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
    NotificationConfiguration:
      TopicConfigurations:
        - Event: "s3:ObjectCreated:*"
          Topic: !Ref ToastNotificationTopic
  DependsOn:
    - ToastNotificationTopicPolicy
```

The first thing we need to spin up are the actual S3 buckets that will be hosting our images. Some notes about what we’re doing in this snippet:

- Bucket names must be unique across all of Amazon
  - We include a unique identifier parameter in the CloudFormation template for this purpose
- Bucket names cannot contain uppercase letters, so don’t use them in your unique identifier
- We’re triggering only off of ‘ObjectCreated’ events
  - We could trigger off of any subset of supported events, but not propagating deletes is a safe first step
- We need to make sure the bucket isn’t created until our Topic is, so we force the bucket to wait on the Topic’s Policy (which is created post-topic, as we’ll see later on)
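
These naming rules are easy to trip over, so here’s a small Python sketch that mirrors the template’s !Join and sanity-checks the result against S3’s rules (the ‘pugtoast42’ identifier is just an example value):

```python
import re

def toasthost_bucket_name(unique_identifier, region):
    # Mirrors the template's !Join [ "-", [ UniqueIdentifier, "toasthost", Region ] ]
    name = '-'.join([unique_identifier, 'toasthost', region])
    # S3 bucket names cannot contain uppercase letters...
    assert name == name.lower(), 'bucket names cannot contain uppercase letters'
    # ...and must be 3-63 chars of lowercase letters, digits, dots, and hyphens
    assert re.match(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$', name), 'invalid S3 bucket name'
    return name

# e.g. with a (hypothetical) unique identifier of 'pugtoast42':
toasthost_bucket_name('pugtoast42', 'us-east-1')  # 'pugtoast42-toasthost-us-east-1'
```

An uppercase identifier like ‘PugToast’ fails the check, which is exactly the kind of error you want to catch before CloudFormation does.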

SNS

```yaml
ToastNotificationTopic:
  Type: "AWS::SNS::Topic"
  Properties:
    DisplayName: "TnT"
    TopicName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref "AWS::Region" ] ]

ToastNotificationTopicPolicy:
  Type: "AWS::SNS::TopicPolicy"
  Properties:
    PolicyDocument: # allow s3 to write to this sns topic
      Id: ToastNotificationTopicPolicy
      Statement:
        - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationTopicPolicy", !Ref "AWS::Region" ] ]
          Effect: Allow
          Action: SNS:Publish
          Resource: "*"
          Principal:
            AWS: "*"
          Condition:
            ArnLike:
              aws:SourceArn: !Join
                - ""
                - - "arn:aws:s3:*:*:"
                  - !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
                  - "*"
    Topics:
      - !Ref ToastNotificationTopic
```

We now need to create the topic that our write events are sent to. Some notes about what we’re doing in this snippet:

- Topic display names can only be 10 characters or less, which is why we use the ‘TnT’ (DY-NO-MITE) shorthand
- The Topic Policy allows the toasthost bucket in the same region as the topic to write events to said topic

SQS

```yaml
DeadToastQueue:
  Type: "AWS::SQS::Queue"
  Properties:
    QueueName: !Join [ "-", [ !Ref UniqueIdentifier, "DeadToast", !Ref "AWS::Region" ] ]
```

This one is pretty straightforward. All it does is create an SQS queue with default settings.

Lambda

AWS Lambdas are transient compute units. They are triggered by events (either an action or a scheduled time), and spin up compute resources only for as long as it takes to run that event through their code. So, for our replication lambda, let’s look at the code snippet first.

Code

```python
import ast
import boto3
import botocore
import os
import urllib


def _get_key_exists(bucket, key):
    """Return True if `key` already exists in `bucket`."""
    try:
        boto3.resource('s3').Object(bucket, key).load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            return False
        else:
            raise e
    return True


def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # The SNS message body carries the original S3 write event
    sns_message = ast.literal_eval(event['Records'][0]['Sns']['Message'])
    source_bucket = str(sns_message['Records'][0]['s3']['bucket']['name'])
    # BUCKET_NAME is this region's toasthost bucket (set in the CF template)
    dest_bucket = os.environ.get('BUCKET_NAME')
    # S3 URL-encodes object keys in event payloads, so decode before using
    key = str(urllib.unquote_plus(sns_message['Records'][0]['s3']['object']['key']).decode('utf8'))
    # Only copy if the object isn't already here -- prevents infinite replication
    if not _get_key_exists(dest_bucket, key):
        copy_source = {'Bucket': source_bucket, 'Key': key}
        s3.copy_object(Bucket=dest_bucket, Key=key, CopySource=copy_source)
```

This Python code block reads in a trigger event (a bucket write event in our case), parses the original S3 event out of the SNS message, then either copies or ignores the image depending on whether it already exists in this lambda’s region (to prevent infinite duplication).
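
One subtle detail: S3 URL-encodes object keys in event payloads, which is why the handler runs keys through unquote_plus before copying. The same decode in Python 3 (the handler above targets the python2.7 runtime) looks like this:

```python
from urllib.parse import unquote_plus

# S3 event payloads URL-encode object keys: spaces arrive as '+' and
# other characters as %XX escapes. Decode before passing the key to
# copy_object, or the existence check and copy will miss the object.
raw_key = 'breakfast/my+toast+pic%21.jpg'
key = unquote_plus(raw_key)  # 'breakfast/my toast pic!.jpg'
```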

Function

```yaml
ToastReplicator:
  Type: "AWS::Lambda::Function"
  Properties:
    Code:
      ZipFile: |
        # (in-line Python code from the Code section above)
    DeadLetterConfig:
      TargetArn: !GetAtt DeadToastQueue.Arn
    Environment:
      Variables:
        BUCKET_NAME: !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
    FunctionName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref "AWS::Region" ] ]
    Handler: "index.lambda_handler"
    Role: !GetAtt ToastReplicatorRole.Arn
    Runtime: python2.7
    MemorySize: 256
    Timeout: 60
```

This CF section describes the actual lambda function. Some notes about what we’re doing in this snippet:

- For a lambda function you can either provide in-line code, or provide an S3 location for the lambda to pull the code from
  - We’ve opted for in-line code here as it reduces overall complexity
- The ‘DeadLetterConfig’ directive sets up the event DLQ for this lambda
- ‘Handler’ refers to the method inside the lambda code block to pass events to
- ‘MemorySize’ is directly correlated to compute power: the higher it is, the more compute power your lambda gets

Permissions

By default, lambdas have no permissions. This means that we need to explicitly define what our lambda is able to do.

```yaml
ToastReplicatorRole:
  Type: "AWS::IAM::Role"
  Properties:
    RoleName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastReplicatorRole", !Ref "AWS::Region" ] ]
    AssumeRolePolicyDocument:
      Statement:
        - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastReplicatorRolePolicy" ] ]
          Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    Policies:
```

This creates the lambda role, and allows the lambda to assume said role. Now we’ll look at the policies under this role.

```yaml
      - PolicyName: "ToastNotificationLoggingPolicy"
        PolicyDocument: # allow lambda to write logs
          Id: ToastNotificationLoggingPolicy
          Statement:
            - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationLoggingPolicy" ] ]
              Effect: Allow
              Action:
                - "logs:CreateLogGroup"
                - "logs:CreateLogStream"
                - "logs:PutLogEvents"
              Resource: "*"
```

This allows the lambda to write out logging events.

```yaml
      - PolicyName: "ToastNotificationDLQPolicy"
        PolicyDocument: # allow lambda to write to DLQ
          Id: ToastNotificationDLQPolicy
          Statement:
            - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationDLQPolicy" ] ]
              Effect: Allow
              Action: "sqs:*"
              Resource: !GetAtt DeadToastQueue.Arn
```

This allows the lambda to write events out to its DLQ.

```yaml
      - PolicyName: "ToastHostReplicationPolicy"
        PolicyDocument:
          Id: ToastHostReplicationPolicy
          Statement:
            - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastHostReplicationPolicy" ] ]
              Effect: Allow
              Action:
                - "s3:Get*"
                - "s3:List*"
                - "s3:Put*"
              Resource: !Join [ "", [ !GetAtt ToastHost.Arn, "*" ] ]
```

This allows the lambda to read from and write to the S3 host bucket in its region.

Subscriptions

All of the CF snippets above create a base stack for each region. Now we need to wire all the pieces together. Note that each of these snippets is meant to be run once for each region that this stack is replicating to. We represent the region we’re currently wiring to via a ‘ToRegion’ parameter in the CF parameters section.
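
In other words, for n regions you end up deploying the subscription stack once per ordered (region, ToRegion) pair – n × (n − 1) times. A quick sketch of that pairing, using the example regions from the overview:

```python
from itertools import permutations

# Every region subscribes to every other region's topic, so the
# subscription stack is deployed once per ordered (region, ToRegion)
# pair: n * (n - 1) deployments for n regions.
regions = ['us-east-1', 'eu-west-2', 'ap-northeast-1']
deployments = [
    {'region': region, 'ToRegion': to_region}
    for region, to_region in permutations(regions, 2)
]
```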

SNS Lambda Invocation

```yaml
ToastReplicationPermission:
  Type: "AWS::Lambda::Permission"
  Properties:
    Action: "lambda:InvokeFunction"
    FunctionName: !Join
      - ""
      - - "arn:aws:lambda:"
        - !Ref "AWS::Region"
        - ":"
        - !Ref "AWS::AccountId"
        - ":function:"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref "AWS::Region" ] ]
    Principal: "sns.amazonaws.com"
    SourceArn: !Join
      - ""
      - - "arn:aws:sns:"
        - !Ref ToRegion
        - ":"
        - !Ref "AWS::AccountId"
        - ":"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref ToRegion ] ]
```

This gives the SNS topics in the other replication regions permission to invoke this region’s replication lambda.

Remote Toast Host Reads

```yaml
RemoteToastHostReplicationReadPolicy:
  Type: "AWS::IAM::Policy"
  Properties:
    PolicyName: !Join [ "-", [ !Ref UniqueIdentifier, "RemoteToastHostReplicationReadPolicy", !Ref ToRegion ] ]
    PolicyDocument:
      Id: RemoteToastHostReplicationReadPolicy
      Statement:
        - Sid: "RemoteToastHostReplicationReadPolicy"
          Effect: Allow
          Action:
            - "s3:Get*"
            - "s3:List*"
          Resource: !Join
            - ""
            - - "arn:aws:s3:::"
              - !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref ToRegion ] ]
              - "*"
    Roles:
      - !Join [ "-", [ !Ref UniqueIdentifier, "ToastReplicatorRole", !Ref "AWS::Region" ] ]
```

This gives the replication lambda permission to read data out of the S3 buckets in the other replication regions.

SNS Subscription

```yaml
ToastReplicationNotificationSubscription:
  Type: "AWS::SNS::Subscription"
  Properties:
    Endpoint: !Join
      - ""
      - - "arn:aws:lambda:"
        - !Ref ToRegion
        - ":"
        - !Ref "AWS::AccountId"
        - ":function:"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref ToRegion ] ]
    Protocol: "lambda"
    TopicArn: !Join
      - ""
      - - "arn:aws:sns:"
        - !Ref "AWS::Region"
        - ":"
        - !Ref "AWS::AccountId"
        - ":"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref "AWS::Region" ] ]
  DependsOn:
    - ToastReplicationPermission
    - RemoteToastHostReplicationReadPolicy
```

Lastly, this subscribes the replication lambda to the SNS Topics in the other replication regions. It requires that all necessary permissions are in place before it can subscribe, which is why we include the ‘DependsOn’ directive.

TL;DR // Deploy

So now that we have all of the pieces, you mix that repo with that dough-oh, make a Texas Lambdaaa.

And by that I mean, follow these commands to deploy you some replication. These commands assume you have the aws cli set up and configured. If you haven’t, please follow the instructions here before proceeding.

1. Clone the template repo

```shell
git clone https://github.com/jessicalucci/s3-multi-region.git \
  && cd s3-multi-region
export UI=<your-unique-identifier>
```

2. Deploy the replication base stacks (do this for each region you want to replicate into)

```shell
aws --profile <profile> cloudformation create-stack \
  --stack-name ToastTest --template-body file://toast-base.yaml \
  --parameters ParameterKey=UniqueIdentifier,ParameterValue="$UI" \
  --capabilities CAPABILITY_NAMED_IAM --region <region>
```

3. Deploy the replication subscription stacks (do this for each region you want to replicate into, for each other region in the replica set)

```shell
aws --profile <profile> cloudformation create-stack \
  --stack-name ToastTest \
  --template-body file://toast-subscription.yaml \
  --parameters ParameterKey=UniqueIdentifier,ParameterValue="$UI" \
    ParameterKey=ToRegion,ParameterValue=<to-region> \
  --capabilities CAPABILITY_NAMED_IAM --region <region>
```

4. Start replicating some toast! Upload your favorite toasty image to any bucket in your replica set. Wait a hot minute, then check the other buckets in your replica set to see your toast copies!

Yes, it would be much easier to deploy all of these stacks via a script that manages all the naming/region mappings. I’m lazy.
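
If you’re less lazy than I am, a dry-run deploy script is only a few lines. This Python sketch just prints the aws-cli invocations for every base stack and every (region, ToRegion) subscription pair, rather than running them (the profile and identifier values are illustrative placeholders):

```python
from itertools import permutations

def deploy_commands(profile, unique_id, regions):
    """Build the aws-cli commands for a full replica-set deploy (dry run)."""
    base = ('aws --profile {p} cloudformation create-stack '
            '--stack-name ToastTest --template-body file://toast-base.yaml '
            '--parameters ParameterKey=UniqueIdentifier,ParameterValue={u} '
            '--capabilities CAPABILITY_NAMED_IAM --region {r}')
    sub = ('aws --profile {p} cloudformation create-stack '
           '--stack-name ToastTest --template-body file://toast-subscription.yaml '
           '--parameters ParameterKey=UniqueIdentifier,ParameterValue={u} '
           'ParameterKey=ToRegion,ParameterValue={to} '
           '--capabilities CAPABILITY_NAMED_IAM --region {r}')
    # One base stack per region...
    cmds = [base.format(p=profile, u=unique_id, r=r) for r in regions]
    # ...then one subscription stack per ordered (region, ToRegion) pair
    cmds += [sub.format(p=profile, u=unique_id, r=r, to=to)
             for r, to in permutations(regions, 2)]
    return cmds

for cmd in deploy_commands('default', 'pugtoast42', ['us-east-1', 'eu-west-2']):
    print(cmd)
```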

Feel free to leave any questions or comments below!