Many AWS customers have told us that they need to move large amounts of data into and out of the AWS Cloud. Their use cases include:

Migration – Some customers have large data sets that are in a constant state of flux. There is no natural break or stopping point that they can use to effect a one-time transfer.

Upload & Process – Other customers regularly generate massive data sets on-premises for processing in the cloud. This includes our customers in the media & entertainment, oil & gas, and life sciences industries.

Backup / DR – Finally, other customers copy their precious on-premises data to the cloud for safekeeping and to ensure business continuity.

These customers work at scale! One-time or periodic transfers of tens or hundreds of terabytes are routine. At this scale, making effective use of network bandwidth and achieving high throughput are essential, with reliability, security, and ease of use equally important.

Introducing AWS DataSync

Today we are adding AWS DataSync to our portfolio of data transfer services. Joining AWS Snowball, AWS Snowmobile, Kinesis Data Firehose, S3 Transfer Acceleration, and AWS Storage Gateway, AWS DataSync is built around a super-efficient, purpose-built data transfer protocol that can run 10 times as fast as open source data transfer tools. It is easy to set up and to use (Console and CLI access is available) and scales to the sky!

AWS DataSync is a managed service, and you pay only for the data that you transfer. It can sync on-premises data to Amazon Simple Storage Service (S3) buckets or Amazon Elastic File System (EFS) file systems, across the Internet or via AWS Direct Connect, and it can also sync data from AWS back to on-premises storage.

The AWS DataSync Agent is an important part of the service. The agent is a virtual machine (VM) that you deploy in your on-premises data center, where it acts as a client to your NFS storage and accelerates the data transfer.

AWS DataSync in Action

Let’s take AWS DataSync for a spin! The AWS DataSync team set up a test environment for me that included the Agent and an NFS server.

Armed with the public IP address of the Agent, I open the AWS DataSync Console and click Get started.

My use case is on-premises to AWS. I select that option and click Create agent to connect to my on-premises agent.

I download and run the VM image (this was already taken care of for me), enter the public IP address for the agent, and click Get key. Then I name & tag my agent and click Create agent.
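The same activation step can be scripted. Here is a minimal sketch, assuming a boto3 DataSync client (for example `boto3.client("datasync")`) and an activation key that you have already obtained from the agent, which is what the console's Get key button does for you. The helper name `activate_agent` and its parameters are hypothetical, not part of any AWS SDK.

```python
from typing import Any, Dict


def activate_agent(client: Any, activation_key: str, name: str, tags: Dict[str, str]) -> str:
    """Register a deployed DataSync agent VM and return its ARN.

    Assumes `client` is a boto3 DataSync client, e.g. boto3.client("datasync").
    `activate_agent` is a hypothetical helper, not an SDK function.
    """
    response = client.create_agent(
        ActivationKey=activation_key,
        AgentName=name,
        # The DataSync API takes tags as a list of Key/Value pairs.
        Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
    )
    return response["AgentArn"]
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub before pointing it at a real account.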

My agent is ready right away, so I can create a DataSync task to indicate what I want to sync and when I want to sync it! I click Create task to do this.

I select my use case again and click Next to proceed.

I create a source location, point it at my NFS server, and click Next. I can configure and use multiple agents to increase overall throughput.
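An NFS source location can also be created programmatically. This sketch assumes a boto3 DataSync client and the ARN of an activated agent; the hostname and export path in the usage line are placeholders, and `create_nfs_source` is a hypothetical helper.

```python
from typing import Any, Sequence


def create_nfs_source(client: Any, server_hostname: str, export_path: str,
                      agent_arns: Sequence[str]) -> str:
    """Create a DataSync NFS location and return its ARN (hypothetical helper).

    Assumes `client` is a boto3 DataSync client.
    """
    if not agent_arns:
        raise ValueError("an NFS location needs at least one agent ARN")
    response = client.create_location_nfs(
        ServerHostname=server_hostname,  # NFS server as seen from the agent
        Subdirectory=export_path,        # exported path to read from
        # Several agents may be listed; more agents can raise overall throughput.
        OnPremConfig={"AgentArns": list(agent_arns)},
    )
    return response["LocationArn"]
```

For example: `create_nfs_source(client, "nfs.example.internal", "/exports/data", [agent_arn])`.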

Now I create a destination location, choosing between an EFS file system and an S3 bucket.
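Each destination type has its own boto3 call. A minimal sketch, assuming a boto3 DataSync client; the IAM role (which must grant DataSync access to the bucket), the subnet, and the security-group ARNs are placeholders you would supply, and both helper names are hypothetical.

```python
from typing import Any, Sequence


def create_s3_destination(client: Any, bucket_name: str, role_arn: str,
                          prefix: str = "/") -> str:
    """Create an S3 location; `role_arn` is an IAM role DataSync can assume."""
    response = client.create_location_s3(
        # Bucket ARN format for the standard AWS partition (an assumption).
        S3BucketArn=f"arn:aws:s3:::{bucket_name}",
        Subdirectory=prefix,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    return response["LocationArn"]


def create_efs_destination(client: Any, filesystem_arn: str, subnet_arn: str,
                           security_group_arns: Sequence[str], path: str = "/") -> str:
    """Create an EFS location reached through the given subnet and security groups."""
    response = client.create_location_efs(
        EfsFilesystemArn=filesystem_arn,
        Subdirectory=path,
        Ec2Config={"SubnetArn": subnet_arn,
                   "SecurityGroupArns": list(security_group_arns)},
    )
    return response["LocationArn"]
```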

Next, I create my task. I give it a name, accept all of the default values, and review it on the next page. The task settings include options to control copying, file management, and bandwidth use.
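The three groups of settings map onto the `Options` structure of the `create_task` call. This sketch assumes a boto3 DataSync client and location ARNs created earlier; the option values are one plausible choice for illustration, not necessarily the console defaults, and `create_sync_task` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def create_sync_task(client: Any, name: str, source_arn: str, destination_arn: str,
                     max_bytes_per_second: Optional[int] = None) -> str:
    """Create a DataSync task and return its ARN (hypothetical helper)."""
    options: Dict[str, Any] = {
        # Copying: check that source and destination match when the run ends.
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",
        # Copying: move only data that differs between the two ends.
        "TransferMode": "CHANGED",
        # File management: keep destination files that were deleted at the source.
        "PreserveDeletedFiles": "PRESERVE",
    }
    if max_bytes_per_second is not None:
        if max_bytes_per_second <= 0:
            raise ValueError("bandwidth limit must be positive")
        # Bandwidth: cap DataSync's usage; omitted means no explicit cap is set.
        options["BytesPerSecond"] = max_bytes_per_second
    response = client.create_task(
        Name=name,
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        Options=options,
    )
    return response["TaskArn"]
```

For example, `create_sync_task(client, "nfs-to-s3", src_arn, dst_arn, max_bytes_per_second=100 * 1024 * 1024)` would cap the task at 100 MiB/s so it leaves room on a shared link.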

My task is ready to use.

I select it and either run it as-is or override my settings.
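Running a task as-is or with overridden settings corresponds to `start_task_execution`, with or without `OverrideOptions`. A sketch assuming a boto3 DataSync client; `run_task` is a hypothetical wrapper.

```python
from typing import Any, Dict, Mapping, Optional


def run_task(client: Any, task_arn: str,
             overrides: Optional[Mapping[str, Any]] = None) -> str:
    """Start one execution of a task and return the execution ARN."""
    kwargs: Dict[str, Any] = {"TaskArn": task_arn}
    if overrides:
        # Overrides apply to this execution only; the saved task is unchanged.
        kwargs["OverrideOptions"] = dict(overrides)
    response = client.start_task_execution(**kwargs)
    return response["TaskExecutionArn"]
```

For example, `run_task(client, task_arn, {"VerifyMode": "NONE"})` would skip the final verification for a single run.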

The transfer starts right away, and I can watch its progress.
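Progress can also be followed outside the console by polling `describe_task_execution`. A minimal sketch, assuming a boto3 DataSync client and assuming that `SUCCESS` and `ERROR` are the only terminal states worth waiting for; `wait_for_execution` is a hypothetical helper, and the injectable `sleep` exists only to make it easy to test.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; intermediate ones include QUEUED, LAUNCHING,
# PREPARING, TRANSFERRING, and VERIFYING.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_for_execution(client: Any, execution_arn: str, poll_seconds: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a task execution, printing progress, until it succeeds or fails."""
    while True:
        info = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = info["Status"]
        print(f"{status}: {info.get('FilesTransferred', 0)} files, "
              f"{info.get('BytesTransferred', 0)} bytes transferred")
        if status in TERMINAL_STATES:
            return info
        sleep(poll_seconds)
```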

The transfer takes place over an SSL connection, and my bucket quickly fills up with files.

When the transfer completes, I can see the final status.

If I run the task again without changing the source files, DataSync verifies that the files on both ends are the same and copies nothing.

If I had changed the files or their permissions, DataSync would transfer the changes in order to make sure that the source and the destination match. The transfers are always incremental, making DataSync perfect for those migration and disaster recovery use cases that I described earlier.
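This post does not describe how DataSync detects changes internally. Purely as an illustration of the general idea behind an incremental sync, here is a hypothetical sketch that compares per-file metadata (size, modification time, and permission bits are assumed here) and selects only files that are new or different. A second pass over unchanged data selects nothing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Metadata compared between the two ends (a hypothetical, simplified set)."""
    size: int
    mtime: float
    mode: int  # POSIX permission bits


def files_to_copy(source: Dict[str, FileMeta], destination: Dict[str, FileMeta]) -> List[str]:
    """Return paths missing at the destination or whose metadata differs."""
    return sorted(path for path, meta in source.items()
                  if destination.get(path) != meta)
```

Because permission bits are part of the comparison, changing only a file's permissions is enough to flag it for transfer, in line with the behavior described above.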

Things to Know

Here are a few things that you need to know about AWS DataSync:

Source/Destination – You can transfer from your on-premises servers to AWS and vice versa.

Performance – Overall data transfer speed depends on network conditions; a single agent can saturate a 10 Gbps network link.

Pricing – You pay a low, per-GB charge for data transfer; there is no charge for the service itself.
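Together, the performance and pricing notes make rough planning arithmetic straightforward. A minimal sketch: the per-GB rate is a parameter because this post does not state one (check the DataSync pricing page), `efficiency` is an assumed fraction of the link that is actually usable, and decimal units are used (1 TB = 1,000 GB = 10^12 bytes). `estimate_transfer` is a hypothetical helper.

```python
from typing import Tuple


def estimate_transfer(data_tb: float, link_gbps: float, price_per_gb: float,
                      efficiency: float = 1.0) -> Tuple[float, float]:
    """Return (hours, cost) to move data_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of the link DataSync actually uses;
    1.0 means the link is fully saturated.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    cost = data_tb * 1_000 * price_per_gb          # terabytes -> gigabytes
    return seconds / 3600, cost
```

For example, 100 TB over a fully saturated 10 Gbps link takes 8 × 10^14 bits / 10^10 bits per second = 80,000 seconds, or about 22.2 hours; at 50% efficiency it takes twice as long.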

Available Now

AWS DataSync is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.

— Jeff;