With the recent announcement from CrashPlan that they will discontinue CrashPlan for Home, it was time for me to re-evaluate the backup strategy for my personal data. I’m currently running a QNAP TS-671 and used to run the CrashPlan application on the device.

One of the alternatives is to use the native Hybrid Backup Sync (HBS) application available on QNAP. HBS supports a wide variety of storage options, including local, remote and cloud-based backup targets. It’s always a good idea to combine several backup solutions, as the “3-2-1 backup strategy” tells you:

Keep (at least) 3 copies of your data;

2 of which are local but on different media/devices;

and at least 1 backup off-site.

A practical implementation of the 3-2-1 strategy is to have one copy of the data on the QNAP, one copy on an external disk and one copy somewhere off-site. Currently, “off-site” is CrashPlan for me, but this has to change.
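To make the rule concrete, here is a small illustrative check of a backup plan against 3-2-1. The `Copy` type and the medium labels are hypothetical, invented for this sketch; the only logic is the three conditions listed above, with “different media” interpreted as at least two distinct local media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data set (hypothetical model, for illustration only)."""
    name: str
    medium: str    # e.g. "nas", "usb-disk", "cloud"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, at least 2 distinct local media, at least 1 off-site copy."""
    local_media = {c.medium for c in copies if not c.offsite}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(local_media) >= 2 and has_offsite


plan = [
    Copy("QNAP TS-671", "nas", offsite=False),
    Copy("external disk", "usb-disk", offsite=False),
    Copy("cloud", "cloud", offsite=True),
]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: no off-site copy
```

Dropping the cloud copy is exactly the situation CrashPlan’s discontinuation creates: the plan still has two local copies, but no longer passes.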

An external hard disk backup (with versioning) can be implemented using the local backup option in HBS. For remote, cloud-based backup, HBS supports AWS S3 and Glacier, Google Cloud Storage, Microsoft Azure Storage and OpenStack Swift. In this blog post I will have a closer look at S3 and Glacier.

A first look at AWS S3 and Glacier

Both S3 and Glacier are storage options that can be used for backup and archiving. AWS also offers services like Elastic File System (EFS) and Elastic Block Store (EBS), but these are meant for other use cases. S3 and Glacier are perfectly suited to serve as a backup target.

AWS S3 is object storage. It stores objects (files) in buckets and leverages Amazon’s scalable storage infrastructure. A bucket is the top-level container for your objects; strictly speaking, the namespace inside a bucket is flat, but object keys can contain “/” separators, so S3 tools present them as a folder hierarchy. S3 is designed for 99.999999999% durability and 99.9% to 99.99% availability, depending on the storage class (notice the difference between durability and availability).

S3 comes in different flavours: S3 standard and S3 Standard-Infrequent Access (IA). You pay for what you use, but with IA there’s a minimum storage duration of 30 days. You pay for the amount of data stored in S3, for data requests and for (network) data out of AWS to the internet. IA has cheaper storage pricing, but requests are more expensive and retrievals are charged per GB. For backup purposes, IA looks like the better option. You can store and retrieve data through the AWS S3 API and the AWS console.
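To illustrate how a flat key space turns into “folders”, the sketch below mimics what an S3 listing with `Delimiter="/"` returns: the folders (common prefixes) and the files directly under a given prefix. The keys are hypothetical; I’m assuming HBS writes keys that mirror the share’s folder structure, as described later in this post.

```python
def list_level(keys, prefix=""):
    """Emulate an S3 listing with Delimiter='/': return (folders, files) directly under prefix."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Everything up to the next "/" becomes a "folder" (a common prefix).
            folders.add(prefix + rest.split("/", 1)[0] + "/")
        elif rest:
            files.append(key)
    return sorted(folders), sorted(files)


# Hypothetical object keys in a backup bucket.
keys = [
    "Photos/2017/img001.jpg",
    "Photos/2017/img002.jpg",
    "Photos/index.txt",
    "Documents/tax.pdf",
]
print(list_level(keys))             # (['Documents/', 'Photos/'], [])
print(list_level(keys, "Photos/"))  # (['Photos/2017/'], ['Photos/index.txt'])
```

There is no folder object behind `Photos/2017/`; it exists only because keys share that prefix, which is why any S3 browser can reconstruct your folder tree.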

AWS Glacier is archive storage, meant purely for archiving and/or long-term backup. It stores archives in a flat structure, so there’s no hierarchy here. Glacier is designed for (very) infrequent access: you have to wait somewhere between a couple of minutes and several hours before you can download your backup/archive data. The quicker you want to access your data, the more you have to pay to retrieve it. When you want to retrieve data from Glacier, you actually initiate a retrieval job for an archive. Once the retrieval job completes, your data will be available for download. Read more about the different retrieval options, and related costs here. There’s no SLA for Glacier; however, the service is designed for an annual durability of 99.999999999%. There’s a minimum storage duration of 90 days for Glacier.
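The retrieval-job workflow can be sketched with the boto3 Glacier client (`initiate_job`, `describe_job`, `get_job_output`). This is a minimal sketch, not how HBS does it: the vault name, archive ID and polling interval are assumptions, and in practice you would probably use an SNS notification instead of polling. The tier (`"Expedited"`, `"Standard"` or `"Bulk"`) is where the speed-versus-cost trade-off mentioned above is made.

```python
import time


def retrieve_archive(glacier, vault, archive_id, tier="Standard",
                     poll_seconds=900, sleep=time.sleep):
    """Initiate a Glacier archive-retrieval job, wait until it completes and return the bytes.

    `glacier` is expected to be a boto3 Glacier client, e.g. boto3.client("glacier").
    """
    job = glacier.initiate_job(
        accountId="-",  # "-" means: the account that owns the credentials in use
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier},
    )
    job_id = job["jobId"]
    # Poll until Glacier reports the job as completed (minutes to hours, depending on tier).
    while True:
        status = glacier.describe_job(accountId="-", vaultName=vault, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)
    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} ended with {status['StatusCode']}")
    output = glacier.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    return output["body"].read()
```

Choosing `"Bulk"` is the cheapest and slowest option, `"Expedited"` the fastest and most expensive; the code itself does not change, only what you pay and how long the loop waits.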

Note that you can automatically move data from S3 to Glacier leveraging lifecycle rules. I will get into this in a future post.
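As a preview, a lifecycle rule is a small configuration document. The sketch below builds one that transitions objects under a prefix to the `GLACIER` storage class after a number of days; the prefix `qnap-backup/`, the rule ID and the 30-day threshold are hypothetical values chosen for illustration.

```python
import json


def glacier_transition_rule(prefix, days, rule_id="archive-to-glacier"):
    """Build an S3 lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("qnap-backup/", 30)
print(json.dumps(config, indent=2))
```

Saved as `lifecycle.json`, this could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <your-bucket> --lifecycle-configuration file://lifecycle.json`, or from boto3 with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. Note that objects moved this way remain S3 objects with the GLACIER storage class; they do not show up in a Glacier vault.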

From a pricing perspective there’s a big difference between the options. These are the storage prices at the time of writing:

S3 standard storage: $0.023 per GB per month (for the first 50 TB);

S3 IA storage: $0.0125 per GB per month (for the first 50 TB);

Glacier: $0.004 per GB per month.

So S3 IA is a little over 3 times as expensive as Glacier (3.1 times), and S3 standard almost 6 times (5.75 times). For a 500 GB backup set, that works out to around $24 (Glacier), $75 (S3 IA) and $138 (S3 standard) per year. Apart from storage, S3 also charges for GET, PUT, COPY, POST and LIST requests. With Glacier you pay per UPLOAD request and, when restoring, for retrieval requests and the amount of data retrieved. Notice that data in is free; you pay for data out from AWS to the internet, and pricing for S3 and Glacier is identical here.
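The arithmetic behind these numbers is simple enough to script, which helps when you plug in your own backup size. The sketch below covers storage only (no request or data-out charges) and uses the prices listed above. It also approximates the early-deletion charge for data removed before the minimum storage duration; treating a month as 30 days is my simplification, not AWS’s exact billing formula.

```python
# Storage prices in USD per GB-month, as listed above (first 50 TB tier).
PRICES = {
    "S3 standard": 0.023,
    "S3 IA": 0.0125,
    "Glacier": 0.004,
}


def annual_storage_cost(gb, price_per_gb_month):
    """Storage-only cost for one year; ignores request and data-out charges."""
    return gb * price_per_gb_month * 12


def early_deletion_charge(gb, price_per_gb_month, days_stored, minimum_days):
    """Approximate pro-rated charge for deleting data before the minimum storage
    duration (30 days for S3 IA, 90 days for Glacier); a month is taken as 30 days."""
    remaining_days = max(0, minimum_days - days_stored)
    return gb * price_per_gb_month * remaining_days / 30


for tier, price in PRICES.items():
    cost = annual_storage_cost(500, price)
    ratio = price / PRICES["Glacier"]
    print(f"{tier:12s} ${cost:7.2f}/year  ({ratio:.2f} times Glacier)")
```

For example, deleting a 500 GB Glacier upload after 30 days still bills roughly the remaining 60 days of storage: `early_deletion_charge(500, 0.004, 30, 90)` gives about $4.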

Also read this comment on reddit about Glacier pricing in combination with lifecycle rules.

Configure Glacier or S3 on your QNAP

The initial configuration for S3 and Glacier is much the same. Open the Hybrid Backup Sync (HBS) app, select Storage Spaces in the menu and create a new cloud space. Choose S3 or Glacier, enter a descriptive name and open the settings to enter the account details. You have to enter an Access Key and Secret Key here; you create these under Identity and Access Management (IAM) in AWS:

Go to the IAM console on AWS;

Select Users and choose Add user;

Create a username and select programmatic access, so your QNAP can access the AWS APIs;

On the next screen, either attach existing policies directly or create a new group with the required policies attached. For S3 you will need the AmazonS3FullAccess policy, for Glacier the AmazonGlacierFullAccess policy;

Click Next and write down the access key and secret key. The secret key will only be displayed once. (By the way, it’s a best practice to rotate the access and secret key every now and then.)
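If you prefer scripting over clicking through the console, the same steps map onto three boto3 IAM calls: `create_user`, `attach_user_policy` and `create_access_key`. This is a sketch under the assumption that you run it with administrator credentials on another machine; the username is hypothetical, and the policy ARNs are those of the AWS managed policies named above.

```python
# ARNs of the AWS managed policies named in the steps above.
POLICIES = {
    "s3": "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "glacier": "arn:aws:iam::aws:policy/AmazonGlacierFullAccess",
}


def create_backup_user(iam, username, target):
    """Create an IAM user for the NAS, attach the managed policy for `target`
    ("s3" or "glacier") and return (access_key_id, secret_access_key).

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    iam.create_user(UserName=username)
    iam.attach_user_policy(UserName=username, PolicyArn=POLICIES[target])
    key = iam.create_access_key(UserName=username)["AccessKey"]
    # As in the console, the secret is only returned at creation time: store it safely.
    return key["AccessKeyId"], key["SecretAccessKey"]
```

Usage would look like `create_backup_user(boto3.client("iam"), "qnap-backup", "glacier")`. To rotate later, create a second key with `iam.create_access_key`, update the QNAP, then remove the old one with `iam.delete_access_key(UserName=..., AccessKeyId=...)`.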

Paste the access key and secret key into the configuration screen on the QNAP and select the region in which you want to store your data. The rest is business as usual in HBS. Notice that S3 and Glacier don’t support the versioning option of HBS. Optionally, you can choose to encrypt your data on the client side; I will dive into this in a future post.

Restoring your data, some concerns

Of course you can retrieve your data straight from the QNAP. Just select the restore option on your QNAP, select the correct storage space and create a restore job. Nothing exciting here: the QNAP leverages its local database, so you can select the files you want to restore.

But let’s say your QNAP device is lost: what are your options? S3 has a big advantage here, because files are stored in a hierarchy/folder structure. You will find your QNAP folder structure mirrored in your S3 bucket, so any S3 explorer tool lets you download your data directly from S3.
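Instead of a GUI explorer, a short script can do the same. The sketch below uses the boto3 S3 client (`get_paginator("list_objects_v2")` and `download_file`) to pull every object under a prefix and recreate the folder structure locally. The bucket, prefix and destination are placeholders, and I’m assuming HBS stores plain files under their original paths (true only if client-side encryption is off).

```python
import os


def download_prefix(s3, bucket, prefix, dest):
    """Download every object under `prefix` into `dest`, recreating the folder
    structure encoded in the object keys.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # zero-byte "folder" placeholder, nothing to download
            # Drop empty, "." and ".." segments so a key can never escape `dest`.
            parts = [p for p in key[len(prefix):].split("/") if p not in ("", ".", "..")]
            if not parts:
                continue
            local_path = os.path.join(dest, *parts)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, key, local_path)
```

One caveat: objects that a lifecycle rule has already moved to the GLACIER storage class cannot be downloaded this way until they have been restored.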

With Glacier it’s a different story, because Glacier stores archives in a flat structure. If you look into a Glacier vault, you will just find a long list of archive IDs; you won’t recognise the original filenames. Luckily, QNAP stores a metadata file in Glacier that contains information about the uploaded files and the original file structure. What you can do here is buy a new QNAP device, configure the original Glacier storage space and create a restore job using the “destination” option in the wizard. The wizard then lets you restore either the metadata plus (all) the data, or just the metadata, so you can select the files you want to restore in a second phase.
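To see what “a long list of IDs” means in practice, here is a sketch that parses a Glacier vault inventory, the JSON document returned by an inventory-retrieval job (`initiate_job` with `jobParameters={"Type": "inventory-retrieval"}`, which also takes hours). The sample inventory is fabricated, and I don’t know what, if anything, HBS writes into `ArchiveDescription`; the point is that the inventory alone gives you IDs, sizes and dates, not your filenames.

```python
import json


def parse_inventory(raw):
    """Return (archive_id, description, size_in_bytes) for every archive in a
    Glacier vault inventory (the JSON output of an inventory-retrieval job)."""
    inventory = json.loads(raw)
    return [
        (a["ArchiveId"], a.get("ArchiveDescription", ""), a["Size"])
        for a in inventory["ArchiveList"]
    ]


# Hypothetical, shortened inventory; real archive IDs are long opaque strings.
sample = json.dumps({
    "VaultARN": "arn:aws:glacier:eu-west-1:123456789012:vaults/my-qnap-vault",
    "InventoryDate": "2017-09-01T00:00:00Z",
    "ArchiveList": [
        {"ArchiveId": "EXAMPLE-ARCHIVE-ID-1", "ArchiveDescription": "",
         "CreationDate": "2017-08-30T12:00:00Z", "Size": 2097152,
         "SHA256TreeHash": "placeholder"},
    ],
})
for archive_id, description, size in parse_inventory(sample):
    print(archive_id, repr(description), size)
```

Mapping those IDs back to files is exactly what the QNAP metadata file is for.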

Another option is to use a Glacier explorer that understands the QNAP metadata format, such as Freeze. Freeze reads the QNAP metadata file and presents a readable inventory of your Glacier vault. Of course, every restore job you initiate will typically still take hours, because this is how Glacier works. With Glacier in particular, this metadata file is essential: without it, you are left with a list of anonymous archive IDs.

S3 or Glacier, what’s your choice?

We’ve seen that S3 and Glacier can both be used as a backup target, but their use cases differ. Both can be used to back up or archive your data, but if you want to be able to retrieve your data right away, you should go for S3. Glacier asks for a different approach, mainly because of the flat structure and the use of archives. On top of this, you should evaluate for yourself whether delayed retrievals are acceptable. For both S3 and Glacier, it’s important to think about how you’re going to restore your data in case the original source (the QNAP in this case) is lost.

I hope this was helpful. Stay tuned via Twitter for more interesting content.