Introduction

Today I implemented an automated backup system for my CentOS server using Duplicity and Amazon S3. After I recently launched my smartphone app Positive Thinking, which uses MongoDB as its datastore, I started to worry that I was now responsible for my users' data: it was no longer just my personal loss if something ever happened.

After researching a few options I went with the above solution, because S3 is highly scalable, reliable, secure, fast, and relatively inexpensive (when paired with a solution like Duplicity, which only sends file deltas).

Because Duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup.

So let's get started with a quick guide to getting daily backups of your CentOS server to Amazon S3, with the help of Duplicity and Duply.

Installation

First, let's install the software on our server (all commands are run as the root user):

yum install duplicity duply python-boto

Set up

Next, let's create a new backup config with Duply (Duply is a frontend wrapper for Duplicity that makes setup easier and takes care of running the backups):

duply create our_backup

Now edit the config the above command generated, using vim/emacs/nano:

vim /etc/duply/our_backup/conf

Comment out the GPG_KEY value and enter any password you like into the GPG_PW value:

#GPG_KEY='_KEY_ID_'
GPG_PW='YOUR_PASSWORD_HERE'
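Since GPG_PW encrypts every backup volume, it's worth making it strong and random rather than "any password you like". A minimal sketch using openssl (already present on a stock CentOS install; the variable name is just illustrative):

```shell
# Generate a random 32-byte, base64-encoded passphrase to paste into GPG_PW.
# GPG_PW_CANDIDATE is a hypothetical variable name for illustration.
GPG_PW_CANDIDATE=$(openssl rand -base64 32)
echo "$GPG_PW_CANDIDATE"
```

Keep a copy of this passphrase somewhere safe off the server; without it the backups cannot be decrypted.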

Go to S3 in your AWS console and create a new S3 bucket called 'our_backup'. In the config, replace the bucket endpoint domain with the appropriate domain for your region (in my example below, s3.amazonaws.com is the US Standard region endpoint). Refer to the AWS documentation for the other region endpoints.
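If you happen to have the AWS CLI installed (an assumption; it's not part of this setup), the bucket can also be created from the command line. Note that bucket names are globally unique, so yours may need to differ from mine:

```shell
# Create the bucket in the US Standard (us-east-1) region
aws s3 mb s3://our_backup --region us-east-1
```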

Enter your AWS access and secret keys.

Set SOURCE to the root path /

TARGET='s3://s3.amazonaws.com/our_backup/main'
# optionally the username/password can be defined as extra variables
# setting them here _and_ in TARGET results in an error
TARGET_USER='<your AWS access key>'
TARGET_PASS='<your AWS secret key>'
# base directory to backup
SOURCE='/'

Uncomment MAX_AGE and set its value to 1 month:

MAX_AGE=1M
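Note that MAX_AGE on its own only marks older backup sets as outdated; Duply provides a purge command to actually delete them, which you can run manually or periodically from cron:

```shell
# List backup sets older than MAX_AGE...
duply our_backup purge
# ...and actually delete them with --force
duply our_backup purge --force
```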

Next we want to configure which folders and files we want backed up

vim /etc/duply/our_backup/exclude

Here's the list of folders I back up. Copy the syntax of + <space> <folder_path>. Don't forget the ** at the bottom, which tells Duply to exclude everything not explicitly included in the list.

+ /var/lib/mongo_data
+ /var/lib/mysql_data
+ /etc/nginx
+ /usr/share/web_sites
+ /etc/php*
+ /golang_code
**

I am backing up all my Nginx and PHP configs, my web sites, my Go code, and my MongoDB and MySQL data files.
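One caveat: backing up live MongoDB and MySQL data files can capture them mid-write. Duply runs an optional /etc/duply/<profile>/pre script before each backup (you can see the PRE step in the log further down), which is a good place to dump the databases to disk first so the backed-up copies are consistent snapshots. A minimal sketch; the dump paths are illustrative, and mysqldump assumes credentials in ~/.my.cnf:

```shell
# Create a pre-backup hook that dumps both databases into directories
# that are already on the include list above
mkdir -p /etc/duply/our_backup
cat > /etc/duply/our_backup/pre <<'EOF'
#!/bin/bash
# consistent MongoDB dump (illustrative output path)
mongodump --out /var/lib/mongo_data/dump
# dump all MySQL databases (assumes credentials in ~/.my.cnf)
mysqldump --all-databases > /var/lib/mysql_data/all_databases.sql
EOF
chmod +x /etc/duply/our_backup/pre
```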

Running the first backup

Now let's run the first backup, which will back up everything in full (note: the log below is from a subsequent incremental run, not the first backup, but it gives you an idea of what to look for):

duply our_backup backup

Start duply v1.5.11, time is 2014-02-13 02:00:01.
Using profile '/etc/duply/full_backup'.
Using installed duplicity version 0.6.22, python 2.6.6, gpg 2.0.14 (Home: ~/.gnupg), awk 'GNU Awk 3.1.7', bash '4.1.2(1)-release (i386-redhat-linux-gnu)'.
Signing disabled. Not GPG_KEY entries in config.
Test - Encryption with passphrase (OK)
Test - Decryption with passphrase (OK)
Test - Compare (OK)
Cleanup - Delete '/tmp/duply.10415.1392256802_*'(OK)

--- Start running command PRE at 02:00:02.867 ---
Skipping n/a script '/etc/duply/full_backup/pre'.
--- Finished state OK at 02:00:02.885 - Runtime 00:00:00.018 ---

--- Start running command BKP at 02:00:02.895 ---
Import of duplicity.backends.dpbxbackend Failed: No module named dropbox
Reading globbing filelist /etc/duply/full_backup/exclude
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Wed Feb 12 05:38:53 2014
--------------[ Backup Statistics ]--------------
StartTime 1392256804.12 (Thu Feb 13 02:00:04 2014)
EndTime 1392256815.70 (Thu Feb 13 02:00:15 2014)
ElapsedTime 11.58 (11.58 seconds)
SourceFiles 14635
SourceFileSize 440491579 (420 MB)
NewFiles 72
NewFileSize 3688548 (3.52 MB)
DeletedFiles 26
ChangedFiles 19
ChangedFileSize 80165809 (76.5 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 117
RawDeltaSize 18935063 (18.1 MB)
TotalDestinationSizeChange 5888919 (5.62 MB)
Errors 0
-------------------------------------------------
--- Finished state OK at 02:00:18.777 - Runtime 00:00:15.881 ---

--- Start running command POST at 02:00:18.791 ---
Skipping n/a script '/etc/duply/full_backup/post'.
--- Finished state OK at 02:00:18.809 - Runtime 00:00:00.018 ---

Check that it's working

Now check S3 to see if the backups got uploaded correctly.
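You can also verify from the command line; Duply's status command summarizes the backup chains and sets it knows about:

```shell
# Show the collection status: full/incremental chains, dates and volume counts
duply our_backup status
```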

Check which files and folders have been backed up

duply our_backup list
# or
duply our_backup list | grep quote.go

Scheduling a daily backup and email notifications

Let's run our backup daily at 2am, with email notifications of each run:

crontab -e

Enter the following

MAILTO=jason@mindfsck.net
0 2 * * * duply our_backup backup

Every day you will get emailed a log like the one shown above from the initial backup.
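One thing to watch for: cron jobs run with a minimal environment, so if the job mails you a "command not found" error, use the full path to duply instead (find yours with `which duply`; the path below is an assumption):

```shell
MAILTO=jason@mindfsck.net
0 2 * * * /usr/bin/duply our_backup backup
```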

Finally, make a backup of your Duply config (at the very least you should securely store the backup password). Download duply_conf_back.tar.gz to a USB drive or similar; if you ever need to completely restore your backups on a new server, having the full config will be handy.

tar czvf duply_conf_back.tar.gz /etc/duply
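Keep in mind this archive contains your AWS keys and backup passphrase in plain text, so it's worth encrypting it before it leaves the server. A sketch using openssl symmetric encryption (the passphrase is a placeholder, and -pbkdf2 needs a reasonably recent openssl):

```shell
# Re-create the archive and encrypt it; the mkdir -p is a no-op on a
# server where /etc/duply already exists
mkdir -p /etc/duply
tar czf duply_conf_back.tar.gz /etc/duply
openssl enc -aes-256-cbc -salt -pbkdf2 -pass pass:'pick-a-strong-passphrase' \
    -in duply_conf_back.tar.gz -out duply_conf_back.tar.gz.enc
```

Decrypt it later with the same command plus -d, swapping -in and -out.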

Restoring a backup

We can fetch a single file with the following:

cd /tmp
duply our_backup fetch usr/share/web_sites/mindfsck.net/wp-content/plugins/wordpress-seo/readme.txt x.txt
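Duply can also restore the whole backup, optionally as of an earlier point in time, with duply <profile> restore <target_path> [<age>]. For example:

```shell
# Restore everything (latest state) into /tmp/restore_all
duply our_backup restore /tmp/restore_all

# Restore the state as it was 5 days ago
duply our_backup restore /tmp/restore_all 5D
```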

I will write more on restoring in the future; for now, refer to

man duply

should you get yourself into a sticky situation.