The point is, you should use an advanced file system in RAID1 (with ZFS you could go higher, but I prefer the simplicity and lower power consumption of having just two drives, and can afford to pay for the wasted drive space) that can detect & correct errors, lets you swap in new drives and migrate out old ones, migrate to larger drives, etc. This is essentially the feature set that both ZFS and BTRFS have, but the former is considered to be more stable and the latter has been in Linux for longer.

The Raspberry Pi is running Raspbian (the Debian-based distribution for the Raspberry Pi). This seems to be the best-supported Linux distribution for it, and I’ve used Debian on servers & desktops for maybe 10 years now, so it’s a no-brainer. The external hard drives are in a BTRFS RAID1. If I were doing it from scratch, I would look into ZFS, but I’ve been migrating this same data over different drives and home servers (on the same file system) since ZFS was essentially totally experimental on Linux, and on Linux, for RAID1, BTRFS seems totally stable (people do not say the same thing about RAID5/6).
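
For reference, creating and maintaining a two-drive BTRFS RAID1 looks roughly like this (a sketch only; the device names and mount point are placeholders, not my actual setup):

    # create a RAID1 array: both data (-d) and metadata (-m) mirrored across two drives
    mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY
    mount /dev/sdX /mnt/archive

    # periodically verify checksums and repair bad blocks from the good copy
    btrfs scrub start /mnt/archive

    # swap a failing (or too-small) drive for a new one
    btrfs replace start /dev/sdY /dev/sdZ /mnt/archive
    # (after replacing with a larger drive, grow the filesystem:
    #  btrfs filesystem resize <devid>:max /mnt/archive)

The scrub and replace commands are what make the “detect & correct errors” and “migrate to larger drives” points above concrete.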

For backups, I’m using Duplicacy, which is annoyingly similarly named to a much older backup tool called Duplicity (there also seems to be another tool called Duplicati, which I haven’t tried. Couldn’t backup tools get more creative with names? How about calling a tool “albatross”?). It’s also annoyingly not free software, but for personal use, the command-line version (which is the only version that I would be using) is free-as-in-beer. I actually settled on this after trying and failing to use (actually open-source) competitors:

First, I tried the aforementioned Duplicity (using its friendly frontend, duply). I actually was able to make some full backups (the full size of the archive was around 600GB), but then it started erroring out because it would run out of memory when trying to unpack the file lists. The backup format of Duplicity is not super efficient, but it is very simple (which was appealing: just tar files and various indexes with lists of files). Unfortunately, some operations need memory that seems to scale with the size of the archive being backed up, which is a non-starter for my little server with 1GB of RAM (and in general shouldn’t be acceptable for backup software, but…)
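
For what it’s worth, the duply workflow I was using looked roughly like this (a sketch with placeholder values; the profile name, paths, and B2 credentials are made up for illustration):

    # create a profile skeleton under ~/.duply/archive/
    duply archive create

    # edit ~/.duply/archive/conf to set, at minimum, something like:
    #   GPG_PW='ENCRYPTION_PASSWORD'
    #   SOURCE='/mnt/archive'
    #   TARGET='b2://ACCOUNT_ID:ACCOUNT_KEY@BUCKET_NAME'

    # run a backup (full or incremental, depending on the profile settings)
    duply archive backup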

I next tried a newer option, restic. This has a more efficient backup format, but it had the same problem of running out of memory; in fact, it wasn’t even able to complete a backup (though that was probably a good thing, as I wasted less time!). They are aware of it (see, e.g., this issue), so maybe at some point it’ll be an option, but that issue is almost two years old, so ho hum…
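
For completeness, this is roughly what I was attempting with restic (again a sketch; the bucket name and repository path are placeholders, and the B2 credentials go in environment variables):

    export B2_ACCOUNT_ID="ACCOUNT_ID"
    export B2_ACCOUNT_KEY="ACCOUNT_KEY"

    # initialize a repository in a B2 bucket, then back up the archive
    restic -r b2:BUCKET_NAME:restic init
    restic -r b2:BUCKET_NAME:restic backup /mnt/archive

On the Raspberry Pi, it was the backup step itself that ran out of memory.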

So finally I went with the bizarrely sort-of-but-not-really open-source option, Duplicacy. I found other people talking about running it on a Raspberry Pi, and it seemed like the primary place where memory consumption could become a problem was the number of threads used to upload, which thankfully is a command-line argument. I settled on 16 and it seems to work fine (i.e., duplicacy backup -stats -threads 16); the memory consumption seems to hover below 60%, which leaves a very healthy buffer for anything else that’s going on (or periodic little jumps), and regardless, more threads don’t seem to make it any faster.

The documentation on how to use the command-line version is a little sparse (there is a GUI version that costs money), but I eventually figured out that, to configure it to connect automatically to my B2 account, I needed a file .duplicacy/preferences that looked like the following (see the keys section; the rest will probably be written out for you if you run duplicacy first; alternatively, just put this file in place and everything will be set up):

[ { "name": "default", "id": "SOME-ID", "storage": "b2://BUCKET_NAME", "encrypted": true, "no_backup": false, "no_restore": false, "no_save_password": false, "keys": { "b2_id": "ACCOUNT_ID", "b2_key": "ACCOUNT_KEY", "password": "ENCRYPTION_PASSWORD" } } ]

Everything else was pretty much smooth sailing (though, as per usual, the initial backup is quite slow. The Raspberry Pi 3 processor is certainly much faster than previous Raspberry Pis, and fast enough for this purpose, but it definitely still has to work hard! And my residential cable upstream is not all that impressive. After a couple of days, though, the initial backup did complete!).

Periodic backups run with the same command, and intermediate ones can be pruned away as well. I use duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1, run after my daily backup, to keep monthly backups beyond 6 months, weekly backups beyond 1 month, and daily backups below that. I have a cron job that runs the backup daily, so the last rule is not strictly necessary, but if I do manual backups it’ll clean them up over time. Since I pretty much never delete files that are put into this archive, pruning isn’t really about saving space (barring some error on the server, the latest backup should contain every file), but it is nice to keep the list of snapshots more manageable.
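
Concretely, the cron job is something along these lines (a sketch, not my exact crontab; the repository path and schedule are placeholders, and the backup and prune flags are the ones described above):

    # m  h  dom mon dow  command
    30 3 * * *  cd /mnt/archive && duplicacy backup -stats -threads 16 && duplicacy prune -keep 30:180 -keep 7:30 -keep 1:1
    # (use the full path to the duplicacy binary if it isn't on cron's PATH)

Chaining prune after backup with && also means snapshots only get thinned on days when the backup itself succeeded.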