To complement the previous BTRFS tool, btrfs-snp (which allows us to schedule snapshots), I would like to share a new tool, btrfs-sync, that synchronizes those snapshots locally or remotely to achieve efficient data redundancy.

With btrfs-sync we can replicate our BTRFS snapshots to a different BTRFS filesystem, and have a second copy of our versioned subvolume in a much more efficient manner than with the traditional rsync.

Features

Local or remote sync through SSH

Simple syntax

Progress indication

Support for xz or pbzip2 compression in order to save bandwidth

Retention policy

Automatic incremental synchronization

Cron friendly

Usage

The syntax is similar to that of scp

Usage: btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>

  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
  -d|--delete       delete snapshots in <dst> that don't exist in <src>
  -z|--xz           use xz compression. Saves bandwidth, but uses one CPU
  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
  -q|--quiet        don't display progress
  -v|--verbose      display more information
  -h|--help         show usage

<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command
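To make the scp-like destination syntax concrete, here is a minimal sketch of how an argument of the form [[user@]host:]<dir> could be split into a host part and a path part. The function name parse_dest and the variables DEST_HOST and DEST_PATH are hypothetical, and the sketch splits at the first colon using parameter expansion, which is a simplification rather than the exact logic of the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of btrfs-sync): split an scp-style
# destination into DEST_HOST (empty for a local path) and DEST_PATH.
parse_dest() {
  local dest="$1"
  if [[ "$dest" == *:* ]]; then
    DEST_HOST="${dest%%:*}"   # everything before the first ':'
    DEST_PATH="${dest#*:}"    # everything after the first ':'
  else
    DEST_HOST=""
    DEST_PATH="$dest"
  fi
}

parse_dest "user@server:/media/USBdrive/home-snapshots"
echo "host=$DEST_HOST path=$DEST_PATH"

parse_dest "/media/USBdrive/home-snapshots"
echo "host=${DEST_HOST:-<local>} path=$DEST_PATH"
```

A destination without a colon is treated as local, which is why the same command line works for a USB drive on this machine and for one attached to a server.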

Examples

Manual

Synchronize snapshots of home to a USB drive

# btrfs-sync /home/user/.snapshots /media/USBdrive/home-snapshots

Synchronize snapshots of home to a USB drive in another machine

# btrfs-sync /home/user/.snapshots user@server:/media/USBdrive/home-snapshots

Synchronize one snapshot of home to a USB drive in another machine

# btrfs-sync /home/user/.snapshots/monthly_2018-02-08_200102 user@server:/media/USBdrive/home-snapshots

Synchronize only monthly snapshots of home to a USB drive in another machine

# btrfs-sync /home/user/.snapshots/monthly_* user@server:/media/USBdrive/home-snapshots

Use --verbose to get more details

# btrfs-sync --verbose --delete /home/user/.snapshots user@server:/media/USBdrive/home-snapshots
* Skip existing '/home/user/.snapshots/monthly_2018-01-09_200102'
* Skip existing '/home/user/.snapshots/monthly_2018-02-08_200102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-09_140102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-16_150102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-23_150102'
* Skip existing '/home/user/.snapshots/weekly_2018-03-02_180102'
* Skip existing '/home/user/.snapshots/daily_2018-03-03_000101'
* Skip existing '/home/user/.snapshots/daily_2018-03-04_080101'
* Skip existing '/home/user/.snapshots/daily_2018-03-05_100102'
* Skip existing '/home/user/.snapshots/daily_2018-03-06_100102'
* Skip existing '/home/user/.snapshots/daily_2018-03-07_110102'
* Synchronizing '/home/user/.snapshots/hourly_2018-03-08_090101' using seed '.snapshots/hourly_2018-03-07_090101'...
time elapsed [0:00:24] | rate [11.1MiB/s] | total size [ 132MiB]
* Synchronizing '/home/user/.snapshots/hourly_2018-03-08_100101' using seed '.snapshots/hourly_2018-03-09_090101'...
time elapsed [0:01:05] | rate [11.1MiB/s] | total size [ 275MiB]
* Deleting non existent snapshots...
Delete subvolume (no-commit): '/media/USBdrive/home-snapshots/hourly_2018-03-08_090101'
Delete subvolume (no-commit): '/media/USBdrive/home-snapshots/hourly_2018-03-08_100101'
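The "using seed" messages in the output above refer to the parent snapshot used for the incremental transfer: the most recent source snapshot that already exists at the destination. The following is a simplified sketch of that selection. It compares plain names, whereas the real script matches snapshots by UUID, and the function name pick_seed and the short snapshot names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the newest source entry that is also present
# at the destination. Arguments: source names ordered oldest-first, then
# "--", then the destination names. Prints nothing if nothing is shared.
pick_seed() {
  local -a srcs=() dsts=()
  local seen_sep=0 arg s d seed=""
  for arg in "$@"; do
    if [[ "$arg" == "--" && $seen_sep -eq 0 ]]; then seen_sep=1; continue; fi
    if (( seen_sep )); then dsts+=( "$arg" ); else srcs+=( "$arg" ); fi
  done
  for s in "${srcs[@]}"; do
    for d in "${dsts[@]}"; do
      [[ "$s" == "$d" ]] && seed="$s"   # newer matches overwrite older ones
    done
  done
  [[ -n "$seed" ]] && echo "$seed"
  return 0
}

# Two daily snapshots are already at the destination; the hourly one is new.
pick_seed daily_1 daily_2 hourly_1 -- daily_1 daily_2
```

When no common snapshot exists, there is no seed and the whole snapshot has to be sent in non-incremental mode.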

Cron

Daily synchronization over the internet, keep only last 50

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync --quiet --keep 50 --xz /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
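To illustrate what a retention setting such as --keep 50 implies, here is a small sketch that computes which snapshots would be pruned. It assumes the list is ordered oldest first and treats a value of 0 as "no limit"; the function name prune_list and the snapshot names are hypothetical, and the real script works on the destination listing rather than on arguments.

```shell
#!/bin/bash
# Hypothetical helper: given KEEP and a list of snapshot names ordered
# oldest-first, print the names that fall outside the newest KEEP entries.
prune_list() {
  local keep="$1"; shift
  local -a snaps=( "$@" )
  local excess=$(( ${#snaps[@]} - keep ))
  local i
  # keep=0 is treated as "no retention limit"
  (( keep == 0 || excess <= 0 )) && return 0
  for (( i = 0; i < excess; i++ )); do
    echo "${snaps[$i]}"
  done
}

# With four snapshots and KEEP=2, the two oldest are candidates for deletion.
prune_list 2 snap_1 snap_2 snap_3 snap_4
```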

Daily synchronization in LAN, mirror snapshot directory

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync --quiet --delete /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
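Mirroring with --delete means removing destination snapshots that no longer have a counterpart at the source. A minimal sketch of that comparison, matching entries by their base name, could look like this. The function name stale_at_dest and the paths are hypothetical; the sketch only prints candidates and deletes nothing.

```shell
#!/bin/bash
# Hypothetical helper: print destination entries whose basename has no
# counterpart among the source entries.
# Arguments: source paths, then "--", then destination paths.
stale_at_dest() {
  local -a srcs=() dsts=()
  local seen_sep=0 arg s d found
  for arg in "$@"; do
    if [[ "$arg" == "--" && $seen_sep -eq 0 ]]; then seen_sep=1; continue; fi
    if (( seen_sep )); then dsts+=( "$arg" ); else srcs+=( "$arg" ); fi
  done
  for d in "${dsts[@]}"; do
    found=0
    for s in "${srcs[@]}"; do
      [[ "${s##*/}" == "${d##*/}" ]] && { found=1; break; }
    done
    (( found )) || echo "$d"
  done
  return 0
}

stale_at_dest /home/.snapshots/daily_1 /home/.snapshots/daily_2 \
  -- /path/to/snaps/daily_1 /path/to/snaps/daily_0
```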

Installation

Get the script and make it executable. You can do this in two lines, but better inspect it first. Don’t trust anyone blindly.

sudo wget https://raw.githubusercontent.com/nachoparker/btrfs-sync/master/btrfs-sync -O /usr/local/sbin/btrfs-sync
sudo chmod +x /usr/local/sbin/btrfs-sync

It is recommended to set up a designated user for receiving the snapshots that has sudoers access to the btrfs command.

Create a btrfs user at both ends

$ sudo adduser btrfs

Create an SSH key pair on your sending machine

$ sudo -u btrfs ssh-keygen

Give passwordless access to the btrfs user at the remote machine.

$ sudo -u btrfs ssh-copy-id btrfs@<ip>

Give the btrfs user permission to use the btrfs command on both ends. Create a file

# visudo -f /etc/sudoers.d/90_btrfs-sync

with the contents

btrfs ALL=(root:nobody) NOPASSWD:NOEXEC: /bin/btrfs

If you want to run it from cron, you might have to install cron first, because some distributions have completely replaced it with systemd timers.

This was the case for me in Arch Linux. In my case, I installed cronie.

cronie logs the output to the system log by default, but you can set up a mail system if you want old-style cron mails.

Also, note that you can use chronic if you want logging to occur only when something goes wrong.
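chronic (from the moreutils package) runs a command and stays silent unless the command fails. For readers who want to see the idea, here is a rough bash approximation. It is not chronic's actual implementation, and the function name quiet_unless_fail is made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helper approximating chronic: run a command, buffer its
# combined stdout/stderr, and print the buffer only if the command fails.
quiet_unless_fail() {
  local out status
  out="$( "$@" 2>&1 )"
  status=$?
  if (( status != 0 )); then
    printf '%s\n' "$out"
  fi
  return "$status"
}

# A successful command produces no output at all.
quiet_unless_fail echo "all good"

# A failing command has its output shown and its exit status preserved.
quiet_unless_fail sh -c 'echo "something went wrong"; exit 3' || echo "exit status: $?"
```

In a cron script, the equivalent with the real tool would be to prefix the btrfs-sync command line with chronic.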

Comparison with rsync

The main difference between these two methods is that BTRFS works at the block level, whereas rsync works at the file level.

Because rsync works with files, it will not detect things such as renaming or moving a file, so it can only send it again. Also, it needs to analyze the existing files at the destination to see if they have been updated or not.

In order to achieve this, it can either analyze modification dates and sizes, which is relatively fast, or compare checksums of file chunks at both ends, which is more robust but slower. In any case, there will be a significant processing overhead when you are synchronizing a whole partition with many thousands of files.
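The difference between the two strategies can be shown with a toy example. The helpers below are hypothetical and only imitate the idea: one compares size and modification time, the other compares a checksum of the contents using the standard cksum utility. This is not how rsync is implemented internally; it only shows why the fast check can miss a change that the checksum catches.

```shell
#!/bin/bash
# Hypothetical helpers contrasting the two comparison strategies.

# Fast check: treat files as identical when size and mtime match.
same_by_size_and_mtime() {
  local a="$1" b="$2"
  [[ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ]] || return 1
  # -nt compares modification times; if neither is newer, they are equal
  [[ ! "$a" -nt "$b" && ! "$b" -nt "$a" ]]
}

# Content check: compare a checksum of each file.
same_by_checksum() {
  local a="$1" b="$2"
  [[ "$(cksum < "$a")" == "$(cksum < "$b")" ]]
}

tmp="$(mktemp -d)"
printf 'AAAA' > "$tmp/one"
printf 'BBBB' > "$tmp/two"
touch -r "$tmp/one" "$tmp/two"     # give both files the same timestamps

same_by_size_and_mtime "$tmp/one" "$tmp/two" && echo "fast check: looks unchanged"
same_by_checksum "$tmp/one" "$tmp/two" || echo "checksum: content differs"
rm -rf "$tmp"
```

The two files have the same size and timestamp but different contents, so only the checksum comparison can tell them apart, at the cost of reading every byte.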

A plus of this approach is that you are able to exclude certain files or folders, whereas BTRFS works by subvolumes in an all-or-nothing fashion.

BTRFS on the other hand understands blocks, and because it is a COW filesystem, it already knows what bytes have changed between a snapshot and the next. If we renamed the file, only some tiny metadata has changed, and BTRFS knows that we don’t need to transfer the whole file again, only those few bytes.
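Under the hood, this kind of transfer relies on btrfs send with a parent snapshot (the -p option) piped into btrfs receive. The sketch below only prints the pipeline instead of running it, so it can be tried without a BTRFS filesystem. The function name show_send_pipeline and the example paths are hypothetical, and the real script adds compression, progress reporting and SSH on top of this.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a send/receive pipeline.
# Arguments: snapshot, destination directory, optional parent snapshot.
show_send_pipeline() {
  local snap="$1" dst="$2" parent="${3:-}"
  local -a send=( btrfs send )
  # -p: send only the differences against this parent snapshot
  [[ -n "$parent" ]] && send+=( -p "$parent" )
  send+=( "$snap" )
  printf '%q ' "${send[@]}"
  printf '| btrfs receive %q\n' "$dst"
}

# Incremental: only the delta between the two snapshots is transferred.
show_send_pipeline /home/.snapshots/hourly_2 /media/USBdrive/home-snapshots /home/.snapshots/hourly_1

# Full: no parent available, so the whole snapshot is sent.
show_send_pipeline /home/.snapshots/hourly_2 /media/USBdrive/home-snapshots
```

Both the snapshot and its parent need to be read-only snapshots, and the parent must already exist at the destination for the incremental stream to apply.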

The same happens when a big file changes internally, such as an image file for a virtual machine where we have been working.

This is the same reason why snapshots in COW filesystems are so space efficient, allowing us to create instant safety copies of huge volumes that only take extra space as we change the files in them.

Obviously, a drawback is that you need a BTRFS filesystem at both ends, but why would we stick to an old-generation filesystem when we now have more modern and featureful ones?

Code

#!/bin/bash

#
# Simple script that synchronizes BTRFS snapshots locally or through SSH.
# Features compression, retention policy and automatic incremental sync
#
# Usage:
#  btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>
#
#  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
#  -d|--delete       delete snapshots in <dst> that don't exist in <src>
#  -z|--xz           use xz compression. Saves bandwidth, but uses one CPU
#  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
#  -q|--quiet        don't display progress
#  -v|--verbose      display more information
#  -h|--help         show usage
#
# <src> can either be a single snapshot, or a folder containing snapshots
# <user> requires privileged permissions at <host> for the 'btrfs' command
#
# Cron example: daily synchronization over the internet, keep only last 50
#
# cat > /etc/cron.daily/btrfs-sync <<EOF
# #!/bin/bash
# /usr/local/sbin/btrfs-sync -q -k50 -z /home user@host:/path/to/snaps
# EOF
# chmod +x /etc/cron.daily/btrfs-sync
#
# Copyleft 2018 by Ignacio Nunez Hernanz <nacho _a_t_ ownyourbits _d_o_t_ com>
# GPL licensed (see end of file) * Use at your own risk!
#
# More at https://ownyourbits.com
#

# help
print_usage() {
  echo "Usage: $BIN [options] <src> [<src>...] [[user@]host:]<dir>

  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
  -d|--delete       delete snapshots in <dst> that don't exist in <src>
  -z|--xz           use xz compression. Saves bandwidth, but uses one CPU
  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
  -q|--quiet        don't display progress
  -v|--verbose      display more information
  -h|--help         show usage

<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command

Cron example: daily synchronization over the internet, keep only last 50

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync -q -k50 -z /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
"
}

echov() { [[ "$VERBOSE" == 1 ]] && echo "$@" || true; }

#----------------------------------------------------------------------------------------------------------

# parse arguments

BIN="${0##*/}"
KEEP=0
ZIP=cat PIZ=cat
SILENT=">/dev/null"

OPTS=$( getopt -o hqzZk:dv -l quiet -l help -l xz -l pbzip2 -l keep: -l delete -l verbose -- "$@" 2>/dev/null )
[[ $? -ne 0 ]] && { echo "error parsing arguments"; exit 1; }
eval set -- "$OPTS"

while true; do
  case "$1" in
    -h|--help   ) print_usage; exit 0 ;;
    -q|--quiet  ) QUIET=1 ; shift 1 ;;
    -d|--delete ) DELETE=1 ; shift 1 ;;
    -k|--keep   ) KEEP=$2 ; shift 2 ;;
    -z|--xz     ) ZIP=xz PIZ=( xz -d ); shift 1 ;;
    -Z|--pbzip2 ) ZIP=pbzip2 PIZ=( pbzip2 -d ); shift 1 ;;
    -v|--verbose) SILENT="" VERBOSE=1 ; shift 1 ;;
    --) shift; break ;;
  esac
done

SRC=( "${@:1:$#-1}" )
DST="${@: -1}"

# detect remote dst argument
[[ "$DST" =~ : ]] && {
  NET="$( sed 's|:.*||' <<<"$DST" )"
  DST="$( sed 's|.*:||' <<<"$DST" )"
  SSH=( ssh -o ServerAliveInterval=5 -o ConnectTimeout=1 "$NET" )
}
[[ "$SSH" != "" ]] && DST_CMD=( ${SSH[@]} ) || DST_CMD=( eval )

#----------------------------------------------------------------------------------------------------------

# checks

## general checks
[[ $# -lt 2 ]] && { print_usage ; exit 1; }
[[ ${EUID} -ne 0 ]] && { echo "Must be run as root. Try 'sudo $BIN'"; exit 1; }
${DST_CMD[@]} true &>/dev/null || { echo "SSH access error to $NET" ; exit 1; }

## src checks
while read entry; do SRCS+=( "$entry" ); done < <(
  for s in "${SRC[@]}"; do
    src="$(cd "$s" &>/dev/null && pwd)" || { echo "$s not found"; exit 1; } #abspath
    btrfs subvolume show "$src" &>/dev/null && echo "0|$src" || \
    for dir in $( ls -drt "$src"/* 2>/dev/null ); do
      btrfs subvolume show "$dir" &>/dev/null || continue
      DATE="$( btrfs su sh "$dir" | grep "Creation time:" | awk '{ print $3, $4 }' )"
      SECS=$( date -d "$DATE" +"%s" )
      echo "$SECS|$dir"
    done
  done | sort -V | sed 's=.*|=='
)
[[ ${#SRCS[@]} -eq 0 ]] && { echo "no BTRFS subvolumes found"; exit 1; }

## check pbzip2
[[ "$ZIP" == "pbzip2" ]] && {
  type pbzip2 &>/dev/null && \
  "${DST_CMD[@]}" type pbzip2 &>/dev/null || {
    echo "INFO: 'pbzip2' not installed on both ends, fallback to 'xz'"
    ZIP=xz PIZ=unxz
  }
}

## use 'pv' command if available
PV=( pv -F"time elapsed [%t] | rate %r | total size [%b]" )
[[ "$QUIET" == "1" ]] && PV=( cat ) || type pv &>/dev/null || {
  echo "INFO: install the 'pv' package in order to get a progress indicator"
  PV=( cat )
}

#----------------------------------------------------------------------------------------------------------

# sync snapshots

## get dst snapshots ( DSTS, DST_UUIDS )
get_dst_snapshots() {
  local DST="$1"
  unset DSTS DST_UUIDS
  while read entry; do
    DST_UUIDS+=( "$( sed 's=|.*==' <<<"$entry" )" )
    DSTS+=( "$( sed 's=.*|==' <<<"$entry" )" )
  done < <(
    "${DST_CMD[@]}" "
      DSTS=( \$( ls -d \"$DST\"/* 2>/dev/null ) )
      for dst in \${DSTS[@]}; do
        UUID=\$( sudo btrfs su sh \"\$dst\" 2>/dev/null | grep 'Received UUID' | awk '{ print \$3 }' )
        [[ \"\$UUID\" == \"-\" ]] || [[ \"\$UUID\" == \"\" ]] && continue
        echo \"\$UUID|\$dst\"
      done"
  )
}

## sync incrementally
sync_snapshot() {
  local SRC="$1"
  local ID LIST PATH_ DATE SECS SEED SEED_PATH SEED_ARG

  # detect existing
  SRC_UUID=$( btrfs su sh "$SRC" | grep "UUID:" | head -1 | awk '{ print $2 }' )
  for id in "${DST_UUIDS[@]}"; do
    [[ "$SRC_UUID" == "$id" ]] && { echov "* Skip existing '$SRC'"; return 0; }
  done

  # try to get most recent src snapshot that exists in dst to use as a seed
  LIST="$( btrfs subvolume list -su "$SRC" )"
  SEED=$(
    for id in "${DST_UUIDS[@]}"; do
      ID=$(btrfs su sh -u "$id" "$SRC" 2>/dev/null|grep "UUID:"|head -1|awk '{print $2}')
      PATH_=$( awk "{ if ( \$14 == \"$ID\" ) print \$16 }" <<<"$LIST" )
      DATE=$( awk "{ if ( \$14 == \"$ID\" ) print \$11, \$12 }" <<<"$LIST" )
      [[ "$ID" == "" ]] || [[ "$PATH_" == "$( basename "$SRC" )" ]] && continue
      SECS=$( date -d "$DATE" +"%s" )
      echo "$SECS|$PATH_"
    done | sort -V | tail -1 | cut -f2 -d'|'
  )

  # incremental sync argument
  [[ "$SEED" != "" ]] && {
    SEED_PATH="$( dirname "$SRC" )/$( basename $SEED )"
    [[ -d "$SEED_PATH" ]] && SEED_ARG=( -p "$SEED_PATH" ) || \
      echo "INFO: couldn't find $SEED_PATH. Non-incremental mode"
  }

  # do it
  echo -n "* Synchronizing '$src'"
  [[ "$SEED_ARG" != "" ]] && echov -n " using seed '$SEED'"
  echo "..."

  { btrfs send -q ${SEED_ARG[@]} "$SRC" \
    | "$ZIP" \
    | "${PV[@]}" \
    | "${DST_CMD[@]}" "${PIZ[@]} | sudo btrfs receive \"$DST\" 2>&1 | grep -v '^At subvol '" \
    || exit 1; } | grep -v "^At snapshot "

  get_dst_snapshots "$DST" # sets DSTS DST_UUIDS
}

#----------------------------------------------------------------------------------------------------------

# sync all snapshots found in src
get_dst_snapshots "$DST" # sets DSTS DST_UUIDS
for src in "${SRCS[@]}"; do
  sync_snapshot "$src"
done

#----------------------------------------------------------------------------------------------------------

# retention policy
[[ "$KEEP" != 0 ]] && \
  [[ ${#DSTS[@]} -gt $KEEP ]] && \
  echov "* Pruning old snapshots..." && \
  for (( i=0; i < $(( ${#DSTS[@]} - KEEP )); i++ )); do
    PRUNE_LIST+=( "${DSTS[$i]}" )
  done && \
  ${DST_CMD[@]} sudo btrfs subvolume delete "${PRUNE_LIST[@]}" $SILENT

# delete flag
[[ "$DELETE" == 1 ]] && \
  for dst in "${DSTS[@]}"; do
    FOUND=0
    for src in "${SRCS[@]}"; do
      [[ "$( basename $src )" == "$( basename $dst )" ]] && { FOUND=1; break; }
    done
    [[ "$FOUND" == 0 ]] && DEL_LIST+=( "$dst" )
  done
[[ "$DEL_LIST" != "" ]] && \
  echov "* Deleting non existent snapshots..." && \
  ${DST_CMD[@]} sudo btrfs subvolume delete "${DEL_LIST[@]}" $SILENT

# License
#
# This script is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This script is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this script; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330,