Building a ZFS Backup Server

I've been running an Ubuntu home server with ZFS-based storage for some time. What follows is a rather detailed walk through my implementation of a ZFS backup server. I had been meaning to implement ZFS snapshot-based backups for quite some time, and I figured I may as well document the journey, not only as an aide-mémoire for myself but also in case it's of any use to other people interested in doing the same. As with everything else I write, this guide is provided as-is and your mileage may vary.

For the remainder of this article, the home server is referred to as brox and the backup server as mundo. And just in case you're wondering, the sensitive information (e.g. public keys) featured in this article isn't real.

Updated 20160104T21:00:00Z

The original backup server build used recycled components (a cheap motherboard, an AMD Athlon 64 X2 CPU) and proved problematic in a number of ways, not least checksum errors in both zpools. I have since changed the motherboard, CPU and memory. The new specification is as follows:

Asus P8B-M motherboard

Intel Pentium G520 CPU

4 GB DDR3L ECC RAM

This has resolved (so far) the checksum problems, the dysfunctional Wake-on-LAN and the problems transferring ZFS snapshots from the primary to the backup server at anything over 100 Megabits/second. I've not amended the article to reflect these changes, but instead inserted this amendment to serve as a warning to anyone looking to cut corners when relying on ZFS.

Additionally, in the original article there were some problems with the way I was invoking zpool scrubs on the backup server. I was starting them remotely via ssh, and then not checking whether the scrub was still running or complete before shutting down the backup server. I'd lost sight of the fact that "zpool scrub" instantiates a background process, and there seems to be no way of forcing it to run in the foreground. Now, after starting a scrub, I check that the scrub is running, and then repeat the check every ten minutes until the scrub is complete. I have worked these amendments into this article.
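The polling logic can be sketched as a small shell function. This is a sketch rather than my exact script: the status command is parameterised (STATUS_CMD is my own illustrative name) so the loop can be exercised without a live pool; in real use it would be something like ssh mundo zpool status.

```shell
#!/bin/sh
# Hedged sketch of the scrub wait loop. STATUS_CMD is whatever produces
# "zpool status <pool>" output (in real use: ssh mundo zpool status).
# "zpool scrub" returns immediately, so poll until the "scrub in
# progress" line disappears from the status output.
wait_for_scrub() {
    pool=$1
    while $STATUS_CMD "$pool" | grep -q 'scrub in progress'; do
        sleep "${SCRUB_POLL:-600}"   # check every ten minutes by default
    done
}
```

The ten-minute interval matches the cadence described above; anything from one minute to an hour would work, it just trades shutdown latency against ssh chatter.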

Sometime soon I will add an additional step to the backup script that causes a failure if zpool status reports any read/write/checksum errors, and thus results in an email notification when the cron job fails.
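That planned step could look something like the following sketch. It assumes the usual zpool status device rows (NAME STATE READ WRITE CKSUM), and STATUS_CMD is again my own stand-in for the real status command so the parsing can be tested without ZFS.

```shell
#!/bin/sh
# Sketch: exit non-zero if any READ/WRITE/CKSUM counter in "zpool status"
# output is non-zero, so a cron job wrapping this fails (and emails).
# STATUS_CMD stands in for "zpool status <pool>" or an ssh equivalent.
check_pool_errors() {
    $STATUS_CMD "$1" | awk '
        # Device rows look like: NAME STATE READ WRITE CKSUM
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 > 0 || $4 > 0 || $5 > 0) bad = 1
        }
        END { exit bad }'
}
```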

Hardware

The existing home server uses a Xeon E3-1220L v2 CPU, an efficient power supply, DDR3L ECC RAM and Noctua CPU cooler/case fans. This amounts to a package that consumes 15 Watts of electricity in its idle state, yet has reasonable compute power when required.

The new (to me) backup server has afforded no such luxuries. Whilst it would be nice to use ECC RAM, which is recommended for running ZFS, that would make this project more expensive. Electricity consumption is also less of a concern, as I'd only expect it to be on for, at most, an hour each day. The only new component is the case, which I bought to match the existing home server.

I initially acquired a pair of Asus P5-series boards, each complete with an Intel Core 2 Quad (Q6600) CPU. These were binned components from what we affectionately call "the graveyard" at work. One of the boards turned out to have a series of visibly blown capacitors, so that was quickly returned. The second looked promising, but no amount of fettling would get it to boot stably. Plan B turned up an XFS MI-A78U-8309 AM2 motherboard complete with an AMD Athlon 64 X2 CPU from eBay for the princely sum of £15. Normally, I'd consider this a terrible choice due to its low CPU mark and abysmal power consumption (65 Watts at idle, before adding disks).

When it arrived, I assembled the components and added four 2 TB hard disk drives recycled from my desktop computer and my old NAS box. An additional 500 GB disk was salvaged from an old media player to use as a boot drive. Memory (4 x 1 GB DDR2) was sourced from a box of spare parts, as was a 550 Watt PSU.

I've been tempted to move away from Ubuntu for some time - mainly out of the frustration I've had with the sometimes lengthy wait for patches to show up in the Ubuntu repositories. That said, the ZFS packages for Ubuntu have been rock-solid reliable, and the 5-year extended support period for Ubuntu's LTS releases is useful. Debian stable releases are, in contrast, only supported for 12 months following the next stable release.

The first thing I did was boot it up. I was pleased to see it successfully POST. I rebooted and ran through the BIOS - it recognised all five SATA disks. We're off to a promising start.

Ubuntu Installation

I downloaded Ubuntu Server 14.04.3 LTS and prepared USB installation media. At this point, I encountered the first hitch - I couldn't get the XFS MI-A78U mobo to boot from USB media. Period. The only spare ROM drive I had in my parts bin was a PATA drive. Fortunately, this board has one PATA connector. I plumbed in the ROM drive and prepared an Ubuntu installation CD. On rebooting the server, I again checked the BIOS. The ROM drive was recognised, but instead of seeing 5 SATA disks I only saw 4. The early indication was that this was an effect of connecting a PATA device to the motherboard. Never mind. I'll just have to disconnect the ROM drive post-install. Yes, this inflexibility is sub-optimal, but we're talking about a backup for a home server, rather than anything mission critical.

I rebooted with the Ubuntu Server installation CD in the ROM drive and launched Memtest86+ - it ran successfully for three entire passes before I rebooted the machine again and this time chose to test the installation media.

I continued with the installation, which went swiftly, until I rebooted. On boot, everything happens as normal until standard output shows plymouth-upstart-bridge respawning too fast, followed by a blank display.

Post-installation Boot Woes

I rebooted (again) and, using the Grub menu, dropped into a root recovery console. The file system is, by default, mounted read-only. The first step is mounting it read-write:

# mount -o rw,remount /

I then edited the plymouth-upstart-bridge.conf job:

# vi /etc/init/plymouth-upstart-bridge.conf

And added to the end of the file:

post-stop exec sleep 2

Sure enough, this significantly reduced the visible complaints from the kernel about plymouth-upstart-bridge, but the blank display persisted. Once again I rebooted, dropped to the recovery console and remounted the root file system read-write. I then made some changes to /etc/default/grub:

# vi /etc/default/grub

And changed the line:

GRUB_CMDLINE_LINUX_DEFAULT=""

To:

GRUB_CMDLINE_LINUX_DEFAULT="noplymouth nosplash nomodeset"

This disables the splash screen and stops the kernel from trying to use anything but the default BIOS video mode. Finally, I ran update-grub to regenerate the boot configuration and rebooted; the black screen of death was supplanted by a log-in prompt.

Configuring Networking

The first thing I tried, post-boot, was pinging the new server.

biscuitninja@colnago ~ $ ping mundo
PING mundo.bikeshed.internal (172.16.3.49) 56(84) bytes of data.
64 bytes from mundo.bikeshed.internal (172.16.3.49): icmp_seq=1 ttl=64 time=1.11 ms
64 bytes from mundo.bikeshed.internal (172.16.3.49): icmp_seq=2 ttl=64 time=3.73 ms
64 bytes from mundo.bikeshed.internal (172.16.3.49): icmp_seq=3 ttl=64 time=1.35 ms
^C
--- mundo.bikeshed.internal ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.110/2.065/3.736/1.185 ms

Result. It's received an IP address and DNS records courtesy of the DHCP server running on brox. Now, for most of my devices, I'd create a DHCP reservation. In this instance, however, I want to statically define an IP address. I'm planning some sort of crude/semi-automated DHCP/DNS failover, whereby when I boot this server, if the DHCP/DNS services on brox are unavailable, it will take the reins. True high availability for most home set-ups is overkill - especially on a box that, with its disks, will eat the best part of 100 Watts.

I ssh into mundo and swap out its DHCP configuration for a static address:

biscuitninja@mundo:~$ sudo vi /etc/network/interfaces

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 172.16.1.52
    netmask 255.255.248.0
    gateway 172.16.1.1
    dns-nameservers 172.16.1.52 127.0.0.1

biscuitninja@mundo:~$ sudo ifdown eth0 ; sudo ifup eth0

As a result I lose the ssh session. In another terminal window I check that I can ping the new IP address, and then wander off to update my DNS records accordingly.

$ ping 172.16.1.52
PING 172.16.1.52 (172.16.1.52) 56(84) bytes of data.
64 bytes from 172.16.1.52: icmp_seq=1 ttl=64 time=0.504 ms
64 bytes from 172.16.1.52: icmp_seq=2 ttl=64 time=0.265 ms
64 bytes from 172.16.1.52: icmp_seq=3 ttl=64 time=0.249 ms
^C
--- 172.16.1.52 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.249/0.320/0.504/0.107 ms
$ ssh brox
$ sudo rndc freeze bikeshed.internal
$ sudo vi /var/lib/bind/db.bikeshed.internal

I incremented the zone's serial number and amended the record for mundo to match the new static IP address.

mundo    A    172.16.1.52

$ sudo rndc thaw bikeshed.internal

I repeat the process to remove the old PTR record from the DHCP reverse lookup zone (3.16.172.in-addr.arpa) and add it to the correct reverse lookup zone (1.16.172.in-addr.arpa).
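As an aside, the PTR record's owner name is just the address octets reversed under in-addr.arpa. A throwaway sketch of that derivation (ptr_name is my own illustrative helper, not part of bind):

```shell
#!/bin/sh
# Derive the in-addr.arpa owner name for an IPv4 address, purely to show
# where "52.1.16.172.in-addr.arpa" comes from. The zone edits themselves
# use the same rndc freeze / edit / rndc thaw pattern as the forward zone.
ptr_name() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}
```

So ptr_name 172.16.1.52 gives the record name that belongs in the 1.16.172.in-addr.arpa zone.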

And then finally, I test the changes:

biscuitninja@colnago ~ $ nslookup mundo
Server:  172.16.1.51
Address: 172.16.1.51#53

Name: mundo.bikeshed.internal
Address: 172.16.1.52

biscuitninja@colnago ~ $ nslookup 172.16.1.52
Server:  172.16.1.51
Address: 172.16.1.51#53

52.1.16.172.in-addr.arpa  name = mundo.bikeshed.internal

biscuitninja@colnago ~ $ ping mundo
PING mundo.bikeshed.internal (172.16.1.52) 56(84) bytes of data.
64 bytes from mundo.bikeshed.internal (172.16.1.52): icmp_seq=1 ttl=64 time=2.25 ms
64 bytes from mundo.bikeshed.internal (172.16.1.52): icmp_seq=2 ttl=64 time=4.67 ms
64 bytes from mundo.bikeshed.internal (172.16.1.52): icmp_seq=3 ttl=64 time=1.40 ms
^C
--- mundo.bikeshed.internal ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.406/2.777/4.675/1.386 ms

Finally, I want to make sure "hostname -f" returns mundo's fully qualified domain name.

biscuitninja@mundo:~$ hostname -f
mundo
biscuitninja@mundo:~$ sudo vi /etc/hosts

Change the line

127.0.1.1 mundo

To

127.0.1.1 mundo.bikeshed.internal mundo.bikeshed mundo

And then test

biscuitninja@mundo:~$ hostname -f
mundo.bikeshed.internal

Securing SSH

Thus far we've got the server built, moved it onto a static IP and re-configured its DNS records. Before moving on, it's time to generate a public/private key pair for SSH and disable password authentication.

$ ssh-keygen -t rsa -b 3072 -f ~/.ssh/id_rsa_mundo.bikeshed.internal
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/biscuitninja/.ssh/id_rsa_mundo.bikeshed.internal.
Your public key has been saved in /home/biscuitninja/.ssh/id_rsa_mundo.bikeshed.internal.pub.
The key fingerprint is:
99:52:54:20:d1:e7:8d:2d:95:0d:ba:b1:98:22:fa:34 biscuitninja@colnago
The key's randomart image is:
+--[ RSA 3072]----+
| oooo. .+ |
| o. ..o . |
| .oo= |
| . =++o |
| . o S o. |
| . . o |
| . E |
| o . |
| . |
+-----------------+
biscuitninja@colnago ~ $ ssh-copy-id -o PreferredAuthentications=password -i ~/.ssh/id_rsa_mundo.bikeshed.internal mundo
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
biscuitninja@mundo's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh -o 'PreferredAuthentications=password' 'mundo'"
and check to make sure that only the key(s) you wanted were added.

At this point I will take the opportunity to amend my ssh config and back up the keys:

biscuitninja@colnago ~ $ vi ~/.ssh/config

Host mundo mundo.bikeshed.internal
    Hostname mundo.bikeshed.internal
    IdentityFile ~/.ssh/id_rsa_mundo.bikeshed.internal
    User biscuitninja

biscuitninja@colnago ~ $ cp ~/.ssh/* /mnt/it/infrastructure/secure/ssh/.
biscuitninja@colnago ~ $ rm /mnt/it/infrastructure/secure/ssh/known_hosts*
biscuitninja@colnago ~ $ chmod 400 /mnt/it/infrastructure/secure/ssh/*

I can then test my connection without having to specify the identity file.

biscuitninja@colnago ~ $ ssh mundo
The authenticity of host 'mundo.bikeshed.internal (172.16.1.52)' can't be established.
ECDSA key fingerprint is e2:63:b0:84:6f:3a:d8:e1:cc:e6:65:4a:64:85:8b:4b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mundo.bikeshed.internal' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-25-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Fri Nov 6 21:21:50 GMT 2015

  System load:  0.0               Processes:           98
  Usage of /:   0.3% of 458.32GB  Users logged in:     1
  Memory usage: 1%                IP address for eth0: 172.16.1.52
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

56 packages can be updated.
25 updates are security updates.

Last login: Fri Nov 6 20:47:13 2015 from 172.16.3.41
biscuitninja@mundo:~$

Whilst I have a secure shell session running, I'll tweak the configuration to allow only public key authentication.

biscuitninja@mundo:~$ sudo vi /etc/ssh/sshd_config

Amend/add the following entries:

PermitRootLogin no
PasswordAuthentication no
AllowTcpForwarding no
Banner /etc/issue.net

Save the changes and close the editor. I'll then set up my preferred unauthorised-use warning as an ssh banner.

biscuitninja@mundo:~$ sudo vi /etc/issue.net

Insert the following, amending as appropriate:

********************************************************************
* Unauthorized use of this computer system constitutes a criminal  *
* offense.                                                         *
*                                                                  *
* Anyone accessing this system expressly consents to the           *
* monitoring of their activity.                                    *
*                                                                  *
* Any suspicious or criminal activity will be reported to law      *
* enforcement and/or relevant service providers, rendering the     *
* perpetrators liable to criminal investigations and other         *
* appropriate sanctions.                                           *
********************************************************************

Then restart the ssh daemon

biscuitninja@mundo:~$ sudo service ssh restart

Tweak apt and then Update

I'm a bit of a control freak, at least as far as technology goes, so I tend to prevent apt-get from automatically installing recommended packages.

biscuitninja@mundo:~$ sudo vi /etc/apt/apt.conf.d/99bikeshed-tweaks

Insert the following lines:

APT::Install-Recommends "false";
APT::Install-Suggests "false";

Save and close the new file. Then let's bring the new server bang up to date:

biscuitninja@mundo:~$ sudo apt-get update ; sudo apt-get upgrade -y

Set up and Test Wake-on-LAN (WOL)

On reading this article, my first thought was to shut down mundo and hunt around the BIOS settings for any configuration relating to "Wake-On-LAN". After an extensive search, I drew a blank. I could see a resume setting within the Advanced Power Management menu, but I chose to ignore it for now, as I suspected it related to resuming from a suspended state, whereas I'm more interested in starting the server from a powered-off state.

biscuitninja@mundo:~$ sudo ethtool eth0
[sudo] password for biscuitninja:
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pg
    Wake-on: d
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

The ethtool output shows that Wake-On-LAN is supported - as denoted by the letter 'g' after the field name "Supports Wake-on". There's no 'g' in the "Wake-on" field, so we have to try to enable it.
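Checking those two fields by eye gets old; the test can be scripted. A hedged sketch (wol_enabled is my own helper name, reading "ethtool <iface>" output on stdin - not an ethtool feature):

```shell
#!/bin/sh
# Succeeds (exit 0) only when the "Wake-on:" line contains a 'g',
# i.e. magic-packet wake is currently enabled. The regex is anchored so
# the "Supports Wake-on:" line is ignored.
# Usage: sudo ethtool eth0 | wol_enabled && echo "WOL is on"
wol_enabled() {
    awk '/^[ \t]*Wake-on:/ { if ($2 ~ /g/) ok = 1 } END { exit !ok }'
}
```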

biscuitninja@mundo:~$ sudo ethtool -s eth0 wol g
biscuitninja@mundo:~$ sudo ethtool eth0
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pg
    Wake-on: g
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

Okay, that looks promising. The next step is to try and test it. First let's obtain the MAC address:

biscuitninja@colnago ~ $ arp -a | grep mundo
mundo.bikeshed.internal (172.16.1.52) at 00:e0:61:0d:79:b6 [ether] on wlan0

Let's shut it down

biscuitninja@mundo:~$ sudo shutdown -P now

And then try and wake it up again...

biscuitninja@colnago ~ $ sudo apt-get install -y wakeonlan
biscuitninja@colnago ~ $ wakeonlan 00:e0:61:0d:79:b6

No joy. This doesn't work. Try again with port 7.

biscuitninja@colnago ~ $ wakeonlan -p 7 00:e0:61:0d:79:b6

Still no joy. Further reading suggests that the ethtool change doesn't persist across reboots, so let's make it a bit more persistent.

biscuitninja@mundo:~$ sudo -i
root@mundo:~# echo '#!/bin/sh' > /etc/network/if-up.d/wol
root@mundo:~# echo 'ethtool -s eth0 wol g' >> /etc/network/if-up.d/wol
root@mundo:~# chmod a+x /etc/network/if-up.d/wol

Okay, let's reboot and enable Wake-On-LAN in the BIOS APM resume settings, boot, shut down and then try again:

biscuitninja@colnago ~ $ wakeonlan 00:e0:61:0d:79:b6
biscuitninja@colnago ~ $ wakeonlan -p 7 00:e0:61:0d:79:b6

And again, no joy. I've a sneaking suspicion that this is happening because the network card remains visibly unpowered whilst the machine is in its shutdown state, as indicated by a lack of blinking lights. Okay, let's try suspending the server instead of shutting it down.

biscuitninja@mundo:~$ sudo apt-get install -y pm-utils
biscuitninja@mundo:~$ sudo pm-suspend

The server drops down to a suspended state - as noted by the power consumption reading 4.2 Watts, as opposed to the usual 2.4 Watts when it's properly shut down. There are still no blinking lights on the ethernet port, so things still don't look promising.

biscuitninja@colnago ~ $ wakeonlan 00:e0:61:0d:79:b6
biscuitninja@colnago ~ $ wakeonlan -p 7 00:e0:61:0d:79:b6

Nope. No cigar. I revisit the BIOS settings and enable all the 'wake' features (PCIe/USB/keyboard/mouse etc.) and then suspend the server again. This time, I still can't wake it up by sending a magic packet, but I can wake it up by pressing a key on the keyboard. So let's step back from the problem and examine the options.

In an ideal world, a cron job running on brox would wake up mundo and then, having checked that the ssh port has become available, send a ZFS snapshot to mundo. The current hardware doesn't appear to support Wake-On-LAN, so what are the alternatives?

I could use an additional network adapter that does support Wake-On-LAN. It's plausible, but it might fall down if the BIOS "Wake-On-PCIE" setting is ineffective. Furthermore, either PCIe 2.2 is required, or a NIC with an independent power connection. I'm already thinking about upgrading the motherboard as soon as funds allow, so spending additional cash on a short-term solution seems silly.

A second option might be using a Raspberry Pi and a remote-controlled electric socket. I could leave the Pi running 24x7 and use it to power on the remote socket; all I need do is change mundo's powered-on state BIOS option. It's quite a viable option, given that the Pi uses approximately 2.5 Watts and the remote socket just 1 Watt more. Given that, when shut down and plugged in, mundo uses 2.4 Watts, the difference is tiny. The only downsides are maintaining two machines instead of one (a tiny overhead, particularly given that the Pi would have a single function) and, really, I'd like to keep the Pi spare, as it's earmarked for a future project.

So, what else? Oh yeah - sifting through the BIOS options, I noticed a "Wake on RTC" option on the Advanced Power Management sub-menu. RTC? A quick search shows that to be the Real Time Clock. So we can wake up mundo at a given time, using the server's wall clock. Bingo. Although it would be nice to fire mundo up remotely using Wake-On-LAN - in the event brox has experienced a hardware failure and is no longer running DHCP/DNS etc. - that can wait until I find some more flexible hardware.

I configure the "Wake on RTC" option, shut mundo down and wait. Five minutes later, it starts up of its own accord. Sold. This approach does mean relying on the system wall clock, so brox must first check (with some tolerance) that mundo is awake before sending it any ZFS snapshots. If, after a period of time, mundo isn't available, then brox will need to send some sort of notification.
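That brox-side guard can be sketched as follows. This is a sketch under my own assumptions, not the finished backup script: CHECK_CMD stands in for a real probe (something like ssh -o ConnectTimeout=5 mundo true) so the retry loop can be tested offline.

```shell
#!/bin/sh
# Wait, with a bounded number of attempts, for mundo to come up before
# sending snapshots. A non-zero return means "give up and notify".
wait_for_host() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if $CHECK_CMD; then
            return 0                       # host is up, safe to send
        fi
        tries=$((tries - 1))
        sleep "${WAIT_INTERVAL:-60}"       # tolerance between attempts
    done
    return 1                               # caller sends the notification
}
```

Something like wait_for_host 10 with a 60-second interval gives mundo ten minutes of grace around the RTC wake time.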

I'll also ensure that, when it's switched on, mundo synchronises its wall clock with the NTP service on brox.

Configure NTP

Check the current status of NTP.

biscuitninja@mundo:~$ ntpq -p
The program 'ntpq' is currently not installed. You can install it by typing:
apt-get install ntp

NTP isn't installed. First I remove the deprecated ntpdate...

biscuitninja@mundo:~$ sudo apt-get remove -y ntpdate
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED
  ntpdate ubuntu-minimal
0 to upgrade, 0 to newly install, 2 to remove and 3 not to upgrade.
After this operation, 312 kB disk space will be freed.
(Reading database ... 56559 files and directories currently installed.)
Removing ubuntu-minimal (1.325) ...
Removing ntpdate (1:4.2.6.p5+dfsg-3ubuntu2.14.04.5) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...

Then install ntp:

biscuitninja@mundo:~$ sudo apt-get install -y ntp
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  libopts25
Suggested packages:
  ntp-doc
The following NEW packages will be installed
  libopts25 ntp
0 to upgrade, 2 to newly install, 0 to remove and 3 not to upgrade.
Need to get 474 kB of archives.
After this operation, 1,677 kB of additional disk space will be used.
Get:1 http://gb.archive.ubuntu.com/ubuntu/ trusty/main libopts25 amd64 1:5.18-2ubuntu2 [55.3 kB]
Get:2 http://gb.archive.ubuntu.com/ubuntu/ trusty-updates/main ntp amd64 1:4.2.6.p5+dfsg-3ubuntu2.14.04.5 [419 kB]
Fetched 474 kB in 2s (174 kB/s)
Selecting previously unselected package libopts25:amd64.
(Reading database ... 56547 files and directories currently installed.)
Preparing to unpack .../libopts25_1%3a5.18-2ubuntu2_amd64.deb ...
Unpacking libopts25:amd64 (1:5.18-2ubuntu2) ...
Selecting previously unselected package ntp.
Preparing to unpack .../ntp_1%3a4.2.6.p5+dfsg-3ubuntu2.14.04.5_amd64.deb ...
Unpacking ntp (1:4.2.6.p5+dfsg-3ubuntu2.14.04.5) ...
Processing triggers for ureadahead (0.100.0-16) ...
ureadahead will be reprofiled on next reboot
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Setting up libopts25:amd64 (1:5.18-2ubuntu2) ...
Setting up ntp (1:4.2.6.p5+dfsg-3ubuntu2.14.04.5) ...
Starting NTP server ntpd ...done.
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
Processing triggers for ureadahead (0.100.0-16) ...
biscuitninja@mundo:~$ sudo vi /etc/ntp.conf

Find the line

server ntp.ubuntu.com

Specify the following servers, deleting any that already exist:

server brox.bikeshed.internal prefer iburst
server 0.uk.pool.ntp.org
server 1.uk.pool.ntp.org
server 2.uk.pool.ntp.org
server 3.uk.pool.ntp.org
server ntp.ubuntu.com
server 127.127.1.0
fudge 127.127.1.0 stratum 16

This configuration specifies brox as mundo's primary time source with the "prefer" option. The "iburst" option confused me somewhat. The "burst" and "iburst" settings, according to the relevant documentation:

burst: When the server is reachable, send a burst of eight packets instead of the usual one. The packet spacing is normally 2 s; however, the spacing between the first and second packets can be changed with the calldelay command to allow additional time for a modem or ISDN call to complete. This is designed to improve timekeeping quality with the server command and s addresses.

iburst: When the server is unreachable, send a burst of eight packets instead of the usual one. The packet spacing is normally 2 s; however, the spacing between the first two packets can be changed with the calldelay command to allow additional time for a modem or ISDN call to complete. This is designed to speed the initial synchronization acquisition with the server command and s addresses and when ntpd(8) is started with the -q option

I figured that, as brox should be "reachable", using the burst option should decrease the time it takes for ntpd to acquire and groom data before selecting a time source. However, in my testing, I found that using "iburst" meant brox was selected as a time source after 30 or so seconds, whereas with "burst" it took several minutes.

As a side note, using burst and iburst with public NTP servers is considered bad form.

The other servers are added primarily for redundancy. The last two lines mean that in the event no network is available, the local wall clock is used.

If the hardware clock on mundo drifts by over 1000 seconds (the ntp daemon's default panic threshold) whilst mundo is powered down, then the ntp daemon will log a warning to the syslog and exit. We reconfigure it with the "-g" option, which allows the time to be set to any value at startup.

biscuitninja@mundo:~$ sudo vi /etc/default/ntp

Add the "-g" option, if it doesn't already exist:

NTPD_OPTS='-g'

Restart the NTP service:

biscuitninja@mundo:~$ sudo service ntp restart
[sudo] password for biscuitninja:
 * Stopping NTP server ntpd    ...done.
 * Starting NTP server ntpd    ...done.

Test:

biscuitninja@mundo:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*brox.bikeshed.i 94.125.132.7     3 u    3   64  377    0.246   -3.479   1.015
+kvm1.websters-c 193.190.230.65   2 u   38   64  377   22.726   -5.157   3.049
+5.77.45.219     81.63.144.23     3 u   44   64  377   24.682   -4.723   1.043
+lon.jonesey.net 81.174.136.35    2 u   30   64  377   23.332   -6.065   1.694
-159-253-77-127. 94.125.132.7     3 u   30   64  377   34.777   -3.020   1.661
+juniperberry.ca 140.203.204.77   2 u   32   64  377   24.964   -4.832   0.755
 LOCAL(0)        .LOCL.          16 l 1147   64    0    0.000    0.000   0.000

The asterisk denotes the remote time server mundo is using as its time source; it may take a short while for ntpd to settle on one. That concludes all of the prerequisite work prior to installing and configuring ZFS.

Install ZFS

Install the ZFS PPA (Personal Package Archive). More information is available from the ZFS on Linux project.

biscuitninja@mundo:~$ sudo apt-get install software-properties-common
[sudo] password for biscuitninja:
Reading package lists... Done
Building dependency tree
Reading state information... Done
software-properties-common is already the newest version.
0 to upgrade, 0 to newly install, 0 to remove and 3 not to upgrade.

Okay, it was already installed; the above step is only necessary for an Ubuntu minimal installation. Let's add the ZFS on Linux PPA.

biscuitninja@mundo:~$ sudo add-apt-repository ppa:zfs-native/stable
 The native ZFS filesystem for Linux. Install the ubuntu-zfs package.

 Please join this Launchpad user group if you want to show support for ZoL:
   https://launchpad.net/~zfs-native-users

 Send feedback or requests for help to this email list:
   http://list.zfsonlinux.org/mailman/listinfo/zfs-discuss
   <email address hidden>

 Report bugs at:
   https://github.com/zfsonlinux/zfs/issues (for the driver itself)
   https://github.com/zfsonlinux/pkg-zfs/issues (for the packaging)

 The ZoL project home page is: http://zfsonlinux.org/
 More info: https://launchpad.net/~zfs-native/+archive/ubuntu/stable
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmpoz3eqc57/secring.gpg' created
gpg: keyring `/tmp/tmpoz3eqc57/pubring.gpg' created
gpg: requesting key F6B0FC61 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpoz3eqc57/trustdb.gpg: trustdb created
gpg: key F6B0FC61: public key "Launchpad PPA for Native ZFS for Linux" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK

With the new PPA in tow, update the package lists and install the packages.

biscuitninja@mundo:~$ sudo apt-get update && sudo apt-get install -y libc6-dev ubuntu-zfs

As I've stopped apt from installing recommended software, I have to explicitly include libc6-dev. The install may take a while whilst the kernel modules are compiled. Watch the output carefully for unexpected failures.

The tail end of the output should look something like:

Loading new spl-0.6.5.3 DKMS files...
First Installation: checking all kernels...
Building only for 3.19.0-25-generic
Building initial module for 3.19.0-25-generic
Done.

spl:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

splat.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

Running the post_install script:

depmod.......

DKMS: install completed.
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
Selecting previously unselected package zfs-dkms.
(Reading database ... 58704 files and directories currently installed.)
Preparing to unpack .../zfs-dkms_0.6.5.3-1~trusty_amd64.deb ...
Unpacking zfs-dkms (0.6.5.3-1~trusty) ...
Selecting previously unselected package spl.
Preparing to unpack .../spl_0.6.5.3-1~trusty_amd64.deb ...
Unpacking spl (0.6.5.3-1~trusty) ...
Selecting previously unselected package libuutil1.
Preparing to unpack .../libuutil1_0.6.5.3-1~trusty_amd64.deb ...
Unpacking libuutil1 (0.6.5.3-1~trusty) ...
Selecting previously unselected package libnvpair1.
Preparing to unpack .../libnvpair1_0.6.5.3-1~trusty_amd64.deb ...
Unpacking libnvpair1 (0.6.5.3-1~trusty) ...
Selecting previously unselected package libzpool2.
Preparing to unpack .../libzpool2_0.6.5.3-1~trusty_amd64.deb ...
Unpacking libzpool2 (0.6.5.3-1~trusty) ...
Selecting previously unselected package libzfs2.
Preparing to unpack .../libzfs2_0.6.5.3-1~trusty_amd64.deb ...
Unpacking libzfs2 (0.6.5.3-1~trusty) ...
Selecting previously unselected package zfsutils.
Preparing to unpack .../zfsutils_0.6.5.3-1~trusty_amd64.deb ...
Unpacking zfsutils (0.6.5.3-1~trusty) ...
Selecting previously unselected package ubuntu-zfs.
Preparing to unpack .../ubuntu-zfs_8~trusty_amd64.deb ...
Unpacking ubuntu-zfs (8~trusty) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for initramfs-tools (0.103ubuntu4.2) ...
update-initramfs: Generating /boot/initrd.img-3.19.0-25-generic
Processing triggers for ureadahead (0.100.0-16) ...
Setting up zfs-doc (0.6.5.3-1~trusty) ...
Setting up zfs-dkms (0.6.5.3-1~trusty) ...
Loading new zfs-0.6.5.3 DKMS files...
First Installation: checking all kernels...
Building only for 3.19.0-25-generic
Building initial module for 3.19.0-25-generic
Done.

zavl:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

zcommon.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

znvpair.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

zpios.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

zunicode.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

zfs.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/3.19.0-25-generic/updates/dkms/

depmod....

DKMS: install completed.
Setting up spl (0.6.5.3-1~trusty) ...
Setting up libuutil1 (0.6.5.3-1~trusty) ...
Setting up libnvpair1 (0.6.5.3-1~trusty) ...
Setting up libzpool2 (0.6.5.3-1~trusty) ...
Setting up libzfs2 (0.6.5.3-1~trusty) ...
Setting up zfsutils (0.6.5.3-1~trusty) ...
Processing triggers for initramfs-tools (0.103ubuntu4.2) ...
update-initramfs: Generating /boot/initrd.img-3.19.0-25-generic
Processing triggers for ureadahead (0.100.0-16) ...
Setting up ubuntu-zfs (8~trusty) ...
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
biscuitninja@mundo:~$

The zfs kernel module won't yet be loaded, so let's load it now:

biscuitninja@mundo:~$ sudo modprobe zfs
biscuitninja@mundo:~$ lsmod | grep zfs
zfs                  2785280  0
zunicode              331776  1 zfs
zcommon                57344  1 zfs
znvpair                90112  2 zfs,zcommon
spl                    94208  3 zfs,zcommon,znvpair
zavl                   16384  1 zfs

I'm planning to run two zpools, each consisting of a single two-disk mirror vdev. Incidentally, I'm using 4 x 2TB disks: two are HGST, the other two are Seagate. Both sets of disks have been used as RAID pairs, so the wear on them will be quite even. To reduce the likelihood of a zpool being wiped out by concurrent failure of its disks, I want each zpool to contain one disk of each type.

Having established which disk is which (using lsblk and lshw), it's time to create the zpools.

biscuitninja@mundo:~$ sudo mkdir /zfs
biscuitninja@mundo:~$ sudo zpool create -f -O aclinherit=passthrough -O casesensitivity=mixed -O nbmand=on -m /zfs/biz biz mirror /dev/disk/by-id/ata-Hitachi_HDS723020BLA642_MN1220F32027XD /dev/disk/by-id/ata-ST2000DM001-9YN164_W1E21B0T
biscuitninja@mundo:~$ sudo zpool create -f -O aclinherit=passthrough -O casesensitivity=mixed -O nbmand=on -m /zfs/bikeshed bikeshed mirror /dev/disk/by-id/ata-Hitachi_HDS723020BLA642_MN1220F320M63D /dev/disk/by-id/ata-ST2000DM001-9YN164_W1E219TZ
biscuitninja@mundo:~$ sudo zpool status
  pool: bikeshed
 state: ONLINE
  scan: none requested
config:

	NAME                                            STATE     READ WRITE CKSUM
	bikeshed                                        ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    ata-Hitachi_HDS723020BLA642_MN1220F320M63D  ONLINE       0     0     0
	    ata-ST2000DM001-9YN164_W1E219TZ             ONLINE       0     0     0

errors: No known data errors

  pool: biz
 state: ONLINE
  scan: none requested
config:

	NAME                                            STATE     READ WRITE CKSUM
	biz                                             ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    ata-Hitachi_HDS723020BLA642_MN1220F32027XD  ONLINE       0     0     0
	    ata-ST2000DM001-9YN164_W1E21B0T             ONLINE       0     0     0

errors: No known data errors
biscuitninja@mundo:~$

Thus far we've installed ZFS and created two separate storage pools. At this point there are two considerations worthy of note.

In many on-line tutorials you will see folk creating zpools or adding/attaching vdevs (disks) to pools by their device nodes (e.g. /dev/sdb). This is a bad idea: as hardware is added to or removed from a system, these device names can change. It's better practice to refer to disks by their persistent disk ids (under /dev/disk/by-id) rather than device names.
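To see what's available, list /dev/disk/by-id. A sketch of picking out the whole-disk ids; the listing below is a hypothetical captured sample (including a made-up wwn- entry), filtered with grep rather than run against live hardware:

```shell
# Sample of what `ls /dev/disk/by-id` might return; the per-partition
# entries end in "-partN" and are not what we want to hand to zpool create.
ids='ata-Hitachi_HDS723020BLA642_MN1220F32027XD
ata-Hitachi_HDS723020BLA642_MN1220F32027XD-part1
ata-ST2000DM001-9YN164_W1E21B0T
ata-ST2000DM001-9YN164_W1E21B0T-part1
wwn-0x5000cca369c8e8d5'

# keep only the whole-disk ids (prints three lines)
echo "$ids" | grep -v -- '-part'
```

On a live system the equivalent would be ls /dev/disk/by-id | grep -v -- '-part', cross-referenced against lsblk/lshw serial numbers.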

Secondly, I've used disks with differing physical sector sizes:

biscuitninja@mundo:~$ sudo parted /dev/sdb unit s print
Model: ATA Hitachi HDS72302 (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start        End          Size         File system  Name  Flags
 1      2048s        3907012607s  3907010560s               zfs
 9      3907012608s  3907028991s  16384s

biscuitninja@mundo:~$ sudo parted /dev/sdd unit s print
Model: ATA ST2000DM001-9YN1 (scsi)
Disk /dev/sdd: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start        End          Size         File system  Name  Flags
 1      2048s        3907012607s  3907010560s               zfs
 9      3907012608s  3907028991s  16384s

This actually turns out to be okay: the partition alignment on both disks starts at sector 2048, which works fine with both normal and advanced format disks. In short, partition misalignment on newer advanced format disks with 4KiB physical sectors can have significant performance implications. ZFS seems to handle this unproblematically and will even align partitions appropriately when mixing different types of disks in the same storage pool. You can read more about the 4KiB sector size issue here.
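A quick arithmetic check of why the sector-2048 start is safe on both drive types:

```shell
# Partitions start at logical sector 2048, and logical sectors are 512 bytes,
# so the partition starts 1 MiB into the disk.
start_bytes=$((2048 * 512))

# 1 MiB divides evenly by the 4096-byte physical sectors of advanced
# format disks, so writes stay aligned on both drive types.
echo $((start_bytes % 4096))   # prints 0, i.e. 4 KiB aligned
```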

With our storage pools created, it's time to remote onto the home server (brox) to create and transmit some snapshots.

Create ZFS Backup User and SSH Keys

Before I can securely transmit any ZFS snapshots from brox to mundo, I need to create a user on mundo and delegate ZFS permissions to it.

biscuitninja@mundo:~$ sudo adduser zfsbackup
Adding user `zfsbackup' ...
Adding new group `zfsbackup' (1001) ...
Adding new user `zfsbackup' (1001) with group `zfsbackup' ...
Creating home directory `/home/zfsbackup' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for zfsbackup
Enter the new value, or press ENTER for the default
	Full Name []:
	Room Number []:
	Work Phone []:
	Home Phone []:
	Other []:
Is the information correct? [Y/n] y

In the next step, we give our new user the permissions necessary to receive a snapshot. Reading the documentation, the following should work...

biscuitninja@mundo:~$ sudo zfs allow zfsbackup receive,mount,create biz
biscuitninja@mundo:~$ sudo zfs allow zfsbackup receive,mount,create bikeshed

... however, I found this insufficient and resorted instead to visudo:

biscuitninja@mundo:~$ sudo visudo

Cmnd_Alias ZFS_CMDS = /sbin/zfs receive -Fduv biz, /sbin/zfs receive -Fduv bikeshed, \
                      /usr/local/bin/zfs_destroy_biz_snapshots.sh, \
                      /usr/local/bin/zfs_destroy_bikeshed_snapshots.sh, \
                      /sbin/zpool scrub biz, \
                      /sbin/zpool scrub bikeshed, \
                      /sbin/zpool status, \
                      /sbin/shutdown -P now
zfsbackup ALL=(ALL) NOPASSWD: ZFS_CMDS

I've added here commands for destroying old snapshots (shell scripts that will remove snapshots over 60 days old), scrubbing the zpools and, of course, remotely shutting down the backup server.

Finally, as I'm going to copy the keys across manually, I'll create an authorized_keys file for zfsbackup on mundo and set permissions appropriately.

biscuitninja@mundo:~$ sudo su - zfsbackup
zfsbackup@mundo:~$ mkdir .ssh
zfsbackup@mundo:~$ chmod 700 .ssh
zfsbackup@mundo:~$ touch .ssh/authorized_keys
zfsbackup@mundo:~$ chmod 600 .ssh/authorized_keys

Open up the new authorized_keys file for editing:

zfsbackup@mundo:~$ vi .ssh/authorized_keys

Then (using another terminal window) on brox we create new public/private key pairs that root can use to SSH onto mundo. There are eight keys in all: zfs receive, scrub and destroy for each pool, plus one to check scrub status and one to shut mundo down. As passphrases will be omitted, we will swing by later and restrict each of these to a single command, hence having so many of them.

biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsbackup_biz_mundo.bikeshed.internal -C "zfs backup biz"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsbackup_bikeshed_mundo.bikeshed.internal -C "zfs backup bikeshed"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsdestroy_biz_mundo.bikeshed.internal -C "zfs destroy biz snapshot"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsdestroy_bikeshed_mundo.bikeshed.internal -C "zfs destroy bikeshed snapshot"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsscrub_biz_mundo.bikeshed.internal -C "zfs scrub biz"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsscrub_bikeshed_mundo.bikeshed.internal -C "zfs scrub bikeshed"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal -C "zfs scrub check"
biscuitninja@brox:~$ sudo ssh-keygen -t rsa -b 3072 -f /root/.ssh/id_rsa_shutdown_mundo.bikeshed.internal -C "shutdown mundo"

On brox, grab the first key:

biscuitninja@brox:~$ sudo cat /root/.ssh/id_rsa_zfsbackup_biz_mundo.bikeshed.internal.pub

Copy the key, switch back to the terminal session which is sshed into mundo, paste the key into the authorized_keys file and save the changes (but don't close the session). Then, on brox, check we can connect with the key.

biscuitninja@brox:~$ sudo ssh -i /root/.ssh/id_rsa_zfsbackup_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal
The authenticity of host 'mundo.bikeshed.internal (172.16.1.52)' can't be established.
RSA key fingerprint is 6e:f3:53:26:e3:73:95:fe:6d:98:2f:29:78:0f:06:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mundo.bikeshed.internal,172.16.1.52' (RSA) to the list of known hosts.
********************************************************************
* Unauthorized use of this computer system constitutes a criminal  *
* offense.                                                         *
*                                                                  *
* Anyone accessing this system expressly consents to the           *
* monitoring of their activity.                                    *
*                                                                  *
* Any suspicious or criminal activity will be reported to law      *
* enforcement and/or relevant service providers, rendering the     *
* perpetrators liable to criminal investigations and other         *
* appropriate sanctions.                                           *
********************************************************************
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-25-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Sun Nov 8 00:26:48 GMT 2015

  System load:  0.25              Processes:           177
  Usage of /:   0.3% of 458.32GB  Users logged in:     0
  Memory usage: 2%                IP address for eth0: 172.16.1.52
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

zfsbackup@mundo:~$ exit

Success. Okay, before we continue, we should restrict the usage of the keys we've created.

biscuitninja@brox:~$ sudo -i
[sudo] password for biscuitninja:
root@brox:~# cd /root/.ssh
root@brox:~/.ssh# vi id_rsa_zfsbackup_biz_mundo.bikeshed.internal.pub

Insert "command="unpigz | sudo /sbin/zfs receive -Fduv biz",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc " as the first part of the public key. Close the file and save the changes.

The resultant public key should look like:

command="unpigz | sudo /sbin/zfs receive -Fduv biz",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCyBIoc+HkUQ9t/mTiIj8NUc7iQyEfXU7YklRHHyzg7Vf2FEvOtD8220DTJVdICDoqhT0y0Ag+eN4mNIasIXPc7DMIKtlUbbY22k9i03VejYnqS9z46yQOpUJNQxxnq/Y3F2CIMWD58/PMuFOcy+mSPjoB1uYn765TJV7V9KnNon83K15PoFvIV9iyazD35GYYm3dKn7heKhlw7YVR6jhTuO/7lHmnIG7K5Kp85Ob/wyMdtKgvhQ/TTctmshWWn8r2SUd1XkuUojE+QdcXuF7klqzCz5kzXUkERuuvsKVlyKnpK0c0ZmFVQFfca9NcLmwma4AFgPunAE5TPbSetmyBYF4vyXNU4JgXQUxJE7ebZfUbI/UeqPMi9cBv8PJAiJEHzuiXWekSlPfdb121tMFG8zuIQSubqJ5DOy6gtUoQpasNxvG/uV7YBff5Y0q0jT5cNdwd+VCreEK7TVZikTGiraUyfKH9gSMNvfoFu1LmGzRgkbGOIM+vlS47iGiHCxOc= zfs biz backup
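If you'd rather not hand-edit each file, the same prepending can be scripted. A sketch of the transformation only; the key material below is a truncated placeholder, not a real key:

```shell
# Forced-command options to prepend: the key may only run this one command,
# and agent/port/X11 forwarding and user rc files are disabled.
opts='command="unpigz | sudo /sbin/zfs receive -Fduv biz",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc'

# Placeholder public key (in practice, the contents of the .pub file).
pubkey='ssh-rsa AAAAB3Nza...truncated... zfs backup biz'

# The restricted key is just the options, a space, then the original key.
restricted="$opts $pubkey"
echo "$restricted"
```

The resulting line is what ends up in authorized_keys on mundo.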

I repeated this for the second key, id_rsa_zfsbackup_bikeshed_mundo.bikeshed.internal.pub, substituting bikeshed for biz in the command. Next up are the keys for destroying snapshots:

root@brox:~/.ssh# vi id_rsa_zfsdestroy_biz_mundo.bikeshed.internal.pub

Insert "command="sudo /usr/local/bin/zfs_destroy_biz_snapshots.sh",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc " as the first part of the public key. Close the file and save the changes. Repeat for id_rsa_zfsdestroy_bikeshed_mundo.bikeshed.internal.pub

Next up are the keys for scrubbing the zpools. Insert "command="sudo /sbin/zpool scrub biz",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc " as the first part of id_rsa_zfsscrub_biz_mundo.bikeshed.internal.pub. And again for id_rsa_zfsscrub_bikeshed_mundo.bikeshed.internal.pub, making the necessary substitution of "bikeshed" for "biz" in the command.

Let's not forget the key used to check the status of a zfs scrub on the backup server. We shouldn't shut down the backup server until all invoked scrubs are complete. Insert "command="sudo /sbin/zpool status",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc " as the first part of id_rsa_zfsscrubcheck_mundo.bikeshed.internal.pub.
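Since zpool scrub returns immediately and runs in the background, waiting for completion means polling zpool status. A reusable sketch of that polling pattern; the status command and poll interval are parameters here, and in my setup the command would be the ssh zpool status invocation this key permits:

```shell
# Poll a status command until it stops reporting an in-progress scrub.
# $1: command that prints zpool status output
# $2: seconds between polls (defaults to 600, i.e. ten minutes)
wait_for_scrub() {
    local check_cmd="$1" interval="${2:-600}"
    while $check_cmd | grep -q "scrub in progress" ; do
        sleep "$interval"
    done
}
```

On brox this could be called as, say, wait_for_scrub "ssh -q -i <keyfile> zfsbackup@mundo.bikeshed.internal sudo /sbin/zpool status" 600.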

Almost finally we edit the key for shutting mundo down once backup activities are complete. Insert "command="sudo /sbin/shutdown -P now",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc " as the first part of public key id_rsa_shutdown_mundo.bikeshed.internal.pub.

Let's copy from brox and paste to mundo's zfsbackup user all of the public keys we've so far created and tweaked:

root@brox:~/.ssh# cat *mundo*.pub

Copy the output, switch to the terminal with the ssh session for zfsbackup@mundo.bikeshed.internal and paste it into the authorized_keys file. Save your changes and close the file.

Create Our First Snapshot on the Source Server

Okay, now let's snapshot one of the existing storage pools on brox.

biscuitninja@brox:~$ sudo zfs snapshot -r biz@$(date -Iseconds | cut -c1-19)
biscuitninja@brox:~$ sudo zfs list -t snapshot
NAME                          USED  AVAIL  REFER  MOUNTPOINT
biz@2015-11-08T02:52:10          0      -   152K  -
biz/dcp@2015-11-08T02:52:10      0      -   954G  -
biz/it@2015-11-08T02:52:10       0      -   429G  -
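The $(date -Iseconds | cut -c1-19) fragment in the snapshot command simply builds a pool@timestamp name: the first 19 characters of an ISO 8601 timestamp are the date and time, with the timezone offset cut off. In isolation:

```shell
pool="biz"

# date -Iseconds prints e.g. 2015-11-08T02:52:10+00:00;
# cut -c1-19 keeps just "2015-11-08T02:52:10"
snapName="${pool}@$(date -Iseconds | cut -c1-19)"
echo "$snapName"   # e.g. biz@2015-11-08T02:52:10
```

Embedding the timestamp in the name is what lets the cleanup scripts later work out a snapshot's age from its name alone.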

Before I send our snapshot from brox over to mundo, I want to try limiting the upload bandwidth using trickle. As you can see, there's a significant amount of data to move and I don't want to leave brox's outbound network connection saturated. I'm also planning to compress the snapshot before sending it to mundo and then decompress it again before applying it, so I'm also installing pigz, a parallel implementation of gzip for modern multi-processor, multi-core machines.

biscuitninja@brox:~$ sudo apt-get install -y trickle pigz

Now let's send the snapshot. The storage pools are quite chunky, so expect this to take some time.

biscuitninja@brox:~$ zfs send -R biz@2015-11-08T02:52:10 | pigz -4 | trickle -u 10240 ssh -i /root/.ssh/id_rsa_zfsbackup_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "unpigz | sudo /sbin/zfs receive -Fduv biz"
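For reference, trickle's -u argument is an upload cap in KB/s, so a quick sanity check of what the 10240 above amounts to:

```shell
cap_kbps=10240                          # the -u value passed to trickle, in KB/s

echo "$((cap_kbps / 1024)) MB/s"        # 10 MB/s
echo "$((cap_kbps * 8 / 1024)) Mbit/s"  # 80 Mbit/s, well under a gigabit link
```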

That was successful, so I repeated the process for the second storage pool:

biscuitninja@brox:~$ sudo zfs snapshot -r bikeshed@$(date -Iseconds | cut -c1-19)
biscuitninja@brox:~$ sudo zfs list -t snapshot
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
biz@2015-11-08T02:52:10                           0      -   152K  -
biz/dcp@2015-11-08T02:52:10                       0      -   954G  -
biz/it@2015-11-08T02:52:10                        0      -   429G  -
bikeshed@2015-11-09T14:24:31                      0      -   152K  -
bikeshed/biscuitNinja@2015-11-09T14:24:31         0      -   454G  -
bikeshed/missBiscuitNinja@2015-11-09T14:24:31     0      -    68G  -
biscuitninja@brox:~$ zfs send -R bikeshed@2015-11-09T14:24:31 | pigz -4 | trickle -u 10240 ssh -i /root/.ssh/id_rsa_zfsbackup_bikeshed_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "unpigz | sudo /sbin/zfs receive -Fduv bikeshed"

Great stuff. Having taken some time to check the filesystems on mundo are all present and correct, we now need to automate the process! But before I do, you might have noticed the upload restriction I'm passing to trickle is quite low. The backup server is relying on its onboard NIC, a Marvell 88E8056 PCI-E Gigabit Ethernet Controller. It seems any load on the NIC results in the kernel spamming the syslog:

Nov 11 19:06:00 mundo kernel: [ 5024.045470] net_ratelimit: 35 callbacks suppressed
Nov 11 19:06:00 mundo kernel: [ 5024.045480] sky2 0000:05:00.0: error interrupt status=0x40000008
Nov 11 19:06:00 mundo kernel: [ 5024.045506] sky2 0000:05:00.0 eth0: rx error, status 0x7ffc0001 length 996

It seems to be a common complaint when running modern versions of Linux on these controllers, and the issue is thought to be caused by an actual hardware timing fault. I'll be certain that mundo version 2 gets an Intel NIC :/

Automating the ZFS Backups

The first thing to do in automating the backups from brox to mundo is produce some shell scripts. We will start with the shell scripts to destroy the old snapshots on mundo.

biscuitninja@mundo:~$ sudo vi /usr/local/bin/zfs_destroy_biz_snapshots.sh

Paste in the following:

#!/bin/bash

storagePool="biz"
oldestSnapshotToKeep=$(date --date="61 days ago" +"%Y%m%d%H%M%S")

for snapshot in $(zfs list -t snapshot -o name -s name | grep ^${storagePool}@) ; do
    snapshotDate=$(echo $snapshot | grep -P -o '2\d{3}-[0-3]\d-[0-3]\dT[012]\d:[0-5]\d:[0-5]\d')
    if [ $(date --date="$snapshotDate" +"%Y%m%d%H%M%S") -lt $oldestSnapshotToKeep ] ; then
        /sbin/zfs destroy -R ${snapshot} &>/dev/null || exit 110
    fi
done
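The age check in that script hinges on pulling the timestamp back out of the snapshot name and flattening it into a plain number so two dates can be compared with -lt. In isolation:

```shell
# A snapshot named by the backup scripts embeds its creation time.
snapshot="biz@2015-11-08T02:52:10"

# Extract the ISO timestamp portion of the name...
snapshotDate=$(echo "$snapshot" | grep -P -o '2\d{3}-[0-3]\d-[0-3]\dT[012]\d:[0-5]\d:[0-5]\d')

# ...then normalise it to YYYYmmddHHMMSS for numeric comparison.
date --date="$snapshotDate" +"%Y%m%d%H%M%S"   # prints 20151108025210
```

Both grep -P and date --date are GNU extensions, which is fine here as both servers run Ubuntu.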

Copy and amend the file:

biscuitninja@mundo:~$ sudo cp /usr/local/bin/zfs_destroy_biz_snapshots.sh /usr/local/bin/zfs_destroy_bikeshed_snapshots.sh biscuitninja@mundo:~$ sudo vi /usr/local/bin/zfs_destroy_bikeshed_snapshots.sh

Amend 'storagePool="biz"' to 'storagePool="bikeshed"', then save the changes and close the file. Then make both scripts executable:

biscuitninja@mundo:~$ sudo chmod 700 /usr/local/bin/zfs_destroy_bi*_snapshots.sh

We're done scripting on mundo. Let's start work on the scripts that take snapshots on brox and send them to mundo.

biscuitninja@brox:~$ sudo vi /usr/local/bin/zfs_biz_backup.sh

Paste in the following:

#!/bin/bash
# Exit codes:
#   110 remoteDestination is not online
#   120 failed to take snapshot
#   130 failed to send snapshot
#   140 failed to destroy snapshot

storagePool="biz"
remoteDestination=mundo.bikeshed.internal
remoteUser=zfsbackup
identityFile=/root/.ssh/id_rsa_zfsbackup_biz_mundo.bikeshed.internal

# Wait up to ten minutes for the backup server to come online
for ((i=1;i<=20;i++)) ; do
    nc -z $remoteDestination 22 &> /dev/null && remoteDestinationOnline=true
    [ $remoteDestinationOnline ] && break
    sleep 30
done

if ! [ $remoteDestinationOnline ] ; then
    exit 110
fi

# Take a new snapshot, noting the previous one for the incremental send
lastSnapshot=$(/sbin/zfs list -t snapshot -o name -s name | grep ^${storagePool}@ | sort | tail -1)
/sbin/zfs snapshot -r ${storagePool}@$(date -Iseconds | cut -c1-19) &>/dev/null || exit 120
newSnapshot=$(/sbin/zfs list -t snapshot -o name -s name | grep ^${storagePool}@ | sort | tail -1)

# Send the incremental stream, compressed and rate-limited
/sbin/zfs send -R -i ${lastSnapshot} ${newSnapshot} | pigz -4 | trickle -u 10240 ssh -q -i ${identityFile} ${remoteUser}@${remoteDestination} "unpigz | sudo /sbin/zfs receive -Fduv ${storagePool}" &>/dev/null || exit 130

# Remove local snapshots over thirty days old
oldestSnapshotToKeep=$(date --date="31 days ago" +"%Y%m%d%H%M%S")
for snapshot in $(zfs list -t snapshot -o name -s name | grep ^${storagePool}@) ; do
    snapshotDate=$(echo $snapshot | grep -P -o '2\d{3}-[0-3]\d-[0-3]\dT[012]\d:[0-5]\d:[0-5]\d')
    if [ $(date --date="$snapshotDate" +"%Y%m%d%H%M%S") -lt $oldestSnapshotToKeep ] ; then
        /sbin/zfs destroy -R ${snapshot} &>/dev/null || exit 140
    fi
done

Save and close the file. I created a second copy of the above script with the values changed for the second storage pool, and then made them both executable:

biscuitninja@brox:~# sudo chmod 750 /usr/local/bin/zfs*backup.sh

Then I created a shell script to call both storage pool backup scripts. I want to ensure that the cron job backs up both storage pools in sequence, not concurrently. Additionally, this script removes snapshots over sixty days old from mundo and, once a month, scrubs each storage pool there.

biscuitninja@brox:~# sudo vi /usr/local/bin/zfs_backups.sh

#!/bin/bash

function handleError {
    echo "$1 failed with exit status $2" 1>&2
    exit $3
}

/usr/local/bin/zfs_biz_backup.sh
exitStatus=$?
if [ $exitStatus -ne 0 ] ; then
    handleError "zfs_biz_backup" $exitStatus 110
fi

/usr/local/bin/zfs_bikeshed_backup.sh
exitStatus=$?
if [ $exitStatus -ne 0 ] ; then
    handleError "zfs_bikeshed_backup" $exitStatus 120
fi

ssh -q -i /root/.ssh/id_rsa_zfsdestroy_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /usr/local/bin/zfs_destroy_biz_snapshots.sh"
exitStatus=$?
if [ $exitStatus -ne 0 ] ; then
    handleError "ssh -q -i /root/.ssh/id_rsa_zfsdestroy_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /usr/local/bin/zfs_destroy_biz_snapshots.sh" $exitStatus 130
fi

ssh -q -i /root/.ssh/id_rsa_zfsdestroy_bikeshed_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /usr/local/bin/zfs_destroy_bikeshed_snapshots.sh"
exitStatus=$?
if [ $exitStatus -ne 0 ] ; then
    handleError "ssh -q -i /root/.ssh/id_rsa_zfsdestroy_bikeshed_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /usr/local/bin/zfs_destroy_bikeshed_snapshots.sh" $exitStatus 140
fi

# On the 5th of each month, scrub both pools on mundo
if [ $(date +%d) -eq 5 ] ; then
    ssh -q -i /root/.ssh/id_rsa_zfsscrub_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool scrub biz"
    exitStatus=$?
    if [ $exitStatus -ne 0 ] ; then
        handleError "ssh -q -i /root/.ssh/id_rsa_zfsscrub_biz_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /sbin/zpool scrub biz" $exitStatus 150
    fi

    # Check the biz scrub has started, if not throw an error
    sleep 30s
    ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool status" | grep -q "scrub in progress"
    exitStatus=$?
    if [ $exitStatus -ne 0 ] ; then
        handleError "ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /sbin/zpool status" $exitStatus 153
    fi

    # Wait for the biz scrub to complete... (it's a background process)
    sleep 10m
    while ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool status" | grep -q "scrub in progress"
    do
        sleep 10m
    done

    ssh -q -i /root/.ssh/id_rsa_zfsscrub_bikeshed_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool scrub bikeshed"
    exitStatus=$?
    if [ $exitStatus -ne 0 ] ; then
        handleError "ssh -q -i /root/.ssh/id_rsa_zfsscrub_bikeshed_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /sbin/zpool scrub bikeshed" $exitStatus 160
    fi

    # Check the bikeshed scrub has started, if not throw an error
    sleep 30s
    ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool status" | grep -q "scrub in progress"
    exitStatus=$?
    if [ $exitStatus -ne 0 ] ; then
        handleError "ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /sbin/zpool status" $exitStatus 163
    fi

    # Wait for the bikeshed scrub to complete... (it's a background process)
    sleep 10m
    while ssh -q -i /root/.ssh/id_rsa_zfsscrubcheck_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/zpool status" | grep -q "scrub in progress"
    do
        sleep 10m
    done
fi

ssh -q -i /root/.ssh/id_rsa_shutdown_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal "sudo /sbin/shutdown -P now"
exitStatus=$?
if [ $exitStatus -ne 0 ] ; then
    handleError "ssh -q -i /root/.ssh/id_rsa_shutdown_mundo.bikeshed.internal zfsbackup@mundo.bikeshed.internal sudo /sbin/shutdown -P now" $exitStatus 170
fi

exit 0

Let's make our new script executable:

biscuitninja@brox:~# sudo chmod 750 /usr/local/bin/zfs_backups.sh

And give it a whirl:

biscuitninja@brox:~# sudo /usr/local/bin/zfs_backups.sh

This fortunately (after some time) returns an exit status of 0, indicating success. Running "sudo zfs list -t snapshot" on both brox and mundo concurs. Let's edit the crontab and schedule the backup script.

biscuitninja@brox:~# sudo vi /etc/crontab

# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user	command
17 *	* * *	root	cd / && run-parts --report /etc/cron.hourly
05 0	* * *	root	/usr/local/bin/zfs_backups.sh
25 4	* * *	root	test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 5	* * 7	root	test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6	1 * *	root	test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
00 7	* * 0	root	/sbin/zpool scrub biz
00 7	* * 1	root	/sbin/zpool scrub bikeshed
00 7	* * 2	root	/sbin/zpool scrub media

And that's it. It's key to stay vigilant and periodically check everything is working, especially in the early days. brox is configured to send an email if any of the crontab scripts fail. I will be working on a server monitoring project in the near future, and an aspect of that will be checking the status of the filesystems as well as the general health of each server. I run a firewall, a few VPS servers and some other appliances too, so I hope to produce something inspired by Nagios but lighter weight. I'm not overly keen to reinvent the wheel, but the learning curve should help me pick up some new skills.

Other considerations:

As already mentioned, some sort of failover of DNS/DHCP

Some sort of file-level backup of the boot disk on brox - this is not yet catered for

There are a number of daily/weekly/monthly cron jobs included in an install of Ubuntu Server. The scheduled start-up and shut-down of mundo means most of these will go unexecuted in the normal course of events. I need to take a look at them and determine whether they are important enough to schedule in an alternative manner

Finally, hardware. This experience has already highlighted some issues with the hardware used for the backup server. I've already ordered another Asus P8B-M to replace the mainboard. As this is the same board that's already in the home server, it should prove much more reliable. I've sourced a CPU, a Sandy Bridge Celeron that should be up to the task. All that remains is to procure DDR3(/DDR3L) ECC memory and a CPU cooler.

I hope you've enjoyed this rather lengthy journey and garnered something useful from it. I've no doubt my implementation could be improved; your feedback and comments are more than welcome.