
Let’s lay out an example scenario: I just got an email alert from smartmontools telling me that drive /dev/sdf is failing in my ZFS RAID 10 array, a stripe of mirrored (RAID1) vdevs.

Before I configure an array, I like to make sure all drive bays are labelled with the corresponding drive's serial number; it makes this process much easier! I have a DYMO LetraTag LT-100H label maker that does the job just fine.

Notes

If you have an extra drive bay available, refrain from removing the old drive until after the resilver is complete. ZFS is completely capable of replacing a disk with an unformatted one. Some scenarios do require manual formatting (e.g. an array whose disks contain multiple partitions), but you should otherwise be fine skipping steps 3 and 4. If your system does require manual formatting, there may be other steps needed after the resilver is complete (e.g. re-installing the bootloader with grub-install /dev/sdX), and in that case you should also avoid rebooting the system while ZFS is resilvering.
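To make that whole-disk check concrete, here is a minimal shell sketch: it counts the partitions a healthy pool member carries (on Linux, a whole-disk ZFS member typically carries exactly two: the data partition and a small ninth partition). The count_partitions helper and the embedded lsblk sample are illustrative stand-ins, not part of the original setup:

```shell
# Count the partitions a disk carries, from captured `lsblk` output.
# A whole-disk ZFS member shows two (data partition + small partition 9);
# anything else suggests a custom layout that needs manual replication.
count_partitions() {
  # $1 = lsblk output, $2 = disk name (e.g. sde)
  printf '%s\n' "$1" | grep -c "$2[0-9]"
}

# Sample lsblk lines standing in for the live command.
sample="$(cat <<'EOF'
sde      8:64   0  1.8T  0 disk
├─sde1   8:65   0  1.8T  0 part
└─sde9   8:73   0    8M  0 part
EOF
)"

parts=$(count_partitions "$sample" sde)
if [ "$parts" -eq 2 ]; then
  echo "whole-disk layout: ZFS can format the replacement itself"
else
  echo "custom layout: replicate the partition table manually"
fi
```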

1) Gather Information

The first thing we need to do is collect some information we will want handy during the process. I highly recommend opening up a text editor on your workstation and dropping the information there while you work.

GUID, Pool Name, and a Similarly Partitioned Disk

Use the zdb command to list out some data for all pools. You can see from the output below that my failing drive has a GUID of 4024410420552873090, lives in the pool raid10, and has an adjacent drive /dev/sde which will have the same partition table.

root@zfs-lab:~# zdb
raid10:
    version: 5000
    name: 'raid10'
    state: 0
    txg: 1367134
    pool_guid: 13977946214682563558
    errata: 0
    hostid: 3182994292
    hostname: 'zfs-lab'
    com.delphix:has_per_vdev_zaps
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 13977946214682563558
        create_txg: 4
        children[0]:
            type: 'mirror'
            id: 0
            guid: 946224559609474074
            metaslab_array: 265
            metaslab_shift: 34
            ashift: 12
            asize: 2000384688128
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 1247532717286800833
                path: '/dev/sde1'
                devid: 'ata-WDC_WD2003FYYS-02W0B1_WD-WMAY05058763-part1'
                phys_path: 'pci-0000:01:00.1-ata-5'
                whole_disk: 1
                DTL: 428
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
            children[1]:
                type: 'disk'
                id: 1
                guid: 4024410420552873090
                path: '/dev/sdf1'
                devid: 'ata-Hitachi_HDS722020ALA330_JK1101B9GKY6ET-part1'
                phys_path: 'pci-0000:01:00.1-ata-6'
                whole_disk: 1
                DTL: 427
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
        children[1]:
            type: 'mirror'
            id: 1
            guid: 8353599129995598725
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 2000384688128
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 132
            children[0]:
                type: 'disk'
                id: 0
                guid: 5161684360393329728
                path: '/dev/sdg1'
                devid: 'ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00294631-part1'
                phys_path: 'pci-0000:01:00.1-ata-7'
                whole_disk: 1
                DTL: 426
                create_txg: 4
                com.delphix:vdev_zap_leaf: 133
            children[1]:
                type: 'disk'
                id: 1
                guid: 12714037787224569367
                path: '/dev/sdh1'
                devid: 'ata-Hitachi_HDS722020ALA330_JK11A1YAJGN9DV-part1'
                phys_path: 'pci-0000:01:00.1-ata-8'
                whole_disk: 1
                DTL: 425
                create_txg: 4
                com.delphix:vdev_zap_leaf: 134
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
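The zdb dump buries the mapping from GUID to device path in a lot of output. A short awk sketch can pull out each guid:/path: pair; this assumes the layout shown above, where a leaf vdev's guid: line precedes its path: line, and uses a captured fragment in place of the live zdb output:

```shell
# Extract "GUID devpath" pairs from zdb output. Assumes each leaf
# vdev's guid: line appears before its path: line.
extract_vdevs() {
  awk -F"'" '
    /guid:/ { split($0, a, ": "); guid = a[2] }   # remember the last GUID seen
    /path:/ { print guid, $2 }                    # pair it with the next path
  ' "$@"
}

# Captured fragment standing in for the live zdb output.
zdb_sample="$(cat <<'EOF'
                guid: 1247532717286800833
                path: '/dev/sde1'
                guid: 4024410420552873090
                path: '/dev/sdf1'
EOF
)"
printf '%s\n' "$zdb_sample" | extract_vdevs
# → 1247532717286800833 /dev/sde1
# → 4024410420552873090 /dev/sdf1
```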

Serial Number

If you don’t already have it, the serial number of the failing drive is easily obtained by running the following command. The smartctl command is supplied by the smartmontools package.

root@zfs-lab:~# smartctl -a /dev/sdf | grep Serial
Serial Number:    JK1101B9GKY6ET
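Going the other way, from a serial number on a bay label to a device name, is also handy. One way to get the name-to-serial table is `lsblk -dno NAME,SERIAL`; the sketch below works on a captured sample of that table (the serials match the devid strings in the zdb output), and find_by_serial is an illustrative helper, not a standard tool:

```shell
# Find which /dev/sdX holds a given serial number.
find_by_serial() {
  # $1 = NAME,SERIAL table (as from `lsblk -dno NAME,SERIAL`), $2 = serial
  printf '%s\n' "$1" | awk -v s="$2" '$2 == s { print "/dev/" $1 }'
}

# Sample table standing in for the live lsblk command.
serial_table="$(cat <<'EOF'
sde WD-WMAY05058763
sdf JK1101B9GKY6ET
sdg WD-WCAY00294631
EOF
)"

find_by_serial "$serial_table" JK1101B9GKY6ET
# → /dev/sdf
```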

Installed Disks

Just to make sure we don’t wipe out the wrong disk, let’s get a list of what we have installed. You can see from the output below that I have disks /dev/sda through /dev/sdh installed.

root@zfs-lab:~# lsblk | grep sd
sda      8:0    0   1.8T  0 disk
├─sda1   8:1    0   1.8T  0 part
└─sda9   8:9    0     8M  0 part
sdb      8:16   0 465.8G  0 disk
├─sdb1   8:17   0  1007K  0 part
├─sdb2   8:18   0   512M  0 part
└─sdb3   8:19   0 465.3G  0 part
sdc      8:32   0   1.8T  0 disk
├─sdc1   8:33   0   1.8T  0 part
└─sdc9   8:41   0     8M  0 part
sdd      8:48   0 465.8G  0 disk
├─sdd1   8:49   0  1007K  0 part
├─sdd2   8:50   0   512M  0 part
└─sdd3   8:51   0 465.3G  0 part
sde      8:64   0   1.8T  0 disk
├─sde1   8:65   0   1.8T  0 part
└─sde9   8:73   0     8M  0 part
sdf      8:80   0   1.8T  0 disk
├─sdf1   8:81   0   1.8T  0 part
└─sdf9   8:89   0     8M  0 part
sdg      8:96   0   1.8T  0 disk
├─sdg1   8:97   0   1.8T  0 part
└─sdg9   8:105  0     8M  0 part
sdh      8:112  0   1.8T  0 disk
├─sdh1   8:113  0   1.8T  0 part
└─sdh9   8:121  0     8M  0 part

2) Remove the Failing Disk

Now that we have all the information we need, let’s get rid of the failing disk. First we’ll remove it from the ZFS pool.

Note: If this command fails, which may happen if the drive has completely died, use the disk's GUID instead: zpool offline raid10 4024410420552873090

root@zfs-lab:~# zpool offline raid10 sdf

We should check that it’s been removed before moving on.

root@zfs-lab:~# zpool status raid10
  pool: raid10
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 0 days 04:42:17 with 0 errors on Sun Nov 10 05:06:22 2019
config:

        NAME        STATE     READ WRITE CKSUM
        raid10      DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sde     ONLINE       0     0     0
            sdf     OFFLINE      2     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
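On a pool with many disks, eyeballing the status table for the one device that isn't ONLINE gets tedious. A small awk sketch can filter it; this assumes the standard zpool status config table layout and uses a captured sample in place of the live command:

```shell
# Print any pool, vdev, or disk whose state is not ONLINE.
unhealthy() {
  awk '$1 ~ /^(sd|mirror|raid)/ && $2 != "ONLINE" { print $1, $2 }' "$@"
}

# Captured config table standing in for `zpool status raid10`.
status_sample="$(cat <<'EOF'
        NAME        STATE     READ WRITE CKSUM
        raid10      DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sde     ONLINE       0     0     0
            sdf     OFFLINE      2     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
EOF
)"
printf '%s\n' "$status_sample" | unhealthy
# → raid10 DEGRADED
# → mirror-0 DEGRADED
# → sdf OFFLINE
```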

Now let’s remove the disk from the SCSI subsystem to ensure it’s disconnected cleanly.

root@zfs-lab:~# echo 1 > /sys/block/sdf/device/delete

3) Format the New Disk

After physically replacing the disk, we’ll need to copy the partition table from the similar disk we found earlier. To find the new disk, we’ll use the command lsblk | grep sd again; the name is usually the same as before, in my case /dev/sdf.

root@zfs-lab:~# lsblk | grep sd
sda      8:0    0   1.8T  0 disk
├─sda1   8:1    0   1.8T  0 part
└─sda9   8:9    0     8M  0 part
sdb      8:16   0 465.8G  0 disk
├─sdb1   8:17   0  1007K  0 part
├─sdb2   8:18   0   512M  0 part
└─sdb3   8:19   0 465.3G  0 part
sdc      8:32   0   1.8T  0 disk
├─sdc1   8:33   0   1.8T  0 part
└─sdc9   8:41   0     8M  0 part
sdd      8:48   0 465.8G  0 disk
├─sdd1   8:49   0  1007K  0 part
├─sdd2   8:50   0   512M  0 part
└─sdd3   8:51   0 465.3G  0 part
sde      8:64   0   1.8T  0 disk
├─sde1   8:65   0   1.8T  0 part
└─sde9   8:73   0     8M  0 part
sdf      8:80   0   2.7T  0 disk
sdg      8:96   0   1.8T  0 disk
├─sdg1   8:97   0   1.8T  0 part
└─sdg9   8:105  0     8M  0 part
sdh      8:112  0   1.8T  0 disk
├─sdh1   8:113  0   1.8T  0 part
└─sdh9   8:121  0     8M  0 part

IMPORTANT: The syntax of this command is counter-intuitive in my opinion. Read these steps carefully; getting the source and target backwards here may hose your data!

The command we are going to use is sgdisk --replicate=/dev/TARGET /dev/SOURCE , where TARGET is the new blank disk, and SOURCE is the live disk with a similar partition table.

root@zfs-lab:~# sgdisk --replicate=/dev/sdf /dev/sde
The operation has completed successfully.
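One way to guard against that backwards-argument trap is a tiny wrapper that takes the arguments in the intuitive source-then-target order and builds the sgdisk command for you. clone_table is a hypothetical helper name, and this sketch only echoes the command it would run rather than executing it:

```shell
# Build (but do not run) the sgdisk replicate command, taking arguments
# in the intuitive order: source first, then target.
clone_table() {
  src="$1"; dst="$2"
  # sgdisk --replicate wants the TARGET in the option and the SOURCE last.
  echo "sgdisk --replicate=$dst $src"
}

clone_table /dev/sde /dev/sdf
# → sgdisk --replicate=/dev/sdf /dev/sde
```

Piping the echoed command into a shell (or dropping the echo) would actually run it; keeping the dry-run default makes it harder to hose a disk by accident.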

4) Randomize GUID

To prevent some really bad potential mix-ups by ZFS, each disk should have a unique GUID. Since we cloned the partition table from another disk, we’ll need to address that.

root@zfs-lab:~# sgdisk --randomize-guids /dev/sdf
The operation has completed successfully.
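If you want to double-check that no two disks still share a GUID after randomizing, you can collect each disk's "Disk identifier (GUID)" line from sgdisk --print and look for duplicates. The GUID values below are hypothetical stand-ins for that output:

```shell
# Check a list of disk GUIDs for duplicates; an empty `uniq -d` result
# means every GUID is unique. Hypothetical sample values stand in for
# per-disk `sgdisk --print` output.
guids="$(cat <<'EOF'
6F4A0E3C-0001
6F4A0E3C-0002
EOF
)"

dups=$(printf '%s\n' "$guids" | sort | uniq -d)
if [ -z "$dups" ]; then
  echo "all disk GUIDs unique"
else
  echo "duplicate GUID: $dups"
fi
```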

5) Add new Disk to ZFS Pool

Use the zpool replace command to add the new drive into the pool.

root@zfs-lab:~# zpool replace raid10 sdf /dev/sdf

Check to make sure it has been added successfully.

root@zfs-lab:~# zpool status raid10
  pool: raid10
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov 16 14:28:55 2019
        373G scanned at 3.01G/s, 186G issued at 1.50G/s, 1.12T total
        1.19G resilvered, 16.29% done, 0 days 00:10:37 to go
config:

        NAME             STATE     READ WRITE CKSUM
        raid10           DEGRADED     0     0     0
          mirror-0       DEGRADED     0     0     0
            sde          ONLINE       0     0     0
            replacing-1  DEGRADED     0     0     0
              old        OFFLINE      2     0     0
              sdf        ONLINE       0     0     0  (resilvering)
          mirror-1       ONLINE       0     0     0
            sdg          ONLINE       0     0     0
            sdh          ONLINE       0     0     0

errors: No known data errors

Now we wait! You can keep an eye on the status with the command watch zpool status raid10 -v. My system took about three hours to finish resilvering 186 GiB of data.
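If you'd rather script the wait than watch it, the completion percentage can be pulled out of the zpool status scan line. resilver_pct is an illustrative helper; the sample line below is the one from the status output above, standing in for the live command:

```shell
# Extract the resilver completion percentage from a zpool status scan line.
resilver_pct() {
  awk 'match($0, /[0-9.]+% done/) {
         p = substr($0, RSTART, RLENGTH)  # e.g. "16.29% done"
         sub(/%.*/, "", p)                # keep just the number
         print p
       }' "$@"
}

echo "1.19G resilvered, 16.29% done, 0 days 00:10:37 to go" | resilver_pct
# → 16.29
```

A cron job or loop could feed `zpool status raid10` through this and alert when the line disappears, i.e. when the resilver is done.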