The btrfs wiki on multiple devices is a useful resource, but can be frustratingly vague. In this post I will document my experience in expanding a 2-device raid1 array into a 3-device raid1 array.

Initial configuration

The two devices are 6TB HDDs (one Seagate, one WD). As the array approached 90% full, I wanted to add more capacity, but I didn’t fancy spending any extra money. So I wondered if I could reuse an old 3TB WD drive I had spare, left over from the 2x3TB array I ran before upgrading to the current 2x6TB array.

The answer is yes: you can add a single extra device to a btrfs raid1 array. You don’t need to add devices in pairs, and the new device doesn’t even need to be the same size as the others in the array. btrfs can do this because “raid1” isn’t the same as RAID1. In btrfs raid1, devices are pooled and data is allocated in chunks, with each chunk written to exactly 2 different devices, so that all data is stored twice. In other words, it’s not a mirror: you could have a 17-device array and btrfs would still store only 2 copies of your data, and would still only protect against the failure of a single device.
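As a back-of-envelope check (my own arithmetic, not output from any btrfs tool), the usable capacity of a btrfs raid1 pool with mixed device sizes is roughly min(total/2, total minus the largest device):

    5.46 + 5.46 + 2.73 = 13.65TiB total
    total / 2          =  6.82TiB
    total - largest    =  8.19TiB
    usable             ≈  6.82TiB

which is consistent with the ~4.83TiB used plus ~1.99TiB free reported in the output below.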

Adding another device

My btrfs filesystem was mounted at /j. Here’s the command I used to add the 3TB drive to the existing filesystem:

btrfs device add -f /dev/sdb /j

I used the -f (force) flag because the disk already had a partition table and filesystem on it (which I didn’t mind overwriting).
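If you want to double-check what you’re about to overwrite, a couple of read-only commands will show it (assuming the spare drive really is /dev/sdb; neither of these modifies the disk):

# List any partitions and filesystems currently on the drive
lsblk -f /dev/sdb

# List filesystem signatures without erasing anything
wipefs --no-act /dev/sdb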

The command returned almost instantly. Here’s what the filesystem looked like at that point:

# btrfs fi sh /j
Label: none  uuid: f8e0f12f-f106-4c9a-8718-8c8c79048cd5
    Total devices 3 FS bytes used 4.83TiB
    devid    1 size 5.46TiB used 4.90TiB path /dev/sda
    devid    2 size 5.46TiB used 4.90TiB path /dev/sde
    devid    3 size 2.73TiB used 0.00B path /dev/sdb

# btrfs fi usage /j
Overall:
    Device size:                  13.64TiB
    Device allocated:              9.80TiB
    Device unallocated:            3.85TiB
    Device missing:                  0.00B
    Used:                          9.66TiB
    Free (estimated):              1.99TiB      (min: 1.99TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:4.89TiB, Used:4.82TiB
   /dev/sda      4.89TiB
   /dev/sde      4.89TiB

Metadata,RAID1: Size:8.00GiB, Used:6.43GiB
   /dev/sda      8.00GiB
   /dev/sde      8.00GiB

System,RAID1: Size:8.00MiB, Used:720.00KiB
   /dev/sda      8.00MiB
   /dev/sde      8.00MiB

Unallocated:
   /dev/sda    572.02GiB
   /dev/sdb      2.73TiB
   /dev/sde    572.02GiB

The problem now is that the filesystem is unbalanced. Why is this a problem? btrfs allocates new chunks on the devices with the most unallocated space first. In raid1, each chunk must be written to two devices, so in this case every new chunk would land on sdb plus whichever of sda or sde had more space free. Not a problem, until we’ve written about 1.1TiB of new data: every new chunk also consumes space on one of the 6TB drives, which only have 2 × 572GiB unallocated between them. At that point, both sda and sde will be full, and sdb will have about 1.6TiB of space left unused, and unusable, because there is no longer any way to write data to two different devices.

The solution is a rebalance. This will move chunks around so that all devices are approximately equally full. That will let me use the full capacity of each drive.
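As an aside: if a full balance is too disruptive, btrfs also supports balance filters, which restrict the operation to chunks below a given usage threshold. Something like this (a lighter-weight sketch, not what I ran) relocates only the emptiest chunks:

# Relocate only data chunks under 50% full and metadata chunks under 30% full
btrfs balance start -dusage=50 -musage=30 /j

For my purposes I wanted a full balance, since the goal was to spread everything across all three devices.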

Unfortunately, I’ve read on the Gotchas page of the wiki that rebalances can be very slow if you have many snapshots. I had hundreds of snapshots, because I set my machine to create a snapshot whenever it shuts down. There is a patch to mitigate the slowness, but it’s only available in kernel 4.10 and later, which is newer than what I’m running on Ubuntu 16.04. Plus, the patch doesn’t solve the problem; it merely reduces it. The advice remains “try to avoid having more than 8 snapshots”.

I decided to tighten my snapshot retention policy from “keep every snapshot forever” to the more restrained “keep daily snapshots for a month, then monthly snapshots for a year”.
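My snapshot tooling is homegrown, but the pruning itself boils down to deleting subvolumes. A minimal sketch, assuming date-named snapshots under /j/.snapshots (the names and layout here are illustrative):

# See which snapshots (subvolumes) exist
btrfs subvolume list /j

# Delete everything outside the retention window, e.g. all of 2016
for snap in /j/.snapshots/2016-*; do
    btrfs subvolume delete "$snap"
done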

This got me down to 32 snapshots. Good enough. Here’s the command to initiate a rebalance:

# date && btrfs fi balance /j && date

I stuck the date commands before and after so that it would print the start and finish times: I expected the balance to run for a very long time, and I was unlikely to be around at the moment it finished.
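If you do want to check on a running balance, it can be queried from another terminal:

# Reports how many chunks have been relocated and roughly how many remain
btrfs balance status /j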

It took just over 34 hours to complete, at an overall rate of about 40MB/sec (if you imagine the process working through the 4.90TiB of data linearly, which isn’t quite what it’s doing). By comparison, a send/receive to an external USB3 hard drive runs at about 60MB/sec. My CPU is a 2009-vintage Intel Core i7 860, but top doesn’t suggest that the operation is CPU bound (the btrfs process is using about 5% of one core). It’s not a like-for-like comparison as the rebalance is doing a lot more work (and re-work, thanks to all the snapshots) than the send/receive.
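For context, the send/receive backup I’m comparing against looks roughly like this (paths are illustrative; my real backup script differs):

# The snapshot must be read-only (-r) to be sent
btrfs subvolume snapshot -r /j/data /j/snapshots/data-backup
btrfs send /j/snapshots/data-backup | btrfs receive /mnt/usb-backup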

I carried on using the system normally during the rebalance, but avoided any heavy use of the btrfs filesystem.
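Had I needed the disks for something heavy, a balance can be paused and resumed without losing its place (a facility I didn’t end up using):

btrfs balance pause /j
btrfs balance resume /j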

The disks ran very hot: a rebalance is one of the most prolonged I/O-intensive tasks you can throw at a disk.

Final configuration

This is the state of the filesystem after the balance:

# btrfs fi sh /j
Label: none  uuid: f8e0f12f-f106-4c9a-8718-8c8c79048cd5
    Total devices 3 FS bytes used 4.83TiB
    devid    1 size 5.46TiB used 4.13TiB path /dev/sda
    devid    2 size 5.46TiB used 4.13TiB path /dev/sde
    devid    3 size 2.73TiB used 1.40TiB path /dev/sdb

# btrfs fi usage /j
Overall:
    Device size:                  13.64TiB
    Device allocated:              9.66TiB
    Device unallocated:            3.99TiB
    Device missing:                  0.00B
    Used:                          9.65TiB
    Free (estimated):              1.99TiB      (min: 1.99TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:4.82TiB, Used:4.82TiB
   /dev/sda      4.12TiB
   /dev/sdb      1.40TiB
   /dev/sde      4.12TiB

Metadata,RAID1: Size:7.00GiB, Used:6.30GiB
   /dev/sda      7.00GiB
   /dev/sdb      1.00GiB
   /dev/sde      6.00GiB

System,RAID1: Size:32.00MiB, Used:704.00KiB
   /dev/sda     32.00MiB
   /dev/sde     32.00MiB

Unallocated:
   /dev/sda      1.33TiB
   /dev/sdb      1.33TiB
   /dev/sde      1.33TiB

# btrfs fi df /j
Data, RAID1: total=4.82TiB, used=4.82TiB
System, RAID1: total=32.00MiB, used=704.00KiB
Metadata, RAID1: total=7.00GiB, used=6.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

You can see that all three drives now have exactly 1.33TiB of unallocated space. New chunks will be allocated from that space evenly across the devices, meaning the full capacity of every drive can now be used. Success!
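If you want the per-device breakdown on its own, btrfs can report the same allocation split drive by drive:

# Per-device view of data/metadata/system allocation and unallocated space
btrfs device usage /j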

I also seem to have gained about 140GiB of unallocated space overall. I’m not certain why, but I suspect the balance compacted partially-filled chunks as it went (note that Data Size shrank from 4.89TiB to 4.82TiB while Used stayed at 4.82TiB), reclaiming allocations that had been left behind by now-deleted snapshots.

Conclusions

I see a lot of people complaining about btrfs, but aside from the relatively poor performance, I’ve been a happy user for more than 5 years. I’ve never lost any data, I’ve found the snapshot feature incredibly useful on so many occasions, the send/receive feature makes backups so easy, and this post hopefully illustrates the remarkable flexibility it offers when using multiple devices.