In an earlier post, I addressed the never-ending urban legend that ZFS writes data to the lowest-latency vdev. Now the urban legend that never dies has reared its head again; this time with someone claiming that ZFS will issue read operations to the lowest-latency disk in a given mirror vdev.

TL;DR – this, too, is a myth. If you need or want an empirical demonstration, read on.

I’ve got an Ubuntu Bionic machine handy with both rust and SSD available; /tmp is an ext4 filesystem on an mdraid1 SSD mirror and /rust is an ext4 filesystem on a single WD 4TB black disk. Let’s play.

root@box:~# truncate -s 4G /tmp/ssd.bin root@box:~# truncate -s 4G /rust/rust.bin root@box:~# mkdir /tmp/disks root@box:~# ln -s /tmp/ssd.bin /tmp/disks/ssd.bin ; ln -s /rust/rust.bin /tmp/disks/rust.bin root@box:~# zpool create -oashift=12 test /tmp/disks/rust.bin root@box:~# zfs set compression=off test

Now we’ve got a pool that is rust only… but we’ve got an ssd vdev off to the side, ready to attach. Let’s run an fio test on our rust-only pool first. Note: since this is read testing, we’re going to throw away our first result set; they’ll largely be served from ARC and that’s not what we’re trying to do here.

root@box:~# cd /test root@box:/test# fio --name=read --ioengine=sync --rw=randread --bs=16K --size=1G --numjobs=1 --end_fsync=1

OK, cool. Now that fio has generated its dataset, we’ll clear all caches by exporting the pool, then clearing the kernel page cache, then importing the pool again.

root@box:/test# cd ~ root@box:~# zpool export test root@box:~# echo 3 > /proc/sys/vm/drop_caches root@box:~# zpool import -d /tmp/disks test root@box:~# cd /test

Now we can get our first real, uncached read from our rust-only pool. It’s not terribly pretty; this is going to take 5 minutes or so.

root@box:/test# fio --name=read --ioengine=sync --rw=randread --bs=16K --size=1G --numjobs=1 --end_fsync=1 [ ... ] Run status group 0 (all jobs): READ: bw=17.6MiB/s (18.5MB/s), 17.6MiB/s-17.6MiB/s (18.5MB/s-18.5MB/s), io=1024MiB (1074MB), run=58029-58029msec

Alright. Now let’s attach our ssd and make this a mirror vdev, with one rust and one SSD disk.

root@box:/test# zpool attach test /tmp/disks/rust.bin /tmp/disks/ssd.bin root@box:/test# zpool status test pool: test state: ONLINE scan: resilvered 1.00G in 0h0m with 0 errors on Sat Jul 14 14:34:07 2018 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/disks/rust.bin ONLINE 0 0 0 /tmp/disks/ssd.bin ONLINE 0 0 0 errors: No known data errors

Cool. Now that we have one rust and one SSD device in a mirror vdev, let’s export the pool, drop all the kernel page cache, and reimport the pool again.

root@box:/test# cd ~ root@box:~# zpool export test root@box:~# echo 3 > /proc/sys/vm/drop_caches root@box:~# zpool import -d /tmp/disks test root@box:~# cd /test

Gravy. Now, do we see massively improved throughput when we run the same fio test? If ZFS favors the SSD, we should see enormously improved results. If ZFS does not favor the SSD, we’ll not-quite-doubled results.

root@box:/test# fio --name=read --ioengine=sync --rw=randread --bs=16K --size=1G --numjobs=1 --end_fsync=1 [...] Run status group 0 (all jobs): READ: bw=31.1MiB/s (32.6MB/s), 31.1MiB/s-31.1MiB/s (32.6MB/s-32.6MB/s), io=1024MiB (1074MB), run=32977-32977msec

Welp. There you have it. Not-quite-doubled throughput, matching half – but only half – of the read ops coming from the SSD. To confirm, we’ll do this one more time; but this time we’ll detach the rust disk and run fio with nothing in the pool but the SSD.

root@box:/test# cd ~ root@box:~# zpool detach test /tmp/disks/rust.bin root@box:~# zpool export test root@box:~# zpool import -d /tmp/disks test root@box:~# cd /test

Moment of truth… this time, fio runs on pure solid state:

root@box:/test# fio --name=read --ioengine=sync --rw=randread --bs=16K --size=1G --numjobs=1 --end_fsync=1 [...] Run status group 0 (all jobs): READ: bw=153MiB/s (160MB/s), 153MiB/s-153MiB/s (160MB/s-160MB/s), io=1024MiB (1074MB), run=6710-6710msec

Welp, there you have it.

Rust only: reads 18.5 MB/sec

SSD only: reads 160 MB/sec

Rust + SSD: reads 32.6 MB/sec

No, ZFS does not read from the lowest-latency disk in a mirror vdev.

Please don’t perpetuate the myth that ZFS favors lower latency devices.