I wanted to replace a disk in my zpool by issuing the following command:

zpool replace -o ashift=12 pool /dev/mapper/transport /dev/mapper/data2

ZFS got to work and resilvered the pool. In the process, there were some read errors on the old disk, and after it finished, zpool status -v looked like this:

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 6,30T in 147h38m with 6929 errors on Sat Feb 11 13:31:05 2017
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0 16,0K
          raidz1-0       ONLINE       0     0 32,0K
            data1        ONLINE       0     0     0
            replacing-1  ONLINE       0     0     0
              transport  ONLINE   14,5K     0     0
              data2      ONLINE       0     0     0
            data3        ONLINE       0     0     0
        logs
          data-slog      ONLINE       0     0     0
        cache
          data-cache     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <list of 3 files>

I expected the old disk to be detached from the pool, but it wasn't. I tried to detach it manually:

# zpool detach pool /dev/mapper/transport
cannot detach /dev/mapper/transport: no valid replicas

But when I exported the pool, removed the old drive, and imported the pool again, everything seemed to work: the pool started resilvering again, and its state is DEGRADED, not FAULTED:

  pool: pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 11 17:28:50 2017
        42,7G scanned out of 9,94T at 104M/s, 27h43m to go
        1,68G resilvered, 0,42% done
config:

        NAME                        STATE     READ WRITE CKSUM
        pool                        DEGRADED     0     0     9
          raidz1-0                  DEGRADED     0     0    18
            data1                   ONLINE       0     0     0
            replacing-1             DEGRADED     0     0     0
              15119075650261564517  UNAVAIL      0     0     0  was /dev/mapper/transport
              data2                 ONLINE       0     0     0  (resilvering)
            data3                   ONLINE       0     0     0  (resilvering)
        logs
          data-slog                 ONLINE       0     0     0
        cache
          data-cache                ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <list of 3 files>

Still, although the old drive is clearly not necessary for full functionality of the pool, I cannot offline or detach its leftover entry:

# zpool offline pool 15119075650261564517
cannot offline 15119075650261564517: no valid replicas

What is going on?

Update: Apparently, ZoL hadn't given up on the failing device just yet. Replacing the 3 files with permanent errors (one of which was a zvol, meaning I had to create a new zvol, dd conv=noerror its contents over, and destroy the old one) and letting the resilver finish finally removed the old drive.
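For the record, the zvol swap amounts to a block-level copy that keeps going past read errors. Below is a minimal sketch of the dd step; all names are made up, and two plain files stand in for the zvol device nodes (which in a real pool would be something like /dev/zvol/pool/oldvol and /dev/zvol/pool/newvol, with the new zvol created via zfs create -V beforehand and the old one destroyed afterwards):

```shell
# Stand-ins for the source and destination zvol device nodes (hypothetical
# names; on a real system these would be /dev/zvol/... block devices).
dd if=/dev/urandom of=old.img bs=1M count=4 status=none
truncate -s 4M new.img

# conv=noerror keeps dd running past read errors instead of aborting;
# conv=sync pads short reads with zeros so block offsets stay aligned.
dd if=old.img of=new.img bs=64K conv=noerror,sync status=none

cmp -s old.img new.img && echo "copy matches"
```

On a disk with genuinely unreadable sectors, the padded blocks come out as zeros in the copy, which is why the corrupted files had to be restored or recreated anyway.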

I'd still be interested in what ZoL was thinking. Everything that didn't cause read or checksum errors had been copied over to the new device, and the blocks that did cause errors were already flagged as permanent errors. So why hang on to the old device when ZoL clearly didn't intend to read anything from it anymore?