By now you may have guessed the punchline of my sudden interest in ZFS delete queues: we had a problem with ZFS leaking space for deleted files, which was ultimately traced to pending deletes that our fileserver wasn't cleaning up when it should have been.

As a well-debugged filesystem, ZFS should not outright leak pending deletions, where there are no remaining references anywhere yet the files haven't been cleaned up (well, more or less; snapshots come into the picture, as mentioned). However, it's possible for both user-level and kernel-level things to hold references to now-deleted files in the traditional way and thus keep them from actually being removed. User-level things holding open files should be visible in, eg, fuser, and anyway this is a well-known issue that savvy people will immediately ask you about. Kernel-level things may be less visible, and there is at least one in mainline Illumos and thus in OmniOS r151014 (the current release as I write this entry).
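For the user-level case, a minimal sketch of the check looks something like the following. The function name and the RUN dry-run variable are my own illustrative inventions, and the filesystem path is whatever you pass in. Note that fuser -c reports every process with files open on that filesystem, not just ones holding deleted files, so treat its output as a list of suspects.

```shell
# Hypothetical helper: list processes using files on a mounted filesystem.
# Set RUN=echo to print the command instead of running it (a dry run).
show_holders() {
    # -c: treat the argument as a mount point and report on files within it
    # -u: also show the login name of each process's owner
    ${RUN:-} fuser -c -u "$1"
}
```

You would run it as, say, 'show_holders /tank/fs', where /tank/fs is a placeholder mount point.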

Per George Wilson on the illumos-zfs mailing list, Delphix found that the network lock manager (the nlockmgr SMF service) could hold references to (deleted) files under some circumstances (see the comment in their fix). When this happens it can cause significant space lossage over time; we saw loss rates of 5 GB a week. The workaround is to restart nlockmgr; this restart drops the old references and thus allows ZFS to actually remove the files and free up potentially significant amounts of your disk space. Rebooting the whole server will do it too, for obvious reasons, but is somewhat less graceful.
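As a rough sketch of the workaround, assuming the standard SMF name for the lock manager (svc:/network/nfs/nlockmgr:default), you can wrap the restart in before and after space checks. The function name, the RUN dry-run variable, and the WAIT delay are hypothetical conveniences of mine. I'm also assuming that the freed space may take a little while to show up, which is why there is a pause before the second check.

```shell
# Hypothetical helper: restart nlockmgr and show pool space before and after.
# Set RUN=echo for a dry run; WAIT is the pause in seconds (default 60).
restart_nlockmgr() {
    pool="$1"
    fmri="svc:/network/nfs/nlockmgr:default"

    # Space accounting before the restart.
    ${RUN:-} zfs list -o name,used,avail "$pool"

    # Restart the lock manager so it drops its stale file references.
    ${RUN:-} svcadm restart "$fmri"

    # Check that the service came back.
    ${RUN:-} svcs "$fmri"

    # Give ZFS some time to process the pending deletes, then look again.
    sleep "${WAIT:-60}"
    ${RUN:-} zfs list -o name,used,avail "$pool"
}
```

Run it as 'restart_nlockmgr tank' (with your own pool name in place of tank), preferably after trying 'RUN=echo WAIT=0 restart_nlockmgr tank' first to see what it would do.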

(Restarting nlockmgr is said to be fully transparent to clients, but we have not attempted to test that. When we did our nlockmgr restart we did as much as possible to make any locking failures a non-issue.)