For Mac geeks of a certain persuasion, the first mention of a soon-to-be-revealed feature of Leopard during the WWDC keynote set off a mental chain-reaction. That feature was Time Machine, and the name alone was enough to cause one particular phrase to hammer in the mind of many people, including me: "New file system in Leopard!" It was even a bingo square. In fact, it was my personal favorite bingo square, and the one that I most looked forward to marking.

But let's back up a bit. Why should the mere name "Time Machine" scream "new file system" to anyone? And why the excitement about a new file system in the first place? What's wrong with HFS+, Mac OS X's current file system? It's got journaling. It supports arbitrarily extensible metadata. It can even be case-sensitive to satisfy the Unix geeks. Does Mac OS X really need a new file system?

In a word, yes. HFS was a state-of-the-art personal computer file system when it was first released...twenty-one years ago. HFS+ is only eight years old, but it's built on many of the design decisions of HFS. Progress marches on. Today, there are new capabilities that the best modern file systems have, but that HFS+, even with all of its recent additions, does not. Here's a short list.

- Efficient storage and handling of very small files.
- Logical volume management through a pooled storage model.
- Improved data integrity using checksums on all data.
- Snapshots.

It's no surprise that many of those bullet points were pulled from the ZFS home page. I've written about ZFS before. It's the most "modern" (in terms of release date, at the very least) of the modern file systems. ReiserFS is a bit older, but it covers some of the same territory. Both file systems are notable for their willingness to reconsider past assumptions about file system design.

ZFS and ReiserFS are just two examples of modern file systems, and the list of features above is far from exhaustive. I chose these features because each has a potential benefit to Mac OS X users and developers.

Efficient storage and handling of very small files. As disk sizes have grown, very small files have become the enemy of file system efficiency. It's not so much that the minimum storage required for each small file has grown (although it has in some cases, with large block sizes on file systems optimized for high-bandwidth i/o). It's mostly that large disks can hold so many more small files. File systems designed in an era when the largest disks held tens of thousands of files tend to choke on modern workloads where a single directory could contain hundreds of thousands of files, and the entire disk could contain many millions.

This situation may seem pathological, limited to things like mail server spool directories. But plain old (non-server) Mac OS X is quite the file maven. Thanks to its Unix heritage and its bundle system in which a directory full of small files appears as a single logical item in the GUI (frameworks, applications, and even some rich document formats all use the bundle system), even the most minimal home user's Mac OS X boot volume probably has over half a million files. (To see how many files you have, open Disk Utility, select your hard drive in the left-hand pane, then look in the lower-right corner of the window.) Most of my disks have over a million.

And don't forget Spotlight, which requires an individual file on disk for each thing being indexed. Applications have begun to alter their storage implementations to make Spotlight happy. Apple's Mail application, which used an mbox-like storage format in Panther, now stores each message in an individual file in Tiger. Microsoft's Entourage email application has kept its monolithic database file in Office 2004, but now also creates an individual file for each mail message purely for the benefit of Spotlight's indexer. New applications become "Spotlight savvy" every day. The trend is clear: more, smaller files are coming to a Mac OS X disk near you.

A modern file system must be able to store these small files efficiently. Very small block sizes are ideal for storing small files, but costly for large files. One solution is to use variable block sizes. Another solution is to pack multiple small files into a single disk block.

Then there's performance. There's some minimum overhead for each file operation. File systems that take no special steps to deal with the proliferation of small files tend to get "nickel-and-dimed to death" as the per-file overhead begins to dwarf the actual work done to each file. Most modern file systems try to coalesce many small operations into one larger task in order to minimize this effect.

The goal of all of these techniques is to reduce the cost multiplier associated with each file. That is, to reduce the difference in storage size and runtime performance between writing 10,240 1KB files and one 10MB file—while still maintaining good performance when dealing with 1GB files, of course.
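The storage half of that multiplier is easy to see with a back-of-the-envelope calculation. The 4KB allocation block is an assumption for illustration (actual block sizes vary by volume and file system):

```python
def blocks_needed(file_size, block_size):
    """Number of fixed-size allocation blocks required to store a file."""
    return -(-file_size // block_size)  # ceiling division

BLOCK = 4 * 1024  # assumed 4KB allocation block

# 10,240 files of 1KB each vs. one 10MB file (identical total payload)
small_files = 10240
payload = small_files * 1024  # 10MB of actual data

small_total = small_files * blocks_needed(1024, BLOCK) * BLOCK
big_total = blocks_needed(payload, BLOCK) * BLOCK

print(small_total // payload)  # 4: each 1KB file occupies a full 4KB block
print(big_total // payload)    # 1: the single large file wastes almost nothing
```

Under these assumptions, the same 10MB of data consumes four times the disk space when stored as 1KB files, which is exactly the multiplier that tail-packing and variable block sizes exist to shrink.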

Logical volume management through a pooled storage model. I touched on this topic in an earlier post. The ability to divorce physical disks from logical volumes is the obvious next step in storage abstraction. Imagine if adding more free space to any existing volume was as easy as just slapping in another hard disk. Imagine if adding redundancy or increasing performance by spreading i/o across many devices was just as easy. Now imagine it all worked with cheap commodity hard drives, with no special hardware, and with no reformatting. This is the "storage as an (almost) infinitely reconfigurable cloud" model that ZFS delivers. Although it sounds like an esoteric, tech-drenched topic, it's actually a perfect fit for Apple's product philosophy. It's storage that Just Works.

Improved data integrity using checksums on all data. Again, this sounds like a feature that's far from the mainstream. Who but banks and the military really need industrial-grade software verification of all data going to and from the disk? Isn't that overkill for a home user?

I don't think it is. Hard drives are increasingly home to the most important and precious information in people's lives: financial documents, photos and movies of family, even seemingly trivial things like long-forgotten passwords to secure web sites stored in the Mac OS X Keychain. Losing any of this data can be an economic or emotional hardship.

There are many ways a loss can occur, of course. The most common is probably accidental deletion, followed by hardware failure. Checksums prevent neither of these. What they do prevent is less common, but much more insidious: silent data corruption.

It's painfully obvious when a file is missing entirely, or a whole disk no longer works. But when a few bytes here or there get corrupted due to a transient error in a disk's firmware or an undetected bad block on a disk platter or a cosmic ray hitting the RAM cache at the wrong time or any other kind of non-catastrophic event, a failure to detect these occurrences leads not only to the corruption of your data, but also to the eventual corruption of your backups as well.

Assuming you don't have an unlimited budget for data storage and are therefore forced to recycle backup media, the corruption will eventually wipe out any trace of the original, valid files. And the worst part is that you'll have no idea that it's happening.

End-to-end data verification does not prevent user error, and cannot help recover from catastrophic failure. But the problem it does solve is perhaps the most difficult, and most rarely addressed problem in the storage world.
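The core of the idea can be sketched in a few lines. This is a toy in-memory block store, not any real file system's implementation: a checksum is recorded alongside each block on write and re-verified on every read, so a silent bit flip is caught the moment the data is used rather than years later.

```python
import hashlib

class ChecksummedStore:
    """Toy block store that detects silent corruption on read."""

    def __init__(self):
        self.blocks = {}  # block number -> data
        self.sums = {}    # block number -> checksum, kept separately

    def write(self, n, data):
        self.blocks[n] = bytearray(data)
        self.sums[n] = hashlib.sha256(data).digest()

    def read(self, n):
        data = bytes(self.blocks[n])
        if hashlib.sha256(data).digest() != self.sums[n]:
            raise IOError(f"silent corruption detected in block {n}")
        return data

store = ChecksummedStore()
store.write(0, b"precious family photos")
store.blocks[0][3] ^= 0x01  # simulate a single flipped bit on the platter
try:
    store.read(0)
except IOError as e:
    print(e)  # the flip is caught instead of propagating into backups
```

A file system without this check would happily return the corrupted bytes, and the next backup would faithfully preserve the damage.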

Snapshots. A snapshot preserves the state of an entire file system at a given point in time. This may sound a lot like a backup, but there are some important differences.

First, a snapshot is entirely self-consistent, exactly preserving the state of each file at a particular instant in time across an entire file system. A traditional backup running on an active file system makes no such guarantees without invasive locking schemes or even more onerous requirements.

Second, snapshots are considerably more space-efficient than backups. By recording only the individual disk blocks that have changed, a snapshot takes a fraction of the disk space required by a traditional backup.

Finally, and perhaps most importantly, while a full backup takes an amount of time that's proportional to the size of the file system, a snapshot can happen in constant time, regardless of file system size. This is why you'll often see snapshots referred to as "instantaneous." The time required is usually so small that a snapshot appears to take no time at all. And remember, this time does not increase as disks get bigger.
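A minimal copy-on-write sketch shows why all three properties fall out of the same design: writes never overwrite blocks in place, so a snapshot is just a saved copy of the block map, not of the blocks themselves. (This is a toy illustration of the general technique, not the layout of any particular file system.)

```python
class CowVolume:
    """Toy copy-on-write volume with cheap snapshots."""

    def __init__(self):
        self.store = {}    # physical block id -> data
        self.live = {}     # logical block number -> physical block id
        self.next_id = 0

    def write(self, n, data):
        # never overwrite in place: always allocate a fresh physical block
        self.store[self.next_id] = data
        self.live[n] = self.next_id
        self.next_id += 1

    def snapshot(self):
        # copies only the block map; no file data is duplicated
        return dict(self.live)

    def read(self, n, snap=None):
        mapping = snap if snap is not None else self.live
        return self.store[mapping[n]]

vol = CowVolume()
vol.write(0, b"v1 of the file")
snap = vol.snapshot()            # effectively instantaneous
vol.write(0, b"v2 of the file")  # new physical block; snapshot untouched
print(vol.read(0))               # b'v2 of the file'
print(vol.read(0, snap))         # b'v1 of the file'
```

Consistency comes for free because the map is captured atomically; space efficiency comes from the unchanged blocks being shared between the snapshot and the live volume.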

Time Machine

Now, back to Time Machine. The thinking triggered by the announcement of the name during the keynote went something like this.

Time Machine....time travel...go back in time to get older versions of files...to see the state of the file system as it existed in the past—OMG, snapshots! New file system in Leopard! New file system in Leopard!

Any file system nerd knows that snapshots are not the kind of feature that's easy to tack onto an existing file system. HFS+ has already been extended significantly past its original abilities. Trying to add snapshots is probably one extension too far. So snapshots probably mean a new file system.

As the Time Machine demonstration progressed, revealing the user interface's Core Animation flourishes, nothing shown precluded the existence of a new file system with support for snapshots. On the other hand, nothing about a new file system was mentioned explicitly either. You'd think this would be the type of thing that Apple would want to tout. If Time Machine isn't powered by a new, snapshot-enabled file system from Apple, then how does it work? It's got to be snapshots, right? File system nerds began to worry.

Wait! Maybe Apple didn't say anything about a new file system because the one they're using was created by someone else. Maybe Apple is moving to ZFS in Leopard!

I've written about ZFS several times, including a post that revealed that the Filesystem Development Manager at Apple is interested in porting ZFS to Mac OS X. Stick a fork in it! "Confirmed!!!" Time Machine → snapshots → ZFS!

In the past few months, it's seemed like accepted wisdom among the denizens of Mac web forums and blogs that Apple was moving to ZFS. Time Machine seemed like an official confirmation of what everyone expected. Just google for "zfs leopard snapshots" to see how many people came to the same conclusion when Time Machine was announced. All the pieces fit. Too bad it's not true.

The snapshot/ZFS revelation was debunked nearly as quickly as it sprang up. Although all of WWDC except for the keynote is covered by a non-disclosure agreement, the particulars of Time Machine's implementation were some of the very first technical details to leak.

Time Machine does not use ZFS. There was a lot of initial confusion about this, partially because Leopard does include a port of DTrace, the "other" high-profile open source project to come out of Sun's OpenSolaris efforts. But the absence of ZFS was no surprise to me.

Although my blog has been cited frequently in other blogs and discussion forums where ZFS was predicted as a feature of Leopard, I never expected it to happen. I said as much at the end of my post about Apple's ZFS efforts.

While I (still) eagerly await whatever shiny new file system Apple has up its sleeve, it's nice to see that Apple is working towards adding ZFS interoperability as well.

I took the ZFS port at face value—as a port of a foreign file system, not as a replacement for HFS+ (certainly not in the Leopard time frame, anyway). But what I did expect was a new file system from Apple. Not a port or a fork of an open source file system, but a brand-new, home-grown, kick-ass file system created by Apple's own team of engineers. Unfortunately, that didn't happen either.

The upshot, as readers probably know by now, is that Time Machine is not an interface to file system snapshots built on any sort of new, modern file system. Instead, it's an automated backup system that works with plain old HFS+. The point-in-time views in Time Machine are actually sparsely populated directory trees on an external disk or server containing mostly hard links to unchanged directories, plus full copies of the few files that have been created or modified since the last backup.

Apple added traditional hard links (that is, hard links to files) to HFS+ back before Mac OS X 10.0 was released. In Leopard, HFS+ supports hard links to directories as well—an ability wholly alien to any other Unix-like operating system that I can think of. This is how Time Machine builds its sparse trees. The very first backup is a full copy. All subsequent backups contain hard links to the unchanged portions of the previous backup.
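The same scheme can be demonstrated with ordinary file hard links, which work everywhere (directory hard links are the HFS+-specific part; this toy just links files one at a time, and the paths and mtime-based change test are simplifications for illustration):

```python
import os
import shutil
import tempfile

def backup(source, previous, dest):
    """Hard-link-based incremental backup of a flat directory.

    Unchanged files become hard links to the previous backup;
    changed or new files are copied in full, which is exactly
    Time Machine's file-level granularity."""
    os.makedirs(dest)
    for name in os.listdir(source):
        src = os.path.join(source, name)
        prev = os.path.join(previous, name) if previous else None
        dst = os.path.join(dest, name)
        if prev and os.path.exists(prev) and \
           os.path.getmtime(prev) >= os.path.getmtime(src):
            os.link(prev, dst)      # unchanged: zero extra data blocks
        else:
            shutil.copy2(src, dst)  # changed: full copy, however large

# demo in a scratch directory
root = tempfile.mkdtemp()
src = os.path.join(root, "home")
os.makedirs(src)
with open(os.path.join(src, "notes.txt"), "w") as f:
    f.write("draft 1")
backup(src, None, os.path.join(root, "backup.0"))  # first backup: full copy
backup(src, os.path.join(root, "backup.0"), os.path.join(root, "backup.1"))
a = os.stat(os.path.join(root, "backup.0", "notes.txt"))
b = os.stat(os.path.join(root, "backup.1", "notes.txt"))
print(a.st_ino == b.st_ino)  # True: same inode, no duplicate storage
```

What directory hard links buy Time Machine on top of this is the ability to link an entire unchanged subtree with a single link, instead of walking into it and linking every file individually.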

A lot of the disappointment about the lack of modern file system mojo from Apple has bubbled over into Time Machine hate—or if not hate, then at least condescension. Does Time Machine suck because it doesn't use snapshots? No. The two things serve entirely different purposes. Time Machine is "backups made easy enough that people will actually do them," and that's nothing to sneeze at.

Time Machine leverages the same file system event notification system as Spotlight in order to keep track of which files have changed. (This notification system is open to third-party developers in Leopard. Yay!) This makes the backup process much less demanding; the entire volume does not need to be scoured for changed files, grinding the disk in the process. The list of changed files is ready immediately, at any time. Like Spotlight before it, Time Machine shows the incredible utility of global file system notifications.
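The advantage of a change journal over a full volume scan can be sketched like this. This is a toy in-memory journal, not the actual FSEvents API (which is a C framework); the idea is just that writers append to a global log and a backup tool asks "what changed since my last cursor?" instead of crawling the disk:

```python
class ChangeJournal:
    """Toy global change log: writers append, backup tools catch up."""

    def __init__(self):
        self.events = []  # append-only list of changed paths

    def record(self, path):
        self.events.append(path)

    def changes_since(self, cursor):
        """Everything modified after `cursor` -- no disk scan needed.

        Returns the changed paths and a new cursor for next time."""
        return self.events[cursor:], len(self.events)

journal = ChangeJournal()
journal.record("/Users/me/notes.txt")
journal.record("/Users/me/photo.jpg")

# a backup tool that last caught up at cursor 0 asks what changed:
changed, cursor = journal.changes_since(0)
print(changed)  # both paths, in order

journal.record("/Users/me/notes.txt")
changed, cursor = journal.changes_since(cursor)
print(changed)  # only the one file touched since the last backup
```

The cost of answering "what changed?" is proportional to the number of changes, not the size of the volume, which is why Time Machine's hourly runs don't have to grind the disk.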

The most significant feature of Time Machine has nothing to do with the underlying file system or copy engine. Apple has set backups free from the traditional "utility application" model that so many people find intimidating and confusing.

Making the actual backup process automatic is pretty easy. Lots of existing backup products do that. The ingenious bit is that Apple has made the recovery process similarly free of any interaction with a dedicated backup application.[1] Files are recovered from a simple—fun, even!—interface right in the standard file manager. Even better, data can be recovered from within individual applications, and not just those from Apple. Third-party developers can also integrate Time Machine into their applications.

There's still plenty of room for legitimate Time Machine criticism, however. While the hard link trees are a clever solution, given the constraints of HFS+, the strategy dictates a file-level granularity for all backups. In other words, if you change a single byte of a 500MB file, the entire 500MB file will be copied to the backup volume during the next Time Machine backup. Frequent modifications to large files will fill your backup volume very quickly.

According to Macworld, "Apple suggests that the answer will be for application developers to modify their programs to break up data into more discrete elements that can be backed up more simply by Time Machine—something they may already be doing in order to make their files searchable via Spotlight." Of course, this advice is totally unhelpful for people who edit video or work with other large media files on a daily basis.

Worse, there does not appear to be a way to prioritize backup retention. When space runs out on the backup volume, presumably Time Machine will recycle old space. But if it does so based on date rather than (user-specified) "importance", a relatively unimportant change to a large file could necessitate the loss of hundreds of small files from the backup volume. Where once a small text file may have had an entire year's worth of revisions on the backup volume, now only the past month's revisions may exist due to the need to reclaim space for a few recently modified large files.

All of this comes back to snapshots. Again, snapshots are not the same thing as backups. But they can certainly be used to more efficiently implement an automated backup system like Time Machine. With snapshot-enabled file systems on both the primary and backup volumes, backups could be done at the block level rather than the file level. A one byte change to a 500MB file would then cause only a single block (say, 4KB) of new storage to be used on the backup volume.
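With copy-on-write snapshots, computing that backup delta is just a comparison of two block maps: any logical block whose physical identity changed between snapshots needs to cross to the backup volume, and nothing else does. A toy sketch of the arithmetic (assuming 4KB blocks):

```python
BLOCK = 4 * 1024  # assumed 4KB block size

def delta(old_map, new_map):
    """Logical block numbers that changed between two snapshot block maps."""
    return [n for n, phys in new_map.items() if old_map.get(n) != phys]

# a 500MB file occupies 128,000 4KB blocks
file_blocks = (500 * 1024 * 1024) // BLOCK
old = {n: n for n in range(file_blocks)}  # block map at the last backup

new = dict(old)
new[7] = file_blocks + 1  # one byte changed: one block rewritten elsewhere

changed = delta(old, new)
print(len(changed) * BLOCK)  # 4096 bytes to back up, not 524,288,000
```

The one-byte edit costs a single block of backup traffic instead of a full 500MB copy, which is the difference between block-level and file-level granularity in a nutshell.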

Furthermore, snapshots would enable "live" file history on the primary disk as well. This (still) does not constitute a backup. (When a disk dies, all the snapshots on that same disk die too. Backups have to be on another device.) What it does do is enable any file on the primary volume to be reverted to a previous version, Time Machine style, but without requiring the backup volume (or server or whatever) to be available.

This reveals another weakness of Time Machine, particularly for laptop users. Restoring an older version of a file is only possible if the backup volume is available. If you're on the road with just your MacBook and you need last week's version of a file, you're out of luck.

Taking stock

So, to sum up, Time Machine is an automated backup system with an inspired lack of user-visible moving parts, but a fair to middling underlying implementation. It's not based on snapshots. It doesn't use ZFS. It doesn't use a new Apple file system. It's not a big truck.

As for the future of file systems in Mac OS X, I continue to hold out hope that something more modern will replace HFS+. It doesn't have to have every buzzword feature under the sun, nor does it even need to have all the ones I described earlier. But a few would be nice. And yes, snapshots are high on that list.

Although I would be satisfied with ZFS, I think Apple has a unique perspective on computing that might lead to a home-grown file system with some interesting attributes. When might such a thing appear? Not in Leopard, it seems—or at least not in 10.5.0. Does such a project even exist within Apple, or is the plan to (eventually) adopt an existing open source file system like ZFS? As usual, Apple isn't saying a word one way or the other.

And so I continue to wait. The few minutes between the announcement of Time Machine and the eventual revelation that there's no new file system under the covers represents the best experience for Mac file system nerds in a keynote in many years. (Sad, but true.) Mac OS X 10.6 has officially been added to my watch list. WWDC 2008, here I come!