A rather surprising article hit the front page of the BBC on Tuesday: the next generation of hard disks could cause slowdowns for XP users. Not normally the kind of thing you'd expect to be placed so prominently, but the warning it gives is a worthy one, if timed a bit oddly. The world of hard disks is set to change, and the impact could be severe. In the remarkably conservative world of PC hardware, it's not often that a 30-year-old convention gets discarded. Even this change has been almost a decade in the making.

The problem is hard disk sectors. A sector is the smallest unit of a hard disk that software can read or write. Even if a file is only a single byte long, the operating system has to read or write at least 512 bytes to access it.
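Because the sector is the smallest unit the drive reads or writes, the amount of data actually transferred for a file is its size rounded up to a whole number of sectors. Here is a minimal sketch of that rounding. The function names are our own, purely illustrative, and the sketch ignores filesystem allocation units and metadata, which real filesystems add on top.

```python
def sectors_needed(file_size: int, sector_size: int = 512) -> int:
    """Whole sectors needed to hold file_size bytes (ceiling division)."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return -(-file_size // sector_size)


def bytes_transferred(file_size: int, sector_size: int = 512) -> int:
    """Bytes the drive must read or write to cover the whole file."""
    return sectors_needed(file_size, sector_size) * sector_size


if __name__ == "__main__":
    # Compare legacy 512-byte sectors with 4096-byte sectors.
    for size in (1, 512, 513, 4096):
        print(size, bytes_transferred(size), bytes_transferred(size, 4096))
```

A one-byte file still costs a full 512-byte sector, and a 513-byte file spills into a second one.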

512-byte sectors have been the norm for decades. The 512-byte size was itself inherited from floppy disks, making it an even older historical artifact. The age of this standard means that it's baked into a lot of important software: PC BIOSes, operating systems, and the boot loaders that hand control from the BIOS to the operating system. All of this makes migration to a new standard difficult.

Given such entrenchment, the obvious question is, why change? We all know that the PC world isn't keen on migrating away from long-lived, entrenched standards—the continued use of IPv4 and the PC BIOS are two fine examples of 1970s and 1980s technology sticking around long past their prime, in spite of desirable replacements (IPv6 and EFI, respectively) being available. But every now and then, a change is forced on vendors in spite of their naturally conservative instincts.

Hard disks are unreliable

In this case, there are two reasons for the change. The first is that hard disks are not actually very reliable. We all like to think of hard disks as neatly storing the 1s and 0s that make up our data and then reading them back with perfect accuracy, but unfortunately the reality is nothing like as neat.

Instead of having a nice digital signal written in the magnetic surface (little groups of magnets pointing "all north" or "all south"), what we have is groups pointing "mostly south" or "mostly north." Converting this imprecise analog data back into the crisp digital ones and zeroes that represent our data requires the analog signal to be processed.

That processing isn't enough to reliably restore the data, though. Fundamentally, it produces only educated guesses: each one is probably right, but could be wrong. To counter this, hard disks store a substantial amount of error-checking data alongside each sector. This data is invisible to software, but is checked by the drive's firmware. It gives the drive the ability to reconstruct missing or damaged data using clever math, but at a considerable cost in storage. In a 2004-vintage disk, every 512 bytes of data was typically accompanied by 40 bytes of error-checking data, along with a further 40 bytes used to locate and mark the start of the sector and to provide space between sectors. That is 80 bytes of overhead for every 512 bytes of user data, so about 13% of the theoretical capacity of a hard disk is gone automatically, just to cope with the inevitable errors that come up when reading and interpreting the analog signal stored on the disk. With a 40-byte error-checking field, the drive can correct something like 50 consecutive unreadable bits. Longer codes could recover from longer errors, but the trade-off is that they eat into storage capacity.
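The overhead arithmetic is easy to check. This minimal sketch models the 2004-era layout described above. The `SectorFormat` name and its field names are our own illustration, not any drive's real format description. Note that 80 bytes of overhead out of 592 raw bytes is about 13.5%, which the text rounds to 13%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorFormat:
    """On-disk layout of one sector, in bytes (figures from the text)."""
    data: int          # user data visible to software
    ecc: int           # error-checking data, used only by the drive firmware
    sync_and_gap: int  # marks the sector start and separates it from neighbors

    @property
    def total(self) -> int:
        return self.data + self.ecc + self.sync_and_gap

    @property
    def overhead_fraction(self) -> float:
        """Share of raw track bytes not available for user data."""
        return (self.ecc + self.sync_and_gap) / self.total


legacy_2004 = SectorFormat(data=512, ecc=40, sync_and_gap=40)

if __name__ == "__main__":
    print(legacy_2004.total)
    print(f"{legacy_2004.overhead_fraction:.1%}")
```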

Higher areal density is a blessing and a curse

This has been the status quo for many years, so what's changing to make it a problem now? Throughout that period, areal density (the amount of data stored in a given disk area) has been on the rise. Current disks typically have an areal density of around 400 Gbit per square inch; five years ago, the figure was closer to 100. The problem with packing ever more bits into ever smaller areas is that it makes the analog signal on the disk increasingly poor. The signals are weaker, there's more interference from adjacent data, and the disk is more sensitive to minor fluctuations in voltage and other suboptimal conditions when writing.

This weaker analog signal in turn places greater demands on the error-checking data. Errors are becoming more frequent, and those 40 bytes are not going to be enough for much longer. Typical consumer-grade hard drives have a target of one unreadable bit for every 10¹⁴ bits read from disk (10¹⁴ bits is about 12 TB, so if you have six 2 TB disks in an array, a single full read of that array will probably hit an error); enterprise drives and some consumer disks claim one in every 10¹⁵ bits, which is substantially better. The increased areal densities mean that the probability of more than 50 consecutive errors is rising, so if manufacturers want to hit that one-in-10¹⁴ target, they're going to need better error checking. An 80-byte error-checking block per sector would double the number of consecutive bits that can be corrected, up to 100, but would also mean that about 19% of the disk's capacity was taken up by overhead, with only 81% available for user data.
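The back-of-the-envelope error estimate can be made explicit. This is a minimal sketch under two simplifying assumptions: unreadable bits occur independently at exactly the rated rate (the rating is a manufacturer's specification, not a measured rate), and terabytes are decimal, as drive makers count them. The function names are illustrative.

```python
import math

TB = 10**12  # decimal terabyte, as used on drive labels


def expected_unreadable_bits(bytes_read: float, bits_per_error: float) -> float:
    """Mean number of unreadable bits when reading bytes_read bytes."""
    return bytes_read * 8 / bits_per_error


def prob_at_least_one_error(bytes_read: float, bits_per_error: float) -> float:
    """P(at least one unreadable bit), assuming independent errors (Poisson model)."""
    return 1.0 - math.exp(-expected_unreadable_bits(bytes_read, bits_per_error))


array_bytes = 6 * 2 * TB  # six 2 TB disks, read end to end once

if __name__ == "__main__":
    for rate in (1e14, 1e15):
        print(f"1 per {rate:.0e} bits: "
              f"expected {expected_unreadable_bits(array_bytes, rate):.2f}, "
              f"P(at least one) = {prob_at_least_one_error(array_bytes, rate):.0%}")
```

Under these assumptions, the 1-in-10¹⁴ rating gives about 0.96 expected unreadable bits per full read of the array, roughly a 62% chance of at least one. At 1 in 10¹⁵, that chance drops to about 9%.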

In the past, enlarging the error-correction data was viable: the increasing areal densities offered more space than the extra correction data used, for a net growth in available space. A decade ago, only 24 bytes were needed per sector, with 40 bytes necessary in 2004, and probably more in current disks. As long as the increase in areal density outpaces the increase in error-correcting overhead (needed to cope with the signal loss that higher density causes), hard drives can continue to get larger. But hard drive manufacturers are now close to the point where each increase in areal density requires so much extra error-correcting data that the improvement gets canceled out anyway!
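The argument here is a race between areal density and error-correction overhead. This minimal sketch computes the break-even point. It rests on two assumptions: the 40-byte sector-start-and-gap field stays constant (the text gives that figure only for 2004), and user capacity is simply raw capacity times the fraction of each sector that holds user data. The function names are illustrative.

```python
def format_efficiency(data: int, ecc: int, sync_and_gap: int) -> float:
    """Fraction of raw track bytes that hold user data."""
    return data / (data + ecc + sync_and_gap)


def breakeven_density_gain(old_ecc: int, new_ecc: int,
                           data: int = 512, sync_and_gap: int = 40) -> float:
    """Factor by which raw areal density must grow just to keep user
    capacity unchanged when per-sector ECC grows from old_ecc to new_ecc."""
    return (format_efficiency(data, old_ecc, sync_and_gap)
            / format_efficiency(data, new_ecc, sync_and_gap))


if __name__ == "__main__":
    for old, new in ((24, 40), (40, 80)):
        gain = breakeven_density_gain(old, new) - 1
        print(f"ECC {old} -> {new} bytes: density must rise {gain:.1%} to break even")
```

Under those assumptions, growing the error-correction field from 24 to 40 bytes cost about 2.8% of raw density just to stand still, and growing it from 40 to 80 bytes would cost about 6.8%. Likewise, `1 - format_efficiency(512, 80, 40)` reproduces the roughly 19% overhead quoted above.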

Making 4096 bytes the new standard

Instead of storing 512-byte sectors, hard disks will start using 4096-byte sectors. 4096 bytes is a good size for this. First, it matches the standard size of allocation units in the NTFS filesystem, which is probably the most widely used filesystem on personal computers today. Second, it matches the standard size of memory pages on x86 systems. Memory allocations on x86 systems are generally made in multiples of 4096 bytes, and correspondingly, many disk operations that interact intimately with the memory system (such as reading from or writing to the pagefile, or loading executable programs) are also done in multiples of 4096 bytes.
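To see why the match matters, count how many physical sectors a page-sized transfer covers. This minimal sketch assumes the transfer starts on a 4096-byte boundary. The function name is our own.

```python
PAGE_SIZE = 4096  # x86 memory page and NTFS allocation unit size, per the text


def sectors_touched(offset: int, length: int, sector_size: int) -> int:
    """Number of physical sectors covered by a transfer of `length` bytes
    starting at byte `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


if __name__ == "__main__":
    # One page-aligned, page-sized transfer, e.g. paging in a single page.
    for sector_size in (512, 4096):
        print(sector_size, sectors_touched(3 * PAGE_SIZE, PAGE_SIZE, sector_size))
```

An aligned page maps onto eight legacy sectors but exactly one 4096-byte sector, so each page-sized operation becomes a single-sector operation.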

4096-byte sectors don't solve the analog problem (signals are getting weaker and noise is getting stronger, and only reduced densities or some breakthrough in recording technology are going to change that), but they help substantially with the error-correcting problem. Because of the way error-correcting codes work, larger sectors need relatively less error-correcting data to protect against errors of the same size. A 4096-byte sector holds as much data as eight 512-byte sectors. With 40 bytes per sector for marking sector starts and 40 bytes for error correction (protecting against 50 bad bits), eight 512-byte sectors require (8 × 512 + 8 × 40 + 8 × 40) = 4736 bytes: 4096 of data and 640 of overhead. The total protection is against 400 bad bits (50 bits per sector, eight sectors), but only if those bits are spread evenly, with no more than 50 falling in any one sector.
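The "spread evenly" caveat is the key weakness of small sectors. Each sector's error-correction field only sees that sector's own bits, so a single long burst can defeat it. This minimal sketch models only the user-data bits of eight 512-byte sectors and ignores where the error-correction and sector-start bytes physically sit. It takes the 50-bit limit from the text, and its names are illustrative.

```python
DATA_BITS_PER_SECTOR = 512 * 8  # user-data bits in one legacy sector
CORRECTABLE_PER_SECTOR = 50     # consecutive bad bits one 40-byte ECC field can fix
SECTORS = 8                     # eight 512-byte sectors = 4096 bytes of data


def bad_bits_by_sector(start_bit: int, burst_len: int) -> list[int]:
    """Count how many bits of a burst of consecutive bad bits land in each sector."""
    if start_bit < 0 or start_bit + burst_len > SECTORS * DATA_BITS_PER_SECTOR:
        raise ValueError("burst must lie within the eight sectors")
    counts = [0] * SECTORS
    for bit in range(start_bit, start_bit + burst_len):
        counts[bit // DATA_BITS_PER_SECTOR] += 1
    return counts


def correctable(start_bit: int, burst_len: int) -> bool:
    """Every sector must stay within its own correction limit."""
    return all(n <= CORRECTABLE_PER_SECTOR
               for n in bad_bits_by_sector(start_bit, burst_len))


if __name__ == "__main__":
    print(SECTORS * (512 + 40 + 40))                     # raw bytes for 4096 bytes of data
    print(correctable(0, 50))                            # one sector, at its limit
    print(correctable(0, 100))                           # one sector, over its limit
    print(correctable(DATA_BITS_PER_SECTOR - 50, 100))   # split 50 + 50 across two sectors
```

A 100-bit burst is fatal if it lands in one sector but survivable if it happens to straddle a boundary, and no placement rescues a 400-bit burst.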

With 4096-byte sectors, only one set of the 40 bytes that mark the sector start and separate sectors is needed, and a good level of protection takes only 100 bytes of error-checking data, for a total of (1 × 4096 + 1 × 40 + 1 × 100) = 4236 bytes: 4096 of data and 140 of overhead. 100 bytes per sector can correct up to 1000 consecutive bad bits; for the foreseeable future, this should be "good enough" to achieve the specified error rates. With an overhead of just 140 bytes per sector, nearly 97% of the disk's capacity can be used for data.
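Putting the two layouts side by side shows what the change buys. This minimal sketch uses the per-sector figures from the text. It assumes the same raw track bytes are available either way and that the sector-start-and-gap field stays at 40 bytes. The function name is illustrative.

```python
def raw_bytes(data: int, ecc: int, sync_and_gap: int, sectors: int = 1) -> int:
    """Raw on-disk bytes for `sectors` sectors of the given layout."""
    return sectors * (data + ecc + sync_and_gap)


legacy = raw_bytes(512, 40, 40, sectors=8)  # eight 512-byte sectors
advanced = raw_bytes(4096, 100, 40)         # one 4096-byte sector

if __name__ == "__main__":
    print(f"8 x 512-byte sectors: {4096 / legacy:.1%} user data")
    print(f"1 x 4096-byte sector: {4096 / advanced:.1%} user data")
    print(f"extra user capacity on the same platters: {legacy / advanced - 1:.1%}")
```

Under those assumptions, efficiency rises from about 86.5% to about 96.7%, which works out to roughly 11.8% more user data from the same raw surface.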

In one fell swoop, this change provides greater robustness against the problems caused by increasing areal density, and more efficient encoding of the data on disk. That's good news, except for that whole "legacy" thing. The 512-byte sector assumption is built into a lot of software.

A 512-byte leaden albatross

As far back as 1998, IBM started indicating to the hard disk manufacturing community that sectors would have to be enlarged to allow for robust error correction. In 2000, IDEMA, the International Disk Drive Equipment and Materials Association, put together a task force, the Long Data Block Committee, to establish a large-sector standard. After considering, and ultimately rejecting, a 1024-byte interim format, the committee finalized its specification in March 2006, committing to 4096-byte sectors. Phoenix produced preliminary BIOS support for the specification in 2005, and Microsoft, for its part, ensured that Windows Vista would support the new sector size. Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 all support it. Mac OS X supports it, and Linux kernels since September 2009 also support it.

The big obvious name missing from this list is Windows XP (and its server counterpart, Windows Server 2003). Windows XP, like older Linux kernels, has a fixed assumption of 512-byte sectors somewhere within its code. Try to use it with hard disks with 4096-byte sectors and failure will ensue. Cognizant of this problem, the hard disk vendors responded with, well, a long period of inaction. Little was done to publicize the issue, and no effort was made to force the issue by releasing large-sector disks; the industry just sat on its hands.
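To make "failure will ensue" concrete, here is a hypothetical illustration of one way a baked-in 512 goes wrong. It is not Windows XP's actual code, and the function names and drive size are invented. Drives address data by sector number, so software that multiplies sector numbers by a hard-coded 512 computes every byte offset, and the disk's total capacity, at one-eighth of the true value when the sectors are really 4096 bytes long.

```python
HARDCODED_SECTOR_SIZE = 512  # hypothetical legacy constant baked into old code


def offset_assuming_512(sector_number: int) -> int:
    """Byte offset as computed by code that assumes 512-byte sectors."""
    return sector_number * HARDCODED_SECTOR_SIZE


def true_offset(sector_number: int, sector_size: int) -> int:
    """Byte offset using the sector size the drive actually uses."""
    return sector_number * sector_size


SECTOR_SIZE = 4096
sector_count = 488_281_250  # sectors on a hypothetical 2 TB drive with 4096-byte sectors
wrong_capacity = offset_assuming_512(sector_count)
real_capacity = true_offset(sector_count, SECTOR_SIZE)

if __name__ == "__main__":
    print(wrong_capacity, real_capacity, real_capacity // wrong_capacity)
```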

However, this situation clearly couldn't go on forever.