My Tale Of Storage Trouble

I’ve had a four-drive NAS in my closet for years. One day, one of the four drives failed. Because I had the drives configured in RAID 5, the array continued to work just fine. I mistook the somewhat slower performance of degraded mode to be a consequence of approaching the array’s physical capacity. The enclosure never alerted me to a problem. So when a second drive failed, all of my data (family photos and videos, music collection, two decades of work, everything) was gone. Poof. Instantly. And, because of circumstances too embarrassing for a technology professional to relate, that NAS contained my only copy of all of that data. You could hear my screams from blocks away.
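For readers wondering why one dead drive was survivable and two were not, here is a minimal sketch of the XOR parity idea behind RAID 5. It is an illustration under simplifying assumptions, not how any particular NAS implements it: the block contents and the `xor_blocks` and `rebuild` helpers are hypothetical, and a real RAID 5 array rotates parity across all drives rather than keeping it on one.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-drive array: three data blocks plus parity.
data = [b"fam1", b"vid2", b"wrk3"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, failed):
    """Rebuild the block at index `failed` from the three surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)

# One failure: the missing block is the XOR of the other three.
recovered = rebuild(stripe, failed=1)

# Two failures: the two survivors only yield the XOR of the two missing
# blocks, which is not enough to separate them again.
two_left = xor_blocks([stripe[0], stripe[3]])
```

With a single block missing, `recovered` matches the original data. With two missing, `two_left` equals the XOR of the lost pair, which cannot be split back into its parts, and that is the situation I found myself in.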

Misery loves company and, of course, I am far from alone. Back in 2011, Tom’s Hardware showed that hard drive failure rates within three years can range up to 20%. SSD rates are better, but go ask Linus Torvalds if that was any consolation for his dead workstation.

In a blind panic, I called the biggest name in disk disaster recovery, Seagate Recovery Services, and in the process stumbled into a fascinating photo story. What follows is not meant to be a commercial for Seagate. The company did not pay for this coverage. The Tom’s Hardware editors and I simply recognized that a lot of people need recovery help, and a glimpse behind the curtain at how those operations get done might be enlightening for consumers and business users alike.

This page's image source: Wikipedia. Nearly all other artwork in this article is by Peter Panayiotou of Panayiotou Photography, Inc. (www.panayiotou.com).

Sending Trouble Away

SRS received 18,000 recovery cases last year. With the rollout of its Rescue & Recovery service, Seagate expects that number to climb well past 30,000 cases annually going forward. These can take the form of anything from a USB thumb drive to a multi-drive SSD array to a storage area network consisting of hundreds of hard drives. Of course, in order to service those devices, spares must often be on-hand, which means that, of necessity, SRS has access to one of the most diverse drive collections in the world. Old school SCSI? Iomega Zip? It's all there.

When submitting a drive to SRS, you begin on the company’s website, filling out a questionnaire about your storage problem, the suspected nature of the failure, observations, what was happening prior to failure, and so on. This creates the start of a case profile that stays with the drives as Seagate receives and logs them all the way through the eventual return of your data.

The Feel Of Failure

Interestingly, the first step in diagnosing a faulty drive out of the box is not to analyze it with software. No, the first thing that technicians do is put power to the drive, then listen to it and feel it. The customer’s original input when reporting the drive gets taken into account, but drive experts can also tell a fair bit by how the drive vibrates, the kinds of clicking patterns it makes, and, of course, if there are any grinding noises. Obviously, grinding is bad, and a grinding drive would be halted immediately. In general, technicians try to run the drive as little as possible at this stage. But if the drive sounds and feels like it is in proper mechanical working order, then it moves to the next stage of pre-evaluation.

Finishing The First Look

After a cursory mechanical examination, technicians connect the drive to a test system and see if it can perform basic tasks, such as coming up during boot, obtaining a volume letter, and performing read/write operations. At this early stage, the point is not to commence repairs. Rather, techs only want to get a fix on what area within SRS should receive the drive for further analysis and recovery work.

This Workbench Blows

This is a medical-grade HEPA clean bench used to control the air around the drive while it’s open. According to Seagate, such benches keep a space of a few cubic feet free of particulate debris down to the microscopic level, which is all that’s needed for most drive examination and repair. The back wall constantly pushes filtered air into the chamber, creating positive pressure and forcing any airborne contaminants back away from the bench.

Inside The Bubble

While the Chicago facility where we shot most of the photography in this story was in the midst of taking down its clean room during our visit, other SRS sites, including Oklahoma City, can take clean room conditions beyond the HEPA-class workbench. While not common, some recovery situations call for spinning up platters in environments that are essentially free from the risk of debris contamination. Even one speck of dust between a drive head and the underlying platter can cause further damage and data loss, and some jobs demand absolute assurance of the most favorable recovery conditions.

Peering Under The Hood

Once the drive lid comes off, techs can get a much better sense of the physical damage that might be in play. Says Seagate’s Peter Oswald, "I’ve seen it all. Dogs chew on them. People take hammers to them. Fires where not only was the drive burned but the fire department completely doused it in water. People forget that their laptop bag is sitting next to their car when they pull out, and then they run over it. And of course we see drives that come from natural disasters."

A Filter For Fragments

Many people don’t realize that most 3.5” hard drives have at least one filter in them. Because these filters are installed under totally clean manufacturing conditions, they should remain totally white. However, the filter plays a key role in recovery diagnostics because techs can immediately tell if the drive heads have made contact with the platter surfaces. Such contact plows a furrow into the media, resulting in dust and debris flying off the platters and into the air circulating around inside the drive. The filter captures many of these particles and turns dark from the debris. Or, as Seagate puts it, “That discoloration is your data scraped off the media.” In addition, with the particles removed from circulation, the drive may continue to function longer, allowing the user and/or recovery techs to retrieve data without further media damage.

Deeper Damage

Especially if you click on this image to view its higher-resolution version, it’s clear that at least the topmost platter of this drive has seen better days. Hard drive platters should look like mirrors. Here, though, roughly the outermost half of the platter is scarred in a circular fashion from head crashing. Admittedly, most head crashes are not this bad; we wanted to photograph something dramatic. But when there is evidence of damage, techs will want to get a quick sense of whether that damage extends to several platters. In this case, there was dust covering all platter surfaces, and the drive needed to be completely dismantled so that techs could get a better view of what was really going on. “When I do this kind of examination,” says Peter Oswald, “I’m looking for particles that catch the light and give me more sense of where I need to go.”

Send In The Claw

As you might expect, disassembling a hard drive isn’t like pulling apart LEGOs. One little quiver is enough to have the drive heads gouge out new furrows in already damaged platters. As a result, SRS engineers use a special claw-like tool to make the process safer and quicker. “That piece, which is specifically built for Seagate, allows us to remove the heads more safely than anybody else in the industry can,” says Peter Oswald. “You see how the heads are sitting over the very innermost portion of the drive? That’s a safe zone for those heads to sit and land. But that device allows us to go in and safely pick those heads up and clear them from the surface of the platter and remove them safely.”

Need More Input…Disassemble!

Piece by piece, technicians dismantle the drive for cleaning and deep examination. Platters get stored in special carriers. In rare cases, those platters may be installed in highly specialized machines designed to perform deep examination of media tracks. Usually, though, technicians can rebuild the original drive, replacing whatever components (such as heads) are necessary in order to get that brief bit of life needed to extract the drive’s contents. Interestingly, Peter Oswald notes that his recovery team can harness software to give very precise directions to drives on how they should attempt to read data. This is one reason why drives are sometimes disassembled: technicians use visual observation to help them pinpoint which disk areas to focus on.