RAID: Life, Death and Resurrection

By Piotr Gaczkowski (@doomhammer). Creator. Efficiency Hacker. Human Jukebox. Loves convenient tools and sharing knowledge.

RAID4 array as seen by the author

This is a true story that happened to me around 2010/2011.

My problems with a Network Attached Storage (NAS) started quite early. I think this difficult childhood was caused not so much by the fragility of the hardware but rather by my sadistic tendencies. I admit, I am a bit perverse and definitely a hacker. This means that I tend to use each piece of equipment that falls into my hands in a way that its producer could not predict.

At the beginning, while experimenting with various add-ons created by troublemakers like myself, I shot myself in the foot. Or rather in the face. The Web Interface refused to work after I uninstalled the system-supplied BitTorrent client (which was, by the way, quite despicable). This was actually the producer’s fault. The producer hadn’t considered that users may not necessarily need its unlimited goodness and might try to satisfy themselves in their own way (for example, with the Transmission BitTorrent client). Because I hadn’t activated SSH access beforehand (the moral: for your own good, activate SSH immediately after purchasing a NAS!), my only contact with the NAS from then on was over NFS and through Transmission (which doesn’t integrate with the Web Interface and works on a different port).

Because it was quite sufficient for listening to music (UPnP server was working fine without making a fuss), I didn’t bother to fix it. “There would be time for that”, I thought.

Protect Yourself, My Son

It is worth adding that the NAS I had was pretty rich in backup solutions, from the simplest ones (copying to any USB device connected to the socket when you push the “backup” button) to more wicked ones (rsync controlled by cron).

Lack of time (which will be mentioned again in this note) was the reason that, to my bane, I didn’t manage to activate any of these solutions before I ruined the Web Interface.

The real problems started a bit later, and they were related to a power failure. I had 4x500GB drives in an array that the producer called X-RAID, which was more or less equivalent to RAID 4. The premise is that the failure of one drive shouldn’t cause any loss of data or disturb the work of the system. Therefore, according to Murphy’s law, two of the drives went belly up. After such a blow, the NAS wasn’t able to come around. My attempt to start it ended in a “Booting” message that stayed forever on the pale display. Frankly, the display was green, but green wouldn’t convey the full pathos of the scene.

Count off

Without thinking too much (sometimes I have the impression that I don’t think too much at all), I decided to check the drives. The first observations:

HD1: fdisk shows the partition table, ext2 partition is mounting

HD2: fdisk stays mum

HD3: same as HD1

HD4: burning inferno

When I connected HD4 to my PC for inspection, the BIOS started shouting at me. There was something about S.M.A.R.T. on HD4 being really angry. Ultimately I diagnosed it as an electronics failure and put the drive aside in the box with a “Later/Maybe” label.

Having two working and two faulty drives is not a good starting point in any attempt to recover the data.

I scratched my head. As I remember it, I kept scratching my head for several months at that point. Finally I came to the conclusion that perhaps HD2 was not as sick as it seemed to be. Maybe it was just simulating the illness to skive off work?

Moving Bits by the Wheelbarrow

To avoid complicating matters even more, I decided to buy an additional drive and do all operations on images. Any slip-up on a living drive would mean permanent death for my data. I bought a 2TB drive, took one of my favorite tools (dd) into my hand, and started shoveling the bits. My working hypothesis was that the partition table of HD2 was broken.
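Imaging a drive boils down to a chunked copy from the device node into a regular file. Below is a minimal Python sketch of that idea, roughly what a plain dd run does. The function name image_drive and the paths in the usage note are hypothetical, and unlike dedicated recovery tools this sketch makes no attempt to skip or retry unreadable sectors.

```python
def image_drive(source_path, image_path, chunk_size=1024 * 1024):
    """Copy a block device (or any file) into an image file, chunk by chunk.

    Returns the number of bytes copied. Read errors are NOT handled:
    a drive with bad sectors needs a tool that can skip and retry.
    """
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device reached
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A call such as image_drive("/dev/sdb", "/mnt/rescue/hd1.img") would need read access to the device; both paths are made up for illustration.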

Because the failed NAS formatted every drive in the same way (system partition, swap partition, then LVM), I compared the partition tables of HD1 and HD3 and then copied one of them to HD2. Using losetup, I mounted the system partition of HD2: everything looked absolutely top-notch. My heart was filled with joy! I waited impatiently to activate the LVM volume and regain access to the data.
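Transplanting a partition table between images can be sketched in a few lines of Python. This assumes the drives used a classic MBR layout (four 16-byte entries at byte 446 of the first sector, followed by the 0x55 0xAA signature); the story does not say which format the NAS used, so treat that as an assumption. The function names and image file names are hypothetical.

```python
MBR_TABLE_OFFSET = 446        # partition entries start here in sector 0
MBR_TABLE_SIZE = 64           # four entries of 16 bytes each
MBR_SIGNATURE = b"\x55\xaa"   # boot signature at bytes 510-511


def read_partition_table(image_path):
    """Return the 64 raw bytes of the MBR partition table (assumed layout)."""
    with open(image_path, "rb") as f:
        sector = f.read(512)
    if len(sector) < 512:
        raise ValueError("image is shorter than one sector")
    return sector[MBR_TABLE_OFFSET:MBR_TABLE_OFFSET + MBR_TABLE_SIZE]


def copy_partition_table(donor_path, target_path):
    """Overwrite the target's partition table with the donor's.

    Only bytes 446-511 of the target are touched; the boot code before
    them and everything after the first sector stay as they were.
    """
    table = read_partition_table(donor_path)
    with open(target_path, "r+b") as f:
        f.seek(MBR_TABLE_OFFSET)
        f.write(table + MBR_SIGNATURE)
```

Checking read_partition_table("hd1.img") == read_partition_table("hd3.img") first, and only then calling copy_partition_table("hd1.img", "hd2.img"), mirrors the compare-then-copy order described above. Work on a copy of the image, since the write is done in place.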

(Un)expected Plot Twist

Fate can be vicious, however, and it turned out that one of the Physical Volumes couldn’t be found. I noted its UUID and pondered gloomily what the next steps could be. For lack of a better idea, I fired up hexdump -C and looked at what I thought was a single PV.

On HD1, everything looked like a classic LVM run-in, similarly on HD3. But HD2 presented a weird mashup of expected data and some random chaos (at first sight). The field with UUID was especially strange. Did it mean that it wasn’t just the partition table that suffered?

Despite this setback, I continued to investigate. Why did some of the data look correct while the rest looked scrambled? The enlightenment came suddenly and I cannot recall now what caused it. When I look back, I see how stupid I was. At that time I thought I was the smartest person living on the planet Earth!

Come, My Child, I Will Teach You!

I examined the entire situation. I moved mentally back to the point that had been the catalyst of my actions from the beginning: RAID 4 protects against the failure of one drive by recording XORed data (parity), which allows the original to be recreated. Isn’t that obvious?

Without much thinking (you bet!) I wrote a Proof of Concept in C. All it did was read the first 4kB from each of the three images and print the XORed data on the screen. This way I found the missing UUID. And peace of mind! Now all that was needed was to make the program more general. Then I could write the results to a fourth image and proceed with losetup, vgchange -ay, and we’d be home again!
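The original proof of concept was written in C and is not reproduced here. The sketch below is a Python stand-in for the same idea, not the author's code: XOR the first 4 kB of the three surviving images, and the result is what the missing drive held at that position. The function names and image file names are hypothetical.

```python
def xor_blocks(blocks):
    """XOR a list of equally sized byte blocks into a single block."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    acc = 0
    for block in blocks:
        # Treat each block as one big integer so XOR is a single operation.
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def xor_first_block(image_paths, size=4096):
    """Read the first `size` bytes of every image and XOR them together."""
    blocks = []
    for path in image_paths:
        with open(path, "rb") as f:
            blocks.append(f.read(size))
    return xor_blocks(blocks)
```

Printing xor_first_block(["hd1.img", "hd2.img", "hd3.img"]).hex() would give a view comparable to the hexdump inspection described earlier. The trick works because XOR is its own inverse: if the parity is the XOR of all data blocks, XORing the parity with all but one data block returns the one left out.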

Not So Fast!

Because I’ve always been bad at math, it didn’t occur to me that putting 4 images of 500GB each on a formatted disk with a total capacity of 2TB could be a problem. Especially if there is already some other data on it. My initial enthusiasm dropped and I started thinking about an in-place solution (in the meantime, drive prices had skyrocketed, so buying another one was out of the question).

The first brilliant idea was to write a kernel driver that would present a block device returning the XOR of the provided files. I even started reading the Linux Drivers HOWTO, even though it seemed like using a cannon to swat a fly. Ultimately, the idea of writing a filesystem in Python with FUSE won. It was relatively easy, and within a few moments I could check how it worked. My heart sank again when losetup cried that it lacked the permissions to present my fake file as a block device.
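The heart of such a filesystem is a read handler that computes the missing drive's bytes on demand instead of storing them, so no extra 500GB is needed. The sketch below shows only that logic, deliberately without any FUSE binding, so it does not reflect the author's filesystem or any particular FUSE library's API. The class name XorImage is hypothetical.

```python
import os


class XorImage:
    """Read-only virtual image whose content is the XOR of several files."""

    def __init__(self, image_paths):
        self._files = [open(path, "rb") for path in image_paths]
        # The virtual image can only be as long as the shortest source.
        self.size = min(os.fstat(f.fileno()).st_size for f in self._files)

    def read(self, offset, length):
        """Return up to `length` bytes of the virtual image from `offset`."""
        # Clamp the request so it never runs past the end of the image.
        length = max(0, min(length, self.size - offset))
        acc = 0
        for f in self._files:
            f.seek(offset)
            acc ^= int.from_bytes(f.read(length), "big")
        return acc.to_bytes(length, "big")

    def close(self):
        for f in self._files:
            f.close()
```

A FUSE read callback would essentially forward its offset and size arguments to XorImage.read and report XorImage.size as the file size. How that wiring looks depends on the binding, so it is left out here.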

In such situations I tend to ask friends for advice. Luckily one of them, named strace, showed me where the problem lay. By default, FUSE doesn’t allow one user to access a file system mounted by another user. And because losetup requires superuser privileges and I had mounted my filesystem as a regular user, this turned out to be the obstacle.

O Frabjous Day! Callooh! Callay!

losetup, vgchange, fuseext2 and… my entire music collection (several months of ripping CDs) was safe and sound, waving to me cheerily. There was much rejoicing!

Post Mortem

I believe you are unlikely to find yourself in a position similar to mine. This was a particularly bad coincidence of a power failure, broken hard drives, and prohibitive prices for replacements. But there are lessons to be learned from this experience.

Even if you operate on a relatively high level (like filesystems and FUSE), it is good to know what is going on beneath the level of abstraction. In my case it was hexdump and strace that helped me regain my composure when things looked grim.

Similar problems may appear whatever you do. Around the same time I was struggling with my NAS, Twitter was facing problems related to scaling. Even though their software was written in Ruby, they resorted to using DTrace to look beyond the facade. All the fancy frameworks sure save time in development, but when things get ugly, expect to get your hands dirty.
