Hello Jon! I actually signed up here to briefly comment on your opinions as expressed about RAID in general. So first, let me say that in your position and from your perspective I can certainly understand your comments. This about sums them all up for me:

It is generally well accepted that RAID0 carries a sizable risk of data loss.

...pretty much echoes what I consider to be a myth about RAID 0 and "the heightened risk" of data loss. I'll explain.

First of all, let's consider a situation where a person purchases two of these: Western Digital ATA100 80.0GB WD800JBs, just as an example. From the chart in your article, your experience is that this drive has a 1.72% failure rate--that is, 1.72% of the units your company has sold have had to be replaced because they failed. Here's the point I want to suggest to you:

If your customer decides on purchasing two of these drives to be used as normal IDE drives instead of RAID drives, then based on your company's experience to date he's running about a 1.72% chance, per drive, that the drive he's purchased is defective from the factory and will fail prematurely. If one of the drives fails while the other doesn't, your customer will lose the data on that drive (assuming no backups were made, and so on) even though he may not even know how to spell the word RAID. If the drive that fails is the boot drive, he'll be unable to boot his system, and he'll have to reconfigure his second hard drive as the boot drive and reinstall his OS there if he wishes to boot at all before getting a replacement drive. All of this, of course, is possible even if no RAID was ever deployed in the system from the start.

The thing most people do not understand is that two drives running as normal IDE drives are the identical drives when configured under RAID 0. The drives themselves know nothing about the difference between RAID and IDE, and they behave no differently at all running as IDE than they do running as RAID 0, or vice versa. Thus, the failure rate for each of the two drives running as IDE is *exactly the same* as it would be for each of those same two drives running in a RAID 0 configuration. Single drives running as IDE have no fault tolerance, just as drives running in RAID 0 have no fault tolerance. The drives have no preference; they run and operate exactly the same way regardless. This is an important point to remember when thinking about RAID and IDE drive operation.

OK, now, some people will say, "Aha, yes, but when running in RAID 0 you need two drives instead of one, and that means that your two drives, because they are two, are twice as likely to fail as the single IDE drive you could use instead of RAID 0."

Really? So, by that logic, the more components of a given type I have in my system, the higher the risk of failure accordingly? I do not believe, thankfully, that this is the way things work...;) If things worked this way in reality, well, all of us would have a very tough time finding anything that worked with any degree of reliability...;) Here's what I mean:

Let's go back to your drive failure chart for a moment and consider the Western Digital ATA100 80.0GB WD800JB drive. According to your numbers you've sold 290 of those drives, and yet the chance of failure for each drive you've sold is but 1.72%--which means that your company has seen ~5 (4.99 or so) of those 290 drives fail.
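Just to make that arithmetic explicit, here's a quick sketch of the numbers (the 290-unit count and the 1.72% rate come from your chart; treating the rate as a simple per-drive probability is my own simplification):

# Figures from the article's failure chart (WD800JB)
drives_sold = 290
per_drive_failure_rate = 0.0172  # 1.72%

# Expected number of failed units out of everything sold
expected_failures = drives_sold * per_drive_failure_rate
print(f"Expected failures out of {drives_sold} drives: {expected_failures:.2f}")  # ~4.99

# The per-drive risk is the same whether a customer owns one drive or two,
# and whether the drives run as plain IDE or as members of a RAID 0 set.
print(f"Chance any one particular drive fails: {per_drive_failure_rate:.2%}")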

So, if we take the logic that says using two drives instead of one makes a drive failure 100% more likely than with a single drive, then ought not the possibility of one of those 290 drives failing be 29,000% greater than if your company had sold but ONE Western Digital ATA100 80.0GB WD800JB? Obviously, though, even though you have sold 290 of them, the per-drive failure rate is still only 1.72%, or roughly 5 drives out of the 290, so we can see plainly that deploying more than a single drive does not increase the odds of any one drive failing by 100% times the total number of drives deployed. Does it?

And yet, this is the way many people think about the issue, it seems to me. They think, wrongly, that with two drives in a system the odds of a given drive failing are 100% greater than with a single drive. Obviously, your company's experience selling hundreds of these drives--and, indeed, Western Digital's experience selling millions of them--is *not* that the risk of drive failure accelerates by 100% for every additional drive deployed after the first one. Allow me to suggest that if it were, neither you nor Western Digital could conduct business...;)

So what *is* the risk that *a* drive will fail, regardless of how many such drives you sell or deploy in a single system? Every hard drive manufacturer estimates that risk: it's the MTBF (mean time between failures) figure, expressed in hours, and you'll find it listed plainly in the specifications published for each drive a manufacturer sells. So whether you have two drives in a system, or five, or ten, each with an MTBF rating of, say, 10,000 hours (just to throw out a number), you can be assured that on average the manufacturer expects *each* of those drives to run that long before failing. Of course, this is an average, and the actual drive you buy may last half that long or twice that long, but on average this is the kind of operational durability you should expect from each of your drives--whether you have one, two, or ten. The *number* of drives in a system has, of course, no impact at all on the manufacturer's MTBF estimate of operational life for each drive you own.
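For anyone who wants to turn an MTBF figure into something more concrete, here's a rough sketch; the exponential-lifetime model and the specific hour figures are my own assumptions, not anything the manufacturers publish:

import math

# Hypothetical numbers, just to illustrate (not from any spec sheet)
mtbf_hours = 10_000          # the throwaway MTBF figure used above
usage_hours = 3 * 365 * 8    # say, three years of use at eight hours a day

# Under a simple exponential-lifetime assumption, the chance that a given
# drive fails at some point within the usage window is:
p_fail = 1 - math.exp(-usage_hours / mtbf_hours)
print(f"Chance one particular drive fails within {usage_hours} hours: {p_fail:.1%}")

# That per-drive figure is the same whether the drive is one of one, one of
# two, or one of ten in the system, and whether it runs as IDE or RAID 0.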

The second point that needs addressing is this: are you really any safer, in terms of your data, with a single 300GB drive running as IDE than with two 150GB drives running as one 300GB RAID 0 volume? Well, if it turns out that the MTBF estimate for the 300GB drive is the same as for the 150GB drives, then the answer is "no." In that case, the 300GB drive is exactly as likely to fail as either of the two 150GB drives that make up your RAID 0 volume. Indeed, in this scenario, whether you lose your single 300GB IDE drive or you lose one of your two 150GB RAID 0 drives, you lose *all* your data, don't you? Likewise, if the 300GB drive you buy for use as an IDE drive has an MTBF rating of 50,000 hours, but each of the 150GB drives in your RAID 0 configuration has an MTBF rating of 25,000 hours, then you may expect the IDE drive to run roughly twice as long as the RAID drives before failing. You could just as easily reverse those numbers and expect your RAID 0 drives to outlast the single IDE drive by the same margin.
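To put some illustrative numbers on that comparison, here's another small sketch along the same lines; again, the MTBF values and the exponential-lifetime assumption are mine, chosen only to mirror the scenario above:

import math

usage_hours = 3 * 365 * 8  # same hypothetical usage window as before

def chance_of_failure(mtbf_hours, hours=usage_hours):
    # Per-drive chance of failing within the window, exponential-lifetime model
    return 1 - math.exp(-hours / mtbf_hours)

# Scenario: a single 300GB IDE drive rated at 50,000 hours MTBF,
# versus 150GB RAID 0 member drives rated at 25,000 hours MTBF each
print(f"300GB drive @ 50,000h MTBF: {chance_of_failure(50_000):.1%}")
print(f"150GB drive @ 25,000h MTBF: {chance_of_failure(25_000):.1%}")

# With equal ratings the per-drive figures are identical, which is the point:
# it's each drive's rating, not the IDE-versus-RAID-0 label, that sets how
# likely any one drive is to fail.
print(f"Equal ratings, 40,000h MTBF: {chance_of_failure(40_000):.1%}")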

So, it isn't the number of drives a user has in his system that determines the likelihood of a drive failure; the MTBF rating each drive carries is the only real barometer for judging how likely that drive is to fail, and whether you run your drives as IDE or as RAID 0 makes no difference whatever.

At home, for instance, I am in my *fourth year* of running RAID 0 configurations--and although I've used several different types of drives and RAID controllers over that span, I have yet to have a single RAID 0 drive failure. Conversely, in the years before I ever ran RAID 0, when I was running either SCSI or IDE, I had two drives fail (that I can remember), both of which were replaced under warranty by the manufacturer. If I thought about RAID 0 the way many people do, I would certainly conclude that running IDE or SCSI was much riskier than running RAID 0--heh...;)--but of course I don't think that way. If anything, I think that drives today are simply made a lot better than they were a decade ago, and they last a lot longer, too--and of course, whether I'm running them as RAID 0 or as IDE makes not a whit of difference there.

But, to answer your complaints about some RAID 0 configurations, I think you'll agree with me that the weakest link in a RAID 0 setup is the RAID *controller* one chooses to use. In the last four years, since I decided to try RAID 0 for myself, I've used only Promise FastTrak TX RAID controllers at home--first the TX2200 and most recently the TX4200. The TX4200 is coupled to two Maxtor 300GB SATA II drives and has now run for two years without a single drive error or failure; the TX2200, which I moved to my wife's box with a pair of WD1000JBs, has been operating for *four years* without a failure of any kind. These are dedicated hardware PCI RAID controllers, which I consider several steps above the mostly software-RAID-type controllers found as standard equipment on most motherboards these days. Yes, people *are* reporting a sizable number of RAID 0 (and other RAID-mode) problems with those onboard controllers. I think the issue boils down to the quality of the RAID controller--which is exactly what I mean when I say that the efficacy of a RAID setup depends greatly on "how" you set it up and "what" you set it up with, in terms of controllers and, of course, hard drives too.

I.e., you go "cheap," you get cheap, if you know what I mean...;) When it comes to components of any sort, I believe you very often get exactly what you pay for--which is also why I'm not a believer in motherboard sound or motherboard graphics, either. Generally, it's been my experience that the drivers for motherboard-integrated devices of all kinds are neither as good nor as reliable as the driver support you get with name-brand discrete peripherals.

Anyway, Jon, this has been my experience at home with RAID 0 over the last four years, and I appreciate the opportunity to share it. Thanks again.

Posted on 2007-02-06 18:19:50