A great piece of investigative reporting on SSD reliability in Tom's Hardware today. Writer Andrew Ku talked to users of more than 160,000 SSDs and tabulated their responses. It is the largest-scale study of SSDs ever - other than what vendors keep close to the vest.

It is a long report, and if you are professionally responsible for - or just really curious about - SSDs, I urge you to read it right now. But if you want the high points - and some advice - read on.

The sample
Mr. Ku spoke to a number of users, some of whom requested anonymity, whose fleets ranged from fewer than 100 SSDs to over 150,000 units. Respondents included NoSupportLinuxHosting.com, InterServer.net, Steadfast Networks, Softlayer and - with 155,000 SSDs - ZT Systems. In addition, he spoke to sources at supercomputer centers, research groups and other end-user sites.

Almost all the drives in the study are Intel drives, because they are currently the most trusted SSDs. This puts the spotlight on Intel, but presumably other drives would fare no better - and in some cases much worse. With their Micron JV and their own flash translation layer (FTL), Intel is one of the few vertically integrated SSD vendors. And as one of the world's largest motherboard vendors, they arguably have the most systems-level expertise of any SSD player.

The issues
Comparing the reliability of a 50-year-old technology against a 5-year-old one isn't easy. Both are moving targets, the use cases are different, and among SSD vendors reliability is all over the map. But in my view these are some of the key issues:

Failure rates. Storage vendors routinely claim that ≈50% of all returned drives are NTF - no trouble found - when tested. I take a consumer perspective: if it isn't working, it's failed. But vendors have a point: they only control their piece of the puzzle, yet get blamed for the whole thing. Life can be unfair.

Substitution effect. If I replace 5 drives with 1 SSD, what happens to reliability - especially if the 5 drives are in a RAID stripe?

Age. Few of the SSDs in this study are over 2 years old, while lots of HDDs are. We know that HDD failure rates increase with age, especially after 3 years, and we don't know how SSDs will age.

Endurance. End-users worry about the fact that NAND flash is spec'd for a limited number of writes, but for most that isn't the problem. It is the other failure modes that bite you.

Entropy. It doesn't matter what technology we use: the universe hates your data and always will.
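The substitution effect is easy to put numbers on. A RAID-0 stripe fails if any member drive fails, so a single SSD can beat a stripe of HDDs even with a similar per-drive failure rate. A back-of-the-envelope sketch - the failure rates below are hypothetical, not from the article:

```python
# Illustrative math for the substitution effect: replacing a
# 5-drive RAID-0 stripe with a single SSD.
# The annual failure rates (AFRs) here are assumed, for illustration only.

def stripe_failure_prob(per_drive_afr: float, n_drives: int) -> float:
    """Annual failure probability of a RAID-0 stripe: it fails if ANY
    member fails (assuming independent drive failures)."""
    return 1 - (1 - per_drive_afr) ** n_drives

hdd_afr = 0.03   # assumed 3% AFR per HDD
ssd_afr = 0.02   # assumed 2% AFR for the one SSD replacing them

raid0_afr = stripe_failure_prob(hdd_afr, 5)
print(f"5-HDD RAID-0 stripe AFR: {raid0_afr:.1%}")  # ~14.1%
print(f"Single SSD AFR:          {ssd_afr:.1%}")
```

Of course the comparison cuts the other way too: one SSD is also a single point of failure, where a redundant RAID level (1, 5, 6) survives a drive loss.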

The Storage Bits take
All SSDs do is replace a hard drive's head-disk assembly - the platters and heads - with a lot of flash chips. The rest of the stuff is the same - and that stuff accounts for about half of all drive failures. So the best we can expect is that SSDs could be twice as reliable.
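That "twice as reliable" ceiling is simple arithmetic. If roughly half of HDD failures come from the electronics, firmware and connectors that SSDs share, then removing the mechanical half at best halves the failure rate - the numbers here are hypothetical:

```python
# Back-of-the-envelope for the "at best twice as reliable" ceiling.
# Assumption: ~half of HDD failures are in parts SSDs keep
# (electronics, firmware, connectors), half in the head-disk
# assembly that SSDs eliminate.

hdd_afr = 0.04          # hypothetical HDD annual failure rate
shared_fraction = 0.5   # failures in components SSDs also have

best_case_ssd_afr = hdd_afr * shared_fraction  # mechanical failures gone
print(hdd_afr / best_case_ssd_afr)  # -> 2.0: twice as reliable, at best
```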

But flash isn't that reliable either, especially as feature sizes shrink. Few know that it takes ≈20 volts to write NAND flash, which is a lot when insulators are only molecules thick. Entire plane failures on flash die are common.

And those are only the obvious problems.

Bottom line: treat your SSDs as you do hard drives. Back up regularly. Use redundancy if uninterrupted service is the goal.

And know that you are only buying performance - not reliability.

Comments welcome, of course.