Friday, March 30, 2012

Reliability of modern hard drives

A couple of things have got me thinking about the reliability of hard drives.
  1. The eldest boy asked me about RAID levels and what extra reliability they bring.
  2. I sold an old 500gig SATA drive from my media center to a guy on eBay and he's quibbling because he's found three bad sectors.
Although RAID has brought tremendous increases in the speed and reliability of storage systems, there are some basic engineering considerations around combining many unreliable parts into a whole. The Mean Time Between Failures (MTBF) for modern hard drives is on the order of 30,000 hours. MTBF is a complicated field, but the figures provided by manufacturers assume a Gaussian distribution with the 30,000-hour figure at the peak; a few drives will fail after a day, a few will last 100,000 hours, but the bulk will fail around 30,000 hours (around three and a half years). It's why I say to people "...it's not if this drive will fail, rather it's when!"
So - with this in mind I decided to find out the MTBF of a rack of ten drives, each with an MTBF of 30k hours. Because the rack is down as soon as any one drive fails, the failure rates (the reciprocals of the MTBFs) simply add, so the formula is:

1/MTBF(system) = 1/D1 + 1/D2 + ... + 1/D10
So, if you stick 30k hours in for D1 through D10 you find the MTBF for the entire system is only 3k hours - less than twenty weeks! In practice it's worse than this, as the PSU in the enclosure and the RAID management card have MTBFs to take into consideration as well. Ask any broadcast engineer "how often are you replacing drives in RAID arrays?" and they'll tell you it's nearly a weekly occurrence for any decent-sized facility, and this is why! Although it's been nearly a decade since I ran engineering in a good-sized facility, I was often uneasy about how often RAID enclosures failed (losing all of the media, which is what happens with a RAID-0 striped set). I had that 30k hours figure in my head but never calculated the system MTBF.
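
If you want to check my arithmetic, here's a quick Python sketch of that back-of-envelope sum (the ten-drive rack and the 30k-hour figure are just the numbers from the example above; real drives won't behave quite so neatly):

```python
# Rough system MTBF for a rack of drives, treating each drive as an
# independent component with a constant failure rate of 1/MTBF.
# The rack is "failed" as soon as any one drive fails, so the rates add:
#   1/MTBF_system = 1/D1 + 1/D2 + ... + 1/Dn

def system_mtbf(drive_mtbfs):
    """Combined MTBF (hours) of a set of drives that must all keep working."""
    return 1.0 / sum(1.0 / m for m in drive_mtbfs)

rack = [30_000] * 10                      # ten drives, 30k hours each
mtbf = system_mtbf(rack)
print(f"System MTBF: {mtbf:.0f} hours (~{mtbf / (24 * 7):.1f} weeks)")
# System MTBF: 3000 hours (~17.9 weeks)
```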

RAID-1 (mirrored drive sets), RAID-5 (distributed parity) and RAID-6 (double distributed parity), along with some of the advances that better file systems bring (Isilon's OneFS and Sun's ZFS), mean that the failure of a single drive is no longer the disaster it once was, but if someone doesn't notice a drive has died OR (heaven forbid) a second drive dies whilst you're replacing the first, you're stuck. Don't forget a lot of the chassis we're installing now have a couple of dozen drives (Isilon's NL36 nodes - I've done a few of them recently) and so the system MTBF is even worse than the 3k hours above (however, server-grade SAS or Fibre Channel drives are considerably more reliable than domestic-grade SATA drives).
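
To put a very rough number on that "second drive dies during the rebuild" worry, here's an illustrative sketch. The 24-drive chassis, the 24-hour rebuild window and the exponential failure model are all assumptions for the sake of the example, not figures from any data sheet:

```python
import math

def p_second_failure(n_drives, mtbf_hours, rebuild_hours):
    """Chance that at least one of the surviving drives fails while the
    first failed drive is being rebuilt (simple exponential model)."""
    survivors = n_drives - 1
    combined_rate = survivors / mtbf_hours     # failures per hour across survivors
    return 1.0 - math.exp(-combined_rate * rebuild_hours)

# Illustrative numbers only: a 24-drive chassis of 30k-hour drives with a
# 24-hour rebuild comes out at roughly a 1.8% chance per rebuild.
print(f"{p_second_failure(24, 30_000, 24):.1%}")
```

Given how often drives get swapped in a decent-sized facility, those few-percent chances per rebuild add up over a year.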

We also know that modern multi-terabyte drives pack data so densely (similar-sized platters to the first 10Mbyte drives of yesteryear, but hundreds of thousands of times the bits per square millimetre of disk surface) that the disk's error correction/error recovery system is working flat-out all the time. The newer 2Tbyte drives have a Viterbi decoder to try and statistically extract correct data from the very noisy signal coming off the drive's heads. Additionally, the drive's SMART system has to know about the number of bad sectors due to manufacturing imperfections (contained in an EPROM-based p-list table) as well as the number of grown bad sectors (which get swapped out as per the g-list). SpinRite is the best utility I've found for drive maintenance/recovery as it forces the SMART system to pay attention to bad sectors and swap them out.

In a Hitachi Ultrastar 7k RPM, 500gig SATA drive there are 10,000 hidden spare sectors (each sector is only 4k bytes in size) to allow the drive to swap out failed sectors. According to the data sheet, Hitachi would replace a new drive if it had more than twenty bad sectors from the factory - any fewer and they regard it as being well inside manufacturing tolerances. If you Google "how many bad sectors is acceptable for a new drive" you'll find hundreds of IT experts claiming that no bad sectors are acceptable. I don't know what planet they live on, presumably one where quantum mechanics operates in a different manner and electrons don't bump into each other, leading to electrical noise!
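
If you want to see how many grown defects your own drive has quietly swapped out, smartmontools will show you the SMART attributes I'm talking about. This is only a rough sketch that shells out to smartctl and pulls out the Reallocated_Sector_Ct attribute; it assumes smartctl is installed, that you have permission to query the drive, and that the drive reports the attribute in the usual text layout:

```python
import subprocess

def reallocated_sectors(device="/dev/sda"):
    """Raw Reallocated_Sector_Ct (SMART attribute 5) for a drive, i.e. how many
    grown bad sectors have been remapped to spares (the g-list in action).
    Assumes the smartctl binary from smartmontools is on the PATH."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        # A typical attribute line looks like:
        #   5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
        if "Reallocated_Sector_Ct" in line:
            return int(line.split()[-1])
    return None        # attribute not reported by this drive

print(f"Grown bad sectors remapped so far: {reallocated_sectors()}")
```

A handful of reallocated sectors on an old drive is exactly what the factory tolerance above allows for; it's a rapidly climbing count that should worry you.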
Oh - the eBay guy; he ran a utility on the drive I sold him that reported three bad sectors. He asked me for a refund. Apparently a second-hand disk drive should carry a better guarantee than that provided by the factory when new!

2 comments:

Anonymous said...

Steve Gibson of grc.com has made a career out of knowing how hard drives work, and sometimes don't. Spinrite is a very handy tool for recovering hard drives that are becoming borderline. Watching it work is fascinating in demonstrating just how many read errors a typical drive suffers - a podcast from way back when explains it in more detail
http://www.grc.com/securitynow.htm
listen to the 'Once upon a time' program

Good report from Google on hard drive failure here - they have enough HDs to build some good metrics

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf

Phil Crawley said...

I hear ya!

I'm a big fan of Steve & Security Now; SpinRite has saved me on a few occasions!

The Google article is very interesting, thanks. Phil