Hard Drive Reliability Confidence by Sam C. Chan

December 10, 2011

This chart shows how confident I would be that a particular hard drive will last x years after it has spent y days in the shop for testing and observation. It assumes that the original problem was deemed repairable and has been successfully rectified, that everything observed since is satisfactory, and that no new concerns have been detected.

This chart is based on a new drive less than 1 year old. For each year of drive age, subtract 2.5%.
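
The age adjustment can be sketched in a few lines. This is only an illustration under stated assumptions: it treats "2.5%" as 2.5 percentage points taken off the chart value for each full year of drive age (the text does not say whether the deduction is absolute or relative), and the 90% chart reading in the usage note is a made-up placeholder, not a value from the chart.

```python
def age_adjusted_confidence(chart_value_pct, drive_age_years, penalty_per_year=2.5):
    """Subtract a fixed number of percentage points per full year of drive age.

    Assumptions (not stated in the article): the deduction is in absolute
    percentage points, and only whole years count, so a drive under
    1 year old keeps the chart value unchanged.
    """
    full_years = int(drive_age_years)
    adjusted = chart_value_pct - penalty_per_year * full_years
    # Confidence cannot go below zero.
    return max(adjusted, 0.0)
```

For example, with a hypothetical chart reading of 90, `age_adjusted_confidence(90.0, 3)` gives 82.5 for a 3-year-old drive.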

A minimum observation period of 1 to 3 weeks is recommended for a meaningful assessment.

Types of Defects:

  • Magnetic

  • Mechanical

  • Electrical

  • Physical

  • Thermal

Basic assumptions: The drive is subject to 75% power-on hours (for observation periods of 7 days or less) or 50% power-on hours (for 10 days or longer), under moderately heavy usage: mostly diagnostic exercises, with occasional real-world usage to sample a few of its behavior patterns. It is subject to reasonable daily thermal expansion-contraction cycles, within a human-hospitable environment.
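
To make the power-on assumption concrete, here is a small sketch that converts an observation period into powered-on hours. The function name is hypothetical, and because the text gives no duty cycle for 8- or 9-day periods, the sketch refuses to guess one.

```python
HOURS_PER_DAY = 24

def power_on_hours(observation_days):
    """Powered-on hours during the in-shop observation period."""
    # Duty cycles taken from the stated assumptions:
    # 75% for 7 days or less, 50% for 10 days or longer.
    if observation_days <= 7:
        duty = 0.75
    elif observation_days >= 10:
        duty = 0.50
    else:
        raise ValueError("no duty cycle is given for 8- or 9-day periods")
    return observation_days * HOURS_PER_DAY * duty
```

Under these figures a 7-day period gives 126 powered-on hours and a 10-day period gives 120, so the two duty cycles yield roughly the same exposure around the one-week mark; a 21-day period gives 252 hours.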

According to manufacturers' declared Mean Time Between Failures (MTBF) figures, a typical modern drive would be expected to have a 0.8% annual failure rate. That assumes an artificial lab environment under ideal conditions at all times. My empirical data suggests that the rate is actually around 2% in the field. That rate has been trending higher in the past 3 years, with the advent of terabyte hard drives. With areal density on a steady rise for 4 decades, we're now rapidly approaching the technological barrier, and reliability is deteriorating.
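
For readers who want to relate an MTBF figure to an annual failure rate, a common textbook conversion assumes a constant failure rate (an exponential lifetime model) and continuous operation of 8,760 hours per year. The sketch below uses that model; it is an assumption for illustration, not necessarily how the figures above were derived.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours, continuous operation assumed

def afr_from_mtbf(mtbf_hours):
    """Annual failure rate implied by an MTBF, under a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def mtbf_from_afr(afr):
    """Inverse conversion: the MTBF that would produce a given annual failure rate."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)
```

Under this model, a 0.8% annual failure rate corresponds to an MTBF of roughly 1.09 million hours, while a 2% field rate corresponds to roughly 434,000 hours.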

With worldwide hard drive manufacturing consolidating, we now have only 5 makers. There is no discernible difference across brands. There are, however, significant differences across specific series, and sometimes even between different capacity models in the same series.

In the past decade, hard drives became the #1 source of hardware failure in computer systems. In the preceding 2 decades, it was the power supply unit.


