A new report from Internet titan Google examined the performance of 100,000 consumer-grade hard drives, ranging from 80 to 400 GB in capacity, that have been in use in Google’s operations since 2001. The report’s conclusions: heavy use and high temperatures may not contribute to drive failure as much as “common knowledge” suggests.
In their report (PDF), Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso wrote that they initially expected to find that heavier drive use would correspond with higher failure rates. Instead, the results of their study hint at a more complicated pattern to drive failure and, amusingly, suggest that data from drives’ built-in SMART self-monitoring technology isn’t enough to go on when trying to predict the failure of individual drives. Some SMART parameters were found to have a strong correspondence with higher probabilities of failure: for instance, a drive was 39 times more likely to fail within 60 days of its first scan error than a drive that produced no such errors. However, a large portion of Google’s failed drives showed no SMART errors at all, suggesting that monitoring SMART data alone is not enough to predict individual drive failure, although it might be useful in aggregate across “populations” of drives.
Google also found that drives less than three years old that received heavy use were less likely to fail than drives of a similar age that received infrequent use. “One possible explanation for this behavior is the survival of the fittest theory,” the authors wrote. “Drives that survive the infant mortality phase are the least susceptible to that failure mode, and result in a population that is more robust with respect to variation in utilization levels.”
The study also found that drive failures do not increase with the average temperature; in fact, the authors found that lower temperatures were associated with higher drive failure rates. “Only at very high temperatures,” above 45°C, “is there a slight reversal of this trend.” However, older hard drives (three years old or more) were more likely to suffer a failure in warmer environments.
Google very politely omits mention of the observed performance of specific drives and drive manufacturers, but its results do serve as an interesting starting point for discussions of how we store our data and what we can reasonably expect from that storage. As the authors note, over 90 percent of new information produced in the world is being stored on magnetic media, and, as I would note, the failure rate of hard drives is (eventually) 100 percent.