PICList Thread
'[EE]:: Factors affecting hard disk longevity <- Re'
2007\02\21@054301 by Russell McMahon

Ken, Ross, Gavin, Philip, Iain, Chris will want to read this.
Rod should want to :-)
Owen may wish to file this away just in case.



Resubjecting and retagging Jinx's post due to its worth to many here.

From: "Jinx" <spam_OUTjoecolquittTakeThisOuTspamclear.net.nz>
> "The impact of heavy use and high temperatures on hard disk drive
> failure may be overstated, says a report by three Google engineers"
>
> http://news.bbc.co.uk/2/hi/technology/6376021.stm

This is a report of Google's experiences with off the shelf hard
disks. It's worth knowing.
I've summarised the key points they make in the article re HDD
reliability.

I may be tempted to conclude:

   As soon as you get ANY scan errors, replace the drive.

   Run them warmish !!!

   Use them frequently !!!

   Replace at 3 years old.
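
As a rough sketch only, the first and last of those rules could be
written as a simple check. The function name, parameter names and the
idea of feeding in a scan error count and an age are assumptions made
for illustration; nothing here comes from the article or the paper.

```python
def should_replace(scan_errors, age_years, max_age_years=3.0):
    """Tentative rule of thumb from the post above (hypothetical helper).

    Replace a drive as soon as it reports any scan error, or once it
    reaches the age limit (3 years by default).
    """
    if scan_errors < 0 or age_years < 0:
        raise ValueError("scan_errors and age_years must be non-negative")
    return scan_errors > 0 or age_years >= max_age_years


# Example drives with made-up values
print(should_replace(scan_errors=1, age_years=0.5))  # any scan error
print(should_replace(scan_errors=0, age_years=1.0))  # young, clean drive
print(should_replace(scan_errors=0, age_years=3.0))  # reached age limit
```

How you actually obtain the scan error count and drive age depends on
your tools and is left out here.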


           Russell

___________


Hard disk test 'surprises' Google

The impact of heavy use and high temperatures on hard disk drive
failure may be overstated

The report examined 100,000 commercial hard drives, ranging from 80GB
to 400GB in capacity, used at Google since 2001.
The firm uses "off-the-shelf" drives to store cached web pages and
services.

"Our data indicate a much weaker correlation between utilisation
levels and failures than previous work has suggested."

There is a widely held belief that hard disks which are subject to
heavy use are more likely to fail than those used intermittently.
It was also thought that hard drives preferred cool temperatures to
hotter environments.

"However our results appear to paint a more complex picture.
Only very young and very old age groups appear to show the expected
behaviour."

A hard disk was described as having "failed" if it needed to be
replaced.
[ :-) ]

Lower temperatures are associated with higher failure rates

Hard drives less than three years old and used a lot are less likely
to fail than similarly aged hard drives that are used infrequently

"One possible explanation for this behaviour is the survival of the
fittest theory," said the authors, speculating that drives which
failed early on in their lifetime had been removed from the overall
sample leaving only the older, more robust units.
[ :-) ]

There was a clear trend showing "that lower temperatures are
associated with higher failure rates".
"Only at very high temperatures is there a slight reversal of this
trend."

But hard drives which are three years old and older were more likely
to suffer a failure when used in warmer environments.

The report also looked at the impact of scan errors - problems found
on the surface of a disc - on hard drive failure.
"The group of drives with scan errors are 10 times more likely to fail
than the group with no errors."

"After the first scan error, drives are 39 times more likely to fail
within 60 days than drives without scan errors."
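
For anyone wondering what an "N times more likely to fail" figure
means arithmetically, here is a minimal sketch of a relative failure
rate between two groups of drives. The counts are invented purely to
show the calculation and are NOT Google's data; the function name is
hypothetical too.

```python
def relative_risk(failed_exposed, total_exposed, failed_control, total_control):
    """Ratio of the failure rate in one group to the rate in another.

    'exposed' = drives that showed a scan error,
    'control' = drives that showed none.
    """
    if total_exposed <= 0 or total_control <= 0:
        raise ValueError("group sizes must be positive")
    rate_exposed = failed_exposed / total_exposed
    rate_control = failed_control / total_control
    if rate_control == 0:
        raise ValueError("control group has no failures; ratio undefined")
    return rate_exposed / rate_control


# Invented counts, for illustration only (not from the report)
rr = relative_risk(failed_exposed=30, total_exposed=1000,
                   failed_control=10, total_control=10000)
print(f"drives with scan errors failed {rr:.1f} times as often")
```

Note that a large ratio says nothing about how many failing drives
never show a scan error at all, which is the limitation the paper
itself points out.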












2007\02\21@063405 by Russell McMahon

Here's the actual Google paper

"Failure trends in a large disk drive population"
Proceedings of the 5th USENIX Conference on File and Storage
Technologies (FAST'07), February 2007

       http://216.239.37.132/papers/disk_failures.pdf

Note, from below

   " ... a large fraction of our failed drives
have shown no SMART error signals whatsoever. This
result suggests that SMART models are more useful in
predicting trends for large aggregate populations than for
individual components. It also suggests that powerful
predictive models need to make use of signals beyond
those provided by SMART."



       Russell




5 Conclusions

In this study we report on the failure characteristics of
consumer-grade disk drives. To our knowledge, the
study is unprecedented in that it uses a much larger
population size than has been previously reported and
presents a comprehensive analysis of the correlation between
failures and several parameters that are believed to
affect disk lifetime. Such analysis is made possible by
a new highly parallel health data collection and analysis
infrastructure, and by the sheer size of our computing
deployment.

One of our key findings has been the lack of a consistent
pattern of higher failure rates for higher temperature
drives or for those drives at higher utilization levels.
Such correlations have been repeatedly highlighted
by previous studies, but we are unable to confirm them
by observing our population. Although our data do not
allow us to conclude that there is no such correlation,
it provides strong evidence to suggest that other effects
may be more prominent in affecting disk drive reliability
in the context of a professionally managed data center
deployment.

Our results confirm the findings of previous smaller
population studies that suggest that some of the SMART
parameters are well-correlated with higher failure probabilities.
We find, for example, that after their first scan
error, drives are 39 times more likely to fail within 60
days than drives with no such errors. First errors in reallocations,
offline reallocations, and probational counts
are also strongly correlated to higher failure probabilities.
Despite those strong correlations, we find that
failure prediction models based on SMART parameters
alone are likely to be severely limited in their prediction
accuracy, given that a large fraction of our failed drives
have shown no SMART error signals whatsoever. This
result suggests that SMART models are more useful in
predicting trends for large aggregate populations than for
individual components. It also suggests that powerful
predictive models need to make use of signals beyond those provided by
SMART.
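
The point about failed drives with no SMART signals can be put in
numbers. A minimal sketch, assuming you have a record for each failed
drive saying whether it showed any SMART error beforehand: the
fraction that did is the most a SMART-only predictor could ever
catch. The sample list below is made up and the function name is
hypothetical; the paper's actual figures are not reproduced here.

```python
def smart_recall_ceiling(had_smart_signal):
    """Upper bound on the share of failures a SMART-only model can flag.

    had_smart_signal: one bool per FAILED drive, True if the drive
    showed any SMART error signal before it failed.
    """
    flags = list(had_smart_signal)
    if not flags:
        raise ValueError("need at least one failed drive")
    return sum(1 for flag in flags if flag) / len(flags)


# Made-up history of five failed drives (illustration only)
history = [True, False, True, False, False]
ceiling = smart_recall_ceiling(history)
print(f"at most {ceiling:.0%} of these failures were detectable via SMART")
```

Whatever the real fraction is for a given fleet, the drives on the
other side of it fail with no SMART warning, so SMART monitoring does
not replace backups.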



