Diagnosing potential Linux drive failures

May 3, 2012 Kenny Smith

While working on a server today, I came across some odd issues in some applications. While checking the logs, I found some drive errors in the warn log file (Failed SMART usage Attribute: 7 Seek_Error_Rate). In searching for the cause of this error, I discovered a nifty little tool, smartctl, that reports the health of physical disk drives as recorded by each drive's own SMART firmware. SMART stands for Self-Monitoring, Analysis and Reporting Technology, and it is built into most modern disk drives.

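Two commands will get you going. The first prints the drive's identity information (-i); the second prints the overall health assessment (-H) and the drive's SMART capabilities (-c):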
smartctl -i <drive>
smartctl -Hc <drive>
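
If you have more than one drive to check, the health switch can be wrapped in a small loop. This is only a minimal sketch: it assumes smartmontools is installed, that the drives report ATA-style output (the "overall-health self-assessment" line), and that you run it with enough privileges to query the drives. The function names health_verdict and check_drives are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -H` output on stdin and print the
# overall verdict (for example PASSED or FAILED!), or UNKNOWN when no
# ATA-style verdict line is present.
health_verdict() {
    awk -F': ' '/overall-health self-assessment test result/ { print $2; found = 1 }
                END { if (!found) print "UNKNOWN" }'
}

# Hypothetical helper: print one "device: verdict" line per drive argument.
check_drives() {
    for drive in "$@"; do
        printf '%s: %s\n' "$drive" "$(smartctl -H "$drive" | health_verdict)"
    done
}

# Example (device names are assumptions; usually needs root):
#   check_drives /dev/sda /dev/sdb
```

Anything other than PASSED is a cue to look at the full smartctl output for that drive.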

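The following output shows how my troublesome drive is on its way to a nearby recycling plant. Notice that the health check (-H) reports FAILED, and that the pre-fail attribute Reallocated_Sector_Ct is marked FAILING_NOW because its normalized value (041) has dropped below its threshold (140).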


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   041   041   140    Pre-fail  Always   FAILING_NOW 1265
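
The rule behind that FAILING_NOW flag is a simple comparison: an attribute has failed when its normalized VALUE is at or below THRESH. As a rough sketch, the same check can be run over a full attribute table (as printed by smartctl -A) with awk. The function name failing_attributes is hypothetical, and the column positions are assumed to match the ten-column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read attribute rows on stdin (the ten-column layout
# shown above) and print every attribute whose normalized VALUE (column 4)
# is at or below its THRESH (column 6). Header and other lines are skipped
# because their first field is not a numeric attribute ID.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) {
             printf "%s %s value=%d thresh=%d raw=%s\n", $1, $2, $4, $6, $10
         }'
}

# Example (device name is an assumption; usually needs root):
#   smartctl -A /dev/sda | failing_attributes
```

Fed the row above, this prints the Reallocated_Sector_Ct line with value=41 and thresh=140; a healthy drive produces no output.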

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (12000) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 140) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

These tips were found on a link from Linux Journal. Hope they help you as well as they did me! Now, I just need to find a matching drive to replace this one in the RAID5 array. Hmmm…

One Response to “Diagnosing potential Linux drive failures”

Thomas
13 years ago

I've recently moved to Linux, as we decided to use it here as well, and your tips are really useful. I checked all my devices with the health check; fortunately, they all look to be working well at the moment.
