Hi All,
I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
Linux software RAID-1 configuration. Today I got a notification from
smartd that one of the drives (sda) is failing:
Device: /dev/sda, ATA error count increased from 0 to 6
Some other log messages (like: "ata1.00: cmd ... Emask 0x409 (media
error)", "end_request: I/O error, dev sda, sector 39072000") and the
disk's SMART error log seem to confirm that the disk is dying. My
problem is that I'm seeing SMART warnings about the other drive too:
smartd[5845]: Device: /dev/sdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 108 to 117
Below is the listing of SMART attributes for the good drive
(smartctl -A /dev/sdb):
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always        -      52634145
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always        -      0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always        -      56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always        -      24
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always        -      35530576
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always        -      3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always        -      0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always        -      56
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline       -      0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always        -      0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always        -      0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always        -      1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always        -      1
190 Airflow_Temperature_Cel 0x0022   067   059   045    Old_age   Always        -      33 (Lifetime Min/Max 32/41)
194 Temperature_Celsius     0x0022   033   041   000    Old_age   Always        -      33 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   036   015   000    Old_age   Always        -      52634145
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always        -      0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline       -      0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always        -      0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline       -      91955249811373
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline       -      1261294398
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline       -      1519044357
And here is the listing for the bad drive:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   100   006    Pre-fail  Always        -      23028010
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always        -      0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always        -      59
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always        -      17
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always        -      81078197
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always        -      3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always        -      0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always        -      59
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline       -      0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always        -      0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always        -      6
188 Unknown_Attribute       0x0032   100   096   000    Old_age   Always        -      26
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always        -      0
190 Airflow_Temperature_Cel 0x0022   070   062   045    Old_age   Always        -      30 (Lifetime Min/Max 29/3
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always        -      30 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   041   022   000    Old_age   Always        -      23028010
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always        -      0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline       -      0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always        -      0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline       -      82240033787773
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline       -      2371531202
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline       -      3348144171
Both drives have nonzero raw values for Reallocated_Sector_Ct and Seek_Error_Rate.
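A comparison like this can be done mechanically. The following is a minimal Python sketch, not a smartmontools feature: it assumes `smartctl -A` output with one attribute per line. It flags an attribute when its normalized VALUE has dropped to or below a nonzero THRESH (the condition SMART treats as a failed attribute). Separately, it flags nonzero raw values for a small watch list (IDs 5, 187, 197, 198). The watch list, and the names `Attr`, `parse_attributes`, `concerns` and `WATCH_RAW`, are my own hypothetical choices. Raw values are vendor-specific, so the sketch deliberately does not read the raw numbers of Raw_Read_Error_Rate or Seek_Error_Rate as error counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attr:
    id: int
    name: str
    value: int   # normalized value: higher is better
    worst: int
    thresh: int
    raw: str     # vendor-specific raw column, kept as text


# Hypothetical watch list (an assumption, not a smartctl setting): attributes
# whose raw value is commonly read as a plain count of remapped, pending or
# uncorrectable sectors/errors.
WATCH_RAW = {5, 187, 197, 198}


def parse_attributes(text: str) -> list[Attr]:
    """Parse `smartctl -A` rows, assuming one attribute per line."""
    attrs = []
    for line in text.splitlines():
        f = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(f) < 10 or not f[0].isdigit():
            continue
        attrs.append(Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                          " ".join(f[9:])))
    return attrs


def concerns(attrs: list[Attr]) -> list[str]:
    """Return human-readable notes for attributes worth a closer look."""
    out = []
    for a in attrs:
        # A nonzero threshold is crossed when the normalized value drops to it.
        if a.thresh > 0 and a.value <= a.thresh:
            out.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
        # Otherwise, report nonzero raw counts for the watched attributes.
        elif a.id in WATCH_RAW and a.raw.split()[0] != "0":
            out.append(f"{a.name}: raw count {a.raw}")
    return out


# Subset of the sdb (good drive) listing, one row per line.
SDB = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always        -      52634145
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always        -      24
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always        -      35530576
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always        -      0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always        -      0
"""

if __name__ == "__main__":
    for msg in concerns(parse_attributes(SDB)):
        print(msg)
```

On the sdb subset this prints only `Reallocated_Sector_Ct: raw count 24`. The same five rows taken from the sda listing would flag Reallocated_Sector_Ct (raw 17) and Reported_Uncorrect (raw 6). No normalized value in either listing is at or below its threshold.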
I cannot run an extended SMART test on the drive because, due to a
firmware problem, the test never gets past 10% completion.
Do you think the other drive is failing too?
Thanks!
--
Peter Szymański <szyman(at)magres.net>