I have a machine running CentOS 5.3 with a 6-disk RAID 5 array. According to Intel RAID Web Console 2, the RAID card appears to be an SRCSASBB81.
About a week ago, I started receiving these predictive failure warnings (once per day):
Controller ID: 0 PD Predictive failure: --:--:4
Generated on:Mon Sep 16 08:29:57 2013
SYSTEM DETAILS---
IP Address: REDACTED
OS Name: Linux
OS Version: 2.6
Driver Name: megaraid_sas
Driver Version: 00.00.04.01-RH1
IMAGE DETAILS---
BIOS Version: 1.12.122-0393
Firmware Package Version: 8.0.1-0029
Firmware Version: NT16
So I started the Intel RAID Web Console, looked at all the drives, and saw that drive 4 had a "pred fail count" of 1. All the other disks had 0 in that field, so I figured that's what the "--:--:4" in the warning was referring to. I backed up everything on the array and identified the physical location of each drive. Then, using the RAID Web Console, I took drive 4 offline (putting the array into a degraded state). The light on the physical drive in the expected location turned orange, as expected. I removed the disk and replaced it with a new one. The array rebuilt and came back to optimal with the new disk. All went as planned. Yay!
However, every morning at 7:30 AM I still get the same predictive failure warning. The "pred fail count" on the new disk (like all the others) is now 0, and everything looks fine. Is there some file where I have to manually reset a failure count? I can't see anything in the UI indicating that there is anything else I need to do.
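To make the reading of the alert explicit, here is a minimal sketch that pulls the PD identifier out of the alert text so each day's warning can be compared against the drive that was replaced. It assumes (as I did above, not from any Intel/LSI documentation) that the identifier is three colon-separated fields and that "--" means a field the controller did not report. The function name parse_pd_id is hypothetical.

```python
import re

# Hypothetical helper: the field layout below is an assumption, not a documented format.
PD_PATTERN = re.compile(r"PD Predictive failure:\s*(--|\d+):(--|\d+):(--|\d+)")

def parse_pd_id(alert_line):
    """Return the three PD identifier fields from an alert line, or None.

    Fields reported as '--' are returned as None; numeric fields as int.
    """
    match = PD_PATTERN.search(alert_line)
    if match is None:
        return None
    return tuple(None if field == "--" else int(field) for field in match.groups())

if __name__ == "__main__":
    line = "Controller ID: 0 PD Predictive failure: --:--:4"
    print(parse_pd_id(line))
```

If the daily alert still parses to the same trailing number after the swap, the warning is still naming drive 4's position even though the console shows a count of 0 for the new disk.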
Please help me understand what's going on and what further steps I should take.
Thanks.
-J