E-Series drive failure - AutoSupport Message

Last updated

Aug 9, 2022


Applies to

NetApp E-Series

Event Summary

SANtricity reports a failed hard disk drive (HDD) or solid-state drive (SSD)

Validate

Log in to SANtricity Storage Manager or SANtricity System Manager
Confirm SANtricity reports one or more of the following error messages in the Recovery Guru:
- FAILED_DRIVE-Recovery Failure Type Code: 23 Failed Drive
- FAILED_DRIVE-Recovery Failure Type Code: 23 Failed Physical Disk - Unassigned or Standby Hot Spare
- DISK_POOL_DRIVE_FAILURE-Recovery Failure Type Code: 443 Failed Disk Pool Drive
Confirm from the Hardware tab that the drive shows as failed
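
When Recovery Guru text has been saved to a file or pasted from the GUI, the check above can be sketched in a few lines of Python. This is a minimal illustration, not a NetApp tool: the function name `find_drive_failures` is hypothetical, and it assumes each message appears on one line in the same shape as the entries listed above (`<EVENT_NAME>-Recovery Failure Type Code: <code> <description>`). Actual Recovery Guru output may be laid out differently.

```python
import re

# Event names and failure type codes exactly as listed in this article.
KNOWN_EVENTS = {
    ("FAILED_DRIVE", 23),
    ("DISK_POOL_DRIVE_FAILURE", 443),
}

# Assumed line shape (taken from the list above, not from a product spec):
#   <EVENT_NAME>-Recovery Failure Type Code: <code> <description>
PATTERN = re.compile(
    r"^\W*(?P<name>[A-Z_]+)-Recovery Failure Type Code:\s*"
    r"(?P<code>\d+)\s+(?P<desc>.+?)\s*$"
)


def find_drive_failures(recovery_guru_text):
    """Return (name, code, description) for each listed drive-failure event."""
    matches = []
    for line in recovery_guru_text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue
        name, code = m.group("name"), int(m.group("code"))
        # Keep only the events this article covers.
        if (name, code) in KNOWN_EVENTS:
            matches.append((name, code, m.group("desc")))
    return matches
```

An empty result only means none of the three listed messages were found in the supplied text. The Hardware tab check above is still needed to confirm the drive state.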

Resolution

Contact NetApp Technical Support, as a failed drive requires replacement
If AutoSupport is enabled, this event should have already created a technical support case

Additional Information

Possible causes for drive failures:

A user manually failed the drive via the shell, SMcli, or the GUI.
An unrecovered write error / write failure.
- Failure of a host initiated write.
- Failure of an internally initiated write.
Failure of a drive firmware download.
Exceeding a synthetic predictive failure analysis (SPFA) threshold. (Controller reported / detected errors and initiated failure).
- Drive firmware may be a factor here if there are any known issues that would cause premature drive failures.
Exceeding a SMART (Self-Monitoring, Analysis and Reporting Technology) failure threshold. (Drive reported / detected errors and initiated failure).

For Legacy/Traditional RAID Volume Groups:

Reconstruction to a hot spare should begin automatically when the drive fails, if a hot spare is available.
Once reconstruction to the hot spare completes, replace the failed drive.
- Best practice recommendation is to wait until the failed drive disappears from the GUI before inserting the new drive.
When the new drive is inserted, copy back from the hot spare or reconstruction from parity should begin to the new drive automatically.
Once copy back / reconstruction has completed, the system will return to optimal.

For Dynamic Disk Pools:

Reconstruction / rebalance will begin automatically when the drive fails, if there is preservation capacity available.
Once reconstruction / rebalancing completes, replace the failed drive.
- Best practice recommendation is to wait until the failed drive disappears from the GUI before inserting the new drive.
When the new drive is inserted, copy back / rebalancing will begin to the new drive automatically.
Once copy back / rebalancing has completed, the system will return to optimal.
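
The replacement sequences for volume groups and disk pools can be summarized as an ordered checklist. The Python sketch below is only an illustration of the ordering described above. The function name `replacement_steps` and its parameters are hypothetical, and it calls no SANtricity API. Two points are assumptions rather than statements from this article: with no hot spare, the new drive in a volume group is taken to be rebuilt by reconstruction from parity, and a disk pool without preservation capacity is treated as out of scope.

```python
def replacement_steps(container, spare_available):
    """Return the ordered replacement checklist for a failed drive.

    container:       "volume_group" (legacy/traditional RAID) or "disk_pool"
    spare_available: True if a hot spare (volume group) or preservation
                     capacity (disk pool) was available when the drive failed
    """
    if container == "volume_group":
        steps = []
        if spare_available:
            steps.append("wait for reconstruction to the hot spare to complete")
            rebuild = "copy back from the hot spare"
        else:
            # Assumption: without a hot spare, the new drive is rebuilt from parity.
            rebuild = "reconstruction from parity"
    elif container == "disk_pool":
        if not spare_available:
            # This case is not described in the article.
            return ["not covered by this article: contact NetApp Technical Support"]
        steps = ["wait for reconstruction / rebalance to complete"]
        rebuild = "copy back / rebalancing"
    else:
        raise ValueError(f"unknown container type: {container!r}")

    steps += [
        "remove the failed drive",
        "wait until the failed drive disappears from the GUI",
        "insert the new drive",
        f"wait for {rebuild} to the new drive to complete",
        "confirm the system has returned to optimal",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(replacement_steps("disk_pool", True), start=1):
        print(f"{number}. {step}")
```

Both paths share the same tail: the new drive goes in only after the failed drive has disappeared from the GUI, and the system returns to optimal only after the final copy back, reconstruction, or rebalance finishes.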

If there are any questions or concerns with the details and procedures above, please contact NetApp Technical Support and reference this article for further assistance.