Frequent drive failures on a Shelf

Category: disk-drives

Specialty: hw

Applies to

AFF/FAS/ASA
ONTAP 9
Disk Shelves
MetroCluster configurations

Issue

Multiple SSDs in the same shelf fail repeatedly.
After each disk replacement and RAID reconstruction, new failures occur in the same shelf. The system eventually runs out of spare disks, leaving the aggregate in a degraded state.
Sample logs:

Thu Oct 02 09:35:41 [node2-01: disk_server_1: shm.threshold.consecutiveTimeouts:error]: shm: Disk 4d.20.1 has exceeded the threshold of 11 consecutive timeouts; the system will fail the disk if possible.
Thu Oct 02 11:15:45 [node2-01: disk_server_1: shm.threshold.consecutiveTimeouts:error]: shm: Disk 1c.20.22 has exceeded the threshold of 11 consecutive timeouts; the system will fail the disk if possible.

20.1 : NETAPP X670_ABCDEFGHIJK NA51 14651.0GB 520B/sect (7XXXXXXX3) (Failed)
20.22: NETAPP X670_ABCDEFGHIJK NA51 14651.0GB 520B/sect (8XXXXXXX3) (Failed)
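To confirm that the timeouts cluster on one shelf rather than on unrelated disks, the EMS messages can be grouped by shelf ID. The following is a minimal sketch, not a NetApp tool. It assumes the disk name follows the adapter.shelf.bay form seen in the sample logs (for example 4d.20.1), and the function name timeouts_by_shelf is hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern, based only on the sample EMS message in this article.
TIMEOUT_RE = re.compile(
    r"shm\.threshold\.consecutiveTimeouts.*?Disk\s+(?P<disk>\S+)\s+has exceeded"
)


def timeouts_by_shelf(log_text):
    """Group disks that logged consecutive timeouts by shelf ID.

    Assumes disk names look like <adapter>.<shelf>.<bay>, e.g. 4d.20.1.
    Names in any other form are skipped.
    """
    shelves = defaultdict(set)
    for match in TIMEOUT_RE.finditer(log_text):
        parts = match.group("disk").split(".")
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        _adapter, shelf, bay = parts
        shelves[shelf].add(bay)
    # Sort bays numerically so the output is stable and easy to read.
    return {shelf: sorted(bays, key=int) for shelf, bays in shelves.items()}


if __name__ == "__main__":
    sample = (
        "Thu Oct 02 09:35:41 [node2-01: disk_server_1: "
        "shm.threshold.consecutiveTimeouts:error]: shm: Disk 4d.20.1 has "
        "exceeded the threshold of 11 consecutive timeouts\n"
        "Thu Oct 02 11:15:45 [node2-01: disk_server_1: "
        "shm.threshold.consecutiveTimeouts:error]: shm: Disk 1c.20.22 has "
        "exceeded the threshold of 11 consecutive timeouts\n"
    )
    print(timeouts_by_shelf(sample))  # {'20': ['1', '22']}
```

If most or all affected bays fall under a single shelf ID, as they do here with shelf 20, a shelf-level cause such as the I/O modules is more likely than independent drive faults.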

The shelf I/O modules are running a very old firmware version.
Example:

Shelf 20: DS224-12 Firmware rev. IOM12B A: 0141 IOM12B B: 0141
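
When many shelves need to be checked, the firmware line can be parsed and compared against a target revision. This is a minimal sketch that assumes the line format shown in the example above and compares revisions as plain integers. The helper names and the minimum_rev value are hypothetical placeholders, not NetApp recommendations; take the actual target revision from the current shelf firmware release notes.

```python
import re

# Assumed formats, based only on the example line in this article.
SHELF_RE = re.compile(
    r"Shelf\s+(?P<shelf>\d+):\s+(?P<model>\S+)\s+Firmware rev\.\s+(?P<rest>.*)"
)
MODULE_RE = re.compile(r"(?P<module>\S+)\s+(?P<slot>[AB]):\s+(?P<rev>\d+)")


def parse_shelf_firmware(line):
    """Return shelf ID, model, and per-slot (module, revision), or None."""
    match = SHELF_RE.search(line)
    if match is None:
        return None
    modules = {
        m.group("slot"): (m.group("module"), int(m.group("rev")))
        for m in MODULE_RE.finditer(match.group("rest"))
    }
    return {
        "shelf": match.group("shelf"),
        "model": match.group("model"),
        "modules": modules,
    }


def outdated_modules(info, minimum_rev):
    """List the slots whose revision is below minimum_rev (caller-supplied)."""
    return sorted(
        slot for slot, (_module, rev) in info["modules"].items()
        if rev < minimum_rev
    )


if __name__ == "__main__":
    line = "Shelf 20: DS224-12 Firmware rev. IOM12B A: 0141 IOM12B B: 0141"
    info = parse_shelf_firmware(line)
    # 200 is a placeholder threshold used only to exercise the function.
    print(info["shelf"], outdated_modules(info, minimum_rev=200))
```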