Frequent drive failures on a Shelf
Applies to
- AFF/FAS/ASA
- ONTAP 9
- Disk Shelves
- MetroCluster configurations
Issue
- Multiple SSDs in a Shelf are repeatedly failing.
- After each disk replacement and RAID reconstruction, new failures occur in the same shelf. The system eventually runs out of spare disks, leaving the aggregate in a degraded state.
- Sample Logs:
Thu Oct 02 09:35:41 node2-01: disk_server_1: shm.threshold.consecutiveTimeouts:error]: shm: Disk 4d.20.1 has exceeded the threshold of 11 consecutive timeouts; the system will fail the disk if possible
Thu Oct 02 11:15:45 node2-01: disk_server_1: shm.threshold.consecutiveTimeouts:error]: shm: Disk 1c.20.22 has exceeded the threshold of 11 consecutive timeouts; the system will fail the disk if possible.
20.1 : NETAPP X670_ABCDEFGHIJK NA51 14651.0GB 520B/sect (7XXXXXXX3) (Failed)
20.22: NETAPP X670_ABCDEFGHIJK NA51 14651.0GB 520B/sect (8XXXXXXX3) (Failed)
- The shelf modules are found to be running a very old firmware version:
- Example:
Shelf 20: DS224-12 Firmware rev. IOM12B A: 0141 IOM12B B: 0141
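
To confirm that the timeouts cluster on a single shelf rather than being spread across the system, the EMS messages can be grouped by shelf ID. The following is a minimal Python sketch, not a NetApp tool. It assumes the disk name in the message follows the adapter.shelf.bay pattern seen in the sample logs above (for example, 4d.20.1 is bay 1 of shelf 20). The function name timeouts_by_shelf and the regular expression are illustrative only.

```python
import re
from collections import defaultdict

# Assumption: the disk name is <adapter>.<shelf>.<bay>, as in "Disk 4d.20.1".
TIMEOUT_RE = re.compile(
    r"shm\.threshold\.consecutiveTimeouts.*?Disk (\w+)\.(\d+)\.(\d+) "
)


def timeouts_by_shelf(lines):
    """Map each shelf ID to the sorted bay numbers that logged consecutive timeouts."""
    shelves = defaultdict(set)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            _adapter, shelf, bay = match.groups()
            shelves[int(shelf)].add(int(bay))
    return {shelf: sorted(bays) for shelf, bays in shelves.items()}


# The two sample messages from this article.
sample = [
    "Thu Oct 02 09:35:41 node2-01: disk_server_1: "
    "shm.threshold.consecutiveTimeouts:error]: shm: Disk 4d.20.1 has exceeded "
    "the threshold of 11 consecutive timeouts; the system will fail the disk if possible",
    "Thu Oct 02 11:15:45 node2-01: disk_server_1: "
    "shm.threshold.consecutiveTimeouts:error]: shm: Disk 1c.20.22 has exceeded "
    "the threshold of 11 consecutive timeouts; the system will fail the disk if possible.",
]

print(timeouts_by_shelf(sample))  # {20: [1, 22]}
```

If every affected bay maps to the same shelf ID, as it does here, a shelf-level cause such as the module firmware is a more likely suspect than the individual drives.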
