MCC-IP faulty backend connection causing multiple failed drives
Applies to
- ONTAP 9
- MetroCluster IP
- AFF A250
Issue
- Increased disk failure rate caused by high latency, as reported in the event log:
Tue Feb 1 08:00:00 +0100 [netapp01-01: disk_latency_monitor: shm.ssd.threshold.ioLatency:notice]: SSD 0m.i2.3L13 has exceeded the expected block latency in the current timeframe with an average latency of 10089 us and an average utilization of 37 percent. The next highest SSD latency: 1339 us. Disk 0m.i2.3L13 Shelf 0 Bay 18 [NETAPP X] S/N [] UID []
- ifstat -a reports increasing CRC errors on the backend MCC port:
-- interface e0d (50 days, 0 hours, 0 minutes, 1 seconds) --
CRC errors: 16984 | Runt frames: 0 | Fragment: 1344
- Command aborts to remote drives are reported in the event log:
Thu Feb 11 08:00:00 +0100 [netapp01-01:scsi_cmdblk_strthr_admin: scsi.cmd.abortedByHost:error]: Disk device 0v.i2.2L4: Command aborted by host adapter: HA status 0x13: cdb 0x28:16712464:0001.
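
To confirm that the CRC counter on the backend port is actually increasing, and not just a stale nonzero value, two captures of the ifstat -a output taken some time apart can be compared. The following is a minimal Python sketch of that comparison. It assumes the output has been saved as text, and its patterns are modeled only on the two sample lines shown above. Real ifstat output contains more counters and may be laid out differently. The function names crc_errors_by_interface and growing_crc are hypothetical helpers and not part of any NetApp tool.

```python
import re

# Patterns are modeled only on the sample lines in this article; actual
# ifstat output has more counters and may be formatted differently.
IFACE_RE = re.compile(r"^--\s*interface\s+(\S+)")
CRC_RE = re.compile(r"CRC errors:\s*(\d+)")


def crc_errors_by_interface(ifstat_text):
    """Return {interface: CRC error count} for interfaces with a nonzero count."""
    result = {}
    current = None
    for line in ifstat_text.splitlines():
        line = line.strip()
        header = IFACE_RE.match(line)
        if header:
            # Remember which interface the following counter lines belong to.
            current = header.group(1)
            continue
        crc = CRC_RE.search(line)
        if crc and current is not None:
            count = int(crc.group(1))
            if count > 0:
                result[current] = count
    return result


def growing_crc(before, after):
    """Return {interface: increase} for interfaces whose CRC count grew."""
    return {
        iface: count - before.get(iface, 0)
        for iface, count in after.items()
        if count > before.get(iface, 0)
    }


sample = """\
-- interface e0d (50 days, 0 hours, 0 minutes, 1 seconds) --
CRC errors: 16984 | Runt frames: 0 | Fragment: 1344
"""
print(crc_errors_by_interface(sample))  # {'e0d': 16984}
```

Passing the parsed results of an earlier and a later capture to growing_crc lists only the ports whose CRC count rose in between, which matches the symptom described here for the backend MCC port.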