MetroCluster FC multiple disks fail
Applies to
- MetroCluster FC
- ONTAP 9
- ATTO FibreBridge 7500N/7600N
Issue
- Multiple disks fail due to excessive command aborted errors and device timeouts.
- The command aborted errors and device timeouts are seen on disks on the same path:
[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L29: Command aborted by host adapter: HA status 0x4: cdb 0x9a:0000000014345600:0001:0168.
[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L30: Command aborted by host adapter: HA status 0x4: cdb 0x9a:000000003d12d800:0001:0200.
[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L38: Command aborted by host adapter: HA status 0x4: cdb 0x9a:000000018ef51200:0001:0200.
[cl01-n02: isp2400_timeout_3: fci.device.timeout:debug]: HBA 1a encountered a device timeout on Disk deviceswitch1:8.126 (0x04070800) LUN 29 cdb 0x9a:0000000014345600:0001:0168 retry: 0
[cl01-n02: disk_server_0: shm.threshold.consecutiveTimeouts:error]: shm: Disk deviceswitch1:8.126L29 has exceeded the threshold of 11 consecutive timeouts; the system will fail the disk if possible.
- The porterrshow and sfpshow output for the switch port shows no errors, and the SFP TX/RX values are within normal limits.
- Logs from the ATTO FibreBridge connected to the switch port stated in the error show multiple instances of the following error:
INFO FC TM Cmd Rcvd: Abort Task Set to LUN:X on FC Port 1
- The SFPs in the FibreBridge and the switch port, and the optical cable, have been replaced, but the disk and FibreBridge errors are still visible.
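
To confirm that the aborted commands are confined to a single path, the EMS messages can be grouped by the device path in the disk name. The following is a minimal sketch, not a NetApp tool: it assumes the messages have been saved as plain text and that disk names follow the switch:port.targetL<LUN> form seen in the messages above. The function name group_aborts_by_path and the regular expression are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed disk name layout, based on the messages above:
# "deviceswitch1:8.126L29" -> path "deviceswitch1:8.126", LUN 29.
DISK_RE = re.compile(r"Disk (?P<path>\S+:\d+\.\d+)L(?P<lun>\d+)")

# Shortened copies of the EMS messages shown in this article.
SAMPLE = [
    "[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L29: Command aborted by host adapter: HA status 0x4",
    "[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L30: Command aborted by host adapter: HA status 0x4",
    "[cl01-n02: isp2400_intrd: scsi.cmd.abortedByHost:error]: Disk deviceswitch1:8.126L38: Command aborted by host adapter: HA status 0x4",
]


def group_aborts_by_path(lines):
    """Return {path: sorted LUN list} for scsi.cmd.abortedByHost events."""
    luns_by_path = defaultdict(set)
    for line in lines:
        # Skip every message that is not a host-adapter abort.
        if "scsi.cmd.abortedByHost" not in line:
            continue
        match = DISK_RE.search(line)
        if match:
            luns_by_path[match.group("path")].add(int(match.group("lun")))
    return {path: sorted(luns) for path, luns in luns_by_path.items()}


if __name__ == "__main__":
    for path, luns in group_aborts_by_path(SAMPLE).items():
        print(f"{path}: LUNs {luns}")
```

If the result contains a single key, all affected disks share one switch port and bridge target, which matches the symptom described in this article.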