Multiple failed drives after scsi.cmd.abortedByHost:error alerts
Applies to
- FAS8700
- Disk Drives X357_KPM6V3T8ATE
Issue
- Multiple SCSI errors are reported against drives
[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:56161ba8:0050. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.abortedByHost:error]: Device 0a.01.18: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.18: request successful after retry #1: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8035")
[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:5615abf8:0200. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:56161ba8:0050. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8343")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:5615abf8:0200. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8350")
[scsi.cmd.underrun:error]: Device 0a.01.4: Received a data underrun: cdb 0x28:79143a08:01a8. Not all the data was received. Possible transmission error. I/O will be retried. (Additional EMS parameters: deviceType="Disk" disk_information="")
- After these errors are reported, the drives begin to fail
[raid.disk.predictiveFailure:error]: Disk 0a.01.6 Shelf 1 Bay 6 [NETAPP X357_KPM6V3T8ATE NA50] S/N [XXXXXXXXXXXXX] UID [58CE38EE:222E26AC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] reported a predictive failure and it is prefailed; it will be copied to a spare and failed (Additional EMS parameters: shelf="1" bay="6" vendor="NETAPP " model="X357_KPM6V3T8ATE" firmware_revision="NA50" serialno="XXXXXXXXXXXXX" disk_type="5" disk_rpm="N/A" carrier="" site="Local")
- The node panics because multiple disks have failed (multi-disk panic)
- Upon reboot, multiple drives fail to initialize
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.8 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.22 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.4 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.6 detected during disk initialization.
- The node is unable to boot into ONTAP because the root volume has failed drives
[raid.assim.rg.missingChild:debug]: Aggregate Aggr01, rgobj_verify: RAID object 0 has only 6 valid children, expected 11.
[raid.assim.plex.missingChild:debug]: Aggregate Aggr01, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline
[raid.assim.mirror.noChild:debug]: Aggregate Aggr01, mirrorobj_verify: No operable plexes found.
[raid.assim.tree.noRootVol:error]: No usable root volume found!
- In Maintenance mode, the following command shows the failed aggregate and its missing disks
aggr status -r
Aggregate aggr01 (failed, raid_dp, partial, fast zeroed) (block checksums)
Plex /aggr01/plex0 (offline, failed, inactive)
RAID group /aggr01/plex0/rg0 (partial, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0d.01.18P2 0d 1 18 SA:B 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
parity 0a.01.1P2 0a 1 1 SA:A 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
data FAILED N/A 1799343/ -
data 0a.01.3P2 0a 1 3 SA:A 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
data FAILED N/A 1799343/ -
data 0a.01.23P2 0a 1 23 SA:A 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
data FAILED N/A 1799343/ -
data FAILED N/A 1799343/ -
data 0a.01.21P2 0a 1 21 SA:A 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
data FAILED N/A 1799343/ -
data 0a.01.5P2 0a 1 5 SA:A 0 SSD N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
Raid group is missing 5 disks.
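When many of these events are logged, it can help to tally the SCSI events per device to see which drives are affected and whether the retries succeeded. The following is a minimal sketch, not a NetApp tool: it assumes the EMS messages have been saved as plain text lines in the format shown above, and the names `EVENT_RE`, `tally_scsi_events`, and `SAMPLE_LINES` are hypothetical. The sample lines are taken from the messages in this article and truncated after the cdb field.

```python
import re
from collections import Counter, defaultdict

# Assumed line shape (from the messages in this article):
#   [scsi.cmd.<name>:<severity>]: Device <adapter>.<shelf>.<bay>: ...
# An optional "<node>:" prefix inside the brackets is tolerated.
EVENT_RE = re.compile(
    r"\[(?:[^\]:]+:)?(?P<event>scsi\.cmd\.\w+):(?P<severity>\w+)\]: "
    r"Device (?P<device>\S+?):"
)

# Truncated copies of the EMS lines shown in the Issue section.
SAMPLE_LINES = [
    "[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:56161ba8:0050.",
    "[scsi.cmd.abortedByHost:error]: Device 0a.01.18: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:6f9be9b0:0008.",
    "[scsi.cmd.retrySuccess:debug]: Device 0a.01.18: request successful after retry #1: cdb 0x28:6f9be9b0:0008.",
    "[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:5615abf8:0200.",
    "[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:56161ba8:0050.",
    "[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:5615abf8:0200.",
    "[scsi.cmd.underrun:error]: Device 0a.01.4: Received a data underrun: cdb 0x28:79143a08:01a8.",
]


def tally_scsi_events(lines):
    """Return {device: Counter({event_name: count})} for scsi.cmd.* lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group("device")][match.group("event")] += 1
    return counts


if __name__ == "__main__":
    # Print one line per device with its event counts.
    for device, events in sorted(tally_scsi_events(SAMPLE_LINES).items()):
        summary = ", ".join(f"{name}={n}" for name, n in sorted(events.items()))
        print(f"{device}: {summary}")
```

Lines that do not match the assumed format, such as the raid.* or disk.init.* messages, are skipped rather than counted.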