NetApp Knowledge Base

Multiple failed drives after scsi.cmd.abortedByHost:error alerts

Category: fas-systems
Specialty: HW

Applies to

  • FAS8700
  • Disk Drives X357_KPM6V3T8ATE

Issue

  • Multiple SCSI errors are reported against drives:

[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:56161ba8:0050. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.abortedByHost:error]: Device 0a.01.18: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.18: request successful after retry #1: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8035")
[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:5615abf8:0200. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:56161ba8:0050. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8343")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.22: request successful after retry #1: cdb 0x28:5615abf8:0200. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8350")
[scsi.cmd.underrun:error]: Device 0a.01.4: Received a data underrun: cdb 0x28:79143a08:01a8. Not all the data was received. Possible transmission error. I/O will be retried. (Additional EMS parameters: deviceType="Disk" disk_information="")

  • After these errors are logged, drives begin to fail:
[raid.disk.predictiveFailure:error]: Disk 0a.01.6 Shelf 1 Bay 6 [NETAPP   X357_KPM6V3T8ATE NA50] S/N [XXXXXXXXXXXXX] UID [58CE38EE:222E26AC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] reported a predictive failure and it is prefailed; it will be copied to a spare and failed (Additional EMS parameters: shelf="1" bay="6" vendor="NETAPP  " model="X357_KPM6V3T8ATE" firmware_revision="NA50" serialno="XXXXXXXXXXXXX" disk_type="5" disk_rpm="N/A" carrier="" site="Local")
  • The node then panics with a multi-disk panic
  • Upon reboot, multiple drives fail to initialize:
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.8 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.22 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.4 detected during disk initialization.
[Node-01:disk.init.failureBytes:error]: Failed disk 0d.01.6 detected during disk initialization.
  • The node cannot boot into ONTAP because the root volume has failed drives:

[raid.assim.rg.missingChild:debug]: Aggregate Aggr01, rgobj_verify: RAID object 0 has only 6 valid children, expected 11.
[raid.assim.plex.missingChild:debug]: Aggregate Aggr01, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline
[raid.assim.mirror.noChild:debug]: Aggregate Aggr01, mirrorobj_verify: No operable plexes found.
[raid.assim.tree.noRootVol:error]: No usable root volume found!

  • In Maintenance mode, the output of aggr status -r shows the failed/missing drives:

Aggregate aggr01 (failed, raid_dp, partial, fast zeroed) (block checksums)
  Plex /aggr01/plex0 (offline, failed, inactive)
    RAID group /aggr01/plex0/rg0 (partial, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0d.01.18P2      0d    1   18  SA:B   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      parity    0a.01.1P2       0a    1   1   SA:A   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      data      FAILED                  N/A                        1799343/ -
      data      0a.01.3P2       0a    1   3   SA:A   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      data      FAILED                  N/A                        1799343/ -
      data      0a.01.23P2      0a    1   23  SA:A   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      data      FAILED                  N/A                        1799343/ -
      data      FAILED                  N/A                        1799343/ -
      data      0a.01.21P2      0a    1   21  SA:A   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      data      FAILED                  N/A                        1799343/ -
      data      0a.01.5P2       0a    1   5   SA:A   0   SSD   N/A 1799343/3685054464 1799351/3685070848 (fast zeroed)
      Raid group is missing 5 disks.
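
The figures above agree with each other: the assimilation message reports 6 valid children where 11 were expected, and aggr status -r lists 5 FAILED rows. RAID-DP protects a RAID group against at most two concurrent disk failures, so five missing disks leave the group, and with it the only plex, inoperable. The Python sketch below is not a NetApp tool. It is a hypothetical helper (the names count_raid_rows and raid_group_survives and the shortened sample rows are assumptions made for illustration) that tallies the rows of an aggr status -r RAID group table and compares the failures against the parity tolerance.

```python
import re

# Maximum concurrent disk failures a RAID group survives, per RAID type.
PARITY_TOLERANCE = {"raid4": 1, "raid_dp": 2, "raid_tec": 3}

# A disk row starts with its role, followed by a device name or the word FAILED.
ROW_RE = re.compile(r"^\s*(dparity|tparity|parity|data)\s+(\S+)")


def count_raid_rows(status_text):
    """Return (total_rows, failed_rows) for the disk rows found in the text."""
    total = failed = 0
    for line in status_text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # header, separator, or summary line
        total += 1
        if match.group(2) == "FAILED":
            failed += 1
    return total, failed


def raid_group_survives(raid_type, failed):
    """True if the number of failed disks is within the parity tolerance."""
    return failed <= PARITY_TOLERANCE[raid_type]


# Rows shortened from the aggr status -r output in this article.
SAMPLE = """\
      dparity   0d.01.18P2      0d    1   18  SA:B   0   SSD
      parity    0a.01.1P2       0a    1   1   SA:A   0   SSD
      data      FAILED                  N/A
      data      0a.01.3P2       0a    1   3   SA:A   0   SSD
      data      FAILED                  N/A
      data      0a.01.23P2      0a    1   23  SA:A   0   SSD
      data      FAILED                  N/A
      data      FAILED                  N/A
      data      0a.01.21P2      0a    1   21  SA:A   0   SSD
      data      FAILED                  N/A
      data      0a.01.5P2       0a    1   5   SA:A   0   SSD
      Raid group is missing 5 disks.
"""

if __name__ == "__main__":
    total, failed = count_raid_rows(SAMPLE)
    print(f"{total} disks expected, {failed} failed")
    print("RAID group operable:", raid_group_survives("raid_dp", failed))
```

Run against the full Maintenance mode output, the same tally gives a quick check of whether each RAID group is still within its parity tolerance.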

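When many scsi.cmd.* events are logged, it helps to see which devices they cluster on. The following Python sketch parses EMS lines like those at the start of the Issue section and counts error-severity events per device. It rests on some assumptions: the regular expression is derived only from the sample lines in this article, and the function names are hypothetical. The cdb value 0x28 is the SCSI READ(10) opcode. Treating the two fields after it as the logical block address and the transfer length in blocks is an interpretation based on the READ(10) layout, not a documented ONTAP format.

```python
import re
from collections import Counter

# Pattern inferred from the sample EMS lines; an optional "Node-01:" style prefix is allowed.
EVENT_RE = re.compile(
    r"\[(?:[^\]:]+:)?(?P<event>scsi\.cmd\.\w+):(?P<sev>\w+)\]: "
    r"Device (?P<device>[\w.]+):.*?"
    r"cdb (?P<op>0x[0-9a-f]+):(?P<lba>[0-9a-f]+):(?P<len>[0-9a-f]+)"
)

# Well-known SCSI opcodes; anything else is reported as "unknown".
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def parse_scsi_events(lines):
    """Return one dict per scsi.cmd.* line that matches the inferred pattern."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        opcode = int(m["op"], 16)
        events.append({
            "event": m["event"],
            "severity": m["sev"],
            "device": m["device"],
            "opcode": opcode,
            "command": OPCODES.get(opcode, "unknown"),
            "lba": int(m["lba"], 16),      # assumed: logical block address
            "blocks": int(m["len"], 16),   # assumed: transfer length in blocks
        })
    return events


def error_counts(events):
    """Count error-severity events per device (debug retries are ignored)."""
    return Counter(e["device"] for e in events if e["severity"] == "error")


# Lines copied from the EMS excerpt in this article.
SAMPLE = '''\
[scsi.cmd.abortedByHost:error]: Device 0a.01.22: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:56161ba8:0050. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.abortedByHost:error]: Device 0a.01.18: Command aborted by host adapter: HA status 0x0x4: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" disk_information="")
[scsi.cmd.retrySuccess:debug]: Device 0a.01.18: request successful after retry #1: cdb 0x28:6f9be9b0:0008. (Additional EMS parameters: deviceType="Disk" freeRetryCount="0" dTime="8035")
[scsi.cmd.underrun:error]: Device 0a.01.4: Received a data underrun: cdb 0x28:79143a08:01a8. Not all the data was received. Possible transmission error. I/O will be retried. (Additional EMS parameters: deviceType="Disk" disk_information="")
'''

if __name__ == "__main__":
    parsed = parse_scsi_events(SAMPLE.splitlines())
    for device, count in error_counts(parsed).most_common():
        print(device, count)
```

Errors spread across many devices on the same adapter path, rather than concentrated on one drive, would be worth noting when the case is raised with support.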