CFDISK-1350: X359_S165330TATE drives failing after a power cycle on NA50
Issue
- On a newly configured system with no user data, multiple disks fail at random after the system is powered off and back on.
EMS
[hamsg: disk.partner.diskFail:debug]: The partner with sysid xxxxxxxxx has failed 0d.11.15P1.
[hamsg: disk.partner.diskFail:debug]: The partner with sysid xxxxxxxxx has failed 0d.11.15P2.
[disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 0d.11.15 (S/N xxxxxxxxx) error 40000000h
[config_thread: raid.disk.offline:notice]: Marking Disk /aggr1/plex0/rg0/0d.11.15P3 Shelf 11 Bay 15 [NETAPP X359_S165330TATE NA50] S/N [xxxxxxxxx] UID [60025380:0470D710:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] offline.
[disk_server_0: disk.fail.ssdstats:info]: Disk 0d.11.15 (xxxxxxxxx) failed with rated life used 0 %, percent spare blocks 0 %, spare blocks N/A.
<raid_shared_disk_exchange_1
disk_info="Disk 0d.11.9 Shelf 11 Bay 9 [NETAPP X359_S165330TATE NA50] S/N [******] UID [50025380:04634390:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]"
event="FAIL_START"
local_state="failing"
local_substate="0x2"
partner_state="failing"
partner_substate="0x1"
failure_reason="failing"
sick_reason="INVALID"
offline_reason="NONE"
online_reason="NONE"
recv_reply="25d380d1-f9a2-11ef-b129-d039eab629d5"
host_type="1"
add_details="persistent 1, spare on unfail 0, awaiting done 0, awaiting prefail abort 0, awaiting offline abort 0, pool partitioning 0"
timestamp="1749445501"
shelf="11"
bay="9"
vendor="NETAPP "
model="X359_S165330TATE"
firmware_revision="NA50"
serialno="****"
disk_type="5"
disk_rpm="N/A"
carrier=""
site="Local"/>
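The event record above is a single self-closing element, so its attributes can be read with a standard XML parser. The following is a minimal sketch only: `EVENT_XML` is a trimmed copy of the record (a subset of the attributes, values unchanged), `summarize_event` is a hypothetical helper name, and treating `timestamp` as Unix epoch seconds is an assumption, not something the log states.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the event above: a subset of its attributes, values unchanged.
EVENT_XML = """<raid_shared_disk_exchange_1
event="FAIL_START"
local_state="failing"
partner_state="failing"
failure_reason="failing"
timestamp="1749445501"
shelf="11"
bay="9"
vendor="NETAPP "
model="X359_S165330TATE"
firmware_revision="NA50"/>"""


def summarize_event(xml_text):
    """Return the fields most useful for triage from one event element."""
    attrs = ET.fromstring(xml_text).attrib
    # Assumption: the timestamp attribute is Unix epoch seconds.
    when = datetime.fromtimestamp(int(attrs["timestamp"]), tz=timezone.utc)
    return {
        "event": attrs["event"],
        "location": f"shelf {attrs['shelf']} bay {attrs['bay']}",
        # The vendor value carries trailing padding, so strip it.
        "drive": f"{attrs['vendor'].strip()} {attrs['model']} {attrs['firmware_revision']}",
        "states": (attrs["local_state"], attrs["partner_state"]),
        "time_utc": when.isoformat(),
    }


summary = summarize_event(EVENT_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running the same helper over every such event in a log bundle would give a quick list of which shelf and bay positions reported FAIL_START and whether they all share the same model and firmware revision.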
- The node reboots because of the multi-disk failure.
[config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr_aggr2: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state DOUBLERECONS. 1 disk failed in the group. Disk 0d.11.14P1 Shelf 11 Bay 14 [NETAPP X359_S165330TATE NA50] S/N [XXXXXXXXXXXX] UID [60025380:0470DB90:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error: adapter error prevents command from being sent to device. Raid type - raid_dp Group name plex0/rg1 state DOUBLERECONS. 1 disk failed in the group. Disk /aggr_aggr2/plex0/rg1/0d.11.14P2 Shelf 11 Bay 14 [NETAPP X359_S165330TATE NA50] S/N [XXXXXXXXXXXX] UID [60025380:0470DB90:500A0981:00000002:00000000:00000000:00000000:00000000:00000000:00000000] error: adapter error prevents command from being sent to device..
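In the messages above, names such as 0d.11.15P1 and 0d.11.15P2, or 0d.11.14P1 and 0d.11.14P2, appear to be partitions of the same physical disk, so one failing drive can show up in several messages and in more than one RAID group. The sketch below groups partition names by physical disk to make that visible. The regular expression, the helper name `group_partitions`, and the reading of the `P<n>` suffix as a partition number are assumptions for illustration; the sample lines are copied from the EMS excerpt.

```python
import re
from collections import defaultdict

# Assumed name shape: <adapter>.<shelf>.<bay>, optionally followed by P<n>.
DISK_RE = re.compile(r"\b(\d+[a-z]\.\d+\.\d+)(P\d+)?\b")

# Sample lines copied from the EMS excerpt above.
EMS_LINES = [
    "[hamsg: disk.partner.diskFail:debug]: The partner with sysid xxxxxxxxx has failed 0d.11.15P1.",
    "[hamsg: disk.partner.diskFail:debug]: The partner with sysid xxxxxxxxx has failed 0d.11.15P2.",
    "[disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 0d.11.15 (S/N xxxxxxxxx) error 40000000h",
]


def group_partitions(lines):
    """Map each physical disk name to the sorted partition suffixes seen for it."""
    disks = defaultdict(set)
    for line in lines:
        for disk, partition in DISK_RE.findall(line):
            disks[disk]  # record the disk even when no partition suffix is present
            if partition:
                disks[disk].add(partition)
    return {disk: sorted(parts) for disk, parts in disks.items()}


grouped = group_partitions(EMS_LINES)
for disk, parts in grouped.items():
    print(disk, parts)
```

Counting distinct physical disks rather than distinct partition names gives a more accurate picture of how many drives actually failed after the power cycle.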