Single disk failure (X4013S172B7T6NTE) leads to WAFL hung panic
Applies to
- AFF A400
- X4013S172B7T6NTE disk drive
- NA56 disk firmware
Issue
1. The disk repeatedly reports a SCSI aborted command (sense key 0xb) with codes 0xb - 0x90 0x5
[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.00.3.16L0: Check Condition: CDB 0xe2: Sense Data SCSI:aborted command - (0xb - 0x90 0x5 0xfb)(732).
[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.00.3.16L0: Check Condition: CDB 0xe2: Sense Data SCSI:aborted command - (0xb - 0x90 0x5 0xfb)(984).
[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.00.3.16L0: Check Condition: CDB 0xe2: Sense Data SCSI:aborted command - (0xb - 0x90 0x5 0xfb)(633).
[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.00.3.16L0: Check Condition: CDB 0xe2: Sense Data SCSI:aborted command - (0xb - 0x90 0x5 0xfb)(913).
2. The disk request is retried but fails again
[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device e5a.00.3.16L0: request failed after try #2: cdb 0xe2.
3. ONTAP marks the disk offline
[Node-01: config_thread: raid.disk.offline:notice]: Marking Disk /aggr1/plex0/rg0/e0c.00.0.16P2 Shelf 0 Bay 16 [NETAPP X4011S172B3T8NTE NA56] S/N [XXXXXXXXXXXXXXXXXX] UID [65375930:4E300259:00253841:00000004:500A0981:00000002:00000000:00000000:00000000:00000000] offline.
[Node-01: config_thread: raid.shared.disk.exchange:info]: Received shared disk state exchange Disk e0c.00.0.16 Shelf 0 Bay 16 [NETAPP X4011S172B3T8NTE NA56] S/N [XXXXXXXXXXXXXXXXXX] UID [35375930:4E300259:00253841:00000004:00000000:00000000:00000000:00000000:00000000:00000000], event NONE, state offlining, substate 0x1000000, partner state offlining, partner substate 0x400000, failure reason unknown, sick reason INVALID, offline reason AGRSV_TIMEOUT, online reason NONE, partner dblade ID cde75058-9877-11eb-aeae-d039ea1e4755, host 1 persistent 0, spare on unfail 0, awaiting done 0, awaiting prefail abort 0, awaiting offline abort 0, pool partitioning 0
[Node-01: config_thread: raid.disk.offline:notice]: Marking partner:Disk e0c.00.0.16 Shelf 0 Bay 16 [NETAPP X4011S172B3T8NTE NA56] S/N [XXXXXXXXXXXXXXXXXX] UID [35375930:4E300259:00253841:00000004:00000000:00000000:00000000:00000000:00000000:00000000] offline.
4. The system hits a WAFL hung panic
[Node-01: wafl_exempt10: sk.panic:alert]: Panic String: WAFL hung for aggr1. in SK process wafl_exempt10 on release 9.8P8 (C)
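The four-step sequence above can be checked against a saved EMS log with a short script. The following is a minimal sketch, not a NetApp tool: the function names (parse_ems, sense_codes, stages_seen) and the regular expressions are illustrative assumptions based only on the line layout shown in this article, and the sample lines are copied from the excerpts above with the long UID field shortened.

```python
import re

# Assumed layout, taken from the excerpts in this article:
#   [<node>: <process>: <event>:<severity>]: <message>
EMS_RE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)
# Sense data as printed in the log, e.g. "(0xb - 0x90 0x5 0xfb)".
SENSE_RE = re.compile(
    r"\(0x([0-9a-fA-F]+) - 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)\)"
)

# (event name, text expected in the message), in the order the article lists them.
STAGES = [
    ("scsi.cmd.checkCondition", "aborted command"),
    ("scsi.cmd.pastTimeToLive", "request failed"),
    ("raid.disk.offline", "offline"),
    ("sk.panic", "WAFL hung"),
]


def parse_ems(line):
    """Split one EMS line into node, proc, event, sev and msg, or return None."""
    m = EMS_RE.match(line.strip())
    return m.groupdict() if m else None


def sense_codes(msg):
    """Return the four sense values as integers, in printed order, or None.

    The first value is the sense key; the remaining three are returned as-is
    without naming them, since the article does not label them.
    """
    m = SENSE_RE.search(msg)
    return tuple(int(x, 16) for x in m.groups()) if m else None


def stages_seen(lines):
    """Count how many of the four stages appear, in order, in the given lines."""
    nxt = 0
    for line in lines:
        rec = parse_ems(line)
        if rec is None or nxt >= len(STAGES):
            continue
        event, needle = STAGES[nxt]
        if rec["event"] == event and needle in rec["msg"]:
            nxt += 1
    return nxt


# Sample lines from this article; the UID field is shortened to "[...]".
SAMPLE = [
    "[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.00.3.16L0: Check Condition: CDB 0xe2: Sense Data SCSI:aborted command - (0xb - 0x90 0x5 0xfb)(732).",
    "[Node-01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device e5a.00.3.16L0: request failed after try #2: cdb 0xe2.",
    "[Node-01: config_thread: raid.disk.offline:notice]: Marking partner:Disk e0c.00.0.16 Shelf 0 Bay 16 [NETAPP X4011S172B3T8NTE NA56] S/N [XXXXXXXXXXXXXXXXXX] UID [...] offline.",
    "[Node-01: wafl_exempt10: sk.panic:alert]: Panic String: WAFL hung for aggr1. in SK process wafl_exempt10 on release 9.8P8 (C)",
]

if __name__ == "__main__":
    print("stages matched:", stages_seen(SAMPLE), "of", len(STAGES))
    print("sense codes:", sense_codes(parse_ems(SAMPLE[0])["msg"]))
```

A result of 4 from stages_seen means all four events occurred in the order described. The check does not confirm that the events refer to the same disk or aggregate, so a match should be treated as a hint to review the log, not as a diagnosis.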