CHW-3644: AFF-C800 MDP during partner node reseat
Issue
- The partner node experienced an L2 watchdog reset and was taken over
- All disks are properly seated and fully pushed in
- The partner node is reseated; during the reseat, the node that took over experiences a multi-disk fault:
PANIC: aggrname: raid volfsm, fatal multi-disk error..
- Link disabled messages are seen prior to the multi-disk fault:
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 6 due to excessive errors.
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 4 due to excessive errors.
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 1 due to excessive errors.
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 8 due to excessive errors.
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 7 due to excessive errors.
[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 30 due to excessive errors.
- SCSI check conditions are seen prior to the multi-disk fault:
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.1L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26876).
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.4L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26877).
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.30L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26876).
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.8L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26876).
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.6L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26877).
Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.7L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(26877).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.6L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57106).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.1L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57110).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.4L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57111).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.8L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57120).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.7L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57129).
Wed Nov 19 22:44:42 -0500 [vama822-03: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Unknown device 0n.30L9998: Check Condition: CDB 0x12: Sense Data SCSI:aborted command - (0xb - 0x90 0x6 0xfa)(57133).
- After the reseat, the partner node recovers from the L2 watchdog reset
- The node that experienced the panic recovers after it boots back up
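To check whether a given log shows the same signature, the drives named in the link disabled messages can be compared with the devices named in the check conditions. The following is a minimal sketch, not a supported tool. The function name `affected_slots` and both regular expressions are hypothetical and derived only from the sample lines above. It also assumes that the number after `0n.` in the device name corresponds to the drive slot, which the excerpt suggests but does not state.

```python
import re

# Hypothetical patterns, inferred only from the sample log lines in this article.
LINK_DISABLED = re.compile(
    r"nvme\.link\.disabled\.error:error\]: "
    r"PCIe link disabled for NVMe SSD in slot (\d+)"
)
# Assumption: in "0n.<N>L<id>", <N> is the drive slot number.
CHECK_CONDITION = re.compile(
    r"scsi\.cmd\.checkCondition:error\]: Unknown device 0n\.(\d+)L\d+:"
)


def affected_slots(log_lines):
    """Return (link_disabled_slots, check_condition_slots) as sets of ints."""
    link_slots, scsi_slots = set(), set()
    for line in log_lines:
        match = LINK_DISABLED.search(line)
        if match:
            link_slots.add(int(match.group(1)))
            continue
        match = CHECK_CONDITION.search(line)
        if match:
            scsi_slots.add(int(match.group(1)))
    return link_slots, scsi_slots


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as sample input.
    sample = [
        "[?] Wed Nov 19 22:44:47 -0500 [nodename: kernel: "
        "nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD "
        "in slot 6 due to excessive errors.",
        "Wed Nov 19 22:44:12 -0500 [vama822-03: scsi_cmdblk_strthr_admin: "
        "scsi.cmd.checkCondition:error]: Unknown device 0n.6L9998: Check "
        "Condition: CDB 0x12: Sense Data SCSI:aborted command - "
        "(0xb - 0x90 0x6 0xfa)(26877).",
    ]
    link, scsi = affected_slots(sample)
    print("link disabled slots:", sorted(link))
    print("check condition slots:", sorted(scsi))
    print("same set of drives:", link == scsi)
```

Under that assumption, the full excerpt above yields the same six drives (1, 4, 6, 7, 8, and 30) in both message types.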
