Panic on two-node cluster with interconnect down and no takeover
Applies to
- FAS2750 and other platforms with internal HA interconnect
- Two-node clusters
- ONTAP 9
Issue
- Cluster master node panics with PCI Error NMI similar to:
PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0x8),UCorrSrc(0)), RPT(0,1,0):PLX PCIE 8725 switch on Controller, X3311A in slot 1 on Controller.
- HA interconnect goes down at time of panic:
[?] Fri Apr 12 16:00:00 +0300 [cluster-01: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support.
[?] Fri Apr 12 16:00:00 +0300 [cluster-01: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 29 minutes: links down
[?] Fri Apr 12 16:00:00 +0300 [cluster-01: statd: callhome.hainterconnect.down:alert]: Call home for HA INTERCONNECT DOWN due to links down.
- HA partner node remains up but stops serving data, and all cluster applications go offline (seen in the output of the advanced-privilege command cluster ring show).
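
The EMS signatures listed above can also be checked for in a saved log file. The following is a minimal Python sketch, not a NetApp-provided tool: the names find_ha_down_events, HA_DOWN_EVENTS and EMS_HEADER are hypothetical, and the header layout "[node: process: event:severity]" is inferred only from the three sample lines in this article, so other log formats may need a different pattern.

```python
import re

# Event names copied from the log excerpts in this article.
HA_DOWN_EVENTS = {
    "cf.takeover.disabled",
    "ic.HAInterconnectDown",
    "callhome.hainterconnect.down",
}

# Assumed header layout: "[node: process: event.name:severity]".
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]"
)


def find_ha_down_events(lines):
    """Return (node, event, severity) for each line reporting an HA interconnect event."""
    hits = []
    for line in lines:
        match = EMS_HEADER.search(line)
        if match and match.group("event") in HA_DOWN_EVENTS:
            hits.append(
                (match.group("node"), match.group("event"), match.group("severity"))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "[?] Fri Apr 12 16:00:00 +0300 [cluster-01: statd: "
        "ic.HAInterconnectDown:error]: HA interconnect: "
        "Interconnect down for 29 minutes: links down",
    ]
    for node, event, severity in find_ha_down_events(sample):
        print(node, event, severity)
```

If all three events appear for the same node around the time of the PCI Error NMI panic, the logs match the symptom pattern described in this article. The sketch only matches event names; it does not confirm the cause.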