Node out of quorum during cluster switch reboot due to improper cluster network cabling
Applies to
- ONTAP 9
- Cluster Network Switches
- Connectivity, Liveliness and Availability Monitor (CLAM)
Issue
- During a cluster switch upgrade or reboot, data traffic connectivity is briefly lost.
- Data LIFs fail over to another node.
- Review of the EMS log output shows that nodes are out of quorum (see the quorum check after this list):
[Node-02: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
[Node-02: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-03, ID=1002) is out of "CLAM quorum" (reason=node in minority).
- Both cluster ports of the affected nodes go down at the same time (see the cabling check after this list):
[Node-01: vifmgr: vifmgr.portdown:notice]: A link down event was received on node Node-01, port e0a.
[Node-01: vifmgr: vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
[Node-01: vifmgr: vifmgr.portdown:notice]: A link down event was received on node Node-01, port e0b.
[Node-01: vifmgr: vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly.
- Nodes may encounter a CLAM panic and reboot:
[Node-01: gop_eq_thread: sk.panic:alert]: Panic String: Received PANIC packet from partner, receiving message is (Coredump and takeover initiated because Connectivity, Liveliness and Availability Monitor (CLAM) has determined this node is out of quorum.) in SK process gop_eq_thread on release 9.10.1P6 (C)
- The cluster applications of the affected nodes go offline (see the cluster ring check after this list).
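
Quorum check: node health can be verified from a node that is still in quorum, and the CLAM events can be listed from EMS. This is a minimal sketch; the node names are taken from the log excerpts above and the output layout is abbreviated:

::> cluster show
Node                 Health  Eligibility
-------------------- ------- ------------
Node-01              false   true
Node-02              true    true
Node-03              false   true

::> event log show -message-name clam.node.ooq

A node that is out of CLAM quorum typically reports Health false while remaining eligible, and the event log confirms the out-of-quorum reason (here, node in minority).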
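
Cabling check: because both cluster ports of a node went down at the same time, the cabling itself is the prime suspect. With proper cabling, each node has one cluster port on each cluster switch, so a single switch reboot takes down only one port per node. A hedged check, assuming the cluster switches advertise themselves over CDP or LLDP:

::> network port show -ipspace Cluster
::> network device-discovery show

In the device-discovery output, the two cluster ports of each node (e0a and e0b in the logs above) should list two different cluster switches as the discovered device. If both ports point to the same switch, the node loses all cluster connectivity whenever that switch reboots, which matches the symptoms above.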
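
Cluster ring check: after the switch reboot completes and the links recover, the state of the replicated cluster applications can be verified. A minimal sketch; unit names and columns vary slightly by ONTAP release:

::> cluster ring show

Each unit (for example mgmt, vldb, vifmgr, bcomd) should show the same master and epoch on every node; units still reporting offline on the affected nodes indicate they have not rejoined quorum.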
