Node goes out of quorum during giveback from partner node during ONTAP upgrade
Applies to
- ONTAP 9
- AFF-A400
Issue
- ONTAP upgrade from 9.8P5 to 9.8P20 using ONTAP System Manager fails with "Failed to get ONTAP version of the node node2"
- During the ONTAP upgrade, the secondary node goes out of quorum during giveback from the master node.
[2023-12-29 13:15:18.994] 12/29/2023 11:42:33 node1 ERROR upgrademgr.update.pausedErr: The automated update of the cluster has been paused due to the following reason: Node "node1": Error: {Failed to get ONTAP version of the node "node2".}, Action: {Verify that the node "node2" was booted with the intended version using the "system image show" command.}.
[2023-12-29 13:15:19.007] 12/29/2023 11:41:00 node1 EMERGENCY clam.node.ooq: Node (name=node2, ID=1001) is out of "CLAM quorum" (reason=seen by HA partner).
[2023-12-29 13:15:19.007] 12/29/2023 11:41:00 node1 EMERGENCY callhome.clam.node.ooq: Call home for NODE(S) OUT OF CLUSTER QUORUM.
- Both cluster ports go down in a switchless cluster configuration. The port active LED is off.
[2023-12-29 13:15:19.032] 12/29/2023 11:26:43 node1 ERROR vifmgr.port.monitor.failed: The "link_flapping" health check for port e3b (node node1) has failed. The port is operating in a degraded state.
[2023-12-29 13:15:19.034] 12/29/2023 11:26:43 node1 ERROR vifmgr.port.monitor.failed: The "link_flapping" health check for port e3a (node node1) has failed. The port is operating in a degraded state.
- The X1151A card firmware is 1.4.0-E-96, which includes the fix for bug 1383080
- None of the following actions resolves the issue:
  - Reseat the SFPs and cables connected to the cluster ports
  - Perform a loopback test from e3a to e3b on each node to identify the faulty side; all cluster ports fail to link up
  - Reseat the card in slot 3
  - Perform a power cycle on both nodes
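
The EMS lines above are listed in collection order, not event order. The following is a minimal Python sketch that sorts them by event time, which makes it easier to see that the "link_flapping" failures on e3a and e3b were logged before the quorum loss and the paused update. It is not a NetApp tool: the names `EmsEvent`, `LINE_RE`, and `parse_events` are hypothetical, and the line layout (bracketed collection timestamp, then an MM/DD/YYYY event time, node, severity, and event name) is assumed from the samples in this article only.

```python
import re
from datetime import datetime
from typing import NamedTuple


class EmsEvent(NamedTuple):
    """One parsed log line (hypothetical helper type, not a NetApp API)."""
    when: datetime
    node: str
    severity: str
    name: str
    message: str


# Layout assumed from the sample lines in this article:
# [collection timestamp] MM/DD/YYYY HH:MM:SS node SEVERITY event.name: message
LINE_RE = re.compile(
    r"^\[[^\]]+\]\s+"                             # bracketed timestamp, ignored
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"   # event time
    r"(\S+)\s+(\S+)\s+([\w.]+):\s+(.*)$"          # node, severity, name, message
)


def parse_events(lines):
    """Return the parsable lines as EmsEvent tuples, oldest event first."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        when = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        events.append(EmsEvent(when, *match.groups()[1:]))
    # sorted() is stable, so events with equal timestamps keep their input order
    return sorted(events, key=lambda event: event.when)


# Log lines copied verbatim from the Issue section above
SAMPLE = [
    '[2023-12-29 13:15:18.994] 12/29/2023 11:42:33 node1 ERROR upgrademgr.update.pausedErr: The automated update of the cluster has been paused due to the following reason: Node "node1": Error: {Failed to get ONTAP version of the node "node2".}, Action: {Verify that the node "node2" was booted with the intended version using the "system image show" command.}.',
    '[2023-12-29 13:15:19.007] 12/29/2023 11:41:00 node1 EMERGENCY clam.node.ooq: Node (name=node2, ID=1001) is out of "CLAM quorum" (reason=seen by HA partner).',
    '[2023-12-29 13:15:19.032] 12/29/2023 11:26:43 node1 ERROR vifmgr.port.monitor.failed: The "link_flapping" health check for port e3b (node node1) has failed. The port is operating in a degraded state.',
    '[2023-12-29 13:15:19.034] 12/29/2023 11:26:43 node1 ERROR vifmgr.port.monitor.failed: The "link_flapping" health check for port e3a (node node1) has failed. The port is operating in a degraded state.',
]

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event.when.strftime("%H:%M:%S"), event.severity, event.name)
```

Sorting on the event time rather than the bracketed collection time is the design choice here, because the collection timestamps are all within a few milliseconds of each other and carry no ordering information.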
