Reboot of both Cisco N9K-C9336C-FX2 cluster switches causes loss of cluster communication
Applies to
- FAS/AFF systems
- Cisco N9K-C9336C-FX2 Cluster switches
- NX-OS version 10.2.5
Issue
- Both cluster ports go down on all nodes at the same time (e3b first, then e3a about 49 seconds later), resulting in a loss of cluster communication:
Sat Nov 01 00:30:34 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:34 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
- All nodes in the cluster drop out of CLAM quorum:
Sat Nov 01 00:32:30 [Node-01: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:31 [Node-02: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:22 [Node-03: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:31 [Node-04: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
- The cluster's replicated database (RDB) falls out of sync and loses quorum.
- The switch logs show that both cluster switches rebooted, which brought down the cluster ports connected to them:
Cluster-switch-1:
Sat Nov 1 04:39:01 2025: Card Uptime Record
----------------------------------------------
Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)
Reset Reason: Unknown (0)
Cluster-switch-2:
Sat Nov 1 04:38:33 2025: Card Uptime Record
----------------------------------------------
Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)
Reset Reason: Unknown (0)
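The uptime record can be used to work out when each switch came back up: subtract the reported uptime (83 seconds in both excerpts) from the record's timestamp. The Python sketch below does that arithmetic. The regular expression is an assumption based only on the two excerpts above, not on a documented NX-OS output format, and `boot_time` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the two "Card Uptime Record" excerpts in this article;
# other NX-OS releases may format the record differently.
RECORD = re.compile(
    r"(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): Card Uptime Record"
    r".*?Uptime: (?P<secs>\d+),",
    re.DOTALL,  # let ".*?" skip over the dashed separator and any line breaks
)


def boot_time(record: str) -> datetime:
    """Return the record timestamp minus the reported uptime in seconds."""
    m = RECORD.search(record)
    if m is None:
        raise ValueError("no Card Uptime Record found")
    logged = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
    return logged - timedelta(seconds=int(m["secs"]))


SWITCH_1 = (
    "Sat Nov 1 04:39:01 2025: Card Uptime Record\n"
    "----------------------------------------------\n"
    "Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)\n"
    "Reset Reason: Unknown (0)\n"
)
SWITCH_2 = (
    "Sat Nov 1 04:38:33 2025: Card Uptime Record\n"
    "----------------------------------------------\n"
    "Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)\n"
    "Reset Reason: Unknown (0)\n"
)

boot_1 = boot_time(SWITCH_1)
boot_2 = boot_time(SWITCH_2)
print("Cluster-switch-1 booted at", boot_1)  # 2025-11-01 04:37:38
print("Cluster-switch-2 booted at", boot_2)  # 2025-11-01 04:37:10
print("Gap between reboots:", abs(boot_1 - boot_2))  # 0:00:28
```

By this arithmetic the two switches came up 28 seconds apart. The node EMS timestamps (00:30 to 00:32) and the switch timestamps (04:38 to 04:39) are roughly four hours apart, so the two sets of clocks appear to use different time zones or time sources; align them before correlating events across devices.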

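When reviewing the node-side EMS output, it can help to confirm mechanically that every node lost both cluster ports and to measure the gap between the two link-down events. The following Python sketch parses `netif.linkDown` entries in the layout shown in the Issue section above. The regular expression and the `summarize` helper are hypothetical, written only against those excerpts, and the sample input reuses the article's log lines.

```python
import re
from collections import defaultdict

# Layout inferred from the netif.linkDown excerpts in this article, e.g.
# "Sat Nov 01 00:30:34 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]:
#  Ethernet e3b: Link down, check cable."
LINK_DOWN = re.compile(
    r"(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2}) "
    r"\[(?P<node>[^:\]]+):[^\]]*netif\.linkDown[^\]]*\]: "
    r"Ethernet (?P<port>[^:\s]+): Link down"
)


def summarize(text: str) -> dict[str, tuple[list[str], int]]:
    """Map each node to (ports reported down, seconds between first and last event)."""
    seen: dict[str, dict[str, int]] = defaultdict(dict)
    for m in LINK_DOWN.finditer(text):
        # Seconds since midnight; assumes all events fall on the same day.
        secs = int(m["hh"]) * 3600 + int(m["mm"]) * 60 + int(m["ss"])
        seen[m["node"]][m["port"]] = secs
    return {
        node: (sorted(ports), max(ports.values()) - min(ports.values()))
        for node, ports in seen.items()
    }


SAMPLE = """\
Sat Nov 01 00:30:34 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:34 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
"""

for node, (ports, gap) in sorted(summarize(SAMPLE).items()):
    print(f"{node}: {', '.join(ports)} down, {gap} s apart")
# Node-01: e3a, e3b down, 49 s apart
# Node-02: e3a, e3b down, 49 s apart
# Node-03: e3a, e3b down, 49 s apart
# Node-04: e3a, e3b down, 49 s apart
```

Identical port sets and gaps across every node fit a switch-side event. A node that listed only one port, or whose gap differed sharply from the rest, would merit a closer look on its own.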