Unexpected node reboot in MetroCluster IP

Category: metrocluster

Specialty: metrocluster

Applies to

ONTAP 9
MetroCluster IP
AFF-A700
X91146A T6 card

Issue

A node reboots unexpectedly with no prior indication of a problem.
The SP logs show that the HA partner has taken over the disk reservations, which happens only after a takeover (CLAM takeover):

Apr 18 04:01:40 [NodeA1:clam.node.ooq:EMERGENCY]: Node (name=NodeA2, ID=1001) is out of "CLAM quorum" (reason=quorum update).
A disk reservation was detected on disk 7a.10.3P3 at 18Apr2023 04:01:44
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: 47d18h37m13s
System rebooting...

HA interconnect timeouts are reported shortly before the reboot, and the takeover is then triggered because the partner heartbeat is lost:

Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.
Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
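
To check a saved EMS log for the same signature, the sequence above (one or more cf.ic.xferTimedOut events followed shortly by cf.fsm.takeover.noHeartbeat on the same node) can be searched for with a small script. The following is a minimal sketch, not a NetApp tool: the function names, the regular expression (derived only from the sample lines above), and the 60-second window are all assumptions made for illustration and may need adjusting for other log formats.

```python
import re
from datetime import datetime, timedelta

# Line layout assumed from the sample EMS lines in this article:
# "Sun Apr 18 20:35:39 +0200 [Node: process: event.name:severity]: message"
EMS_LINE = re.compile(
    r"^\w{3} (?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"\[(?P<node>[^:\]]+): *(?P<proc>[^:\]]+): *(?P<event>[\w.]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)

TIMEOUT_EVENT = "cf.ic.xferTimedOut"
TAKEOVER_EVENT = "cf.fsm.takeover.noHeartbeat"


def parse_line(line):
    """Return (node, event, timestamp), or None if the line does not match."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    # The sample lines carry no year; 2000 is only a placeholder so that
    # timestamps can be compared with each other.
    ts = datetime.strptime("2000 " + m.group("ts"), "%Y %b %d %H:%M:%S")
    return m.group("node"), m.group("event"), ts


def find_timeout_then_takeover(lines, window=timedelta(seconds=60)):
    """List (node, first_timeout, takeover) for each takeover that was
    preceded by an interconnect timeout on the same node within `window`."""
    timeouts = {}  # node name -> timestamps of interconnect timeouts
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        node, event, ts = parsed
        if event == TIMEOUT_EVENT:
            timeouts.setdefault(node, []).append(ts)
        elif event == TAKEOVER_EVENT:
            recent = [t for t in timeouts.get(node, [])
                      if timedelta(0) <= ts - t <= window]
            if recent:
                hits.append((node, min(recent), ts))
    return hits
```

Passing the lines of an exported EMS log (for example, the result of `open(path).read().splitlines()`, where `path` is wherever the log was saved) returns one tuple per matching takeover. An empty list means this pattern was not found under the assumptions above; it does not rule out other causes.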