Both CVO HA nodes in Azure randomly shut down at intervals of a few days
Applies to
- Cloud Volumes ONTAP
- Azure
Issue
Both CVO HA nodes in Azure randomly shut down, with a takeover, at irregular intervals.
EMS logs report an initiated shutdown:
Node-01
Fri Jul 28 20:07:58 +0100 [node-01: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
Fri Jul 28 20:07:58 +0100 [node-01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node-02 disabled (local halt in progress).
Fri Jul 28 20:08:06 +0100 [node-01: shutdown_thread0: kern.shutdown:notice]: System shut down because : "D-blade Halting".
Node-02
Fri Jul 28 20:08:08 +0100 [node-02: cf_main: cf.fsm.takeover.on.halt:info]: Failover monitor: Node initiated automatic takeover after detecting that its partner node has halted.
Fri Jul 28 20:08:08 +0100 [node-02: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
Fri Jul 28 20:08:08 +0100 [node-02: cf_takeover: ha.takeover.stateChng:debug]: params: {'old_state': 'NOT_IN_TAKEOVER', 'new_state': 'IN_CFO_TAKEOVER'}
Fri Jul 28 20:08:08 +0100 [node-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
Fri Jul 28 20:08:15 +0100 [node-02: cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
Fri Jul 28 20:08:15 +0100 [node-02: cf_takeover: cf.fm.takeoverDuration:info]: Failover monitor: takeover duration time is 7 seconds.
Fri Jul 28 20:08:47 +0100 [node-02: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
Fri Jul 28 20:08:55 +0100 [node-02: shutdown_thread0: kern.shutdown:notice]: System shut down because : "D-blade Halting".
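To line up the two halts, the EMS excerpt can be reduced to a short timeline. The Python sketch below is a hypothetical helper, not a NetApp tool. It assumes the EMS line layout seen in the excerpt (`<weekday> <month> <day> <time> <UTC offset> [<node>: <thread>: <event>:<severity>]: <message>`). Because the excerpt omits the year, it takes a caller-supplied `year` only so that timestamps can be compared. The names `parse_ems_line`, `parse_ems_log`, `shutdown_timeline` and `shutdown_reasons` are illustrative.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed EMS line layout, inferred from the excerpt in this article:
#   <weekday> <month> <day> <HH:MM:SS> <UTC offset> [<node>: <thread>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\w{3}\s+(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[+-]\d{4})\s+"
    r"\[(?P<node>[^:\]]+):\s+(?P<thread>[^:\]]+):\s+(?P<event>[^:\]]+):(?P<severity>\w+)\]:\s+"
    r"(?P<message>.*)$"
)

# Events in the excerpt that mark a local halt or an HA takeover.
EVENTS_OF_INTEREST = {
    "ha.localNodeShutDown",
    "kern.shutdown",
    "cf.fsm.takeover.on.halt",
    "cf.fm.takeoverComplete",
}

# Extracts the quoted reason from a kern.shutdown message.
REASON = re.compile(r'because\s*:\s*"(?P<reason>[^"]*)"')


@dataclass
class EmsEvent:
    when: datetime
    node: str
    event: str
    severity: str
    message: str


def parse_ems_line(line: str, year: int = 2000) -> Optional[EmsEvent]:
    """Parse one EMS line; return None if it does not match the assumed layout."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    # The excerpt has no year, so a caller-supplied year is prepended.
    # The weekday is consumed by the regex and not validated.
    when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S %z")
    return EmsEvent(when, m["node"], m["event"], m["severity"], m["message"])


def parse_ems_log(lines: Iterable[str], year: int = 2000) -> List[EmsEvent]:
    """Parse every matching line and return the events in time order."""
    parsed = (parse_ems_line(line, year) for line in lines)
    events = [e for e in parsed if e is not None]
    return sorted(events, key=lambda e: e.when)


def shutdown_timeline(events: List[EmsEvent]) -> List[Tuple[int, str, str]]:
    """Return (seconds since the first halt/takeover event, node, event) tuples."""
    picked = [e for e in events if e.event in EVENTS_OF_INTEREST]
    if not picked:
        return []
    start = picked[0].when
    return [(int((e.when - start).total_seconds()), e.node, e.event) for e in picked]


def shutdown_reasons(events: List[EmsEvent]) -> Dict[str, str]:
    """Map each node to the reason quoted in its kern.shutdown message."""
    reasons: Dict[str, str] = {}
    for e in events:
        if e.event == "kern.shutdown":
            m = REASON.search(e.message)
            if m:
                reasons[e.node] = m["reason"]
    return reasons
```

When run against the twelve log lines above, the timeline shows node-01 initiating its halt at offset 0 s. Node-02 starts the takeover at 10 s and completes it at 17 s, then initiates its own shutdown at 49 s. Both nodes report "D-blade Halting" as the shutdown reason. Lines that do not match the assumed layout are skipped rather than raising an error.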