Inconsistent mailbox disk state after both nodes reboot due to a cooling failure
Applies to
- ONTAP 9
Issue
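- After both nodes reboot due to a site air-conditioning (AC) failure, the node that had been taken over (cluster1-02) boots first. Its failover monitor (mailbox disk) state is then accidentally overwritten when "y" is entered at the following prompt: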
waiting for giveback...
Do you wish to halt this node rather than wait [y/n]? n
The HA partner appears to be either not operational or not in takeover
mode. You will be asked whether you want to continue. If you answer "yes", the
existing failover monitor disk state will be overwritten and this node will be
rebooted. Answering "no" will halt this node with no modification to the failover
monitor disk state.
WARNING: Answering "yes" while the HA partner is operational and in
takeover mode will have unexpected and potentially catastrophic results:
YOUR FILESYSTEMS MAY BE DESTROYED
Do you wish to continue [y/n]?
Please answer yes or no.
Do you wish to continue [y/n]? y
Jun 11 10:22:02 [cluster1-02:cf.fm.overwriteState:notice]: System continuing after overwriting failover monitor state!
- After cluster1-02 boots back into the cluster, cluster1-01 still reports that it is in takeover, while cluster1-02 is fully booted and reports that takeover is not possible:
cluster1::> sto fa show
(storage failover show)
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    false    In takeover
cluster1-02    cluster1-01    false    Connected to cluster1-01,
                                       Takeover is not possible: NVRAM log
                                       not synchronized
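The inconsistency is visible by comparing the two rows of the storage failover show output: one node claims to be in takeover while its partner reports that it is up and connected. The following is a minimal sketch of that comparison, assuming the rows have already been extracted by hand (wrapped cells joined) from the sample output. FailoverRow and find_inconsistent_pairs are hypothetical names, and matching on the "In takeover" and "Connected to" strings is an assumption based only on the output shown here; other ONTAP releases may word these states differently.

```python
from __future__ import annotations

from typing import NamedTuple


class FailoverRow(NamedTuple):
    """One row of `storage failover show`, with wrapped cells already joined (hypothetical type)."""
    node: str
    partner: str
    takeover_possible: bool
    state_description: str


def find_inconsistent_pairs(rows: list[FailoverRow]) -> list[tuple[str, str]]:
    """Return (node, partner) pairs where `node` reports being in takeover
    while `partner` reports that it is booted and connected.

    Assumption: state strings are matched as they appear in the sample output.
    """
    by_node = {row.node: row for row in rows}
    inconsistent = []
    for row in rows:
        partner_row = by_node.get(row.partner)
        if partner_row is None:
            continue  # partner row missing from the output; nothing to compare
        if (row.state_description.startswith("In takeover")
                and partner_row.state_description.startswith("Connected to")):
            inconsistent.append((row.node, row.partner))
    return inconsistent


# Rows transcribed from the sample output above.
rows = [
    FailoverRow("cluster1-01", "cluster1-02", False, "In takeover"),
    FailoverRow("cluster1-02", "cluster1-01", False,
                "Connected to cluster1-01, Takeover is not possible: "
                "NVRAM log not synchronized"),
]
print(find_inconsistent_pairs(rows))  # [('cluster1-01', 'cluster1-02')]
```

In a normal takeover the taken-over node would not report itself as connected, so the check flags only the mismatched combination described in this issue.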