Switchless cluster hits bug 1253791 then suffers power loss resulting in cluster app quorum issues
Applies to
- FAS2720
- ONTAP 9
- Two node switchless cluster
Issue
- One node previously panicked due to quorum loss as a result of bug 1253791 (e0a/e0b cluster ports go link down)
- Giveback completes only partially because cluster applications cannot come online while the cluster ports are down, with
storage failover show
reporting:
Waiting for cluster applications to come online on the local node
- A power loss occurs while the cluster is in this state, power cycling both nodes
- The node that had performed the takeover and was cluster master comes up with cluster applications offline and reports the following error after boot:
Internal error: Cannot open corrupt replicated database. Automatic recovery
attempt has failed or is disabled. Check the event logs for details. This node
is not fully operational. Contact support personnel for the root volume recovery
procedures.
- After attempting to clear the
bootarg.rdb_corrupt
state through recovery procedures, the node that was taken over becomes master for mgwd while its other applications report "-", and the previous master becomes secondary for mgwd with its other applications offline
- Example: Node cluster1-01 originally panicked due to quorum loss as a result of bug 1253791; node cluster1-02 had taken over and was master before the power loss and RDB recovery
- Node 01
cluster ring show
after rdb recovery:
::> set advanced
::*> cluster ring show
Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 mgmt     21       21       107      cluster1-01 master
cluster1-01 vldb     -        -        -        -           -
cluster1-01 vifmgr   -        -        -        -           -
cluster1-01 bcomd    -        -        -        -           -
cluster1-01 crs      -        -        -        -           -
cluster1-02 mgmt     21       21       107      cluster1-01 secondary
cluster1-02 vldb     0        18       3295     -           offline
cluster1-02 vifmgr   0        20       50       -           offline
cluster1-02 bcomd    0        19       6        -           offline
cluster1-02 crs      0        18       1        -           offline
- Node 02
cluster ring show
after rdb recovery:
Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 crs      -        -        -        -           -
cluster1-02 mgmt     21       21       109      cluster1-01 secondary
cluster1-02 vldb     0        18       3295     -           offline
cluster1-02 vifmgr   0        20       50       -           offline
cluster1-02 bcomd    0        19       6        -           offline
cluster1-02 crs      0        18       1        -           offline
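
When comparing saved output from both nodes, it can help to list the rings that are not in quorum programmatically. The following is a minimal sketch, not a NetApp tool: it assumes the cluster ring show output has been captured as plain text in the format shown above, and the names parse_ring_show and unhealthy_rings are hypothetical helpers introduced for illustration. It also assumes that any Online value other than "master" or "secondary" (here "offline" or "-") indicates a ring that is not in quorum, which is based only on the values seen in this article.

```python
from typing import List, NamedTuple


class RingRow(NamedTuple):
    """One data row of captured `cluster ring show` text output."""
    node: str
    unit: str
    epoch: str
    db_epoch: str
    db_trnxs: str
    master: str
    online: str


def parse_ring_show(text: str) -> List[RingRow]:
    """Parse captured output into rows (hypothetical helper, not an ONTAP API)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly seven whitespace-separated fields.
        # The header splits into nine tokens, prompts and blanks into fewer,
        # and the separator line consists only of dashes.
        if len(fields) != 7 or set(fields[0]) == {"-"}:
            continue
        rows.append(RingRow(*fields))
    return rows


def unhealthy_rings(rows: List[RingRow]) -> List[RingRow]:
    """Return rows whose Online column is neither 'master' nor 'secondary'.

    Assumption: other values ('offline', '-') mean the ring is not in quorum.
    """
    return [r for r in rows if r.online not in ("master", "secondary")]


# Sample copied from the Node 01 output in this article.
SAMPLE = """\
::*> cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 mgmt 21 21 107 cluster1-01 master
cluster1-01 vldb - - - - -
cluster1-01 vifmgr - - - - -
cluster1-01 bcomd - - - - -
cluster1-01 crs - - - - -
cluster1-02 mgmt 21 21 107 cluster1-01 secondary
cluster1-02 vldb 0 18 3295 - offline
cluster1-02 vifmgr 0 20 50 - offline
cluster1-02 bcomd 0 19 6 - offline
cluster1-02 crs 0 18 1 - offline
"""

if __name__ == "__main__":
    for row in unhealthy_rings(parse_ring_show(SAMPLE)):
        print(f"{row.node} {row.unit}: online={row.online}")
```

Run against the Node 01 sample, the sketch flags every ring except mgmt on both nodes, which matches the symptom described above: only mgwd forms quorum after the RDB recovery attempt.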