NetApp Knowledge Base

MetroCluster: LIFs offline while performing giveback

Category:
metrocluster
Specialty:
MetroCluster

Applies to

  • MetroCluster IP 
  • Cluster ports on MetroCluster backend ports
  • MetroCluster backend port offline 
  • Giveback 

Issue

  • Booting a node and performing a giveback while one or more MetroCluster backend ports are offline can cause the cluster to go out of quorum
  • VifMgr (Virtual Interface Manager) is then taken offline, which in turn triggers FreeBSD to take all LIFs offline to avoid duplicate IP conflicts

Example:

node_03 VifMgr fails to join quorum using cluster LIF 1 on e0a

[kern_vifmgr:info:9017] A [src/rdb/TM.cc 1621 (0x80ea38600)]: _triggerOnlineStatusCallback: TM 1002: Report UNIT_IS_OFFLINE (epoch 0, master 0). Reason: RW_TXN txn could not acquire transaction: RPC failure ().
[kern_vifmgr:info:9017] A [src/rdb/TM.cc 1625 (0x80ea38600)]: _triggerOnlineStatusCallback: FAILOVER rdb: Local unit VifMgr offline 


node_03 VifMgr attempts to move cluster LIF 1 to another port, but fails because the node is out of quorum (OOQ)

[kern_vifmgr:info:9017] [0x812356d00] [Net::CdbLifHandle::avoidDownPorts] LIF lif:cdb:node_03:node_03_clus1 (1000) is assigned to a down port (node_03:e0a). Attempting to reassign.
[kern_vifmgr:info:9017] Warning: Unable to list entries on node node_04. RPC: Port mapper failure [from vifmgr on node "node_03" (VSID: -3) to mgwd at 169.254.249.59]


node_04 VifMgr loses quorum because it fails to communicate with node_03

[kern_vifmgr:info:9156] A [src/rdb/cluster_events.cc 88 (0x80e836c00)]: Report: Cluster event: cluster-quorum-ends, epoch 31, site 1003 [not enough healthy nodes (1/2 healthy)].
[kern_vifmgr:info:9156] A [src/rdb/quorum/qm_states/inq/HoldingQuorumState.cc 55 (0x80e836c00)]: doWork: Master losing quorum, not enough votes to maintain quorum at 2248s.
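The "(1/2 healthy)" figure in the log above can be read as a failed majority check. The following is a minimal sketch of a strict-majority rule, not ONTAP's actual quorum implementation: the function name has_quorum is hypothetical, and any tie-breaking the real RDB quorum manager may apply is not modeled.

```python
def has_quorum(healthy: int, total: int) -> bool:
    """Simplified strict-majority rule (an assumption, not ONTAP code):
    more than half of the voting nodes must be healthy."""
    if total <= 0 or not 0 <= healthy <= total:
        raise ValueError("healthy must be between 0 and total, and total must be positive")
    # 2 * healthy > total avoids floating-point division.
    return 2 * healthy > total


if __name__ == "__main__":
    # The log reports 1 healthy node out of 2, which is not a strict majority.
    print(has_quorum(1, 2))
    print(has_quorum(2, 2))
```

Under this simplified rule, one healthy node out of two is exactly half and therefore not enough to hold quorum, which matches the reason string in the log.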


node_04 does not regain quorum within the 65-second grace period and takes offline any LIFs that could be hosted on node_03 to avoid a split-brain/duplicate IP scenario

[kern_vifmgr:info:9156] [0x80ae37300] [EventMgr::unitOffline] Setting VifMgr operational status as OOQ
[kern_vifmgr:info:9156] [0x80ae37300] [FailoverMgr::localNodeDown] VifMgr on node node_04 is now out of quorum.
[node_04: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_01 (on virtual server 7), IP address 1.11.20.12, is being removed from node node_04, port a0a-120.
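When checking whether collected logs follow the sequence shown in this example, a small script can flag the relevant lines. The sketch below is illustrative only: the substrings are copied from the excerpts above, while the stage labels and function names are hypothetical and not part of any NetApp tool.

```python
# Substrings taken from the log excerpts in this article.
# The stage labels on the left are illustrative names, not NetApp terminology.
MARKERS = [
    ("vifmgr_offline", "Local unit VifMgr offline"),
    ("lif_on_down_port", "is assigned to a down port"),
    ("quorum_lost", "cluster-quorum-ends"),
    ("vifmgr_ooq", "is now out of quorum"),
    ("lif_removed", "vifmgr.lifBeingRemoved"),
]


def classify(line):
    """Return the stage label for a log line, or None if no marker matches."""
    for label, needle in MARKERS:
        if needle in line:
            return label
    return None


def summarize(lines):
    """Return the stages found, in order of first appearance."""
    seen = []
    for line in lines:
        label = classify(line)
        if label is not None and label not in seen:
            seen.append(label)
    return seen


if __name__ == "__main__":
    sample = [
        "[kern_vifmgr:info:9017] ... FAILOVER rdb: Local unit VifMgr offline",
        "[kern_vifmgr:info:9156] ... VifMgr on node node_04 is now out of quorum.",
    ]
    print(summarize(sample))
```

Seeing most or all of these stages across the two nodes, in roughly this order, suggests the logs match the pattern described here; it does not by itself confirm the root cause.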

 

