Unsynchronized logs and Takeover Disabled seen on one HA pair in MetroCluster IP
- Category: ontap-9
- Specialty: metrocluster
- Last Updated: 5/1/2025, 11:28:21 AM
Applies to
- MetroCluster IP
- ONTAP versions lower than 9.9.1
Issue
- HA interconnect errors are logged, and takeover is disabled:
Sun Feb 02 03:45:57 -0500 [Node-02: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Sun Feb 02 03:45:57 -0500 [Node-02: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Sun Feb 02 03:45:57 -0500 [Node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Node-02 by Node-01 disabled (unsynchronized log).
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 5 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCED to NVMM_MIRROR_SYNCING_START and took 0 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_SYNCING_START is aborted because of reason NVMM_ERR_STREAM_MAP.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
- The HA interconnect is re-established seconds later, and takeover is re-enabled:
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 4 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_START to NVMM_MIRROR_CP1_START and took 26 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP1_START to NVMM_MIRROR_WAFL_INIT and took 270 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_INIT to NVMM_MIRROR_CP2_FINISH and took 20 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP2_FINISH to NVMM_MIRROR_WAFL_HEADER and took 543 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_HEADER to NVMM_MIRROR_SYNCING_OTHER and took 1 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_OTHER to NVMM_MIRROR_ONLINE and took 169 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.onlined:debug]: params: {'mirror': 'HA_PARTNER'}
Sun Feb 02 03:46:02 -0500 [Node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of Node-02 by Node-01 enabled
- A network congestion event is logged in EMS:
Mon Feb 03 13:03:36 -0500 [Node-01: mccip_mirror_congestion_mgr_p: mcc.network.congestion:notice]: Network congestion detected. Action taken: Increased ic_timeout to 2000 msec.
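To triage this pattern from an exported EMS log, the events above can be correlated per node. The correlation covers when takeover was disabled and why, which NVMM mirror abort reasons followed, and how long it took for takeover to be re-enabled. It also captures any `mcc.network.congestion` events and the new `ic_timeout` value. The following is a minimal Python sketch and not a NetApp tool. `triage`, `parse_ts`, the regular expressions, and the default `year` are hypothetical. They assume the single-line message wording shown in this article, and other ONTAP releases or export formats may word these events differently.

```python
import re
from datetime import datetime

# Hypothetical triage helper. Assumes the single-line EMS format shown in this article:
# "<Day Mon DD HH:MM:SS +ZZZZ> [<node>: <thread>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<thread>[^:]+): (?P<event>[^:]+):(?P<severity>\w+)\]: "
    r"(?P<msg>.*)$"
)
DISABLED_REASON = re.compile(r"disabled \((?P<reason>[^)]+)\)")
ABORT_REASON = re.compile(r"aborted because of reason (?P<reason>\w+)")
STATE_CHANGE = re.compile(r"from (?P<src>\w+) to (?P<dst>\w+) and took (?P<ms>\d+) msecs")
IC_TIMEOUT = re.compile(r"ic_timeout to (?P<ms>\d+) msec")


def parse_ts(stamp, year):
    # EMS lines carry no year; `year` is an assumption used only to build a datetime.
    # The weekday is dropped so it cannot conflict with the assumed year.
    return datetime.strptime(f"{year} {stamp.split(' ', 1)[1]}", "%Y %b %d %H:%M:%S %z")


def triage(lines, year=2025):
    """Summarize takeover-disabled windows and congestion events from EMS text."""
    report = {"windows": [], "congestion": []}
    open_window = {}  # node name -> takeover-disabled window still in progress
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m:
            continue  # skip anything not in the assumed EMS format
        node, event, msg = m["node"], m["event"], m["msg"]
        win = open_window.get(node)
        if event == "cf.fsm.takeoverByPartnerDisabled":
            r = DISABLED_REASON.search(msg)
            open_window[node] = {
                "node": node,
                "start": parse_ts(m["ts"], year),
                "reason": r["reason"] if r else None,
                "aborts": [],
                "sync_ms": 0,
            }
        elif event == "nvmm.mirror.aborting" and win is not None:
            a = ABORT_REASON.search(msg)
            if a:
                win["aborts"].append(a["reason"])
        elif event == "nvmm.mirror.state.change" and win is not None:
            s = STATE_CHANGE.search(msg)
            if s:
                win["sync_ms"] += int(s["ms"])  # sum of the logged "took N msecs" values
        elif event == "cf.fsm.takeoverByPartnerEnabled" and win is not None:
            win["duration"] = parse_ts(m["ts"], year) - win["start"]
            report["windows"].append(open_window.pop(node))
        elif event == "mcc.network.congestion":
            t = IC_TIMEOUT.search(msg)
            report["congestion"].append((m["ts"], node, int(t["ms"]) if t else None))
    # Windows with no matching "enabled" event are reported without a duration.
    for win in open_window.values():
        win["duration"] = None
        report["windows"].append(win)
    return report
```

Applied to the log excerpts above, for example with `triage(open(path))` on an exported log file, the function reports one window on Node-02 with reason `unsynchronized log`. That window lasts 5 seconds, from 03:45:57 to 03:46:02. The function also reports one congestion event on Node-01 with `ic_timeout` raised to 2000 msec. `sync_ms` only adds up the per-step timings that ONTAP logged and is not a wall-clock measurement.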