ONTAP Select: Takeover is not possible and HA Interconnect RDMA down

Last updated: Jul 8, 2025


Visibility:: Public


Category:: ontap-9

Specialty:: ontapselect


Applies to

NetApp ONTAP Select
HA-Interconnect (IC)
Storage Failover Takeover

Issue

Storage failover (HA) reports that takeover is not possible on either node:

::*> storage failover show
                                Takeover
Node            Partner         Possible State Description
--------------  --------------  -------- -------------------------------------
ontap-select-01 ontap-select-02 false    Waiting for ontap-select-02,
                                         Takeover is not possible: NVRAM log
                                         not synchronized
ontap-select-02 ontap-select-01 false    Waiting for ontap-select-01,
                                         Takeover is not possible: NVRAM log
                                         not synchronized, Disk inventory not
                                         exchanged
2 entries were displayed.
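When checking many clusters, the same condition can be spotted by parsing this output. The following is a minimal sketch, assuming the output is captured as plain text with the column layout shown above; `parse_failover_show` and `FailoverRow` are hypothetical names, not part of any NetApp tool. It treats a line starting at column 0 with `true`/`false` in the third column as a new row, and indented lines as wrapped State Description text.

```python
import re
from dataclasses import dataclass, field

# A data row starts at column 0: node, partner, true/false, first description line.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(true|false)\s+(.*)$")


@dataclass
class FailoverRow:
    node: str
    partner: str
    takeover_possible: bool
    description: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Join the wrapped State Description lines into one string.
        return " ".join(self.description)


def parse_failover_show(output: str) -> list[FailoverRow]:
    rows: list[FailoverRow] = []
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        m = ROW.match(line)
        if m:
            current = FailoverRow(m[1], m[2], m[3] == "true", [m[4].strip()])
            rows.append(current)
        elif line[0].isspace() and current is not None:
            # Indented line: continuation of the current row's description.
            current.description.append(line.strip())
        else:
            # Prompt, header, separator or footer line: close the current row.
            current = None
    return rows


SAMPLE = """\
::*> storage failover show
                                Takeover
Node            Partner         Possible State Description
--------------  --------------  -------- -------------------------------------
ontap-select-01 ontap-select-02 false    Waiting for ontap-select-02,
                                         Takeover is not possible: NVRAM log
                                         not synchronized
ontap-select-02 ontap-select-01 false    Waiting for ontap-select-01,
                                         Takeover is not possible: NVRAM log
                                         not synchronized, Disk inventory not
                                         exchanged
2 entries were displayed.
"""

if __name__ == "__main__":
    for row in parse_failover_show(SAMPLE):
        if not row.takeover_possible:
            print(f"{row.node} (partner {row.partner}): {row.text}")
```

The parser relies only on the column order, not on fixed column widths, so it tolerates minor spacing differences in captured output.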

The HA interconnect Link may show either up or down, but the IC RDMA connection is down:

::> set adv

Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

::*> node run -node * -command ic status
2 entries were acted on.

Node: ontap-select-01
Link : up
IC RDMA connection : down

Node: ontap-select-02
Link : down
IC RDMA connection : down

When Link shows down, the corresponding vNIC is still shown as Connected in VMware. If it is not, connect the vNIC in VMware.

Note: Refer to the Solution section for how to identify the MAC address of the correct vNIC.
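The per-node `ic status` output can be checked the same way. This is a hedged sketch, assuming the text is captured exactly as printed above; `parse_ic_status` and `rdma_down` are hypothetical helpers. It flags every node whose IC RDMA connection is not up and, following the note above, points at the VMware vNIC state when Link is also down.

```python
def parse_ic_status(output: str) -> dict[str, dict[str, str]]:
    """Parse 'node run -node * -command ic status' text into {node: {field: value}}."""
    nodes: dict[str, dict[str, str]] = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon carry no field
        key, value = key.strip(), value.strip()
        if key == "Node":
            current = nodes.setdefault(value, {})
        elif current is not None and key in ("Link", "IC RDMA connection"):
            current[key] = value
    return nodes


def rdma_down(nodes: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (node, hint) for every node whose IC RDMA connection is not up."""
    findings = []
    for node, fields in nodes.items():
        if fields.get("IC RDMA connection") == "up":
            continue
        if fields.get("Link") == "down":
            hint = "Link down: check that the vNIC shows Connected in VMware"
        else:
            hint = "Link up but IC RDMA connection down"
        findings.append((node, hint))
    return findings


SAMPLE = """\
::*> node run -node * -command ic status
2 entries were acted on.
Node: ontap-select-01
Link : up
IC RDMA connection : down
Node: ontap-select-02
Link : down
IC RDMA connection : down
"""

if __name__ == "__main__":
    for node, hint in rdma_down(parse_ic_status(SAMPLE)):
        print(f"{node}: {hint}")
```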

The event log shows the following sequence of events when the problem triggers:

Note: Some events, such as those with severity debug, are not shown at the admin privilege level and might require a higher privilege level.

::*> event log show
Sat May 27 2023 17:00:26 +00:00 [ontap-select-02:cf.ic.xferTimedOutVSA:notice]: HA interconnect: ofw transfer timed out.
Sat May 27 2023 17:00:26 +00:00 [ontap-select-02:cf.fm.partnerFwTransition:info]: prevstate="SF_UP", newstate="SF_UNKNOWN", progresscounter="0"
Sat May 27 2023 17:00:28 +00:00 [ontap-select-02:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of ontap-select-01 disabled (unsynchronized log).
Sat May 27 2023 17:00:29 +00:00 [ontap-select-02:ic.rdma.qpDisconnected:debug]: ofw is disconnected.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:cf.ic.xferTimedOutVSA:notice]: HA interconnect: wafl transfer timed out.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state MIRROR_ONLINE is aborted because of reason Abort Pending.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:sk.hog.runtime:notice]: Process wafl_exempt01 ran for 16048 milliseconds
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:mgr.stack.longrun.proc:notice]: Long running process: wafl_exempt01
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:mgr.stack.frame:notice]: Stack frame  0: maytag.ko::sk_save_stackframes(0xffffffff8942f6f0) + 0x30
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:ha.healthCheckRoundtrip:debug]: HA_HEALTH_CHECK request-id 7 start-timestamp 8294259962 round-trip time: 0 msecs.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:ha.netPartition.other:debug]: Network partition due to other error. Duration 119 msecs, takeover wait 0 msecs; error code 5; status: 0x1001; request id: 7.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:rastrace.dump.saved:debug]: A RAS trace dump for module IC instance 0 was stored in /etc/log/rastrace/IC_0_20230527_17:00:31:741638.dmp.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of ontap-select-02 by ontap-select-01 disabled (unsynchronized log).
Sat May 27 2023 17:00:35 +00:00 [ontap-select-02:rastrace.dump.saved:debug]: A RAS trace dump for module HA instance 0 was stored in /etc/log/rastrace/HA_0_20230527_17:00:35:668467.dmp.
Sat May 27 2023 17:00:51 +00:00 [ontap-select-02:nvmm.mirror.offlined:debug]: mirror="HA Partner Mirror Offlined"
Sat May 27 2023 17:00:58 +00:00 [ontap-select-02:rdma.rlib.queue.full:notice]: Send queue of QP Control is full.
Sat May 27 2023 17:00:58 +00:00 [ontap-select-02:ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to 169.254.128.242.
Sat May 27 2023 17:00:58 +00:00 [ontap-select-02:sk.hog.runtime:notice]: Process ctrl_hb_port_e0f ran for 16051 milliseconds
Sat May 27 2023 17:00:58 +00:00 [ontap-select-02:mgr.stack.longrun.proc:notice]: Long running process: ctrl_hb_port_e0f
Sat May 27 2023 17:01:00 +00:00 [ontap-select-02:monitor.globalStatus.critical:EMERGENCY]: Controller failover of ontap-select-01 is not possible: unsynchronized log.
Sat May 27 2023 17:01:33 +00:00 [ontap-select-02:cf.diskinventory.sendFailed:debug]: reason="HA Interconnect down", errorCode="0"
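To check whether a saved event log contains this sequence, the lines can be filtered by message name. The sketch below assumes each line has the form `timestamp [node:message.name:severity]: text` as shown above; the `SIGNATURE` set simply copies message names from this sample and is illustrative, not an exhaustive or official list, and `signature_events` is a hypothetical helper.

```python
import re

# Timestamp, then [node:message-name:severity]: message text.
EVENT = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:\]]+):(?P<name>[^:\]]+):(?P<sev>[^\]]+)\]: (?P<msg>.*)$"
)

# Message names copied from the sample log in this article (illustrative only).
SIGNATURE = {
    "cf.ic.xferTimedOutVSA",
    "cf.fsm.takeoverOfPartnerDisabled",
    "cf.fsm.takeoverByPartnerDisabled",
    "ic.rdma.qpDisconnected",
    "ctrl.rdma.heartBeat",
    "monitor.globalStatus.critical",
    "cf.diskinventory.sendFailed",
}


def signature_events(log_text: str) -> list[tuple[str, str, str, str]]:
    """Return (timestamp, node, message name, severity) for lines in SIGNATURE."""
    hits = []
    for line in log_text.splitlines():
        m = EVENT.match(line.strip())
        if m and m["name"] in SIGNATURE:
            hits.append((m["ts"], m["node"], m["name"], m["sev"]))
    return hits


SAMPLE = """\
Sat May 27 2023 17:00:26 +00:00 [ontap-select-02:cf.ic.xferTimedOutVSA:notice]: HA interconnect: ofw transfer timed out.
Sat May 27 2023 17:00:28 +00:00 [ontap-select-02:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of ontap-select-01 disabled (unsynchronized log).
Sat May 27 2023 17:00:29 +00:00 [ontap-select-02:ic.rdma.qpDisconnected:debug]: ofw is disconnected.
Sat May 27 2023 17:00:31 +00:00 [ontap-select-02:sk.hog.runtime:notice]: Process wafl_exempt01 ran for 16048 milliseconds
Sat May 27 2023 17:00:58 +00:00 [ontap-select-02:ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to 169.254.128.242.
Sat May 27 2023 17:01:00 +00:00 [ontap-select-02:monitor.globalStatus.critical:EMERGENCY]: Controller failover of ontap-select-01 is not possible: unsynchronized log.
"""

if __name__ == "__main__":
    for ts, node, name, sev in signature_events(SAMPLE):
        print(ts, node, name, sev)
```

Debug-severity entries such as `ic.rdma.qpDisconnected` only appear in the captured text if it was collected at a privilege level that shows them, so an absent debug event is not conclusive.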

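When the problem triggers, the ESXi vmkernel log shows a vmxnet3 "Hang detected" message for the ONTAP Select VM's vNIC. Note that the example below is dated differently from the event log above: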

2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21226: ontap-select-01.eth5,02:0c:00:00:80:f2, portID(67108922): Hang detected,numHangQ: 1, enableGen: 96
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)NetSched: 752: 0x8400000f: received a force quiesce for port 0x400003a, dropped 9 pkts
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1542 eop: 1543 enableGen: 0 qid: 96, pkt: 0x45c995a9b900
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1540 eop: 1541 enableGen: 0 qid: 96, pkt: 0x45c98885c900
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1538 eop: 1539 enableGen: 0 qid: 96, pkt: 0x45c988887980
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1536 eop: 1537 enableGen: 0 qid: 96, pkt: 0x45c9940c4f80
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1534 eop: 1535 enableGen: 0 qid: 96, pkt: 0x45c98bd48d40
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1532 eop: 1533 enableGen: 0 qid: 96, pkt: 0x45c995b63d00
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1530 eop: 1531 enableGen: 0 qid: 96, pkt: 0x45c995ba5a00
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1528 eop: 1529 enableGen: 0 qid: 96, pkt: 0x45c99413bf00
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21235: portID:67108922, QID: 0, next2TX: 1496, next2Comp: 1528, lastNext2TX: 1496, next2Write:3253, ringSize: 4096 inFlight: 18, delay(ms): 4622,txStopped: 0
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21226: ontap-select-01.eth5,02:0c:00:00:80:f2, portID(67108922): Hang detected,numHangQ: 1, enableGen: 96
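To see which VM and vNIC hung, the `Hang detected` lines can be extracted from a copy of the vmkernel log. This is a minimal sketch, assuming the line format shown above (`<vm>.<ethN>,<MAC>, portID(<id>): Hang detected`); `find_vmxnet3_hangs` and `VnicHang` are hypothetical names. Repeated reports for the same vNIC are collapsed into one entry, and the resulting VM name and MAC address can then be compared with the vNIC list of that VM in VMware.

```python
import re
from typing import NamedTuple

# Matches the vmxnet3 "Hang detected" line and captures host, VM, guest NIC, MAC and port ID.
HANG = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) vmkernel: .*?Vmxnet3: \d+: "
    r"(?P<vm>[^,]+?)\.(?P<nic>eth\d+),(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}), "
    r"portID\((?P<port>\d+)\): Hang detected"
)


class VnicHang(NamedTuple):
    host: str
    vm: str
    nic: str
    mac: str
    port_id: int


def find_vmxnet3_hangs(log_text: str) -> list[VnicHang]:
    """Return each distinct vNIC reported as hung, in order of first appearance."""
    seen: dict[VnicHang, None] = {}
    for line in log_text.splitlines():
        m = HANG.match(line.strip())
        if m:
            key = VnicHang(m["host"], m["vm"], m["nic"], m["mac"].lower(), int(m["port"]))
            seen.setdefault(key, None)  # dict keeps insertion order and drops repeats
    return list(seen)


SAMPLE = """\
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21226: ontap-select-01.eth5,02:0c:00:00:80:f2, portID(67108922): Hang detected,numHangQ: 1, enableGen: 96
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)NetSched: 752: 0x8400000f: received a force quiesce for port 0x400003a, dropped 9 pkts
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21239: portID: 67108922, sop: 1542 eop: 1543 enableGen: 0 qid: 96, pkt: 0x45c995a9b900
2023-06-16T17:00:36.817Z esx001.corp.local vmkernel: cpu36:3239639)Vmxnet3: 21226: ontap-select-01.eth5,02:0c:00:00:80:f2, portID(67108922): Hang detected,numHangQ: 1, enableGen: 96
"""

if __name__ == "__main__":
    for hang in find_vmxnet3_hangs(SAMPLE):
        print(hang)
```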