CONTAP-400019: "NVRAM log not synchronized" SFO error for single node reporting "Send queue of QP WAFL is full"
Issue
- Both nodes in an HA pair sporadically report "NVRAM log not synchronized" messages. Example:
::> storage failover show -fields reason
node reason
----------- ----------------------------
node_name-1 "NVRAM log not synchronized"
node_name-2 "NVRAM log not synchronized"
2 entries were displayed.
- The ONTAP Event Management System (EMS) reports:
Fri Feb 14 12:55:32 +0100 [node_name-1: wafl_exempt03: rdma.rlib.queue.full:notice]: Send queue of QP WAFL is full.
...
Fri Feb 14 12:55:33 +0100 [node_name-1: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node_name-2 disabled (unsynchronized log).
Fri Feb 14 12:55:33 +0100 [node_name-1: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-1 by node_name-2 disabled (unsynchronized log).
The condition clears after a few seconds:
Fri Feb 14 12:55:50 +0100 [node_name-1: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node_name-1 by node_name-2 enabled
- The partner node logs the equivalent messages at the same time:
Fri Feb 14 12:55:34 +0100 [node_name-2: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node_name-1 disabled (unsynchronized log).
Fri Feb 14 12:55:34 +0100 [node_name-2: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-2 by node_name-1 disabled (unsynchronized log).
...
Fri Feb 14 12:55:43 +0100 [node_name-2: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node_name-2 by node_name-1 enabled
Fri Feb 14 12:55:51 +0100 [node_name-2: cf_main: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node_name-1 enabled
- No errors or issues are observed on the interconnect ports or connections.
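
To judge how long takeover stays disabled during each occurrence, the EMS lines can be paired up offline. The following is a minimal Python sketch, not a NetApp tool: it assumes the log has been saved as plain text in the layout shown in the examples above, and the names `parse_ems`, `disabled_windows`, and the `year` parameter are hypothetical helpers introduced for illustration (the EMS lines shown carry no year, so a placeholder is used).

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the EMS examples in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)

# Matches the four failover-monitor events seen in the examples.
CF_EVENT = re.compile(r"cf\.fsm\.takeover(OfPartner|ByPartner)(Disabled|Enabled)")


def parse_ems(line, year=2000):
    """Parse one EMS line into a dict, or return None if it does not match.

    `year` is a placeholder (assumption): the sample lines omit the year,
    and only time differences are used below.
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %a %b %d %H:%M:%S %z")
    return {"ts": ts, "node": m["node"], "event": m["event"], "msg": m["msg"]}


def disabled_windows(lines, year=2000):
    """Return (node, direction, seconds) for each Disabled -> Enabled pair."""
    open_since = {}  # (node, direction) -> time takeover was disabled
    windows = []
    for line in lines:
        ev = parse_ems(line, year)
        if ev is None:
            continue
        m = CF_EVENT.fullmatch(ev["event"])
        if m is None:
            continue
        key = (ev["node"], m.group(1))
        if m.group(2) == "Disabled":
            # Keep the earliest Disabled time if it repeats before clearing.
            open_since.setdefault(key, ev["ts"])
        elif key in open_since:
            start = open_since.pop(key)
            windows.append((key[0], key[1], (ev["ts"] - start).total_seconds()))
    return windows
```

Fed the node_name-1 lines above (takeoverByPartnerDisabled at 12:55:33, takeoverByPartnerEnabled at 12:55:50), `disabled_windows` reports a 17-second window for that node. Lines that do not match the assumed layout, such as the `...` elisions, are skipped.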