NetApp Knowledge Base

Continued packet loss when pinging from cluster LIF after cluster switch RCF upgrade

Category:
fabric-interconnect-and-management-switches
Specialty:
HW
Applies to

  • Cisco NX3232C Cluster Network Switch (CNS)
  • RCF firmware update to 1.10 or later from 1.8 or earlier

Issue

  • All nodes continuously report the following events when pinging each other's cluster LIFs:

[vifmgr: vifmgr.cluscheck.ctdpktloss:debug]: Continued packet loss when pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif node-02_clus2 (node node-02).

[vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif node-02_clus2 (node node-02).

  • cluster ping-cluster fails on half of the paths. Example:

::*> cluster ping-cluster -node node-01
...
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
 ...
 Larger than PMTU communication succeeds on 14 path(s)
 RPC status:
 14 paths up, 0 paths down (tcp check)
 14 paths up, 0 paths down (udp check)
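The half-failing pattern above is the key signature: basic connectivity succeeds and fails on an equal number of paths. A minimal sketch of how that tally could be pulled out of saved ping-cluster output for quick triage (the count_paths helper and the sample text are illustrative only, not NetApp tooling):

```python
import re

def count_paths(output: str) -> dict:
    """Count succeeding/failing basic-connectivity paths in
    saved `cluster ping-cluster` output (hypothetical helper)."""
    counts = {"succeeds": 0, "fails": 0}
    for verb in counts:
        m = re.search(rf"Basic connectivity {verb} on (\d+) path", output)
        if m:
            counts[verb] = int(m.group(1))
    return counts

sample = """\
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
"""
result = count_paths(sample)
# Equal success/failure counts match the half-failing symptom in this article.
print(result)  # {'succeeds': 14, 'fails': 14}
```

If the failure count equals the success count, the losses line up with one of the two cluster switches, consistent with the per-switch symptom described in this article.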

  • Every time a cluster LIF is reverted from a port connected to switch 2 back to its home port on switch 1:
    •  EMS reports messages similar to:
[vifmgr: vifmgr.dbase.checkerror:alert]: VIFMgr experienced an error verifying cluster database consistency. Some LIFs might not be hosted properly as a result.
[vifmgr: vifmgr.startup.failover.err:alert]: VIFMgr encountered errors during startup.
  • vifmgr reports messages similar to:
[kern_vifmgr:info:6537] rdb::qm:...:src/rdb/quorum/qm_states/inq/SecondaryState.cc:222 (thr_id:0x80c138500) SecondaryState::receivePoll Leaving quorum at 21170636s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
  • mgwd reports messages similar to:
[kern_mgwd:info:2343] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 217 (0x823d60300)]: receivePoll: Leaving quorum at 9068946s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
[kern_mgwd:info:2343] A [src/rdb/cluster_events.cc 88 (0x823d60300)]: Report: Cluster event: node-event, epoch 31, site 1004 [apparent starvation detected in voting protocol].
[kern_mgwd:info:2325] W [src/rdb/TM.cc 3923 (0x821377f00)]: _coord_commit: TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
[kern_mgwd:info:2325] rdb::TM:Mon Nov 06 11:06:47 2023:src/rdb/TM.cc:3933 (thr_id:0x821377f00) TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
  • The issue persists regardless of whether the ISL is enabled or disabled (disabling it isolates the traffic on each switch).
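When toggling the ISL to isolate traffic per switch, its state can be confirmed on each Nexus 3232C before re-testing. A sketch of typical NX-OS checks (the port-channel number Po1 and port range are examples; match them to the ISL ports defined in your RCF):

```
switch# show port-channel summary
switch# show interface brief
```

show port-channel summary reports whether the ISL port-channel members are up and bundled; show interface brief confirms the individual ISL ports' link state.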


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.