NetApp Knowledge Base

Vifmgr: Packet loss when pinging from one cluster lif to another cluster lif

Category:
fabric-interconnect-and-management-switches
Specialty:
hw
Last Updated:
11/28/2024, 1:59:43 AM

Applies to

  • Cluster Network Switch
  • ONTAP 9

Issue

  • Similar EMS messages are seen on all cluster nodes:

Fri Nov 19 18:06:27 +0100 [node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node1_clus2 (node node1) to cluster lif node5_clus1 (node node5)
 
Thu Dec 23 03:36:41 +0100 [node2: vifmgr: vifmgr.cluscheck.droppedlarge:alert]: Partial packet loss when pinging from cluster lif node2_clus1 (node node2) to cluster lif node6_clus2 (node node6)
 
Tue Dec 28 16:54:49 +0100 [node3: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node3_clus2 (node node3) to cluster lif node1_clus1 (node node1)
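When many of these messages accumulate, it can help to pull out the node, loss type and LIF pair programmatically. The sketch below is a minimal parser for lines in the format shown above. It assumes that layout exactly, and the names `parse_ems_line` and `ClusCheckEvent` are hypothetical helpers, not part of ONTAP. The mapping of message names to loss types follows the message text itself (ctdpktloss = "Continued", droppedlarge = "Partial", droppedall = "Total").

```python
import re
from typing import NamedTuple, Optional

# Loss type for each vifmgr.cluscheck message name, as worded in the messages above.
LOSS_KIND = {
    "ctdpktloss": "continued",
    "droppedlarge": "partial",
    "droppedall": "total",
}

# Assumed line layout (taken from the samples above):
# <date> [<node>: vifmgr: vifmgr.cluscheck.<name>:<severity>]: ...
#   from cluster lif <src> (node ...) to cluster lif <dst> (node ...)
EMS_RE = re.compile(
    r"\[(?P<node>[^:\]]+): vifmgr: vifmgr\.cluscheck\.(?P<name>\w+):(?P<severity>\w+)\]"
    r".*?from cluster lif (?P<src>\S+) \(node [^)]+\)"
    r" to cluster lif (?P<dst>\S+) \(node [^)]+\)"
)


class ClusCheckEvent(NamedTuple):
    node: str      # node that logged the event
    kind: str      # "continued", "partial" or "total" (raw name if unknown)
    severity: str  # e.g. "alert"
    src_lif: str   # cluster LIF the ping was sent from
    dst_lif: str   # cluster LIF the ping was sent to


def parse_ems_line(line: str) -> Optional[ClusCheckEvent]:
    """Return the parsed event, or None if the line is not a cluscheck message."""
    m = EMS_RE.search(line)
    if m is None:
        return None
    return ClusCheckEvent(
        node=m["node"],
        kind=LOSS_KIND.get(m["name"], m["name"]),
        severity=m["severity"],
        src_lif=m["src"],
        dst_lif=m["dst"],
    )
```

Lines that do not match the pattern return `None`, so a whole log can be fed through the parser and filtered without pre-cleaning.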
 
  • The symptoms point to a network traffic problem between the two cluster switches across the inter-switch link (ISL), because many cluster ports report errors. Example:

::> event show -message-name *vifmgr.cluscheck*
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
8/24/2022 08:14:27 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 18:36:43 node_name-12 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-12_clus1 (node node_name-12) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 12:41:38 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/23/2022 09:33:27 node_name-02 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-02_clus1 (node node_name-02) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 08:28:35 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/21/2022 13:58:34 node_name-12 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-12_clus1 (node node_name-12) to cluster lif node_name-01_clus2 (node node_name-01).
8/21/2022 13:36:54 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-11_clus2 (node node_name-11).
8/21/2022 01:51:56 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-12_clus2 (node node_name-12).
8/21/2022 01:08:57 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/21/2022 01:08:57 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:48:56 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:48:56 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:11:29 node_name-02 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-02_clus1 (node node_name-02) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 10:58:50 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 01:39:14 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 01:39:14 node_name-11 ALERT vifmgr.cluscheck.droppedlarge: Partial packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 01:39:14 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/19/2022 17:29:32 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/19/2022 17:29:32 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/18/2022 21:13:36 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
20 entries were displayed.
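Rather than scanning a long event list by eye, the cluscheck lines can be tallied by the `_clusN` suffix of the source and destination LIFs. This is a minimal sketch. It assumes that every cluster LIF name ends in its port suffix (as in this article) and that, as described in this article, all `_clus1` ports are cabled to one switch and all `_clus2` ports to the other, so that a path between different suffixes must cross the ISL. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# "from cluster lif <src> (node ...) to cluster lif <dst> (node ...)"
PAIR_RE = re.compile(
    r"from cluster lif (\S+) \(node [^)]+\) to cluster lif (\S+) \(node [^)]+\)"
)
# Assumes each cluster LIF name ends in its port suffix, e.g. "_clus1" or "_clus2".
SUFFIX_RE = re.compile(r"_(clus\d+)$")


def lif_suffix(lif: str) -> str:
    m = SUFFIX_RE.search(lif)
    return m.group(1) if m else "unknown"


def tally_lif_pairs(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count (source suffix, destination suffix) pairs over cluscheck event lines."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        m = PAIR_RE.search(line)
        if m:  # header, separator and summary lines are skipped
            counts[(lif_suffix(m.group(1)), lif_suffix(m.group(2)))] += 1
    return counts


def all_cross_switch(counts: "Counter[Tuple[str, str]]") -> bool:
    """True if every lossy path joins two different _clusN suffixes.

    With _clus1 on one switch and _clus2 on the other, such paths cross the ISL.
    """
    return bool(counts) and all(src != dst for src, dst in counts)
```

If `all_cross_switch` returns True for the full event list, no loss was seen between LIFs attached to the same switch, which is consistent with an ISL problem rather than a single node port.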

  • In the example above, the loss always occurs between the _clus1 LIF of one node and the _clus2 LIF of another node, and vice versa.
  • The _clus1 ports of all nodes are connected to one cluster switch and the _clus2 ports to the other cluster switch
  • Each ISL port was disabled one at a time, and cluster ping-cluster was run to check whether the error messages returned. Example:

::> set advanced

::*> cluster ping-cluster
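The one-port-at-a-time test reduces to a simple decision: the suspect link is the one whose shutdown makes the loss disappear. The sketch below records that reasoning under stated assumptions. The ISL port names are hypothetical placeholders, and the True/False results are assumed to be noted manually from each `cluster ping-cluster` run. This is not a NetApp tool.

```python
from typing import Dict, List


def suspect_isl_ports(loss_with_port_disabled: Dict[str, bool]) -> List[str]:
    """ISL ports whose shutdown made the cluster ping loss disappear.

    Keys are hypothetical ISL port names; each value records whether
    `cluster ping-cluster` still reported packet loss while ONLY that port was
    disabled (True = still lossy, False = clean).
    """
    return sorted(port for port, still_lossy in loss_with_port_disabled.items()
                  if not still_lossy)


def isolation_verdict(loss_with_port_disabled: Dict[str, bool]) -> str:
    suspects = suspect_isl_ports(loss_with_port_disabled)
    if len(suspects) == 1:
        return f"faulty ISL link isolated: {suspects[0]}"
    if not suspects:
        # Loss survived every single-port test: possibly more than one bad link,
        # or a fault outside the ISL.
        return "loss persists whichever ISL port is disabled"
    return "loss cleared with more than one port disabled: " + ", ".join(suspects)
```

For example, `isolation_verdict({"isl-1": True, "isl-2": False, "isl-3": True})` returns `"faulty ISL link isolated: isl-2"`.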

  • The faulty ISL connection was isolated, and the hardware components specific to that link were inspected.
  • IFSTAT -A shows multiple cluster ports with receive errors (a combination of CRC errors, Error symbol, Illegal symbol, and similar counters may appear):

-- interface  e2c  (4 days, 6 hours, 38 minutes, 26 seconds) --

RECEIVE
 Total frames:    12130k | Frames/second:      33  | Total bytes:      5418m
 Bytes/second:    14663  | Total errors:    57950  | Errors/minute:       9 
 Total discards:      0  | Discards/minute:     0  | Multi/broadcast:   112k
 Non-primary u/c:     0  | CRC errors:      38630  | Runt frames:         0 
 Fragment:          304  | Long frames:         0  | Jabber:              0 
 Length errors:       2  | No buffer:           0  | Xon:                 0 
 Xoff:                0  | Pause:               0  | Jumbo:             450k
 Noproto:             0  | Error symbol:        0  | Illegal symbol:  19016 
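To compare many ports, the RECEIVE counters can be extracted and the non-zero error counters listed. This is a minimal sketch under two assumptions: that the `k` and `m` suffixes mean thousands and millions (so abbreviated values are approximate), and that the `ERROR_FIELDS` selection below is a reasonable set of framing and physical-layer error counters. That selection is an illustrative choice, not a NetApp definition.

```python
import re
from typing import Dict, Tuple

# Assumption: ifstat abbreviates large counters with k (thousands) and m (millions).
MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}

# "<label>:<spaces><number>[k|m]", several per line separated by "|".
FIELD_RE = re.compile(r"([A-Za-z][A-Za-z /.-]*?):\s+(\d+)([km]?)")

# Receive counters treated here as framing/physical-layer errors (illustrative choice).
ERROR_FIELDS = (
    "CRC errors", "Error symbol", "Illegal symbol", "Fragment",
    "Runt frames", "Long frames", "Jabber", "Length errors",
)


def parse_ifstat_receive(text: str) -> Dict[str, int]:
    """Turn an ifstat RECEIVE block into {counter label: approximate value}."""
    return {
        label.strip(): int(digits) * MULTIPLIER[suffix]
        for label, digits, suffix in FIELD_RE.findall(text)
    }


def receive_error_summary(counters: Dict[str, int]) -> Tuple[Dict[str, int], float]:
    """Return the non-zero error counters and Total errors / Total frames."""
    nonzero = {f: counters[f] for f in ERROR_FIELDS if counters.get(f, 0) > 0}
    frames = counters.get("Total frames", 0)
    ratio = counters.get("Total errors", 0) / frames if frames else 0.0
    return nonzero, ratio
```

For the e2c sample, this reports non-zero CRC errors, Illegal symbol, Fragment, and Length errors, with Total errors at roughly 0.48% of the (abbreviated) Total frames.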

  • CLUSTER-SWITCH-INTERFACE.XML shows the ISL port with input errors.

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.