Vifmgr: Packet loss when pinging from one cluster lif to another cluster lif
Applies to
- Cluster Network Switch
- ONTAP 9
Issue
- Similar EMS messages are seen on all cluster nodes:
Fri Nov 19 18:06:27 +0100 [node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node1_clus2 (node node1) to cluster lif node5_clus1 (node node5)
Thu Dec 23 03:36:41 +0100 [node2: vifmgr: vifmgr.cluscheck.droppedlarge:alert]: Partial packet loss when pinging from cluster lif node2_clus1 (node node2) to cluster lif node6_clus2 (node node6)
Tue Dec 28 16:54:49 +0100 [node3: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node3_clus2 (node node3) to cluster lif node1_clus1 (node node1)
- The symptoms point to a network traffic problem between the two cluster switches over the inter-switch link (ISL), since many cluster ports report issues. Example:
::> event show -message-name *vifmgr.cluscheck*
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
8/24/2022 08:14:27 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 18:36:43 node_name-12 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-12_clus1 (node node_name-12) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 12:41:38 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/23/2022 09:33:27 node_name-02 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-02_clus1 (node node_name-02) to cluster lif node_name-11_clus2 (node node_name-11).
8/23/2022 08:28:35 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/21/2022 13:58:34 node_name-12 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-12_clus1 (node node_name-12) to cluster lif node_name-01_clus2 (node node_name-01).
8/21/2022 13:36:54 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-11_clus2 (node node_name-11).
8/21/2022 01:51:56 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-12_clus2 (node node_name-12).
8/21/2022 01:08:57 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/21/2022 01:08:57 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:48:56 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:48:56 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 22:11:29 node_name-02 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-02_clus1 (node node_name-02) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 10:58:50 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-01_clus2 (node node_name-01).
8/20/2022 01:39:14 node_name-01 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-01_clus1 (node node_name-01) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 01:39:14 node_name-11 ALERT vifmgr.cluscheck.droppedlarge: Partial packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/20/2022 01:39:14 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/19/2022 17:29:32 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/19/2022 17:29:32 node_name-11 ALERT vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
8/18/2022 21:13:36 node_name-11 ALERT vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-11_clus1 (node node_name-11) to cluster lif node_name-12_clus2 (node node_name-12).
20 entries were displayed.
- In the example above, the packet loss always occurs between the _clus1 cluster lif of one node and the _clus2 cluster lif of another node, or vice versa
- The _clus1 ports of all nodes are connected to one cluster switch and the _clus2 ports to the other cluster switch
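The cross-switch pattern described in the two points above can be checked against a saved copy of the EMS output. The following is a minimal sketch, not a NetApp tool: the function names `lif_suffix` and `classify` are hypothetical, and it assumes that cluster lif names end in `_clus1` or `_clus2` and that this suffix identifies the cluster switch, as in the cabling described above.

```python
import re
from collections import Counter

# Extracts source and destination lif from a vifmgr.cluscheck message.
PAIR_RE = re.compile(
    r"from cluster lif (\S+) \(node ([^)\s]+)\) "
    r"to cluster lif (\S+) \(node ([^)\s]+)\)"
)


def lif_suffix(lif_name):
    """Return the trailing 'clusN' part of a lif name, or None if absent."""
    m = re.search(r"_(clus\d+)$", lif_name)
    return m.group(1) if m else None


def classify(lines):
    """Count messages whose lif pair crosses switches vs. stays on one switch.

    Assumption: different clusN suffixes mean the ping had to traverse the ISL.
    """
    counts = Counter()
    for line in lines:
        m = PAIR_RE.search(line)
        if not m:
            continue  # not a cluscheck packet-loss message
        src, dst = lif_suffix(m.group(1)), lif_suffix(m.group(3))
        if src is None or dst is None:
            counts["unknown"] += 1
        elif src != dst:
            counts["cross_switch"] += 1
        else:
            counts["same_switch"] += 1
    return counts


# Two messages taken from the examples in this article.
SAMPLE = [
    "Fri Nov 19 18:06:27 +0100 [node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: "
    "Continued packet loss when pinging from cluster lif node1_clus2 (node node1) "
    "to cluster lif node5_clus1 (node node5)",
    "8/24/2022 08:14:27 node_name-01 ALERT vifmgr.cluscheck.droppedall: "
    "Total packet loss when pinging from cluster lif node_name-01_clus1 "
    "(node node_name-01) to cluster lif node_name-11_clus2 (node node_name-11).",
]

if __name__ == "__main__":
    print(dict(classify(SAMPLE)))
```

If nearly every message lands in `cross_switch`, that is consistent with an ISL problem; a large `same_switch` count would suggest looking at an individual switch or node port instead.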
- Each ISL port was disabled one at a time, and cluster ping was used to check whether the error messages returned. Example:
::> set advanced
::*> cluster ping-cluster
- The faulty ISL connection was isolated and the link-specific hardware parts were inspected
- IFSTAT-A shows multiple cluster ports with receive errors (a combination of CRC errors, Error symbol, Illegal symbol, etc. may be seen):
-- interface e2c (4 days, 6 hours, 38 minutes, 26 seconds) --
RECEIVE
Total frames: 12130k | Frames/second: 33 | Total bytes: 5418m
Bytes/second: 14663 | Total errors: 57950 | Errors/minute: 9
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 112k
Non-primary u/c: 0 | CRC errors: 38630 | Runt frames: 0
Fragment: 304 | Long frames: 0 | Jabber: 0
Length errors: 2 | No buffer: 0 | Xon: 0
Xoff: 0 | Pause: 0 | Jumbo: 450k
Noproto: 0 | Error symbol: 0 | Illegal symbol: 19016
- CLUSTER-SWITCH-INTERFACE.XML shows the ISL port with input errors.
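When many ports have to be reviewed, the receive counters in the IFSTAT-A output shown earlier can be extracted with a short script. This is a minimal sketch under stated assumptions: `parse_counters` and `nonzero_errors` are hypothetical helper names, the `k` and `m` suffixes are treated as approximate multipliers of one thousand and one million, and only the three error counters named in this article are checked.

```python
import re

# Assumption: k/m/g are rounded decimal multipliers in the ifstat output.
SCALE = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}

# Error counters named in this article; extend the tuple as needed.
ERROR_FIELDS = ("CRC errors", "Error symbol", "Illegal symbol")


def parse_counters(text):
    """Parse 'Name: value | Name: value' rows into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        for field in line.split("|"):
            name, sep, value = field.rpartition(":")
            m = re.fullmatch(r"\s*(\d+)([kmg]?)\s*", value)
            if sep and m:  # skip headers such as 'RECEIVE'
                counters[name.strip()] = int(m.group(1)) * SCALE[m.group(2)]
    return counters


def nonzero_errors(counters):
    """Return the error counters from ERROR_FIELDS that are greater than zero."""
    return {k: counters[k] for k in ERROR_FIELDS if counters.get(k, 0) > 0}


# Rows copied from the e2c example in this article.
SAMPLE = """\
-- interface e2c (4 days, 6 hours, 38 minutes, 26 seconds) --
RECEIVE
Total frames: 12130k | Frames/second: 33 | Total bytes: 5418m
Bytes/second: 14663 | Total errors: 57950 | Errors/minute: 9
Non-primary u/c: 0 | CRC errors: 38630 | Runt frames: 0
Noproto: 0 | Error symbol: 0 | Illegal symbol: 19016
"""

if __name__ == "__main__":
    print(nonzero_errors(parse_counters(SAMPLE)))
```

For the e2c sample, the script reports nonzero CRC errors and Illegal symbol counts, matching the receive errors described in the IFSTAT-A point.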