NetApp Knowledge Base

Bus Overruns on Port Causing Cluster Health to be Degraded

Category: ontap-9
Specialty: core

Applies to

  • Cluster network port(s)
  • Bus Overruns detected
  • AFF A700s

Issue

  • A node reports a cluster port in degraded status, with messages similar to:

[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Total Packet Loss - Ping failures detected between node_name-1_cluster1 ( 169.254.123.145 ) on node_name-1 and node_name-2_cluster1 ( 169.254.123.167 ) on node_name-2

[node_name-1: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node_name-1_cluster1 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).

and/or

[node_name-1: vifmgr: vifmgr.port.monitor.failed:error]: The "l2_reachability" health check for port e0a (node node_name-1) has failed. The port is operating in a degraded state.

[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Insufficient L2 Reachability - Insufficient L2 Reachability detected from cluster port e0a on node node_name-1.

  • The ONTAP event log and VIFMGR-LOG.GZ output contain messages such as:

::> event log show -messagename vifmgr*
Time Node        Severity      Event
---- ----------- ------------- ---------------------------
...  node_name-1 ERROR         vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-1_cluster2 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).
...  node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
...  node_name-1 CRITICAL      vifmgr.clus.linkdown: The cluster port e0a on node node_name-1 has gone down unexpectedly.
...  node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
  • The affected cluster network port reports no-reachability. Example:

::> network port reachability show -detail -node node_name-1 -port e0a
Node         Port     Expected Reachability        Reachability Status
------------ -------- ---------------------------- --------------------------
node_name-1  e0a      Cluster:Cluster              no-reachability
   Unreachable Ports: node_name-2:e0b, node_name-2:e0a, node_name-1:e0b
    Unexpected Ports: -

  • Total discards and bus overruns keep increasing in the ifstat output for the cluster network port. Example:

::> system node run -node node_name -command ifstat e0a
-- interface  e0a  (0 hours, 38 minutes, 59 seconds) --
RECEIVE
Total frames:      217k | Frames/second:      93  | Total bytes:     98483k
Bytes/second:    42105  | Total errors:        0  | Errors/minute:       0
Total discards:  31183  | Discards/minute:   800  | Multi/broadcast:   131
Non-primary u/c:     0  | CRC errors:          0  | Runt frames:         0
...
Noproto:             0  | Error symbol:        0  | Illegal symbol:      0
Bus overruns:    31183  | Queue drops:         0  | LRO segments:      206k
LRO bytes:       95312k | LRO6 segments:       0  | LRO6 bytes:          0
...
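
To confirm that the counters are actually increasing rather than stale, two ifstat captures taken some time apart can be compared. The following is a minimal Python sketch, not a NetApp tool: the helper names `parse_ifstat` and `receive_deltas` are hypothetical, the `k` suffix is assumed to mean decimal thousands, and the second sample is made up for illustration (the first reuses the RECEIVE values shown above).

```python
import re

# Assumed decimal scaling for abbreviated counters such as "217k".
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}


def parse_ifstat(text):
    """Return {section: {counter name: int}} from pasted ifstat output."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers such as RECEIVE are upper-case words without a colon.
        if stripped.isupper() and ":" not in stripped:
            current = stripped
            continue
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            match = re.fullmatch(r"(\d+)([kmg]?)", value.strip())
            if sep and match and current is not None:
                number = int(match.group(1)) * SUFFIX[match.group(2)]
                sections.setdefault(current, {})[name.strip()] = number
    return sections


def receive_deltas(earlier, later, names=("Total discards", "Bus overruns")):
    """Difference of selected RECEIVE counters between two captures."""
    before = parse_ifstat(earlier).get("RECEIVE", {})
    after = parse_ifstat(later).get("RECEIVE", {})
    return {n: after[n] - before[n] for n in names if n in before and n in after}


# First capture: RECEIVE counters copied from the example output.
SAMPLE_1 = """\
RECEIVE
Total frames:      217k | Frames/second:      93  | Total bytes:     98483k
Total discards:  31183  | Discards/minute:   800  | Multi/broadcast:   131
Bus overruns:    31183  | Queue drops:         0  | LRO segments:      206k
"""

# Second capture: HYPOTHETICAL values, as if taken about one minute later.
SAMPLE_2 = SAMPLE_1.replace("31183", "31983")

print(receive_deltas(SAMPLE_1, SAMPLE_2))
# {'Total discards': 800, 'Bus overruns': 800}
```

A positive delta in both counters between captures matches the symptom described here; a delta of zero suggests the counters reflect an earlier event.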

 

 

