Bus Overruns on Port Causing Cluster Health to be Degraded
Applies to
- Cluster network port(s)
- Bus Overruns detected
- AFF A700s
Issue
- A node reports a cluster port in degraded status with messages similar to:
[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Total Packet Loss - Ping failures detected between node_name-1_cluster1 ( 169.254.123.145 ) on node_name-1 and node_name-2_cluster1 ( 169.254.123.167 ) on node_name-2
[node_name-1: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node_name-1_cluster1 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).
and/or
[node_name-1: vifmgr: vifmgr.port.monitor.failed:error]: The "l2_reachability" health check for port e0a (node node_name-1) has failed. The port is operating in a degraded state.
[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Insufficient L2 Reachability - Insufficient L2 Reachability detected from cluster port e0a on node node_name-1.
- ONTAP event messages and VIFMGR-LOG.GZ contain entries similar to:
::> event log show -messagename vifmgr*
Time Node Severity Event
---- ----------- ------------- ---------------------------
... node_name-1 ERROR vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-1_cluster2 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).
... node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
... node_name-1 CRITICAL vifmgr.clus.linkdown: The cluster port e0a on node node_name-1 has gone down unexpectedly.
... node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
- The affected cluster network port reports no-reachability. Example:
::> network port reachability show -detail -node node_name-1 -port e0a
Node Port Expected Reachability Reachability Status
------------ -------- ---------------------------- --------------------------
node_name-1 e0a Cluster:Cluster no-reachability
Unreachable Ports: node_name-2:e0b, node_name-2:e0a, node_name-1:e0b
Unexpected Ports: -
- Total discards and bus overruns keep increasing in the ifstat output for the cluster network port. Example:
::> system node run -node node_name -command ifstat e0a
-- interface e0a (0 hours, 38 minutes, 59 seconds) --
RECEIVE
Total frames: 217k | Frames/second: 93 | Total bytes: 98483k
Bytes/second: 42105 | Total errors: 0 | Errors/minute: 0
Total discards: 31183 | Discards/minute: 800 | Multi/broadcast: 131
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 0
...
Noproto: 0 | Error symbol: 0 | Illegal symbol: 0
Bus overruns: 31183 | Queue drops: 0 | LRO segments: 206k
LRO bytes: 95312k | LRO6 segments: 0 | LRO6 bytes: 0
...
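To check how much of the discard count comes from bus overruns, the RECEIVE counters in the ifstat output can be compared directly. The following is a minimal Python sketch, not a NetApp tool: the function names (parse_ifstat_counters, to_int, bus_overrun_share) are hypothetical, the sample text is a shortened copy of the output above, and reading the "k" suffix as a multiplier of 1000 is an assumption.

```python
# Shortened copy of the ifstat output shown above (RECEIVE section only).
SAMPLE = """\
-- interface e0a (0 hours, 38 minutes, 59 seconds) --
RECEIVE
Total frames: 217k | Frames/second: 93 | Total bytes: 98483k
Bytes/second: 42105 | Total errors: 0 | Errors/minute: 0
Total discards: 31183 | Discards/minute: 800 | Multi/broadcast: 131
Bus overruns: 31183 | Queue drops: 0 | LRO segments: 206k
"""


def parse_ifstat_counters(text, section="RECEIVE"):
    """Return {counter name: raw string value} for one section of ifstat output."""
    counters = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers such as RECEIVE are assumed to be all-uppercase lines.
        if stripped.isupper() and ":" not in stripped:
            current = stripped
            continue
        if current != section:
            continue
        # Each data line holds "name: value" fields separated by "|".
        for field in stripped.split("|"):
            name, sep, value = field.partition(":")
            if sep and value.strip():
                counters[name.strip()] = value.strip()
    return counters


def to_int(value):
    """Convert '31183' or '217k' to an int (assumption: 'k' means x1000)."""
    if value[-1].lower() == "k":
        return int(value[:-1]) * 1000
    return int(value)


def bus_overrun_share(counters):
    """Fraction of received discards that are accounted for by bus overruns."""
    discards = to_int(counters["Total discards"])
    overruns = to_int(counters["Bus overruns"])
    return overruns / discards if discards else 0.0


if __name__ == "__main__":
    rx = parse_ifstat_counters(SAMPLE)
    print("Total discards:", rx["Total discards"])
    print("Bus overruns:  ", rx["Bus overruns"])
    print("Share of discards that are bus overruns:", bus_overrun_share(rx))
```

In the sample above both counters read 31183, so every discard on the port is a bus overrun. Running the same comparison on two captures taken a few minutes apart shows whether the counters are still increasing.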