Cluster Network Degraded alerts reported multiple times due to errors on cluster ports
Applies to
- ONTAP 9
- Cluster Switch
Issue
- The following Cluster Network Degraded AutoSupport notification is received multiple times:
HA Group Notification (CLUSTER NETWORK DEGRADED) ALERT
- Multiple errors are reported on the cluster ports of all nodes except for one cluster port, as seen in the RECEIVE section of the ifstat output.
- You may see the cluster switch health reported as degraded in the system health alert show command output
- A performance impact may be seen if there is a high number of CRC errors
- Cluster ports with multiple errors (you may see a combination of CRC errors, Error symbol, Illegal symbol, etc.):
clustershell::>system node run -node <nodename> -command ifstat -a
-- interface e0b (219 days, 5 hours, 16 minutes, 2 seconds) --
RECEIVE
Frames/second: 2301 | Bytes/second: 3785k | Errors/minute: 0
Discards/minute: 0 | Total frames: 154g | Total bytes: 176t
Total errors: 43918 | Total discards: 65 | Multi/broadcast: 4452k
No buffers: 0 | Non-primary u/c: 0 | L2 terminate: 14908
Tag drop: 0 | Vlan tag drop: 0 | Vlan untag drop: 0
Vlan forwards: 0 | CRC errors: 29328 | Runt frames: 0
Fragment: 0 | Long frames: 65 | Jabber: 0
Error symbol: 29328 | Illegal symbol: 14590 | Bus overruns: 0
Queue drop: 0 | Xon: 0 | Xoff: 0
Jumbo: 5634k | JMBuf RxFrames: 162g | JMBuf DrvCopy: 27146
- Single cluster port without errors:
clustershell::>system node run -node <nodename> -command ifstat -a
-- interface e0b (219 days, 7 hours, 2 minutes, 24 seconds) --
RECEIVE
Frames/second: 1092 | Bytes/second: 950k | Errors/minute: 0
Discards/minute: 0 | Total frames: 47631m | Total bytes: 107t
Total errors: 0 | Total discards: 1159 | Multi/broadcast: 4473k
No buffers: 1087 | Non-primary u/c: 0 | L2 terminate: 302
Tag drop: 0 | Vlan tag drop: 0 | Vlan untag drop: 0
Vlan forwards: 0 | CRC errors: 0 | Runt frames: 0
Fragment: 0 | Long frames: 50 | Jabber: 0
Error symbol: 0 | Illegal symbol: 0 | Bus overruns: 22
Queue drop: 0 | Xon: 0 | Xoff: 0
Jumbo: 2769m | JMBuf RxFrames: 0 | JMBuf DrvCopy: 0
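
The comparison between the two outputs above can be sketched in a few lines of Python. This is only an illustration, assuming the ifstat text has been saved or pasted as a string. The function names (parse_counters, nonzero_errors) and the choice of which counters to treat as receive errors are assumptions made for this example, not part of ONTAP.

```python
import re

# Counters treated as receive errors in this sketch (an assumption,
# based on the counters highlighted in the outputs above).
ERROR_COUNTERS = ("Total errors", "CRC errors", "Error symbol", "Illegal symbol")

# One "Name: value" field; values may carry a k/m/g/t suffix as in ifstat output.
FIELD = re.compile(r"([^:|]+):\s*(\d+[kmgt]?)")

def parse_counters(text):
    """Return {counter name: value string} from 'Name: value | Name: value' lines."""
    counters = {}
    for line in text.splitlines():
        for field in line.split("|"):
            match = FIELD.fullmatch(field.strip())
            if match:
                counters[match.group(1).strip()] = match.group(2)
    return counters

def nonzero_errors(counters):
    """Return only the error counters whose value is not zero."""
    return {name: counters[name]
            for name in ERROR_COUNTERS
            if counters.get(name, "0") != "0"}

# Three lines taken from the first output above.
sample = """
Total errors: 43918 | Total discards: 65 | Multi/broadcast: 4452k
Vlan forwards: 0 | CRC errors: 29328 | Runt frames: 0
Error symbol: 29328 | Illegal symbol: 14590 | Bus overruns: 0
"""

print(nonzero_errors(parse_counters(sample)))
```

Running the same check against each node's output makes it easier to spot the one port whose error counters are all zero.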
- A single cluster port reports a low power level (mW) in the ifconfig -vvv output:
::> system node run -node <nodename> -command ifconfig -vvv
…
e0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
uuid: 0320a80b-caa3-11eb-b14a-d039ea306760
...
RX: 0.06 mW (-12.13 dBm) TX: 0.55 mW (-2.59 dBm)
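
The RX/TX line reports each power level in both mW and dBm. The two units are related by dBm = 10 * log10(power in mW). The short Python sketch below extracts the values from such a line and converts between the units. The function names are hypothetical. No threshold for "low" is assumed here, because the acceptable range depends on the transceiver specification.

```python
import math
import re

# Matches e.g. "RX: 0.06 mW (-12.13 dBm)"
POWER = re.compile(r"(RX|TX):\s*([\d.]+)\s*mW\s*\((-?[\d.]+)\s*dBm\)")

def parse_power_levels(line):
    """Return {"RX": (mW, dBm), "TX": (mW, dBm)} from an ifconfig -vvv power line."""
    return {direction: (float(mw), float(dbm))
            for direction, mw, dbm in POWER.findall(line)}

def mw_to_dbm(power_mw):
    """Convert milliwatts to dBm."""
    return 10 * math.log10(power_mw)

def dbm_to_mw(power_dbm):
    """Convert dBm to milliwatts."""
    return 10 ** (power_dbm / 10)

line = "RX: 0.06 mW (-12.13 dBm) TX: 0.55 mW (-2.59 dBm)"
levels = parse_power_levels(line)
for direction, (mw, dbm) in levels.items():
    # The mW figure is rounded to two decimals in the output, so the value
    # recomputed from dBm can differ slightly from the printed mW figure.
    print(direction, mw, dbm, round(dbm_to_mw(dbm), 4))
```

Because the mW figure is rounded in the command output, the dBm figure is the more precise of the two when comparing ports.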
- You may see the following messages in the EMS logs and in the generated health alerts:
netif.linkErrors: Excessive link errors on network interface e2c. Might indicate a bad cable, switch port, or NIC, or that a cable connector is not fully inserted in a socket. On a 10/100 port, might indicate a duplex mismatch.
- The following alert may be generated against the connected switch port of the only node port without receive CRC errors:
[?] Tue Nov 01 18:13:10 -0700 [node-01: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9].
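
To identify the affected switch port from an alert like the one above, the bracketed part of the message can be split into its components. The Python sketch below assumes the alert name is followed by [switch(identifier)/port], as in the example. The function name and the field names are hypothetical, and reading the value in parentheses as a switch identifier is an assumption.

```python
import re

# Matches e.g. "ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9]"
ALERT = re.compile(r"(\w+)\[([^(\]]+)\(([^)]+)\)/([^\]]+)\]")

def parse_switch_alert(message):
    """Return the alert name, switch name, identifier, and port, or None if no match."""
    match = ALERT.search(message)
    if match is None:
        return None
    alert, switch, switch_id, port = match.groups()
    return {"alert": alert, "switch": switch, "switch_id": switch_id, "port": port}

message = ("[?] Tue Nov 01 18:13:10 -0700 [node-01: mgwd: callhome.hm.alert.major:alert]: "
           "Call home for Health Monitor process cshm: "
           "ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9].")

print(parse_switch_alert(message))
```

The extracted port is the switch port connected to the only node port without receive CRC errors, as described in the last bullet.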