Cluster Network Degraded alerts reported due to a bad cluster port/SFP connection
Applies to
- ONTAP 9
- Cluster Network
Issue
- The following CLUSTER NETWORK DEGRADED AutoSupport notification is received multiple times:
HA Group Notification (CLUSTER NETWORK DEGRADED) ALERT
- Multiple errors are reported on the cluster ports of all nodes except one cluster port, as seen in the RECEIVE section of the ifstat output.
- You may see the cluster switch health reported as degraded in the output of the system health alert show command.
- Cluster ports with multiple errors (you may see a combination of CRC errors, Error symbol, Illegal symbol, etc.):
clustershell::> system node run -node <nodename> -command ifstat -a
-- interface e0b (219 days, 5 hours, 16 minutes, 2 seconds) --
RECEIVE
Frames/second: 2301 | Bytes/second: 3785k | Errors/minute: 0
Discards/minute: 0 | Total frames: 154g | Total bytes: 176t
Total errors: 43918 | Total discards: 65 | Multi/broadcast: 4452k
No buffers: 0 | Non-primary u/c: 0 | L2 terminate: 14908
Tag drop: 0 | Vlan tag drop: 0 | Vlan untag drop: 0
Vlan forwards: 0 | CRC errors: 29328 | Runt frames: 0
Fragment: 0 | Long frames: 65 | Jabber: 0
Error symbol: 29328 | Illegal symbol: 14590 | Bus overruns: 0
Queue drop: 0 | Xon: 0 | Xoff: 0
Jumbo: 5634k | JMBuf RxFrames: 162g | JMBuf DrvCopy: 27146
- Single cluster port without errors:
clustershell::> system node run -node <nodename> -command ifstat -a
-- interface e0b (219 days, 7 hours, 2 minutes, 24 seconds) --
RECEIVE
Frames/second: 1092 | Bytes/second: 950k | Errors/minute: 0
Discards/minute: 0 | Total frames: 47631m | Total bytes: 107t
Total errors: 0 | Total discards: 1159 | Multi/broadcast: 4473k
No buffers: 1087 | Non-primary u/c: 0 | L2 terminate: 302
Tag drop: 0 | Vlan tag drop: 0 | Vlan untag drop: 0
Vlan forwards: 0 | CRC errors: 0 | Runt frames: 0
Fragment: 0 | Long frames: 50 | Jabber: 0
Error symbol: 0 | Illegal symbol: 0 | Bus overruns: 22
Queue drop: 0 | Xon: 0 | Xoff: 0
Jumbo: 2769m | JMBuf RxFrames: 0 | JMBuf DrvCopy: 0
- You may see the following message in the EMS logs, and health alerts may be generated:
netif.linkErrors: Excessive link errors on network interface e2c. Might indicate a bad cable, switch port, or NIC, or that a cable connector is not fully inserted in a socket. On a 10/100 port, might indicate a duplex mismatch.
- The following alert may be generated for the switch port connected to the only node cluster port that has no receive CRC errors:
[?] Tue Nov 01 18:13:10 -0700 [node-01: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm:ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9].
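The comparison described in this article (receive CRC errors on every cluster port except one) can be scripted once the ifstat -a output of each node has been captured. The following is a minimal Python sketch, not a NetApp tool. It assumes the RECEIVE section layout shown above, treats the k/m/g/t counter suffixes as decimal multipliers (an assumption), and uses hypothetical helper names (parse_ifstat_receive, find_clean_ports) and hypothetical node names (node-A, node-B).

```python
import re

# Assumption: ifstat abbreviates large counters with k/m/g/t suffixes;
# they are treated here as decimal multipliers, which is enough for comparisons.
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
VALUE_RE = re.compile(r"^(\d+)([kmgt]?)$")
IFACE_RE = re.compile(r"^-- interface (\S+)")


def parse_counter(text):
    """Convert a counter such as '29328' or '154g' to an int; None if not a counter."""
    match = VALUE_RE.match(text.strip())
    if not match:
        return None
    digits, suffix = match.groups()
    return int(digits) * SUFFIX[suffix]


def parse_ifstat_receive(output):
    """Return {interface: {field: value}} for the RECEIVE section of 'ifstat -a' text."""
    result = {}
    iface = None
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        header = IFACE_RE.match(line)
        if header:
            iface = header.group(1)
            result[iface] = {}
            section = None
            continue
        if line and ":" not in line and line.isupper():
            section = line  # section header such as RECEIVE or TRANSMIT
            continue
        if iface is None or section != "RECEIVE":
            continue  # only receive-side counters are collected
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            counter = parse_counter(value) if sep else None
            if counter is not None:
                result[iface][name.strip()] = counter
    return result


def find_clean_ports(ports):
    """Given {(node, port): receive_counters}, return [(node, port)] for the single
    port with zero CRC errors when every other port reports CRC errors; else []."""
    clean = [key for key, c in ports.items() if c.get("CRC errors", 0) == 0]
    dirty = [key for key, c in ports.items() if c.get("CRC errors", 0) > 0]
    return clean if len(clean) == 1 and dirty else []


if __name__ == "__main__":
    # Hypothetical node names; the counters are excerpts of the two outputs above.
    node_a = """-- interface e0b (219 days, 5 hours, 16 minutes, 2 seconds) --
RECEIVE
Total errors: 43918 | Total discards: 65 | Multi/broadcast: 4452k
Vlan forwards: 0 | CRC errors: 29328 | Runt frames: 0
"""
    node_b = """-- interface e0b (219 days, 7 hours, 2 minutes, 24 seconds) --
RECEIVE
Total errors: 0 | Total discards: 1159 | Multi/broadcast: 4473k
Vlan forwards: 0 | CRC errors: 0 | Runt frames: 0
"""
    ports = {
        ("node-A", "e0b"): parse_ifstat_receive(node_a)["e0b"],
        ("node-B", "e0b"): parse_ifstat_receive(node_b)["e0b"],
    }
    print(find_clean_ports(ports))  # [('node-B', 'e0b')]
```

If exactly one port is returned, it matches the pattern above: the switch port cabled to it is where the ClusterIfInErrorsWarn alert would be expected. An empty list means the collected output does not fit this pattern.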
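The two message formats shown above can also be recognized when scanning exported EMS or AutoSupport text, for example to map the switch-side alert back to a switch name, serial number, and interface. This is a minimal Python sketch. Both regular expressions are inferred only from the sample lines in this article, so other ONTAP releases or alert types may be worded differently, and classify_message is a hypothetical helper.

```python
import re

# Both patterns are inferred from the sample messages in this article only.
LINK_ERRORS_RE = re.compile(
    r"netif\.linkErrors: Excessive link errors on network interface (?P<port>\S+?)\."
)
# e.g. cshm:ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9]
CSHM_RE = re.compile(
    r"cshm:(?P<alert>\w+)\[(?P<switch>[^(\]]+)\((?P<serial>[^)]+)\)/(?P<port>[^\]]+)\]"
)


def classify_message(line):
    """Return a dict describing a recognized cluster-network message, or None."""
    match = LINK_ERRORS_RE.search(line)
    if match:
        # Node-side EMS event: reports the node network port.
        return {"source": "node", "port": match.group("port")}
    match = CSHM_RE.search(line)
    if match:
        # Switch health monitor alert: reports switch name, serial, and interface.
        return {"source": "switch", **match.groupdict()}
    return None


if __name__ == "__main__":
    alert = ("[?] Tue Nov 01 18:13:10 -0700 [node-01: mgwd: callhome.hm.alert.major:alert]: "
             "Call home for Health Monitor process "
             "cshm:ClusterIfInErrorsWarn_Alert[switch01(FOC123456789)/Ethernet1/9].")
    print(classify_message(alert))
```

For the sample alert above, this yields switch01, serial FOC123456789, and interface Ethernet1/9, which identifies the switch end of the link to inspect.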
