Bad link on cluster network causes errors on multiple cluster ports
Applies to
- ONTAP 9
- FAS/AFF Systems
- Switched cluster network
Issue
- The following errors are reported for the cluster port in the event log, and the cluster network is degraded as a result:
[Node-01: intr: netif.linkErrors:error]: Excessive link errors on network interface e0a. Might indicate a bad cable, switch port, or NIC, or that a cable connector is not fully inserted in a socket. On a 10/100 port, might indicate a duplex mismatch.
[Node-01: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: CRC Errors Detected - High CRC errors detected on port e0a node Node-01
- The following health alert is generated for inbound errors on the cluster switch port:
::> system health alert show
Node: node-01
Alert ID: ClusterIfInErrorsWarn_Alert
Resource: Ethernet1/7
Severity: Major
Indication Time: Wed Apr 19 08:45:22 2023
Suppress: false
Acknowledge: false
Probable Cause: The percentage of inbound packet errors of switch interface "sw1(FOCXXXXXXX)/Ethernet1/7" is above the warning threshold.
- The ifstat output for the reported cluster port shows a high number of CRC errors:
::> system node run -node <node> -command ifstat <port>
-- interface e0a (62 days, 23 hours, 58 minutes, 45 seconds) --
RECEIVE
Total frames: 81107m | Frames/second: 14901 | Total bytes: 387t
Bytes/second: 71201k | Total errors: 1396k | Errors/minute: 15
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 1646k
Non-primary u/c: 0 | CRC errors: 1396k | Runt frames: 0
Fragment: 0 | Long frames: 10 | Jabber: 0
Length errors: 108 | No buffer: 0 | Xon: 0
Xoff: 0 | Pause: 0 | Jumbo: 45712m
Noproto: 0 | Error symbol: 0 | Illegal symbol: 0
- The interface statistics for the alerting switch interface show high input error and CRC counts:
Switch-1#show interface Ethernet1/7
Ethernet1/7 is up
RX
190220628517 unicast packets 802665 multicast packets 571723 broadcast packets
190222002905 input packets 423856190729771 bytes
44207022448 jumbo packets 0 storm suppression packets
661 runts 0 giants 2775922 CRC 0 no buffer
2905280 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
- One or more lanes of the associated QSFP may report receive power near a warning threshold:
Switch-1#show interface transceiver details
Lane Number:2 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
                 Current        Alarms                 Warnings
                 Measurement    High       Low         High       Low
----------------------------------------------------------------------------
Temperature      32.21 C        75.00 C    -5.00 C     70.00 C    0.00 C
Voltage          3.30 V         3.63 V     2.97 V      3.46 V     3.09 V
Current          7.50 mA        12.00 mA   3.00 mA     12.00 mA   3.00 mA
Tx Power         -1.47 dBm      2.99 dBm   -11.30 dBm  0.00 dBm   -7.30 dBm
Rx Power         -0.18 dBm      2.99 dBm   -13.97 dBm  0.00 dBm   -9.91 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
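The symptoms above can be put in proportion with a small script. The following Python sketch computes three things:

- The CRC error rate from the ifstat output.
- The input-error and CRC rates from the switch `show interface` RX counters.
- How far each transceiver dBm reading sits from its nearest warning threshold.

This is a hypothetical helper, not a NetApp or Cisco tool. It relies on several assumptions:

- The text layouts are the ones shown above.
- ifstat's k/m/g/t suffixes mean powers of 1000.
- `NEAR_WARNING_DB` (1 dB) is an illustrative cut-off only. It is not an ONTAP or NX-OS setting.

The function names are also hypothetical. The script only reports the numbers; this article does not define what rate counts as "high".

```python
import re

# Assumption: ifstat scales large counters with k/m/g/t suffixes meaning
# powers of 1000 (e.g. "1396k" = 1,396,000). Verify against your release.
_SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}

# Hypothetical cut-off for "near a warning threshold"; not a vendor setting.
NEAR_WARNING_DB = 1.0


def parse_scaled(value: str) -> int:
    """Convert an ifstat counter such as '81107m' into an integer."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized counter: {value!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def ifstat_crc_ratio(ifstat_text: str) -> float:
    """Fraction of received frames with CRC errors, from ifstat RECEIVE output."""
    frames = re.search(r"Total frames:\s*(\d+[kmgt]?)", ifstat_text)
    crc = re.search(r"CRC errors:\s*(\d+[kmgt]?)", ifstat_text)
    if frames is None or crc is None:
        raise ValueError("ifstat output lacks 'Total frames' or 'CRC errors'")
    return parse_scaled(crc.group(1)) / parse_scaled(frames.group(1))


def switch_rx_error_ratios(show_interface_text: str) -> dict:
    """Input-error and CRC fractions from NX-OS 'show interface' RX counters."""

    def counter(label: str) -> int:
        # Counters appear as "<number> <label>", e.g. "2775922 CRC".
        match = re.search(rf"(\d+) {label}\b", show_interface_text)
        if match is None:
            raise ValueError(f"counter {label!r} not found")
        return int(match.group(1))

    packets = counter("input packets")
    return {
        "input errors": counter("input error") / packets,
        "CRC errors": counter("CRC") / packets,
    }


def warning_margin_db(transceiver_text: str, measurement: str) -> float:
    """Smallest distance (dB) between a dBm reading and its warning thresholds.

    Assumes the column order shown above: current, high alarm, low alarm,
    high warning, low warning. A negative result means a warning is exceeded.
    """
    values = r"\s+".join([r"(-?\d+\.\d+) dBm"] * 5)
    pattern = rf"^{re.escape(measurement)}\s+" + values
    match = re.search(pattern, transceiver_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{measurement!r} row not found")
    current, _hi_alarm, _lo_alarm, hi_warn, lo_warn = map(float, match.groups())
    return min(hi_warn - current, current - lo_warn)


if __name__ == "__main__":
    # Excerpts copied from the sample outputs in this article.
    ifstat_out = (
        "Total frames: 81107m | Frames/second: 14901\n"
        "Non-primary u/c: 0 | CRC errors: 1396k"
    )
    switch_out = (
        "190222002905 input packets 423856190729771 bytes\n"
        "661 runts 0 giants 2775922 CRC 0 no buffer\n"
        "2905280 input error 0 short frame 0 overrun"
    )
    xcvr_out = (
        "Tx Power -1.47 dBm 2.99 dBm -11.30 dBm 0.00 dBm -7.30 dBm\n"
        "Rx Power -0.18 dBm 2.99 dBm -13.97 dBm 0.00 dBm -9.91 dBm"
    )

    print(f"e0a CRC errors per received frame: {ifstat_crc_ratio(ifstat_out):.4%}")
    for name, ratio in switch_rx_error_ratios(switch_out).items():
        print(f"Ethernet1/7 {name} per input packet: {ratio:.4%}")
    for row in ("Tx Power", "Rx Power"):
        margin = warning_margin_db(xcvr_out, row)
        flag = "NEAR WARNING" if margin < NEAR_WARNING_DB else "ok"
        print(f"{row}: {margin:.2f} dB from nearest warning threshold ({flag})")
```

Against the sample output above, the sketch gives these results:

- e0a: CRC errors on roughly 0.0017% of received frames.
- Ethernet1/7: input errors on about 0.0015% of input packets.
- Rx Power: flagged, because it sits 0.18 dB below the 0.00 dBm high-warning threshold.

A small percentage can still mean more than a million corrupted frames, as the raw counters show.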