NetApp Knowledge Base

Cluster network degraded due to high CRC errors on cluster ports

Category: aff-series
Specialty: hw

Applies to

  • ONTAP 9
  • FAS/AFF Systems
  • CN1610 Cluster Switches
  • BES-53248 Cluster Switches
  • Cisco Cluster Switches

Issue

  • The cluster network is degraded because of CRC errors, and the following errors are seen in the event logs:

[Node-01: intr: netif.linkErrors:error]: Excessive link errors on network interface e0b. Might indicate a bad cable, switch port, or NIC, or that a cable connector is not fully inserted in a socket. On a 10/100 port, might indicate a duplex mismatch.
[Node-01: vifmgr: vifmgr.cluscheck.hwerrors:alert]: Port e0a on node Node-01 is reporting a high number (at least 1 per 1000 packets) of observed hardware errors (CRC, length, alignment, dropped).
[Node-01: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: CRC Errors Detected - High CRC errors detected on port e0a node Node-01

  • If link flaps are observed on the cluster ports, the following alerts are seen in the event logs:

[Node-01: vifmgr: vifmgr.port.monitor.failed:error]: The "link_flapping" health check for port e0a (node Node-01) has failed. The port is operating in a degraded state.
[Node-01: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Frequent Link Flapping - Cluster port e0a on node Node-01 has experienced multiple link down notifications.

  • High CRC errors are observed on the cluster ports of all the nodes:

::> system node run -node <node-name> -command ifstat <port-name>

-- interface  e0a  (4 days, 14 hours, 42 minutes, 47 seconds) --

RECEIVE
 Total frames:    86771k | Frames/second:     218  | Total bytes:       289g
 Bytes/second:      727k | Total errors:    65389  | Errors/minute:      10
 Total discards:      0  | Discards/minute:     0  | Multi/broadcast:   121k
 Non-primary u/c:     0  | CRC errors:      22207  | Runt frames:         0
 Fragment:            0  | Long frames:         0  | Jabber:          41971
 Length errors:    1211  | No buffer:           0  | Xon:                 0
 Xoff:                0  | Pause:               0  | Jumbo:           31475k
 Noproto:             0  | Error symbol:      243k | Illegal symbol:    217k
 Bus overruns:        0  | Queue drops:         0  | LRO segments:    62544k

  • A high number of Rx and Tx errors, along with port flaps, is also observed on the switch side:

#show interface counters

Port              InOctets      InUcastPkts      InMcastPkts      InBcastPkts       InDropPkts         Rx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/1         63884683472614      34223820975           116925            80962                5            35838
0/2        265584648397991      43844458781           116922            81071                1          1961079

Port             OutOctets     OutUcastPkts     OutMcastPkts     OutBcastPkts      OutDropPkts         Tx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/1        265607061634499      43843431844          1638223           565759          1952351          1952351
0/2         63884090686727      34225894361          1638180           565624            35018            35015

  • Replacing the SFPs on the node side does not stop the errors.
  • According to the network device-discovery show output, all of the nodes and ports that report errors on the storage side may be connected to the same switch.
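
The counters shown above can be compared offline once the CLI output has been saved as text. The following Python sketch is not part of ONTAP or of any switch software. It parses an excerpt of the ifstat output and the switch counter tables from this article and derives an errors-per-1000-frames figure to set against the "at least 1 per 1000 packets" wording in the vifmgr.cluscheck.hwerrors message. The function names are hypothetical, and the decimal meaning of the k/m/g suffixes is an assumption. How ONTAP itself evaluates the threshold (for example, over which time window) is not stated here, so a cumulative ratio since the counters were last cleared is only a rough indicator.

```python
import re

# Assumption: ifstat suffixes are decimal multipliers (k = 10^3, m = 10^6, g = 10^9).
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}

def parse_ifstat(text):
    """Hypothetical helper: map 'Name: value' cells from ifstat rows to integers."""
    counters = {}
    for line in text.splitlines():
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            match = re.fullmatch(r"(\d+)([kmg]?)", value.strip())
            if sep and match:
                counters[name.strip()] = int(match.group(1)) * SUFFIX[match.group(2)]
    return counters

def errors_per_thousand(counters):
    """Cumulative receive errors per 1000 received frames."""
    frames = counters["Total frames"]
    return 1000 * counters["Total errors"] / frames if frames else 0.0

def parse_switch_counters(text):
    """Hypothetical helper: map each port to {column name: integer value}."""
    header, rows = [], {}
    for line in text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if fields[0] == "Port":
            header = fields[1:]  # column names follow the "Port" label
        elif header and re.fullmatch(r"\d+/\d+", fields[0]):
            values = [int(field) for field in fields[1:]]
            rows.setdefault(fields[0], {}).update(zip(header, values))
    return rows

# Excerpt of the ifstat output shown in this article.
IFSTAT_SAMPLE = """
 Total frames:    86771k | Frames/second:     218  | Total bytes:       289g
 Bytes/second:      727k | Total errors:    65389  | Errors/minute:      10
 Non-primary u/c:     0  | CRC errors:      22207  | Runt frames:         0
"""

# Switch counter tables shown in this article.
SWITCH_SAMPLE = """
Port              InOctets      InUcastPkts      InMcastPkts      InBcastPkts       InDropPkts         Rx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/1         63884683472614      34223820975           116925            80962                5            35838
0/2        265584648397991      43844458781           116922            81071                1          1961079

Port             OutOctets     OutUcastPkts     OutMcastPkts     OutBcastPkts      OutDropPkts         Tx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/1        265607061634499      43843431844          1638223           565759          1952351          1952351
0/2         63884090686727      34225894361          1638180           565624            35018            35015
"""

ifstat_counters = parse_ifstat(IFSTAT_SAMPLE)
rate = errors_per_thousand(ifstat_counters)
switch_rows = parse_switch_counters(SWITCH_SAMPLE)

print(f"e0a: {rate:.2f} receive errors per 1000 frames (cumulative)")
for port, columns in sorted(switch_rows.items()):
    print(f"{port}: Rx Error={columns['Rx Error']}, Tx Error={columns['Tx Error']}")
```

Running the same parsing against output captured from every node and from both cluster switches makes it easier to see whether the error counts concentrate on ports that share one switch, which is the pattern described in the last bullet above.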
