Ethernet Port Goes Down with CRC Errors
Applies to
- NetApp ONTAP 9.18.1 ASAR2
- ASA-A30 (All SAN Array)
- Ethernet ports (e1a, e1b) in interface groups (ifgrp/LACP)
- Sites with recent hardware (switch) changes
- Environments using jumbo frames (MTU 9000)
Issue
Ethernet ports e1a and e1b on both HA nodes intermittently go down, report high CRC error counts, and leave the interface group in a degraded state. The issue started after a switch replacement. Disabling and re-enabling the ports brings them back up, but CRC errors persist while the MTU is set to 9000. Lowering the MTU to 1500 immediately stops the CRC errors.
Sample Log Output:
Wed Apr 01 15:15:07 +0000 [node-02: intr: net.ifgrp.lacp.link.inactive:error]: ifgrp a0a, port e1b has transitioned to an inactive state. The interface group is in a degraded state.
Wed Apr 01 16:15:22 +0000 [node-02: clock: net.ifgrp.lacp.link.active:notice]: ifgrp a0a, port e1b has transitioned to the active state.
Wed Apr 01 21:15:07 +0000 [node-02: intr: net.ifgrp.lacp.link.inactive:error]: ifgrp a0a, port e1b has transitioned to an inactive state. The interface group is in a degraded state.

Reason:
Wed Apr 01 15:15:07 +0000 [node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e1b: Link down, check cable.
Wed Apr 01 15:15:07 +0000 [node-02: intr: net.ifgrp.lacp.link.inactive:error]: ifgrp a0a, port e1b has transitioned to an inactive state. The interface group is in a degraded state.
Wed Apr 01 15:15:07 +0000 [node-02: vifmgr: vifmgr.portdown:notice]: A link down event was received on node node-02, port e1b.
Wed Apr 01 15:15:07 +0000 [node-02: vifmgr: vifmgr.cluscheck.hwerrors:alert]: Port e1b on node node-02 is reporting a high number (at least 1 per 1000 packets) of observed hardware errors (CRC, length, alignment, dropped).
Wed Apr 01 15:15:07 +0000 [node-02: vifmgr: vifmgr.port.monitor.failed:error]: The "crc_errors" health check for port e1b (node-02) has failed. The port is operating in a degraded state.

And after that, flapping:
Sun Apr 05 18:15:35 +0000 [node-02: mgmt_port_link_status_poll: netif.linkUp:info]: Ethernet e1b: Link up.
Sun Apr 05 18:15:35 +0000 [node-02: vifmgr: vifmgr.portup:notice]: A link up event was received on node node-02, port e1b.
Sun Apr 05 18:15:41 +0000 [node-02: clock: net.ifgrp.lacp.link.active:notice]: ifgrp a0a, port e1b has transitioned to the active state.
Sun Apr 05 18:16:44 +0000 [node-02: vifmgr: vifmgr.reach.ok:notice]: Network port e1b on node node-02 can reach its expected broadcast domain Default:Replication. No other broadcast domains appear to be reachable from this port.
Mon Apr 06 00:15:20 +0000 [node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e1b: Link down, check cable.
Mon Apr 06 00:15:20 +0000 [node-02: intr: net.ifgrp.lacp.link.inactive:error]: ifgrp a0a, port e1b has transitioned to an inactive state. The interface group is in a degraded state.
Mon Apr 06 00:15:20 +0000 [node-02: vifmgr: vifmgr.portdown:notice]: A link down event was received on node-02, port e1b.
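To gauge how often a port is flapping, the EMS events in a saved log can be tallied by event name. The following Python sketch is illustrative only and is not a NetApp tool: the regular expression is an assumption derived from the sample lines above, and the names EMS_HEADER and count_events are hypothetical.

```python
import re
from collections import Counter

# Assumed EMS header layout, inferred from the sample log lines in this article:
#   <timestamp> [<node>: <source>: <event>:<severity>]: <message>
EMS_HEADER = re.compile(
    r"(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[\w.-]+): (?P<source>[\w.-]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]"
)

def count_events(log_text):
    """Return a Counter of EMS event names found anywhere in log_text."""
    return Counter(m.group("event") for m in EMS_HEADER.finditer(log_text))

# Three lines copied from the sample log output above.
SAMPLE_LOG = """\
Wed Apr 01 15:15:07 +0000 [node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e1b: Link down, check cable.
Sun Apr 05 18:15:35 +0000 [node-02: mgmt_port_link_status_poll: netif.linkUp:info]: Ethernet e1b: Link up.
Mon Apr 06 00:15:20 +0000 [node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e1b: Link down, check cable.
"""

counts = count_events(SAMPLE_LOG)
for event, total in sorted(counts.items()):
    print(f"{event}: {total}")
```

Because the pattern is searched across the whole text rather than line by line, it also works on log output that has been pasted as a single run of text.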
RECEIVE
 Total frames: 7182k | Frames/second: 0 | Total bytes: 850m
 Bytes/second: 0 | Total errors: 2394 | Errors/minute: 0 <<<<< CRC errors from environment
 Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 1714k
 Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
 CRC errors: 2394 | Runt frames: 0 | Fragment: 0
 Long frames: 0 | Jabber: 0 | Length errors: 0
 Alignment errors: 0 | No buffer: 0 | Pause: 0
 Jumbo: 0 | Error symbol: 0 | Bus overruns: 0 <<<<< no local errors
 Queue drops: 0 | LRO segments: 5466k | LRO bytes: 646m
 LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
 Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
 Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
 Lacp PDU errors: 0
TRANSMIT
 Total frames: 16121k | Frames/second: 0 | Total bytes: 4743m
 Bytes/second: 0 | Total errors: 0 | Errors/minute: 0
 Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 1759k
 Collisions: 0 | Pause: 0 | Jumbo: 1562
 Cfg Up to Downs: 6 | TSO segments: 207k | TSO bytes: 2824m
 TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 0
 HW UDP6 cksums: 0 | HW TCP cksums: 0 | HW TCP6 cksums: 0
 Mcast v6 solicit: 0 | Lagg drops: 0 | Lagg no buffer: 0
 Lagg no entries: 0 | Tx No Buf: 0
DEVICE
 Mcast addresses: 3 | Rx MBuf Sz: 4096
LINK INFO
 Speed: 0 | Duplex: full | Flowcontrol: full
 Media state: no carrier | Up to downs: 12 | HW assist: 514k <<<<< Down with flapping

-- interface e1b (40 days, 10 hours, 4 minutes, 55 seconds) --
RECEIVE
 Total frames: 47775k | Frames/second: 14 | Total bytes: 5922m
 Bytes/second: 1696 | Total errors: 5018 | Errors/minute: 0 <<<<<< CRC errors
 Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 3027k
 Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
 CRC errors: 5018 | Runt frames: 0 | Fragment: 0
 Long frames: 0 | Jabber: 0 | Length errors: 0
 Alignment errors: 0 | No buffer: 0 | Pause: 0
 Jumbo: 0 | Error symbol: 0 | Bus overruns: 0 <<<<<< no local errors
 Queue drops: 0 | LRO segments: 44733k | LRO bytes: 5621m
 LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
 Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
 Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
 Lacp PDU errors: 0
TRANSMIT
 Total frames: 47104k | Frames/second: 13 | Total bytes: 14245m
 Bytes/second: 4079 | Total errors: 0 | Errors/minute: 0
 Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 2113k
 Collisions: 0 | Pause: 0 | Jumbo: 4387
 Cfg Up to Downs: 0 | TSO segments: 631k | TSO bytes: 8605m
 TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 0
 HW UDP6 cksums: 0 | HW TCP cksums: 0 | HW TCP6 cksums: 0
 Mcast v6 solicit: 0 | Lagg drops: 0 | Lagg no buffer: 0
 Lagg no entries: 0 | Tx No Buf: 0
DEVICE
 Mcast addresses: 3 | Rx MBuf Sz: 4096
LINK INFO
 Speed: 10000M | Duplex: full | Flowcontrol: full
 Media state: active | Up to downs: 7 | HW assist: 514k <<<<<<< Active
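The counters above can be compared before and after an MTU change by extracting them with a small script. The Python sketch below is a rough illustration, not a NetApp utility: it assumes the "name: value | name: value" layout shown above, treats the k and m suffixes as decimal multipliers (an assumption, so results are approximate), and uses hypothetical helper names (to_int, parse_ifstat, crc_per_thousand).

```python
import re

# Assumption: "k" and "m" are decimal multipliers; the displayed values are
# rounded, so any ratio computed from them is approximate.
SUFFIX = {"k": 1_000, "m": 1_000_000}

def to_int(text):
    """Convert '47775k' or '5018' to an int; return None for non-numeric values."""
    m = re.fullmatch(r"(\d+)([km]?)", text.strip())
    if m is None:
        return None
    return int(m.group(1)) * SUFFIX.get(m.group(2), 1)

def parse_ifstat(text):
    """Return {section: {counter: value}} for RECEIVE, TRANSMIT, and so on."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.split("<<<")[0].strip()   # drop trailing "<<<<<" annotations
        if not line or line.startswith("--"):
            continue                           # blank line or "-- interface ... --"
        if ":" not in line:
            current = sections.setdefault(line, {})   # section header
            continue
        if current is None:
            continue
        for field in line.split("|"):
            name, _, value = field.partition(":")
            number = to_int(value)
            if number is not None:
                current[name.strip()] = number
    return sections

def crc_per_thousand(sections):
    """Cumulative receive CRC errors per 1000 received frames."""
    rx = sections["RECEIVE"]
    return 1000 * rx["CRC errors"] / rx["Total frames"]

# Excerpt copied from the e1b output above.
SAMPLE = """\
-- interface e1b (40 days, 10 hours, 4 minutes, 55 seconds) --
RECEIVE
 Total frames: 47775k | Frames/second: 14 | Total bytes: 5922m
 Bytes/second: 1696 | Total errors: 5018 | Errors/minute: 0 <<<<<< CRC errors
 CRC errors: 5018 | Runt frames: 0 | Fragment: 0
TRANSMIT
 Total frames: 47104k | Frames/second: 13 | Total bytes: 14245m
"""

sections = parse_ifstat(SAMPLE)
rate = crc_per_thousand(sections)
print(f"CRC errors per 1000 received frames: {rate:.3f}")
```

The result is an average over the whole counter lifetime, so it can sit well below the "at least 1 per 1000 packets" figure quoted in the vifmgr.cluscheck.hwerrors message even when short bursts exceed it. Comparing two snapshots taken a known interval apart gives a better view of whether CRC errors are still accumulating.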
After lowering MTU to 1500, CRC errors stop, and ports remain stable.
