CHW-219: FAS2500: Quad Gigabit Ethernet Controller 82580 ports e0a, e0b, e0M, e0P missing or failed
Issue
On FAS2500-series systems, the "Quad Gigabit Ethernet Controller 82580" chip can
fail, resulting in several possible symptoms:
- Node reboots. The following messages may appear before the node shuts down:
    Mon Jul 5 09:13:39 CEST [node_name:netif.hangDetected:warning]: Network interface e0b hung (PCIe RcvMstAdt). Resetting to recover. Driver: igb.
    Mon Jul 5 09:22:13 CEST [node_name:netif.hangDetected:warning]: Network interface e0a hung (PCIe RcvMstAdt). Resetting to recover. Driver: igb.
- Node remains up but stops responding on the network via ports e0a, e0b, and e0M.
Sysconfig outputs:
- When ONTAP boots, the e0a, e0b, e0M and e0P ports are missing in "sysconfig -v"
output.
For example:
    ...
    slot 0: Internal 10/100/1000 Ethernet Switch
        Wrench: auto-1000t-fd-up
        Locked-wrench: auto-10tx-hd-down
        Device Type: 88E6176
    >>> [Missing slot 0: Quad Gigabit Ethernet Controller 82580 output] <<<
    slot 0: 1G/10G Ethernet Controller CNA EP 8324
        (Multiple Dual-port, QLogic CNA 8324(8362) rev. 2)
        e0c MAC Address: 00:a0:98:80:e5:d8 (auto-unknown-down)
        e0d MAC Address: 00:a0:98:80:e5:d9 (auto-10g_twinax-fd-up)
    ...
- Ports e0a, e0b, e0M, or e0P report hardware initialization errors or MAC addresses
with all zeros.
For example:
    ...
    slot 0: Internal 10/100/1000 Ethernet Switch
        Wrench: auto-1000t-fd-up
        Locked-wrench: auto-10tx-hd-down
        Device Type: 88E6176
    slot 0: Quad Gigabit Ethernet Controller 82580
        e0a MAC Address: 00:00:00:00:00:00 (Hardware Initialization Failed: IG)
        e0b MAC Address: 00:00:00:00:00:00 (Hardware Initialization Failed: IG)
        e0M MAC Address: 00:00:00:00:00:00 (Hardware Initialization Failed: IG)
        e0P MAC Address: 00:00:00:00:00:00 (Hardware Initialization Failed: IG)
        Device Type: 150E
        Firmware Version: 0.0, 0x00000000
    slot 0: 1G/10G Ethernet Controller CNA EP 8324
        (Multiple Dual-port, QLogic CNA 8324(8362) rev. 2)
        e0c MAC Address: 00:a0:98:80:e5:d8 (auto-unknown-down)
        e0d MAC Address: 00:a0:98:80:e5:d9 (auto-10g_twinax-fd-up)
    ...
- (ONTAP 9.1 and earlier) ONTAP does not boot or does not join cluster quorum.
Console messages similar to the following may appear:
    Invalid PCIe device detected below PCIe Root Port(Bus/Dev/Func): 00/1C/00
    Actual Vendor ID and Device ID:FFFF/FFFF
    Expected Vendor ID and Device ID:8086/150E
    Mezzanine Card ID(02 - 10GbE, 03 - FC, 07 - No Dev, others - Resv):07
    BIOS is resetting system...
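
The symptoms above can be checked for in saved text captures. The following Python is a minimal sketch, not a NetApp tool: it assumes the "sysconfig -v" output and the event log have already been saved as plain text, and the function names (check_sysconfig, hung_ports) are hypothetical helpers introduced only for illustration. The patterns are based solely on the example output shown in this article and may need adjusting for other releases.

```python
import re

# Strings taken from the example output in this article.
CONTROLLER = "Quad Gigabit Ethernet Controller 82580"
PORTS = ("e0a", "e0b", "e0M", "e0P")
ZERO_MAC = "00:00:00:00:00:00"

# Matches lines such as "e0a MAC Address: 00:00:00:00:00:00 (...)".
MAC_RE = re.compile(
    r"^\s*(e0\w+) MAC Address:\s*([0-9A-Fa-f:]{17})", re.MULTILINE
)
# Matches the netif.hangDetected message and captures the port name.
HANG_RE = re.compile(
    r"netif\.hangDetected:\w+\]: Network interface (e0\w+) hung"
)


def check_sysconfig(text):
    """Return a list of symptom descriptions found in `sysconfig -v` text.

    Hypothetical helper: flags a missing 82580 section and any of the
    affected ports that report an all-zero MAC address.
    """
    findings = []
    if CONTROLLER not in text:
        findings.append("82580 controller section missing")
    macs = dict(MAC_RE.findall(text))
    for port in PORTS:
        if macs.get(port) == ZERO_MAC:
            findings.append(f"{port} reports an all-zero MAC address")
    return findings


def hung_ports(log_text):
    """Return the sorted, de-duplicated ports named in hang messages."""
    return sorted(set(HANG_RE.findall(log_text)))
```

An empty result from either function only means that none of these specific text patterns were found in the capture; it does not rule out a hardware fault.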