System does not boot due to "Initialization of network interface e0a/e0b failed"
Applies to
- FAS8300
- Ethernet ports e0a, e0b - 10G/25G Ethernet Controller CX5
  - Firmware Version: 16.26.4012
- ONTAP 9
Issue
- Takeover is initiated due to no heartbeat detected from partner
Sun Feb 06 11:10:20 +0100 [Node-01: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Node-01 by Node-02 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
Sun Feb 06 11:10:23 +0100 [Node-01: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
Sun Feb 06 11:10:23 +0100 [Node-01: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds
Sun Feb 06 11:10:33 +0100 [Node-01: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
- During ANDU, the node may fail to boot to the "waiting for giveback" state and keep reporting the following event at the console:
Waiting for reservations to clear
- When attempting to boot the down node, messages of the following type are observed:
Feb 08 11:10:25 [Node-02:netif.init.failed:ALERT]: Initialization of network interface e0a failed due to unexpected software error mlx5_core err=0xfffffff0:0.
Feb 08 11:12:25 [Node-02:netif.init.failed:ALERT]: Initialization of network interface e0b failed due to unexpected software error mlx5_core err=0xfffffff0:0
- The output of storage failover show displays the following status:
Cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
Node-01        Node-02        false    In takeover
Node-02        Node-01        -        Waiting for reservations to clear
2 entries were displayed.
- sysconfig -v at Maintenance mode shows that the HA interconnect ports are both down or missing:
slot 0: 10G/25G Ethernet Controller CX5
    e0a MAC Address: XX:XX:XX:XX:XX:XX (auto-unknown-fd-down)
        SFP Vendor: Amphenol
        SFP Part Number: NDCCGF-N103
        SFP Serial Number: XXXXXXXXXXXXXX
    e0b MAC Address: XX:XX:XX:XX:XX:XX (auto-unknown-fd-down)
        SFP Vendor: Amphenol
        SFP Part Number: NDCCGF-N103
        SFP Serial Number: XXXXXXXXXXXXXX
    Device Type: CX5 PSID(NAP0000000006)
    Firmware Version: 16.26.4012
- Reseating the motherboard of the node temporarily recovers the ports
- Power-cycling the node through the Baseboard Management Controller (BMC) temporarily recovers the ports
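
When reviewing a saved console capture, the netif.init.failed events shown above can be picked out with a short script. The following is a minimal sketch, not a NetApp tool: the function name find_netif_failures, the SAMPLE text, and the regular expression are illustrative assumptions modeled only on the two sample messages in this article, and other ONTAP releases may format the event differently.

```python
import re

# Pattern modeled on the two sample console lines in this article (assumption);
# it captures the node name, the port, and the hex error code.
NETIF_FAILED = re.compile(
    r"\[(?P<node>[^:\]]+):\s*netif\.init\.failed:ALERT\]:\s*"
    r"Initialization of network interface (?P<port>\S+) failed"
    r".*?err=(?P<err>0x[0-9a-fA-F]+)"
)


def find_netif_failures(console_text):
    """Return a (node, port, err) tuple for each netif.init.failed event."""
    return [
        (m.group("node"), m.group("port"), m.group("err"))
        for m in NETIF_FAILED.finditer(console_text)
    ]


# Sample input copied from the messages listed under "Issue".
SAMPLE = (
    "Feb 08 11:10:25 [Node-02:netif.init.failed:ALERT]: Initialization of "
    "network interface e0a failed due to unexpected software error "
    "mlx5_core err=0xfffffff0:0.\n"
    "Feb 08 11:12:25 [Node-02:netif.init.failed:ALERT]: Initialization of "
    "network interface e0b failed due to unexpected software error "
    "mlx5_core err=0xfffffff0:0\n"
)

failures = find_netif_failures(SAMPLE)
# True when both ports named in this article reported the failure.
both_ha_ports_failed = {port for _, port, _ in failures} >= {"e0a", "e0b"}
print(failures, both_ha_ports_failed)
```

If both e0a and e0b appear in the result, the capture matches the symptom described in this article. The script only matches log text and does not confirm the root cause.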
