NetApp Knowledge Base

StorageGRID appliance having all HIC ports down frequently

Category:
storagegrid
Specialty:
sgrid

Applies to

NetApp StorageGRID Appliances

Issue

A StorageGRID node intermittently loses connectivity on some or all of its HIC ports. The affected ports go down and, when they reconnect, may resynchronise through LACP if LACP is configured.

  • The warn log under /var/local/log on the affected node shows Tx timeout events for the HIC ports:

Jan 10 03:12:23 localhost kernel: [1456351.753113] [qede_tx_timeout:991(hic2)]Tx timeout!
Jan 10 03:12:23 localhost kernel: [1456351.753338] [qed_mfw_report:3613(hic2)]Txq[1]: FW cons [host] fce8, SW cons fc97, SW prod fce8 [idx c6] [Jiffies 4658987302]
Jan 10 03:12:23 localhost kernel: [1456351.753588] [qed_mfw_report:3613(hic2)]Txq[1]: SB[0x0002] - IGU: prod 00339d9f cons 00339b03 CAU Tx fce8
Jan 10 03:12:23 localhost kernel: [1456351.753832] [qed_mfw_report:3613(hic2)]Last DB: 0000fce8 [Jiffies 4658985126]

Jan 10 03:11:57 localhost kernel: [1456325.502522] NETDEV WATCHDOG: hic4 (qede): transmit queue 6 timed out
Jan 10 03:11:58 localhost kernel: [1456326.281083] [qede_tx_timeout:991(hic4)]Tx timeout!
Jan 10 03:11:58 localhost kernel: [1456326.337487] bond0: link status down for interface hic4, disabling it in 200 ms
Jan 10 03:11:58 localhost kernel: [1456326.337490] bond0: invalid new link 1 on slave hic4
Jan 10 03:11:58 localhost kernel: [1456326.474543] qede 0000:42:00.3 hic4: speed changed to 0 for port hic4
Jan 10 03:11:58 localhost kernel: [1456326.497102] [qede_generic_hw_err_handler:4012(hic4)]Starting a generic HW error handling (sleep requiring operations) - err_flags 0x80000002, err_flags_override 0x0
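To see how often, and on which ports, the timeouts occur, the warn log can be scanned for the two qede messages quoted above. The following is a minimal Python sketch, not a NetApp tool. `scan_tx_timeouts` and `SAMPLE` are hypothetical names, the regular expressions are derived only from the log lines in this article, and the exact log file name under /var/local/log is left to the reader.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Both patterns come from the qede driver messages quoted in this article;
# other driver or kernel versions may word them differently.
TX_TIMEOUT_RE = re.compile(r"qede_tx_timeout:\d+\((?P<iface>[^)]+)\)\]Tx timeout!")
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \(qede\): transmit queue (?P<queue>\d+) timed out"
)


def scan_tx_timeouts(lines: Iterable[str]) -> Tuple[Counter, List[Tuple[str, int]]]:
    """Hypothetical helper: count qede Tx timeouts per interface and list the
    (interface, queue) pairs reported by the NETDEV watchdog."""
    timeouts: Counter = Counter()
    stalled_queues: List[Tuple[str, int]] = []
    for line in lines:
        m = TX_TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("iface")] += 1
            continue
        w = WATCHDOG_RE.search(line)
        if w:
            stalled_queues.append((w.group("iface"), int(w.group("queue"))))
    return timeouts, stalled_queues


# Hypothetical sample built from the log excerpt above.
SAMPLE = [
    "Jan 10 03:11:57 localhost kernel: [1456325.502522] NETDEV WATCHDOG: hic4 (qede): transmit queue 6 timed out",
    "Jan 10 03:11:58 localhost kernel: [1456326.281083] [qede_tx_timeout:991(hic4)]Tx timeout!",
    "Jan 10 03:12:23 localhost kernel: [1456351.753113] [qede_tx_timeout:991(hic2)]Tx timeout!",
]

if __name__ == "__main__":
    counts, stalled = scan_tx_timeouts(SAMPLE)
    print(dict(counts))  # {'hic4': 1, 'hic2': 1}
    print(stalled)       # [('hic4', 6)]
```

Because the function accepts any iterable of lines, an open log file object can be passed in directly instead of `SAMPLE`.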

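  • Later, the HICs recover and rejoin bond0. The kernel timestamps in these entries restart from a few seconds, which indicates that the node restarted before the interfaces came back up: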

Jan 10 03:34:59 localhost kernel: [    9.312373] qede 0000:42:00.1 hic2: renamed from eth0
Jan 10 03:35:08 localhost kernel: [   43.979425] bond0: Enslaving hic2 as a backup interface with a down link
Jan 10 03:35:08 localhost kernel: [   44.104547] [qede_validate_bond:423(hic2)]RDMA bonding - Can't bond PF1 and PF3
Jan 10 03:35:08 localhost kernel: [   44.273897] device hic2 entered promiscuous mode
Jan 10 03:35:10 localhost kernel: [   45.863791] [qede_link_update:3829(hic2)]Link is up
Jan 10 03:35:10 localhost kernel: [   45.901661] bond0: link status up for interface hic2, enabling it in 0 ms
Jan 10 03:35:10 localhost kernel: [   45.908646] bond0: link status definitely up for interface hic2, 10000 Mbps full duplex

Jan 10 03:34:59 localhost kernel: [    9.398066] qede 0000:42:00.3 hic4: renamed from eth3
Jan 10 03:35:08 localhost kernel: [   44.112259] bond0: Enslaving hic4 as a backup interface with a down link
Jan 10 03:35:08 localhost kernel: [   44.280087] device hic4 entered promiscuous mode
Jan 10 03:35:10 localhost kernel: [   46.077201] [qede_link_update:3829(hic4)]Link is up
Jan 10 03:35:10 localhost kernel: [   46.137659] bond0: link status up for interface hic4, enabling it in 200 ms
Jan 10 03:35:10 localhost kernel: [   46.144587] bond0: invalid new link 3 on slave hic4
Jan 10 03:35:10 localhost kernel: [   46.353923] bond0: link status definitely up for interface hic4, 10000 Mbps full duplex
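Since a restart resets the bracketed kernel uptime, a drop in uptime between consecutive kernel lines is a simple way to tell a reboot apart from a plain link flap. The Python sketch below combines that check with the bond0 link-status messages to build a timeline per interface. `link_timeline`, `Event` and `SAMPLE` are hypothetical names; the sketch assumes the lines come from a single log file in chronological order and that the messages use the wording shown in this article.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Bracketed kernel uptime in seconds, e.g. "[    9.312373]".
UPTIME_RE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]")
# bond link-state messages as quoted in this article; "definitely" marks the
# transition that takes effect once the bond's up/down delay has elapsed.
BOND_RE = re.compile(
    r"bond\d+: link status (?P<final>definitely )?(?P<state>up|down) "
    r"for interface (?P<iface>[^,\s]+)"
)


@dataclass
class Event:
    uptime: float       # kernel uptime (s) of the message
    kind: str           # "reboot", "up" or "down"
    iface: str = ""     # interface name; empty for reboot markers
    final: bool = False


def link_timeline(lines: Iterable[str]) -> List[Event]:
    """Hypothetical helper: bond link transitions plus inferred reboots.

    A drop in kernel uptime between consecutive kernel lines is treated
    as a reboot (assumes chronological lines from one log file).
    """
    events: List[Event] = []
    last_uptime: Optional[float] = None
    for line in lines:
        m = UPTIME_RE.search(line)
        if not m:
            continue  # not a kernel line with an uptime stamp
        uptime = float(m.group("uptime"))
        if last_uptime is not None and uptime < last_uptime:
            events.append(Event(uptime=uptime, kind="reboot"))
        last_uptime = uptime
        b = BOND_RE.search(line)
        if b:
            events.append(Event(uptime=uptime, kind=b.group("state"),
                                iface=b.group("iface"),
                                final=b.group("final") is not None))
    return events


# Hypothetical sample built from the hic4 entries in this article.
SAMPLE = [
    "Jan 10 03:11:58 localhost kernel: [1456326.337487] bond0: link status down for interface hic4, disabling it in 200 ms",
    "Jan 10 03:34:59 localhost kernel: [    9.398066] qede 0000:42:00.3 hic4: renamed from eth3",
    "Jan 10 03:35:10 localhost kernel: [   46.137659] bond0: link status up for interface hic4, enabling it in 200 ms",
    "Jan 10 03:35:10 localhost kernel: [   46.353923] bond0: link status definitely up for interface hic4, 10000 Mbps full duplex",
]

if __name__ == "__main__":
    for e in link_timeline(SAMPLE):
        print(f"{e.uptime:>16.6f}  {e.kind:<6} {e.iface} {'(final)' if e.final else ''}")
```

Keeping the "definitely" flag separate lets the timeline show both the moment the bond notices a change and the moment it actually enables or disables the slave.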


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.