StorageGRID intermittently reports Appliance LACP port missing on SG1000
Applies to
- NetApp StorageGRID
- SG1000 Appliance
- LACP
Issue
- The StorageGRID UI intermittently reports an Appliance LACP port missing alert (1 alert) on bondX of an SG1000 appliance; the alert later self-resolves.
- The output of cat /proc/net/bonding/bond1 on the affected node shows a high link failure count on a single port:
Slave Interface: hic3
MII Status: up
Speed: 100000 Mbps
Duplex: full
Link Failure Count: 19024
Permanent HW addr: 00:00:00:00:00:00
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 2
Partner Churned Count: 2
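The check above can be sketched in a few lines of Python. This is a minimal illustration, not a NetApp tool: the function names, the threshold of 100, and the second slave hic4 in the sample text are assumptions made for the example; only the field names ("Slave Interface", "Link Failure Count") and the hic3 values come from the output shown.

```python
def link_failure_counts(bonding_text):
    """Return {slave_name: link_failure_count} parsed from bonding status text."""
    counts = {}
    current = None
    for line in bonding_text.splitlines():
        # Split on the first colon only, so MAC addresses in values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            counts[current] = int(value)
    return counts


def flapping_slaves(bonding_text, threshold=100):
    """List slaves whose failure count exceeds threshold (an arbitrary example value)."""
    counts = link_failure_counts(bonding_text)
    return sorted(name for name, n in counts.items() if n > threshold)


# Sample text: the hic3 values are from the article, hic4 is hypothetical.
SAMPLE = """\
Slave Interface: hic3
MII Status: up
Link Failure Count: 19024
Permanent HW addr: 00:00:00:00:00:00
Slave Interface: hic4
MII Status: up
Link Failure Count: 0
"""

if __name__ == "__main__":
    print(link_failure_counts(SAMPLE))
    print(flapping_slaves(SAMPLE))
```

On a live node the same functions could be fed the contents of /proc/net/bonding/bond1 instead of the sample string. A count that is far higher on one slave than on its peers points at that port, its cable, its transceiver, or the switch port it connects to.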
- The base-os-logs/var/log/syslog file in the StorageGRID logs shows port flapping:
Mar 1 22:36:45 localhost kernel: [16695796.894052] mlx5_core 0000:18:00.0 hic3: Link down
Mar 1 22:36:45 localhost kernel: [16695796.913271] bond1: (slave hic3): speed changed to 0 on port 2
Mar 1 22:36:45 localhost kernel: [16695796.997644] bond1: (slave hic3): link status definitely down, disabling slave
Mar 1 22:36:51 localhost kernel: [16695802.955633] mlx5_core 0000:18:00.0 hic3: Link up
Mar 1 22:36:51 localhost kernel: [16695803.013283] bond1: (slave hic3): link status up, enabling it in 200 ms
Mar 1 22:36:51 localhost kernel: [16695803.234008] bond1: (slave hic3): link status definitely up, 100000 Mbps full duplex
Mar 1 22:37:04 localhost kernel: [16695816.539922] bond1: (slave hic3): speed changed to 0 on port 2
Mar 1 22:37:05 localhost kernel: [16695816.624960] bond1: (slave hic3): link status definitely down, disabling slave
Mar 1 22:36:45 localhost kernel: [16695796.792420] mlx5_core 0000:18:00.0 hic3: Link up
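To gauge how often and for how long a port flaps, the driver's "Link up" / "Link down" lines can be extracted from syslog. The following is a rough sketch under stated assumptions: the regular expression is modeled only on the mlx5_core lines shown above, and the function names are hypothetical, not part of StorageGRID.

```python
import re
from collections import Counter

# Matches driver lines such as
# "... kernel: [16695796.894052] mlx5_core 0000:18:00.0 hic3: Link down".
# The bond1 "link status ..." lines use a lowercase "link" and are not matched.
LINK_EVENT = re.compile(
    r"kernel:\s*\[\s*(?P<uptime>\d+\.\d+)\]\s*"
    r".*?\b(?P<iface>\w+): Link (?P<state>up|down)\b"
)


def link_events(lines):
    """Yield (kernel_uptime_seconds, interface, state) for each link event line."""
    for line in lines:
        m = LINK_EVENT.search(line)
        if m:
            yield float(m["uptime"]), m["iface"], m["state"]


def count_link_downs(lines):
    """Count 'Link down' events per interface."""
    return Counter(iface for _, iface, state in link_events(lines) if state == "down")


def outage_durations(lines):
    """Pair each 'down' with the next 'up' on the same interface, in seconds."""
    down_at = {}
    outages = []
    for uptime, iface, state in sorted(link_events(lines)):
        if state == "down":
            down_at[iface] = uptime
        elif iface in down_at:
            outages.append((iface, round(uptime - down_at.pop(iface), 3)))
    return outages


# Three lines taken from the syslog excerpt in this article.
SAMPLE_LINES = [
    "Mar 1 22:36:45 localhost kernel: [16695796.894052] mlx5_core 0000:18:00.0 hic3: Link down",
    "Mar 1 22:36:45 localhost kernel: [16695796.997644] bond1: (slave hic3): link status definitely down, disabling slave",
    "Mar 1 22:36:51 localhost kernel: [16695802.955633] mlx5_core 0000:18:00.0 hic3: Link up",
]

if __name__ == "__main__":
    print(count_link_downs(SAMPLE_LINES))
    print(outage_durations(SAMPLE_LINES))
```

The sketch orders events by the kernel uptime value in square brackets rather than by the wall-clock prefix, because that value is monotonic and has sub-second resolution. Other NIC drivers may word their link messages differently, so the pattern would need adjusting for them.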
