StorageGRID diagnostics reports TCP Retransmit Errors on the gateway node due to custom alert
Applies to
- StorageGRID
- Custom Prometheus alerting for TCP retransmission
Issue
Recurring TCP retransmit errors are reported on a single StorageGRID node’s client network interface, which acts as a gateway/load balancer. These are detected by a custom Prometheus alert.
Example log output / alert:
TCP-RetransmitErrorsforStorageGridNodes
Prometheus query: increase(node_netstat_Tcp_RetransSegs[1h]) sumwithout(device)(increase(node_network_transmit_packets_total{device=~"eth.+"}[1h])) >= 0.003 (minor), 0.004 (major/critical)
