Trident daemonset pod restarts frequently on a worker node with liveness and readiness probe failures
Applies to
- Astra Trident
Issue
- A Trident daemonset pod restarts frequently on the same worker node.
- The events for this pod, shown by "oc describe pod", repeatedly report that the liveness and readiness probes failed on that application node.
- Trident daemonset pods running on the other application nodes work normally.
Events:
Type     Reason     Age                     From     Message
----     ------     ----                    ----     -------
Warning  Unhealthy  136m (x22 over 3d8h)    kubelet  Readiness probe failed: Get "https://Worker_node_IP:17546/readiness": context deadline exceeded
Warning  Unhealthy  130m (x103 over 7d23h)  kubelet  Readiness probe failed: Get "https://Worker_node_IP:17546/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  85m (x130 over 6d20h)   kubelet  Liveness probe failed: Get "https://Worker_node_IP:17546/liveness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  46m (x73 over 7d5h)     kubelet  Readiness probe failed: Get "https://Worker_node_IP:17546/readiness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  109s (x864 over 10d)    kubelet  Liveness probe failed: Get "https://Worker_node_IP:17546/liveness": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  108s (x910 over 9d)     kubelet  Readiness probe failed: Get "https://Worker_node_IP:17546/readiness": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Kubelet logs:
Dec 17 13:22:17 Worker_node_name kubenswrapper[2279]: I1217 13:22:17.271710 2279 prober.go:114] "Probe failed" probeType="Liveness" pod="trident/trident-node-linux-4cd2l" podUID=5ff3a214-7233-48cf-9d2d-b006f188836c containerName="trident-main" probeResult=failure output="Get \"https://Worker_node_IP:17546/liveness\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Dec 17 13:25:28 Worker_node_name kubenswrapper[2279]: I1217 13:25:28.012805 2279 prober.go:114] "Probe failed" probeType="Readiness" pod="trident/trident-node-linux-4cd2l" podUID=5ff3a214-7233-48cf-9d2d-b006f188836c containerName="trident-main" probeResult=failure output="Get \"https://Worker_node_IP:17546/readiness\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Dec 17 00:32:26 Worker_node_name kubenswrapper[2279]: E1215 00:32:26.987146 2279 upgradeaware.go:440] Error proxying data from backend to client: read tcp Worker_node_IP:36582->Worker_node_IP:10010: read: connection reset by peer
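To confirm that the probe failures are confined to one pod on one node, it can help to tally the "Probe failed" entries in the kubelet log by pod and probe type. The following is a minimal Python sketch, not part of Trident or OpenShift. The function name count_probe_failures and the regular expression are illustrative assumptions based only on the log format shown above.

```python
import re
import sys
from collections import Counter

# Assumption: kubelet writes failed probes in the format shown in the
# log excerpt above, with probeType="..." immediately followed by pod="...".
PROBE_RE = re.compile(
    r'"Probe failed" probeType="(?P<type>\w+)" pod="(?P<pod>[^"]+)"'
)


def count_probe_failures(lines):
    """Count kubelet 'Probe failed' lines, keyed by (pod, probe type)."""
    counts = Counter()
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            counts[(match.group("pod"), match.group("type"))] += 1
    return counts


if __name__ == "__main__":
    # Read kubelet log lines from standard input and print one row per
    # (pod, probe type) pair with its failure count.
    for (pod, probe), n in sorted(count_probe_failures(sys.stdin).items()):
        print(f"{pod}\t{probe}\t{n}")
```

As a usage example, assuming the script is saved under the hypothetical name count_probes.py and the kubelet runs as the systemd unit "kubelet" on the worker node, the log can be piped in with: journalctl -u kubelet | python3 count_probes.py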