Troubleshooting StatefulSet pods stuck in the ContainerCreating or Init state
Applies to
- ONTAP Tools for VMware (OTV) 10.1
- iSCSI HA deployment
Issue
- When a node goes down for more than half an hour and then comes back up, the maintenance console shows an application status similar to the following:
- Listing the pods with kubectl shows that the StatefulSet pods remain stuck in a ContainerCreating or Init state for more than 10 minutes.
- To list the pods, run the following command. Its output looks similar to this:
kubectl -n ntv-system get po -w | grep -e ContainerCreating -e Init -e Pending -e CrashLoopBackOff
ntv-mongodb-1   0/2   Init:0/1            0   10m17s
ntv-vault-1     0/1   ContainerCreating   0   10m25s
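The filtering step above can also be sketched as a small shell function that prints only the names of the stuck pods. This is a minimal sketch, not part of the product: the function name stuck_pods is hypothetical, and it assumes the default column layout of kubectl get po --no-headers (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (the name is an assumption, not an OTV or kubectl command).
# Reads `kubectl get po --no-headers` output on stdin and prints the NAME of
# every pod whose STATUS column (3rd field) starts with one of the stuck states.
stuck_pods() {
  awk '$3 ~ /^(ContainerCreating|Init:|Pending|CrashLoopBackOff)/ { print $1 }'
}

# Intended usage on a node with kubectl access (shown as a comment only):
#   kubectl -n ntv-system get po --no-headers | stuck_pods
```

Matching on the third field rather than on the whole line avoids false hits when a pod name happens to contain a string such as "Init".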
- Describing any of the stuck pods with the command below shows the following warning:
Example: kubectl describe po ntv-vault-1 -n ntv-system
MountVolume.SetUp failed for volume "pvc-43451cff-8774-47f8-a49e-557b0dc4d4d2" : rpc error: code = Internal desc = unable to mount device; exit status 32
- The kubelet tries to mount the PV for the pod, and the kubelet log shows entries similar to the following:
Example: tail -f /opt/netapp/rancher/rke2/agent/logs/kubelet.log
MountVolume.WaitForAttach entering for volume "pvc-43451cff-8774-47f8-a49e-557b0dc4d4d2"
MountVolume.WaitForAttach succeeded for volume "pvc-43451cff-8774-47f8-a49e-557b0dc4d4d2"
Error: MountVolume.SetUp failed for volume "pvc-43451cff-8774-47f8-a49e-557b0dc4d4d2" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-43451cff-8774-47f8-a49e-557b0dc4d4d2") pod "ntv-mongodb-1" (UID: "b1ae36be-a713-46d9-9dbe-94184be7832f") : rpc error: code = Internal desc = unable to mount device; exit status 32
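To see which volumes are affected without reading the whole log, the failed-mount entries can be reduced to a list of unique volume names. The following is a rough sketch under stated assumptions: the function name failed_mount_volumes is hypothetical, and it assumes the log lines follow the MountVolume.SetUp failed for volume "..." format shown above.

```shell
# Hypothetical helper (the name is an assumption for illustration).
# Reads kubelet log lines on stdin and prints each volume name that appears in a
# 'MountVolume.SetUp failed for volume "<name>"' message, once per volume.
failed_mount_volumes() {
  sed -n 's/.*MountVolume\.SetUp failed for volume "\([^"]*\)".*/\1/p' | sort -u
}

# Intended usage on the affected node (shown as a comment only):
#   failed_mount_volumes < /opt/netapp/rancher/rke2/agent/logs/kubelet.log
```

Lines that report WaitForAttach success are ignored, so only volumes whose mount step actually failed are listed.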