Upgrading GKE with Anthos fails on Trident storage
Applies to
- Astra Trident
- Google Kubernetes Engine (GKE) cluster with Anthos
Issue
Running gkectl diagnose cluster in preparation for an upgrade reports FAILURE for the storage check, with output similar to:
user@hostname:~$ gkectl diagnose cluster --kubeconfig kubeconfig --cluster-name gke-anthos-cluster
Preparing for the diagnose tool...
Diagnosing the cluster...... DONE
Diagnose result is saved successfully in /home/user/diagnose-user-gke-anthos-cluster-20230130155819.json
- Validation Category: Cluster Healthiness
Checking user cluster and node pools...SUCCESS
Checking user cluster certificates...SUCCESS
Checking cluster object...SUCCESS
...
Checking GKE Hub Membership...SUCCESS
Checking all poddisruptionbudgets...SUCCESS
Checking storage...FAILURE
Reason: 3 storage error(s).
Unhealthy Resources:
PersistentVolume kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12: virtual disk "kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12" IS NOT attached to machine "hostname-of-node-01" but IS listed in the Node.Status
PersistentVolume kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12: virtual disk "kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12" IS NOT attached to machine "hostname-of-node-02" but IS listed in the Node.Status
PersistentVolume kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12: virtual disk "kubernetes.io/csi/csi.trident.netapp.io^pvc-1234abcd-1234-abcd-1234-12345abcde12" IS NOT attached to machine "hostname-of-node-03" but IS listed in the Node.Status
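When many volumes are affected, it can help to reduce the output above to a list of node and volume pairs. The following is a minimal sketch, not part of gkectl or Trident: the function name summarize_storage_errors and the file name diagnose-output.txt are hypothetical, and the pattern assumes the error lines follow the same wording as the sample shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads saved `gkectl diagnose cluster` console output
# on stdin and prints one "<node> <pv-name>" pair for each storage error line.
# Assumption: each error line has the same wording as the sample output,
# i.e. 'PersistentVolume <driver>^<pv-name>: ... IS NOT attached to machine "<node>" ...'.
summarize_storage_errors() {
  sed -n 's/^[[:space:]]*PersistentVolume [^ ]*\(pvc-[^:]*\): .*IS NOT attached to machine "\([^"]*\)".*/\2 \1/p'
}

# Example usage (diagnose-output.txt is a hypothetical file holding the console output):
#   summarize_storage_errors < diagnose-output.txt
```

Lines that do not match the assumed wording are skipped silently, so an empty result does not by itself mean the storage check passed.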