Trident volumes stuck in deleting state after OpenShift Virtualization VM deletion
Applies to
- NetApp Trident 26.02
- OpenShift Container Platform (OCP) 4.x
- OpenShift Virtualization (OCP Virt) 4.x
Issue
- Trident volumes remain in volumeState=deleting indefinitely and are never reclaimed.
- tridentctl get volumes -n trident shows volumes in deleting state that do not progress.
- Trident controller logs contain:
  level=warning msg="Backend update resulted in an orphaned volume." backend=<backend-name> vol.Config.InternalName=trident_<backend>_pvc_<uuid> volume=pvc-<uuid> workflow="cr=reconcile"
  level=debug msg="Updating an existing volume." volume=pvc-<uuid> volumeState=deleting volume_orphaned=true workflow="cr=reconcile"
  level=debug msg="Attempting snapshot delete." backend=<backend-name> snapshotName=snapshot-<uuid> volumeName=pvc-<uuid> workflow="snapshot=delete"
  level=warning msg="Retried locked snapshot delete, clone split timer not yet expired." logLayer=core requestID=<id> requestSource=CSI secondsBeforeSplit=86366.62 snapshot=pvc-<uuid>
- VolumeSnapshot objects (vmsnapshot-*) remain in the namespace after VM deletion.
- In environments with high VM creation rates, Trident controller performance may also degrade (slow tridentctl responses, timeouts).
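To see how long Trident will keep retrying a locked snapshot delete, the secondsBeforeSplit field of the last warning above can be converted to hours. The following is a minimal sketch, not an official tool: the function name split_wait_report is hypothetical, and it assumes the controller log uses the key=value text format shown above.

```shell
# Reads Trident controller log lines on stdin. For every
# "clone split timer not yet expired" warning, prints the snapshot name
# and the remaining wait in seconds and (rounded) hours.
# split_wait_report is a hypothetical helper name, not part of Trident.
split_wait_report() {
  awk '
    /clone split timer not yet expired/ {
      snap = ""; secs = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^snapshot=/)           { snap = $i; sub(/^snapshot=/, "", snap) }
        if ($i ~ /^secondsBeforeSplit=/) { secs = $i; sub(/^secondsBeforeSplit=/, "", secs) }
      }
      if (secs != "")
        printf "%s: %s s remaining (about %.1f h)\n", snap, secs, secs / 3600
    }'
}
```

One possible way to feed it, assuming Trident is installed in the trident namespace: tridentctl logs -n trident | split_wait_report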
Cause
- When a VM is deleted in OpenShift Virtualization (OCP Virt), its associated VolumeSnapshot objects (vmsnapshot-*) are not automatically removed. Each remaining VolumeSnapshot holds a chain of Kubernetes finalizers that blocks the Trident volume deletion cascade:
  VolumeSnapshot → VolumeSnapshotContent (bound-protection finalizer) → TridentSnapshot (trident.netapp.io finalizer) → ONTAP snapshot → TridentVolume stuck in deleting
- In Trident 26.02, the cloneSplitDelay parameter defaults to 86400 seconds (24 hours).
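The finalizer chain described above can be inspected object by object. The sketch below is one way to do that with standard oc output options. The function name show_finalizer_chain is hypothetical, and the sketch assumes oc is logged in to the cluster and that Trident runs in the trident namespace.

```shell
# Prints the finalizers currently set on each Kubernetes object in the chain:
# VolumeSnapshot -> VolumeSnapshotContent -> TridentSnapshot.
# Usage: show_finalizer_chain <namespace-of-the-deleted-VM>
# show_finalizer_chain is a hypothetical helper name.
show_finalizer_chain() {
  ns="$1"
  if [ -z "$ns" ]; then
    echo "usage: show_finalizer_chain <namespace>" >&2
    return 1
  fi
  cols='NAME:.metadata.name,FINALIZERS:.metadata.finalizers'

  echo "== VolumeSnapshot (namespace: $ns) =="
  oc get volumesnapshot -n "$ns" -o custom-columns="$cols"

  echo "== VolumeSnapshotContent (cluster-scoped) =="
  oc get volumesnapshotcontent -o custom-columns="$cols"

  # Assumption: Trident custom resources live in the "trident" namespace.
  echo "== TridentSnapshot (namespace: trident) =="
  oc get tridentsnapshot -n trident -o custom-columns="$cols"
}
```

Objects that still list a finalizer after the VM is gone are the ones holding the deletion cascade open.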
Solution
WARNING: Verify the VM is no longer needed before deleting its snapshots. Deleting the VolumeSnapshot triggers cascade removal of the VolumeSnapshotContent and TridentSnapshot if no other finalizer is blocking.
Repeat the following steps for each stuck volume.
- Delete OCP Virt VolumeSnapshot objects (if present). Identify and delete any vmsnapshot-* objects in the affected namespace:
  oc get volumesnapshot -n <namespace>
  oc delete volumesnapshot <vmsnapshot-name> -n <namespace>
- Delete orphaned ONTAP snapshots:
  - Check for snapshots remaining directly on the ONTAP volume:
    snapshot show -vserver <svm> -volume <trident_internal_volume_name>
  - Delete any with Busy=false that have no corresponding Kubernetes object:
    snap delete -vserver <svm> -volume <trident_internal_volume_name> -snapshot <snapshot_name>
- Reduce cloneSplitDelay to 300 seconds in all Trident backend configurations used by OCP Virt workloads:
  "cloneSplitDelay": 300
  This ensures clone splits complete within 5 minutes, significantly reducing the window during which a source VM deletion can trigger stuck volumes.
- Upgrade to Trident 26.06 when available
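When many VMs were deleted, the first step above can be scripted per namespace. The following is a hedged sketch rather than a supported procedure: the function names and the DRY_RUN variable are hypothetical, and it assumes every vmsnapshot-* object in the namespace belongs to a VM that is no longer needed (see the warning above). By default it only prints what it would delete.

```shell
# Lists VolumeSnapshots in a namespace whose names start with "vmsnapshot-".
# "oc get ... -o name" prints entries of the form <resource>/<name>.
list_vmsnapshots() {
  oc get volumesnapshot -n "$1" -o name | grep '/vmsnapshot-'
}

# Deletes the snapshots found by list_vmsnapshots.
# DRY_RUN (hypothetical variable) defaults to 1: print only, delete nothing.
# Usage: delete_vmsnapshots <namespace>
#        DRY_RUN=0 delete_vmsnapshots <namespace>
delete_vmsnapshots() {
  ns="$1"
  list_vmsnapshots "$ns" | while read -r obj; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete $obj in namespace $ns"
    else
      oc delete "$obj" -n "$ns"
    fi
  done
}
```

Running the function once in dry-run mode and reviewing the list before setting DRY_RUN=0 keeps the manual verification step from the warning in place.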
Additional Information
During VM creation from a template, Trident creates a snapshot and a clone for each PVC; the snapshot is expected to be deleted automatically after the clone split completes. With the default cloneSplitDelay of 86400 seconds, if the source VM is deleted before the split window closes, the snapshot remains and blocks the deletion chain. In environments with many VMs, this results in a large number of stranded snapshots, which degrades Trident controller performance.
Internal Notes
If the ONTAP volume and snapshots are already gone but the volume remains in a deleting state, remove the finalizers from the TridentSnapshot and TridentVolume custom resources:
- Find the TridentSnapshot name for a given PVC:
  kubectl get tridentsnapshot -n trident | grep <pvc-name>
- Remove the finalizers from the TridentSnapshot, then delete it:
  kubectl patch tridentsnapshot <snapshot-name> -n trident \
    -p '{"metadata":{"finalizers":[]}}' --type=merge
  kubectl delete tridentsnapshot <snapshot-name> -n trident
- Remove the finalizers from the TridentVolume, then delete it:
  kubectl patch tridentvolume <pvc-name> -n trident \
    -p '{"metadata":{"finalizers":[]}}' --type=merge
  kubectl delete tridentvolume <pvc-name> -n trident
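The commands above must run in a fixed order (TridentSnapshot first, then TridentVolume), so wrapping them can reduce mistakes when several volumes are affected. This is a sketch under stated assumptions: the function names and the DRY_RUN variable are hypothetical, the trident namespace is taken from the commands above, and the caller has already confirmed that the ONTAP volume and snapshots are gone.

```shell
# Prints the command when DRY_RUN is 1 (the default), runs it otherwise.
# DRY_RUN and both function names are hypothetical, not part of Trident.
_run_or_echo() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: force_remove_trident_crs <pvc-name> [<tridentsnapshot-name>]
# Clears finalizers and deletes the TridentSnapshot (if one is given),
# then does the same for the TridentVolume.
force_remove_trident_crs() {
  pvc="$1"
  snap="$2"
  patch='{"metadata":{"finalizers":[]}}'
  if [ -z "$pvc" ]; then
    echo "usage: force_remove_trident_crs <pvc-name> [<snapshot-name>]" >&2
    return 1
  fi
  if [ -n "$snap" ]; then
    _run_or_echo kubectl patch tridentsnapshot "$snap" -n trident -p "$patch" --type=merge
    _run_or_echo kubectl delete tridentsnapshot "$snap" -n trident
  fi
  _run_or_echo kubectl patch tridentvolume "$pvc" -n trident -p "$patch" --type=merge
  _run_or_echo kubectl delete tridentvolume "$pvc" -n trident
}
```

With the default dry-run setting the function only prints the four kubectl commands, which can be reviewed before rerunning with DRY_RUN=0.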
Bug TRID-19333: https://jira.ngage.netapp.com/browse/TRID-19333
