NetApp Knowledge Base

Trident volumes stuck in deleting state after OpenShift Virtualization VM deletion

Applies to

  • NetApp Trident 26.02
  • OpenShift Container Platform (OCP) 4.x
  • OpenShift Virtualization (OCP Virt) 4.x

Issue

After deleting a virtual machine (VM) in OpenShift Virtualization, one or more Trident PVCs remain stuck in volumeState=deleting indefinitely and are never reclaimed.
  • tridentctl get volumes -n trident shows volumes in deleting state that do not progress.
  • Trident controller logs contain:
    • level=warning msg="Backend update resulted in an orphaned volume." backend=<backend-name> vol.Config.InternalName=trident_<backend>_pvc_<uuid> volume=pvc-<uuid> workflow="cr=reconcile"
    • level=debug msg="Updating an existing volume." volume=pvc-<uuid> volumeState=deleting volume_orphaned=true workflow="cr=reconcile"
    • level=debug msg="Attempting snapshot delete." backend=<backend-name> snapshotName=snapshot-<uuid> volumeName=pvc-<uuid> workflow="snapshot=delete"
    • level=warning msg="Retried locked snapshot delete, clone split timer not yet expired." logLayer=core requestID=<id> requestSource=CSI secondsBeforeSplit=86366.62 snapshot=pvc-<uuid>
  • VolumeSnapshot objects (vmsnapshot-*) remain in the namespace after VM deletion.
  • In environments with high VM creation rates, Trident controller performance may also degrade (slow tridentctl responses, timeouts).

Cause

  • When a VM is deleted in OpenShift Virtualization (OCP Virt), its associated VolumeSnapshot objects (vmsnapshot-*) are not automatically removed. Each remaining VolumeSnapshot holds a chain of Kubernetes finalizers that blocks the Trident volume deletion cascade:
          VolumeSnapshot
            → VolumeSnapshotContent (bound-protection finalizer)
              → TridentSnapshot (trident.netapp.io finalizer)
                → ONTAP snapshot
                  → TridentVolume stuck in deleting
  • In Trident 26.02, the cloneSplitDelay parameter defaults to 86400 seconds (24 hours), so the snapshot protecting a clone is held for up to 24 hours after the clone is created.
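
A minimal sketch of where this parameter lives, assuming an ontap-nas backend; the backend name, LIF, SVM, and credentials are placeholders, and 86400 is the default being described:

```json
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "backendName": "<backend-name>",
  "managementLIF": "<management-lif>",
  "svm": "<svm>",
  "username": "<username>",
  "password": "<password>",
  "cloneSplitDelay": 86400
}
```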

Solution

    WARNING

    Verify the VM is no longer needed before deleting its snapshots. Deleting the VolumeSnapshot triggers cascade removal of the VolumeSnapshotContent and TridentSnapshot if no other finalizer is blocking.

     Repeat the following steps for each stuck volume.

    1. Delete OCP Virt VolumeSnapshot objects (if present) by identifying and deleting any vmsnapshot-* objects in the affected namespace:
      1. oc get volumesnapshot -n <namespace>
      2. oc delete volumesnapshot <vmsnapshot-name> -n <namespace>
    2. Delete orphaned ONTAP snapshots:
      1. Check for snapshots remaining directly on the ONTAP volume: snapshot show -vserver <svm> -volume <trident_internal_volume_name>
      2. Delete any with Busy=false that have no corresponding Kubernetes object: snapshot delete -vserver <svm> -volume <trident_internal_volume_name> -snapshot <snapshot_name>
    3. Reduce cloneSplitDelay to 300 seconds in all Trident backend configurations used by OCP Virt workloads. This ensures clone splits complete within 5 minutes, significantly narrowing the window in which deleting a source VM can leave volumes stuck:
      1. "cloneSplitDelay": 300
    4. Upgrade to Trident 26.06 when available.
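
Step 1 can be sketched as a small loop, assuming the oc CLI is logged in to the affected cluster; the function name is illustrative:

```shell
# Hedged sketch: delete any leftover vmsnapshot-* VolumeSnapshots in a namespace.
# The cascade then removes the bound VolumeSnapshotContent and TridentSnapshot,
# provided no other finalizer is blocking.
cleanup_vm_snapshots() {
  local ns="$1"
  local snap
  # 'oc get -o name' prints e.g. volumesnapshot.snapshot.storage.k8s.io/vmsnapshot-<uuid>
  for snap in $(oc get volumesnapshot -n "$ns" -o name | grep 'vmsnapshot-' || true); do
    echo "deleting ${snap} in namespace ${ns}"
    oc delete "${snap}" -n "$ns"
  done
}
```

After the VolumeSnapshots are gone, re-run tridentctl get volumes -n trident to confirm the stuck volumes progress out of the deleting state.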


    Additional Information

    During VM creation from a template, Trident creates a snapshot and a clone for each PVC; the snapshot is expected to be deleted automatically after the clone split completes. With the 86400s delay, if the source VM is deleted before the split window closes, the snapshot remains, blocking the deletion chain. In environments with many VMs, this results in a large number of stranded snapshots, degrading Trident controller performance.
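
The secondsBeforeSplit value in the warning above can be read as a countdown from the 86400 s default; the sample log line therefore implies the delete was retried roughly 33 seconds after the clone was created:

```shell
# remaining time (from the sample log) subtracted from the 86400 s default
awk 'BEGIN { printf "%.2f\n", 86400 - 86366.62 }'   # prints 33.38
```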

    Internal Notes

    If the ONTAP volume and snapshots are already gone, but the volume remains in a deleting state, remove the finalizers from the TridentSnapshot and TridentVolume CRDs:

    • Find the TridentSnapshot name for a given PVC:
      • kubectl get tridentsnapshot -n trident | grep <pvc-name>
    • Remove the finalizers from, then delete, the TridentSnapshot:
      • kubectl patch tridentsnapshot <snapshot-name> -n trident \
          -p '{"metadata":{"finalizers":[]}}' --type=merge
      • kubectl delete tridentsnapshot <snapshot-name> -n trident
    • Remove the finalizers from, then delete, the TridentVolume:
      • kubectl patch tridentvolume <pvc-name> -n trident \
          -p '{"metadata":{"finalizers":[]}}' --type=merge
      • kubectl delete tridentvolume <pvc-name> -n trident
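
When many volumes are stuck, the commands above can be wrapped in a function; a sketch, assuming each TridentSnapshot name contains its PVC name (as the grep above relies on):

```shell
# Hedged sketch: strip finalizers and delete the Trident CRs for one stuck PVC.
# Only use this after confirming the ONTAP volume and its snapshots are already gone.
force_delete_trident_crs() {
  local pvc="$1"
  local snap
  for snap in $(kubectl get tridentsnapshot -n trident -o name | grep "$pvc" || true); do
    kubectl patch "$snap" -n trident -p '{"metadata":{"finalizers":[]}}' --type=merge
    kubectl delete "$snap" -n trident
  done
  kubectl patch tridentvolume "$pvc" -n trident -p '{"metadata":{"finalizers":[]}}' --type=merge
  kubectl delete tridentvolume "$pvc" -n trident
}
```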

    Bug TRID-19333: https://jira.ngage.netapp.com/browse/TRID-19333


    NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.