PVC Resize Concurrent with Pod Termination Causes Multipath Device Mismatch and CSI Unmount Failure
Applies to
- NetApp Trident
- Kubernetes
Issue
When a PVC is resized while the application pod using that volume is terminating, the CSI unmount (unstage) operation on the node can fail. This can lead to:
- CSI unmount and flush device timeouts
- Multipath device size mismatch and path reconciliation failures
- Hung I/O causing delayed cgroup and container cleanup
- Node reboot required to restore device state
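One way to confirm the multipath size mismatch on an affected node is to compare the size the kernel reports for the device-mapper device with the sizes of its underlying paths. The following is a minimal Python sketch, assuming the standard Linux sysfs layout (`/sys/block/<dm>/size` and `/sys/block/<dm>/slaves/<path>/size`, both expressed in 512-byte sectors). The function name `path_size_mismatches` and the `dm-0` default are illustrative only and are not part of Trident.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs "size" files are expressed in 512-byte sectors


def path_size_mismatches(dm_name="dm-0", sysfs_root="/sys/block"):
    """Return (dm_bytes, {path_name: path_bytes}) for paths whose size
    differs from the size of the device-mapper device itself."""
    dm_dir = Path(sysfs_root) / dm_name
    dm_bytes = int((dm_dir / "size").read_text()) * SECTOR_BYTES

    mismatched = {}
    # Each entry under "slaves" is one underlying path device (e.g. sdb, sdc).
    for slave in sorted((dm_dir / "slaves").iterdir()):
        path_bytes = int((slave / "size").read_text()) * SECTOR_BYTES
        if path_bytes != dm_bytes:
            mismatched[slave.name] = path_bytes
    return dm_bytes, mismatched
```

Calling `path_size_mismatches("dm-0")` on the node returns the multipath device size in bytes and a dictionary of any paths that report a different size. An empty dictionary means all paths agree with the multipath device. This sketch only reads sysfs and does not issue I/O, so it should not block on a device with hung I/O, but that is an assumption worth verifying in your environment.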
NetApp Storage Logs:
NOTICE: SAN::LUN resize_imp: vserver: <SVM>,
path: /vol/trident_pvc_xxxxxxx/lun0,
size: 858993459200
Kubelet Logs:
UnmountDevice failed: flush device failed for /dev/dm-0 : process killed after timeout
Failed to delete cgroup paths ... Timed out while waiting for systemd ...
Trident Logs:
GRPC error: rpc error: code = Internal desc = flush device failed for /dev/dm-0 : process killed after timeout
failed to unstage volume
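The log signatures above can be checked for mechanically when triaging a node. The sketch below is a hypothetical helper, not a Trident or kubelet tool: it searches collected log text for the substrings quoted in this article and converts any `size:` value from the storage log into GiB. The function name `triage` and the signature labels are assumptions made for illustration.

```python
import re

# Substrings taken verbatim from the log excerpts in this article.
SIGNATURES = {
    "flush_timeout": "flush device failed for",
    "unstage_failed": "failed to unstage volume",
    "cgroup_cleanup": "Failed to delete cgroup paths",
    "lun_resize": "resize_imp",
}

# Matches the "size: <bytes>" field of the storage resize log entry.
SIZE_RE = re.compile(r"\bsize:\s*(\d+)")


def triage(log_text):
    """Return (sorted signature labels found, LUN sizes in GiB) for the given text."""
    found = sorted(name for name, needle in SIGNATURES.items() if needle in log_text)
    sizes_gib = [int(value) / 2**30 for value in SIZE_RE.findall(log_text)]
    return found, sizes_gib
```

Applied to the storage log excerpt above, the `size: 858993459200` field converts to 800 GiB. Seeing `lun_resize` together with `flush_timeout` or `unstage_failed` in logs from the same time window is consistent with the issue described here, although matching substrings alone does not prove the cause.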
