vMotion Failures Due to Latency During StorageGRID Decommission
Applies to
- NetApp StorageGRID
- ONTAP FabricPool clusters (ONTAP 9.16.1P8 and above)
- NFSv3 protocol environments
- Veeam Instant Recovery (IR) workflows
- Large grid environments with active node decommissioning
Issue
- Persistent Storage vMotion failures when migrating VMs from a temporary Veeam IR datastore to a production datastore over NFSv3.
- Failures present as timeout errors and “Filesystem timeout (Ok to retry)” messages.
- Application slowness and degraded backup performance.
- Observed after an ONTAP upgrade.
- Profiler output and logs show high cloud (object store) latency, especially while a node decommission is in progress.
- Errors such as HTTP 500 (asyncPush error: no consumer) and HTTP 499 on GET requests, high I/O wait times, and disk read latency spikes (300-500 ms).
- Example log output:
HTTPMethod=GET, HTTPStatusCode=500, Details={asyncPush error: no consumer}
Filesystem timeout (Ok to retry)
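The error signature above can be checked mechanically across a collected log excerpt. The following is a minimal Python sketch, assuming log lines follow the key=value shape of the example (comma-separated fields, Details wrapped in braces). The function names parse_fields and classify and the counter labels are hypothetical and not part of any NetApp tool; the actual log locations and formats on your systems may differ, so adjust the pattern to match what you collect.

```python
import re
from collections import Counter

# Assumption: fields look like the example line in this article, e.g.
# HTTPMethod=GET, HTTPStatusCode=500, Details={asyncPush error: no consumer}
# A value is either a brace-wrapped block or a run of non-comma, non-space chars.
FIELD_RE = re.compile(r"(\w+)=(\{[^}]*\}|[^,\s]+)")


def parse_fields(line: str) -> dict:
    """Return the key=value fields found in one log line (braces stripped)."""
    return {key: value.strip("{}") for key, value in FIELD_RE.findall(line)}


def classify(lines) -> Counter:
    """Count occurrences of the symptom signatures described in this article."""
    counts = Counter()
    for line in lines:
        # The filesystem timeout message may appear alone or fused to another entry.
        if "Filesystem timeout (Ok to retry)" in line:
            counts["fs_timeout"] += 1
        fields = parse_fields(line)
        if fields.get("HTTPMethod") != "GET":
            continue  # the article reports these errors for GET requests only
        status = fields.get("HTTPStatusCode")
        if status == "500" and "asyncPush error: no consumer" in fields.get("Details", ""):
            counts["get_500_no_consumer"] += 1
        elif status == "499":
            counts["get_499"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "HTTPMethod=GET, HTTPStatusCode=500, Details={asyncPush error: no consumer}",
        "Filesystem timeout (Ok to retry)",
    ]
    print(dict(classify(sample)))
```

To scan a real excerpt, pass an open file object to classify (iterating a file yields one line at a time). Matching on the literal message text rather than on status code alone keeps unrelated HTTP 500 errors out of the count.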
