ONTAP SAN lost access to VMware datastore
Applies to
- ONTAP 9.8 and above
- VMware ESXi 8.0 U2
- Brocade FC switch
Issue
- Soon after ONTAP is updated from 9.7, VMware vSphere reports: "Lost access to volume 1234 due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly".
- Loss of access to the VMware datastore is reported periodically, at times when other services have a high workload
- Fabric and multipathing paths are healthy
- A very high LUN latency spike coincides with the ramp-up of another workload, while volume latency remains very low
- The performance archive for the affected workload shows very high QoS min throughput latency
Examples:
ONTAP:
EMS log:
qos_vd_window_manager: qos.VioDet.maxThrottle:notice]: QoS violation limit for NODE xy is 1000 IOPS
VMware:
vobd.log:
vobd[2098027]: [vmfsCorrelator] 3596269549259us: [vob.vmfs.heartbeat.timedout] xyz SVM_abc
vobd[2098027]: [vmfsCorrelator] 3596244876022us: [esx.problem.vmfs.heartbeat.timedout] xyz SVM_abc
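When checking whether a host shows the same symptom, it can help to pull the heartbeat timeout events out of vobd.log programmatically. The following is a minimal Python sketch, assuming the line layout shown in the examples above. The function name find_heartbeat_timeouts and the field names ts_us, event and detail are hypothetical and are not part of any VMware or NetApp tool.

```python
import re

# Matches vobd.log lines such as:
#   vobd[2098027]: [vmfsCorrelator] 3596269549259us: [vob.vmfs.heartbeat.timedout] xyz SVM_abc
# The layout is assumed from the sample lines in this article.
HEARTBEAT_RE = re.compile(
    r"\[vmfsCorrelator\]\s+(?P<ts_us>\d+)us:\s+"
    r"\[(?P<event>[\w.]+\.heartbeat\.timedout)\]\s*(?P<detail>.*)"
)


def find_heartbeat_timeouts(lines):
    """Return one dict per VMFS heartbeat timeout event found in the lines."""
    events = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            events.append({
                # Microsecond counter exactly as printed by vobd
                "ts_us": int(match.group("ts_us")),
                "event": match.group("event"),
                "detail": match.group("detail").strip(),
            })
    return events
```

A list of lines read from vobd.log can be passed directly. Comparing the returned events with the times of the ONTAP latency spikes is left to the reader, because the two logs use different time bases.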
vmkernel.log:
vmkernel: cpu9:2097307)NMP: nmp_ThrottleLogForDevice:3863: Cmd 0x89 (0x4, 8) to dev "naa.600a0123417251" on path "vmhba4:C0:T13:L13" Failed:
vmkernel: cpu9:2097307)NMP: nmp_ThrottleLogForDevice:3863: Cmd 0x89 (0x4, 8) to naa.600a01234
ATS command failing on storage path:
vmkernel: cpu9:2097307)NMP: nmp_ThrottleLogForDevice:3863: Cmd 0x89 (0x45ba0179a3c0, 2097288) to dev "naa.600a01234" on path "vmhba4:C0:T13:L13" Failed:
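SCSI opcode 0x89 is COMPARE AND WRITE, the command VMFS uses for ATS, so counting failed 0x89 commands per device and path gives a quick view of which LUNs are affected. The sketch below assumes the vmkernel.log line layout shown above. The name count_ats_failures is hypothetical, and the sketch is an illustration rather than a supported diagnostic tool.

```python
import re
from collections import Counter

ATS_OPCODE = 0x89  # SCSI COMPARE AND WRITE, used by VMFS for ATS

# Matches NMP failure lines such as:
#   ... nmp_ThrottleLogForDevice:3863: Cmd 0x89 (0x45ba0179a3c0, 2097288)
#       to dev "naa.600a01234" on path "vmhba4:C0:T13:L13" Failed:
# The layout is assumed from the sample lines in this article.
NMP_FAIL_RE = re.compile(
    r'nmp_ThrottleLogForDevice:\d+:\s+Cmd\s+(?P<opcode>0x[0-9a-fA-F]+)\s+'
    r'\([^)]*\)\s+to\s+dev\s+"(?P<device>[^"]+)"\s+'
    r'on\s+path\s+"(?P<path>[^"]+)"\s+Failed'
)


def count_ats_failures(lines):
    """Count failed ATS (0x89) commands per (device, path) pair."""
    counts = Counter()
    for line in lines:
        match = NMP_FAIL_RE.search(line)
        # Skip lines that do not match or that report a different opcode
        if match and int(match.group("opcode"), 16) == ATS_OPCODE:
            counts[(match.group("device"), match.group("path"))] += 1
    return counts
```

Lines that are truncated before the path field are skipped rather than guessed at, so the counts are a lower bound.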