APD error messages observed on VMware due to network congestion
Applies to
- ONTAP 9
- NFS
- Cisco network switch
Issue
- NFS All Paths Down (APD) issues are observed on VMware ESXi.
- The vmkernel log shows error messages such as the following:
Lost connection to server 1xx.2x.xc.xx mount point
Restored connection to server 1xx.2x.xc.xx
- Datastores disconnect every hour.
- When a disconnect happens, the VMs go into a hung state.
- MTU on ESXi hosts is set to 1500.
- The NFS.MaxQueueDepth parameter was adjusted to 128, but even after rebooting the hosts, the issue persisted.
- APD messages are logged while VMs are running and Storage vMotion is in progress.
- Simultaneous packet traces show network congestion on certain ports or trunk links in the path between the two endpoints.
- Analysis of the performance archives shows no latency on the affected volumes.
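
To confirm the hourly pattern and line the outages up with the packet traces, the "Lost connection" and "Restored connection" messages shown above can be paired by timestamp. The following is a minimal Python sketch, not a supported tool. It assumes each vmkernel log line begins with an ISO-8601 timestamp followed by the message text; the function names `find_outages` and `gaps_between_outages` are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Assumption: each line starts with an ISO-8601 timestamp, followed by
# free-form text that contains one of the two messages quoted in this article.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s.*?"
    r"(?P<kind>Lost|Restored) connection to server (?P<server>\S+)"
)


def find_outages(lines):
    """Pair each 'Lost connection' with the next 'Restored connection' per server."""
    pending = {}   # server -> time the connection was lost
    outages = []   # (server, lost_at, restored_at)
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # not one of the two messages of interest
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S")
        server = m.group("server")
        if m.group("kind") == "Lost":
            # Keep the earliest unresolved loss for this server.
            pending.setdefault(server, ts)
        elif server in pending:
            outages.append((server, pending.pop(server), ts))
    return outages


def gaps_between_outages(outages):
    """Return the seconds between consecutive 'Lost connection' events."""
    starts = sorted(lost for _, lost, _ in outages)
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
```

Passing an open log file to `find_outages` returns one tuple per outage. Gaps close to 3600 seconds from `gaps_between_outages` would be consistent with the hourly disconnects, and the outage windows can then be compared against the timestamps in the packet traces.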
