High NFS Latency with multiple connections to node LIFs from a common client TCP socket
Applies to
- ONTAP 9.5
- ONTAP 9.6
- ONTAP 9.7 prior to P7
Issue
- OTHER ops for the NFS workload show extremely high latency (>100 ms) in Active IQ Unified Manager.
- The latency breakdown shows high latency in Network processing (CPU_Network), that is, latency induced on the workload by network CPU processing delays.
- No evidence of any CPU contention or high CPU utilization.
- One or more clients use the same source TCP socket (source address and port) for multiple simultaneous NFS (v3/v4) mounts to different interfaces (LIFs) on the same node.
- The same issue can occur with multiple connections from the same source socket to different ports on the same LIF.
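- The LIFs can be on the same SVM or on different SVMs, as long as they reside on the same node.
Example netstat output from the node, in which client sockets 10.8.24.82.1023 and 10.8.26.8.1023 are each connected to two local endpoints: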
cluster::*> system node run -node node_1 -command netstat -na
---- Default IPSpace ----
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state) VCTX Services
tcp4 0 0 10.8.24.135.2049 10.8.24.82.1023 ESTABLISHED 13 0x00000804
tcp4 0 0 10.8.24.129.2049 10.8.24.82.1023 ESTABLISHED 12 0x00000804
tcp4 0 0 10.8.24.123.4045 10.8.26.8.1023 ESTABLISHED 11 0x00000804
tcp4 0 0 10.8.24.123.2049 10.8.26.8.1023 ESTABLISHED 11 0x00000804
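To check a node for this pattern, the netstat output above can be scanned for any client socket (foreign address and port) that appears in more than one ESTABLISHED connection within the same IPSpace. The Python sketch below is a hypothetical helper, not a NetApp tool: the function name find_shared_client_sockets and the regular expressions are illustrative, and it assumes the BSD-style "address.port" columns and "---- <name> IPSpace ----" section headers shown above. It handles tcp4 lines only.

```python
import re
from collections import defaultdict

# Hypothetical patterns, assuming the column layout of the sample output:
# Proto Recv-Q Send-Q Local-Address Foreign-Address State ...
CONN_RE = re.compile(r"^tcp4\s+\d+\s+\d+\s+(\S+)\s+(\S+)\s+ESTABLISHED\b")
IPSPACE_RE = re.compile(r"^-+\s*(.+?)\s+IPSpace\s*-+$")


def split_addr(addr):
    """Split a BSD-style 'a.b.c.d.port' endpoint into ('a.b.c.d', port)."""
    ip, _, port = addr.rpartition(".")
    return ip, int(port)


def find_shared_client_sockets(netstat_text):
    """Return {(ipspace, client_ip, client_port): [local endpoints]} for every
    client socket that has more than one ESTABLISHED connection to this node
    (different LIFs, or different ports on the same LIF)."""
    ipspace = None
    seen = defaultdict(set)
    for raw in netstat_text.splitlines():
        line = raw.strip()
        header = IPSPACE_RE.match(line)
        if header:
            ipspace = header.group(1)  # e.g. "Default"
            continue
        conn = CONN_RE.match(line)
        if not conn:
            continue  # skip headers, listeners and non-tcp4 lines
        local, foreign = conn.groups()
        client_ip, client_port = split_addr(foreign)
        seen[(ipspace, client_ip, client_port)].add(split_addr(local))
    # Keep only client sockets reused across multiple local endpoints.
    return {key: sorted(eps) for key, eps in seen.items() if len(eps) > 1}


SAMPLE = """\
---- Default IPSpace ----
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state) VCTX Services
tcp4 0 0 10.8.24.135.2049 10.8.24.82.1023 ESTABLISHED 13 0x00000804
tcp4 0 0 10.8.24.129.2049 10.8.24.82.1023 ESTABLISHED 12 0x00000804
tcp4 0 0 10.8.24.123.4045 10.8.26.8.1023 ESTABLISHED 11 0x00000804
tcp4 0 0 10.8.24.123.2049 10.8.26.8.1023 ESTABLISHED 11 0x00000804
"""

if __name__ == "__main__":
    for (ipspace, ip, port), endpoints in find_shared_client_sockets(SAMPLE).items():
        print(f"{ipspace}: client {ip}.{port} -> {endpoints}")
```

Against the sample output, both client sockets are flagged: 10.8.24.82.1023 is connected to two LIFs, and 10.8.26.8.1023 is connected to ports 2049 and 4045 on the same LIF. Because the netstat command is run per node, grouping by IPSpace and client socket corresponds to the "single IPSpace on a single node" condition described for Risk ID 5637.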
- The NFS client may be unable to perform any operations on the affected mount (cd, ls, etc.) and may receive the error:
file temporarily unavailable on the server, retrying...
- NetApp Active IQ may flag Risk ID 5637:
- Risk: Multiple NFS connections from the same client using the same ephemeral port to multiple LIFs in a single IPSpace on a single node can result in NFS slowness/hangs for that client.
- Potential Impact: Clients in this configuration may experience NFS slowness and/or hangs.