NFS services intermittently hang due to network packet drops
Applies to
- NetApp ONTAP 9.13.1P9 (Cluster-Mode) and above
- NFS Protocol v4
- Environments with external network switches/firewalls between NFS client and NetApp storage
Issue
- NFS clients intermittently report that the NFS server is not responding, causing application and user impact. The following errors are observed on the client and in packet traces:
- Example client log output:
nfs: server 10.240.x.y not responding, timed out
- Network and packet trace observations:
- Just before the error, packets sent by the client and by ONTAP are not received by the other side, and TCP retransmissions are seen from both the client and the server.
- After some time, the client tries to set up a new TCP connection by sending a TCP SYN. However, ONTAP still holds this TCP connection in the ESTABLISHED state, so instead of a TCP SYN-ACK it responds with a TCP ACK that acknowledges the data previously received from the client. This is expected behavior.
- When the client receives this TCP ACK from ONTAP, it resets the connection by sending a TCP RST.
- The client then completes a new TCP three-way handshake and sends an NFSv4 READ call, but this call never reaches ONTAP. The client resets the connection again with a TCP RST, and the pattern repeats without recovering from the packet loss.
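The SYN / ACK / RST exchange in the trace can be reproduced with the small toy model below. It is a sketch under stated assumptions, not ONTAP or Linux kernel code: it assumes the client reconnects from the same 4-tuple (so its SYN lands on ONTAP's stale ESTABLISHED connection), that the server answers such a SYN with a challenge ACK as described in RFC 5961 section 4, and that the client in SYN-SENT answers an unacceptable ACK with an RST carrying SEQ=SEG.ACK as in RFC 793. The classes `Server`, `Client` and `Segment`, the function `replay`, and all sequence numbers are hypothetical and chosen for illustration. The dropped NFSv4 READ that follows the new handshake is outside the model.

```python
# Toy model of the reconnect exchange seen in the packet trace.
# Assumptions (hypothetical, illustrative only): same 4-tuple reuse,
# RFC 5961 challenge-ACK behavior on the server, RFC 793 SYN-SENT rules
# on the client. Not ONTAP or Linux source code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    flags: str  # "SYN", "SYN-ACK", "ACK" or "RST"
    seq: int
    ack: int = 0


class Server:
    """Hypothetical server-side TCB for a single 4-tuple."""

    def __init__(self, snd_nxt: int, rcv_nxt: int, new_iss: int):
        self.state = "ESTABLISHED"  # stale: it never learned the client gave up
        self.snd_nxt = snd_nxt
        self.rcv_nxt = rcv_nxt
        self.new_iss = new_iss      # ISN used if a fresh handshake is accepted

    def receive(self, seg: Segment) -> Optional[Segment]:
        if self.state == "ESTABLISHED" and seg.flags == "SYN":
            # RFC 5961 s4: a SYN on a synchronized connection gets a challenge
            # ACK for the data already received, not a SYN-ACK.
            return Segment("ACK", seq=self.snd_nxt, ack=self.rcv_nxt)
        if self.state == "ESTABLISHED" and seg.flags == "RST":
            # RFC 5961 s3: only an RST whose SEQ exactly equals RCV.NXT tears the
            # connection down (the challenge-ACK path for other RSTs is omitted).
            if seg.seq == self.rcv_nxt:
                self.state = "LISTEN"
            return None
        if self.state == "LISTEN" and seg.flags == "SYN":
            self.state = "SYN-RECEIVED"
            self.rcv_nxt = seg.seq + 1
            self.snd_nxt = self.new_iss + 1
            return Segment("SYN-ACK", seq=self.new_iss, ack=self.rcv_nxt)
        if self.state == "SYN-RECEIVED" and seg.flags == "ACK" and seg.ack == self.snd_nxt:
            self.state = "ESTABLISHED"
        return None


class Client:
    """Hypothetical client TCB that is retrying the connection."""

    def __init__(self, iss: int):
        self.state = "SYN-SENT"
        self.iss = iss

    def syn(self) -> Segment:
        return Segment("SYN", seq=self.iss)

    def receive(self, seg: Segment) -> Optional[Segment]:
        if self.state != "SYN-SENT":
            return None
        if seg.ack != self.iss + 1:
            # RFC 793, SYN-SENT: an unacceptable ACK is answered with
            # <SEQ=SEG.ACK><CTL=RST>; the client stays in SYN-SENT.
            return Segment("RST", seq=seg.ack)
        if seg.flags == "SYN-ACK":
            self.state = "ESTABLISHED"
            return Segment("ACK", seq=self.iss + 1, ack=seg.seq + 1)
        return None


def replay() -> List[str]:
    server = Server(snd_nxt=9_000, rcv_nxt=1_500, new_iss=70_000)  # arbitrary values
    client = Client(iss=42_000)
    trace: List[str] = []

    def deliver(src: str, seg: Segment, dst) -> Optional[Segment]:
        trace.append(f"{src}: {seg.flags} seq={seg.seq} ack={seg.ack}")
        return dst.receive(seg)

    challenge = deliver("client", client.syn(), server)  # SYN hits the stale TCB
    rst = deliver("server", challenge, client)            # challenge ACK, not SYN-ACK
    deliver("client", rst, server)                        # RST clears the stale TCB
    synack = deliver("client", client.syn(), server)      # SYN retransmission
    ack = deliver("server", synack, client)
    deliver("client", ack, server)                        # handshake completes
    trace.append(f"client={client.state} server={server.state}")
    return trace


if __name__ == "__main__":
    for line in replay():
        print(line)
```

The RST is accepted only because its sequence number is copied from the challenge ACK's acknowledgment number, which is exactly what the server expects next; this is why the stale connection is cleared and the next SYN receives a normal SYN-ACK.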
