CONTAP-484943: Some NFS ops against qtrees taking 5 minutes on 9.16.1
Issue
- After installing or upgrading to ONTAP 9.16.1, occasional NFS operations against qtrees take 5 minutes to receive a response.
- Linux clients using the default timeo of 600 (60 seconds) and the default retrans of 2 will likely log "NFS server not responding" error messages after 3 minutes, followed by "NFS server OK" 2 minutes later.
- Applications that require faster response times, such as IBM MQ, may be impacted.
- To identify the issue on the ONTAP side:
- First, check for the hourly EMS message reporting any number of NFS operations that took longer than 60 seconds:
Nblade.NfsResponseTraceTriggerHourly:debug]: params: {'responseCount': '14', 'trigger': '60'}
- If operations taking longer than 60 seconds are reported, enable NFS server traces:
set diag; nfs server modify -vserver * -trace-enabled true
- Look for EMS events showing an NFS processing time (procTime) close to 300 seconds:
Nblade.NfsResponseTraceTrigger:debug]: params: {'clientAddr': '10.1.1.2', 'op': 'NFSv4 COMPOUND', 'vserverId': '#', 'procTime': '297', 'trigger': '60'}
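As a rough aid for the identification steps above, the following is a minimal sketch that scans EMS event text for the two messages quoted there: it totals the responseCount values from the hourly events and lists traced operations whose procTime is close to 300 seconds. It assumes the events have already been exported as plain text lines (for example, copied from the event log output); the function name scan_ems_lines, the SlowOp type, and the 240-second cutoff for "close to 300 seconds" are hypothetical choices for illustration, not part of ONTAP.

```python
import ast
import re
from dataclasses import dataclass

# Matches the two EMS event formats quoted above. Any leading text on a real
# event line (node name, timestamp, ...) is ignored; only the event name and
# the params dictionary are captured.
EMS_PATTERN = re.compile(
    r"(Nblade\.NfsResponseTraceTrigger(?:Hourly)?):debug\]: params: (\{.*\})"
)


@dataclass
class SlowOp:
    client_addr: str
    op: str
    proc_time_s: int


def scan_ems_lines(lines, near_limit_s=240):
    """Return (sum of hourly responseCount, ops with procTime >= near_limit_s).

    near_limit_s is a hypothetical cutoff for "close to 300 seconds".
    """
    hourly_total = 0
    suspects = []
    for line in lines:
        match = EMS_PATTERN.search(line)
        if not match:
            continue
        event, raw_params = match.groups()
        # The params block is written as a single-quoted, Python-style dict.
        params = ast.literal_eval(raw_params)
        if event.endswith("Hourly"):
            hourly_total += int(params["responseCount"])
        elif int(params["procTime"]) >= near_limit_s:
            suspects.append(
                SlowOp(params["clientAddr"], params["op"], int(params["procTime"]))
            )
    return hourly_total, suspects


if __name__ == "__main__":
    sample = [
        "Nblade.NfsResponseTraceTriggerHourly:debug]: params: "
        "{'responseCount': '14', 'trigger': '60'}",
        "Nblade.NfsResponseTraceTrigger:debug]: params: "
        "{'clientAddr': '10.1.1.2', 'op': 'NFSv4 COMPOUND', 'vserverId': '#', "
        "'procTime': '297', 'trigger': '60'}",
    ]
    total, suspects = scan_ems_lines(sample)
    print(total, suspects)
```

With the sample lines from this article, the sketch reports an hourly total of 14 and one suspect operation (NFSv4 COMPOUND from 10.1.1.2, procTime 297). Parsing with ast.literal_eval rather than eval keeps the script from executing anything found in log text.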
Note:
- To be exposed to this issue, systems must be running ONTAP 9.16.1 (without the fix or workaround for this issue deployed) and must be using qtree exports over NFS.
- This issue is more likely to be seen on higher-end systems with high CPU counts due to the increased concurrency these systems allow.
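On the client side, the following is a minimal sketch for checking whether Linux NFS mounts use the default timeo of 600 and retrans of 2 described under Issue. It assumes the standard /proc/mounts layout (device, mount point, filesystem type, comma-separated options, then two numeric fields); the function name nfs_timeouts is hypothetical.

```python
import os


def nfs_timeouts(mounts_text):
    """Return {mount_point: (timeo, retrans)} for nfs/nfs4 entries.

    A value is None when the option does not appear in the mount options.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Split "key=value" options; bare flags such as "rw" map to "".
        options = dict(
            opt.split("=", 1) if "=" in opt else (opt, "")
            for opt in fields[3].split(",")
        )
        # timeo is in deciseconds, so 600 means 60 seconds.
        timeo = int(options["timeo"]) if "timeo" in options else None
        retrans = int(options["retrans"]) if "retrans" in options else None
        result[fields[1]] = (timeo, retrans)
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point, (timeo, retrans) in nfs_timeouts(f.read()).items():
            print(mount_point, "timeo:", timeo, "retrans:", retrans)
```

Mounts reporting (600, 2) are using the defaults discussed under Issue.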