NetApp Knowledge Base

Extreme client latency and/or hangs due to a shortage of long-term replay cache buckets

Category:
ontap-9
Specialty:
nfs

Applies to

  • ONTAP 8.3.x and later
  • NFS

Issue

ONTAP statistics show high latency in the latency breakdown section of OPM, Grafana, or Perfstat/PerfArchive. Depending on the tool used to monitor performance, the bulk of the latency is attributed to 'CPU_NETWORK' or 'CLUSTER_INTERCONNECT'.

Several errors and warnings can be found in various logs:

  1. CSM timeouts are observed from the request blade (Nblade) in the perfstat 'sysctl sysvar.csm' section. The same output can be collected manually from the SystemShell by running 'sysctl sysvar.csm'.

Example output:

SpinNPSessionInt::timeout): this=0xffffff80085e1028, sessionId=(req=cluster_n01:nblade, rsp=cluster_n02:dblade, uniquifier=00053816b2747090): In last 3974071360 ms, 104 of 2168524218 Ops timed out, 2171533701 started, 0 Ops timed out unsent. 4289664640/0/0 Ops await replies, 0 segs sent, 0 await ACKs
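
When many such lines have to be reviewed, the timed-out and total operation counts can be pulled out with a small script. The following is a minimal Python sketch, not a NetApp tool: the regular expression is derived only from the sample line above, the function name parse_csm_timeout is hypothetical, and other ONTAP releases may format the line differently.

```python
import re

# Assumption: the pattern below is inferred from the single sample line in
# this article; it is not a documented log format.
TIMEOUT_RE = re.compile(
    r"req=(?P<req>[^,]+), rsp=(?P<rsp>[^,]+),.*?"
    r"(?P<timed_out>\d+) of (?P<total>\d+) Ops timed out"
)

def parse_csm_timeout(line):
    """Return the session endpoints and timeout counts, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return {
        "req": m.group("req"),
        "rsp": m.group("rsp"),
        "timed_out": int(m.group("timed_out")),
        "total": int(m.group("total")),
    }

sample = (
    "SpinNPSessionInt::timeout): this=0xffffff80085e1028, "
    "sessionId=(req=cluster_n01:nblade, rsp=cluster_n02:dblade, "
    "uniquifier=00053816b2747090): In last 3974071360 ms, "
    "104 of 2168524218 Ops timed out, 2171533701 started, "
    "0 Ops timed out unsent."
)

info = parse_csm_timeout(sample)
print(info["req"], "->", info["rsp"], "timed out:", info["timed_out"])
```

Comparing the timed_out value for the same session between two collections shows whether timeouts are still accumulating.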

  2. CSM flow control is observed on the receiving node (Dblade) in the perfstat 'sysctl sysvar.csm' section. The same output can be collected manually from the SystemShell by running 'sysctl sysvar.csm'.

Example output:

SpinNPSessionInt::processSessionFlowcontrolQueue): sess = 0xffffff8007bdf028, sessionId = (req=c55f68b8-7cc0-11e4-84e6-098b9834504d, rsp=cluster_n02:dblade, uniquifier=00053816b2747090), iface = 1, delivered REQUEST pkt = 0xffffff05931fa271 to flow control list

  3. 'Nblade.nfsConnResetAndClose' events with the reason 'Maximum number of rewind attempts has been exceeded' are found in the EMS logs.

Example Output:

Nblade.nfsConnResetAndClose: Shutting down connection with the client. Vserver ID is xx; network data protocol is NFS; client IP address:port is xx.xx.xx.xx:xxx. local IP address is xx.xx.xx.xx; reason is CSM error - Maximum number of rewind attempts has been exceeded.

  4. High SpinNP latency outliers are observed in the perfstat 'stats spinnp' section. Check that the counters increment across iterations. The same output can also be collected manually by running 'statistics show -object spinnp -raw' from the ClusterShell (diag mode).

Example Output:

spinnp:spinnp:latency_hist.<1s:2577819
spinnp:spinnp:latency_hist.<2s:7878237
spinnp:spinnp:latency_hist.<4s:6262884
spinnp:spinnp:latency_hist.<6s:1629240
spinnp:spinnp:latency_hist.<8s:307280
spinnp:spinnp:latency_hist.<10s:85273
spinnp:spinnp:latency_hist.<20s:145299
spinnp:spinnp:latency_hist.<30s:51447
spinnp:spinnp:latency_hist.<60s:30
spinnp:spinnp:latency_hist.<90s:10
spinnp:spinnp:latency_hist.<120s:6
spinnp:spinnp:latency_hist.>120s:50
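
To check whether the histogram buckets increment across iterations, two captures of this output can be compared bucket by bucket. The following Python sketch assumes the raw lines keep the 'spinnp:spinnp:latency_hist.<bucket>:<count>' shape shown above; the function names and the second-iteration value are hypothetical and used only for illustration.

```python
PREFIX = "spinnp:spinnp:latency_hist."

def parse_latency_hist(text):
    """Map each histogram bucket label (for example '<60s') to its raw count."""
    hist = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        # Split on the last ':' so the bucket label is kept intact.
        bucket, _, count = line[len(PREFIX):].rpartition(":")
        hist[bucket] = int(count)
    return hist

def growing_buckets(earlier, later):
    """Return the buckets whose count increased between two iterations."""
    return {
        bucket: later[bucket] - earlier[bucket]
        for bucket in later
        if bucket in earlier and later[bucket] > earlier[bucket]
    }

first = parse_latency_hist(
    "spinnp:spinnp:latency_hist.<120s:6\n"
    "spinnp:spinnp:latency_hist.>120s:50\n"
)
# Hypothetical later iteration, invented for this example only.
second = parse_latency_hist(
    "spinnp:spinnp:latency_hist.<120s:6\n"
    "spinnp:spinnp:latency_hist.>120s:55\n"
)
print(growing_buckets(first, second))  # {'>120s': 5}
```

Because the counters are raw cumulative values, only the difference between iterations indicates ongoing slow operations.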

  5. SpinHi statistics indicate that almost all SpinHi requests are in the deferred queue. This can be found in the perfstat 'spinhi_stats' section, or collected manually from the nodeshell by running 'spinhi_stats' (diag mode).

Example Output:

(spinhi_stats) size=39502 total_req=421874001827 cur_req=25780 max_req=26702 total_resp=421873962781 total_replay_resp=289138 defer_req=55765 cur_defer=25780 max_defer=25780 hipri=15603269 unmarshal_errs=0 marshal_errs=0 fastpath_null_resps=0 cur_nogrow_filecb_bulk=0, cur_nogrow_filecb_op=0 redo=131995, max_nogrow_filecb_bulk=0 max_nogrow_filecb_fileop=0 Access: count=44862084546 hipri=0 errs=77411717 elapsed: max=14087030.76 avg=280.45

cur_req: Current number of requests in SpinHi
cur_defer: Current number of requests in the SpinHi defer queue
If cur_defer == cur_req, all current requests in SpinHi are in the defer queue.
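
The cur_defer == cur_req comparison can be scripted against a captured 'spinhi_stats' line. This is a minimal Python sketch that assumes the key=value layout of the sample output above; the function names are hypothetical and the parser is not a NetApp utility.

```python
import re

def parse_spinhi_stats(line):
    """Collect numeric key=value pairs from one spinhi_stats line.

    The first occurrence of a key wins, because names such as 'hipri'
    appear again in the per-operation section (for example 'Access:').
    """
    stats = {}
    for key, value in re.findall(r"(\w+)=(\d+(?:\.\d+)?)", line):
        stats.setdefault(key, float(value) if "." in value else int(value))
    return stats

def all_requests_deferred(stats):
    """True when there are current requests and every one of them is deferred."""
    return stats["cur_req"] > 0 and stats["cur_defer"] == stats["cur_req"]

# Shortened copy of the sample output above.
sample_stats = (
    "(spinhi_stats) size=39502 total_req=421874001827 cur_req=25780 "
    "max_req=26702 defer_req=55765 cur_defer=25780 max_defer=25780 "
    "hipri=15603269 Access: count=44862084546 hipri=0 errs=77411717"
)

stats = parse_spinhi_stats(sample_stats)
print(all_requests_deferred(stats))  # True
```
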

The counter "spinnp_replay_max_long_term_hit" increments across iterations in the perfstat 'stats spinnp_replay_cache' section, for example:
spinnp_replay_cache:spinnp_replay_cache:spinnp_replay_max_long_term_hit:20467472
spinnp_replay_max_long_term_hit: Total number of times the maximum long-term limit was hit

 
