NetApp Knowledge Base

NFS sessions hang and high latency is reported because of load balancer problems

Category: ontap-9
Specialty: nas

Applies to

  • ONTAP 9
  • FabricPool

Issue

  • NFS sessions hang on all volumes in the cluster that are configured with FabricPool.
  • The issue starts with one of the nodes in the cluster reporting high latency.
  • nblade_execsOverLimit_1 and Nblade.nfsLongRunningOp messages are seen in the EMS logs:
    [Node1: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation. The client IP address:port is 92.X.X.66:694. The local IP address:port is 10.X.X.82:2049. The protocol requesting the operation is NFS3. The RPC program number for the operation is 100003. The protocol procedure for the operation is Read (6). The disk process UUID is 05238d4dXXXXXXXXXXXXX160cedebc32. The Vserver associated with the operation is XXXX. The UID of the user is 23068. The MSID for the volume is 2161146647. The inode number of the file is 12955.

    [Node1: kernel: nblade_execsOverLimit_1:debug]: params: {'clientIpAddress': '10.X.X.58', 'lifIpAddress': '10.X.X.64', 'vserverId': '4', 'execsLimit': '128'}
  • If a takeover/giveback of the affected node is attempted, the node may panic with the following string:
    RPANIC:giveback or arl hung in wafl while doing SENDHOME_DOING_COMMIT in SK process sendhome_hang_detector on release 9.8P19 (C)
  • After a takeover/giveback (TO/GB) of the affected node, the issue does not recur for some time.
  • sktrace logs show cloud I/O errors:

    [5:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35286487738791  data unavailable cloud io error 9 btid: 8969343 btuuid: cab5f25b-3425-476f-a361-11a69e7db847, seq_num: 1241209
    [13:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35277266573844  data unavailable cloud io error 9 btid: 40852388 btuuid: f23e6bae-2ef0-4168-b611-3e3d87274447, seq_num: 1637591
    [13:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35284330697390  data unavailable cloud io error 9 btid: 40183670 btuuid: d06a9d55-46c3-473c-b06f-0c6091fa3b02, seq_num: 171567
  • The storage aggregate object-store show command reports the object store as available:

    cluster::> storage aggregate object-store show
    Aggregate      Object Store Name Availability
    -------------- ----------------- -------------
    aggr1          s3_bucket         available
  • In the storage aggregate object-store profiler start results, PUT operations have 0 failures. However, every 8KB, 32KB, and 256KB GET failed (Failed equals Total), and 270 of the 77095 4KB GETs also failed.

    Object store config name: s3_bucket
    Node name: Node1
    Status: Done
    Start time: 8/2/2023 15:08:38
    Op      Size       Total     Failed             Latency(ms)          Throughput
                                            min       max       avg
    -------------------------------------------------------------------------------
    PUT     4MB        1041      0         91        17799     2891      66.98MB
    GET     4KB        77095     270       5         35501     94        4.28MB
    GET     8KB        284       284       10003     35502     23920     0B
    GET     32KB       297       297       10000     35000     22532     0B
    GET     256KB      285       285       9999      33006     22843     0B
    5 entries were displayed.
  • StorageGRID is configured as the capacity tier.
  • No issues are seen on the StorageGRID nodes.
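A minimal sketch for reading the profiler table programmatically, separating operations that fail outright from those that fail intermittently. It assumes the profiler output has been copied as plain text with the column layout shown in this article. The regular expression, the `ProfilerRow` type, and `parse_profiler` are hypothetical helpers written for illustration; they are not part of ONTAP.

```python
import re
from dataclasses import dataclass

# Assumption: plain-text copy of the profiler output, using the column
# layout shown in this article.
PROFILER_OUTPUT = """\
Op      Size       Total     Failed             Latency(ms)          Throughput
                                        min       max       avg
-------------------------------------------------------------------------------
PUT     4MB        1041      0         91        17799     2891      66.98MB
GET     4KB        77095     270       5         35501     94        4.28MB
GET     8KB        284       284       10003     35502     23920     0B
GET     32KB       297       297       10000     35000     22532     0B
GET     256KB      285       285       9999      33006     22843     0B
"""

# One data row: op, size, total, failed, min/max/avg latency (ms), throughput.
# Anchoring on PUT|GET skips the header, sub-header, and separator lines.
ROW_RE = re.compile(
    r"^(?P<op>PUT|GET)\s+(?P<size>\S+)\s+(?P<total>\d+)\s+(?P<failed>\d+)"
    r"\s+(?P<lat_min>\d+)\s+(?P<lat_max>\d+)\s+(?P<lat_avg>\d+)\s+(?P<tput>\S+)\s*$"
)


@dataclass
class ProfilerRow:
    op: str
    size: str
    total: int
    failed: int
    lat_avg_ms: int

    @property
    def failure_rate(self) -> float:
        # Fraction of requests that failed; 0.0 when no requests were issued.
        return self.failed / self.total if self.total else 0.0


def parse_profiler(text: str) -> list[ProfilerRow]:
    """Return one ProfilerRow per PUT/GET data row in the profiler output."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows.append(ProfilerRow(
                op=m["op"], size=m["size"], total=int(m["total"]),
                failed=int(m["failed"]), lat_avg_ms=int(m["lat_avg"])))
    return rows


if __name__ == "__main__":
    for row in parse_profiler(PROFILER_OUTPUT):
        # Flag rows where every request of that size failed.
        flag = "ALL FAILED" if row.total and row.failed == row.total else ""
        print(f"{row.op:4} {row.size:>6} {row.failure_rate:7.2%} "
              f"avg {row.lat_avg_ms} ms {flag}")
```

Run against the output in this article, the sketch marks the 8KB, 32KB, and 256KB GET rows as failing every request, while 4KB GETs fail about 0.35% of the time (270 of 77095) and PUTs do not fail at all. Matching rows with a regular expression anchored on the operation name avoids depending on exact column widths.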

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.