NFS operations hang or "NFS not responding" errors are reported when entire FlexGroup usage reaches 100 percent
Applies to
- ONTAP 9
- FlexGroup
- NFS
Issue
- The NFS client's kernel log contains:
mount: server <name> not responding, timed out
- The find command does not respond
- Client experiences NFS latency
- The storage aggregate show command returns an error:
cluster::*> storage aggregate show
Info: Failed to get the information for aggregate aggr0_node09. Reason: ZSM - failed, status code = 571, extra = Timeout: Operation "ksmfRawZapi_iterator::get_imp()" took longer than 110
seconds to complete [from mgwd on node "node01" (VSID: -1) to kernel at 169.254.33.96], took 109.996s, max 110s [169.254.33.96:951].
Failed to get the information for aggregate node09. Reason: ZSM - failed, status code = 571, extra = Timeout: Operation "ksmfRawZapi_iterator::get_imp()" took longer than 110
seconds to complete [from mgwd on node "node01" (VSID: -1) to kernel at 169.254.33.96], took 109.997s, max 110s [169.254.33.96:951].
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_node09 - - - unknown - node09 -
aggr0_node10 1020GB 49.46GB 95% online 1 node10 raid_dp,normal
node09 - - - unknown - node09 -
node10 527.0TB 148.3TB 72% online 93 node10 raid_dp,normal
- The cf status command returns an error:
cluster::*> cf status
Takeover
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
node09 node10 - Up. Node accessible via HA-IC, but cluster access failed
node10 node09 true Connected to node09
- EMS log
Sun Jan 08 01:20:04 [node09: wafl_exempt14: wafl.vol.fsp.full:error]: volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 0 holes and 12 overwrites.
Sun Jan 08 01:20:30 [node01: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is xxx.xxx.109.64:922.
The local IP address:port is xxx.xxx.207.30:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is ReadDirPlus (17).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is 45644.
Sun Jan 08 01:21:56 [node01: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node.
Node UUID: xxxxxxxx-ff66-11e9-9b05-xxxxxxxxxxxx,
file operation protocol: NFS,
client IP address: xxx.xxx.109.58,
RPC procedure: 3.
Sun Jan 08 01:27:11 [node01: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node.
Node UUID: xxxxxxxx-ff66-11e9-9b05-xxxxxxxxxxxx,
file operation protocol: NFS,
client IP address: xxx.xxx.109.60,
RPC procedure: 17.
Sun Jan 08 01:27:36 [node09: wafl_exempt04: wafl.vol.full:alert]: Insufficient space on volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx to perform operation. 76.0KB was requested but only 12.0KB was available.
Sun Jan 08 01:28:19 [node09: wafl_exempt06: wafl.vol.fsp.full:error]: volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 1 holes and 26 overwrites.
Sun Jan 08 11:44:26 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is 10.96.103.108:775.
The local IP address:port is xxx.xxx.207.207:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is LookUp (3).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is xxxxxxxx.
Sun Jan 08 11:49:31 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is 10.96.103.108:823.
The local IP address:port is xxx.xxx.207.206:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is ReadDirPlus (17).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is xxxxxxxx.
Fri Jun 21 01:19:58 -0400 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation. The client IP address:port is xx.xx.xx.xx:808. The local IP address:port is xx.xx.xx.xx:2049. The protocol requesting the operation is NFS3. The RPC program number for the operation is 100003. The protocol procedure for the operation is Write (7). The disk process UUID is xxxxxxxxxxxxxxxx. The Vserver associated with the operation is vserver1. The UID of the user is xxxxxx. The MSID for the volume is xxxxxx. The inode number of the file is xxxxxx.
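When triaging a large EMS export, it can help to count events per node and pull out the volume-space messages that point to a full FlexGroup constituent. The following is a minimal sketch, not a NetApp tool: it assumes the EMS log has been saved as plain text with the "[node: process: event:severity]:" header seen in the samples above, and the function name summarize_ems is hypothetical.

```python
import re
from collections import Counter

# Header format assumed from the EMS samples above:
#   [node09: wafl_exempt14: wafl.vol.fsp.full:error]:
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:"
)

# Event names taken from the log excerpts above.
SPACE_EVENTS = ("wafl.vol.full", "wafl.vol.fsp.full")


def summarize_ems(lines):
    """Count events per (node, event) and collect volume-space events."""
    counts = Counter()
    space_events = []
    for line in lines:
        match = EMS_HEADER.search(line)
        if match is None:
            continue  # continuation line of a multi-line event
        counts[(match.group("node"), match.group("event"))] += 1
        if match.group("event") in SPACE_EVENTS:
            space_events.append(line.strip())
    return counts, space_events
```

Calling summarize_ems(open("ems.txt")) (the file name is an assumption) returns the per-node counts and the list of space-related lines; many Nblade timeouts alongside wafl.vol.full events on the data-serving node would match the pattern described in this article.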