What data should I collect for NFSv4 StorePool issues?
Applies to
- ONTAP 9
- NFSv4.x
Answer
- EMS Messages provide a significant amount of information to help narrow down your troubleshooting
- Trigger an AutoSupport to provide this to support
- Take note of the node management LIF for the node reporting the error
- Statistics provide guidance to see if any buckets are full and what instance they are associated with
- What are performance archives and how are they triggered?
- statistics start can be used to collect a sample over a period of time that can be referenced many times.
- Use the REST API call to check for storePool_* counters:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/
NOTE: Use the row IDs returned by the call above to run a second call per node, which returns the storePool counters, as in the example below.
Example:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/Cluster-01%3Cluster-01%3A67261b3
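The two-step REST lookup above can be scripted. The following is a minimal sketch, not a supported tool. It assumes that the row IDs returned by the first call use colons as separators, which must be percent-encoded as %3A in the URL path, and that the response is pretty-printed JSON in which each counter value follows its name line. The helper names encode_row_id, row_url, and fetch_storepool_counters are hypothetical, and CLUSTER_MGMT and ONTAP_USER are placeholders.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical and not part of ONTAP.

# Cluster management address (placeholder, as in the curl examples above).
CLUSTER_MGMT="${CLUSTER_MGMT:-Cluster-Mgmt-IP}"
TABLE="nfs_v4_diag"

# Percent-encode the colons in a counter-table row ID so that it can be
# used as a URL path segment (assumes IDs look like node:instance:id).
encode_row_id() {
    local id="$1"
    printf '%s\n' "${id//:/%3A}"
}

# Build the per-row URL for the second call.
row_url() {
    printf 'https://%s/api/cluster/counter/tables/%s/rows/%s\n' \
        "$CLUSTER_MGMT" "$TABLE" "$(encode_row_id "$1")"
}

# Fetch one row and keep the storePool counter names plus the line after
# each match (assumed to hold the value in pretty-printed JSON).
# -s: silent, -k: skip certificate verification, -u: curl prompts for the
# password when only a user name is given.
fetch_storepool_counters() {
    curl -sk -u "${ONTAP_USER:-admin}" "$(row_url "$1")" | grep -i -A 1 'storepool'
}
```

A call such as fetch_storepool_counters "ROW-ID" would then be run once per row ID returned by the first request. Nothing is contacted until that function is invoked.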
- Objects that can be relevant to storepool
- nfsv4
- nfsv4_diag
- nfsv4_error
- nfsv4_1_error
- nfsv4_1_diag
- nfsv4_1
- spinnp_error
- spinnp
- spinhi
- lmgr_ng
- Collect locks from the node to see current utilization. vserver locks nfsv4 is node scoped; the vserver locks show commands can be used to see which locks currently exist
- If a top client was identified previously, collect a packet-trace filtered for that client for 5 minutes while the issue is occurring
- Collect this data again if LIF migration is done to mitigate the issue.
- Be sure to substitute in the node management IP for the node where the LIF now resides.
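The statistics start step in the list above can be wrapped in a small helper so that the same named sample can be displayed repeatedly. This is a minimal sketch under stated assumptions: the sample ID storepool_sample, the 300-second default window, the output file name, and the function names are all illustrative. It also assumes that a named sample started in one SSH session can be read from a later session. Commands are run at diagnostic privilege with "set d -c off", as in the collection script in this article.

```shell
#!/bin/bash
# Sketch only: names and defaults are illustrative.

SAMPLE_ID="${SAMPLE_ID:-storepool_sample}"

# Build the ONTAP command that starts a named sample for one object.
start_cmd() {
    printf 'set d -c off; statistics start -object %s -sample-id %s\n' "$1" "$SAMPLE_ID"
}

# Build the ONTAP command that displays the named sample.
show_cmd() {
    printf 'set d -c off; rows 0; date; statistics show -sample-id %s\n' "$SAMPLE_ID"
}

# Build the ONTAP command that stops the named sample.
stop_cmd() {
    printf 'set d -c off; statistics stop -sample-id %s\n' "$SAMPLE_ID"
}

# Start a sample, wait, append the result to a local file, then stop it.
# $1: user@node-mgmt-ip, $2: statistics object, $3: seconds to wait.
collect_sample() {
    local target="$1" object="$2" seconds="${3:-300}"
    ssh "$target" "$(start_cmd "$object")" || return 1
    sleep "$seconds"
    ssh "$target" "$(show_cmd)" >> "sample_${object}.txt"
    ssh "$target" "$(stop_cmd)"
}
```

For example, collect_sample "admin@NODE-MGMT-IP" nfsv4_diag 300 would sample nfsv4_diag for five minutes. Sampling one object at a time keeps the load on the node small.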
Additional Information
- NFSv4 Storepool - Resolution guide
A bash script has been provided to facilitate easier collection of this data
#!/bin/bash

# Set the IP address of the node management server
NODE_MANAGEMENT="NODE-MGMT-IP"
# NODE_MANAGEMENT="10.2.1.3"

# Set the username to use during authentication (ideally with public key authentication)
USERNAME="admin"

# Create a directory to store the statistics
mkdir -p "statistics/$NODE_MANAGEMENT"

# Collect the statistics for NFSv4 and NFSv4_1
for version in nfsv4 nfsv4_1
do
    for TYPE in "" "_diag" "_error"
    do
        ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object $version$TYPE -raw" >> "statistics/$NODE_MANAGEMENT/nfs.txt"
    done
done

# Collect the statistics for spinnp
for TYPE in spinnp_error spinhi spinnp
do
    ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object $TYPE -raw" >> "statistics/$NODE_MANAGEMENT/spin.txt"
done

# Collect the statistics for lmgr_ng
# (the "|" characters are inside the quoted string, so they are passed to ONTAP rather than interpreted by the local shell)
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object lmgr_ng -counter files|hosts|owners|locks|*max -raw" >> "statistics/$NODE_MANAGEMENT/locks.txt"

# Collect the output of vserver locks nfsv4
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; vserver locks nfsv4 show -inst" >> "statistics/$NODE_MANAGEMENT/locks.txt"
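Once the script has written nfs.txt, the storePool counters can be scanned for pools that are close to their limit. The following is a rough sketch. It assumes that the raw output lists one counter per line as a name followed by a numeric value, and that each pool exposes a pair of counters ending in Alloc and Max. Verify both assumptions against your own output before relying on the result. The function name storepool_usage and the 80 percent default threshold are illustrative.

```shell
#!/bin/bash
# Sketch only: the counter naming pattern and output layout are assumptions.

# Print every storePool pool whose Alloc value is at or above the given
# percentage of its Max value.
# $1: file produced by the collection script, $2: threshold in percent.
storepool_usage() {
    local file="$1" threshold="${2:-80}"
    awk -v limit="$threshold" '
        # Lines shaped like: storePool_<Name>Alloc <number>
        $1 ~ /^storePool_.*Alloc$/ && $2 ~ /^[0-9]+$/ {
            name = $1; sub(/Alloc$/, "", name); alloc[name] = $2
        }
        # Lines shaped like: storePool_<Name>Max <number>
        $1 ~ /^storePool_.*Max$/ && $2 ~ /^[0-9]+$/ {
            name = $1; sub(/Max$/, "", name); max[name] = $2
        }
        END {
            for (name in max) {
                if (max[name] > 0 && (name in alloc)) {
                    pct = 100 * alloc[name] / max[name]
                    if (pct >= limit) printf "%s %d%%\n", name, pct
                }
            }
        }
    ' "$file"
}
```

For example, storepool_usage "statistics/NODE-MGMT-IP/nfs.txt" 80 would list pools at 80 percent or more. Because the collection script appends to the file on every run, the most recent value for each counter is the one that is compared.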