What data should I collect for NFSv4 StorePool issues?
Applies to
- ONTAP 9
- NFSv4.x
Answer
- EMS messages provide a significant amount of information to help narrow down troubleshooting
- Trigger an AutoSupport to send this information to NetApp Support
- Take note of the node management LIF of the node reporting the error
- Statistics show whether any buckets are full and which instance they are associated with
- What are performance archives and how are they triggered?
- The statistics start command collects a sample over a period of time, and that sample can be referenced multiple times.
- Use the REST API to check the storePool_* counters:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/
NOTE: Use the response from the call above to run a second call per node to get the storePool counters, as in the example below.
Example:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/Cluster-01%3Cluster-01%3A67261b3
- Objects that can be relevant to StorePool
- nfsv4
- nfsv4_diag
- nfsv4_error
- nfsv4_1_error
- nfsv4_1_diag
- nfsv4_1
- spinnp_error
- spinnp
- spinhi
- lmgr_ng
- Collect NFSv4 locks from the node that reported the StorePool EMS message
::> vserver locks nfsv4 show -instance
This command is node scoped, so you MUST SSH to the node management LIF of the node that is reporting the StorePool alert
- If a top client was identified previously, collect a packet trace filtered for that client for 5 minutes while the issue is occurring
- If a LIF migration is performed to mitigate the issue, collect this data again afterward.
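The two-step REST query described above (list the nfs_v4_diag rows, then fetch each row) can be scripted. The following is a minimal sketch, not a NetApp-provided tool, and it rests on assumptions: that the rows listing returns each record with an "id" field, that the id must be percent-encoded in the URL (the example above shows ':' encoded as %3A), and that each row returns its counters as "name"/"value" pairs. The helper names urlencode, extract_row_ids, and filter_storepool are hypothetical. It keeps -k from the original curl call (skips TLS certificate verification) but drops -i so response headers do not interfere with parsing.

```shell
#!/bin/bash
# Sketch: list nfs_v4_diag counter rows, then print the storePool_* counters of each row.
# ASSUMPTIONS: rows listing returns records with an "id" field; each row returns
# counters as {"name": ..., "value": ...} objects. Helper names are hypothetical.

# Percent-encode a string for use as a URL path segment (ASCII only).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Read a rows-listing JSON response on stdin and print one row id per line.
extract_row_ids() {
  tr -d '\n' | grep -o '"id": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Read a single-row JSON response on stdin and print only storePool_* name/value pairs.
filter_storepool() {
  tr -d ' \n' | grep -o '"name":"storePool_[^"]*","value":[0-9]*'
}

main() {
  local mgmt="$1" user="${2:-admin}" pw id
  local base="https://${mgmt}/api/cluster/counter/tables/nfs_v4_diag/rows"
  read -rsp "Password for ${user}: " pw; echo
  # First call: list the rows.
  curl -sku "${user}:${pw}" "${base}/" | extract_row_ids |
  while IFS= read -r id; do
    echo "== ${id}"
    # Second call: fetch one row and keep only the storePool counters.
    curl -sku "${user}:${pw}" "${base}/$(urlencode "$id")" | filter_storepool
  done
}

# Run only when a cluster management IP is supplied, so the helpers can be sourced safely.
if [[ $# -ge 1 ]]; then
  main "$@"
fi
```

Usage (hypothetical file name): ./storepool_rest.sh Cluster-Mgmt-IP admin. The password is read once interactively rather than embedded in the command line.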
Additional Information
- NFSv4 Storepool - Resolution guide
- To view NFSv4 StorePool blocked clients in real time:
::*> vserver nfs storepool blocked-client show -node node_name
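To keep a timestamped history of blocked clients instead of a single real-time view, the blocked-client command can be polled over SSH. This is a sketch under assumptions: SSH public-key access to the node management LIF, and advanced privilege being sufficient for the command (the ::*> prompt above indicates advanced or higher). The function names, the interval, and the count are hypothetical.

```shell
#!/bin/bash
# Sketch: periodically record NFSv4 StorePool blocked clients for one node.
# ASSUMPTIONS: SSH public-key access; advanced privilege suffices for this command.
# Function names, interval, and count are hypothetical.

# Build the clustershell command string for one poll.
blocked_client_cmd() {
  local node="$1"
  printf 'set -privilege advanced -confirmations off; rows 0; date; vserver nfs storepool blocked-client show -node %s\n' "$node"
}

# Run the command COUNT times, sleeping INTERVAL seconds between runs.
poll_blocked_clients() {
  local user="$1" mgmt="$2" node="$3" interval="$4" count="$5" i
  for (( i = 1; i <= count; i++ )); do
    ssh "${user}@${mgmt}" "$(blocked_client_cmd "$node")"
    if (( i < count )); then
      sleep "$interval"
    fi
  done
}

# Usage: ./poll_blocked.sh admin NODE-MGMT-IP node_name [interval_seconds] [count]
if [[ $# -ge 3 ]]; then
  poll_blocked_clients "$1" "$2" "$3" "${4:-60}" "${5:-10}" >> "blocked_clients_$3.txt"
fi
```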
The following Bash script can be run from a *nix host to collect the needed NFSv4 statistics and locking information.
#!/bin/bash
# Set the IP address of the node management LIF (the node reporting the StorePool EMS)
NODE_MANAGEMENT="NODE-MGMT-IP" # Example: NODE_MANAGEMENT="10.2.1.3"
# Set the username used for authentication. Public-key authentication is required:
# the connection check below uses BatchMode, which disables password prompts.
USERNAME="admin"
# Directory to store the statistics
STAT_DIR="statistics/$NODE_MANAGEMENT"
mkdir -p "$STAT_DIR"
# Check the SSH connection
if ! ssh -q -o BatchMode=yes -o ConnectTimeout=5 "$USERNAME@$NODE_MANAGEMENT" exit; then
    echo "SSH connection to $NODE_MANAGEMENT failed" >&2
    exit 1
fi
# "set d -c off" = diagnostic privilege, confirmations off; "rows 0" = no pagination
# Collect the statistics for nfsv4, nfsv4_diag, nfsv4_error, nfsv4_1, nfsv4_1_diag, and nfsv4_1_error
NFS_VERSIONS=("nfsv4" "nfsv4_1")
NFS_TYPES=("" "_diag" "_error")
for version in "${NFS_VERSIONS[@]}"; do
    for type in "${NFS_TYPES[@]}"; do
        ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object ${version}${type} -raw" >> "$STAT_DIR/stats_nfs.txt"
    done
done
# Collect the statistics for spinnp_error, spinhi, and spinnp
SPIN_TYPES=("spinnp_error" "spinhi" "spinnp")
for type in "${SPIN_TYPES[@]}"; do
    ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object $type -raw" >> "$STAT_DIR/stats_spin.txt"
done
# Collect the lmgr_ng statistics (files, hosts, owners, locks, and *max counters)
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object lmgr_ng -counter files|hosts|owners|locks|*max -raw" >> "$STAT_DIR/stats_lmgr.txt"
# Collect the NFSv4 locks (vserver locks nfsv4 show is node scoped)
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; vserver locks nfsv4 show -inst" >> "$STAT_DIR/locks.txt"
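The script above captures point-in-time counters. To capture a sample over a period of time while the issue is occurring (statistics start) and then trigger an AutoSupport, the same SSH approach can be extended. This is a sketch, not a supported NetApp procedure: it assumes SSH public-key access, that advanced privilege is sufficient for the statistics commands, and that one sample per object is acceptable. The sample-id prefix, duration, and function names are hypothetical. Samples are not deleted, so they remain available to be referenced again.

```shell
#!/bin/bash
# Sketch: take a timed statistics sample for the objects listed in this article,
# then invoke an AutoSupport. ASSUMPTIONS: SSH public-key access; advanced privilege
# suffices; sample-id prefix, duration, and function names are hypothetical.

OBJECTS=(nfsv4 nfsv4_diag nfsv4_error nfsv4_1 nfsv4_1_diag nfsv4_1_error spinnp spinnp_error spinhi lmgr_ng)

# Clustershell command that starts one sample per object.
start_cmd() {
  local prefix="$1" cmd="set -privilege advanced -confirmations off" obj
  for obj in "${OBJECTS[@]}"; do
    cmd+="; statistics start -object ${obj} -sample-id ${prefix}_${obj}"
  done
  printf '%s\n' "$cmd"
}

# Clustershell command that stops each sample and shows its raw counters.
stop_show_cmd() {
  local prefix="$1" cmd="set -privilege advanced -confirmations off; rows 0" obj
  for obj in "${OBJECTS[@]}"; do
    cmd+="; statistics stop -sample-id ${prefix}_${obj}; statistics show -sample-id ${prefix}_${obj} -raw"
  done
  printf '%s\n' "$cmd"
}

main() {
  local user="$1" mgmt="$2" node="$3" duration="${4:-300}" prefix
  prefix="sp$(date +%H%M)"
  ssh "${user}@${mgmt}" "$(start_cmd "$prefix")"
  sleep "$duration"
  ssh "${user}@${mgmt}" "$(stop_show_cmd "$prefix")" > "${prefix}_samples.txt"
  # Send an AutoSupport for the affected node so the EMS data reaches NetApp Support.
  ssh "${user}@${mgmt}" "system node autosupport invoke -node ${node} -type all -message \"NFSv4 StorePool data collection\""
}

# Usage: ./storepool_sample.sh admin NODE-MGMT-IP node_name [duration_seconds]
if [[ $# -ge 3 ]]; then
  main "$@"
fi
```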
