What data should I collect for NFSv4 StorePool issues?
Applies to
- ONTAP 9
- NFSv4.x
Answer
- EMS Messages provide a significant amount of information to help narrow down your troubleshooting
- Trigger an AutoSupport to provide this to support
- Take note of the node management LIF for the node reporting the error
- Statistics provide guidance to see if any buckets are full and what instance they are associated with
- What are performance archives and how are they triggered?
- statistics start can be used to collect a sample over a period of time that can be referenced many times.
- Use the REST API to check the storePool_* counters:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/
NOTE: Use the response from the call above to run a second call per node, which returns the storePool counters, as in the following example.
Example:
curl -siku "admin:PASSWORD" https://Cluster-Mgmt-IP/api/cluster/counter/tables/nfs_v4_diag/rows/Cluster-01%3ACluster-01%3A67261b3
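The two-step lookup above can be scripted. The following is a minimal sketch rather than NetApp tooling: `encode_row_id`, `row_url`, and `storepool_lines` are hypothetical helper names, and `node-01:example-row` is a placeholder row ID. It assumes that colons in a row ID are percent-encoded as `%3A` in the URL path, as in the example, and that the response prints one counter per line.

```shell
#!/bin/bash

# Hypothetical helper: percent-encode the colons in a counter-table row ID.
encode_row_id() {
    local id="$1"
    printf '%s\n' "${id//:/%3A}"
}

# Hypothetical helper: build the per-row URL used in the second call.
# Arguments: cluster management address, counter table name, row ID.
row_url() {
    local cluster="$1" table="$2" id="$3"
    printf 'https://%s/api/cluster/counter/tables/%s/rows/%s\n' \
        "$cluster" "$table" "$(encode_row_id "$id")"
}

# Hypothetical helper: keep only the lines that mention a storePool counter.
# Assumption: the response is printed with one counter name per line.
storepool_lines() {
    grep -i 'storepool_'
}

# Example invocation (commented out because it needs a reachable cluster):
# curl -sku "admin:PASSWORD" \
#     "$(row_url Cluster-Mgmt-IP nfs_v4_diag 'node-01:example-row')" | storepool_lines
```

The `-i` flag from the original commands is dropped in the commented invocation so that HTTP headers do not mix with the filtered output.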
- Objects that can be relevant to storepool
- nfsv4
- nfsv4_diag
- nfsv4_error
- nfsv4_1_error
- nfsv4_1_diag
- nfsv4_1
- spinnp_error
- spinnp
- spinhi
- lmgr_ng
- Collect NFSv4 Locks from the node that was calling out the storepool EMS
vserver locks nfsv4 show -inst is node scoped. As such, you MUST SSH to the node management LIF of the node that is reporting the storepool alert.
WARNING: Be sure to substitute in the node management IP for the node where the EMS is being reported. This will be the node that houses the data LIF.
- If a top client was identified previously, collect a packet-trace filtered for that client for 5 minutes while the issue is occurring
- Collect this data again if LIF migration is done to mitigate the issue.
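The timed sample mentioned above (`statistics start`) can be wrapped in a small helper. This is a sketch under stated assumptions: `collect_sample` and the `DRY_RUN` switch are hypothetical names, the `set d -c off` prefix is borrowed from the collection script in this article, and the sample is assumed to remain available to `statistics show` after `statistics stop` and across separate SSH sessions.

```shell
#!/bin/bash

# Hypothetical helper: start a statistics sample, wait, stop it, then show it.
# Arguments: ssh target (user@node-mgmt-ip), statistics object,
#            sample ID, seconds to wait (default 60).
# Set DRY_RUN=1 to print the CLI commands instead of running them over SSH.
collect_sample() {
    local target="$1" object="$2" sample_id="$3" seconds="${4:-60}"
    local prefix="set d -c off; rows 0;"
    local run=(ssh "$target")

    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=(echo)
    fi

    "${run[@]}" "$prefix statistics start -object $object -sample-id $sample_id"

    # Let the sample accumulate data while the issue is occurring.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        sleep "$seconds"
    fi

    "${run[@]}" "$prefix statistics stop -sample-id $sample_id"
    "${run[@]}" "$prefix statistics show -sample-id $sample_id"
}

# Example invocation (commented out because it needs SSH access to a node):
# collect_sample admin@NODE-MGMT-IP nfsv4_diag storepool_sample 300
```

Running it first with `DRY_RUN=1` prints the three commands so they can be reviewed before anything is sent to the node.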
Additional Information
- NFSv4 Storepool - Resolution guide
The following BASH script can be used from a *nix host to collect the needed NFSv4 locking information.
#!/bin/bash

# Set the IP address of the node management server
NODE_MANAGEMENT="NODE-MGMT-IP"
# NODE_MANAGEMENT="10.2.1.3"

# Set the username to use during authentication (Ideal to use pubkey auth)
USERNAME="admin"

# Directory to store the statistics
STAT_DIR="statistics/$NODE_MANAGEMENT"
mkdir -p "$STAT_DIR"

# Check SSH connection
if ! ssh -q -o BatchMode=yes -o ConnectTimeout=5 "$USERNAME@$NODE_MANAGEMENT" exit; then
    echo "SSH connection to $NODE_MANAGEMENT failed"
    exit 1
fi

# Collect the statistics for NFSv4 and NFSv4_1
NFS_VERSIONS=("nfsv4" "nfsv4_1")
NFS_TYPES=("" "_diag" "_error")
for version in "${NFS_VERSIONS[@]}"; do
    for type in "${NFS_TYPES[@]}"; do
        ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object ${version}${type} -raw" >> "$STAT_DIR/stats_nfs.txt"
    done
done

# Collect the statistics for spinnp
SPIN_TYPES=("spinnp_error" "spinhi" "spinnp")
for type in "${SPIN_TYPES[@]}"; do
    ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object $type -raw" >> "$STAT_DIR/stats_spin.txt"
done

# Collect the statistics for lmgr_ng
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; statistics show -object lmgr_ng -counter files|hosts|owners|locks|*max -raw" >> "$STAT_DIR/stats_lmgr.txt"

# Collect the statistics for vserver locks nfsv4
ssh "$USERNAME@$NODE_MANAGEMENT" "set d -c off; rows 0; date; vserver locks nfsv4 show -inst" >> "$STAT_DIR/locks.txt"
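Once `stats_nfs.txt` has been collected, the storePool counters can be summarized to see which buckets are close to full. The sketch below is an illustration and makes several assumptions: `storepool_usage` is a hypothetical helper, each counter is assumed to appear as a `name value` pair on its own line, and each pool is assumed to expose a matching pair of counters named `storePool_<Pool>Alloc` and `storePool_<Pool>Max`. If the file contains several samples or instances, the last value seen for each pool wins.

```shell
#!/bin/bash

# Hypothetical helper: print allocation versus maximum for each storePool bucket.
# Reads the files given as arguments, or standard input if none are given.
storepool_usage() {
    awk '
        # Assumed naming: storePool_<Pool>Alloc holds the current allocation.
        $1 ~ /^storePool_.+Alloc$/ && $2 ~ /^[0-9]+$/ {
            pool = $1
            sub(/^storePool_/, "", pool)
            sub(/Alloc$/, "", pool)
            alloc[pool] = $2 + 0
        }
        # Assumed naming: storePool_<Pool>Max holds the limit for the same pool.
        $1 ~ /^storePool_.+Max$/ && $2 ~ /^[0-9]+$/ {
            pool = $1
            sub(/^storePool_/, "", pool)
            sub(/Max$/, "", pool)
            max[pool] = $2 + 0
        }
        END {
            # Report only pools where both counters were found.
            for (pool in max) {
                if ((pool in alloc) && max[pool] > 0) {
                    printf "%s %d/%d %.1f%%\n", pool, alloc[pool], max[pool], 100 * alloc[pool] / max[pool]
                }
            }
        }
    ' "$@"
}

# Example invocation (commented out because it needs a collected file):
# storepool_usage "statistics/NODE-MGMT-IP/stats_nfs.txt" | sort
```

Pools reported at or near 100% are the candidates to correlate with the EMS messages and the lock output in `locks.txt`.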