Performance impact of a full FlexGroup volume
Applies to
- ONTAP 9.1 and later
- FlexGroup
Issue
- Slow response times observed on a FlexGroup volume with all or some constituents above approximately 90% full.
- Running the ls command against the FlexGroup volume is very slow.
- VMs on which the volume is mounted may hang after login and respond very slowly.
- In some cases, node performance degrades to the point that you cannot connect to the SVM; for example, serving data is affected.
- A FlexGroup volume whose constituents have widely differing space usage can be impacted in a similar way to one with all constituents 80-90% or more full.
- OnCommand Unified Manager (OCUM) reports high latency for other operations.
- Potential operation timeout events in the Event log (see the example search commands after this list):
[xxx: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node. Node UUID: 9xxx, file operation protocol: NFS, client IP address: xx.xx.xx.xx, RPC procedure: 3.
- Insufficient space errors for FlexGroup volumes in the Event log:
[wafl_exempt08: wafl.vol.fsp.full:error]: volume amperexxxxxx@vserver:xxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 0 holes and 27 overwrites.
[wafl_exempt08: wafl.vol.fsp.full:error]: volume xxxxxxx@vserver:xxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 2 holes and 27 overwrites.
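To confirm whether these events have been logged, the Event Management System log can be searched by message name; the commands below are a sketch using the message names from the errors above (output will vary by cluster):
clstr::> event log show -message-name Nblade.dBladeNoResponse.NFS
clstr::> event log show -message-name wafl.vol.fsp.full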
Example:
- All constituent member volumes of the fg1 FlexGroup volume are 96% full:
clstr::> volume show -vserver vs1 -volume-style-extended flexgroup
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs1 fg1 - online RW 500GB 207.5GB 96%
clstr::> volume show -vserver vs1 -volume-style-extended flexgroup-constituent
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs1 fg1__0001 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0002 aggr1 online RW 31.25GB 12.98GB 96%
vs1 fg1__0003 aggr1 online RW 31.25GB 13.00GB 96%
vs1 fg1__0004 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0005 aggr1 online RW 31.25GB 13.00GB 96%
vs1 fg1__0006 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0007 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0008 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0009 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0010 aggr1 online RW 31.25GB 30.01GB 96%
vs1 fg1__0011 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0012 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0013 aggr3 online RW 31.25GB 12.95GB 96%
vs1 fg1__0014 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0015 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0016 aggr1 online RW 31.25GB 13.01GB 96%
16 entries were displayed.
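To check for the uneven space usage between constituents described in the Issue list, limiting the output to the space-related fields makes the comparison easier. A minimal sketch, assuming the same vserver and FlexGroup names as in this example (exact output layout varies by ONTAP version):
clstr::> volume show -vserver vs1 -volume fg1__* -fields size, available, percent-used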
- Run the qos statistics volume latency show command to confirm that the source of the high latency is the Data layer (see QoS commands to monitor volume latency in real time):
clstr::> qos statistics volume latency show -vserver vs1 -volume fg1
Workload ID Latency Network Cluster Data Disk QoS Max QoS Min NVRAM
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
-total- - 206.06ms 3.27ms 3.10ms 199.7ms 0ms 0ms 0ms 0ms
fg1-wid233.. 23350 206.06ms 3.27ms 3.10ms 199.7ms 0ms 0ms 0ms 0ms
-total- - 347.23ms 5.08ms 2.81ms 334.85ms 0ms 0ms 0ms 2.42ms
fg1-wid233.. 23350 347.23ms 5.08ms 2.81ms 334.85ms 0ms 0ms 0ms 2.42ms
-total- - 308.52ms 4.75ms 2.82ms 296.02ms 0ms 0ms 0ms 2.70ms
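By default, qos statistics volume latency show keeps refreshing until interrupted; to capture a bounded sample for comparison, the -iterations parameter can be added (a sketch, assuming the parameter is available in the ONTAP release in use):
clstr::> qos statistics volume latency show -vserver vs1 -volume fg1 -iterations 10
Latency concentrated in the Data column, with Network and Cluster remaining low, indicates that the delay originates in the data (WAFL) layer, consistent with constituents that are nearly full.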