Performance impact of a full FlexGroup volume
Applies to
- ONTAP 9.1 and later
- FlexGroup
Issue
- Slow response times observed on a FlexGroup volume with all or some constituents above approximately 90% full.
- Running the ls command against the FlexGroup volume is very slow.
- VMs on which the volume is mounted may hang after login and respond very slowly.
- In some cases, node performance degrades to the point that you cannot connect to the SVM; for example, serving data is affected.
- A FlexGroup volume whose constituents have widely differing space usage can be impacted in a similar way to one with all constituents 80-90% or more full.
- OnCommand Unified Manager (OCUM) reports high latency for other operations.
- Potential operation timeout events in the Event log (see the example search commands after this list):
[xxx: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node. Node UUID: 9xxx, file operation protocol: NFS, client IP address: xx.xx.xx.xx, RPC procedure: 3.
- Insufficient space errors for FlexGroup volumes in the Event log:
[wafl_exempt08: wafl.vol.fsp.full:error]: volume amperexxxxxx@vserver:xxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 0 holes and 27 overwrites.
[wafl_exempt08: wafl.vol.fsp.full:error]: volume xxxxxxx@vserver:xxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 2 holes and 27 overwrites.
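To confirm whether these events have been logged, the Event Management System log can be searched by message name; the commands below are a sketch using the message names from the errors above (output will vary by cluster):
clstr::> event log show -message-name Nblade.dBladeNoResponse.NFS
clstr::> event log show -message-name wafl.vol.fsp.full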
Example:
- All constituent member volumes of the fg1 FlexGroup volume are 96% full:
clstr::> volume show -vserver vs1 -volume-style-extended flexgroup
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs1 fg1 - online RW 500GB 207.5GB 96%
clstr::> volume show -vserver vs1 -volume-style-extended flexgroup-constituent
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs1 fg1__0001 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0002 aggr1 online RW 31.25GB 12.98GB 96%
vs1 fg1__0003 aggr1 online RW 31.25GB 13.00GB 96%
vs1 fg1__0004 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0005 aggr1 online RW 31.25GB 13.00GB 96%
vs1 fg1__0006 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0007 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0008 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0009 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0010 aggr1 online RW 31.25GB 30.01GB 96%
vs1 fg1__0011 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0012 aggr1 online RW 31.25GB 13.01GB 96%
vs1 fg1__0013 aggr3 online RW 31.25GB 12.95GB 96%
vs1 fg1__0014 aggr3 online RW 31.25GB 12.97GB 96%
vs1 fg1__0015 aggr3 online RW 31.25GB 12.88GB 96%
vs1 fg1__0016 aggr1 online RW 31.25GB 13.01GB 96%
16 entries were displayed.
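To check for the uneven space usage between constituents described in the Issue list, limiting the output to the space-related fields makes the comparison easier. A minimal sketch, assuming the same vserver and FlexGroup names as in this example (exact output layout varies by ONTAP version):
clstr::> volume show -vserver vs1 -volume fg1__* -fields size, available, percent-used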
- Run the qos statistics volume latency show command to confirm that the source of the high latency is the Data layer (see QoS commands to monitor volume latency in real time):
clstr::> qos statistics volume latency show -vserver vs1 -volume fg1
Workload ID Latency Network Cluster Data Disk QoS Max QoS Min NVRAM
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
-total- - 206.06ms 3.27ms 3.10ms 199.7ms 0ms 0ms 0ms 0ms
fg1-wid233.. 23350 206.06ms 3.27ms 3.10ms 199.7ms 0ms 0ms 0ms 0ms
-total- - 347.23ms 5.08ms 2.81ms 334.85ms 0ms 0ms 0ms 2.42ms
fg1-wid233.. 23350 347.23ms 5.08ms 2.81ms 334.85ms 0ms 0ms 0ms 2.42ms
-total- - 308.52ms 4.75ms 2.82ms 296.02ms 0ms 0ms 0ms 2.70ms
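By default, qos statistics volume latency show keeps refreshing until interrupted; to capture a bounded sample for comparison, the -iterations parameter can be added (a sketch, assuming the parameter is available in the ONTAP release in use):
clstr::> qos statistics volume latency show -vserver vs1 -volume fg1 -iterations 10
Latency concentrated in the Data column, with Network and Cluster remaining low, indicates that the delay originates in the data (WAFL) layer, consistent with constituents that are nearly full.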