NetApp Knowledgebase

Elastic Sizing with FlexGroup Volumes

Applies to

  • ONTAP 9
  • FlexGroup

Answer

What is elastic sizing?

This article describes what elastic sizing is and how it works with FlexGroup volumes, as well as what impact it can have on workloads.

Elastic sizing (introduced in ONTAP 9.6) is a way for ONTAP to better manage space allocation in a FlexGroup volume, with the goal of preventing "out of space" errors in constituent volumes from causing write failures to files. Elastic sizing is different from autogrow/autoshrink in that the goal is not to increase the overall capacity footprint of the volume. Instead, elastic sizing keeps the total volume size intact and will leverage free space across constituent volumes inside the FlexGroup to borrow/trade space to accommodate writes.

Elastic sizing is not intended to eliminate the need to manage capacity in the FlexGroup. Steps should still be taken to ensure there is adequate free space in the volume and across constituents; elastic sizing should not be considered a replacement for space management.

FlexGroup volumes will still perform ingest load balancing as usual, so imbalances across the volume are addressed automatically, without the need for manual intervention.

For more detailed information on what elastic sizing is, or to provide a public reference to customers curious about this functionality, see TR-4571.
 

Enabling/Disabling

Elastic sizing is enabled by default and there is no cluster shell command available to disable it. However, it can be disabled in the node-level flexgroup options using "node run nodename flexgroup set elastic-sizing=false".

This is not recommended unless engineering has given guidance to do so.

How it works

Elastic sizing will kick in only if a file write has reached a point where ONTAP is about to issue an "out of space" error to the client performing the write. Before that message gets sent, ONTAP will make a last-ditch effort to find some available space in the FlexGroup by borrowing 1% of the total capacity of a constituent volume in the FlexGroup - between 10MB and a default maximum of 10GB.

 

1098127_1.png

These values are adjustable via the node-level "flexgroup set" commands but should not be modified without guidance from engineering.

When a file write is about to fail, ONTAP scans the constituent volumes for available space and will reduce the size of one of those constituent volumes and add that amount of space to the constituent volume that is about to run out of space.
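The borrowing rule described above amounts to a simple clamp: 1% of a constituent's total size, bounded by the 10MB floor and the default 10GB ceiling. The following Python sketch is purely illustrative (the function and constant names are made up, not ONTAP internals):

```python
# Illustrative sketch of the elastic sizing borrow-increment rule:
# borrow 1% of a constituent volume's total capacity per request,
# clamped between a 10MB floor and a (default) 10GB ceiling.
# Names and units here are hypothetical, not actual ONTAP internals.

MB = 1024 * 1024
GB = 1024 * MB

MIN_BORROW = 10 * MB   # lower bound per borrow request
MAX_BORROW = 10 * GB   # default upper bound per borrow request

def borrow_increment(constituent_size_bytes: int) -> int:
    """Space borrowed from one constituent per elastic sizing request."""
    one_percent = constituent_size_bytes // 100
    return max(MIN_BORROW, min(one_percent, MAX_BORROW))

# A 2.5GB constituent yields ~25MB per request; a 500MB constituent
# hits the 10MB floor; a very large constituent hits the 10GB ceiling.
```

This clamp is why the per-pause cost varies so much between small and large constituent volumes, as described in the next section.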

Impact of elastic sizing

This process is not free: performance takes a hit when it happens, because the write pauses while ONTAP finds space. This shows up as latency spikes in the workload. The impact varies with the amount of space needed to complete the write, as well as the number of times ONTAP has to search for space. For example, since ONTAP takes only 1% of the space from a constituent volume at a time, and that amount can be as low as 10MB, a file that needs 1GB to finish a write will need to pause ~100 times to borrow enough free space. Conversely, if 1% of the constituent volume is 10GB, the write would only need to pause once to complete.

The following example shows a test where a file was copied to a FlexGroup. In the first test, the FlexGroup constituent wasn't large enough to hold the file, so elastic sizing was used. The 6.7GB file took around 2 minutes to copy:

 [root@centos7 /]# time cp Windows.iso /elastic/
real    1m52.950s
user    0m0.028s
sys     1m8.652s

When the FlexGroup constituent volume was large enough, the same copy took 15 seconds less:

 [root@centos7 /]# time cp Windows.iso /elastic/
real    1m37.233s
user    0m0.052s
sys     0m54.443s


That shows there can be a real latency impact when elastic sizing is invoked.

The following graphs illustrate the latency hit on the constituent volume:

1098127_2.png

1098127_3.png

The constituent volume 0001 has about 0.5ms more latency when elastic sizing is active:
 

1098127_4.png

Elastic sizing logs

There are no EMS events that trigger for elastic sizing specifically. Instead, we'd need to rely on regular space/volume nearly full EMS messages and df/volume size output to clue us in to the issue being related to available free space.

Additionally, the performance hit from elastic sizing would be isolated to write latency, rather than read latency.

To verify whether elastic sizing is contributing to performance problems, a perfstat or "wafl_susp -w" capture can confirm it. The WAFL messages to look for are:

FLEXGROUP_ELASTIC_RESIZE

SLOWPATH_FLEXGROUP_ELASTIC_RESIZE

For example:

  Suspend Because:                      Count  AvgSusp MaxSusp
FLEXGROUP_ELASTIC_RESIZE
  WAFL_SPINNP_WRITE

In the above example, the FlexGroup was 10GB in size, with 4 constituent volumes of 2.5GB each (not recommended, but used to illustrate the point):

 cluster::*> vol show -vserver DEMO -volume elastic* -fields size,used
vserver volume        size    used
------- ------------- ------- -------
DEMO    elastic       10.01GB 263.5MB
DEMO    elastic__0001 1GB     68.20MB
DEMO    elastic__0002 6.98GB  67.38MB
DEMO    elastic__0003 1.02GB  58.50MB
DEMO    elastic__0004 1.00GB  69.39MB

Since the constituents were so small, elastic sizing would kick in for an increment of only 25MB per request (1% of 2.5GB is 25MB).

Since the file is 6.7GB and the constituent volume is 2.5GB, ONTAP would need to borrow ~4.2GB from other constituent volumes. 4.2GB is around 4300MB, which means ONTAP will borrow space 4300/25 times, or roughly 172 times.
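That arithmetic can be double-checked with a quick back-of-the-envelope calculation, using the rounded numbers from this example (illustrative only, not ONTAP internals):

```python
# Back-of-the-envelope check of the borrow count in this example:
# a ~6.7GB file landing on a 2.5GB constituent leaves a ~4.2GB
# shortfall, borrowed in 25MB increments (1% of the 2.5GB constituent).

shortfall_mb = 4300   # ~4.2GB that must be borrowed from other constituents
increment_mb = 25     # 1% of the 2.5GB constituent size, per borrow request

requests = shortfall_mb // increment_mb
print(requests)  # 172 borrow requests
```

Each of those ~172 borrow requests is a pause in the write, which is where the latency impact comes from.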

We can see evidence of that in the wafl_susp output:

   WAFL Message Slow-path Stats                 Count
SLOWPATH_FLEXGROUP_ELASTIC_RESIZE
      WAFL_REMOTE_RETRIEVE                      208
      WAFL_REMOTE_WRITEBACK                     104
      WAFL_FLEXGROUP_RESIZE_MEMBER              104

Remediating issues with elastic sizing

If elastic sizing is kicking in on a workload, the volume is no longer sized appropriately and remediation actions should be taken, regardless of whether there is a performance impact. Elastic sizing is simply an insurance policy against failed writes/corrupted files and should not be viewed as a panacea.

To prevent elastic sizing from kicking in:

  • Review all constituent volume sizes in the environment for evenness of balance across constituent volumes. If one constituent has much more used space than others, then further investigation into why this is the case should be done (for example, was a bunch of data deleted? Did a file grow very large? Were a lot of files zipped into a single tar file?).
  • If possible, delete some data to free up space, as well as snapshots that lock that data in place.
  • Increase the size of the entire FlexGroup volume to increase the available free space.
  • Look into the benefits of FabricPool tiering if using AFF; look into inactive data reporting to give a ballpark estimate of space savings across active file system and snapshots.
  • Ensure all available storage efficiencies are enabled.

The goal is to free up enough space in each constituent volume so that a single file won't cause elastic sizing to borrow space from other constituent volumes. That threshold of free space will vary depending on average file size and largest file size in a workload, as well as if files in the volume grow over time.

We recommend setting alerts/warnings for "volume nearly full" and/or leveraging quotas to help monitor free space. TR-4571 covers how to use space monitoring with FlexGroup volumes.

Generally, it's best to alert by the time a constituent volume reaches 80% used, to allow adequate time to plan and rectify capacity issues.
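As a rough illustration of that 80% guideline, a small script could flag constituents over the threshold from "vol show -fields size,used" style data. The helper below is hypothetical, and the volume sizes are made-up sample values:

```python
# Illustrative helper for the monitoring recommendation above: flag any
# constituent whose used space exceeds an 80% threshold. The (name,
# size_mb, used_mb) tuples mimic `vol show -fields size,used` output;
# the values are made-up sample data, not real ONTAP output.

def over_threshold(constituents, threshold=0.80):
    """Return names of constituents whose used/size ratio exceeds threshold."""
    return [name for name, size_mb, used_mb in constituents
            if used_mb / size_mb > threshold]

volumes = [
    ("elastic__0001", 2560, 2200),  # ~86% used -> should alert
    ("elastic__0002", 2560, 1024),  #  40% used
    ("elastic__0003", 2560, 2100),  # ~82% used -> should alert
    ("elastic__0004", 2560, 512),   #  20% used
]

print(over_threshold(volumes))  # ['elastic__0001', 'elastic__0003']
```

In practice this kind of check would be driven by EMS "volume nearly full" events or scheduled capacity reports rather than a standalone script.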

Other considerations

Normalization

After the FlexGroup capacity has been adjusted (that is, files have been deleted or size has been added to the FlexGroup), the constituent volume capacities will normalize and return to their regular levels on their own.

Autogrow

Volume autogrow can be used on a FlexGroup, but it does not operate in conjunction with elastic sizing: if volume autosize is enabled on the volume, elastic sizing will no longer be in use.

 cluster::*> vol show -vserver DEMO -volume elastic* -fields size,used,autosize-mode
vserver volume        size    used    autosize-mode
------- ------------- ------- ------- -------------
DEMO    elastic       15.73GB 6.78GB  grow
DEMO    elastic__0001 2.50GB  68.34MB grow
DEMO    elastic__0002 2.50GB  67.44MB grow
DEMO    elastic__0003 8.23GB  6.58GB  grow
DEMO    elastic__0004 2.50GB  69.25MB grow
5 entries were displayed.

cluster::*> event log show -message-name *auto*
Time                Node               Severity Event
------------------- ------------------ -------- ---------------------------
11/12/2019 16:55:59 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 118MB is complete.
11/12/2019 16:55:47 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 935MB is complete.
11/12/2019 16:55:40 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 169MB is complete.
11/12/2019 16:55:39 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 1.11GB is complete.
11/12/2019 16:55:31 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 606MB is complete.
11/12/2019 16:55:26 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 833MB is complete.
11/12/2019 16:55:20 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 720MB is complete.
11/12/2019 16:55:14 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 102MB is complete.
11/12/2019 16:55:13 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 602MB is complete.
11/12/2019 16:55:09 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 156MB is complete.
11/12/2019 16:55:08 ontap9-tme-8040-01 NOTICE
    wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 482MB is complete.
11/12/2019 12:47:28 ontap9-tme-8040-01

Note: When autosize is enabled, elastic sizing will be disabled. See burt 1243448 for more details.
