NetApp Knowledge Base

Elastic Sizing with FlexGroup Volumes

Applies to

  • ONTAP 9
  • FlexGroup

Answer

What is elastic sizing?

Note: For more detailed information on elastic sizing see TR-4571.

This article describes what elastic sizing is and how it works with FlexGroup volumes, as well as what impact it can have on workloads.

Elastic sizing (introduced in ONTAP 9.6) is a way for ONTAP to better manage space allocation in a FlexGroup volume, with the goal of preventing "out of space" errors in constituent volumes from causing write failures to files. Elastic sizing is different from autogrow/autoshrink in that the goal is not to increase the overall capacity footprint of the volume. Instead, elastic sizing keeps the total volume size intact and will leverage free space across constituent volumes inside the FlexGroup to borrow/trade space to accommodate writes.

Elastic sizing is not intended to eliminate the need to manage capacity in the FlexGroup and should not be considered a replacement for space management. Steps should still be taken to ensure there is adequate free space in the volume and across constituents.

FlexGroup volumes still perform ingest load balancing as usual, so imbalances across the volume should be addressed without the need for manual intervention.

How it works

Elastic sizing will kick in only if a file write has reached a point where ONTAP is about to issue an "out of space" error to the client performing the write. Before that message gets sent, ONTAP will make a last-ditch effort to find some available space in the FlexGroup by borrowing 1% of the total capacity of a constituent volume in the FlexGroup - between 10MB and a default maximum of 10GB.
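
The sizing rule above amounts to a clamp, which can be sketched as follows. This is only an illustration, not ONTAP code: the function name `borrow_amount` is hypothetical, and treating MB/GB as binary units is an assumption.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_BORROW = 10 * MIB   # lower bound from the article (10MB); binary units assumed
MAX_BORROW = 10 * GIB   # default upper bound from the article (10GB)


def borrow_amount(constituent_size: int) -> int:
    """Return 1% of a constituent's size in bytes, clamped to [MIN_BORROW, MAX_BORROW]."""
    one_percent = constituent_size // 100
    return max(MIN_BORROW, min(one_percent, MAX_BORROW))


if __name__ == "__main__":
    for size in (500 * MIB, 100 * GIB, 2048 * GIB):
        amount = borrow_amount(size)
        print(f"{size / GIB:8.2f} GiB constituent -> borrow {amount / MIB:.0f} MiB")
```

Small constituents hit the 10MB floor and very large ones hit the 10GB ceiling; only sizes in between borrow exactly 1%.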

 

These values are adjustable via the node-level "flexgroup set" commands but should not be modified without guidance from engineering.

When a file write is about to fail, ONTAP scans the constituent volumes for available space and will reduce the size of one of those constituent volumes and add that amount of space to the constituent volume that is about to run out of space.
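
A toy model of this step, under stated assumptions, might look like the following. The article does not say how ONTAP chooses the donor or against which constituent the 1% is measured; here the donor is the constituent with the most free space and the amount is 1% of the donor's size, both hypothetical choices. `Constituent` and `borrow_space` are illustrative names, not ONTAP interfaces.

```python
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass
class Constituent:
    name: str
    size: int   # provisioned size in bytes
    used: int   # used space in bytes

    @property
    def free(self) -> int:
        return self.size - self.used


def borrow_space(constituents: list[Constituent], needy: Constituent) -> int:
    """Shrink one donor and grow `needy` by the same amount; return bytes moved."""
    donors = [c for c in constituents if c is not needy and c.free > 0]
    if not donors:
        return 0  # nothing to borrow; the write would get "out of space"
    # ASSUMPTION: the donor is the constituent with the most free space.
    donor = max(donors, key=lambda c: c.free)
    # ASSUMPTION: 1% of the donor's size, clamped to 10 MiB..10 GiB,
    # and never more than the donor actually has free.
    amount = max(10 * MIB, min(donor.size // 100, 10 * GIB))
    amount = min(amount, donor.free)
    donor.size -= amount
    needy.size += amount
    return amount
```

Because the donor shrinks by exactly what the full constituent gains, the sum of the constituent sizes, and therefore the FlexGroup size, stays the same.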

Impact of elastic sizing

This process is not free: performance takes a hit when it happens, because the write pauses while ONTAP finds space. This shows up as latency spikes in the workload. The impact varies with the amount of space needed to complete the write and with the number of times ONTAP has to search for space. For example, because ONTAP takes only 1% of a constituent volume's space at a time, and that amount can be as low as 10MB, a file that needs 1GB to finish a write pauses roughly 100 times before enough free space is available. Conversely, if 1% of the constituent volume is 10GB, the write needs to pause only once.
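
The arithmetic behind those two cases can be sketched as below. It assumes every pause yields the same amount of space, which is a simplification, and `pauses_needed` is a hypothetical helper name. With binary units the first case comes out slightly above 100; with decimal units it is exactly 100.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def pauses_needed(space_needed: int, borrow_per_pause: int) -> int:
    """Number of borrow operations needed to free `space_needed` bytes."""
    if borrow_per_pause <= 0:
        raise ValueError("borrow_per_pause must be positive")
    # Integer ceiling division: a partial final chunk still costs one pause.
    return -(-space_needed // borrow_per_pause)


if __name__ == "__main__":
    print(pauses_needed(1 * GIB, 10 * MIB))   # 10MB borrowed per pause
    print(pauses_needed(1 * GIB, 10 * GIB))   # 10GB borrowed per pause
```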

The following example shows a test where a file was copied to a FlexGroup. In the first test, the FlexGroup constituent wasn't large enough to hold the file, so elastic sizing was used. The 6.7GB file took around 2 minutes to copy:

[root@centos7 /]# time cp Windows.iso /elastic/
real    1m52.950s
user    0m0.028s
sys     1m8.652s

When the FlexGroup constituent volume was large enough, the same copy took 15 seconds less:

[root@centos7 /]# time cp Windows.iso /elastic/
real    1m37.233s
user    0m0.052s
sys     0m54.443s


That shows there can be a real latency impact when elastic sizing takes effect.
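
To put a number on the difference, the two `real` values from the tests above can be compared with a few lines of Python. The helper name `parse_real_time` is hypothetical, and the parser only handles the `NmS.SSSs` form shown in this article.

```python
import re


def parse_real_time(value: str) -> float:
    """Convert a shell `time` value such as '1m52.950s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


with_elastic = parse_real_time("1m52.950s")     # copy that triggered elastic sizing
without_elastic = parse_real_time("1m37.233s")  # copy into a large enough constituent

extra = with_elastic - without_elastic
print(f"extra time: {extra:.1f}s ({extra / without_elastic:.0%} slower)")
```

For this single test, the copy that relied on elastic sizing took about 16% longer.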

The following graphs illustrate the latency hit on the constituent volume:

[Graphs: constituent volume latency]

Constituent volume 0001 has about 0.5 ms more latency while elastic sizing is in effect:

[Graph: latency of constituent volume 0001]

Remediating issues with elastic sizing

If elastic sizing is kicking in on a workload, the volume is no longer sized appropriately and remedial action should be taken, regardless of whether there is a performance impact. Elastic sizing is simply an insurance policy against failed writes/corrupted files and should not be viewed as a panacea.

To prevent elastic sizing from kicking in:

  • Review all constituent volume sizes in the environment for evenness of balance across constituent volumes. If one constituent has much more used space than the others, investigate why (for example, was a large amount of data deleted? Did a file grow very large? Were a lot of files archived into a single tar file?).
  • If possible, delete some data to free up space, as well as snapshots that lock that data in place.
  • Increase the size of the entire FlexGroup volume to increase the available free space.
  • Look into the benefits of FabricPool tiering if using AFF; look into inactive data reporting to give a ballpark estimate of space savings across active file system and snapshots.
  • Ensure all available storage efficiencies are enabled.

The goal is to free up enough space in each constituent volume so that a single file won't cause elastic sizing to borrow space from other constituent volumes. That threshold of free space varies with the average and largest file sizes in the workload, and with whether files in the volume grow over time.

Set alerts/warnings for "volume nearly full" and/or leverage quotas to help monitor free space. TR-4571 covers how to use space monitoring with FlexGroup volumes.

Generally, it's best to send a notification by the time a constituent volume reaches 80% used, to allow adequate time to plan and rectify capacity issues.
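
As a rough illustration of the 80% guideline, the check below flags constituents at or above a percent-used threshold. It is a sketch only: the names `percent_used` and `nearly_full` are hypothetical, and the figures are copied from the sample `vol show` output in the Autogrow section of this article rather than read from a live system.

```python
# (name, size in GB, used in GB), from the sample `vol show` output in this article
CONSTITUENTS = [
    ("elastic__0001", 2.50, 68.34 / 1024),
    ("elastic__0002", 2.50, 67.44 / 1024),
    ("elastic__0003", 8.23, 6.58),
    ("elastic__0004", 2.50, 69.25 / 1024),
]


def percent_used(size_gb: float, used_gb: float) -> float:
    """Used space as a percentage of the constituent's size."""
    return 100.0 * used_gb / size_gb


def nearly_full(constituents, threshold_pct: float = 80.0) -> list[str]:
    """Return names of constituents at or above the percent-used threshold."""
    return [
        name
        for name, size_gb, used_gb in constituents
        if percent_used(size_gb, used_gb) >= threshold_pct
    ]


if __name__ == "__main__":
    for name, size_gb, used_gb in CONSTITUENTS:
        print(f"{name}: {percent_used(size_gb, used_gb):.1f}% used")
    print("at or above 80%:", nearly_full(CONSTITUENTS))
```

With these figures, elastic__0003 sits just under 80% used, so it is reported only if the threshold is lowered, for example to 75.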

Other considerations

Normalization

After the FlexGroup capacity has been adjusted (that is, files have been deleted or capacity has been added to the FlexGroup), the constituent volumes normalize and return to their regular capacity levels on their own.

Autogrow

Volume autogrow can be enabled on a FlexGroup volume, but it and elastic sizing operate independently and are not applied together. If volume autosize is enabled on the volume, elastic sizing is no longer in use. In the following example, autosize grew constituent elastic__0003:

::*> vol show -vserver DEMO -volume elastic* -fields size,used,autosize-mode   
vserver volume             size     used     autosize-mode
------- -------------     -------   -------  -------------
DEMO    elastic           15.73GB    6.78GB  grow
DEMO    elastic__0001      2.50GB   68.34MB  grow
DEMO    elastic__0002      2.50GB   67.44MB  grow
DEMO    elastic__0003      8.23GB    6.58GB  grow
DEMO    elastic__0004      2.50GB   69.25MB  grow
5 entries were displayed.

::*> event log show -message-name *auto*
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
11/12/2019 16:55:59 ontap9-8040-01   NOTICE        wafl.vol.autoSize.done: Volume autosize: Automatic grow of volume 'elastic__0003@vserver:7e3cc08e-d9b3-11e6-85e2-00a0986b1210' by 118MB is complete.

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.