What are the best practices for adding disks to an existing aggregate?
Applies to
- ONTAP 9
- FAS systems
Answer
Warning:
- If high latency caused by disk utilization persists on an aggregate even after performing a disk firmware update and changing the efficiency policy from best-effort to background, more disks must be added to the aggregate.
- For best performance, it is advisable to add a new RAID group of equal size to existing RAID groups.
- If a new RAID group cannot be added, then at least three disks should be added at the same time to an existing RAID group.
- This allows the storage system to write new data across multiple disks.
- A forced reallocate must be done to evenly distribute data across the RAID group(s); otherwise, most new writes will go to the new disks, resulting in a workload imbalance.
- If a reallocate is not done, performance will suffer and the statit output will look like the example below.
- Eventually, WAFL will fix itself, but this can take many months.
```
::> set advanced
::*> node run -node node_1 statit -b
/* wait 60s */
::*> node run -node node_1 statit -e
...
disk       ut%   xfers   ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/aggr_data/plex0/rg0:
0a.10.6    32    84.50    0.16  3.65  5014    40.70 58.65  357     43.63 55.17  217     0.00  ....  .        0.00  ....  .
0a.10.8    32    83.93    0.17  3.55  4777    40.51 58.94  356     43.25 55.71  216     0.00  ....  .        0.00  ....  .
0a.10.10   51   111.80   29.66 10.65  1862    26.92 29.12  772     55.22 14.13  677     0.00  ....  .        0.00  ....  .
0a.10.12   52   112.22   30.35 10.71  1825    26.91 29.93  735     54.96 14.16  689     0.00  ....  .        0.00  ....  .
0a.10.14   53   112.81   30.63 10.34  1956    27.08 29.59  777     55.10 14.31  697     0.00  ....  .        0.00  ....  .
0a.10.16   54   114.66   31.85 10.76  1902    27.46 30.05  783     55.34 14.45  680     0.00  ....  .        0.00  ....  .
0a.10.18   53   114.26   30.45 11.23  1781    27.84 30.42  784     55.97 14.68  675     0.00  ....  .        0.00  ....  .
0a.10.20   52   113.79   29.10  8.11  2510    27.69 30.14  744     56.99 14.33  673     0.00  ....  .        0.00  ....  .
0a.10.24   53   116.80   29.56  8.08  2443    28.82 30.73  754     58.41 14.49  657     0.00  ....  .        0.00  ....  .
0a.10.26   54   117.57   31.09  8.67  2353    28.63 30.12  752     57.85 14.49  661     0.00  ....  .        0.00  ....  .
0a.10.28   55   118.71   30.31  9.07  2323    29.45 30.87  752     58.95 14.71  661     0.00  ....  .        0.00  ....  .
0a.10.30   50   106.95   28.86  8.86  2197    24.60 29.18  704     53.49 14.21  668     0.00  ....  .        0.00  ....  .
0a.10.36   78   154.61   48.59 11.54  2426    45.44 39.71  863     50.57 20.24  479     0.00  ....  .        0.00  ....  .
0a.10.38   75   158.05   61.35  8.91  2969    39.69 29.13  914     47.01 15.24  666     0.00  ....  .        0.00  ....  .
0a.10.40   75   156.63   60.31  9.21  2918    39.65 29.75  903     46.67 15.51  680     0.00  ....  .        0.00  ....  .
0a.10.42   75   158.28   60.53  9.48  2803    40.21 29.83  896     47.54 15.47  666     0.00  ....  .        0.00  ....  .
0a.10.44   76   159.14   67.07  7.15  3959    38.21 39.97  682     43.86 19.47  572     0.00  ....  .        0.00  ....  .
```
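The disk-addition guidance above can be sketched with the standard add-disks command. This is a sketch only: the aggregate name and disk count are placeholders, and the availability of the -raidgroup new option should be verified on your ONTAP 9 release before use.

```
/* Review the current RAID group layout first */
cluster::> storage aggregate show-status -aggregate aggr_data

/* Add a new RAID group sized to match the existing ones */
cluster::> storage aggregate add-disks -aggregate aggr_data -diskcount 17 -raidgroup new
```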
How should a reallocate be done?
This should be planned together with your account team.
FlexVols
- Forced reallocation ignores the optimization thresholds and completely rewrites the data to disk, unlike the normal reallocation process.
- Although this improves the layout, routine use of forced reallocation (
[-force|-f [true]]
) is not a best practice, due to the excessive load it places on the aggregate.
- Also, because all of the data is rewritten, forced reallocation cannot be run against volumes that have existing Snapshot copies unless the physical reallocation method (
[ -space-optimized|-p [true] ]
) is also used:
cluster::> reallocate start -vserver svm0 -path /vol/vol1 -f true -p true
- Run one job at a time; if the observed performance overhead is acceptable, a second job may be added.
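As a sketch of how a running job can be monitored or stopped (the vserver and volume path are placeholders; verify these reallocate subcommands on your ONTAP release):

```
/* Check the status of reallocation jobs */
cluster::> reallocate show -vserver svm0 -path /vol/vol1

/* Stop a job if the performance impact is too high */
cluster::> reallocate stop -vserver svm0 -path /vol/vol1
```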
FlexGroups
- Reallocation can be done at the aggregate level, but it is often costly in disk cycles and can take days or weeks to complete.
- Volume-level reallocation is not supported for FlexGroup volumes, so an aggregate-level reallocate is needed:
cluster::> storage aggregate reallocation start -once true -aggregate <aggr_name>
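Progress of the aggregate-level job can then be checked; as a sketch, assuming the clustershell reallocation show subcommand and a placeholder aggregate name:

```
/* Check the status of an aggregate-level reallocation scan */
cluster::> storage aggregate reallocation show -aggregate aggr_data
```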
Volume Moves
- If free space is available, a volume move will restructure the data evenly across free RAID stripes.
- Ideally, if an aggregate can be vacated entirely and then repopulated, the resulting layout is optimal.
- One consideration with this method is the long-running deswizzling scan that follows each volume move.
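As a sketch of the volume-move approach (the vserver, volume, and destination aggregate names are placeholders):

```
/* Move a volume to a different aggregate to rewrite its layout */
cluster::> volume move start -vserver svm0 -volume vol1 -destination-aggregate aggr_data2

/* Monitor the move until it completes */
cluster::> volume move show -vserver svm0 -volume vol1
```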
Important points to consider
- To determine the quietest times (for example, after 5 PM or on weekends), review the Aggregates and Nodes performance views in Active IQ Unified Manager.
- Reallocation causes additional overhead, so this must be accounted for.
- Each job is estimated to add roughly 10-30% performance overhead, but this is an estimate and the actual impact could easily be higher or lower.
- In the case of high disk utilization, a more measured approach is to reallocate the busiest volume first, then work toward the quietest volume in the aggregate.
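To rank volumes by load before choosing that order, one option is the QoS statistics view; this is a sketch, and the command's availability and parameters should be verified on your ONTAP version:

```
/* Show per-volume IOPS, throughput, and latency to find the busiest volumes */
cluster::> qos statistics volume performance show -iterations 10
```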