How does ONTAP select spares for aggregate creation, aggregate addition and failed disk replacement?

Applies to

ONTAP 9

Answer

► Disk attributes used during spare selection

Data ONTAP uses the following disk attributes during spare selection for the creation of a new aggregate, addition of disks to an existing aggregate, and replacement of failed disks in an aggregate:

Disk type
RPM
Checksum type
Disk size
Pool
Pre-zeroed status
Position of disk in the storage system

Disk type

Data ONTAP associates a disk type with every disk in the system, based on the disk technology and connectivity type. The disk types used by Data ONTAP are:

BSAS - High capacity bridged SATA disks with additional hardware to enable them to be plugged into a SAS shelf
SAS - Serial Attached SCSI disks in matching shelves
FSAS - High-capacity (Fat) Serial Attached SCSI disks
SATA - Serial ATA disks in SAS shelves
MSATA - SATA disks in DS4486 multi-carrier disk shelves
SSD - Solid State disks
ATA - ATA disks with either IDE or serial ATA interface in shelves connected in FC-AL (Fibre Channel Arbitrated Loop)
FCAL - FC disks in shelves connected in FC-AL
LUN - A logical storage device backed by third-party storage and used by Data ONTAP as a disk

Disk type mixing options

Data ONTAP provides a configuration option, raid.disktype.enable, that determines whether or not mixing certain disk types in the same aggregate is allowed. If this option is set to true, separation of disks by disk type is strictly enforced, and only disks of a single disk type are allowed to be part of an aggregate. If the option is set to false, Data ONTAP forms the following groups of disks, and considers all disks in a group equal during spare selection:

Group disk type SAS - This group includes high performance, enterprise class disk types - FCAL and SAS.
Group disk type SATA - This group includes high capacity, near-line disk types - BSAS, FSAS, SATA and ATA. The MSATA disk type is not included in this group, and cannot be mixed together with any other disk type.

With the raid.disktype.enable option set to false, specifying a disk type with the '-T' option will result in the equivalent group disk type being used for spare selection, and the final set of selected spare disks may include disks from all the disk types included in the group disk type. For example, specifying '-T BSAS' in the aggregate creation or addition command will result in the group disk type SATA being used, and all BSAS, SATA and ATA disks will be considered equally during spare selection. The final set of selected spares may have a mix of BSAS, SATA and ATA disks, all of which will be added into the same aggregate. Thus, with the raid.disktype.enable option set to false, it is not possible to enforce the selection of disks of strictly one disk type, if the desired disk type is part of either of the two groups listed above. The only way to enforce selection of disks of a single disk type is to set raid.disktype.enable to true. The default value of the option is false.
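The grouping rule described above can be sketched in Python. This is an illustrative model only, not ONTAP code: the function name `candidate_disk_types` and the boolean parameter `disktype_enable` (standing in for the value of raid.disktype.enable) are hypothetical.

```python
# Group memberships as listed in this article.
SAS_GROUP = {"FCAL", "SAS"}
SATA_GROUP = {"BSAS", "FSAS", "SATA", "ATA"}


def candidate_disk_types(requested, disktype_enable):
    """Return the disk types treated as equal to `requested` during spare selection."""
    if disktype_enable:
        # Strict separation: only the requested type qualifies.
        return {requested}
    for group in (SAS_GROUP, SATA_GROUP):
        if requested in group:
            return set(group)
    # Types such as MSATA, SSD and LUN are in neither group.
    return {requested}


print(sorted(candidate_disk_types("BSAS", disktype_enable=False)))
print(sorted(candidate_disk_types("BSAS", disktype_enable=True)))
```

With the option modeled as false, a request for BSAS returns the whole SATA group, which is why a single disk type cannot be enforced in that configuration.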

If the raid.disktype.enable option is changed from false to true on a system that has existing aggregates with a mix of disk types, those aggregates will continue to accept new disks belonging to all the disk types already present in the aggregate. However, Data ONTAP will not allow new aggregates to be created with a mix of disk types, for as long as the raid.disktype.enable option is set to true.

RPM mixing options

The following two configuration options determine whether or not the mixing of disks with different RPMs in a single aggregate is allowed:

The option raid.rpm.ata.enable controls the mixing of ATA disks (disks of type ATA, SATA, BSAS and MSATA) of different RPMs in the same aggregate. If the option is set to true, ATA disks with different RPM values are considered different, and Data ONTAP only selects disks with the same RPM value to be part of an aggregate. If the option is set to false, ATA disks with different RPMs are considered equal and Data ONTAP may select disks with different RPMs to be part of the same aggregate.
The option raid.rpm.fcal.enable controls the mixing of SAS and FCAL disks with different RPMs in the same aggregate. If the option is set to true, FCAL and SAS disks with different RPMs are considered different, and Data ONTAP only selects disks with the same RPM value to be part of an aggregate. If the option is set to false, FCAL and SAS disks with different RPMs are considered equal and Data ONTAP may select disks with different RPMs to be part of the same aggregate.

The default value of raid.rpm.fcal.enable is true, which means that mixing of FCAL and SAS disks of different speeds in the same aggregate is not allowed by default. This is because 15K RPM drives are more expensive than 10K RPM drives, and using 15K RPM drives exclusively in an aggregate guarantees better performance. The default value of raid.rpm.ata.enable, however, is false, which means that mixing of ATA disks of different speeds in the same aggregate is allowed by default. This allows systems that have aggregates with 5.4K RPM ATA disks nearing end-of-life (EOL) to transition easily to 7.2K RPM disks.

As in the case of the disk type mixing options, there is no way to ensure the selection of disks with a certain RPM value during aggregate creation or disk addition if the above two options are set to false. If the system has a mix of disks with different RPMs, a desired RPM value specified with the '-R' option during aggregate creation may be ignored, if the corresponding configuration option is set to false. For example, if the user specifies '-T ATA -R 5400' in the aggregate creation command, to ensure the selection of 5.4K RPM ATA disks on a system with 5.4K RPM and 7.2K RPM ATA disks, Data ONTAP could end up selecting the 7.2K RPM ATA disks instead, if the option raid.rpm.ata.enable is set to false. This is because the two sets of disks are considered equivalent with respect to RPM, and the final selection is made based on one of the other disk attributes like disk size, checksum type, etc., which could result in the 7.2K RPM disks being given preference. To enforce the selection of disks of a specific RPM value, the configuration option for that disk type must be set to true.

Starting with Data ONTAP 8.2, the raid.rpm.ata.enable and raid.rpm.fcal.enable options are deprecated, and have been replaced by two new options that serve the same purpose, but are named differently to better indicate their functionality:

raid.mix.hdd.rpm.capacity – This option replaces raid.rpm.ata.enable and controls the mixing of capacity-based hard disk types (BSAS, FSAS, SATA, ATA and MSATA). The default value is true, which means that mixing is allowed.
raid.mix.hdd.rpm.performance – This option replaces raid.rpm.fcal.enable and controls the mixing of performance-based hard disk types (FCAL and SAS). The default value is false, which means that mixing is not allowed.

Note that the meaning of the values true and false for the two new options is exactly the opposite of their meaning for the old options. For the new options, a value of true means that disks with different RPMs are allowed to be part of the same aggregate, while a value of false means that they are not. For raid.rpm.ata.enable and raid.rpm.fcal.enable, a value of true means that disks are strictly separated by RPM and mixing of RPMs in the same aggregate is not allowed, while a value of false means that mixing is allowed.
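Because the old and new options use inverted values, a small translation helper can make the relationship explicit. The sketch below is only an illustration; the helper names `translate_option` and `mixing_allowed` are hypothetical, while the option names and defaults are those given in this article.

```python
# Old option name -> replacement option name (Data ONTAP 8.2 and later).
OLD_TO_NEW = {
    "raid.rpm.ata.enable": "raid.mix.hdd.rpm.capacity",
    "raid.rpm.fcal.enable": "raid.mix.hdd.rpm.performance",
}


def translate_option(old_name, old_value):
    """Map an old option setting to the equivalent new option setting."""
    # The value is inverted: old 'true' (strict separation) == new 'false'.
    return OLD_TO_NEW[old_name], not old_value


def mixing_allowed(option_name, value):
    """Return True if the given setting permits mixing RPMs in one aggregate."""
    if option_name in OLD_TO_NEW:
        return not value
    return value


# The old defaults translate to the new defaults.
print(translate_option("raid.rpm.ata.enable", False))
print(translate_option("raid.rpm.fcal.enable", True))
```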

The rest of this article uses the term 'RPM mixing options' to refer to the configuration options described above that determine whether or not the mixing of disks with different RPMs in the same aggregate is allowed. In Data ONTAP 8.1 and earlier releases, this term refers to the options raid.rpm.ata.enable and raid.rpm.fcal.enable. In Data ONTAP 8.2 and later releases, this term refers to the options raid.mix.hdd.rpm.capacity and raid.mix.hdd.rpm.performance.

Checksum

The checksum type of a disk is another attribute used by Data ONTAP during spare selection. Data ONTAP supports the following checksum types:

Block checksum (BCS): This checksum scheme uses 64 bytes to store checksum information for every 4096 bytes (4KB) of data. This scheme can be used on disks formatted with 520 bytes per sector ('bps') or 512 bytes per sector. On 520 bps disks, sets of 8 sectors are used to store 4KB of data and 64 bytes of checksum information. This scheme makes the best use of the available disk capacity. On disks formatted with 512 bps, Data ONTAP uses a scheme called 8/9 formatting to implement BCS. The scheme uses sets of 9 sectors - 8 512-byte sectors to store 4KB of data, with the 9th sector used to store 64 bytes of checksum information for the preceding 8 sectors. This scheme leaves about 10% of the available disk capacity unused, because only 64 bytes of every 9th sector is used for storing the checksum, with the remaining 448 bytes not used. Block checksums can also be used on disks formatted with 4160 bytes per sector.
Zone checksum (ZCS): In this checksum scheme, 63 blocks of 4KB each are followed by a single 4KB block of checksum information for the preceding 63 blocks. This scheme makes good use of the available disk capacity, but has a performance penalty because data and checksums are not co-located and an extra seek may be required to read the checksum information. Because of this performance penalty, the ZCS scheme is not widely used on disks any longer. However, it is still used on some older systems, and with LUNs.
Advanced Zone checksum (AZCS): This checksum scheme was introduced in Data ONTAP 8.1.1, specifically for disks requiring optimal storage efficiency and for disks formatted with 4 Kilobytes per sector. A new scheme is required for 4K bps disks because a scheme similar to the 8/9 BCS scheme on these disks would result in wastage of almost 50% of the disk capacity, and the performance penalty of the ZCS scheme would be too high. In the AZCS scheme, a disk is divided into zones with 64 4KB blocks in each zone. The middle block in each zone is designated the checksum block, and stores checksum information for all the other blocks in the zone. Placing the checksum block in the middle of a zone reduces the average seek distance between a data block and the checksum block, and results in better performance when compared to the ZCS scheme. The AZCS scheme can also be used on disks formatted with 512 bytes per sector.
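The capacity figures quoted for these schemes can be checked with simple arithmetic. The Python sketch below is a worked calculation, not ONTAP code. The function names are hypothetical, and the 4K-sector case assumes that an 8/9-style layout would need one full 4096-byte sector to hold the 64-byte checksum of one data sector.

```python
DATA_BYTES = 4096      # one 4KB block of data
CHECKSUM_BYTES = 64    # checksum stored per 4KB block (BCS)


def bcs_520_unused_bytes():
    """Unused bytes per 8 sectors on a 520 bps disk."""
    return 8 * 520 - DATA_BYTES - CHECKSUM_BYTES


def unused_fraction(sector_bytes, data_sectors):
    """Fraction of raw capacity left unused when one extra sector holds the checksum."""
    raw = (data_sectors + 1) * sector_bytes
    used = data_sectors * sector_bytes + CHECKSUM_BYTES
    return (raw - used) / raw


def zone_checksum_fraction(blocks_per_zone=64):
    """Fraction of blocks used for checksums in ZCS/AZCS (1 block out of 64)."""
    return 1 / blocks_per_zone


print(bcs_520_unused_bytes())                 # 520 bps: nothing wasted
print(round(unused_fraction(512, 8), 3))      # 8/9 formatting on 512 bps
print(round(unused_fraction(4096, 1), 3))     # hypothetical 8/9-style layout on 4K bps
print(zone_checksum_fraction())
```

The 512 bps result is roughly 10%, and the hypothetical 4K layout wastes a little under 50%, in line with the figures stated above.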

The following list shows the current checksum types supported by the various Data ONTAP disk types. Note that this list is subject to change. To get up-to-date information for a specific Data ONTAP release, check the product documentation on the Support site.

SAS, FCAL - BCS
ATA, SATA, BSAS, FSAS - BCS
MSATA - AZCS
SSD - BCS

Disks of type LUN can be used in BCS, ZCS and AZCS aggregates.

The 'disk assign -c' command in 7-Mode and 'storage disk assign -checksum' command in C-Mode can be used to assign a specified checksum type to a disk or LUN. The command accepts two checksum values - 'block' and 'zoned'. Disks and LUNs that are assigned the 'block' checksum type can be added to BCS aggregates, and those that are assigned the 'zoned' checksum type can be added to AZCS aggregates as well as older ZCS aggregates.

Mixed checksum aggregates

Each aggregate in a Data ONTAP system is assigned a checksum type, based on the checksum type of the disks in the aggregate. Aggregates with BCS checksum disks have a checksum type of 'block', aggregates with AZCS checksum disks have a checksum type of 'azcs', and aggregates with zoned checksum LUNs have a checksum type of 'zoned'. Data ONTAP also allows aggregates with checksum type 'mixed' - these aggregates have both AZCS and BCS checksum disks, but in separate RAID groups. Such aggregates are called 'mixed checksum aggregates'. A mixed checksum aggregate is created when BCS disks are added to an AZCS aggregate, or when AZCS disks are added to a block checksum aggregate. A new RAID group is formed with the newly added disks, and the aggregate's checksum type is set to 'mixed'.
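As an illustration of how an aggregate's checksum type follows from its RAID groups, the following Python sketch derives the type from a list of per-RAID-group checksum types. The function name `aggregate_checksum_type` and the string labels used for RAID groups are assumptions made for this example.

```python
def aggregate_checksum_type(raid_group_checksums):
    """Derive the aggregate checksum type from its RAID groups' checksum types.

    `raid_group_checksums` holds one of 'block', 'azcs' or 'zoned' per RAID group.
    """
    kinds = set(raid_group_checksums)
    if len(kinds) == 1:
        # All RAID groups agree: the aggregate takes that type.
        return kinds.pop()
    if kinds == {"block", "azcs"}:
        # BCS and AZCS disks in separate RAID groups.
        return "mixed"
    raise ValueError(f"combination not described in this article: {sorted(kinds)}")


print(aggregate_checksum_type(["block", "block"]))
print(aggregate_checksum_type(["azcs", "block"]))
```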

Disk size

Data ONTAP also uses disk size as a spare selection criterion. The user can specify a desired disk size value in the aggregate creation or disk addition command (using the '@size' option). In the case of failed disk replacement, the desired size value is the size of the failed disk needing replacement.

Given a desired value of disk size, Data ONTAP uses a spread factor of 20% to identify suitable spare disks. For every spare disk being considered, Data ONTAP computes two sizes - a 'minimum' size, which is 80% of the spare disk's size, and a 'maximum' size, which is 120% of the spare disk's size. It then checks to see if the desired size value falls in the range defined by the spare disk's minimum and maximum sizes. If it does, the spare disk is considered suitable for selection, with respect to disk size.
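The 20% spread check can be expressed as a short filter. The Python sketch below is illustrative only; the function name `suitable_spares` is hypothetical, and treating the range boundaries as inclusive is an assumption.

```python
def suitable_spares(desired_mb, spare_sizes_mb):
    """Return the spare sizes whose 80%-120% window contains the desired size."""
    result = []
    for size in spare_sizes_mb:
        minimum = size * 80 / 100    # 80% of the spare disk's size
        maximum = size * 120 / 100   # 120% of the spare disk's size
        if minimum <= desired_mb <= maximum:   # inclusive bounds: an assumption
            result.append(size)
    return result


# Desired size 1000 MB against spares of various sizes.
print(suitable_spares(1000, [700, 850, 1000, 1200, 1300]))
```

Note that the window is computed from each spare disk's size, not from the desired size, so a 1200 MB spare qualifies for a 1000 MB request (its window starts at 960 MB) while a 1300 MB spare does not.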

The disk size value used by Data ONTAP for all these calculations is the right-sized value of a disk's physical capacity, also referred to as the disk's 'usable capacity'. Right-sizing is a process that Data ONTAP uses to standardize the number of usable sectors on a disk, so that disks of similar sizes from different manufacturers can be used interchangeably in a Data ONTAP system. Right-sizing also takes into account the amount of space on the disk needed by Data ONTAP for its own use. The usable capacity of a disk is smaller than the physical capacity, and can be viewed using the 'sysconfig -r' command on 7-Mode (column 'Used MB/blks') and the 'storage disk show -fields usable-size' command in C-Mode. The Storage Management Guide contains a table listing the physical capacities and usable capacities for the different disks supported by Data ONTAP.

Another point to be noted is that Data ONTAP calculates and reports disk size values using binary prefixes, while disk manufacturers report disk sizes using SI prefixes. Because of the use of different units, the disk sizes reported by Data ONTAP are smaller than the disk sizes advertised by the manufacturers.

The size policy followed by Data ONTAP, in combination with the right-sizing of disks and the difference in disk size reporting units, could result in unexpected spare selection behavior. For example, on a system with 2 TB SATA disks, specifying a desired size value of 2 TB in the aggregate creation or addition command does not result in the selection of the 2 TB disks present in the system. This is because 2 TB disks actually have a usable capacity of 1.62 TB, after right-sizing and using binary prefixes to calculate disk size. Using the Data ONTAP size selection policy, the 20% spread calculated on a spare disk of size 1.62 TB gives a range of {1.29 TB, 1.94 TB}, which does not include the specified disk size of 2 TB. Thus, Data ONTAP does not select any of the 2 TB spare disks, even though the system has 2 TB disks and the user has specifically asked for them. The same behavior is seen with disks of size 1 TB and 3 TB.

To ensure that Data ONTAP picks a specific spare disk given an input size, the user should specify a size value such that the 80%-120% calculation performed on the desired spare disk's usable capacity results in a range that includes the specified size value. For example, to ensure the selection of 2 TB disks present in a system, the user should check the usable capacity of a 2 TB disk using the command 'sysconfig -r' and then specify a size value that lies in the 80%-120% range of that value.

Usable capacity of a 2 TB disk, from 'sysconfig -r':

Used (MB/blks)
--------------
1695466/3472314368

So, any size value in the range {80% of 1695466 MB, 120% of 1695466 MB} will result in the selection of the 2 TB spare disks. For example: '@1695466M' or '@1695G' or '@1700G'.
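The example values can be verified with a short calculation. The Python sketch below is not an ONTAP parser. The helper names are hypothetical, and the unit table assumes that the M, G and T suffixes are binary multiples of a megabyte, consistent with the statement that Data ONTAP uses binary prefixes.

```python
# Assumed binary multipliers, expressed in MB.
UNIT_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def size_arg_to_mb(arg):
    """Convert a size argument such as '@1695G' to megabytes."""
    text = arg.lstrip("@")
    return int(text[:-1]) * UNIT_MB[text[-1].upper()]


def selects_disk(arg, usable_mb):
    """Return True if the size argument falls in the disk's 80%-120% window."""
    return usable_mb * 80 / 100 <= size_arg_to_mb(arg) <= usable_mb * 120 / 100


USABLE_2TB_MB = 1695466  # usable capacity of a 2 TB disk, from 'sysconfig -r'

for arg in ("@1695466M", "@1695G", "@1700G", "@2T"):
    print(arg, selects_disk(arg, USABLE_2TB_MB))
```

Under these assumptions the first three arguments fall inside the window, while '@2T' (2097152 MB) lies above the upper bound of about 2034559 MB.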

Pool

A pool is an abstraction used by Data ONTAP to segregate disks into groups, according to user specified assignments. All spare disks in a Data ONTAP system are assigned to one of two spare pools - Pool0 or Pool1. The general guidelines for assigning disks to pools are:

Disks in the same shelf or storage array should be assigned to the same pool
There should be an equal or close to equal number of disks assigned to each pool

By default, all spare disks are assigned to Pool0 when a Data ONTAP system is started up. If the system is not configured to use SyncMirror, having all disks in a single pool is sufficient for the creation of aggregates. If SyncMirror is enabled on the system, Data ONTAP requires the segregation of disks into two pools for the creation of SyncMirror aggregates. A SyncMirror aggregate contains two copies of the same WAFL filesystem, which are kept in sync with each other. Each copy is called a 'plex'. In order to provide the best protection against data loss, the disks comprising one plex of a SyncMirror aggregate need to be physically separated from the disks comprising the other plex. During the creation of a SyncMirror aggregate, Data ONTAP selects an equal number of spare disks from each pool, and creates one plex of the aggregate with the disks selected from Pool0, and the other plex with the disks selected from Pool1. If the assignment of disks to pools has been done according to the guidelines listed above, this method of selecting disks ensures that the loss of a single disk shelf or storage array affects only one plex of the aggregate, and that normal data access can continue from the other plex while the affected plex is being restored.

The command 'disk assign -p <pool_number>' can be used to assign disks to a pool, in both 7-Mode and C-Mode. If SyncMirror is enabled on the system, a system administrator will have to assign disks to Pool1 using this command, before any SyncMirror aggregates can be created.

Pre-zeroed status

Data ONTAP requires all spare disks that were previously part of an aggregate to be zeroed before they can be added to a new aggregate. Disk zeroing ensures that the creation of a new aggregate does not require a parity computation, and that addition of disks to an existing aggregate does not require a re-computation of parity across all RAID groups to which the new disks have been added. Non-zeroed spare disks that are selected for aggregate creation or addition have to be zeroed first, lengthening the overall duration of the aggregate creation or addition process. Replacement of a failed disk does not require completely zeroed spares, since reconstruction of data on the replacement disk overwrites the existing data on some of the disk blocks. The blocks that are not overwritten during reconstruction, however, have to be zeroed before the disk can be used by the aggregate.

Data ONTAP gives preference to pre-zeroed disks during spare selection for aggregate creation and addition, as well as failed disk replacement. However, despite the benefits of having pre-zeroed spare disks available in a system, Data ONTAP does not automatically zero disks as soon as they are removed from aggregates. This is to minimize the possibility of irrecoverable data loss in the event of a scenario where data on a disk is required even after the disk has been removed from the aggregate. Disk zeroing can only be started by the system administrator, using the command 'disk zero spares' in 7-Mode and 'storage disk zerospares' in C-Mode. This command starts the zeroing process in the background on the spare disks present in the system at that time.

Topology-based optimization of selected spares

Data ONTAP performs an optimization based on the topology of the storage system, on the set of spare disks that have been selected for aggregate creation or addition or failed disk replacement. First, it constructs a topology layout with the selected spare disks ordered by channel, shelf and slot. Then, it considers all the points of failure in the storage system (adapters, switches, bridges, shelves, etc.), and estimates the 'load' on each, by counting the number of existing filesystem disks associated with each point of failure. When allocating spares, Data ONTAP attempts to distribute disks evenly across the different points of failure. It also attempts to minimize the points of failure that the selected disks have in common with the other disks in the target RAID group. Finally, it allocates the required number of spares, alternating the selected disks between all significant points of failure.
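The idea of spreading selected spares across points of failure according to their existing load can be illustrated with a simple greedy heuristic. The Python sketch below is a deliberately simplified model and not the algorithm used by Data ONTAP: it considers only one kind of point of failure (the shelf), and the function name `allocate`, the disk names and the shelf labels are all hypothetical.

```python
from collections import Counter


def allocate(spares, existing_load, count):
    """Pick `count` spares, always taking one from the least-loaded shelf.

    spares: list of (disk_name, shelf) tuples.
    existing_load: mapping shelf -> number of filesystem disks already there.
    """
    load = Counter(existing_load)
    remaining = sorted(spares)          # stable, name-ordered layout
    chosen = []
    for _ in range(min(count, len(remaining))):
        # Least-loaded shelf first; ties broken by disk name.
        pick = min(remaining, key=lambda d: (load[d[1]], d))
        remaining.remove(pick)
        chosen.append(pick[0])
        load[pick[1]] += 1              # the new disk adds to that shelf's load
    return chosen


spares = [("1.0.1", "shelfA"), ("1.0.2", "shelfA"),
          ("1.1.1", "shelfB"), ("1.1.2", "shelfB")]
print(allocate(spares, {"shelfA": 2, "shelfB": 0}, 3))
```

In this example shelfB starts with no filesystem disks, so the first two spares come from it before the load evens out and a shelfA disk is chosen.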

► Spare selection for new aggregate creation

Data ONTAP uses the following disk attributes for spare selection - disk type, checksum type, RPM and disk size. The user may specify desired values for some of these attributes in the aggregate creation command. For the attributes not specified by the user, Data ONTAP determines the values that will provide the best selection of spares.

First, Data ONTAP decides the disk type and checksum type of the disks to be selected. If the user has not specified a desired disk type, it finds the disk type with the largest number of spare disks. If the user has specified a desired checksum type, it only counts the disks with that checksum type. If not, it looks through the disks in the following order of checksum type:

Advanced zone checksum disks
Block checksum disks
Zoned checksum disks

For each checksum type, Data ONTAP determines the disk type that has the largest number of disks. If this number is insufficient for the creation of the new aggregate, it considers the disks with the next checksum type, and so on. If no checksum type has a sufficient number of disks, the aggregate create operation fails. Additional user-specified attributes are also considered in this step. For example, if the user has specified a desired checksum type and a desired RPM value, Data ONTAP determines the disk type that has the most disks with the specified checksum and RPM values.

If there are two or more disk types with the same number of spare disks, Data ONTAP selects a disk type in the following order of preference:

MSATA
FSAS
BSAS
SSD
SATA
SAS
LUN
ATA
FCAL
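The selection of disk type and checksum type described above, for the case where the user has not specified a disk type, can be sketched as follows. This Python example is an illustrative model rather than ONTAP code; the function name `pick_type_and_checksum` and the lowercase checksum labels are assumptions.

```python
from collections import Counter

# Orders as listed in this article.
CHECKSUM_ORDER = ["azcs", "block", "zoned"]
TYPE_PREFERENCE = ["MSATA", "FSAS", "BSAS", "SSD", "SATA", "SAS", "LUN", "ATA", "FCAL"]


def pick_type_and_checksum(spares, needed, checksum=None):
    """Choose (disk_type, checksum_type) for a new aggregate, or None on failure.

    spares: list of (disk_type, checksum_type) tuples, one per spare disk.
    """
    for cks in ([checksum] if checksum else CHECKSUM_ORDER):
        counts = Counter(dtype for dtype, c in spares if c == cks)
        if not counts:
            continue
        # Most disks wins; ties go to the type earlier in the preference list.
        best = max(counts, key=lambda t: (counts[t], -TYPE_PREFERENCE.index(t)))
        if counts[best] >= needed:
            return best, cks
    return None  # no checksum type has enough disks: creation fails


spares = [("SAS", "block")] * 4 + [("SATA", "block")] * 4 + [("MSATA", "azcs")] * 2
print(pick_type_and_checksum(spares, needed=3))
print(pick_type_and_checksum(spares, needed=2))
```

With three disks needed, the two AZCS MSATA spares are insufficient, so the block checksum disks are considered next; SAS and SATA tie at four disks each, and SATA is chosen because it appears earlier in the preference order.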

Once it has identified a set of disks according to disk type and checksum type, a subset is selected based on RPM. This step is only performed if the identified disk type is neither SSD nor LUN, since the concept of rotational speed does not apply to these disk types. If the user has specified a desired RPM value, only disks with that value are present in the selected set. If the user has not specified a value, Data ONTAP groups all selected disks by their RPM values and chooses the group that has the largest number of disks. If two or more groups have the same number of disks, the group with the highest RPM is selected. The value of the RPM mixing option for a specified disk type determines if the disks of that disk type will be considered equal with respect to RPM. If the option allows mixing (false for raid.rpm.ata.enable and raid.rpm.fcal.enable, true for raid.mix.hdd.rpm.capacity and raid.mix.hdd.rpm.performance), all disks of that disk type are counted together in the same group, even if they have different RPM values. If the option does not allow mixing, disks of that disk type are strictly separated into groups according to their RPM values.
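The RPM grouping step can be sketched as follows for the case where the user has not specified an RPM value. The Python example is illustrative only. The function name `pick_rpm_group` is hypothetical, and the `mixing_allowed` parameter represents the effective meaning of the RPM mixing option (for the older raid.rpm.ata.enable and raid.rpm.fcal.enable options this is the inverse of the option's value).

```python
from collections import Counter


def pick_rpm_group(spare_rpms, mixing_allowed):
    """Return the RPM values of the spares that remain candidates."""
    if mixing_allowed:
        # All RPMs are considered equal: one group containing every disk.
        return list(spare_rpms)
    counts = Counter(spare_rpms)
    # Largest group wins; ties go to the higher RPM.
    best = max(counts, key=lambda rpm: (counts[rpm], rpm))
    return [rpm for rpm in spare_rpms if rpm == best]


rpms = [7200, 7200, 5400, 5400, 10000]
print(pick_rpm_group(rpms, mixing_allowed=False))
print(pick_rpm_group(rpms, mixing_allowed=True))
```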

If the user has specified a desired disk size in the aggregate creation command, Data ONTAP selects spare disks such that the desired size lies within 80%-120% of the spare disk's size. If the user has not specified a desired size, Data ONTAP uses the selected disks in ascending order of size. The largest disk is made the dparity disk and the next largest disk is made the parity disk of the RAID group. Among disks of the same size, preference is given to pre-zeroed disks.

Once a set of spare disks has been identified based on these attributes, Data ONTAP optimizes the selection based on the topology of the storage system. The topology optimization procedure is described in detail in the Topology-based optimization of selected spares section.

As mentioned earlier, the values of disk type and RPM considered by Data ONTAP during spare selection depend on the values of the disk type mixing options and the RPM mixing options.

Creation of a root aggregate

Data ONTAP is designed to prefer HDDs over SSDs for the creation of the root aggregate in the system, even if SSDs are more numerous. SSDs are selected for the root aggregate only if there are not enough HDDs.

Creation of an unmirrored aggregate

For an unmirrored aggregate, Data ONTAP selects a set of spare disks from one of the two pools. It counts the number of available spare disks in each pool and chooses the set that has the larger number. If neither of the two pools has a sufficient number of disks, the aggregate creation will fail with an error message. Data ONTAP will never select a set of disks that spans the two pools. However, this behavior can be overridden by specifying the '-d/-disklist' option with a list of disks spanning both pools, and the '-f/-force' option to override the pool check.
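The default pool choice for an unmirrored aggregate can be modeled in a few lines. This Python sketch is illustrative; the function name `pick_pool` is hypothetical, and the behavior when both pools hold the same number of spares (Pool0 is returned here) is an assumption, since it is not described above.

```python
def pick_pool(spares_per_pool, needed):
    """Return the pool number to draw spares from, or raise if neither suffices.

    spares_per_pool: mapping such as {0: 5, 1: 8} (pool number -> spare count).
    """
    # Choose the pool with the larger number of spares (first key wins a tie).
    pool = max(spares_per_pool, key=spares_per_pool.get)
    if spares_per_pool[pool] < needed:
        # Disks are never combined across pools by default.
        raise RuntimeError("not enough spare disks in either pool")
    return pool


print(pick_pool({0: 5, 1: 8}, needed=6))
```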

Creation of a SyncMirror aggregate

The procedure to create a SyncMirror aggregate is the same as for an unmirrored aggregate, with one difference. Instead of selecting one set of disks from either of the pools to form the aggregate, Data ONTAP selects two sets of disks, one from each pool, to form the two plexes of the aggregate. The disks selected from Pool0 have to be identical to the disks selected from Pool1, with respect to disk type, RPM and checksum type. They may, however, differ in size. Data ONTAP pairs each disk in Pool0 with a disk in Pool1, and if the disks in one pair differ in size and are selected to be the data disks in the RAID group, the larger disk is downsized to the size of the smaller disk. If the disks in a pair are selected to be the parity or dparity disks in the RAID group, no downsizing is required even if they are of different sizes. If there are not enough disks in either pool, or if the disks in one pool are different from the disks in the other pool with respect to disk type, RPM or checksum type, the aggregate creation fails.
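The downsizing rule for paired disks can be illustrated as follows. The Python sketch is a simplified model; the function name `paired_sizes` and the role labels are hypothetical.

```python
def paired_sizes(pool0_mb, pool1_mb, role):
    """Return the sizes used for a Pool0/Pool1 disk pair in a SyncMirror aggregate.

    role: 'data', 'parity' or 'dparity'.
    """
    if role == "data":
        # The larger data disk is downsized to match the smaller one.
        size = min(pool0_mb, pool1_mb)
        return size, size
    # Parity and dparity pairs are not downsized.
    return pool0_mb, pool1_mb


print(paired_sizes(1695466, 847555, "data"))
print(paired_sizes(1695466, 847555, "parity"))
```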

► Spare selection for disk addition to an existing aggregate

The procedure to select spares to add to an existing aggregate is similar to the procedure to create a new aggregate. The user may specify desired values for some of the spare selection attributes; Data ONTAP determines the best values for the rest of the attributes. While determining the best values for the unspecified attributes, Data ONTAP takes into account the attributes of the disks that are already present in the aggregate.

Disk type: The user may specify a desired disk type for the disks to be added to the aggregate. If the specified disk type is an SSD disk type and the aggregate only contains HDDs, it will be converted to a Flash Pool, if the feature has been enabled (as described in the Flash Pools section). In this case, a new SSD tier will be created with the newly added disks forming one or more new RAID groups. If the specified disk type is an HDD disk type and the aggregate only contains HDDs, the usual rules governing the mixing of disk types will apply. If the user has not specified a disk type, Data ONTAP will try to determine the value based on the disk type of the other disks in the aggregate. This depends on which RAID group the new disks are to be added to, and can be specified by the user using the '-g' option. This option accepts the following values:

RAID group name - add the disks to a specified, existing RAID group until it is full; discard remaining disks
'new' - create one or more new RAID groups with the disks being added
'all' - add disks to all existing RAID groups until they are full; create new RAID groups after that

If the user has not specified a disk type but has specified a RAID group value, Data ONTAP will try to determine the disk type from the RAID group value specified. For example, if the user specifies an existing RAID group, Data ONTAP will choose spare disks with the same disk type as the disks in that RAID group. If no RAID group value is specified, Data ONTAP will choose disks with the disk type of the first RAID group in the aggregate. If new disks are to be added to a Flash Pool, the aggregate addition command must contain enough information to unambiguously identify the tier to which the disks are to be added. This can be done by explicitly specifying the disk type using the '-T' option, or by specifying a RAID group value (with the '-g' option) from which Data ONTAP can infer the disk type. The '-d' option may also be used to explicitly specify a disk list. However, Data ONTAP only allows disks to be added to one tier in a single command, so the disk list specified may not contain both HDDs and SSDs.

Checksum type: The user may specify a desired checksum type for the disks to be added. If the specified checksum type is different from the prevailing checksum type of the aggregate, the aggregate will become a mixed-checksum aggregate (described in the Mixed checksum aggregates section), and one or more new RAID groups will be created with the newly added disks. If the user has not specified a desired checksum type, Data ONTAP chooses disks of the same checksum type as the first RAID group in the aggregate.

RPM: The user is not allowed to specify a desired RPM value for disks to be added to an existing aggregate. Data ONTAP determines the prevailing RPM value in the aggregate, by grouping the disks in the aggregate by RPM, and choosing the RPM with the largest count of disks. If there are two or more same-sized sets of disks with different RPMs, the larger RPM value is chosen as the desired RPM value. In the absence of spares with the desired RPM value, Data ONTAP may select disks with a different RPM. This depends on the value of the RPM mixing option for the selected disk type: if the option allows mixing (false for raid.rpm.ata.enable and raid.rpm.fcal.enable, true for raid.mix.hdd.rpm.capacity and raid.mix.hdd.rpm.performance), disks with a different RPM value may be selected. Disks with an RPM different from that of the majority of disks in the aggregate may be added to the aggregate, by specifying the disks with the '-d/-disklist' option together with the '-f/-force' option.
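Determining the prevailing RPM of an aggregate amounts to a majority count with a tie-break toward the larger value. A minimal Python sketch, with the hypothetical function name `prevailing_rpm`:

```python
from collections import Counter


def prevailing_rpm(aggregate_rpms):
    """Return the RPM held by the most disks; ties go to the larger RPM."""
    counts = Counter(aggregate_rpms)
    return max(counts, key=lambda rpm: (counts[rpm], rpm))


print(prevailing_rpm([7200, 7200, 7200, 5400]))
print(prevailing_rpm([10000, 10000, 15000, 15000]))
```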

Size: If the user has specified a desired size for the disks to be added, Data ONTAP chooses spare disks such that the desired size lies within 80% - 120% of the selected spare disk's size. If the user has not specified a desired size, Data ONTAP uses the size of the largest data disk in the target RAID group as a 'baseline' size, and selects spare disks in the following order:

Disks that are the same size as the baseline size
Disks that are smaller than the baseline size, in descending order
Disks that are larger than the baseline size, in ascending order

If the disks are going to form a new RAID group, Data ONTAP finds the newest RAID group in the aggregate with the same disk type and checksum type as the disks being added, and uses the size of the largest data disk in that RAID group as the baseline size.
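The baseline-relative ordering of spares can be sketched as a simple three-way sort. The Python example below is illustrative; the function name `order_by_baseline` is hypothetical and sizes are given in arbitrary units.

```python
def order_by_baseline(spare_sizes, baseline):
    """Order spares: equal to baseline, then smaller (descending), then larger (ascending)."""
    same = [s for s in spare_sizes if s == baseline]
    smaller = sorted((s for s in spare_sizes if s < baseline), reverse=True)
    larger = sorted(s for s in spare_sizes if s > baseline)
    return same + smaller + larger


print(order_by_baseline([900, 450, 1800, 900, 600, 3600], baseline=900))
```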

Once a set of spare disks has been identified based on these attributes, Data ONTAP optimizes the selection based on the topology of the storage system. The optimization procedure is described in detail in the Topology-based optimization of selected spares section.

Addition of disks to an unmirrored aggregate

In the case of an unmirrored aggregate, the selected spare disks will be chosen from the same pool that the majority of existing disks in the aggregate belong to. To add disks from the opposite pool, the '-d/-disklist' option can be used to specify the list of disks to be added, together with the '-f/-force' option to override the pool check, as described in the Spare selection with the '-disklist' option section.

Addition of disks to a SyncMirror aggregate

In the case of a SyncMirror aggregate, the selected spare disks are evenly divided between the two plexes, with an equal number of disks coming from each spare pool. If each pool does not have the required number of matching disks, Data ONTAP will not mix disks from the two pools, and the aggregate addition operation will fail.

► Spare selection for replacement of a failed disk

Data ONTAP uses the following attributes to select a replacement for a failed disk - disk type, RPM, pool, checksum type and disk size. The desired values for these attributes are determined by Data ONTAP, by considering the attributes of the failed disk that is being replaced, as well as some attributes of the aggregate to which it belonged. A matching spare disk is a spare disk that has the desired values for all the attributes considered. A suitable spare disk is a spare disk that does not have all the desired values but is deemed a suitable replacement for the failed disk. Data ONTAP first tries to find a matching spare disk to replace the failed disk. If it does not find any matching spares, it tries to find a suitable spare disk.

Data ONTAP determines the desired values of the selection attributes as follows:

Disk type: Disk type is a hard requirement in the selection of a replacement spare disk - the disk type of the selected spare disk has to be the same as the disk type of the RAID group that the failed disk belonged to. Data ONTAP will not select a spare disk with a different disk type to replace a failed disk. However, as described in the Disk type section, certain disk types will be grouped together during spare selection, if the disk type mixing options are turned on.

RPM: The desired RPM value of the selected spare disk is based on the RPMs of the remaining disks in the aggregate (or plex, if it is a SyncMirror aggregate), and not on the RPM of the failed disk. A matching spare disk has the same RPM value as the majority of disks in the aggregate. If there are two or more same-sized sets of disks with different RPMs, the larger RPM value is chosen as the desired RPM value. In the absence of matching spare disks, Data ONTAP may select suitable spare disks with a different RPM value. Disks with higher RPM values are preferred, but if there are none, disks with lower RPM may be selected as well. The RPM mixing options decide whether or not to allow the mixing of disks with different RPMs in the same aggregate, as described in the RPM section.

Pool: A matching spare disk has to belong to the same pool as the parent plex of the aggregate containing the failed disk. In the absence of a matching spare disk, Data ONTAP may select a suitable spare disk from the opposite pool, if the aggregate is unmirrored. For a mirrored aggregate, Data ONTAP will select a disk from the opposite pool only if the aggregate is mirror-degraded or is resyncing.

Checksum: The desired checksum type of a spare disk is the checksum type of the RAID group that the failed disk belonged to. Data ONTAP may select a spare disk with a different checksum type, if the selected spare disk also supports the desired checksum type.

Size: Selected spare disks have to be the same size as or larger than the failed disk being replaced. If the disks selected are larger in size, they are downsized before being used.

If multiple matching or suitable spare disks are found, Data ONTAP uses two additional attributes to choose a single disk - the pre-zeroed status of the disks and the topology of the storage system. Data ONTAP gives preference to spares that are already zeroed, as described in the Pre-zeroed status section. It also tries to optimize the selection based on the topology of the storage system, as described in the Topology-based optimization of selected spares section.

Failed disk replacement in an unmirrored aggregate

Data ONTAP first tries to find a matching spare disk to replace a failed disk in an unmirrored aggregate. If there are no matching spares found, it tries to find a suitable spare disk, by varying the selection attributes in the following order:

Different RPM, same pool
Same RPM, different pool
Different RPM, different pool

Failed disk replacement in a SyncMirror aggregate

As in the case of an unmirrored aggregate, Data ONTAP first tries to find a matching spare disk to replace a failed disk. If there are no matching spares available, it looks for suitable spares. The attribute variations listed above are tried in the same order, with one difference - Data ONTAP does not look for suitable spare disks in the opposite pool, if the aggregate is in a normal, fault-isolated state. Data ONTAP will search for suitable spares in the opposite pool only if the aggregate is mirror-degraded or is resyncing, with the plex containing the failed disk serving as the source of the resync. In all other cases, the disk replacement will fail if there are not any suitable or matching spares available in the same pool.
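
The fallback order for both the unmirrored and the SyncMirror case can be sketched as below. The sketch assumes that the candidate spares have already passed the disk type, checksum and size checks, and it does not model the RPM mixing options or the preference for higher RPM values. The Spare class and the allow_opposite_pool flag are hypothetical names: the flag would be True for an unmirrored aggregate, or for a mirrored aggregate that is mirror-degraded or resyncing, and False for a mirrored aggregate in a normal state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spare:
    name: str
    rpm: int
    pool: int

def pick_replacement(spares, want_rpm, want_pool, allow_opposite_pool=True):
    """Return the best spare, or None if no matching or suitable spare exists."""
    def rank(spare):
        same_rpm = spare.rpm == want_rpm
        same_pool = spare.pool == want_pool
        if same_rpm and same_pool:
            return 0  # matching spare
        if same_pool:
            return 1  # different RPM, same pool
        if same_rpm:
            return 2  # same RPM, different pool
        return 3      # different RPM, different pool

    candidates = [s for s in spares if allow_opposite_pool or s.pool == want_pool]
    return min(candidates, key=rank, default=None)

spares = [Spare("1a.10", 15000, 1), Spare("0a.20", 15000, 0)]
print(pick_replacement(spares, want_rpm=10000, want_pool=0).name)  # 0a.20
```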

► Spare selection with DS4486 shelves

Data ONTAP 8.1.1 introduces support for DS4486 disk shelves - a new dense disk shelf in which two physical disks are housed per disk carrier. In a DS4486 shelf, the smallest field replaceable unit (FRU) is the disk carrier, which means that it is the smallest unit in the shelf that can be replaced individually. If either of the disks in a carrier fails, the entire carrier has to be replaced, even if the other disk is healthy. If the healthy disk in a failed carrier is part of an aggregate, Data ONTAP has to initiate a disk copy operation to copy the healthy disk to another disk, before the carrier can be taken out of the shelf to be replaced. Thus, spare selection in a DS4486 environment is slightly different, because each carrier has to be considered a single point of failure.

Data ONTAP avoids allocating two spares from the same carrier into the same RAID group, because a failure in one of the disks in the carrier would require a complete disk copy of the healthy disk along with the reconstruction on the selected spare disk, putting the RAID group at risk while these operations are in progress. Data ONTAP also avoids selecting a spare disk from a carrier that already has a failed or pre-failed disk. These modifications in the selection are all performed during the topology optimization stage. The selection of spare disks is done as usual, with each disk in a carrier considered independently (disks within the same carrier usually have identical characteristics). Once Data ONTAP has identified candidate spare disks, it orders all of them by channel, shelf, carrier and slot. All selected spare disks that have a failed or pre-failed disk as a carrier-mate are removed from consideration. It then estimates the 'load' on each point of failure in the topology, including each carrier. A carrier that has two spare disks is given a higher preference than a carrier that has one spare disk and one used disk. Data ONTAP then allocates disks, trying as far as possible to evenly distribute disks across all points of failure, and alternating the selected disks between channels, shelves and carriers.
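
The carrier-related steps can be illustrated with a simplified sketch. It models only the carrier level (not channels, shelves or slots), and the data layout (a list of disk name and carrier ID pairs) and the function name are assumptions made for illustration.

```python
from collections import Counter

def pick_from_carriers(spares, bad_carriers, count):
    """Pick up to `count` spares, spreading them across carriers where possible.

    spares: list of (disk_name, carrier_id) pairs (hypothetical layout).
    bad_carriers: carrier IDs that hold a failed or pre-failed disk.
    """
    # Drop spares whose carrier-mate is failed or pre-failed.
    usable = [(d, c) for d, c in spares if c not in bad_carriers]
    spares_per_carrier = Counter(c for _, c in usable)
    # Prefer carriers with two spares; sorted() is stable, so input order is kept otherwise.
    ranked = sorted(usable, key=lambda dc: -spares_per_carrier[dc[1]])
    first_pass, second_pass, seen = [], [], set()
    for disk, carrier in ranked:
        if carrier in seen:
            second_pass.append(disk)  # second disk of a carrier: used only if spares run low
        else:
            seen.add(carrier)
            first_pass.append(disk)
    return (first_pass + second_pass)[:count]

spares = [("d1", "c1"), ("d2", "c1"), ("d3", "c2"),
          ("d4", "c3"), ("d5", "c3"), ("d6", "c4")]
print(pick_from_carriers(spares, bad_carriers={"c4"}, count=3))  # ['d1', 'd4', 'd3']
```

Asking for four disks in this example returns ['d1', 'd4', 'd3', 'd2'], which corresponds to the low-spare situation in which two disks from one carrier cannot be avoided.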

When the number of spare disks in the system is low, Data ONTAP cannot avoid allocating two disks from a carrier into the same RAID group. When this happens, a background process is started after the aggregate addition, which performs a series of disk copy operations to rearrange the disks in existing RAID groups to eliminate cases of two disks from one carrier being in the same RAID group.

► Spare selection parameters and options

The aggregate creation and addition commands accept certain input parameters which can be used to specify values for disk attributes that must be considered during spare selection. During aggregate creation or addition, the user should specify values for as many of these parameters as possible to ensure the selection of a desired set of disks. These parameters are as follows:

-T <disk type>
-R <rpm value>
-c <checksum type>
@<size value>

In addition to these parameters, spare selection behavior also depends on the values of the disk type mixing options and RPM mixing options. Unexpected spare disk selections could arise as a result of the values that these options are set to. For instance, in Data ONTAP 8.1 and earlier, disk type mixing is allowed by default, which could result in an unexpected disk type being selected, even when the '-T' option is explicitly used to specify a disk type. As an example, if disk type mixing is allowed, Data ONTAP considers FCAL and SAS disks to be part of the same disk type group ('SAS'), so a command like 'aggr create <aggrname> -T FCAL <diskcount>' may result in the aggregate being created with SAS disks, even if the required number of FCAL disks are present in the system. This is because the FCAL and SAS disks are considered equivalent with regard to disk type, and so the selection of disks is made on the basis of other disk attributes like RPM, checksum type, size, topology, etc., which could result in the SAS disks being given preference over the FCAL disks. If a strict enforcement of disk types is required, the disk type mixing options should be disabled.

Similar to the enforcement of disk type, the RPM mixing options control the selection of disks based on RPM. If a strict enforcement of RPM is required, these options should be disabled.

► Spare selection with the '-disklist' option

The aggregate creation and addition commands have an option '-d' that accepts a space-separated list of spare disks. Data ONTAP checks this list to ensure that the disks have compatible values of disk type, RPM, checksum type and pool, and then carries out the creation or addition operation with the specified disks. For the creation of an unmirrored aggregate, Data ONTAP checks that the disks in the disk list belong to the same pool and have the same RPM value. For the addition of disks to an unmirrored aggregate, Data ONTAP checks that the disks in the disk list belong to the same pool, and have the same RPM value as the prevailing RPM in the aggregate. If these checks fail, Data ONTAP rejects the disk list and fails the command. This behavior can be overridden with the '-f/-force' option - when a disk list is specified along with the '-f' option, Data ONTAP ignores the results of the RPM and pool checks, thus allowing disks from different pools and with different RPMs to be present in the same aggregate.
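
The RPM and pool checks on a disk list for an unmirrored aggregate can be sketched as follows. The disk type and checksum checks are not modeled, and the function name, dictionary keys and messages are hypothetical. Pass aggregate_rpm only when adding disks to an existing aggregate.

```python
def validate_disk_list(disks, aggregate_rpm=None, force=False):
    """Return the list of failed checks; an empty list means the disk list is accepted.

    disks: list of dicts with 'pool' and 'rpm' keys (hypothetical layout).
    """
    if force:
        return []  # '-f' ignores the results of the RPM and pool checks
    problems = []
    pools = {d["pool"] for d in disks}
    rpms = {d["rpm"] for d in disks}
    if len(pools) > 1:
        problems.append("disks span more than one pool")
    if len(rpms) > 1:
        problems.append("disks have differing RPM values")
    if aggregate_rpm is not None and rpms != {aggregate_rpm}:
        problems.append("RPM differs from the prevailing aggregate RPM")
    return problems

mixed = [{"pool": 0, "rpm": 15000}, {"pool": 1, "rpm": 15000}]
print(validate_disk_list(mixed))              # ['disks span more than one pool']
print(validate_disk_list(mixed, force=True))  # []
```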

For the creation of or addition of disks to a SyncMirror aggregate, Data ONTAP expects two disk lists to be specified, one for each pool. The '-f' option can be used here as well, to override the RPM and pool checks.

► Examples

On a system with 10 FCAL, 10 SAS and 10 SATA disks, the user executes the command 'aggr create <aggrname> 5'. Which disk type does Data ONTAP select for the creation of the new aggregate?

The disk type selected depends on the value of the disk type mixing option. If disk type mixing is allowed, FCAL and SAS disks are considered as having group disk type SAS, so they are counted together. Data ONTAP picks the disk type that has the largest number of disks. Assuming that all disks have the same checksum type, it selects disk type SAS (10 FCAL + 10 SAS disks = 20 disks with group disk type SAS vs. 10 disks with group disk type ATA). From the set of disks with group disk type SAS, Data ONTAP could end up selecting either FCAL or SAS disks for the creation of the aggregate - that would depend on the other disk attributes, such as RPM, size, pre-zeroed status and storage topology.

If disk type mixing is not allowed, the three disk types are considered separately. Since all three disk types have the same number of disks, Data ONTAP chooses a disk type in the order listed in the Spare selection for new aggregate creation section. SAS is higher on the list than FCAL and SATA, so Data ONTAP will select 5 SAS disks for the creation of the new aggregate.
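
The counting in this example can be reproduced with a short sketch. The group mapping below covers only the three disk types used in the example, and the names are hypothetical. The tie-break order for equal counts is not implemented.

```python
from collections import Counter

# Partial mapping, limited to the disk types that appear in this example.
GROUP_DISK_TYPE = {"FCAL": "SAS", "SAS": "SAS", "SATA": "ATA"}

def count_by_effective_type(spare_types, mixing_allowed):
    """Count spares per disk type, or per group disk type when mixing is allowed."""
    if mixing_allowed:
        return Counter(GROUP_DISK_TYPE.get(t, t) for t in spare_types)
    return Counter(spare_types)

spares = ["FCAL"] * 10 + ["SAS"] * 10 + ["SATA"] * 10
print(dict(count_by_effective_type(spares, mixing_allowed=True)))
# {'SAS': 20, 'ATA': 10}
print(dict(count_by_effective_type(spares, mixing_allowed=False)))
# {'FCAL': 10, 'SAS': 10, 'SATA': 10}
```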

On a system with 6 SATA BCS disks, 4 MSATA AZCS disks and 8 FCAL BCS disks, the user executes the command 'aggr create <aggrname> 5'. Which disk type and checksum type does Data ONTAP select for the creation of the aggregate?

The selection is done first by checksum type, then by disk type and count. Data ONTAP first considers AZCS checksum disks, and counts the number of disks of each disk type. Since there are only 4 AZCS checksum disks in total and the user wants 5 disks, Data ONTAP moves on to the next checksum type - BCS. There are 6 SATA disks and 8 FCAL disks with checksum type BCS. Data ONTAP selects the disk type which has the higher number of disks - FCAL. If there had been an equal number of SATA and FCAL disks, it would have selected a disk type in the order listed in the Spare selection for new aggregate creation section, so it would have picked SATA. In both cases, the checksum type selected is BCS.
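
The reasoning in this example can be traced with a small sketch. The checksum order (AZCS before BCS) is taken from the example itself, the function name and data layout are hypothetical, and the tie-break order for equal counts is not implemented.

```python
from collections import Counter

def pick_checksum_and_type(spares, wanted, checksum_order=("AZCS", "BCS")):
    """Return (checksum, disk_type) for the first checksum type with enough disks.

    spares: list of (disk_type, checksum) pairs (hypothetical layout).
    """
    for checksum in checksum_order:
        counts = Counter(t for t, c in spares if c == checksum)
        # Keep only the disk types that can satisfy the requested disk count.
        enough = {t: n for t, n in counts.items() if n >= wanted}
        if enough:
            return checksum, max(enough, key=enough.get)
    return None

spares = [("SATA", "BCS")] * 6 + [("MSATA", "AZCS")] * 4 + [("FCAL", "BCS")] * 8
print(pick_checksum_and_type(spares, wanted=5))  # ('BCS', 'FCAL')
```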

A disk in an unmirrored aggregate fails and Data ONTAP has to select a spare disk to replace it. The other disks in the aggregate are of type FCAL, checksum BCS, 10K RPM and from Pool0. The available spare disks are as follows:

Group 1 - disk type FCAL, checksum BCS, RPM 15K, Pool1
Group 2 - disk type SATA, checksum BCS, RPM 7.2K, Pool1
Group 3 - disk type SAS, checksum BCS, RPM 15K, Pool0

Which group of disks does Data ONTAP pick a replacement disk from?

In this case, there is no perfectly matching spare available for the failed disk, because none of the spare disks have all the desired attributes. Data ONTAP first identifies the spare disks with matching disk type. Assuming that disk type mixing is allowed on the system, Data ONTAP treats FCAL and SAS disks as having the same effective disk type, so all FCAL and SAS spare disks are considered suitable replacements with respect to disk type. From this set of disks, Data ONTAP tries to find a suitable spare disk to replace the failed disk using the variations listed earlier:

Different RPM, same pool
Same RPM, different pool
Different RPM, different pool

Looking at the list of variations, the disks in group 3 match variation 1 on the list - different RPM, same pool. So Data ONTAP will pick a replacement disk from group 3. In this example, if the disks in group 3 were not present, Data ONTAP would go down the list to variation 3 - different RPM, different pool - and pick a disk from group 1.

If disk type mixing were turned off on the system, Data ONTAP would consider FCAL and SAS disks different with regard to disk type, and would consider only the FCAL spare disks suitable replacements for the failed disk. Thus, it would select a replacement disk from among the available FCAL spare disks in group 1.
