What is a MailBox disk?
- Data ONTAP
What is a MailBox (MB) disk?
Mailbox disks store data about the HA-Pair that must persist across reboots. This data includes the cluster state, the state of the mirrors, and whether the last shutdown was 'clean'.
An MB disk is another way to determine the state of the partner.
What is the function of the MB disk in case the InterConnect fails? How will the HA-Pair behave?
If the interconnect fails, the cluster can still determine if the partner is alive or not. The behavior differs depending on the situation following a failure of the interconnect:
- If the partner is still updating MB disks, then takeover will be disabled (as no NVRAM replication is taking place). No takeover will happen.
- If the partner is no longer updating MB disks, but no clean shutdown was performed, no takeover will happen.
- If the partner is no longer updating MB disks, but a clean shutdown such as a 'halt' was performed, takeover will happen.
What if the partner just 'dies'? How will the HA-Pair behave?
- If the partner is no longer updating MB disks and cf was not disabled, takeover will happen (for example: Head loses power or freezes).
- If the partner is no longer updating MB disks and cf was disabled, no takeover will happen (for example: InterConnect failed before head stopped responding).
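The cases listed in the two sections above can be summarized as a small decision function. This is only an illustrative sketch, not Data ONTAP code: the names `PartnerObservation` and `should_take_over` are hypothetical, and the sketch assumes that a clean shutdown also leads to takeover when cf was still enabled, which the article does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartnerObservation:
    """What the surviving node can tell about its partner (hypothetical model)."""
    updating_mailbox: bool      # partner still writes to its own MB disks
    clean_shutdown: bool        # partner recorded a clean shutdown, e.g. 'halt'
    takeover_was_enabled: bool  # cf was enabled when the partner went quiet


def should_take_over(obs: PartnerObservation) -> bool:
    # A partner that keeps updating its mailbox is alive: never take over.
    if obs.updating_mailbox:
        return False
    # A recorded clean shutdown allows takeover even after an interconnect failure.
    if obs.clean_shutdown:
        return True
    # Otherwise takeover happens only if cf had not been disabled beforehand
    # (for example, the head lost power while the interconnect was healthy).
    return obs.takeover_was_enabled


# Interconnect failed first (cf disabled), then the head stopped responding.
print(should_take_over(PartnerObservation(False, False, False)))  # False
# Head lost power while cf was still enabled.
print(should_take_over(PartnerObservation(False, False, True)))   # True
```

The order of the checks mirrors the article: the mailbox heartbeat is consulted first, so a live partner is never taken over regardless of the interconnect state.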
How does the storage system select a mailbox disk?
The storage system always selects the parity disk and the first data disk of the root aggr/volume to be the two mailbox disks in a RAID4 configuration. If the aggr is RAID-DP, then the parity and the dParity disks are selected.
In a SyncMirror configuration, the MB disks are mirrored. An HA-Pair deploying SyncMirror will have eight MB disks, four per node, assuming the root aggr/volume is mirrored and consists of at least two disks.
Note: A V-Series system with a single LUN as its root aggr might have only four MB disks.
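As a rough illustration of the selection rule, the sketch below picks the mailbox disks for one node from a simple description of each plex of the root aggr. The function names, the dictionary layout, and the disk names such as `0a.16` are all hypothetical and chosen for the example; Data ONTAP performs this selection internally.

```python
from typing import Dict, List


def select_mailbox_disks(raid_type: str, plex: Dict[str, List[str]]) -> List[str]:
    """Return the two MB disks for one plex, following the rule described above."""
    if raid_type == "raid4":
        # RAID4: the parity disk plus the first data disk.
        return [plex["parity"][0], plex["data"][0]]
    if raid_type == "raid_dp":
        # RAID-DP: the parity disk plus the dParity disk.
        return [plex["parity"][0], plex["dparity"][0]]
    raise ValueError(f"unsupported RAID type: {raid_type}")


def node_mailbox_set(raid_type: str, plexes: List[Dict[str, List[str]]]) -> List[str]:
    """With SyncMirror the root aggr has two plexes, so the set doubles."""
    disks: List[str] = []
    for plex in plexes:
        disks.extend(select_mailbox_disks(raid_type, plex))
    return disks


local_plex = {"parity": ["0a.16"], "dparity": ["0a.17"], "data": ["0a.18", "0a.19"]}
remote_plex = {"parity": ["0b.16"], "dparity": ["0b.17"], "data": ["0b.18", "0b.19"]}

print(node_mailbox_set("raid_dp", [local_plex]))                   # ['0a.16', '0a.17']
print(len(node_mailbox_set("raid_dp", [local_plex, remote_plex])))  # 4
```

Four MB disks per mirrored node gives the eight MB disks per HA-Pair mentioned above.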
How does the storage system access mailbox disks?
The storage system writes information to its own mailbox disks. It reads the information written by its partner from the partner's mailbox disks, but it never writes anything to the partner's mailbox. Each MB disk is queried every 3 to 5 seconds.
How does Data ONTAP use mailbox disks to judge in which situation the cluster is to be disabled?
- In normal situations, the 'more than half' rule is applied. This means that 'more than half' of the MB disks must be available to have takeover enabled.
- In a non-mirrored HA-Pair:
The node usually has two MB disks (exception: V-Series might have only one if the root aggr consists of only one LUN). If one of these two MB disks fails, takeover will be disabled. The MB states are verified and once another MB disk has been selected, takeover will be enabled.
- In a SyncMirrored configuration:
The node usually has four MB disks, two local and two remote (exception: V-Series might have only two, one local and one remote if root aggr consists of one LUN only).
If one plex containing two MB disks fails, takeover will be disabled. MB states are checked and these disks are removed from the MB disk set. Takeover will then be enabled.
If a node and one plex of the root aggregate fail simultaneously, takeover will not be possible and cannot be forced. The MetroCluster product provides this level of protection.
If one MB disk fails: Takeover will not be disabled. A new disk will be chosen.
If two MB disks fail: If the plex is still online, takeover will be disabled. If the plex is offline, see the plex-failure condition explained earlier. If one local and one remote MB disk fail, MB states are verified and takeover is enabled.
- Disk-fail: the disk fails and will be reconstructed.
- Disk-access-fail: the disk is fine but MB traffic fails.
The above scenarios are true for Disk-Failures. If, however, MB Disk-Access fails, then cf will be disabled regardless of the number of MB disks the access is failing for.
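The 'more than half' rule and the disk-access exception can be expressed as a short check. This is a simplified sketch under stated assumptions: the function name `takeover_enabled` and its parameters are hypothetical, and it models only the moment right after a failure. It does not model the later step in which MB states are verified and a replacement MB disk is selected, nor the mixed local/remote failure case.

```python
def takeover_enabled(available: int, total: int, access_failure: bool = False) -> bool:
    """Apply the 'more than half' rule to the current MB disk set.

    available      -- MB disks that are currently usable
    total          -- MB disks in the set before the failure
    access_failure -- True if MB traffic to any MB disk fails (Disk-access-fail)
    """
    if not 0 <= available <= total:
        raise ValueError("available must be between 0 and total")
    # A disk-access failure disables cf regardless of how many disks are affected.
    if access_failure:
        return False
    # Strictly more than half of the MB disks must be available.
    return 2 * available > total


print(takeover_enabled(1, 2))  # non-mirrored, one MB disk failed -> False
print(takeover_enabled(3, 4))  # SyncMirror, one MB disk failed -> True
print(takeover_enabled(2, 4))  # SyncMirror, one plex failed -> False
```

Comparing `2 * available > total` avoids fractional arithmetic and makes the "exactly half is not enough" case explicit.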
What if an MB disk fails during reboot?
If an MB disk fails during reboot, that is, the MB disk is available on shutdown but no longer on reboot, then the storage system will not boot due to 'stale MB instance on the local/remote side'.
To solve this, boot into maintenance mode and destroy the local set of MB disks. A new set of MB disks will be selected on reboot. Steps to recover are described in article Error message: Permanent errors on all HA mailbox disks (while writing master block) in process fmmbx_instance.
Note: Destroying local and partner MB disks might make a CFOD (ClusterFailoverOnDisaster) impossible.