SyncMirror Plex Failed - AutoSupport message
Applies to
- MetroCluster
- Data ONTAP 8
- ONTAP 9
- SyncMirror
Event Summary
The AutoSupport message SYNCMIRROR PLEX FAILED indicates that a plex of a SyncMirrored aggregate has failed and the SyncMirror relationship is in a degraded state.
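The EMS events logged on the affected node can help confirm when the plex failed. A minimal sketch, assuming the ONTAP 9 event log show command; the exact RAID/SyncMirror message names vary by release, so filter by node and severity first and narrow further with -message-name if the specific event name is known:
siteA::> event log show -node siteA-02 -severity ERROR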
Validate
Determine which plexes are reporting as failed:
storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
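The plex-level view can also be used to identify the failed plex directly. A minimal sketch, assuming the storage aggregate plex show command and its -aggregate parameter in ONTAP 9; the exact columns vary by release:
siteA::> storage aggregate plex show -aggregate aggr1_siteA_02
A failed plex is reported as offline and failed in this output, while the surviving plex remains online and active.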
Resolution
ONTAP 9
- Determine the aggregate and the failed plex:
siteA::> storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
A RAID status of mirror degraded indicates an inoperable plex.
- Determine the reason for the plex failure (example commands for checking the common causes follow the output below):
- disk
- shelf
- switch ISL failure
- site failure
SiteA::> storage aggregate show-status -aggregate aggr2_siteA_02

Owner Node: SiteA-02
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.11.4                       0   SAS    10000   1.09TB   1.09TB (normal)
     parity   3.11.5                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.6                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.18                      0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.19                      0   SAS    10000   1.09TB   1.09TB (normal)

  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  FAILED                       -   -          -   1.09TB       0B (failed)
     parity   FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
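Failed disks, shelf faults, and connectivity problems can usually be confirmed from the cluster itself. A minimal sketch, assuming standard ONTAP 9 commands; the MetroCluster checks apply only to MetroCluster configurations:
SiteA::> storage disk show -broken
SiteA::> storage shelf show
SiteA::> system health alert show
SiteA::> metrocluster check run
SiteA::> metrocluster check show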
Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex returns to an operational state, the resynchronization process should start automatically and no further action is required. You can monitor the resynchronization process using:
SiteA::> storage aggregate plex show
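To follow the resynchronization of a single aggregate, the output can be limited with the -aggregate parameter; the name of the resync percentage column varies slightly between releases. If the plex remains offline after the underlying fault is fixed, it may need to be brought online manually before resynchronization starts. A hedged sketch, assuming the storage aggregate plex online command is available in your ONTAP 9 release and using the aggregate and plex names from the output above:
SiteA::> storage aggregate plex show -aggregate aggr2_siteA_02
SiteA::> storage aggregate plex online -aggregate aggr2_siteA_02 -plex plex1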
- If the cause cannot be rectified - for example, genuine disk (hardware) failures, a site failure, a power surge, or similar - and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.
Note: Destroying and recreating a plex requires a full mirror baseline resynchronization to take place. Ensure that there are sufficient spare disks in the pool to recreate the mirror.
To destroy and recreate the mirror, perform the following steps:
storage aggregate plex delete -aggregate <aggr_name> -plex <degraded_plex_name>
storage aggregate mirror -aggregate <aggr_name>
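A worked example, using the aggregate and plex names from the output above; the names are illustrative, and the delete operation asks for confirmation. The storage aggregate show-spare-disks command can be run first to confirm that enough spare disks are available in the pool that will hold the new plex:
SiteA::> storage aggregate show-spare-disks
SiteA::> storage aggregate plex delete -aggregate aggr2_siteA_02 -plex plex1
SiteA::> storage aggregate mirror -aggregate aggr2_siteA_02
SiteA::> storage aggregate plex show -aggregate aggr2_siteA_02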
Data ONTAP 8.2
- Determine the aggregate and the failed plex from the 'aggr status -v' output:
> aggr status -v
           Aggr State           Status                Options
          aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp, raidsize=14,
                                mirrored              ignore_inconsistent=off, snapmirrored=off,
                                64-bit                resyncsnaptime=60, fs_size_fixed=off,
                                                      lost_write_protect=on, ha_policy=cfo,
                                                      hybrid_enabled=off, percent_snapshot_space=15%,
                                                      free_space_realloc=off

                Volumes: vol1, vol2, vol3

                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums

                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
A status of 'partial' indicates an inoperable plex.
- Determine the reason for the plex failure (example commands follow the list below):
- disk
- shelf
- switch ISL failure
- site failure
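Failed disks and the RAID state of each plex can be checked from the 7-Mode CLI. A minimal sketch, assuming standard Data ONTAP 8.2 7-Mode commands; 'vol status -f' lists broken disks, and 'aggr status -r' shows the RAID layout and disk status of every plex:
> vol status -f
> aggr status -r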
Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex returns to an operational state, the resynchronization process should start automatically and no further action is required. You can monitor the resynchronization process using:
> sysconfig -r
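If the plex stays offline after the underlying fault is fixed, it may need to be brought online manually before resynchronization starts. A hedged sketch, assuming the plex form of the 7-Mode 'aggr online' command and the aggr1/plex1 names from the example above:
> aggr online aggr1/plex1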
- If the cause cannot be rectified - for example, genuine disk (hardware) failures, a site failure, a power surge, or similar - and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.
Note: Destroying and recreating a plex requires a full mirror baseline resynchronization to take place. Ensure that there are sufficient spare disks in the pool to recreate the mirror.
To destroy and recreate the mirror, perform the following steps:
aggr destroy <aggr_name>/<degraded_plex_name>
aggr mirror <aggr_name>
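A worked example, using the aggr1/plex1 names from the output above; the names are illustrative, and 'aggr destroy' asks for confirmation before removing the plex. 'aggr status -s' can be run first to confirm that enough spare disks are available:
> aggr status -s
> aggr destroy aggr1/plex1
> aggr mirror aggr1
> sysconfig -r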
Additional Information
If you require assistance with troubleshooting the plex failure, or need any further help, contact NetApp Technical Support.