SyncMirror Plex Failed - AutoSupport message
Applies to
- MetroCluster
- Data ONTAP 8
- ONTAP 9
- SyncMirror
Event Summary
The AutoSupport message SYNCMIRROR PLEX FAILED indicates that a plex of a SyncMirrored aggregate has failed and the SyncMirror relationship is in a degraded state.
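The EMS events logged on the affected node can help confirm when the plex failed. A minimal sketch, assuming the ONTAP 9 event log show command; the exact RAID/SyncMirror message names vary by release, so filter by node and severity first and narrow further with -message-name if the specific event name is known:
siteA::> event log show -node siteA-02 -severity ERROR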
Validate
Determine which plexes are reporting as failed:
storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
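The plex-level view can also be used to identify the failed plex directly. A minimal sketch, assuming the storage aggregate plex show command and its -aggregate parameter in ONTAP 9; the exact columns vary by release:
siteA::> storage aggregate plex show -aggregate aggr1_siteA_02
A failed plex is reported as offline and failed in this output, while the surviving plex remains online and active.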
Resolution
ONTAP 9
- Determine the aggregate and the failed plex:
siteA::> storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
A RAID status of mirror degraded indicates an inoperable plex.
- Determine the reason for the plex failure (example commands for checking the common causes follow the output below):
- disk
- shelf
- switch ISL failure
- site failure
SiteA::> storage aggregate show-status -aggregate aggr2_siteA_02

Owner Node: SiteA-02
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.11.4                       0   SAS    10000   1.09TB   1.09TB (normal)
     parity   3.11.5                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.6                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.18                      0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.19                      0   SAS    10000   1.09TB   1.09TB (normal)

  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  FAILED                       -   -          -   1.09TB       0B (failed)
     parity   FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
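Failed disks, shelf faults, and connectivity problems can usually be confirmed from the cluster itself. A minimal sketch, assuming standard ONTAP 9 commands; the MetroCluster checks apply only to MetroCluster configurations:
SiteA::> storage disk show -broken
SiteA::> storage shelf show
SiteA::> system health alert show
SiteA::> metrocluster check run
SiteA::> metrocluster check show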
Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex returns to an operational state, the resynchronization process should start automatically and no further action is required. You can monitor the resynchronization process using:
SiteA::> storage aggregate plex show
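To follow the resynchronization of a single aggregate, the output can be limited with the -aggregate parameter; the name of the resync percentage column varies slightly between releases. If the plex remains offline after the underlying fault is fixed, it may need to be brought online manually before resynchronization starts. A hedged sketch, assuming the storage aggregate plex online command is available in your ONTAP 9 release and using the aggregate and plex names from the output above:
SiteA::> storage aggregate plex show -aggregate aggr2_siteA_02
SiteA::> storage aggregate plex online -aggregate aggr2_siteA_02 -plex plex1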
- If the cause cannot be rectified - for example, genuine disk (hardware) failures, a site failure, a power surge, or similar - and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.
Note: Destroying and recreating a plex requires a full mirror baseline resynchronization to take place. Ensure that there are sufficient spare disks in the pool to recreate the mirror.
To destroy and recreate the mirror, perform the following steps:
storage aggregate plex delete -aggregate <aggr_name> -plex <degraded_plex_name>
storage aggregate mirror -aggregate <aggr_name>
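A worked example, using the aggregate and plex names from the output above; the names are illustrative, and the delete operation asks for confirmation. The storage aggregate show-spare-disks command can be run first to confirm that enough spare disks are available in the pool that will hold the new plex:
SiteA::> storage aggregate show-spare-disks
SiteA::> storage aggregate plex delete -aggregate aggr2_siteA_02 -plex plex1
SiteA::> storage aggregate mirror -aggregate aggr2_siteA_02
SiteA::> storage aggregate plex show -aggregate aggr2_siteA_02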
Data ONTAP 8.2
- Determine the aggregate and the failed plex from the 'aggr status -v' output:
> aggr status -v
           Aggr State           Status                Options
          aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp, raidsize=14,
                                mirrored              ignore_inconsistent=off, snapmirrored=off,
                                64-bit                resyncsnaptime=60, fs_size_fixed=off,
                                                      lost_write_protect=on, ha_policy=cfo,
                                                      hybrid_enabled=off, percent_snapshot_space=15%,
                                                      free_space_realloc=off

                Volumes: vol1, vol2, vol3

                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums

                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
A status of 'partial' indicates an inoperable plex.
- Determine the reason for the plex failure (example commands follow the list below):
- disk
- shelf
- switch ISL failure
- site failure
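Failed disks and the RAID state of each plex can be checked from the 7-Mode CLI. A minimal sketch, assuming standard Data ONTAP 8.2 7-Mode commands; 'vol status -f' lists broken disks, and 'aggr status -r' shows the RAID layout and disk status of every plex:
> vol status -f
> aggr status -r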
Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex returns to an operational state, the resynchronization process should start automatically and no further action is required. You can monitor the resynchronization process using:
> sysconfig -r
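If the plex stays offline after the underlying fault is fixed, it may need to be brought online manually before resynchronization starts. A hedged sketch, assuming the plex form of the 7-Mode 'aggr online' command and the aggr1/plex1 names from the example above:
> aggr online aggr1/plex1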
- If the cause cannot be rectified - for example, genuine disk (hardware) failures, a site failure, a power surge, or similar - and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.
Note: Destroying and recreating a plex requires a full mirror baseline resynchronization to take place. Ensure that there are sufficient spare disks in the pool to recreate the mirror.
To destroy and recreate the mirror, perform the following steps:
aggr destroy <aggr_name>/<degraded_plex_name>
aggr mirror <aggr_name>
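A worked example, using the aggr1/plex1 names from the output above; the names are illustrative, and 'aggr destroy' asks for confirmation before removing the plex. 'aggr status -s' can be run first to confirm that enough spare disks are available:
> aggr status -s
> aggr destroy aggr1/plex1
> aggr mirror aggr1
> sysconfig -r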
Additional Information
If you require assistance with troubleshooting the plex failure, or need any further help, contact NetApp Technical Support.