NetApp Knowledge Base

SyncMirror Plex Failed - AutoSupport message

Category:
metrocluster
Specialty:
metrocluster

Applies to

  • MetroCluster
  • Data ONTAP 8
  • ONTAP 9
  • SyncMirror

Event Summary

The AutoSupport message SYNCMIRROR PLEX FAILED indicates that one plex of a SyncMirror (mirrored) aggregate has failed and that the mirror is in a degraded state: data is being served from the remaining plex only.

Validate

Determine which mirrored aggregates have a failed plex. An affected aggregate reports a RAID Status of "mirror degraded":

storage aggregate show
 
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
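
As a minimal sketch, assuming you have saved the `storage aggregate show` output as text (for example, from an SSH session log), the Python below groups each aggregate with its wrapped RAID Status column and flags the ones containing "degraded". The function names are hypothetical, and the parsing relies only on the layout shown above: a long aggregate name sits alone on an unindented line, and the RAID Status words wrap onto indented continuation lines.

```python
# Minimal sketch: flag mirrored aggregates whose RAID Status is "degraded"
# in saved `storage aggregate show` output. Function names are hypothetical.

# Output copied from the example above; replace with your own capture.
SAMPLE = """\
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded
"""


def raid_status_by_aggregate(text: str) -> dict[str, str]:
    """Map each aggregate name to its unwrapped RAID Status text."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith(("Aggregate", "---")):
            continue  # skip blank lines, the header, and the separator
        fields = line.split()
        if not line[0].isspace():
            # An unindented line starts a new aggregate. A long name sits
            # alone; a short one may share the line with the data columns.
            current = fields[0]
            records[current] = []
            if len(fields) > 1:
                records[current].append(fields[-1])
        elif current is not None and not records[current]:
            # Data row under a wrapped name: RAID Status begins in the last field.
            records[current].append(fields[-1])
        elif current is not None:
            # Remaining indented lines hold wrapped RAID Status text only.
            records[current].append(line.strip())
    return {name: " ".join(parts) for name, parts in records.items()}


def degraded_aggregates(text: str) -> list[str]:
    """Return aggregates whose RAID Status mentions 'degraded'."""
    return [name for name, status in raid_status_by_aggregate(text).items()
            if "degraded" in status]


if __name__ == "__main__":
    print(degraded_aggregates(SAMPLE))  # ['aggr1_siteA_02']
```

For the sample, `aggr0_siteA_02` resolves to "raid_dp, mirrored, normal" and `aggr1_siteA_02` to "raid_dp, mirror degraded", so only the latter is flagged.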

Resolution

ONTAP 9
  1. Determine the aggregate and the failed plex:

siteA::> storage aggregate show
 
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                   normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                   degraded

 

A RAID Status of "mirror degraded" indicates that one plex of the aggregate is inoperable.

  2. Determine the reason for the plex failure, for example:
    • disk
    • shelf
    • switch or ISL failure
    • site failure

SiteA::> storage aggregate show-status -aggregate aggr2_siteA_02

Owner Node: SiteA-02
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.11.4                       0   SAS    10000   1.09TB   1.09TB (normal)
     parity   3.11.5                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.6                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.18                      0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.19                      0   SAS    10000   1.09TB   1.09TB (normal)

  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  FAILED                       -   -          -   1.09TB       0B (failed)
     parity   FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
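
The `storage aggregate show-status` output above can also be checked programmatically. This hypothetical Python sketch, assuming output laid out like the example, lists each plex with its state and counts the disk rows and FAILED entries in its RAID groups. Comparing the surviving plex's disk count with your spare inventory gives only a rough idea of how many spares a recreated plex would need; treat it as an estimate, not a sizing rule.

```python
import re

# Minimal sketch: summarise plexes in saved `storage aggregate show-status`
# output. The regexes assume the layout shown in the example above.
SAMPLE = """\
Owner Node: SiteA-02
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
     Position Disk     Pool Type  RPM   Usable Physical Status
     dparity  3.11.4    0   SAS  10000  1.09TB  1.09TB (normal)
     parity   3.11.5    0   SAS  10000  1.09TB  1.09TB (normal)
     data     3.11.6    0   SAS  10000  1.09TB  1.09TB (normal)
     data     3.11.18   0   SAS  10000  1.09TB  1.09TB (normal)
     data     3.11.19   0   SAS  10000  1.09TB  1.09TB (normal)
  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
     dparity  FAILED    -   -       -   1.09TB      0B (failed)
     parity   FAILED    -   -       -   1.09TB      0B (failed)
     data     FAILED    -   -       -   1.09TB      0B (failed)
     data     FAILED    -   -       -   1.09TB      0B (failed)
     data     FAILED    -   -       -   1.09TB      0B (failed)
"""

PLEX_RE = re.compile(r"^\s*Plex:\s+(\S+)\s+\(([^)]*)\)")
# Disk rows start with a RAID position: tparity, dparity, parity, or data.
DISK_RE = re.compile(r"^\s*(tparity|dparity|parity|data)\s+(\S+)")


def plex_report(text: str) -> dict[str, dict]:
    """Map each plex path to its states, disk count, and FAILED count."""
    plexes: dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        plex = PLEX_RE.match(line)
        if plex:
            current = plex.group(1)
            plexes[current] = {
                "state": [s.strip() for s in plex.group(2).split(",")],
                "disks": 0,
                "failed_disks": 0,
            }
            continue
        disk = DISK_RE.match(line)
        if disk and current is not None:
            plexes[current]["disks"] += 1
            if disk.group(2) == "FAILED":
                plexes[current]["failed_disks"] += 1
    return plexes


def failed_plexes(report: dict[str, dict]) -> list[str]:
    """Return plexes whose state list includes 'failed'."""
    return [path for path, info in report.items() if "failed" in info["state"]]


if __name__ == "__main__":
    report = plex_report(SAMPLE)
    for path, info in report.items():
        print(path, info["state"], info["disks"], "disks,",
              info["failed_disks"], "failed")
```

For the example output, this reports `/aggr2_siteA_02/plex1` as failed with all five of its disk rows shown as FAILED, while `plex0` has five healthy disks.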

Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex comes back online, resynchronization should start automatically and no further action is required. You can monitor the resynchronization progress using:

SiteA::> storage aggregate plex show

 

  3. If the cause cannot be rectified (for example, genuine disk hardware failures, a site failure, a power surge, or similar) and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.

Note: Destroying and recreating a plex requires a full baseline resynchronization of the mirror. Ensure that there are sufficient spare disks in the pool to recreate the mirror.

To destroy the failed plex and recreate the mirror, run the following commands:

  • storage aggregate plex delete -aggregate <aggr_name> -plex <degraded_plex_name>
  • storage aggregate mirror -aggregate <aggr_name>
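
Because `storage aggregate plex delete` removes a plex permanently, it can be worth building the command pair from values you have already verified rather than typing them by hand. The hypothetical Python helper below only formats the two commands listed above for review; it does not connect to the cluster, and it refuses to emit a delete for a plex whose state (as printed by `storage aggregate show-status`) does not include `failed`. That guard is an illustrative safety check, not an ONTAP feature.

```python
# Hypothetical helper: format (not execute) the ONTAP 9 commands that delete
# a failed plex and re-mirror the aggregate.


def recovery_commands(aggregate: str, plex: str, plex_state: str) -> list[str]:
    """Return the plex delete and mirror commands for review.

    plex_state is the parenthesised state text from
    `storage aggregate show-status`, e.g. "offline, failed, inactive, pool1".
    """
    states = {s.strip() for s in plex_state.split(",")}
    if "failed" not in states:
        # Illustrative guard: never emit a delete for a plex not reported failed.
        raise ValueError(f"plex {plex!r} is not reported as failed: {plex_state!r}")
    return [
        f"storage aggregate plex delete -aggregate {aggregate} -plex {plex}",
        f"storage aggregate mirror -aggregate {aggregate}",
    ]


if __name__ == "__main__":
    for cmd in recovery_commands("aggr2_siteA_02", "plex1",
                                 "offline, failed, inactive, pool1"):
        print(cmd)
```

For the `show-status` example, this returns the delete command for `plex1` of `aggr2_siteA_02` followed by the mirror command; passing the healthy `plex0` state ("online, normal, active, pool0") raises `ValueError` instead.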

Data ONTAP 8.2
  1. Determine the aggregate and the failed plex from the 'aggr status -v' output:

> aggr status -v
           Aggr State           Status                Options
          aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp, raidsize=14,
                                mirrored              ignore_inconsistent=off, snapmirrored=off,
                                64-bit                resyncsnaptime=60, fs_size_fixed=off,
                                                      lost_write_protect=on, ha_policy=cfo,
                                                      hybrid_enabled=off, percent_snapshot_space=15%,
                                                      free_space_realloc=off
                Volumes: vol1, vol2, vol3

                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums
                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
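
A minimal Python sketch, assuming the `aggr status -v` text has been saved as a string, can pick out the inoperable plex: it treats a plex as inoperable when its state includes `failed` or when any of its RAID groups reports `partial`. The function name is hypothetical; the line formats are taken from the example output above.

```python
import re

# Minimal sketch: find inoperable plexes in saved `aggr status -v` output.
# Only the Plex and RAID group lines are parsed; option lines are ignored.
SAMPLE = """\
           Aggr State           Status                Options
          aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp
                                mirrored              ignore_inconsistent=off
                Volumes: vol1, vol2, vol3

                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums
                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
"""

PLEX_RE = re.compile(r"^\s*Plex\s+(\S+):\s+(.+)$")
RG_RE = re.compile(r"^\s*RAID group\s+(\S+):\s+([^,]+)")


def inoperable_plexes(text: str) -> list[str]:
    """Return plexes that are 'failed' or contain a 'partial' RAID group."""
    states: dict[str, list[str]] = {}
    partial: set[str] = set()
    for line in text.splitlines():
        plex = PLEX_RE.match(line)
        if plex:
            states[plex.group(1)] = [s.strip() for s in plex.group(2).split(",")]
            continue
        rg = RG_RE.match(line)
        if rg and rg.group(2).strip() == "partial":
            # "/aggr1/plex1/rg0" -> owning plex "/aggr1/plex1"
            partial.add(rg.group(1).rsplit("/", 1)[0])
    return sorted(p for p, s in states.items() if "failed" in s or p in partial)


if __name__ == "__main__":
    print(inoperable_plexes(SAMPLE))  # ['/aggr1/plex1']
```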

A RAID group status of 'partial' (on a plex reported as 'offline, failed') indicates an inoperable plex.

  2. Determine the reason for the plex failure, for example:
    • disk
    • shelf
    • switch or ISL failure
    • site failure

Resolving the underlying cause may bring the disks back into service and return the plex to an operable state. If the plex comes back online, resynchronization should start automatically and no further action is required. You can monitor the resynchronization progress using:

> sysconfig -r

  3. If the cause cannot be rectified (for example, genuine disk hardware failures, a site failure, a power surge, or similar) and not enough disks can be brought back into service for the plex to come online, the plex cannot be repaired and must be destroyed and recreated.

Note: Destroying and recreating a plex requires a full baseline resynchronization of the mirror. Ensure that there are sufficient spare disks in the pool to recreate the mirror.

To destroy the failed plex and recreate the mirror, run the following commands:

  • aggr destroy aggr1/plex1
  • aggr mirror aggr1

Additional Information

If you need assistance troubleshooting the plex failure, or any further assistance, contact NetApp Technical Support.


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.