SyncMirror plex failure reported after aggregate healing is performed
- Category:
- metrocluster
- Last Updated:
- 2/1/2025, 12:49:14 AM
Applies to
- ONTAP 9
- MetroCluster FC
- FlexArray with NetApp E-Series backend
Issue
- Following a forced-on-disaster switchover, aggregate healing is run on the surviving site and reports success:
mcc.drsom.fsmStateTrans:debug]: params: {'from_state': 'heal_aggrs_in_progress', 'event': 'success'}
mcc.drsom.fsmStateEntry:debug]: params: {'state': 'heal_aggrs_complete'}
- SyncMirror plex failures are logged immediately afterward:
raid.assim.rg.missingChild:debug]: Aggregate stor168sp4, rgobj_verify: RAID object 0 has only 4 valid children, expected 5.
raid.assim.plex.missingChild:debug]: Aggregate stor168sp4, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline
raid.assim.rg.missingChild:debug]: Aggregate stor168sp1, rgobj_verify: RAID object 0 has only 5 valid children, expected 6.
raid.assim.plex.missingChild:debug]: Aggregate stor168sp1, plexobj_verify: Plex 0 only has 0 working RAID groups (1 total) and is being taken offline
raid.assim.rg.missingChild:debug]: Aggregate stor168sp15, rgobj_verify: RAID object 0 has only 4 valid children, expected 5.
raid.assim.plex.missingChild:debug]: Aggregate stor168sp15, plexobj_verify: Plex 6 only has 0 working RAID groups (1 total) and is being taken offline
- The remote mirror plexes of the switched-over aggregates are missing LUNs:
Aggregate stor168sp4 (online, raid0, mirror degraded) (block checksums)
Plex /stor168sp4/plex0 (offline, failed, inactive)
RAID group /stor168sp4/plex0/rg0 (partial, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
data FAILED N/A 13972000/ -
data lns24bb1:14.126L28 0f - - 0 LUN N/A 13972000/28614656000 14000000/28672000000
data lns24ab1:14.126L29 0e - - 0 LUN N/A 13972000/28614656000 14000000/28672000000
data lns24ab1:14.126L31 0e - - 0 LUN N/A 13972000/28614656000 14000000/28672000000
data lns24bb1:14.126L30 0f - - 0 LUN N/A 13972000/28614656000 14000000/28672000000
Raid group is missing 1 disk.
- The missing LUNs appear in the broken pool of the switchover cluster and are incorrectly owned by the switched-over cluster:
Aggregate stor168sp4 (failed, raid0, partial) (block checksums)
Plex /stor168sp4/plex0 (offline, failed, inactive)
RAID group /stor168sp4/plex0/rg0 (partial, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
data lns24ab1:14.126L27 0e - - 0 LUN N/A 13972000/28614656000 14000000/28672000000
data FAILED N/A 13972000/ -
data FAILED N/A 13972000/ -
data FAILED N/A 13972000/ -
data FAILED N/A 13972000/ -
Raid group is missing 4 disks.
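
When many aggregates are affected, the raid.assim.rg.missingChild events quoted above can be tallied with a short script. The following is a minimal sketch, not a NetApp tool: the function name `summarize_missing_children` and the regular expression are assumptions, written only against the message wording shown in this article, and they may need adjusting for other ONTAP releases or log formats.

```python
import re

# Assumed pattern, derived only from the raid.assim.rg.missingChild
# message text quoted in this article.
RG_MISSING = re.compile(
    r"raid\.assim\.rg\.missingChild:\w+\]: Aggregate (?P<aggr>[^,\s]+), "
    r"rgobj_verify: RAID object (?P<rg>\d+) has only (?P<valid>\d+) "
    r"valid children, expected (?P<expected>\d+)\."
)


def summarize_missing_children(lines):
    """Return {aggregate: number of missing RAID children} for matching events."""
    summary = {}
    for line in lines:
        match = RG_MISSING.search(line)
        if match is None:
            continue  # not a missingChild event for a RAID group
        missing = int(match["expected"]) - int(match["valid"])
        aggr = match["aggr"]
        # Sum across RAID groups if an aggregate logs more than one event.
        summary[aggr] = summary.get(aggr, 0) + missing
    return summary


# The three RAID group events from the Issue section.
sample = [
    "raid.assim.rg.missingChild:debug]: Aggregate stor168sp4, rgobj_verify: "
    "RAID object 0 has only 4 valid children, expected 5.",
    "raid.assim.rg.missingChild:debug]: Aggregate stor168sp1, rgobj_verify: "
    "RAID object 0 has only 5 valid children, expected 6.",
    "raid.assim.rg.missingChild:debug]: Aggregate stor168sp15, rgobj_verify: "
    "RAID object 0 has only 4 valid children, expected 5.",
]

print(summarize_missing_children(sample))
# → {'stor168sp4': 1, 'stor168sp1': 1, 'stor168sp15': 1}
```

For stor168sp4, the count of one missing child agrees with the "Raid group is missing 1 disk." line in the surviving site's RAID output.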