Volume move forced cutover caused interruption of hosted FlexCache volumes


Category: ontap-9

Specialty: core

Applies to

ONTAP 9
FlexCache

Issue

A volume move operation is started on a volume hosting several FlexCache volumes.
The cutover takes too long to complete and is aborted, after which the following error is seen:

cluster::*> volume move show -instance

Vserver Name: vserverA
Volume Name: volumeA
Volume Instance UUID: b3a1c763-2446-11eb-8ae3-d039ea00000
Destination Aggregate: aggr1
Destination Node: node05
Detailed Status: Volume move job cleaning up.
Error: Volume move job stopped by user "admin".
Estimated Time of Completion: -
Internal Progress of Move: Volume move job cleaning up.
Actual State of Job: CleanupState
Job ID: 120749
Job UUID: 08d17cb7-0c07-11ee-80ec-d039e00000
Managing Node: node09
Percentage Complete: -
Move Phase: cleaning_up
Prior Issues Encountered:
    6/18/2023 18:55:20 : Volume move job stopped by user "admin".
    6/18/2023 18:55:20 : Move transfer failed: Device busy
    6/18/2023 14:43:41 : Preparing source volume for cutover: Timeout: Operation "srcVolMoveObject_lockdown_iterator::create_imp()" took longer than 200 seconds to complete [from mgwd on node "node09" (VSID: -3) to kernel at 127.0.0.1]
    6/18/2023 14:38:24 : Preparing source volume for cutover: Volume quiesce failed because there are outstanding file system requests on the volume (Volume can't be quiesced as it did not drain in time.)
    6/18/2023 08:25:16 : Preparing source volume for cutover: The volume is involved in a SnapMirror operation and cannot be moved until the SnapMirror operation is complete. Wait for the SnapMirror operation to finish or abort the SnapMirror operation by issuing a 'snapmirror abort -hard true' command.
Estimated Remaining Duration: -
Replication Throughput: 340KB/s
Duration of Move: 4 days 12:30
Source Aggregate: aggr1
Source Node: node09
Start Time of Move: Thu Jun 15 22:31:24 2023
Move State: warning
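The "Prior Issues Encountered" entries in the output above are listed newest first. As a rough sketch (not a NetApp tool), the following Python snippet reorders such entries chronologically so the sequence of failures leading up to the abort is easier to follow. The function name `parse_prior_issues`, the one-entry-per-line input, and the month/day/year timestamp format are assumptions based on this single sample; the message texts are shortened here.

```python
from datetime import datetime

# Assumed timestamp format (month/day/year), inferred from the sample output only.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

def parse_prior_issues(lines):
    """Parse 'timestamp : message' entries and return them oldest first."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only on the first ' : ' so colons inside the message are kept.
        stamp, message = line.split(" : ", 1)
        events.append((datetime.strptime(stamp, TIMESTAMP_FORMAT), message))
    # sorted() is stable, so entries with equal timestamps keep their input order.
    return sorted(events, key=lambda event: event[0])

# Shortened copies of the entries from the output above (hypothetical input list).
sample_issues = [
    '6/18/2023 18:55:20 : Volume move job stopped by user "admin".',
    "6/18/2023 18:55:20 : Move transfer failed: Device busy",
    "6/18/2023 14:43:41 : Preparing source volume for cutover: Timeout",
    "6/18/2023 14:38:24 : Preparing source volume for cutover: Volume quiesce failed",
    "6/18/2023 08:25:16 : Preparing source volume for cutover: The volume is involved in a SnapMirror operation",
]

for when, message in parse_prior_issues(sample_issues):
    print(when.isoformat(sep=" "), "|", message)
```

Read oldest first, the sample shows the cutover preparation blocked by a SnapMirror operation, then a failed quiesce, then a lockdown timeout, and finally the user abort.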

The error generates a Vreport:

cluster::*> debug vreport show

volume Differences:

Name              Reason                                Attributes
--------          -------                               ---------------------------------------------------
vserverA:volumeA  Present in VLDB and WAFL volume busy
                                                        Node Name: scc67n09b
                                                        Volume DSID:1072 MSID:2147627075
                                                        UUID: unknown
                                                        Aggregate Name: aggr1
                                                        Aggregate UUID: df225b29-ecd2-4168-989e-0d64e0b0fa80
                                                        Vserver UUID: 7fdb68ef-0f1d-11eb-b1dd-d039ea1f2ee7
                                                        AccessType: DP_READ_ONLY
                                                        StorageType: REGULAR
                                                        Constituent Role: none

junction Differences:

Name              Reason                                Attributes
--------          -------                               ---------------------------------------------------
vserverA:volumeA  Child volume not present in WAFL
                                                        Parent Info:
                                                        VolName: vserverA_rootvol MSID: 2147627028 DSID: 1025 vsID: 13
                                                        Child: (Not Present in WAFL)
                                                        VolName: sdg74_ipr_0030RePlAcEDoTzoneonly MSID: 0 DSID: 1072 vsID: 13

Attempting to clean up the Vreport fails with 'object not found'.
EMS shows Nblade.JunctionRootLookup2:error:

Mon Jun 19 23:42:21 -0700 [scc67n09b: nblade2: Nblade.JunctionRootLookup2:error]: Junction root lookup of a volume in Vserver 13 with MSID 2147627075 has failed for reason "SPINNP(264)".
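The MSID in this EMS event (2147627075) is the same MSID reported for vserverA:volumeA in the Vreport output. As an illustrative sketch only, the Python snippet below pulls the Vserver ID, MSID, and failure reason out of such an EMS line so they can be compared against Vreport entries. The regular expression and the function name `parse_junction_lookup_error` are assumptions derived from this one sample message, not a documented EMS format.

```python
import re

# Pattern assumed from the single EMS sample shown above.
EMS_PATTERN = re.compile(
    r"Junction root lookup of a volume in Vserver (?P<vsid>\d+) "
    r'with MSID (?P<msid>\d+) has failed for reason "(?P<reason>[^"]+)"'
)

def parse_junction_lookup_error(line):
    """Return vsid, msid and reason from an EMS line, or None if it does not match."""
    match = EMS_PATTERN.search(line)
    if match is None:
        return None
    return {
        "vsid": int(match.group("vsid")),
        "msid": int(match.group("msid")),
        "reason": match.group("reason"),
    }

sample_ems = (
    "Mon Jun 19 23:42:21 -0700 [scc67n09b: nblade2: "
    "Nblade.JunctionRootLookup2:error]: Junction root lookup of a volume in "
    'Vserver 13 with MSID 2147627075 has failed for reason "SPINNP(264)".'
)

print(parse_junction_lookup_error(sample_ems))
```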

All hosted FlexCache volumes become inaccessible to clients after the forced cutover is aborted.