Volume move forced cutover caused interruption of hosted FlexCache volumes
Applies to
- ONTAP 9
- FlexCache
Issue
- A customer starts a volume move operation on a volume hosting several FlexCache volumes
- The following error is seen after the cutover is aborted because it was taking too long to complete:
cluster::*> volume move show -instance
Vserver Name: vserverA
Volume Name: volumeA
Volume Instance UUID: b3a1c763-2446-11eb-8ae3-d039ea00000
Destination Aggregate: aggr1
Destination Node: node05
Detailed Status: Volume move job cleaning up.
Error: Volume move job stopped by user "admin".
Estimated Time of Completion: -
Internal Progress of Move: Volume move job cleaning up.
Actual State of Job: CleanupState
Job ID: 120749
Job UUID: 08d17cb7-0c07-11ee-80ec-d039e00000
Managing Node: node09
Percentage Complete: -
Move Phase: cleaning_up
Prior Issues Encountered: 6/18/2023 18:55:20 : Volume move job stopped by user "admin".
6/18/2023 18:55:20 : Move transfer failed: Device busy
6/18/2023 14:43:41 : Preparing source volume for cutover: Timeout: Operation "srcVolMoveObject_lockdown_iterator::create_imp()" took longer than 200 seconds to complete [from mgwd on node "node09" (VSID: -3) to kernel at 127.0.0.1]
6/18/2023 14:38:24 : Preparing source volume for cutover: Volume quiesce failed because there are outstanding file system requests on the volume (Volume can't be quiesced as it did not drain in time.)
6/18/2023 08:25:16 : Preparing source volume for cutover: The volume is involved in a SnapMirror operation and cannot be moved until the SnapMirror operation is complete. Wait for the SnapMirror operation to finish or abort the SnapMirror operation by issuing a 'snapmirror abort -hard true' command.
Estimated Remaining Duration: -
Replication Throughput: 340KB/s
Duration of Move: 4 days 12:30
Source Aggregate: aggr1
Source Node: node09
Start Time of Move: Thu Jun 15 22:31:24 2023
Move State: warning
- The error generates a vreport:
cluster::*> debug vreport show
volume Differences:
Name Reason Attributes
-------- ------- ---------------------------------------------------
vserverA:volumeA Present in VLDB and WAFL volume busy
Node Name: scc67n09b
Volume DSID:1072 MSID:2147627075
UUID: unknown
Aggregate Name: aggr1
Aggregate UUID: df225b29-ecd2-4168-989e-0d64e0b0fa80
Vserver UUID: 7fdb68ef-0f1d-11eb-b1dd-d039ea1f2ee7
AccessType: DP_READ_ONLY
StorageType: REGULAR
Constituent Role: none
junction Differences:
Name Reason Attributes
-------- ------- ---------------------------------------------------
vserverA:volumeA Child volume not present in WAFL
Parent Info:
VolName: vserverA_rootvol
MSID: 2147627028 DSID: 1025 vsID: 13
Child: (Not Present in WAFL)
VolName: sdg74_ipr_0030RePlAcEDoTzoneonly
MSID: 0 DSID: 1072 vsID: 13
- Attempting to clean up the vreport results in an 'object not found' error
- EMS shows Nblade.JunctionRootLookup2:error:
Mon Jun 19 23:42:21 -0700 [scc67n09b: nblade2: Nblade.JunctionRootLookup2:error]: Junction root lookup of a volume in Vserver 13 with MSID 2147627075 has failed for reason "SPINNP(264)".
- All hosted FlexCache volumes become inaccessible from clients after the forced cutover is aborted