NetApp Knowledge Base

Volume move forced cutover caused interruption of hosted FlexCache volumes

Category: ontap-9
Specialty: CORE
Last Updated: 9/16/2024, 1:52:52 PM

Applies to

  • ONTAP 9
  • FlexCache

Issue

  • A volume move is started on a volume that hosts several FlexCache volumes
  • After the cutover is aborted because it is taking too long to complete, the following error is reported:

cluster::*> volume move show -instance

                            Vserver Name: vserverA
                             Volume Name: volumeA
                    Volume Instance UUID: b3a1c763-2446-11eb-8ae3-d039ea00000
                   Destination Aggregate: aggr1
                        Destination Node: node05
                         Detailed Status: Volume move job cleaning up.
                                   Error: Volume move job stopped by user "admin".
            Estimated Time of Completion: -
               Internal Progress of Move: Volume move job cleaning up.
                     Actual State of Job: CleanupState
                                  Job ID: 120749
                                Job UUID: 08d17cb7-0c07-11ee-80ec-d039e00000
                           Managing Node: node09
                     Percentage Complete: -
                              Move Phase: cleaning_up
                Prior Issues Encountered: 6/18/2023 18:55:20 : Volume move job stopped by user "admin".
                                          6/18/2023 18:55:20 : Move transfer failed: Device busy
                                          6/18/2023 14:43:41 : Preparing source volume for cutover: Timeout: Operation "srcVolMoveObject_lockdown_iterator::create_imp()" took longer than 200 seconds to complete [from mgwd on node "node09" (VSID: -3) to kernel at 127.0.0.1]
                                          6/18/2023 14:38:24 : Preparing source volume for cutover: Volume quiesce failed because there are outstanding file system requests on the volume (Volume can't be quiesced as it did not drain in time.)
                                          6/18/2023 08:25:16 : Preparing source volume for cutover: The volume is involved in a SnapMirror operation and cannot be moved until the SnapMirror operation is complete. Wait for the SnapMirror operation to finish or abort the SnapMirror operation by issuing a 'snapmirror abort -hard true' command.
            Estimated Remaining Duration: -
                  Replication Throughput: 340KB/s
                        Duration of Move: 4 days 12:30
                        Source Aggregate: aggr1
                             Source Node: node09
                      Start Time of Move: Thu Jun 15 22:31:24 2023
                              Move State: warning
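
When this output is captured as text (for example, in a case log), the fields of interest can be extracted with a short script. The following is a minimal sketch in Python, not a NetApp tool. The names `parse_volume_move`, `TIMESTAMP`, and `SAMPLE` are hypothetical. It assumes only the `Field: value` layout shown above, with timestamped lines continuing the previous field.

```python
import re

# A line that starts with a timestamp such as "6/18/2023 18:55:20 : ..."
# is treated as a continuation of the previous field
# (here, "Prior Issues Encountered").
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2} : ")


def parse_volume_move(text):
    """Return {field name: [values]} from captured 'Field: value' lines."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if last_key is not None and TIMESTAMP.match(line):
            fields[last_key].append(line)
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'Field: value' line (for example, the prompt)
        fields[key] = [value]
        last_key = key
    return fields


# Shortened excerpt of the output above, used only as sample input.
SAMPLE = """\
                              Move Phase: cleaning_up
                Prior Issues Encountered: 6/18/2023 18:55:20 : Volume move job stopped by user "admin".
6/18/2023 18:55:20 : Move transfer failed: Device busy
                              Move State: warning
"""

fields = parse_volume_move(SAMPLE)
print(fields["Move Phase"][0], fields["Move State"][0])
for issue in fields["Prior Issues Encountered"]:
    print(issue)
```

Keeping every value as a list lets multi-line fields such as `Prior Issues Encountered` be handled the same way as single-line fields.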

  • The error generates a vreport discrepancy:

cluster::*> debug vreport show
volume Differences:

Name             Reason   Attributes
--------         -------  ---------------------------------------------------
vserverA:volumeA      Present in VLDB and WAFL volume busy
                          Node Name: scc67n09b
                          Volume DSID:1072 MSID:2147627075
                          UUID: unknown
                          Aggregate Name: aggr1
                          Aggregate UUID: df225b29-ecd2-4168-989e-0d64e0b0fa80
                          Vserver UUID: 7fdb68ef-0f1d-11eb-b1dd-d039ea1f2ee7
                          AccessType: DP_READ_ONLY
                          StorageType: REGULAR
                          Constituent Role: none

junction Differences:
Name             Reason   Attributes
--------         -------  ---------------------------------------------------
vserverA:volumeA Child volume not present in WAFL
                          Parent Info:
                          VolName: vserverA_rootvol
                          MSID: 2147627028 DSID: 1025 vsID: 13
                          Child: (Not Present in WAFL)
                          VolName: sdg74_ipr_0030RePlAcEDoTzoneonly
                          MSID: 0 DSID: 1072 vsID: 13

  • Attempts to clean up the vreport discrepancies fail with 'object not found'
  • EMS shows Nblade.JunctionRootLookup2:error:

Mon Jun 19 23:42:21 -0700 [scc67n09b: nblade2: Nblade.JunctionRootLookup2:error]: Junction root lookup of a volume in Vserver 13 with MSID 2147627075 has failed for reason "SPINNP(264)".
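
The MSID in this EMS event (2147627075) is the same MSID that the vreport lists for vserverA:volumeA, which ties the failed junction root lookup to the volume that was being moved. A minimal Python sketch of that cross-check follows. The function names are hypothetical, and the regular expressions are assumptions that match only the message formats shown in this article.

```python
import re

# Patterns written against the exact text shown in this article (assumption).
EMS_RE = re.compile(
    r'Vserver (\d+) with MSID (\d+) has failed for reason "([^"]+)"'
)
VREPORT_RE = re.compile(r"Volume DSID:\s*(\d+)\s+MSID:\s*(\d+)")


def ems_lookup_failure(line):
    """Extract Vserver ID, MSID, and reason from a JunctionRootLookup2 event."""
    match = EMS_RE.search(line)
    if match is None:
        return None
    return {
        "vsid": int(match.group(1)),
        "msid": int(match.group(2)),
        "reason": match.group(3),
    }


def vreport_msids(text):
    """Collect the MSIDs listed under 'volume Differences' in vreport text."""
    return {int(m.group(2)) for m in VREPORT_RE.finditer(text)}


EMS_LINE = (
    "Mon Jun 19 23:42:21 -0700 [scc67n09b: nblade2: "
    "Nblade.JunctionRootLookup2:error]: Junction root lookup of a volume in "
    'Vserver 13 with MSID 2147627075 has failed for reason "SPINNP(264)".'
)
VREPORT_TEXT = "                          Volume DSID:1072 MSID:2147627075\n"

event = ems_lookup_failure(EMS_LINE)
if event is not None and event["msid"] in vreport_msids(VREPORT_TEXT):
    print("EMS MSID", event["msid"], "also appears in the vreport")
```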

  • All hosted FlexCache volumes become inaccessible to clients after the forced cutover is aborted

 

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.