Giveback/Takeover of a FabricPool aggregate fails, but the FabricPool destination is pingable from the home node

Category:: ontap-9

Specialty:: ontapselect

Applies to

ONTAP 9
ONTAP Select
FabricPool
Cloud Volumes ONTAP (CVO)

Issue

An attempt to perform an SFO giveback of a FabricPool aggregate fails with the following errors:

[Node-01: cf_giveback: gb.netra.ca.check.failed:error]: Giveback of aggregate 'aggr1' (uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) failed due to Object store is not reachable on destination preventing object store access on the destination node.
[Node-01: cf_giveback: sfo.sendhome.subsystemAbort:alert]: The giveback operation of 'aggr1' was aborted by 'fabric pools'.
[Node-01: cf_giveback: sfo.giveback.aggrProcessTime:debug]: Time taken to destination check during giveback of the aggregate 'aggr1' was 44 milliseconds.
[Node-01: cf_giveback: sfo.giveback.failed:alert]: Giveback of aggregate aggr1 failed due to destination check failed.
[Node-01: cf_giveback: ha.giveback.sysAbort:debug]: Subsystem wafl took 12 msecs to abort giveback of aggregate 'aggr1'.
[Node-01: cf_giveback: ha.giveback.sysAbort:debug]: Subsystem lock_manager_NDO took 12 msecs to abort giveback of aggregate 'aggr1'.
[Node-01: cf_giveback: ha.giveback.sysAbort:debug]: Subsystem coredump took 12 msecs to abort giveback of aggregate 'aggr1'.
[Node-01: cf_giveback: ha.giveback.totalAbort:debug]: Total time taken to abort the giveback of aggregate 'aggr1' was 36 msecs.
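The EMS entries above all follow a `[node: process: event.name:severity]: message` layout, so when they arrive as one long run of text it can help to split them and keep only the higher-severity events. The following is a minimal Python sketch of that idea, not a NetApp tool. The names `EmsEvent`, `parse_ems`, and `actionable` are hypothetical. The set of severities treated as actionable (`error`, `alert`) is an assumption based only on the sample shown here.

```python
import re
from typing import List, NamedTuple


class EmsEvent(NamedTuple):
    node: str
    process: str
    name: str
    severity: str
    message: str


# Header of one entry: "[node: process: event.name:severity]: "
HEADER = re.compile(
    r"\[(?P<node>[^:\[\]]+): (?P<process>[^:\[\]]+): "
    r"(?P<name>[\w.]+):(?P<severity>\w+)\]: "
)


def parse_ems(text: str) -> List[EmsEvent]:
    """Split a run of EMS entries into structured events."""
    headers = list(HEADER.finditer(text))
    events = []
    for i, h in enumerate(headers):
        # The message runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        message = text[h.end():end].strip()
        events.append(
            EmsEvent(h["node"], h["process"], h["name"], h["severity"], message)
        )
    return events


def actionable(events: List[EmsEvent]) -> List[EmsEvent]:
    """Keep events whose severity is error or alert (assumed set for this sketch)."""
    return [e for e in events if e.severity in {"error", "alert"}]


if __name__ == "__main__":
    sample = (
        "[Node-01: cf_giveback: sfo.sendhome.subsystemAbort:alert]: "
        "The giveback operation of 'aggr1' was aborted by 'fabric pools'. "
        "[Node-01: cf_giveback: ha.giveback.totalAbort:debug]: "
        "Total time taken to abort the giveback of aggregate 'aggr1' was 36 msecs."
    )
    for event in actionable(parse_ems(sample)):
        print(event.node, event.name, event.severity, event.message, sep=" | ")
```

Applied to the giveback log above, this filter would keep the `gb.netra.ca.check.failed`, `sfo.sendhome.subsystemAbort`, and `sfo.giveback.failed` entries and drop the `debug` timing messages.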

An attempt to take over a node fails with the following error:

[node1: sfo_arl_worker: sfo.tkAbort.ca.check.failed:error]: Planned SFO takeover of aggregate 'aggr1' (uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) failed due to Object store is not reachable on destination.

cluster1::> storage failover show-takeover
Node       Node Status           Aggregate      Takeover Status
---------- --------------------- -------------- -------------------------------
node1      Optimized takeover
           by partner aborted.
                                 aggr1          Previous takeover attempt was aborted because of a precheck failure.
                                                Failing module(s): "fabric pools".
                                                Use the "event log show -message-name sfo.tkAbort*" command to get more information, and follow the provided corrective actions.
                                                To abort the takeover, you can giveback the aggregates that were relocated as part of the attempted takeover using the "storage failover giveback -ofnode node1" command.
                                                If you want to continue the takeover, address the failures and then use the "storage failover takeover -ofnode node1" command.
                                                If you want to continue with the takeover without addressing the failures you can do so by using the "storage failover takeover -ofnode node1 -bypass-optimization true" command.
                                                Warning: Setting the "-bypass-optimization" parameter to true might result in a longer client outage during planned takeover.
                                 CFO aggregates Not attempted yet.

During an ONTAP upgrade, the automated nondisruptive upgrade (ANDU) pauses because the giveback times out:

[Node-02: notifyd: callhome.andu.pausederr:alert]: params: {'subject': 'AUTOMATED NDU PAUSED ON NODE: Node-01', 'epoch': 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'}

The object stores go into an unavailable state.
The FabricPool destination is reachable via ICMP from the intercluster LIFs on both nodes in the HA pair.