Root aggregate WAFL inconsistent after waiting for giveback canceled

Category: ontap-9

Specialty: core

Applies to

ONTAP 9

Issue

During maintenance involving failover, a node boots to the "Waiting for giveback" prompt, where Ctrl-C is pressed and the two prompts that follow are answered "n" and then "y":

Waiting for giveback...(Press Ctrl-C to abort wait)
This node was previously declared dead.
Pausing to check HA partner status ...
partner is operational and in takeover mode.
You must initiate a giveback or shutdown on the HA
partner in order to bring this node online.

The HA partner is currently operational and in takeover mode. This node cannot continue unless you initiate a giveback on the partner.
Once this is done this node will reboot automatically.

waiting for giveback...
Do you wish to halt this node rather than wait [y/n]? n

The HA partner appears to be either not operational or not in takeover
mode. You will be asked whether you want to continue. If you answer "yes", the
existing failover monitor disk state will be overwritten and this node will be
rebooted. Answering "no" will halt this node with no modification to the failover
monitor disk state.

WARNING: Answering "yes" while the HA partner is operational and in
takeover mode will have unexpected and potentially catastrophic results:
YOUR FILESYSTEMS MAY BE DESTROYED

Do you wish to continue [y/n]? y

Oct 01 12:07:31 [cluster-02:cf.fm.overwriteState:notice]: System continuing after overwriting failover monitor state!
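
When reviewing a saved console log for this sequence, it can help to split event lines such as the cf.fm.overwriteState message above into fields. The following is a minimal Python sketch, not a NetApp tool: the names EmsEvent, EMS_LINE and parse_ems_line are hypothetical, and the pattern assumes only the short line shape shown above (timestamp, then [node:event:severity]: message). Longer variants that add a weekday, UTC offset or process name are not handled.

```python
import re
from typing import NamedTuple, Optional


class EmsEvent(NamedTuple):
    """Fields of one console event line (hypothetical helper type)."""
    timestamp: str
    node: str
    event: str
    severity: str
    message: str


# Assumed line shape: "Oct 01 12:07:31 [node:event.name:severity]: message"
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Return the parsed fields, or None if the line has a different shape."""
    match = EMS_LINE.match(line.strip())
    return EmsEvent(**match.groupdict()) if match else None


if __name__ == "__main__":
    sample = (
        "Oct 01 12:07:31 [cluster-02:cf.fm.overwriteState:notice]: "
        "System continuing after overwriting failover monitor state!"
    )
    print(parse_ems_line(sample))
```

Lines that do not match, such as the prompts and PANIC messages, return None and can be handled separately.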

The taken-over node then reboots and may panic:

Warning: previous shutdown was dirty, there is a possible loss of data.
Oct 01 12:11:04 [cluster-02:wafl.root.content.changed:error]: Contents of the root volume '' might have changed. Verify that all recent configuration changes are still in effect.
PANIC : NVRAM contents are invalid...

After the panic, the node reboots to the ONTAP login prompt but repeatedly halts:

SP-login: login: HALT: HA partner has taken over (ic) on Sun Oct 1 12:35:34 CDT 2023

Later, the node that is still up panics due to a WAFL metadata inconsistency in the taken-over node's root volume:

Sun Oct 01 13:27:50 -0500 [cluster-02: wafl_exempt17: sk.panic:alert]: Panic String: Unrecoverable metadata block (file xxxx, block xxxxxxx, fbn xxxxxxx, level 1, file type 16) in aggregate partner:cluster02_root. WAFL inconsistent. Contact NetApp technical support.

The taken-over node, which previously halted when booted, now panics on boot attempts instead:

PANIC : Msg execution failed during replay, vol=vol0, msg=0xfffff70067600100, type=WAFL_WRITE, errno=192, replay_idx=1, coalesced=0 coalesced_cnt=63
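
To check whether a captured console log shows the sequence described in this Issue section, the messages quoted above can be searched for directly. The sketch below is illustrative only, not a supported diagnostic: the function name find_signatures and the short labels are hypothetical, and the patterns are simply substrings copied from the console output in this article.

```python
import re

# (label, pattern) pairs. Labels are hypothetical; patterns are substrings of
# the console messages quoted in this article.
SIGNATURES = [
    ("failover monitor state overwritten", re.compile(r"cf\.fm\.overwriteState")),
    ("root volume contents changed", re.compile(r"wafl\.root\.content\.changed")),
    ("NVRAM contents invalid panic", re.compile(r"PANIC\s*:\s*NVRAM contents are invalid")),
    ("halt because partner has taken over", re.compile(r"HALT: HA partner has taken over")),
    ("WAFL inconsistent panic", re.compile(r"WAFL inconsistent")),
    ("replay panic", re.compile(r"PANIC\s*:\s*Msg execution failed during replay")),
]


def find_signatures(log_text):
    """Return (line_number, label) for every line that matches a known message."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


if __name__ == "__main__":
    sample = (
        "Do you wish to continue [y/n]? y\n"
        "Oct 01 12:07:31 [cluster-02:cf.fm.overwriteState:notice]: System continuing\n"
        "PANIC : NVRAM contents are invalid...\n"
    )
    for lineno, label in find_signatures(sample):
        print(f"line {lineno}: {label}")
```

A match only indicates that the same message text appears in the log. It does not by itself confirm the root cause.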