Node reboots while partner is at Waiting for Giveback
Applies to
Issue
- During a takeover, the up-node rebooted unexpectedly while the partner (down) node was booting.
- The up-node's EMS log shows disk reservation conflicts just before the reboot:
Wed Aug 13 20:18:00 -0400 [Node1: disk_server_0: disk.reservationConflict.key:debug]: Device 4b.23.21 reported SCSI RESERVATION CONFLICT status. 0 milliseconds ago, the reservation key reported by this device was xxxxxxxxxxxxxxxx.
Wed Aug 13 20:18:00 -0400 [Node1: disk_server_1: disk.reservationConflict.key:debug]: Device 4b.23.22 reported SCSI RESERVATION CONFLICT status. 0 milliseconds ago, the reservation key reported by this device was xxxxxxxxxxxxxxxx.
Wed Aug 13 20:18:00 -0400 [Node1: disk_server_1: ha.resvConflictHalt:notice]: A disk reservation conflict was detected on disk 4b.23.22 at %-20s. Typically, this only occurs when the node was taken over by its partner.
- EMS then records the SK reboot and its initiator:
Wed Aug 13 20:18:00 -0400 [Node1: disk_server_1: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::fm_handleReserved+763".
- The up-node console shows a halt because the HA partner has taken over the disk reservations:
HALT: HA partner has taken over disk reservations
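- The down node's console log shows that it reached Waiting for Giveback (WFG). Ctrl-C was then pressed, the halt prompt was declined and the continue prompt was confirmed, so the node kept booting, took the disk reservations, and overwrote the failover monitor state: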
Waiting for giveback...(Press Ctrl-C to abort wait)
This node was previously declared dead.
Pausing to check HA partner status ...
partner is operational and in takeover mode.
You must initiate a giveback or shutdown on the HA
partner in order to bring this node online.
The HA partner is currently operational and in takeover mode.This node cannot continue unless you initiate a giveback on the partner.
Once this is done this node will reboot automatically.
waiting for giveback...
Do you wish to halt this node rather than wait [y/n]? n <<<<<<<<<
The HA partner appears to be either not operational or not in takeover
mode. You will be asked whether you want to continue. If you answer "yes", the
existing failover monitor disk state will be overwritten and this node will be
rebooted. Answering "no" will halt this node with no modification to the failover
monitor disk state.
WARNING: Answering "yes" while the HA partner is operational and in
takeover mode will have unexpected and potentially catastrophic results:
YOUR FILESYSTEMS MAY BE DESTROYED
Do you wish to continue [y/n]? y <<<<<<<
[Node2:cf.fm.overwriteState:notice]: System continuing after overwriting failover monitor state! <<<<<<<
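When reviewing logs from both nodes, the sequence above can be checked for mechanically. The following Python sketch is not a NetApp tool; it assumes the EMS lines from both nodes have been collected into one list of strings in the bracketed format quoted above, and the names `parse_events` and `forced_boot_during_takeover` are hypothetical. It flags the pattern described in this article: one node logs `cf.fm.overwriteState` while the other node logs a disk reservation conflict.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the bracketed EMS header in lines such as
#   [Node1: disk_server_0: disk.reservationConflict.key:debug]
#   [Node2:cf.fm.overwriteState:notice]
# The middle (thread) field is optional.
EMS_HEADER = re.compile(
    r"\[([^:\]]+):\s*(?:([^:\]]+):\s*)?([A-Za-z0-9_.]+):(\w+)\]"
)

# Event names quoted in this article; grouping them this way is an
# assumption for illustration, not an official classification.
RESERVATION_EVENTS = {"disk.reservationConflict.key", "ha.resvConflictHalt"}
OVERWRITE_EVENT = "cf.fm.overwriteState"


class EmsEvent(NamedTuple):
    node: str
    event: str
    severity: str


def parse_events(lines: Iterable[str]) -> List[EmsEvent]:
    """Extract (node, event, severity) from every line with an EMS header."""
    events = []
    for line in lines:
        match = EMS_HEADER.search(line)
        if match:
            node, _thread, event, severity = match.groups()
            events.append(EmsEvent(node.strip(), event, severity))
    return events


def forced_boot_during_takeover(lines: Iterable[str]) -> bool:
    """Return True if one node overwrote the failover monitor state while a
    different node logged a disk reservation conflict."""
    events = parse_events(lines)
    overwriting = {e.node for e in events if e.event == OVERWRITE_EVENT}
    conflicted = {e.node for e in events if e.event in RESERVATION_EVENTS}
    return bool(overwriting) and bool(conflicted - overwriting)


if __name__ == "__main__":
    sample = [
        "Wed Aug 13 20:18:00 -0400 [Node1: disk_server_1: ha.resvConflictHalt:notice]: "
        "A disk reservation conflict was detected on disk 4b.23.22",
        "[Node2:cf.fm.overwriteState:notice]: System continuing after "
        "overwriting failover monitor state!",
    ]
    print(forced_boot_during_takeover(sample))  # True
```

Console prompts such as `[y/n]` do not match the header pattern, so the combined console and EMS output can be passed in unfiltered.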