Both nodes offline and unable to boot after shutting down due to bug 1343620
Applies to
- ONTAP 9
Issue
- During a BMC update, the controllers become unresponsive and reboot, matching the symptoms of bug 1343620
- Prior to the panic, Active IQ AutoSupport alerts include:
HA Group Notification (SP HBT MISSED) NOTICE
HA Group Notification (SP HBT STOPPED) ALERT
- Both nodes report the following shutdown event in EMS:
[cluster-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
- Logging in to the BMC system console or a serial console connection shows that one node is stuck in a boot loop:
PANIC: NVRAM contents are invalid...
- Partner node panicked during its shutdown and cannot boot past the following message, since the partner had taken over and then shut itself down:
Waiting for reservations to clear
- Partner panic string:
Shutdown taking longer than 930 seconds in process nodewatchdog on release 9.10.1P4 (C)
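
The symptoms above can be checked against saved console or EMS log text with a short script. The following is a minimal sketch, not a NetApp tool: the function name match_symptoms and the signature labels are hypothetical, and the patterns are copied only from the messages quoted in this article. It assumes the log text has already been collected into a string.

```python
import re

# Signature patterns copied from the messages quoted in this article.
# The dictionary keys are hypothetical labels chosen for illustration.
SIGNATURES = {
    "sp_heartbeat_shutdown": re.compile(r"sp\.ipmi\.lost\.shutdown"),
    "nvram_invalid": re.compile(r"PANIC: NVRAM contents are invalid"),
    "reservations_wait": re.compile(r"Waiting for reservations to clear"),
    "shutdown_watchdog_panic": re.compile(
        r"Shutdown taking longer than \d+ seconds in process nodewatchdog"
    ),
}


def match_symptoms(log_text):
    """Return the sorted labels of all signatures found in log_text."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern.search(log_text)
    )


if __name__ == "__main__":
    sample = (
        "[cluster-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: "
        "SP heartbeat stopped and cannot be recovered.\n"
        "PANIC: NVRAM contents are invalid...\n"
    )
    print(match_symptoms(sample))
```

A match on several labels across both nodes suggests the logs are consistent with the symptoms listed here; it does not by itself confirm the bug.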