CHW-3386: AFF C30 down after SP HBT MISSED AND STOPPED with monitor.shutdown.emergency
Issue
- ONTAP events for the impacted node:
HA Group Notification (SP HBT MISSED) NOTICE
HA Group Notification (SP HBT STOPPED) ALERT
- Node console messages:
BIOS Version: 21.1.2
...
Waiting for BMC ...
BMC failure. Resetting BMC from primary FW. This can take a few minutes
Waiting for BMC ...
BMC failure. Resetting BMC from backup FW. This can take a few minutes
Waiting for BMC ...
Failed to recover BMC
IPMI PCI Slot Control failed.
Configuring Devices ...
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
IPMI:Get Management Card FRU inventory:failed
...
BIOS POST Failure(s) detected: BMC IPMI failure. Abort AUTOBOOT
*** command status = Error(-1)
- BMC status from node LOADER:
LOADER-A> bmc status
Firmware Version: Unknown
IPv4 Settings
Failed to get IPv4/IPv6 addressing enables: Unknown IPMI completion code.
Failed to get DHCP setting: Unknown IPMI completion code.
MAC Address: N/A - Error
Using DHCP: Failed to get DHCP setting: Unknown IPMI completion code.
N/A - Error
IP Address: Failed to get address: Unknown IPMI completion code.
N/A - Error
Gateway: Failed to get gateway: Unknown IPMI completion code.
N/A - Error
Netmask: Failed to get netmask: Unknown IPMI completion code.
N/A - Error
*** command status = Error(-1)
- BMC reported as offline while the node is running:
::> sp show -instance
Node: node_name
Type of Device: BMC
Status: offline
Is Network Configured: true
Public IP Address: 192.168.0.1
MAC Address: aa:bb:cc:dd:ee:ff
Firmware Version: 19.1P1
Part Number: NA
Serial Number: NA
Device Revision: 19.1P1
Is Firmware Autoupdate Enabled: true
...
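
The signatures listed above can be searched for in a saved console or event log capture. The following is a minimal Python sketch, not a NetApp tool: the function name `match_signatures` and the dictionary labels are hypothetical, and the patterns are copied only from the messages shown in this article.

```python
import re

# Patterns taken from the event and console messages quoted in this article.
# The labels on the left are illustrative names, not ONTAP identifiers.
SIGNATURES = {
    "hbt_missed": r"SP HBT MISSED",
    "hbt_stopped": r"SP HBT STOPPED",
    "bmc_recovery_failed": r"Failed to recover BMC",
    "post_failure": r"BIOS POST Failure\(s\) detected: BMC IPMI failure",
    "fru_read_failed": r"IPMI:Get .* FRU .*inventory:failed",
}


def match_signatures(log_text: str) -> dict:
    """Return a label -> bool map of which symptom strings appear in log_text."""
    return {
        label: re.search(pattern, log_text) is not None
        for label, pattern in SIGNATURES.items()
    }
```

Calling `match_signatures(open("console.log").read())` on a captured log (the file name is an assumption) returns one boolean per signature, which can help confirm whether a capture matches the symptoms described here before continuing.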