CFBMC-1904: On AFF A400, FAS8300, and FAS8700 systems, ONTAP reboots to recover the BMC after sp.heartbeat.stopped events
Issue
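On NetApp AFF A400, FAS8300, and FAS8700 systems, ONTAP might lose communication with the baseboard management controller (BMC) and report "sp.heartbeat.stopped" events and related AutoSupport messages. If the heartbeat does not resume, ONTAP issues an emergency shutdown and reboots the system to recover the BMC.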
The following are examples of the events that the system reports, in order of occurrence:
21:45:49 +0100 [cluster-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
21:57:32 +0100 [cluster-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
21:57:32 +0100 [cluster-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
22:09:09 +0100 [cluster-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
22:12:16 +0100 [cluster-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
22:22:16 +0100 [cluster-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
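
To check whether a saved event log shows the same sequence, the events can be parsed with a short script. The following is a minimal sketch, not a NetApp tool. It assumes that each line has the same layout as the examples above (time, UTC offset, then node, source, event name, and severity in square brackets) and that all lines are from the same day. The names EVENT_RE, parse_event, and seconds_to_shutdown are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the example lines only (not an official format spec):
# HH:MM:SS +ZZZZ [node: source: event.name:severity]: message
EVENT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<source>[^:]+): "
    r"(?P<event>[^:]+):(?P<severity>[^\]]+)\]: (?P<message>.*)$"
)

FIRST_EVENT = "sp.heartbeat.stopped"        # first sign of lost BMC heartbeat
LAST_EVENT = "monitor.shutdown.emergency"   # reboot to recover the BMC


def parse_event(line):
    """Return a dict for one event line, or None if the line does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # No date is present in the examples, so only the time of day is parsed.
    record["timestamp"] = datetime.strptime(record["time"], "%H:%M:%S %z")
    return record


def seconds_to_shutdown(lines):
    """Seconds from the first heartbeat-stopped event to the emergency shutdown.

    Returns None if either event is missing from the given lines.
    """
    events = [e for e in map(parse_event, lines) if e is not None]
    first = next((e for e in events if e["event"] == FIRST_EVENT), None)
    last = next((e for e in events if e["event"] == LAST_EVENT), None)
    if first is None or last is None:
        return None
    return int((last["timestamp"] - first["timestamp"]).total_seconds())


# Two of the example events from this article, copied verbatim.
SAMPLE_LINES = [
    "21:45:49 +0100 [cluster-01: spmgrd: sp.heartbeat.stopped:error]: "
    "Have not received a IPMI heartbeat from the Service Processor (SP) "
    "in last 600 seconds.",
    "22:22:16 +0100 [cluster-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: "
    "Emergency shutdown: Environmental Reason Shutdown "
    "(System reboot to recover the BMC)",
]

print(seconds_to_shutdown(SAMPLE_LINES))
```

For the example events in this article, the script reports the interval between the first sp.heartbeat.stopped event (21:45:49) and the emergency shutdown (22:22:16). Logs that span midnight would need a date field added to the pattern, which the examples here do not include.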