CFBMC-3569: BMC heartbeat stopped causes ONTAP to reboot but BMC stays unresponsive
Issue
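When the BMC (reported as the Service Processor, SP, in the event log) stops sending IPMI heartbeats, ONTAP attempts to recover it by triggering an emergency shutdown and reboot of the system.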
- The following are example events indicating the issue:
[Sat Jan 27 07:42:58 -0800 [netapp-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.]
[Sat Jan 27 07:55:13 -0800 [netapp-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED]
[Sat Jan 27 08:05:27 -0800 [netapp-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED]
[Sat Jan 27 08:08:32 -0800 [netapp-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.]
[Sat Jan 27 08:18:32 -0800 [netapp-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)]
- The recovery attempt hangs, and the BMC remains inaccessible.
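To check whether an exported event log shows this escalation, the following minimal Python sketch scans for the three events from the examples above, in order, and ignores the callhome.* events. It is not a NetApp tool: it assumes the log lines follow the layout shown in the examples, and the names `parse_ems`, `find_escalation`, `ESCALATION`, and `SAMPLE` are hypothetical. Because the event timestamps omit the year, the caller has to supply one; the year used below is a placeholder.

```python
import re
from datetime import datetime, timedelta

# Pattern for one event line in the layout shown above:
#   [<timestamp> [<node>: <process>: <event>:<severity>]: <message>]
# The outer square brackets are optional, so unwrapped lines also parse.
EMS_LINE = re.compile(
    r"^\[?(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<proc>[^:]+): (?P<event>[^:]+):(?P<sev>[^\]]+)\]: "
    r"(?P<msg>.*?)\]?$"
)

# Escalation order observed in the example events (assumption: this order
# identifies the issue; callhome.* events are deliberately not required).
ESCALATION = (
    "sp.heartbeat.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
)

SAMPLE = """[Sat Jan 27 07:42:58 -0800 [netapp-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.]
[Sat Jan 27 07:55:13 -0800 [netapp-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED]
[Sat Jan 27 08:05:27 -0800 [netapp-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED]
[Sat Jan 27 08:08:32 -0800 [netapp-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.]
[Sat Jan 27 08:18:32 -0800 [netapp-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)]
"""


def parse_ems(line: str, year: int):
    """Return (event, timestamp) for one event line, or None if it does not match.

    The timestamps omit the year, so the caller supplies it (assumption).
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %a %b %d %H:%M:%S %z")
    return m["event"], ts


def find_escalation(lines, year: int):
    """Return the first timestamp of each escalation event if all three
    occur in chronological order, else None."""
    seen = {}
    for line in lines:
        parsed = parse_ems(line, year)
        if parsed is None:
            continue
        event, ts = parsed
        if event in ESCALATION and event not in seen:
            seen[event] = ts
    if len(seen) != len(ESCALATION):
        return None
    times = [seen[e] for e in ESCALATION]
    if times != sorted(times):
        return None
    return seen


if __name__ == "__main__":
    # 2024 is a placeholder year; the sample events do not include one.
    result = find_escalation(SAMPLE.splitlines(), year=2024)
    if result is None:
        print("escalation sequence not found")
    else:
        delay = result["monitor.shutdown.emergency"] - result["sp.ipmi.lost.shutdown"]
        print(f"emergency shutdown followed the SP-lost alert after {delay}")
```

With the sample events, the gap between sp.ipmi.lost.shutdown and monitor.shutdown.emergency comes out to 10 minutes, which matches the countdown announced in the sp.ipmi.lost.shutdown message. Requiring chronological order guards against matching unrelated, older occurrences of the same event names.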