CFBMC-8277: BMC "heartbeat stopped" event causes node shutdown with BMC 13.12 and later
Issue
- On AFF A400, AFF C400, ASA A400, ASA C400, FAS8700, or FAS8300 systems, ONTAP might detect the loss of the BMC heartbeat and initiate a system shutdown or emergency recovery to prevent hardware damage or data loss.
- Example events indicating the issue include:
[node-01: spmgrd: sp.heartbeat.stopped:info]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
[node-01: spmgrd: sp.heartbeat.stopped:info]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
[node-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
[node-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
[node-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
[node-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
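To check a saved copy of the event log for this signature, a small script can look for the event names shown in the example. The following is a minimal sketch, not a NetApp tool. It assumes the log text uses the bracketed `[node: source: event:severity]: message` layout shown in the example events. The function names `find_heartbeat_events` and `shutdown_triggered` are hypothetical and were chosen for illustration.

```python
import re

# Matches the bracketed event header seen in the example events:
# [<node>: <source>: <event.name>:<severity>]: <message>
EVENT_RE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: "
)

# Event names copied from the example events in this article.
# Note: monitor.shutdown.emergency is a general shutdown event and may
# also be logged for reasons unrelated to the BMC heartbeat.
HEARTBEAT_EVENTS = {
    "sp.heartbeat.stopped",
    "callhome.sp.hbt.missed",
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
}


def find_heartbeat_events(log_text):
    """Return (node, event, severity) tuples for matching events, in log order."""
    return [
        (m.group("node"), m.group("event"), m.group("severity"))
        for m in EVENT_RE.finditer(log_text)
        if m.group("event") in HEARTBEAT_EVENTS
    ]


def shutdown_triggered(log_text):
    """True if the log shows the heartbeat loss escalating to a shutdown."""
    return any(
        event == "sp.ipmi.lost.shutdown"
        for _, event, _ in find_heartbeat_events(log_text)
    )
```

Because the pattern is applied with `finditer`, it also works when several events have been pasted onto a single line. A match only shows that the log contains these event names; it does not by itself confirm that the node is affected by this issue.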