Unexpected Controller Takeover Due to SP Heartbeat Stopped on BMC 15.13
Applies to
- AFF A250, C250 or FAS500
- BMC Firmware version 15.13
Issue
A controller experienced an unexpected automatic takeover.
The event was triggered by the Service Processor (SP) heartbeat stopping on one node, leading to a forced reboot to recover the BMC.
EMS log:
Fri Jan 02 23:37:50 +0800 [Node2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Fri Jan 02 23:50:08 +0800 [Node2: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
Fri Jan 02 23:50:28 +0800 [Node2: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
system log sel:
9e8 | 01/02/2026 | 15:55:26 | System Event #0xff | Timestamp Clock Sync | Asserted
9e9 | 01/02/2026 | 15:55:26 | System Event | Timestamp Clock Sync | Asserted
9ea | 01/02/2026 | 15:55:26 | Battery #0x4a | State Deasserted
9eb | 01/02/2026 | 15:55:26 | Battery #0x4b | State Asserted
9ec | 01/02/2026 | 15:55:26 | Battery #0x4c | State Asserted
9ed | 01/02/2026 | 15:55:26 | Battery #0x4d | State Deasserted
9ee | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9ef | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f0 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f1 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f2 | 01/02/2026 | 15:55:43 | Battery #0x4a | State Deasserted
9f3 | 01/02/2026 | 15:55:43 | Battery #0x4b | State Asserted
9f4 | 01/02/2026 | 15:55:43 | Battery #0x4c | State Asserted
9f5 | 01/02/2026 | 15:55:43 | Battery #0x4d | State Deasserted
9f6 | 01/02/2026 | 15:55:43 | Battery #0x4f | State Deasserted
9f7 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9f8 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9f9 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9fa | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9fb | 01/02/2026 | 15:55:44 | Power Supply #0x20 | Presence detected | Asserted
9fc | 01/02/2026 | 15:55:44 | Power Supply #0x25 | Presence detected | Asserted
9fd | 01/02/2026 | 15:55:44 | Power Supply #0x72 | Presence detected | Asserted
9fe | 01/02/2026 | 15:55:44 | Power Supply #0x73 | Presence detected | Asserted
9ff | 01/02/2026 | 15:55:45 | OEM record df | FPGA pull BMC whole reset
a00 | 01/02/2026 | 15:55:46 | OEM record df | Pilot FPGA AC cycle
a01 | 01/02/2026 | 15:55:51 | OEM record c0 | 000000 | 000105000000
a02 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a03 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a04 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a05 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
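To confirm that a takeover matches this signature, it can help to check the spacing of the EMS events: the sp.ipmi.lost.shutdown message announces a shutdown in 10 minutes, and monitor.shutdown.emergency should follow after that interval. The following is a minimal Python sketch, not a NetApp tool. The regular expression, the function names parse_ems and seconds_between, and the year argument are assumptions made for illustration (the EMS lines carry no year, so 2026 is taken from the SEL dates), and the pattern is derived only from the four EMS lines shown in this article.

```python
import re
from datetime import datetime

# Pattern derived from the EMS lines in this article (an assumption, not an
# official format definition):
# "Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: text"
EMS_LINE = re.compile(
    r"^\w{3} (?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<tz>[+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<source>[^:]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)

def parse_ems(lines, year):
    """Return (timestamp, event name, severity) for each line that matches."""
    events = []
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if m is None:
            continue  # skip anything that is not an EMS line of this shape
        when = datetime.strptime(
            f"{year} {m['ts']} {m['tz']}", "%Y %b %d %H:%M:%S %z"
        )
        events.append((when, m["event"], m["severity"]))
    return events

def seconds_between(events, first, second):
    """Seconds elapsed from the event named `first` to the event named `second`."""
    times = {name: when for when, name, _ in events}
    return (times[second] - times[first]).total_seconds()

# The four EMS lines from this article.
SAMPLE = [
    "Fri Jan 02 23:37:50 +0800 [Node2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED",
    "Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.",
    "Fri Jan 02 23:50:08 +0800 [Node2: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)",
    "Fri Jan 02 23:50:28 +0800 [Node2: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.",
]

events = parse_ems(SAMPLE, year=2026)
warning_to_shutdown = seconds_between(
    events, "sp.ipmi.lost.shutdown", "monitor.shutdown.emergency"
)
print(f"Warning to emergency shutdown: {warning_to_shutdown / 60:.0f} minutes")
```

For the lines above, the gap between the warning and the emergency shutdown is 600 seconds, which agrees with the 10-minute notice in the EMS message.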