SP HBT STOPPED on A900 / FAS9500 - Unresponsive Node
Applies to
- ONTAP 9
- AFF A900
- FAS9500
- BMC 16.3, 16.4, 16.5, and 16.6
Issue
- The node goes down after the SP heartbeat is missed and then stops:
Sat Jan 27 07:42:58 -0800 [netapp-n01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Sat Jan 27 07:55:13 -0800 [netapp-n01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
Sat Jan 27 08:05:27 -0800 [netapp-n01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Sat Jan 27 08:08:32 -0800 [netapp-n01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Sat Jan 27 08:18:32 -0800 [netapp-n01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)
- In the logs collected from the down node, SP-SYSTEM-EVENTS-LOG shows multiple sensors asserting "Fault" while the BMC attempts to recover:
Record 2351: Mon Feb 23 18:01:05.394956 2026 [IPMI.notice]: 0630 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 2352: Mon Feb 23 18:01:05.824853 2026 [IPMI Event.critical]: NMI
Record 2353: Mon Feb 23 18:01:05.825968 2026 [IPMI.notice]: 0631 | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 2354: Mon Feb 23 18:12:33.848642 2026 [BMC.critical]: Heartbeat stopped
Record 2355: Mon Feb 23 18:12:55.865260 2026 [ASUP.notice]: First notification email | (HEARTBEAT_LOSS) WARNING | Send failed
Record 2356: Mon Feb 23 18:27:33.916604 2026 [ASUP.notice]: Reminder email | (HEARTBEAT_LOSS) WARNING | Send failed
Record 2357: Mon Feb 23 17:58:14.000000 2026 [Controller.notice]: Appliance user command panic.
Record 2358: Mon Feb 23 18:35:59.697651 2026 [IPMI Event.critical]: NMI
Record 2359: Mon Feb 23 18:35:59.258289 2026 [IPMI.notice]: 0632 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 2360: Mon Feb 23 18:35:59.698718 2026 [IPMI.notice]: 0633 | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 2361: Mon Feb 23 18:36:03.781190 2026 [BMC.critical]: Filer Reboots
Record 2362: Mon Feb 23 18:36:15.843657 2026 [IPMI.notice]: 0634 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
Record 2363: Mon Feb 23 18:36:21.850964 2026 [IPMI.notice]: 0635 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
Record 2364: Mon Feb 23 18:36:37.865775 2026 [IPMI.notice]: 0636 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
Record 2365: Mon Feb 23 18:36:39.437815 2026 [BMC.notice]: PCIE Config Info received from BIOS
Record 2366: Mon Feb 23 18:36:39.444435 2026 [BMC.notice]: ScratchPad Config Info received from BIOS
Record 2367: Mon Feb 23 18:33:58.000000 2026 [SysFW.notice]: BIOS Version: 18.9
Record 2368: Mon Feb 23 18:36:51.310384 2026 [IPMI.notice]: (PUA) Enable power to all PCIe slots
Record 2369: Mon Feb 23 18:36:51.383530 2026 [IPMI.notice]: (PUA) Enable power to all PCIe on board device
Record 2370: Mon Feb 23 18:36:51.448605 2026 [IPMI.notice]: (PUA) P_stat :slots=0x0,onboard_devs=0x0,final
Record 2371: Mon Feb 23 18:36:51.448658 2026 [IPMI.notice]: (PUA) Power status of all PCIe slots unchanged
Record 2372: Mon Feb 23 18:38:50.001792 2026 [IPMI.notice]: 0637 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
Record 2373: Mon Feb 23 18:38:52.002939 2026 [IPMI.notice]: 0638 | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
Record 2374: Mon Feb 23 18:39:04.015211 2026 [IPMI.notice]: 0639 | 02 | EVT: ef02ffff | PCM_Status | Deassertion Event, "Fault"
Record 2375: Mon Feb 23 18:39:06.019958 2026 [IPMI.notice]: 063a | 02 | EVT: 6fc22fff | System_FW_Status | Assertion Event, "Storage OS Running"
Record 2391: Mon Feb 23 19:09:34.048152 2026 [IPMI Event.critical]: System power cycle
Record 2392: Mon Feb 23 19:09:34.680415 2026 [IPMI.notice]: 063b | 02 | EVT: 0150e1ff | Bat_Curr | Assertion Event, "Lower Non-critical going low " | Reading: -1.550 | Threshold: -0.050
Record 2393: Mon Feb 23 19:09:34.694096 2026 [IPMI.notice]: 063c | 02 | EVT: 0152e1fe | Bat_Curr | Assertion Event, "Lower Critical going low " | Reading: -1.550 | Threshold: -0.100
Record 2401: Mon Feb 23 19:09:35.792043 2026 [IPMI.notice]: 0644 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
Record 2402: Mon Feb 23 19:09:35.793305 2026 [IPMI.notice]: 0645 | 02 | EVT: ef03ffff | PCM_Status | Deassertion Event, "Power Good"
Record 2403: Mon Feb 23 19:09:36.565127 2026 [IPMI.notice]: 0646 | 02 | EVT: 6f02ffff | Bat_Status | Assertion Event, "Fault"
Record 2404: Mon Feb 23 19:09:37.012771 2026 [IPMI.notice]: 0647 | 02 | EVT: 6f02ffff | NVS_Status | Assertion Event, "Fault"
Record 2405: Mon Feb 23 19:09:37.023414 2026 [IPMI.notice]: 0648 | 02 | EVT: 6f02ffff | IO2_Status | Assertion Event, "Fault"
Record 2406: Mon Feb 23 19:09:37.043489 2026 [IPMI.notice]: 0649 | 02 | EVT: 6f02ffff | IO3_Status | Assertion Event, "Fault"
Record 2407: Mon Feb 23 19:09:37.052011 2026 [IPMI.notice]: 064a | 02 | EVT: 6f02ffff | IO4_Status | Assertion Event, "Fault"
Record 2408: Mon Feb 23 19:09:37.061193 2026 [IPMI.notice]: 064b | 02 | EVT: 6f02ffff | IO8_Status | Assertion Event, "Fault"
Record 2409: Mon Feb 23 19:09:37.077093 2026 [IPMI.notice]: 064c | 02 | EVT: 6f02ffff | IO9_Status | Assertion Event, "Fault"
Record 2410: Mon Feb 23 19:09:37.094519 2026 [IPMI.notice]: 064d | 02 | EVT: 6f02ffff | IO10_Status | Assertion Event, "Fault"
Record 2421: Mon Feb 23 19:09:40.798052 2026 [IPMI.notice]: 0656 | 02 | EVT: 6f03ffff | PCM_Status | Assertion Event, "Power Good"
Record 2422: Mon Feb 23 19:09:41.542885 2026 [BMC.critical]: Filer Reboots
Record 2423: Mon Feb 23 19:09:41.708995 2026 [IPMI.notice]: 0657 | 02 | EVT: 815200fe | Bat_Curr | Deassertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: -0.100
Record 2424: Mon Feb 23 19:09:41.730374 2026 [IPMI.notice]: 0658 | 02 | EVT: 815000ff | Bat_Curr | Deassertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: -0.050
Record 2425: Mon Feb 23 19:09:42.018386 2026 [IPMI.notice]: 0659 | 02 | EVT: ef02ffff | NVS_Status | Deassertion Event, "Fault"
Record 2426: Mon Feb 23 19:09:42.304005 2026 [IPMI.notice]: 065a | 02 | EVT: ef02ffff | IO2_Status | Deassertion Event, "Fault"
Record 2427: Mon Feb 23 19:09:42.316424 2026 [IPMI.notice]: 065b | 02 | EVT: ef02ffff | IO3_Status | Deassertion Event, "Fault"
Record 2428: Mon Feb 23 19:09:42.324208 2026 [IPMI.notice]: 065c | 02 | EVT: ef02ffff | IO4_Status | Deassertion Event, "Fault"
Record 2429: Mon Feb 23 19:09:42.346029 2026 [IPMI.notice]: 065d | 02 | EVT: ef02ffff | IO8_Status | Deassertion Event, "Fault"
Record 2430: Mon Feb 23 19:09:42.359455 2026 [IPMI.notice]: 065e | 02 | EVT: ef02ffff | IO9_Status | Deassertion Event, "Fault"
Record 2494: Mon Feb 23 19:57:47.628019 2026 [BMC.critical]: Filer Reboots
Record 2498: Mon Feb 23 19:57:51.852162 2026 [IPMI.notice]: 068c | 02 | EVT: ef02ffff | NVS_Status | Deassertion Event, "Fault"
Record 2499: Mon Feb 23 19:57:52.119932 2026 [IPMI.notice]: 068d | 02 | EVT: ef02ffff | IO2_Status | Deassertion Event, "Fault"
Record 2500: Mon Feb 23 19:57:52.136129 2026 [IPMI.notice]: 068e | 02 | EVT: ef02ffff | IO3_Status | Deassertion Event, "Fault"
Record 2501: Mon Feb 23 19:57:52.143927 2026 [IPMI.notice]: 068f | 02 | EVT: ef02ffff | IO4_Status | Deassertion Event, "Fault"
Record 2502: Mon Feb 23 19:57:52.153597 2026 [IPMI.notice]: 0690 | 02 | EVT: ef02ffff | IO8_Status | Deassertion Event, "Fault"
Record 2503: Mon Feb 23 19:57:52.161655 2026 [IPMI.notice]: 0691 | 02 | EVT: ef02ffff | IO9_Status | Deassertion Event, "Fault"
Record 2504: Mon Feb 23 19:57:52.180685 2026 [IPMI.notice]: 0692 | 02 | EVT: ef02ffff | IO10_Status | Deassertion Event, "Fault"
Record 2505: Mon Feb 23 19:57:54.641572 2026 [IPMI.notice]: 0693 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
Record 2506: Mon Feb 23 19:57:58.632766 2026 [IPMI.notice]: 0694 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
Record 2507: Mon Feb 23 19:58:00.634962 2026 [IPMI.notice]: 0695 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
Record 2508: Mon Feb 23 19:58:04.652462 2026 [IPMI.notice]: 0696 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
- The node does not recover automatically, and the BMC is inaccessible
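
When reviewing a long SP-SYSTEM-EVENTS-LOG excerpt like the one in the Issue section, it can help to tally which sensors asserted "Fault" and whether a matching deassertion was logged. The following is a minimal Python sketch, not a NetApp tool: the function names `fault_summary` and `unresolved_faults` are hypothetical, and the regular expression assumes the record layout seen in the excerpt (`Record N: <timestamp> [IPMI.notice]: <id> | <gen> | EVT: <hex> | <sensor> | <Assertion|Deassertion> Event, "<state>"`). Because collected excerpts may omit records, an "unresolved" result indicates only that no deassertion appears in the text supplied.

```python
import re

# Assumed layout of an IPMI sensor event record, based on the excerpt above.
SENSOR_EVENT = re.compile(
    r'Record (?P<record>\d+): '
    r'(?P<ts>\w{3} \w{3} +\d+ [\d:.]+ \d{4}) \[IPMI\.notice\]: '
    r'\w+ \| \w+ \| EVT: \w+ \| (?P<sensor>\w+) \| '
    r'(?P<kind>Assertion|Deassertion) Event, "(?P<state>[^"]*)"'
)


def fault_summary(log_text):
    """Return {sensor: [fault_assertions, fault_deassertions]} for the given text."""
    counts = {}
    for match in SENSOR_EVENT.finditer(log_text):
        if match.group("state") != "Fault":
            continue  # ignore threshold, "Power Good", and firmware-progress events
        pair = counts.setdefault(match.group("sensor"), [0, 0])
        pair[0 if match.group("kind") == "Assertion" else 1] += 1
    return counts


def unresolved_faults(log_text):
    """List sensors with more "Fault" assertions than deassertions in the text."""
    return [s for s, (a, d) in fault_summary(log_text).items() if a > d]


# Four records copied from the excerpt in this article.
SAMPLE = '''\
Record 2401: Mon Feb 23 19:09:35.792043 2026 [IPMI.notice]: 0644 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
Record 2404: Mon Feb 23 19:09:37.012771 2026 [IPMI.notice]: 0647 | 02 | EVT: 6f02ffff | NVS_Status | Assertion Event, "Fault"
Record 2422: Mon Feb 23 19:09:41.542885 2026 [BMC.critical]: Filer Reboots
Record 2425: Mon Feb 23 19:09:42.018386 2026 [IPMI.notice]: 0659 | 02 | EVT: ef02ffff | NVS_Status | Deassertion Event, "Fault"
'''

if __name__ == "__main__":
    print(fault_summary(SAMPLE))      # {'PCM_Status': [1, 0], 'NVS_Status': [1, 1]}
    print(unresolved_faults(SAMPLE))  # ['PCM_Status']
```

The pattern uses `finditer` over the whole text rather than splitting on newlines, so it should also cope with log text in which records have been pasted onto a single line.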
