NetApp Knowledge Base

SP HBT STOPPED on A900 / FAS9500 - Unresponsive Node

Category: ontap-9
Specialty: hw

Applies to

  • ONTAP 9
  • AFF A900
  • FAS9500
  • BMC 16.3, 16.4, 16.5, and 16.6

Issue

  • The node goes down because the Service Processor (SP) heartbeat is missed and then stops:

Sat Jan 27 07:42:58 -0800 [netapp-n01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Sat Jan 27 07:55:13 -0800 [netapp-n01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
Sat Jan 27 08:05:27 -0800 [netapp-n01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Sat Jan 27 08:08:32 -0800 [netapp-n01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Sat Jan 27 08:18:32 -0800 [netapp-n01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)
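
The EMS sequence above can be turned into a timeline to see how long the node ran between the first heartbeat alert and the emergency shutdown. The following is a minimal Python sketch, not a NetApp tool. The function names (`parse_ems`, `heartbeat_timeline`) are hypothetical, the line layout is inferred only from the excerpt above, and because these EMS lines carry no year, the caller has to supply one as an assumption.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt above (an assumption, not a documented format):
#   <weekday> <month> <day> <time> <utc offset> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\w{3} (?P<mon>\w{3}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tz>[+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)

# Event names taken verbatim from the excerpt above.
WATCHED = {
    "sp.heartbeat.stopped",
    "callhome.sp.hbt.missed",
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
}


def parse_ems(line, year):
    """Parse one EMS line into a dict, or return None if it does not match.

    The year is not present in the line, so it is supplied by the caller.
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    stamp = f"{year} {m['mon']} {m['day']} {m['time']} {m['tz']}"
    return {
        "when": datetime.strptime(stamp, "%Y %b %d %H:%M:%S %z"),
        "node": m["node"],
        "event": m["event"],
        "severity": m["sev"],
    }


def heartbeat_timeline(lines, year):
    """Return (event, seconds since the first watched event) pairs, in input order."""
    parsed = (parse_ems(line, year) for line in lines)
    events = [e for e in parsed if e is not None and e["event"] in WATCHED]
    if not events:
        return []
    start = events[0]["when"]
    return [(e["event"], int((e["when"] - start).total_seconds())) for e in events]
```

Called on the five EMS lines above with an assumed year, the offsets for `sp.ipmi.lost.shutdown` and `monitor.shutdown.emergency` differ by 600 seconds, which matches the "shut down in 10 minutes" message.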

  • In the logs collected from the down node, the SP-SYSTEM-EVENTS-LOG shows multiple sensors asserting "Fault" while the BMC attempts to recover:

Record 2351: Mon Feb 23 18:01:05.394956 2026 [IPMI.notice]: 0630 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 2352: Mon Feb 23 18:01:05.824853 2026 [IPMI Event.critical]: NMI
Record 2353: Mon Feb 23 18:01:05.825968 2026 [IPMI.notice]: 0631 | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 2354: Mon Feb 23 18:12:33.848642 2026 [BMC.critical]: Heartbeat stopped
Record 2355: Mon Feb 23 18:12:55.865260 2026 [ASUP.notice]: First notification email | (HEARTBEAT_LOSS) WARNING | Send failed
Record 2356: Mon Feb 23 18:27:33.916604 2026 [ASUP.notice]: Reminder email | (HEARTBEAT_LOSS) WARNING | Send failed
Record 2357: Mon Feb 23 17:58:14.000000 2026 [Controller.notice]: Appliance user command panic.
Record 2358: Mon Feb 23 18:35:59.697651 2026 [IPMI Event.critical]: NMI
Record 2359: Mon Feb 23 18:35:59.258289 2026 [IPMI.notice]: 0632 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 2360: Mon Feb 23 18:35:59.698718 2026 [IPMI.notice]: 0633 | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 2361: Mon Feb 23 18:36:03.781190 2026 [BMC.critical]: Filer Reboots
Record 2362: Mon Feb 23 18:36:15.843657 2026 [IPMI.notice]: 0634 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
Record 2363: Mon Feb 23 18:36:21.850964 2026 [IPMI.notice]: 0635 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
Record 2364: Mon Feb 23 18:36:37.865775 2026 [IPMI.notice]: 0636 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
Record 2365: Mon Feb 23 18:36:39.437815 2026 [BMC.notice]: PCIE Config Info received from BIOS
Record 2366: Mon Feb 23 18:36:39.444435 2026 [BMC.notice]: ScratchPad Config Info received from BIOS
Record 2367: Mon Feb 23 18:33:58.000000 2026 [SysFW.notice]: BIOS Version: 18.9
Record 2368: Mon Feb 23 18:36:51.310384 2026 [IPMI.notice]: (PUA) Enable power to all PCIe slots
Record 2369: Mon Feb 23 18:36:51.383530 2026 [IPMI.notice]: (PUA) Enable power to all PCIe on board device
Record 2370: Mon Feb 23 18:36:51.448605 2026 [IPMI.notice]: (PUA) P_stat :slots=0x0,onboard_devs=0x0,final
Record 2371: Mon Feb 23 18:36:51.448658 2026 [IPMI.notice]: (PUA) Power status of all PCIe slots unchanged
Record 2372: Mon Feb 23 18:38:50.001792 2026 [IPMI.notice]: 0637 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
Record 2373: Mon Feb 23 18:38:52.002939 2026 [IPMI.notice]: 0638 | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
Record 2374: Mon Feb 23 18:39:04.015211 2026 [IPMI.notice]: 0639 | 02 | EVT: ef02ffff | PCM_Status | Deassertion Event, "Fault"
Record 2375: Mon Feb 23 18:39:06.019958 2026 [IPMI.notice]: 063a | 02 | EVT: 6fc22fff | System_FW_Status | Assertion Event, "Storage OS Running"

Record 2391: Mon Feb 23 19:09:34.048152 2026 [IPMI Event.critical]: System power cycle
Record 2392: Mon Feb 23 19:09:34.680415 2026 [IPMI.notice]: 063b | 02 | EVT: 0150e1ff | Bat_Curr | Assertion Event, "Lower Non-critical going low " | Reading: -1.550 | Threshold: -0.050
Record 2393: Mon Feb 23 19:09:34.694096 2026 [IPMI.notice]: 063c | 02 | EVT: 0152e1fe | Bat_Curr | Assertion Event, "Lower Critical going low " | Reading: -1.550 | Threshold: -0.100

Record 2401: Mon Feb 23 19:09:35.792043 2026 [IPMI.notice]: 0644 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
Record 2402: Mon Feb 23 19:09:35.793305 2026 [IPMI.notice]: 0645 | 02 | EVT: ef03ffff | PCM_Status | Deassertion Event, "Power Good"
Record 2403: Mon Feb 23 19:09:36.565127 2026 [IPMI.notice]: 0646 | 02 | EVT: 6f02ffff | Bat_Status | Assertion Event, "Fault"
Record 2404: Mon Feb 23 19:09:37.012771 2026 [IPMI.notice]: 0647 | 02 | EVT: 6f02ffff | NVS_Status | Assertion Event, "Fault"
Record 2405: Mon Feb 23 19:09:37.023414 2026 [IPMI.notice]: 0648 | 02 | EVT: 6f02ffff | IO2_Status | Assertion Event, "Fault"
Record 2406: Mon Feb 23 19:09:37.043489 2026 [IPMI.notice]: 0649 | 02 | EVT: 6f02ffff | IO3_Status | Assertion Event, "Fault"
Record 2407: Mon Feb 23 19:09:37.052011 2026 [IPMI.notice]: 064a | 02 | EVT: 6f02ffff | IO4_Status | Assertion Event, "Fault"
Record 2408: Mon Feb 23 19:09:37.061193 2026 [IPMI.notice]: 064b | 02 | EVT: 6f02ffff | IO8_Status | Assertion Event, "Fault"
Record 2409: Mon Feb 23 19:09:37.077093 2026 [IPMI.notice]: 064c | 02 | EVT: 6f02ffff | IO9_Status | Assertion Event, "Fault"
Record 2410: Mon Feb 23 19:09:37.094519 2026 [IPMI.notice]: 064d | 02 | EVT: 6f02ffff | IO10_Status | Assertion Event, "Fault"

Record 2421: Mon Feb 23 19:09:40.798052 2026 [IPMI.notice]: 0656 | 02 | EVT: 6f03ffff | PCM_Status | Assertion Event, "Power Good"
Record 2422: Mon Feb 23 19:09:41.542885 2026 [BMC.critical]: Filer Reboots
Record 2423: Mon Feb 23 19:09:41.708995 2026 [IPMI.notice]: 0657 | 02 | EVT: 815200fe | Bat_Curr | Deassertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: -0.100
Record 2424: Mon Feb 23 19:09:41.730374 2026 [IPMI.notice]: 0658 | 02 | EVT: 815000ff | Bat_Curr | Deassertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: -0.050
Record 2425: Mon Feb 23 19:09:42.018386 2026 [IPMI.notice]: 0659 | 02 | EVT: ef02ffff | NVS_Status | Deassertion Event, "Fault"
Record 2426: Mon Feb 23 19:09:42.304005 2026 [IPMI.notice]: 065a | 02 | EVT: ef02ffff | IO2_Status | Deassertion Event, "Fault"
Record 2427: Mon Feb 23 19:09:42.316424 2026 [IPMI.notice]: 065b | 02 | EVT: ef02ffff | IO3_Status | Deassertion Event, "Fault"
Record 2428: Mon Feb 23 19:09:42.324208 2026 [IPMI.notice]: 065c | 02 | EVT: ef02ffff | IO4_Status | Deassertion Event, "Fault"
Record 2429: Mon Feb 23 19:09:42.346029 2026 [IPMI.notice]: 065d | 02 | EVT: ef02ffff | IO8_Status | Deassertion Event, "Fault"
Record 2430: Mon Feb 23 19:09:42.359455 2026 [IPMI.notice]: 065e | 02 | EVT: ef02ffff | IO9_Status | Deassertion Event, "Fault"

Record 2494: Mon Feb 23 19:57:47.628019 2026 [BMC.critical]: Filer Reboots

Record 2498: Mon Feb 23 19:57:51.852162 2026 [IPMI.notice]: 068c | 02 | EVT: ef02ffff | NVS_Status | Deassertion Event, "Fault"
Record 2499: Mon Feb 23 19:57:52.119932 2026 [IPMI.notice]: 068d | 02 | EVT: ef02ffff | IO2_Status | Deassertion Event, "Fault"
Record 2500: Mon Feb 23 19:57:52.136129 2026 [IPMI.notice]: 068e | 02 | EVT: ef02ffff | IO3_Status | Deassertion Event, "Fault"
Record 2501: Mon Feb 23 19:57:52.143927 2026 [IPMI.notice]: 068f | 02 | EVT: ef02ffff | IO4_Status | Deassertion Event, "Fault"
Record 2502: Mon Feb 23 19:57:52.153597 2026 [IPMI.notice]: 0690 | 02 | EVT: ef02ffff | IO8_Status | Deassertion Event, "Fault"
Record 2503: Mon Feb 23 19:57:52.161655 2026 [IPMI.notice]: 0691 | 02 | EVT: ef02ffff | IO9_Status | Deassertion Event, "Fault"
Record 2504: Mon Feb 23 19:57:52.180685 2026 [IPMI.notice]: 0692 | 02 | EVT: ef02ffff | IO10_Status | Deassertion Event, "Fault"
Record 2505: Mon Feb 23 19:57:54.641572 2026 [IPMI.notice]: 0693 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
Record 2506: Mon Feb 23 19:57:58.632766 2026 [IPMI.notice]: 0694 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
Record 2507: Mon Feb 23 19:58:00.634962 2026 [IPMI.notice]: 0695 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
Record 2508: Mon Feb 23 19:58:04.652462 2026 [IPMI.notice]: 0696 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
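
To check which sensors are left at "Fault" in a long SP-SYSTEM-EVENTS-LOG, the assertion and deassertion records can be paired per sensor. The Python sketch below is illustrative only: `open_faults` is a hypothetical helper, and the record layout is inferred from the excerpt above rather than from a documented format. Records are processed in file order because the excerpt shows that timestamps are not strictly increasing (for example, Record 2357).

```python
import re

# Layout inferred from the excerpt above (an assumption):
#   Record <n>: <timestamp> [IPMI.notice]: <hex> | <nn> | EVT: <hex> | <sensor> | <Assertion|Deassertion> Event, "<state>"
SENSOR_EVENT = re.compile(
    r'^Record (?P<record>\d+): .*?\[IPMI\.notice\]: '
    r'[0-9a-f]{4} \| \d{2} \| EVT: [0-9a-f]{8} \| '
    r'(?P<sensor>\w+) \| (?P<kind>Assertion|Deassertion) Event, "(?P<state>[^"]*)"'
)


def open_faults(lines):
    """Return {sensor: record number} for "Fault" assertions with no later deassertion.

    Lines are handled in the order given, not sorted by timestamp.
    Records that are not sensor events, or whose state is not "Fault", are skipped.
    """
    pending = {}
    for line in lines:
        m = SENSOR_EVENT.match(line.strip())
        if m is None or m["state"].strip() != "Fault":
            continue
        if m["kind"] == "Assertion":
            # Remember the most recent record that asserted the fault.
            pending[m["sensor"]] = int(m["record"])
        else:
            # A deassertion clears the fault for that sensor, if one was open.
            pending.pop(m["sensor"], None)
    return pending
```

Because the excerpt above omits records between the listed ranges, running this on the excerpt alone would not give a complete picture; it is meant for the full log file.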

  • The node does not recover automatically, and the BMC is inaccessible
