System fails to boot with "Resetting SP from primary FW" or "SP IPMI failure"
- Last Updated:
- 1/5/2025, 12:34:33 PM
Applies to
- FAS2620, FAS2650
- FAS2720, FAS2750
- AFF C190
- AFF A150, A220
- AFF A300 / FAS8200
- AFF A900
Issue
- Node goes down with EMS logs showing SP HBT MISSED or SP HBT STOPPED.
[nodename: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed: Sysfan1 F1, Sysfan1 F2, Sysfan2 F1, Sysfan2 F2. Power Supply Status Critical: PSU1.
[nodename: monitor: monitor.globalStatus.critical:EMERGENCY]: Power Supply Status Critical: PSU1.
[nodename: cphmd: hm.alert.cleared:notice]: Alert Id = CriticalFruMultiFaultAlert , Alerting Resource = XXXXXXXXXXXX cleared by monitor chassis
[Nodename: spsm_listener: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
[Nodename: spsm_listener: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
- Or the Service Processor (SP) reports an "SP load is high" error in the console log and the node goes down.
[SP.notice]: SP load is high: 3.12 2.59 2.02
[SP.notice]: SP load is high: 3.54 2.90 2.21
[IPMI.notice]: e601 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
[SP.emergency]: SP reset initiated by storage controller
[IPMI.notice]: e701 | c0 | OEM: ffff70005000 | ManufId: 150300 | SP Reset Externally
[IPMI.notice]: e801 | c0 | OEM: fcff70000000 | ManufId: 150300 | POS Register: Unexpected Reset
- Node boot fails from SP:
Warning: Unable to list entries on node-01. RPC: Couldn't make connection [from mgwd on
node "Node-02" (VSID: -1) to mgwd at xxx.xxx.xxx.xxx]
Error: command failed: RPC: Couldn't make connection [from mgwd on node "Node-02" (VSID: -1) to
mgwd at xxx.xxx.xxx.xxx]
- Node boot fails from the LOADER prompt with the following error:
LOADER-A> boot_ontap
Loading X86_64/freebsd・・・
Loading X86_64/freebsd・・・
Starting program at ・・・
NetApp Data ONTAP 9.3P4
***************************************
This platform is not supported in this release.
The system will now halt
***************************************
BIOS Version: 11.1
Portions Copyright (C) 2014-2017 NetApp, Inc. All Rights Reserved.
Initializing System Memory ...
Loading Device Drivers ...
Waiting for SP ...
SP failure. Resetting SP from primary FW. This can take a few minutes
-OR-
Failed to recover SP
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
Configuring Devices ...
IPMI PCI Slot Control failed.
CPU = 1 Processor(s) Detected.
Intel(R) Xeon(R) CPU D-1587 @ 1.70GHz (CPU 0)
CPUID: 0x00050664. Cores per Processor = 16
131072 MB System RAM Installed.
SATA (AHCI) Device: SV9MST6D120GLM41NP
Boot Loader version 6.0.10
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2020 NetApp, Inc. All Rights Reserved.
BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT
- The down controller fails to boot with the same error even after motherboard replacement.
- The SP "events all" output shows messages such as:
Record 231: Sun Aug 1 00:25:04 2021 [SysFW.notice]: Failed to recover SP
Record 232: Sun Aug 1 00:25:04 2021 [SysFW.critical]: IPMI:Get controller FRU inventory:failed
Record 233: Sun Aug 1 00:25:04 2021 [SysFW.notice]: IPMI:Get midplane FRU 0 inventory:failed
Record 234: Thu Jan 1 00:05:00 1970 [Trap Event.critical]: hwassist post_error (26)
- The SP "events all" log on the partner node shows messages such as:
Sat Oct 15 13:05:38 2016 [Agent.notice]: Local Serial Exchange Error Internal MLER[4] asserted
Mon Oct 17 08:38:52 2016 [Agent.notice]: Local Invalid Serial Exchange Bus Internal MLER[5] asserted
Thu Jan 01 00:00:36 1970 [Agent.notice]: Midplane I2C Local Buffers Not Ready Internal MLER[6] de-asserted
Mon Oct 17 08:52:11 2016 [Agent.notice]: Midplane Local Grant Timeout Internal MLER[2] asserted
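
When triaging a saved console, EMS, or SP event capture against the symptoms listed above, a small script can flag which signatures are present. The following is a minimal sketch only, not a NetApp tool: the pattern strings are copied from the log excerpts in this article, while the function name `match_symptoms` and the symptom labels are hypothetical names chosen for illustration.

```python
import re

# Patterns are taken from the log excerpts in this article.
# The dictionary keys are illustrative labels, not NetApp terminology.
SIGNATURES = {
    "sp_heartbeat_lost": [r"callhome\.sp\.hbt\.(missed|stopped)"],
    "sp_load_high": [r"SP load is high"],
    "sp_reset": [
        r"SP reset initiated by storage controller",
        r"SP Reset Externally",
    ],
    "sp_recovery_failed": [
        r"Resetting SP from primary FW",
        r"Failed to recover SP",
    ],
    "ipmi_failure": [
        r"IPMI:Get .* inventory:failed",
        r"IPMI PCI Slot Control failed",
        r"SP IPMI failure",
    ],
    # Matches "asserted" only; "de-asserted" lines are intentionally skipped.
    "midplane_mler_asserted": [r"Internal MLER\[\d+\] asserted"],
}


def match_symptoms(log_text):
    """Return {label: [matching lines]} for every signature found in log_text."""
    hits = {}
    for line in log_text.splitlines():
        for label, patterns in SIGNATURES.items():
            if any(re.search(p, line) for p in patterns):
                hits.setdefault(label, []).append(line.strip())
    return hits


if __name__ == "__main__":
    # Sample lines copied from the excerpts above.
    sample = (
        "[SP.notice]: SP load is high: 3.12 2.59 2.02\n"
        "SP failure. Resetting SP from primary FW. This can take a few minutes\n"
        "BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT\n"
    )
    for label, lines in match_symptoms(sample).items():
        print(f"{label}: {len(lines)} line(s)")
```

A match only indicates that a log line resembles one of the symptoms in this article. It does not by itself confirm the cause.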