NetApp Knowledge Base

CHW-2351: AFF A90/AFF C80 halts after eSPI Fatal Error

Category:
ontap-9
Specialty:
HW

Issue

  • The node reboots unexpectedly, and the console shows the following output:

eSPI Fatal Error 
HALT!
System recovered from eSPI fatal error
PCH eSPI LnkErr0 = 0x0060FF00
PCH eSPI LnkErr1 = 0x8073FF25
Configuring Devices ...
 
CPU = 2 Processor(s) Detected.
  Intel(R) Xeon(R) Gold 5416S (CPU 0)
  CPUID: 0x000806F8. Cores per Processor = 16
  Intel(R) Xeon(R) Gold 5416S (CPU 1)
  CPUID: 0x000806F8. Cores per Processor = 16
131072 MB System RAM Installed.
NVME Device: 0X331111900329D0SAM000PM9A30001T00025000
 
Boot Loader version 8.2.0IR
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2024 NetApp, Inc. All Rights Reserved.
 
ACPI RSDP Found at 0x777fe014

  • The node recovers after the failure.
  • After the node recovers from the eSPI fatal error, the BMC logs show NMI and watchdog reset events:

Record 1855: Tue May 28 06:44:34.787314 2024 [BMC CLI.notice]: admin "events all "
Record 1856: Tue May 28 07:16:27.066671 2024 [IPMI.notice]: 0d99 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1857: Tue May 28 07:16:27.477566 2024 [IPMI Event.critical]: NMI
Record 1858: Tue May 28 07:16:27.478002 2024 [IPMI.notice]: 0d9a | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 1859: Tue May 28 07:16:28.309502 2024 [IPMI.notice]: 0d9b | 02 | EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"
Record 1860: Tue May 28 07:16:28.456602 2024 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1861: Tue May 28 07:16:28.487454 2024 [IPMI Event.critical]: System reset
Record 1862: Tue May 28 07:16:28.488306 2024 [IPMI.notice]: 0d9c | 02 | EVT: 0301ffff | SysReset | Assertion Event, "State Asserted"
Record 1863: Tue May 28 07:16:28.492843 2024 [IPMI Event.critical]: L2 watchdog action completed
Record 1864: Tue May 28 07:16:28.492644 2024 [IPMI.notice]: L2 to L1 is 1(s) 10179(us)
Record 1865: Tue May 28 07:16:49.493705 2024 [IPMI.notice]: 0d9d | 02 | EVT: 6f0500ff | Sensor 255 | Assertion Event, "Timestamp Clock Sync"
Record 1866: Tue May 28 07:16:50.000430 2024 [IPMI.notice]: 0d9e | 02 | EVT: 6f0580ff | Sensor 255 | Assertion Event, "Timestamp Clock Sync"
Record 1867: Tue May 28 07:16:50.086813 2024 [IPMI.notice]: (PUA) Enable power to all PCIe slots
Record 1868: Tue May 28 07:16:50.147757 2024 [IPMI.notice]: (PUA) Enable power to all PCIe on board device
Record 1869: Tue May 28 07:16:50.172322 2024 [IPMI.notice]: (PUA) P_stat :slots=0x1,onboard_devs=0x0,final
Record 1870: Tue May 28 07:16:50.172359 2024 [IPMI.notice]: (PUA) Atleast one PCIe slot's power status cha
Record 1871: Tue May 28 07:16:51.000000 2024 [SysFW.notice]: BIOS Version: 20.0IR
Record 1872: Tue May 28 07:16:51.363071 2024 [BMC.notice]: ScratchPad Config Info received from BIOS
Record 1873: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: System recovered from eSPI fatal error
Record 1874: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: PCH eSPI LnkErr0 = 0x0060FF00
Record 1875: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: PCH eSPI LnkErr1 = 0x8073FF25
Record 1876: Tue May 28 07:16:55.779460 2024 [BMC.notice]: Delaying L2_WDOG ASUP email for 120 seconds
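
The symptoms above can be checked against a saved copy of the BMC log with a short script. The following is a minimal sketch in Python, not a NetApp tool: the function names (parse_records, summarize_espi_event), the summary keys, and the choice of which messages count as part of the signature are assumptions made for illustration, based only on the record layout visible in the excerpt above. The sample records are copied from that excerpt.

```python
import re
from datetime import datetime

# Record layout as seen in the excerpt above, for example:
# Record 1873: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: System recovered ...
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): "
    r"(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{6} \d{4}) "
    r"\[(?P<source>[^\]]+)\]: (?P<msg>.*)$"
)
LNKERR_RE = re.compile(r"PCH eSPI (LnkErr\d) = (0x[0-9A-Fa-f]+)")

# A few records copied from the excerpt above.
SAMPLE = """\
Record 1857: Tue May 28 07:16:27.477566 2024 [IPMI Event.critical]: NMI
Record 1860: Tue May 28 07:16:28.456602 2024 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1873: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: System recovered from eSPI fatal error
Record 1874: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: PCH eSPI LnkErr0 = 0x0060FF00
Record 1875: Tue May 28 07:16:52.000000 2024 [SysFW.notice]: PCH eSPI LnkErr1 = 0x8073FF25
"""


def parse_records(text):
    """Split BMC log text into dicts; lines that do not match are skipped."""
    records = []
    for line in text.splitlines():
        m = RECORD_RE.match(line.strip())
        if m is None:
            continue
        records.append({
            "num": int(m["num"]),
            "ts": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S.%f %Y"),
            "source": m["source"],
            "msg": m["msg"],
        })
    return records


def summarize_espi_event(records):
    """Report which of the symptoms listed in this article appear in the records.

    The summary keys are hypothetical names chosen for this sketch.
    """
    summary = {
        "nmi": False,
        "watchdog_reset": False,
        "recovered": False,
        "link_errors": {},
    }
    for rec in records:
        msg = rec["msg"]
        if "NMI" in msg:
            summary["nmi"] = True
        if "L2 watchdog timeout hard reset" in msg:
            summary["watchdog_reset"] = True
        if "System recovered from eSPI fatal error" in msg:
            summary["recovered"] = True
        m = LNKERR_RE.search(msg)
        if m:
            # Keep the register value as the hex string printed in the log.
            summary["link_errors"][m.group(1)] = m.group(2)
    return summary


if __name__ == "__main__":
    print(summarize_espi_event(parse_records(SAMPLE)))
```

The sketch only reports whether the messages are present and records the LnkErr0 and LnkErr1 values as printed. It does not interpret the register contents.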

