AFF A800 node goes down due to watchdog NMI panic
Applies to
- ONTAP 9
- AFF A800, AFF C800, ASA A800, ASA C800
Issue
- The node went down with a watchdog NMI panic:
PANIC : watchdog nmi on cpu 34, hang cpu is 2
version: 9.13.1P6: Tue Dec 5 11:06:25 EST 2023
conf : x86_64.optimize
cpuid = 34
KDB: stack backtrace:
vpanic() at vpanic+0x3b2/frame 0xfffffe00c6c43c60
- While booting up, the system panics with the following message:
PANIC : System is going to shutdown because NVMEM subsystem failed initialization or configuration check.
- The shared SP logs show the NVDIMMs in an error condition, and the reported events state that NVMEM initialization failed because the SDRAM was not in self-refresh mode:
DIMM F1: NVDIMM in ERROR condition (0x00000801).
DIMM M1: NVDIMM in ERROR condition (0x00000801).
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.initfail:FAULT]:NVMEM subsystem initialization failed because NVDIMM device in slot-11 lost persistence due to SDRAM not in self-refresh mode , andNVDIMM HARDWARE error.. To prevent NVMEM data loss, halt the system.
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.initfail:FAULT]:NVMEM subsystem initialization failed because NVDIMM device in slot-23 lost persistence due to SDRAM not in self-refresh mode , andNVDIMM HARDWARE error.. To prevent NVMEM data loss, halt the system.
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.destage.failure:ALERT]:NVMEM subsystem failed to copy nonvolatile data to flash memory on NVDIMM device because SDRAM is not in self-refresh mode, and because of a hardware error..
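The SP and EMS excerpts above follow a recognizable pattern, so the affected NVDIMM slots and DIMM error codes can be pulled out of a saved log with a few regular expressions. The sketch below is a minimal illustration, assuming the log text has already been collected into a string. `summarize_nvdimm_faults` is a hypothetical helper, not a NetApp tool, and its patterns are modeled only on the sample lines shown here, so they may not match other ONTAP releases or log formats.

```python
import re

# Hypothetical patterns, modeled only on the sample log lines in this article;
# they are not based on a documented ONTAP log format.
INITFAIL_RE = re.compile(
    r"\[nvdimm\.nvmem\.initfail:FAULT\].*?NVDIMM device in slot-(\d+)"
)
DIMM_ERROR_RE = re.compile(
    r"DIMM (\w+): NVDIMM in ERROR condition \((0x[0-9A-Fa-f]+)\)"
)
DESTAGE_TAG = "[nvdimm.nvmem.destage.failure:ALERT]"


def summarize_nvdimm_faults(log_text: str) -> dict:
    """Collect slots with NVMEM init failures, DIMMs in ERROR, and destage status."""
    # Unique slot numbers named in nvdimm.nvmem.initfail events, sorted numerically.
    slots = sorted({int(m.group(1)) for m in INITFAIL_RE.finditer(log_text)})
    # Map each DIMM label (for example F1) to the error code reported by the SP.
    dimms = {m.group(1): m.group(2) for m in DIMM_ERROR_RE.finditer(log_text)}
    # True if any destage-failure alert appears anywhere in the text.
    destage_failed = DESTAGE_TAG in log_text
    return {
        "initfail_slots": slots,
        "dimms_in_error": dimms,
        "destage_failed": destage_failed,
    }


if __name__ == "__main__":
    sample = """\
DIMM F1: NVDIMM in ERROR condition (0x00000801).
DIMM M1: NVDIMM in ERROR condition (0x00000801).
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.initfail:FAULT]:NVMEM subsystem initialization failed because NVDIMM device in slot-11 lost persistence due to SDRAM not in self-refresh mode , andNVDIMM HARDWARE error.. To prevent NVMEM data loss, halt the system.
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.initfail:FAULT]:NVMEM subsystem initialization failed because NVDIMM device in slot-23 lost persistence due to SDRAM not in self-refresh mode , andNVDIMM HARDWARE error.. To prevent NVMEM data loss, halt the system.
Tue Aug 20 04:56:52 2024 [nvdimm.nvmem.destage.failure:ALERT]:NVMEM subsystem failed to copy nonvolatile data to flash memory on NVDIMM device because SDRAM is not in self-refresh mode, and because of a hardware error..
"""
    print(summarize_nvdimm_faults(sample))
    # prints {'initfail_slots': [11, 23], 'dimms_in_error': {'F1': '0x00000801', 'M1': '0x00000801'}, 'destage_failed': True}
```

Matching on the event tag (for example `nvdimm.nvmem.initfail:FAULT`) rather than on the free-text message keeps the sketch tolerant of small wording differences, such as the missing space in "andNVDIMM", inside the message body.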