DIMM F0: NVDIMM pending and destage progressing
Applies to
- AFF-A400
- ONTAP 9
Issue
- One node failed to boot with the following error:
multiple recursive panics - rebooting.
cpuid = 3
Uptime:
BIOS Version: 16.3
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
CPU reset.
BIOS Version: 16.3
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
DIMM F0: NVDIMM pending and destage progressing.....
- DIMM F0 corresponds to the NVDIMM in slot 11.
- System Event Log (SEL) in the BMC:
8b8 | 06/21/2024 | 18:41:44 | Memory #0x08 | Uncorrectable ECC | Asserted
8b9 | 06/23/2024 | 02:35:39 | Power Unit #0xb2 | Power off/down | Asserted | from channel 1
8ba | 06/23/2024 | 02:35:49 | Power Unit #0xb2 | Power on | Asserted | from channel 1
8bb | 06/23/2024 | 02:38:50 | System Event | Timestamp Clock Sync | Asserted
8bc | 06/23/2024 | 02:38:50 | System Event #0xff | Timestamp Clock Sync | Asserted
8bd | 06/22/2024 | 21:57:24 | System Event #0xff | Timestamp Clock Sync | Asserted
8be | 06/22/2024 | 21:57:24 | System Event | Timestamp Clock Sync | Asserted
8bf | 06/22/2024 | 21:57:24 | System Firmware Progress | Secondary CPU Initialization | Asserted
8c0 | 06/22/2024 | 21:57:24 | System Firmware Progress | USB resource configuration | Asserted
8c1 | 06/22/2024 | 21:57:27 | System Firmware Progress | PCI resource configuration | Asserted
8c2 | 06/22/2024 | 21:57:28 | System Firmware Progress | Video initialization | Asserted
8c3 | 06/22/2024 | 21:57:28 | System Firmware Progress | Keyboard controller initialization | Asserted
8c4 | 06/22/2024 | 21:57:28 | System Firmware Progress | Hard-disk initialization | Asserted
8c5 | 06/22/2024 | 21:57:47 | System Event | | Asserted
8c6 | 06/22/2024 | 22:00:07 | System Event #0xff | Timestamp Clock Sync | Asserted
8c7 | 06/23/2024 | 02:41:33 | System Event #0xff | Timestamp Clock Sync | Asserted
8c8 | 06/23/2024 | 03:12:52 | Memory #0x08 | Uncorrectable ECC | Asserted
- After the slot 11 NVDIMM was replaced, the same node panicked again and failed to boot:
PANIC : ECC error at DIMM-4: 00-00-0000-00000000,ADDR 0x4207d080,(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), Bank(0x0), Row(0x843), Col(0x348)) Uncorrectable Machine Check Error at CPU6.
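When comparing SEL output with panic messages like the ones above, it can help to pull out the relevant fields mechanically. The following is a minimal, hypothetical triage sketch (not a NetApp or BMC tool). It assumes the pipe-delimited SEL layout shown above (ID | date | time | sensor | event | state | optional extra) and the `Name(value)` location fields in the panic text. It lists asserted "Uncorrectable ECC" records and extracts the DIMM label, ADDR, and location fields from the panic line. The helper names (`parse_sel`, `uncorrectable_ecc`, `parse_panic_location`) are illustrative. The sketch does not map a label such as DIMM-4 or DIMM F0 to a physical slot, because that mapping is platform-specific and is not given here.

```python
import re
from dataclasses import dataclass

# Sample lines copied from the SEL and panic output above (hypothetical triage input).
SEL_LINES = [
    "8b8 | 06/21/2024 | 18:41:44 | Memory #0x08 | Uncorrectable ECC | Asserted",
    "8b9 | 06/23/2024 | 02:35:39 | Power Unit #0xb2 | Power off/down | Asserted | from channel 1",
    "8c5 | 06/22/2024 | 21:57:47 | System Event | | Asserted",
    "8c8 | 06/23/2024 | 03:12:52 | Memory #0x08 | Uncorrectable ECC | Asserted",
]

PANIC_LINE = (
    "PANIC : ECC error at DIMM-4: 00-00-0000-00000000,ADDR 0x4207d080,"
    "(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), "
    "Bank(0x0), Row(0x843), Col(0x348)) Uncorrectable Machine Check Error at CPU6."
)


@dataclass
class SelRecord:
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    state: str


def parse_sel(lines):
    """Split each pipe-delimited SEL line into its first six fields (assumed layout)."""
    records = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue  # skip lines that do not match the assumed layout
        records.append(SelRecord(*fields[:6]))  # extra fields (e.g. "from channel 1") are dropped
    return records


def uncorrectable_ecc(records):
    """Keep only asserted 'Uncorrectable ECC' events."""
    return [r for r in records if r.event == "Uncorrectable ECC" and r.state == "Asserted"]


def parse_panic_location(line):
    """Pull the DIMM label, ADDR and the Name(value) location fields from a panic line."""
    location = {}
    dimm = re.search(r"ECC error at (DIMM-\d+)", line)
    if dimm:
        location["DIMM label"] = dimm.group(1)
    addr = re.search(r"ADDR (0x[0-9a-fA-F]+)", line)
    if addr:
        location["ADDR"] = addr.group(1)
    # Matches fields such as "Memory controller(0)" or "Row(0x843)".
    for name, value in re.findall(r"([A-Za-z ]+)\((\w+)\)", line):
        location[name.strip()] = value
    return location


if __name__ == "__main__":
    for rec in uncorrectable_ecc(parse_sel(SEL_LINES)):
        print(rec.record_id, rec.date, rec.time, rec.sensor, rec.event)
    print(parse_panic_location(PANIC_LINE))
```

The sketch uses plain string splitting for the SEL because each record is a single pipe-delimited line. It uses regular expressions for the panic text because the location fields are embedded in free-form text.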