Even after replacing the DIMM, the event log still displays the DIMM error
Applies to
- FAS 2750
- ONTAP 9.5
Issue
- DIMM-1 was replaced due to the following panic error:
PANIC: ECC error at DIMM-1: CE-03-1843-24156E02,ADDR 0x79e200080,(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(0), Bank(0x2), Row(0xe780), Col(0x0), Uncorrectable Machine Check Error at CPU11. BDWL_HA0 Error: STATUS<0xbe00000000010090>(Val,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0),ExtErr(0x1),ErrCode(Channel 0, Read)ErrCode(0x90))MISC<0x0000000040169686>(HaDbBank(0),PE(0),ReqOpcode(0x2),RNID(0),RTID(0xb),HTID(0x4b))ADDR<0x000000079e200080>((0x79e200080)). in process ECC scrubber on release 9.5 (C) on Sun May 5 20:09:02 JST 2024
version: 9.5:
- Even after replacing DIMM-1, the system's Fault LED remains lit.
- The "service-event show" command indicates that DIMM-1 is still in an error state.
Cluster::*> system controller service-event show
Node ID Event Location Event Description
---------------- --- ---------------------------------- ----------------------
****-01 1 DIMM in slot 1 in Controller A Uncorrectable DRAM ECC
****-01 2 DIMM in slot 1 in Controller A DIMM error recorded in SRAM
2 entries were displayed.
- Even after using the "delete" command to remove these events, the same errors reappear immediately.
::*> system controller service-event delete -event-id *
2 entries were deleted.
- The BMC command "system fru led show all" shows that the Attention LED for DIMM-1 is on.
BMC ****-01*> system fru led show all
FRU LED ID 1 is off
FRU LED ID 2 is on. Set by BMC
FRU LED ID 3 is on
FRU LED ID 4 is on
FRU LED ID 5 is off
FRU LED ID 6 is off
FRU LED ID 7 is off
FRU LED ID 8 is off
FRU LED ID 9 is off
FRU LED ID 10 is off
FRU LED ID 11 is off
FRU LED ID 12 is off
FRU LED ID 13 is on
FRU LED ID 14 is off
FRU LED ID 15 is off
FRU LED ID 16 is off
FRU LED ID 17 is off
FRU LED ID 18 is off
BMC ****-01*> system fru led show
<FRU-LED-ID>:
1 = BMC Locate LED
2 = BMC System LED
3 = BMC Controller Attention LED
4 = BMC Controller Active LED
5 = BMC SAS Port A Attention LED
6 = BMC SAS Port B Attention LED
7 = BMC CNA Port 1 Attention LED
8 = BMC CNA Port 2 Attention LED
9 = BMC CNA Port 3 Attention LED
10 = BMC CNA Port 4 Attention LED
11 = BMC 10G Port 1 Attention LED
12 = BMC 10G Port 2 Attention LED
13 = BMC DIMM Slot 1 Attention LED
14 = BMC DIMM Slot 2 Attention LED
15 = BMC NVMEM 1 Attention LED
16 = BMC BOOT DISK Attention LED
17 = BMC NV BATTERY Attention LED
18 = BMC Coin Cell Attention LED
BMC ****-01*>
- Attempting to reseat or replace the DIMM again does not resolve the issue.
- The issue persists after performing a takeover/giveback and a BMC reboot.
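
Reading the "system fru led show all" output against the FRU-LED-ID legend by eye is error-prone, so the following is a minimal Python sketch of that cross-reference. It assumes the BMC output has already been captured as plain text (for example, copied from a console session). The function name lit_leds and the variable names are hypothetical and are not part of any ONTAP or BMC tooling; the legend is copied from the output shown above and may differ on other platforms.

```python
import re

# Legend copied from the "system fru led show" output above.
# Assumption: this mapping applies to the platform in this article only.
FRU_LED_NAMES = {
    1: "BMC Locate LED",
    2: "BMC System LED",
    3: "BMC Controller Attention LED",
    4: "BMC Controller Active LED",
    5: "BMC SAS Port A Attention LED",
    6: "BMC SAS Port B Attention LED",
    7: "BMC CNA Port 1 Attention LED",
    8: "BMC CNA Port 2 Attention LED",
    9: "BMC CNA Port 3 Attention LED",
    10: "BMC CNA Port 4 Attention LED",
    11: "BMC 10G Port 1 Attention LED",
    12: "BMC 10G Port 2 Attention LED",
    13: "BMC DIMM Slot 1 Attention LED",
    14: "BMC DIMM Slot 2 Attention LED",
    15: "BMC NVMEM 1 Attention LED",
    16: "BMC BOOT DISK Attention LED",
    17: "BMC NV BATTERY Attention LED",
    18: "BMC Coin Cell Attention LED",
}

# Matches lines such as "FRU LED ID 2 is on. Set by BMC" or "FRU LED ID 5 is off".
LED_LINE = re.compile(r"FRU LED ID (\d+) is (on|off)\b")


def lit_leds(output):
    """Return a list of (id, name) pairs for every LED reported as on."""
    lit = []
    for match in LED_LINE.finditer(output):
        led_id = int(match.group(1))
        if match.group(2) == "on":
            lit.append((led_id, FRU_LED_NAMES.get(led_id, "unknown LED")))
    return lit


# Excerpt of the "system fru led show all" output from this article.
sample = """\
FRU LED ID 1 is off
FRU LED ID 2 is on. Set by BMC
FRU LED ID 3 is on
FRU LED ID 4 is on
FRU LED ID 12 is off
FRU LED ID 13 is on
FRU LED ID 14 is off
"""

for led_id, name in lit_leds(sample):
    print(f"{led_id:2d}  {name}")
```

Run against the excerpt, the sketch lists IDs 2, 3, 4, and 13, which confirms that the DIMM Slot 1 Attention LED (ID 13) is among the lit indicators.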