Multiple PANICs in the same DIMM slot after DIMM replacement
Applies to
AFF A800
Issue
- A first PANIC reports an uncorrectable ECC error in a DIMM slot:
PANIC: Uncorrectable Machine Check Error at CPU5. ECC error at DIMM-17: 2C-0F-2007-2664E452,ADDR 0x19da7c2c80,(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(0), Bank(0x3), Row(0x4785), Col(0x38)) SKL_IMC0 Error: STATUS<0xfe10000001010090>(VALID,OVERFLOW,UC,EN,MISCV,ADDRV,PCC,CORR_ERR_STATUS(0),CORR_ERR_CNT(0x4000),OTHER_INFO(0),MscodDdrType(0x1),MscodDataRdErr,MCACOD(0x90))MISC<0x200000c000202086>(DataErrorChunk(0x2),McCmdChnl(0),McCmdMemRegion(0),McCmdOpcode(0),McCmdVld,SmiAD,SmiMsgClass(0),SmiOpcode(0),TrkId(0x1),Error_Type(0x4),ADDRMODE(0x2),ADDRLSB(0x6))ADDR<0x00000019da7c2c80>(HIPHYADDR(0x19),LOPHYADDR(0x369f0b2))(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(0), Bank(0x3), Row(0x4785), Col(0x38), in SK process xor_cp_thread_2 on release 9.7P6 (C) on Mon Aug 24 16:24:07 CEST 2020
- After the DIMM is replaced, a second PANIC is triggered in the same slot:
PANIC: Uncorrectable Machine Check Error at CPU22. ECC error at DIMM-17: 2C-0F-2007-2664E2F4,ADDR 0x9d5e622040,(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(3), Bank(0x0), Row(0x1f76c), Col(0x40)) SKL_IMC0 Error: STATUS<0xfe10000001010090>(VALID,OVERFLOW,UC,EN,MISCV,ADDRV,PCC,CORR_ERR_STATUS(0),CORR_ERR_CNT(0x4000),OTHER_INFO(0),MscodDdrType(0x1),MscodDataRdErr,MCACOD(0x90))MISC<0x20000ac507e02086>(DataErrorChunk(0x2),McCmdChnl(0),McCmdMemRegion(0),McCmdOpcode(0xa),McCmdVld,SmiAD,SmiMsgClass(0),SmiOpcode(0xa),TrkId(0x3f),Error_Type(0x4),ADDRMODE(0x2),ADDRLSB(0x6))ADDR<0x0000009d5e622040>(HIPHYADDR(0x9d),LOPHYADDR(0x1798881))(Node(0), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(3), Bank(0x0), Row(0x1f76c), Col(0x40), in process NwkThd_00 on release 9.7P6 (C) on Fri Aug 28 14:44:47 CEST 2020
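The two messages above name the same slot (DIMM-17) but carry different identifier strings after the slot name (2C-0F-2007-2664E452 versus 2C-0F-2007-2664E2F4). The following is a minimal sketch of how that comparison could be automated when reviewing PANIC strings. It assumes the hyphenated string after the slot name identifies the physical DIMM, which the messages themselves do not confirm. The names parse_panic and same_slot_different_dimm are hypothetical helpers, not part of any ONTAP tooling, and the sample strings are shortened copies of the messages above.

```python
import re

# Pattern for the leading part of the PANIC string shown in this article.
# Assumption: the hyphenated field after "DIMM-<n>:" identifies the physical DIMM.
PANIC_RE = re.compile(
    r"at CPU(?P<cpu>\d+)\. ECC error at (?P<slot>DIMM-\d+): "
    r"(?P<dimm_id>[0-9A-Fa-f-]+),\s*ADDR (?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_panic(line):
    """Return the CPU, slot, DIMM identifier and address as a dict, or None."""
    match = PANIC_RE.search(line)
    return match.groupdict() if match else None


def same_slot_different_dimm(first, second):
    """True when both PANICs name the same slot but different DIMM identifiers."""
    a, b = parse_panic(first), parse_panic(second)
    if a is None or b is None:
        return False
    return a["slot"] == b["slot"] and a["dimm_id"] != b["dimm_id"]


# Shortened copies of the two PANIC strings from this article.
first = ("PANIC: Uncorrectable Machine Check Error at CPU5. ECC error at "
         "DIMM-17: 2C-0F-2007-2664E452,ADDR 0x19da7c2c80,")
second = ("PANIC: Uncorrectable Machine Check Error at CPU22. ECC error at "
          "DIMM-17: 2C-0F-2007-2664E2F4,ADDR 0x9d5e622040,")

if __name__ == "__main__":
    print(parse_panic(first))
    print(same_slot_different_dimm(first, second))
```

A True result from same_slot_different_dimm only indicates that the slot repeated while the identifier changed. It does not by itself identify which component is at fault.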