CHW-3352: Uncorrectable Machine Check Error pointing to NETAPP NVRAM12 in slot 4
Issue
The system reboots with an Uncorrectable Machine Check Error (UMCE) that points to the card in slot 4 (NVRAM12).
---
Example 1:
Uncorrectable Machine Check Error at CPU14. SPR_UBOX Error: STATUS<0xfa00000000000e0b>(VALID,OVER,UC,EN,MISCV,PCC,CESI(0),CERR_CNT(0),OTHER_INFO(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000048020000>(BUS_LOG(0x48),DEVICE_LOG(0),FUNCTION_LOG(0x2),SEGMENT_LOG(0)) IIO Machine Check from devices(s): SPR:Socket0:IIO-Stack5:RAS(72,0,2):M2IOS <0x00000015>(RasFuncNerr(0),RasSevNerr(0),RasFuncFerr(0x2),RasSevFerr(0x2),RasStsFerr), M2IOSSTS <0x00000100>(IRPSev2), IRPRING <0x00000001>(BLPErr), IRPRINGFF <0x00000001>(BLPErr), IRPRINGMISC <0x0000001b>(BLPBit4,BLPBit3,BLPBit1,BLPBit0), IRPPoisonLog <0xe0002082>(PoiLogOv,PoiLogTtype(0),PoiLogLen(0x10),PoiLogRid(0x20),PoiLogType(0x2)), ADDRL <0xa8935840>((0xa8935840)), ADDRH <0x00002048>((0x2048)), NETAPP NVRAM12 in slot 4 on Controller, SPR:Socket0:IIO-Stack5:RPT(72,1,0): Status(SigSysErr,DtParErr), SecStatus(DataPar,RcvSysErr), ErrSrcID(CorrSrc(0),UCorrSrc(0x4900)), . SPR_BANK8_MDF Error: STATUS<0xba00000000400405>(VALID,UC,EN,MISCV,PCC,ERR_STATUS(0),MSCOD(0),MCCOD(0x40),MCC
---
Example 2: System console output:
Initializing System Memory ...
Loading Device Drivers ...
Configuring Devices ...
Device 80/0/0 (IO4) missing
Device(s) in slot missing, attempting to recover it.
System is resetting....
PANIC: Uncorrectable Machine Check Error at CPU17. SPR_UBOX Error: STATUS<0xba00000000000e0b>(VALID,UC,EN,MISCV,PCC,CESI(0),CERR_CNT(0),OTHER_INFO(0),MSCOD(0),MCACOD(0xe0b))MISC<0x000000004f080000>(BUS_LOG(0x4f),DEVICE_LOG(0x1),FUNCTION_LOG(0),SEGMENT_LOG(0)) IIO Machine Check from devices(s): Br[352a](79,1,0): Link down, SPR:Socket0:IIO-Stack5:RPT(79,1,0): Status(SigSysErr), DevStatus(Corr,NFatal), RootErr(Corr,UCor,NFatal), ErrSrcID(CorrSrc(0x4f08),UCorrSrc(0x4f08)), CorrErr(Rcvr), UCorrErr(LnkDn), FirstUCorrErr(LnkDn), ErrSrcID(CorrSrc(0x4f08),UCorrSrc(0x4f08)), . in process reserved: cpu17 on release 9.16.1P3 (C) on Sun Dec 28 20:15:08 EST 2025
---
- In Example 2, the UMCE indicates a device link down, and SPR:Socket0:IIO-Stack5:RPT(79,1,0) reports the fault.
- The address (79,1,0) corresponds to Br[352a](79,1,0) (PCI Device 8086:352a on Controller), which is connected to the device in slot 4: Dv[000e](80,0,0), NETAPP NVRAM12 in slot 4 on Controller.
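The (bus,device,function) tuples in the panic strings can be cross-checked against the hexadecimal source IDs in the same message. The following is a minimal Python sketch, not a NetApp tool. It assumes that the ErrSrcID values use the standard 16-bit PCI bus/device/function layout (8-bit bus, 5-bit device, 3-bit function). The helper names `decode_bdf` and `bdf_from_misc` are hypothetical, and the MISC bit positions are an assumption inferred only from the two examples above.

```python
def decode_bdf(source_id):
    """Split a 16-bit PCI ID into (bus, device, function).

    Assumes the standard layout: bits 15-8 bus, bits 7-3 device,
    bits 2-0 function.
    """
    bus = (source_id >> 8) & 0xFF
    device = (source_id >> 3) & 0x1F
    function = source_id & 0x07
    return bus, device, function


def bdf_from_misc(misc):
    """Decode BUS_LOG/DEVICE_LOG/FUNCTION_LOG from a UBOX MISC value.

    ASSUMPTION: the bus/device/function fields sit in bits 31-16 of MISC.
    This is inferred from the two example logs only, not from a
    register specification.
    """
    return decode_bdf((misc >> 16) & 0xFFFF)


# Example 2: UCorrSrc(0x4f08) matches RPT(79,1,0) and Br[352a](79,1,0).
print(decode_bdf(0x4F08))                  # (79, 1, 0)
# Example 1: UCorrSrc(0x4900) decodes to bus 73, device 0, function 0.
print(decode_bdf(0x4900))                  # (73, 0, 0)
# Example 1: MISC<0x0000000048020000> matches RAS(72,0,2).
print(bdf_from_misc(0x0000000048020000))   # (72, 0, 2)
```

Decoding the source ID this way gives a quick confirmation that the device named in the panic string and the device reporting the error sit on the same PCI path.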
PCI hierarchy:
1 Br[352a](79,1,0): PCI Device 8086:352a on Controller LinkCap(MaxLkSp(5),MaxLkWd(16),ASPM(0),L0(4),L1(4),SurpDn,DLAct,Port(25)) LinkStatus(LkSp(4),LkWd(16),SClk,DLAct),
2 Dv[000e](80,0,0) in slot 4: NETAPP NVRAM12 in slot 4 on Controller LinkCap(MaxLkSp(4),MaxLkWd(16),ASPM(0),L0(0),L1(0),Port(1)) LinkStatus(LkSp(4),LkWd(16),SClk),
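The LinkCap and LinkStatus fields in the hierarchy output can be compared to see whether the link trained at the best speed and width both ends support. The following is a rough Python sketch under stated assumptions: MaxLkSp/MaxLkWd are read as the maximum link speed and width, LkSp/LkWd as the currently negotiated values, and all are treated as decimal numbers. The function names `parse_link` and `link_is_degraded` are hypothetical, and the input strings are shortened copies of the two hierarchy lines.

```python
import re


def parse_link(line):
    """Extract link capability and status numbers from one hierarchy line."""
    fields = {}
    for name in ("MaxLkSp", "MaxLkWd", "LkSp", "LkWd"):
        # \b keeps "LkSp" from matching inside "MaxLkSp".
        match = re.search(r"\b" + name + r"\((\d+)\)", line)
        fields[name] = int(match.group(1)) if match else None
    return fields


def link_is_degraded(upstream, downstream):
    """Return True if the negotiated link is below what both ends allow."""
    # The best achievable link is limited by the less capable end.
    best_speed = min(upstream["MaxLkSp"], downstream["MaxLkSp"])
    best_width = min(upstream["MaxLkWd"], downstream["MaxLkWd"])
    return upstream["LkSp"] < best_speed or upstream["LkWd"] < best_width


bridge = parse_link(
    "Br[352a](79,1,0): LinkCap(MaxLkSp(5),MaxLkWd(16),ASPM(0)) "
    "LinkStatus(LkSp(4),LkWd(16),SClk,DLAct),"
)
card = parse_link(
    "Dv[000e](80,0,0) in slot 4: LinkCap(MaxLkSp(4),MaxLkWd(16),ASPM(0)) "
    "LinkStatus(LkSp(4),LkWd(16),SClk),"
)

print(bridge)                          # {'MaxLkSp': 5, 'MaxLkWd': 16, 'LkSp': 4, 'LkWd': 16}
print(link_is_degraded(bridge, card))  # False
```

In this listing the Br[352a] side reports LkSp(4) against MaxLkSp(5), but the slot 4 device reports MaxLkSp(4), so under these assumptions the link is running at the highest speed both ends advertise.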