PANIC: Uncorrectable Machine Check Error at CPUX on AFF A400 after NIC replacement
Applies to
- AFF A400
- Uncorrectable Machine Check Error
Issue
After replacing the NIC in slot 3, which was called out by the PCI Analysis tool, the node experiences a new Uncorrectable Machine Check Error.
- The NIC is replaced twice, and the issue still persists
- First panic string, for which the tool called out slot 3:
Uncorrectable Machine Check Error at CPU18. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000>(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0), UCR_SEGMENT_LOG(0))I
Panic after the NIC replacement in slot 3:
Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS <0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000> (UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(174,0,0):ErrSrcID (CorrSrc(0),UCorrSrc(0xb080)), PLX PCIE 8796 switch on Controller, Br[8796](176,16,0): Link down.
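The two panic strings can be compared field by field. The following is a minimal sketch, not a NetApp tool: the helper names (decode_status, decode_bdf, parse_panic) are hypothetical, the flag bit positions follow the architectural IA32_MCi_STATUS layout from the Intel SDM, and reading UCorrSrc as a standard PCIe bus/device/function ID is an assumption, although it is consistent with the Br[8796](176,16,0) entry in the second panic. The input strings are shortened excerpts of the panics above.

```python
import re

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM).
# S and AR are defined when software error recovery is supported.
STATUS_FLAGS = {63: "VALID", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC", 56: "S", 55: "AR"}

def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags, MSCOD and MCACOD."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {"flags": flags,
            "mscod": (status >> 16) & 0xFFFF,   # model-specific error code
            "mcacod": status & 0xFFFF}          # architectural error code

def decode_bdf(src_id):
    """Assumption: src_id is a 16-bit PCIe ID (bus 15:8, device 7:3, function 2:0)."""
    return (src_id >> 8) & 0xFF, (src_id >> 3) & 0x1F, src_id & 0x7

def _find(pattern, text, base):
    m = re.search(pattern, text)
    return int(m.group(1), base) if m else None

def parse_panic(text):
    """Pull the CPU number, STATUS value and logged bus out of a panic string."""
    return {"cpu": _find(r"at CPU(\d+)", text, 10),
            "status": _find(r"STATUS\s*<0x([0-9a-fA-F]+)>", text, 16),
            "bus": _find(r"UCR_BUS_LOG\((\d+)\)", text, 10)}

# Shortened excerpts of the two panic strings from this article.
first_panic = ("Uncorrectable Machine Check Error at CPU18. SKL_IIO Error: "
               "STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR) "
               "MISC<0x00000000ae000000>(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0))")
second_panic = ("Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: "
                "STATUS <0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR) "
                "MISC<0x00000000ae000000> (UCR_BUS_LOG(174),UCR_DEVICE_LOG(0))")

first, second = parse_panic(first_panic), parse_panic(second_panic)
print("same STATUS:", first["status"] == second["status"])
print("same bus:", first["bus"] == second["bus"])
print("flags:", decode_status(first["status"])["flags"])
print("UCorrSrc 0xb080 as (bus, device, function):", decode_bdf(0xB080))
```

In this sketch only the reporting CPU differs between the two panics. The STATUS value and the logged bus (174) are identical, which matches the observation that replacing the NIC did not change the failure signature.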