PANIC: Uncorrectable Machine Check Error at CPUX on AFF A400 after NIC replacement
Applies to
- AFF A400
- Uncorrectable Machine Check Error
Issue
After replacing the NIC in slot 3, which the PCI Analysis tool had called out, the node experienced a new Uncorrectable Machine Check Error.
- The NIC was replaced twice and that still did not resolve the issue
- First panic, for which the tool called out slot 3:
Uncorrectable Machine Check Error at CPU18. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000>(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0), UCR_SEGMENT_LOG(0))I
- Panic after the NIC replacement in slot 3:
Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS <0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000> (UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(174,0,0):ErrSrcID (CorrSrc(0),UCorrSrc(0xb080)), PLX PCIE 8796 switch on Controller, Br[8796](176,16,0): Link down.
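
The two panic strings above differ in the reporting CPU but carry the same STATUS, MISC, and UCR_BUS_LOG values. The following is a minimal Python sketch for checking that, and is not a NetApp tool. The names `parse_panic` and `same_error_source` and the regular expression are illustrative assumptions based only on the two messages shown here. Only the leading portion of each panic string is used. Taking MCACOD from the low 16 bits of STATUS follows the architectural Intel machine-check layout and agrees with the MCACOD(0xe0b) printed in the log.

```python
import re

# Hypothetical pattern, derived only from the two panic strings in this article.
PANIC_RE = re.compile(
    r"at CPU(?P<cpu>\d+)\..*?STATUS\s*<(?P<status>0x[0-9a-fA-F]+)>"
    r".*?MISC\s*<(?P<misc>0x[0-9a-fA-F]+)>"
    r".*?UCR_BUS_LOG\((?P<bus>\d+)\)",
    re.DOTALL,
)

def parse_panic(text):
    """Return the CPU, STATUS, MISC, bus and MCACOD fields, or None if no match."""
    m = PANIC_RE.search(text)
    if m is None:
        return None
    status = int(m.group("status"), 16)
    return {
        "cpu": int(m.group("cpu")),
        "status": status,
        "misc": int(m.group("misc"), 16),
        "bus": int(m.group("bus")),
        "mcacod": status & 0xFFFF,  # low 16 bits of the STATUS register
    }

def same_error_source(a, b):
    """True when two parsed panics agree on everything except the reporting CPU."""
    return all(a[k] == b[k] for k in ("status", "misc", "bus"))

# Leading portion of the first panic (before the NIC replacement).
PANIC_1 = (
    "Uncorrectable Machine Check Error at CPU18. SKL_IIO Error: "
    "STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),"
    "CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000>"
    "(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0), UCR_SEGMENT_LOG(0))"
)

# Leading portion of the second panic (after the NIC replacement).
PANIC_2 = (
    "Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: "
    "STATUS <0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),"
    "CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000> "
    "(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))"
)

first = parse_panic(PANIC_1)
second = parse_panic(PANIC_2)
print("CPUs:", first["cpu"], second["cpu"])
print("Bus:", first["bus"], second["bus"])
print("Same error source:", same_error_source(first, second))
```

A match on STATUS, MISC, and bus across both panics indicates that the same error signature persisted after the NIC replacement.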