CONTAP-588247: NVMe disk caused panic of HA pair (Poisoned Transaction Layer Packet (PTLP))
Issue
- Both nodes panicked because of an NVMe disk.
[Node-02: send_boot_msg_thread: mgr.stack.string:notice]: Panic string: Uncorrectable Machine Check Error at CPU9. SKL_IIO Error: STATUS<0xf780000000010405>(VALID,OVERFLOW,UC,EN,ADDRV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MS
- SRAM shows an issue with the disk's PCI device:
SRAM record type(LOG) from Data ONTAP: rstUCorrErr(PTLP), TLPType(CfgWrRq),Hdr[0]<0x4500c001>(HdrLen(1),AddrType(0),Attr(0),Ep,Td,Tc(0),Type(5),Fmt(2)), Hdr[1]<0x16000001>(RqBusNum(22),RqDvNum(0),ReqFncNum(0),Tag(0),LstDwBe(0),1stDwBe(1)), Hdr[2]<0x1c00000c>({color:#ffab00}*BusNum(28),DvNum(0),FncNum(0)*{color},CfgAddr(0xc)), Hdr[3]((0x10000000)); Br[9797](24,3,0): Status(DtParErr), SecStatus(DataPar), DevStatus(Corr), CorrErr(AdvsNF,HdrOvf), UCorrErr(PTLP), FirstUCorrErr(PTLP), TLPType(CfgWrRq),Hdr[0]<0x4500c001>(HdrLen(1),AddrType(0),
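
The Hdr[0] to Hdr[2] values in the SRAM record above can be cross-checked by hand. The following is a minimal Python sketch, assuming the standard PCIe configuration-request TLP header layout (DW0 = Fmt/Type/TC/TD/EP/Attr/AT/Length, DW1 = requester ID/tag/byte enables, DW2 = target bus/device/function and register address). The function name and dictionary keys are hypothetical and are not part of any ONTAP tooling.

```python
def decode_cfg_tlp_header(dw0: int, dw1: int, dw2: int) -> dict:
    """Decode the first three header DWs of a PCIe configuration request TLP.

    Assumes the standard PCIe header layout; names are illustrative only.
    """
    return {
        # DW0: format, type, traffic class, digest/poison flags, length
        "fmt": (dw0 >> 29) & 0x7,
        "type": (dw0 >> 24) & 0x1F,
        "tc": (dw0 >> 20) & 0x7,
        "td": bool((dw0 >> 15) & 0x1),   # TLP digest present
        "ep": bool((dw0 >> 14) & 0x1),   # poisoned data
        "attr": (dw0 >> 12) & 0x3,
        "addr_type": (dw0 >> 10) & 0x3,
        "length_dw": dw0 & 0x3FF,
        # DW1: requester ID, tag, byte enables
        "req_bus": (dw1 >> 24) & 0xFF,
        "req_dev": (dw1 >> 19) & 0x1F,
        "req_fn": (dw1 >> 16) & 0x7,
        "tag": (dw1 >> 8) & 0xFF,
        "last_dw_be": (dw1 >> 4) & 0xF,
        "first_dw_be": dw1 & 0xF,
        # DW2: target bus/device/function and config register byte offset
        "target_bus": (dw2 >> 24) & 0xFF,
        "target_dev": (dw2 >> 19) & 0x1F,
        "target_fn": (dw2 >> 16) & 0x7,
        "cfg_addr": dw2 & 0xFFF,
    }


if __name__ == "__main__":
    # Header values copied from the SRAM record in this ticket.
    hdr = decode_cfg_tlp_header(0x4500C001, 0x16000001, 0x1C00000C)
    for name, value in hdr.items():
        print(f"{name}: {value}")
```

With the values from this ticket, the decode agrees with the annotations in the log: Fmt 2 / Type 5 (configuration write), requester bus 22, target bus 28 / device 0 / function 0, config address 0xc. The EP bit is set, which is consistent with the PTLP (poisoned TLP) error reported by the bridge.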