Node goes down due to PCI Error NMI panic on root port
Applies to
- ONTAP 9
- FAS/AFF Systems
Issue
- Node goes down with a panic similar to the following:
PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0),UCorrSrc(0x1a)), RPT(0,3,2): in process idle: cpu7 on release 9.5P6 (C)
PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0),UCorrSrc(0x8)), RPT(0,1,0): in process idle: cpu9 on release 9.9.1P15 (C)
- The following additional errors may also be seen prior to the panic string:
000000 2021 [SysFW.notice]: Device 10/0/0 (CNA0) failed to train at max link width - retraining
000000 2021 [SysFW.notice]: - Expected x4, actual x1
000000 2021 [SysFW.notice]: Device 10/0/0 (CNA0) failed to retrain at max link width
- The following output can be confirmed from the PCI-HIERARCHY.XML log:
PCI Device Level:1
PCI Device :Br[6f0a](0,3,2): PCI Device 8086:6f0a on Controller
Link Capability:LinkCap(MaxLkSp(3),MaxLkWd(4),ASPM(0),L0(3),L1(4),SurpDn,DLAct,Port(9))
Link Status: LinkStatus(LkSp(3),LkWd(4),SClk,DLAct),
PCI Device Level:2
PCI Device :Dv[1563](10,0,0): Intel Dual 10G NIC on Controller
Link Capability:LinkCap(MaxLkSp(3),MaxLkWd(4),ASPM(3),L0(5),L1(4),Port(0))
Link Status:LinkStatus(LkSp(3),LkWd(4),SClk),
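To cross-check the panic string against the hierarchy output, the RPT(bus,device,function) tuple from the panic can be matched to the bridge entry with the same tuple, and the negotiated link width compared with the maximum. The following is a minimal sketch only, not a NetApp tool. It assumes the text lines look like the excerpts above; the function names (panic_root_port, parse_hierarchy, find_device) are hypothetical, and the real PCI-HIERARCHY.XML file may wrap these lines in XML that needs to be stripped first.

```python
import re

# RPT(bus,device,function) tuple as it appears in the panic string.
RPT_RE = re.compile(r"RPT\((\w+),(\w+),(\w+)\)")
# Device line in the hierarchy excerpt, e.g. "PCI Device :Br[6f0a](0,3,2): ...".
DEV_RE = re.compile(r"PCI Device\s*:(\w+)\[(\w+)\]\((\w+),(\w+),(\w+)\)")
# Maximum link width from the Link Capability line.
MAX_WIDTH_RE = re.compile(r"MaxLkWd\((\d+)\)")
# Negotiated link width from the Link Status line.
CUR_WIDTH_RE = re.compile(r"LinkStatus\(LkSp\(\d+\),LkWd\((\d+)\)")


def panic_root_port(panic):
    """Return the (bus, device, function) strings from a panic line, or None."""
    m = RPT_RE.search(panic)
    return m.groups() if m else None


def parse_hierarchy(text):
    """Collect devices with their maximum and negotiated link widths."""
    devices = []
    current = None
    for line in text.splitlines():
        m = DEV_RE.search(line)
        if m:
            current = {
                "kind": m.group(1),          # "Br" or "Dv" in the excerpts
                "device_id": m.group(2),
                "bdf": m.group(3, 4, 5),     # kept as strings; number base is not assumed
                "max_width": None,
                "width": None,
            }
            devices.append(current)
            continue
        if current is None:
            continue
        m = MAX_WIDTH_RE.search(line)
        if m:
            current["max_width"] = int(m.group(1))
        m = CUR_WIDTH_RE.search(line)
        if m:
            current["width"] = int(m.group(1))
    return devices


def find_device(devices, bdf):
    """Return the first device whose (bus, device, function) equals bdf, or None."""
    for dev in devices:
        if dev["bdf"] == tuple(bdf):
            return dev
    return None


if __name__ == "__main__":
    import sys

    # Usage (assumed): python check_rpt.py "<panic string>" hierarchy.txt
    panic_line, path = sys.argv[1], sys.argv[2]
    with open(path) as fh:
        devs = parse_hierarchy(fh.read())
    port = panic_root_port(panic_line)
    print(port, find_device(devs, port) if port else None)
```

A link running narrower than its maximum shows up as width lower than max_width on the matched entry or on the device beneath it. Equal values only describe the state at the time the hierarchy was captured.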