PCIe stealth errors on X91152A card lead to node reboot
Applies to
- AFF A900
- ONTAP 9.13.1P9
- X91152A Ethernet Storage Controller
Issue
- Node reboots without a clear cause in the EMS or SP logs.
- EMS errors like the following repeat constantly in the logs:
[?] Fri Jun 14 12:38:43 +0900 [node_1: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(46,2,0): Microchip PCI-E Switch on Controller, Br[4000](48,1,0): DevStatus(Corr), CorrErr(Rcvr); '}
[?] Sat Jun 15 08:48:47 +0900 [node_1: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(46,2,0): Microchip PCI-E Switch on Controller, Br[4000](48,1,0): DevStatus(Corr), CorrErr(Rcvr); '}
[?] Sun Jun 16 17:34:53 +0900 [node_1: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(46,2,0): Microchip PCI-E Switch on Controller, Br[4000](48,1,0): DevStatus(Corr), CorrErr(Rcvr); '}
- In PCI-HIERARCHY.XML, Br[4000](48,1,0) is the Microchip PCIe switch port on the controller directly upstream of the card in slot 1:
1 | Br[347a](46,2,0): PCI Device 8086:347a on Controller | LinkCap(MaxLkSp(4),MaxLkWd(16),ASPM(0),L0(4),L1(4),SurpDn,DLAct,Port(5)) | LinkStatus(LkSp(4),LkWd(16),SClk,DLAct), |
3 | Br[4000](48,1,0): Microchip PCI-E Switch on Controller | LinkCap(MaxLkSp(4),MaxLkWd(16),ASPM(0),L0(0),L1(0),SurpDn,DLAct,Port(2)) | LinkStatus(LkSp(4),LkWd(16),SClk,DLAct), |
4 | Br[4036](52,0,0) in slot 1: Microchip PCI-E Switch in slot 1 on Controller | LinkCap(MaxLkSp(4),MaxLkWd(16),ASPM(0),L0(0),L1(0),Port(0)) | LinkStatus(LkSp(4),LkWd(16),SClk), |
- From sysconfig -ac, the card in slot 1 can be identified as X91152A.
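When the same event floods the logs, the triage path above (EMS event, then bridge BDF, then slot in PCI-HIERARCHY.XML, then card model from sysconfig -ac) can be scripted. The Python below is a minimal sketch, not a NetApp tool: the regular expressions are modeled only on the sample lines in this article, and the function names (`count_stealth_errors`, `parse_hierarchy`, `slots_at_or_below`) are hypothetical. It assumes the leading number in each PCI-HIERARCHY.XML row is the device's nesting depth in the PCIe tree, so a bridge's subtree is every following row with a greater depth.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Assumed EMS format, modeled on the pcie.stealth.errors lines quoted above.
EMS_RE = re.compile(
    r"\[(?P<node>[^:\]]+): [^\]]*pcie\.stealth\.errors:\w+\]"
    r".*?Br\[(?P<dev_id>[0-9a-fA-F]+)\]\((?P<bdf>\d+,\d+,\d+)\)"
    r".*?CorrErr\((?P<err>\w+)\)"
)
# Assumed PCI-HIERARCHY.XML row layout, modeled on the rows quoted above.
BR_RE = re.compile(r"Br\[(?P<dev_id>[0-9a-fA-F]+)\]\((?P<bdf>\d+,\d+,\d+)\)")
SLOT_RE = re.compile(r"in slot (\d+)")


@dataclass
class Bridge:
    depth: int           # assumption: leading column is nesting depth in the tree
    bdf: str             # "bus,device,function"
    slot: Optional[int]  # None for devices on the controller itself


def count_stealth_errors(lines: Iterable[str]) -> Counter:
    """Count pcie.stealth.errors events per (node, bridge BDF, CorrErr type)."""
    counts = Counter()
    for line in lines:
        m = EMS_RE.search(line)
        if m:
            counts[(m["node"], m["bdf"], m["err"])] += 1
    return counts


def parse_hierarchy(rows: Iterable[str]) -> List[Bridge]:
    """Parse 'depth | Br[id](b,d,f): ... | LinkCap(...) | LinkStatus(...) |' rows."""
    bridges = []
    for row in rows:
        cols = [c.strip() for c in row.split("|")]
        if len(cols) < 2 or not cols[0].isdigit():
            continue  # not a hierarchy row
        br = BR_RE.search(cols[1])
        if not br:
            continue
        slot = SLOT_RE.search(cols[1])
        bridges.append(Bridge(int(cols[0]), br["bdf"], int(slot[1]) if slot else None))
    return bridges


def slots_at_or_below(bridges: List[Bridge], bdf: str) -> Set[int]:
    """Return slot numbers of the bridge at `bdf` and of every device nested under it."""
    start = next(i for i, b in enumerate(bridges) if b.bdf == bdf)
    target = bridges[start]
    slots = {target.slot} if target.slot is not None else set()
    for b in bridges[start + 1:]:
        if b.depth <= target.depth:
            break  # left the target bridge's subtree
        if b.slot is not None:
            slots.add(b.slot)
    return slots
```

Fed the three EMS lines and the three hierarchy rows above, `count_stealth_errors` groups all three events under node_1, bridge `48,1,0`, error type `Rcvr` (a correctable receiver error), and `slots_at_or_below(parse_hierarchy(rows), "48,1,0")` returns `{1}`. The card model in that slot is then confirmed with sysconfig -ac, as above.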