StorageGRID node down with PCIe errors and links down
Applies to
- StorageGRID
- SG6000
- E2860
Issue
- Node down
- Unable to communicate with StorageGRID node
- In the support bundle, the fcDump section of STATE-CAPTURE-DATA reports the links on both controllers as Src Down:
Executing fcDump(0,0,0,0,0,0,0,0,0,0) on controller A:
fcAll (Tick 1882944621) ==> 10/05/23-22:48:21
2806-A          Our   Num  ::...Exchange Counts...::  Num  ..Link Up..
Chip  LinkStat  Port  Port ::                      ::  Link  Bad   Bad
                ID    Logi ::Open    Total  Errors::  Down  Char  Frame
2-Src Down---   2     1    ::   0  6304138       0::     1    21      0
3-Src Down---   2     1    ::   0  6207122       4::     1     0      0
Executing fcDump(0,0,0,0,0,0,0,0,0,0) on controller B:
fcAll (Tick 1882915947) ==> 10/05/23-22:48:23
2806-B          Our   Num  ::...Exchange Counts...::  Num  ..Link Up..
Chip  LinkStat  Port  Port ::                      ::  Link  Bad   Bad
                ID    Logi ::Open    Total  Errors::  Down  Char  Frame
2-Src Down---   2     1    ::   0  6109169       1::     1   109      0
3-Src Down---   2     1    ::   0  6090199       5::     1    89      0
- BMC logs report PCIe errors:
10 Oct/5/2023 08:39:31 [Information] [Extended PCIe Error] [OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:21/ SlotNo : 3-2
9 Oct/5/2023 08:39:31 [Critical] [PCIe Error] [Critical Interrupt] Bus Fatal (BusAE/Dev2/Fun0) - Asserted
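The two symptoms above can be checked in saved log text with a short script. The following is a minimal Python sketch, not a NetApp tool. The function names `down_links` and `critical_pcie_events` are hypothetical, and the column meanings (chip, link state, exchange counts, link-down count, bad characters, bad frames) are an assumption inferred from the fcDump header shown above. It assumes the fcDump and BMC excerpts have already been extracted from the support bundle as plain text.

```python
import re

# "Executing fcDump(...) on controller A:" names the controller for the rows that follow.
FC_HEADER = re.compile(r"Executing fcDump\(.*\) on controller (\w+):")

# Data rows such as "2-Src Down--- 2 1 :: 0 6304138 0:: 1 21 0".
# Column names are assumptions read off the fcDump header lines.
FC_ROW = re.compile(
    r"^(?P<chip>\d+-\w+)\s+(?P<state>[A-Za-z]+)-*\s+\d+\s+\d+\s*::\s*"
    r"(?P<open>\d+)\s+(?P<total>\d+)\s+(?P<errors>\d+)\s*::\s*"
    r"(?P<link_down>\d+)\s+(?P<bad_char>\d+)\s+(?P<bad_frame>\d+)\s*$"
)

# BMC event lines: "<id> <date> <time> [Severity] [Category] <detail>".
BMC_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<when>\S+\s+\S+)\s+"
    r"\[(?P<severity>\w+)\]\s+\[(?P<category>[^\]]+)\]\s+(?P<detail>.*)$"
)


def down_links(text):
    """Return (controller, chip, link_down_count) for every fcDump row whose state is Down."""
    controller = None
    found = []
    for line in text.splitlines():
        line = line.strip()
        header = FC_HEADER.search(line)
        if header:
            controller = header.group(1)
            continue
        row = FC_ROW.match(line)
        if row and row.group("state") == "Down":
            found.append((controller, row.group("chip"), int(row.group("link_down"))))
    return found


def critical_pcie_events(text):
    """Return (timestamp, detail) for BMC events that are Critical and PCIe-related."""
    events = []
    for line in text.splitlines():
        row = BMC_ROW.match(line)
        if row and row.group("severity") == "Critical" and "PCIe" in row.group("category"):
            events.append((row.group("when"), row.group("detail")))
    return events
```

Passing the fcDump excerpt to `down_links` and the BMC excerpt to `critical_pcie_events` lists the affected links per controller and the critical PCIe events with their timestamps, which makes it easier to see whether the two symptoms appear together.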