StorageGRID SG1000, SG6000 or SG6100 appliance node went to unknown state with 'Bus Fatal Error'
Applies to
StorageGRID Appliances
- SG100
- SG1000
- SG6000
- SG6100
Issue
- Alert triggered: Unable to communicate with node
- Node went into maintenance mode and cannot be brought out of maintenance mode
- In the IPMI event log, 'Bus Fatal Error' is recorded when the node reboots:
67 | 03/31/2023 | 04:57:13 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
68 | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
69 | 03/31/2023 | 04:57:14 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
6a | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
6b | 03/31/2023 | 04:57:14 | Critical Interrupt #0x90 | Software NMI | Asserted
6c | 03/31/2023 | 05:52:49 | OEM record c3 | 000000 | 06ff0afc6882
[Critical][Critical INT][Critical Interrupt] Software NMI - Asserted
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:52/ SlotNo : 3-2
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:21/ SlotNo : 3-2
[Critical][PCIe Error][Critical Interrupt] Bus Fatal (BusAE/Dev2/Fun0) - Asserted
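
To check a saved copy of the event log for this signature, the entries above can be scanned with a short script. The following is a minimal sketch, assuming the log has been exported as plain text in the same pipe-delimited and bracketed formats shown above. The function names `find_bus_fatal_errors` and `find_pcie_slots` are hypothetical helpers written for illustration and are not part of any StorageGRID or IPMI tool.

```python
import re

# One pipe-delimited event record, as shown in the log excerpt above:
# id | date | time | sensor | event | state
SEL_RECORD = re.compile(
    r"^\s*(?P<id>[0-9a-fA-F]+)\s*\|\s*(?P<date>\d{2}/\d{2}/\d{4})\s*\|\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*\|\s*(?P<sensor>[^|]+?)\s*\|\s*"
    r"(?P<event>[^|]+?)\s*\|\s*(?P<state>[^|]+?)\s*$"
)

# Decoded extended PCIe error line, as shown in the bracketed entries above.
PCIE_RECORD = re.compile(
    r"VID:(?P<vid>[0-9A-Fa-f]{4})/\s*DID:(?P<did>[0-9A-Fa-f]{4})/"
    r".*?SlotNo\s*:\s*(?P<slot>[\d-]+)"
)


def find_bus_fatal_errors(log_text):
    """Return the pipe-delimited records whose event field is 'Bus Fatal Error'."""
    hits = []
    for line in log_text.splitlines():
        match = SEL_RECORD.match(line)
        if match and match.group("event") == "Bus Fatal Error":
            hits.append(match.groupdict())
    return hits


def find_pcie_slots(log_text):
    """Return the distinct (VID, DID, slot) tuples from extended PCIe error lines."""
    found = []
    for line in log_text.splitlines():
        match = PCIE_RECORD.search(line)
        if match:
            entry = (match.group("vid"), match.group("did"), match.group("slot"))
            if entry not in found:
                found.append(entry)
    return found


# Sample input copied from the log excerpt in this article.
sample_log = """\
67 | 03/31/2023 | 04:57:13 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
68 | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
69 | 03/31/2023 | 04:57:14 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
6b | 03/31/2023 | 04:57:14 | Critical Interrupt #0x90 | Software NMI | Asserted
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:52/ SlotNo : 3-2
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:21/ SlotNo : 3-2
"""

if __name__ == "__main__":
    for record in find_bus_fatal_errors(sample_log):
        print(record["id"], record["date"], record["time"], record["event"])
    for vid, did, slot in find_pcie_slots(sample_log):
        print("PCIe device", vid + ":" + did, "slot", slot)
```

The script only filters and groups text that is already in the log; it does not interpret the OEM record payloads or the ErrorID values.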