StorageGRID SG1000 or SG6000 appliance node goes into an unknown state with 'Bus Fatal Error'
Applies to
StorageGRID SG100/SG1000/SG6000 Appliance
Issue
- Alert triggered: 'Unable to communicate with node'.
- The node went into maintenance mode and could not be brought out of maintenance mode.
- The IPMI event log shows a 'Bus Fatal Error' when the node reboots:
67 | 03/31/2023 | 04:57:13 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
68 | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
69 | 03/31/2023 | 04:57:14 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
6a | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
6b | 03/31/2023 | 04:57:14 | Critical Interrupt #0x90 | Software NMI | Asserted
6c | 03/31/2023 | 05:52:49 | OEM record c3 | 000000 | 06ff0afc6882
[Critical][Critical INT][Critical Interrupt] Software NMI - Asserted
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:52/ SlotNo : 3-2
[Information][Extended PCIe Error][OEM Record C0] ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:21/ SlotNo : 3-2
[Critical][PCIe Error][Critical Interrupt] Bus Fatal (BusAE/Dev2/Fun0) - Asserted
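To check whether other appliance nodes show the same signature, the event log can be scanned programmatically. The following is a minimal Python sketch. It assumes the SEL has been exported in the pipe-delimited format shown above (for example with `ipmitool sel elist`). The helper names `parse_sel_line`, `find_bus_fatal_errors` and `parse_pcie_record` are hypothetical and are not part of StorageGRID or ipmitool. The sample data is copied from the log excerpt in this article.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Sample SEL lines copied from the event log in this article.
# Assumption: entries are pipe-delimited, as in `ipmitool sel elist` output.
SAMPLE_SEL = """\
67 | 03/31/2023 | 04:57:13 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
68 | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
69 | 03/31/2023 | 04:57:14 | Critical Interrupt #0xa1 | Bus Fatal Error | Asserted
6a | 03/31/2023 | 04:57:14 | OEM record c0 | 000315 | b31517101128
6b | 03/31/2023 | 04:57:14 | Critical Interrupt #0x90 | Software NMI | Asserted
6c | 03/31/2023 | 05:52:49 | OEM record c3 | 000000 | 06ff0afc6882
"""

SAMPLE_PCIE = ("[Information][Extended PCIe Error][OEM Record C0] "
               "ManufacturerID:001C4C/ VID:8086/ DID:2032/ ErrorID 1:52/ SlotNo : 3-2")


@dataclass
class SelEvent:
    record_id: str
    date: str
    time: str
    sensor: str
    description: str
    state: str


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Split one pipe-delimited SEL line into its six fields, or return None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None
    return SelEvent(*fields)


def find_bus_fatal_errors(lines: Iterable[str]) -> List[SelEvent]:
    """Return every SEL event whose description is an asserted 'Bus Fatal Error'."""
    hits = []
    for line in lines:
        event = parse_sel_line(line)
        if event is not None and event.description == "Bus Fatal Error" and event.state == "Asserted":
            hits.append(event)
    return hits


# Hypothetical pattern for the decoded "Extended PCIe Error" records shown above.
PCIE_RE = re.compile(r"VID:(\w+)/\s*DID:(\w+)/.*?SlotNo\s*:\s*(\S+)")


def parse_pcie_record(text: str) -> Optional[dict]:
    """Extract the vendor ID, device ID and slot number from a decoded PCIe error record."""
    match = PCIE_RE.search(text)
    if match is None:
        return None
    vid, did, slot = match.groups()
    return {"vid": vid, "did": did, "slot": slot}


if __name__ == "__main__":
    for ev in find_bus_fatal_errors(SAMPLE_SEL.splitlines()):
        print(f"{ev.record_id}: {ev.date} {ev.time} {ev.sensor}")
    print(parse_pcie_record(SAMPLE_PCIE))
```

Against the sample above, the sketch reports records `67` and `69` (both on `Critical Interrupt #0xa1`) and extracts VID `8086`, DID `2032` and slot `3-2` from the decoded PCIe record. The line format is an assumption; adjust the parsing if your BMC exports the log differently.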