Compute node stuck in unresponsive state in vCenter after multiple ECC errors
Applies to
NetApp H700E
Issue
- Node boots up
- Multiple ECC errors are shown in the System Event Log (SEL) (the x and y values depend on the DIMM location):
ID,Critical,<DATE TIME>,BIOS OEM(Memory Error),Failing DIMM: DIMM location (Correctable memory component found) (Px-DIMMy) - Assertion
ID,Critical,<DATE TIME>,BIOS OEM(Memory Error),(runtime) Failing DIMM: DIMM location. (Px-DIMMy) - Assertion
Id,Warning,<DATE TIME>,Processor(OEM),Configuration Error - Assertion
- Host becomes unresponsive in vCenter
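
To see which DIMM the errors point to, the SEL entries above can be tallied per DIMM location. The following is a minimal sketch, assuming the SEL has been exported as comma-separated text in the format shown above. The function name `count_dimm_errors`, the sample timestamps, and the `P1-DIMMA1` location are hypothetical illustrations and are not taken from a real node.

```python
import re
from collections import Counter

# Assumption: the DIMM location appears in parentheses in the form "Px-DIMMy",
# for example "(P1-DIMMA1)". Adjust the pattern if your SEL export differs.
DIMM_RE = re.compile(r"\((P\d+-DIMM\w+)\)")


def count_dimm_errors(lines):
    """Count 'BIOS OEM(Memory Error)' SEL entries per DIMM location."""
    counts = Counter()
    for line in lines:
        # Skip entries that are not memory errors (e.g. Processor(OEM) warnings).
        if "BIOS OEM(Memory Error)" not in line:
            continue
        match = DIMM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample entries that follow the format shown in this article.
sample = [
    "1,Critical,2020/01/01 00:00:00,BIOS OEM(Memory Error),"
    "Failing DIMM: DIMM location (Correctable memory component found) (P1-DIMMA1) - Assertion",
    "2,Critical,2020/01/01 00:05:00,BIOS OEM(Memory Error),"
    "(runtime) Failing DIMM: DIMM location. (P1-DIMMA1) - Assertion",
    "3,Warning,2020/01/01 00:06:00,Processor(OEM),Configuration Error - Assertion",
]

for location, count in count_dimm_errors(sample).items():
    print(f"{location}: {count} memory error(s)")
```

A DIMM location that appears repeatedly in the tally is the likely source of the ECC errors described above.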