NetApp H610S node reboots unexpectedly with machine check error
Applies to
- NetApp H610S
- NetApp Element software
- All currently supported versions of BIOS
Issue
- A node in an Element cluster logs a nodeOffline event for approximately 7 to 15 minutes
- Logs indicate that the node has rebooted unexpectedly
- Entries for Uncorrectable machine check exception or Correctable machine check error are found in the BMC system event log (SEL) around the time of the nodeOffline event
- Examples of BMC SEL events:
SEL Record ID   : 0053
Record Type     : 02
Timestamp       : 11/22/2020 13:18:25
Generator ID    : 0020
EvM Revision    : 04
Sensor Type     : Processor
Sensor Number   : 74
Event Type      : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data      : 0bffff
Description     : Uncorrectable machine check exception
=========================
SEL Record ID   : 0076
Record Type     : 02
Timestamp       : 04/04/2021 11:21:35
Generator ID    : 0001
EvM Revision    : 04
Sensor Type     : Processor
Sensor Number   : a8
Event Type      : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data      : ac032b
Description     : Correctable machine check error
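
When a SEL export is long, searching it for these two descriptions by hand is tedious. The following is a minimal sketch of one way to do it. It is not a NetApp tool. It assumes the SEL has been saved as text with one `Field : value` pair per line and a line of `=` characters between records. The function names (`parse_sel_records`, `find_machine_checks`, `near_event`) and the 30-minute default window are hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches both descriptions named in this article, case-insensitively:
# "Uncorrectable machine check exception" and "Correctable machine check error".
MACHINE_CHECK_PATTERN = re.compile(
    r"(un)?correctable machine check (exception|error)", re.IGNORECASE
)

# Sample input: the two SEL records quoted in this article.
SAMPLE = """\
SEL Record ID   : 0053
Record Type     : 02
Timestamp       : 11/22/2020 13:18:25
Sensor Type     : Processor
Description     : Uncorrectable machine check exception
=========================
SEL Record ID   : 0076
Record Type     : 02
Timestamp       : 04/04/2021 11:21:35
Sensor Type     : Processor
Description     : Correctable machine check error
"""


def parse_sel_records(text):
    """Split SEL text into a list of dicts, one dict per record."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"="}:  # separator line between records
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so "13:18:25" stays in the value.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def find_machine_checks(records):
    """Return the records whose Description is a machine check entry."""
    return [
        rec for rec in records
        if MACHINE_CHECK_PATTERN.search(rec.get("Description", ""))
    ]


def near_event(rec, event_time, window_minutes=30):
    """True if the record's timestamp is within the window of event_time.

    Assumes MM/DD/YYYY HH:MM:SS timestamps, as in the examples above, and
    that event_time is expressed in the same time zone as the BMC clock.
    """
    ts = datetime.strptime(rec["Timestamp"], "%m/%d/%Y %H:%M:%S")
    return abs(ts - event_time) <= timedelta(minutes=window_minutes)


if __name__ == "__main__":
    for rec in find_machine_checks(parse_sel_records(SAMPLE)):
        print(rec["SEL Record ID"], rec["Timestamp"], rec["Description"])
```

To check a specific outage, pass the nodeOffline time to `near_event`, for example `near_event(rec, datetime(2020, 11, 22, 13, 25))`. Adjust the window to suit how closely the BMC and cluster clocks are synchronized.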