IPMI unresponsive and nodeOffline repeatedly after the periodic BMC cold reset
Applies to
- NetApp Element software 11.x, 12.0 and 12.2
- NetApp SolidFire SF-Series product line
Issue
- nodeOffline alert shortly after BMC reset
- Possible errors in NetApp SolidFire Active IQ:
nodeOffline - The SolidFire Application cannot communicate with node ID {ID}.
sensorReadingFailed - IPMI diagnostics are currently unresponsive. Please contact support if this problem persists.
unresponsiveService - A master service is not responding.
- Event in Active IQ:
Beginning BMC cold reset and setting new reset date
Setting BMC cold reset date
- Entry from sf-master.info:
master-1[30228]: [Event] 30325 GlobalPool-0 serviceshared/EventReporter.cpp:582:ReportEvent|Successfully reported event={id=569216 type=PlatformHardwareEvent nodeID=6 serviceID=107 message=[Beginning BMC cold reset and setting new reset date] details={"bmcResetDate":"2021-09-02T12:49:41","bmcResetDurationMinutes":20160} reported=2021-08-19T12:49:41.644056Z published=2021-08-19T12:49:41.644104Z} mNumEventsPublished=21
- core.HangDetect can be generated
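
To check whether a nodeOffline alert lines up with a scheduled BMC cold reset, it can help to pull the reset details out of the sf-master.info entry shown above. The following Python sketch is illustrative only: the function name parse_bmc_reset_event is hypothetical, the field layout is assumed from that single sample line, and reading bmcResetDurationMinutes as the interval between resets is an inference from the sample (the reported time plus 20160 minutes equals bmcResetDate).

```python
import json
import re
from datetime import datetime, timedelta

# Sample entry copied from the sf-master.info excerpt in this article.
LOG_LINE = (
    'master-1[30228]: [Event] 30325 GlobalPool-0 '
    'serviceshared/EventReporter.cpp:582:ReportEvent|'
    'Successfully reported event={id=569216 type=PlatformHardwareEvent '
    'nodeID=6 serviceID=107 '
    'message=[Beginning BMC cold reset and setting new reset date] '
    'details={"bmcResetDate":"2021-09-02T12:49:41",'
    '"bmcResetDurationMinutes":20160} '
    'reported=2021-08-19T12:49:41.644056Z '
    'published=2021-08-19T12:49:41.644104Z} mNumEventsPublished=21'
)


def parse_bmc_reset_event(line):
    """Return BMC cold reset details from a log line, or None if not present.

    Hypothetical helper: the regular expressions assume the layout of the
    sample line above and may need adjusting for other Element versions.
    """
    if "Beginning BMC cold reset" not in line:
        return None
    node = re.search(r"nodeID=(\d+)", line)
    details = re.search(r"details=(\{.*?\})", line)
    reported = re.search(r"reported=(\S+?)Z", line)
    if not (node and details and reported):
        return None

    payload = json.loads(details.group(1))
    reported_at = datetime.strptime(reported.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    # Assumption: the duration is the interval until the next scheduled reset.
    interval = timedelta(minutes=payload["bmcResetDurationMinutes"])
    return {
        "node_id": int(node.group(1)),
        "reported_at": reported_at,
        "interval_days": interval.days,
        "next_reset": datetime.strptime(
            payload["bmcResetDate"], "%Y-%m-%dT%H:%M:%S"
        ),
        "expected_next_reset": (reported_at + interval).replace(microsecond=0),
    }


if __name__ == "__main__":
    info = parse_bmc_reset_event(LOG_LINE)
    print("node", info["node_id"], "reset reported at", info["reported_at"])
    print("interval (days):", info["interval_days"])
    print("next reset scheduled for", info["next_reset"])
```

For the sample entry, the 20160-minute duration works out to 14 days, and the computed next reset time matches the logged bmcResetDate. Comparing reported_at with the timestamps of the nodeOffline, sensorReadingFailed and unresponsiveService faults for the same node ID shows whether they fall shortly after a reset.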