AFF A700s node reports SP HBT STOPPED and Emergency shutdown to recover the BMC
Applies to
- AFF A700s
- BMC (baseboard management controller)
Issue
- After an ONTAP upgrade, the nodes attempt to automatically update the BMC firmware
- The automatic update fails on one or more nodes:
    [cluster-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'BMC update - Pre-update checks passed.'}
    [cluster-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP Firmware network update from 1.89 to 1.91 has been triggered.'}
    [cluster-01: servprocd: sp.servprocd.upd.unexpt.evts:debug]: params: {'reason': 'BMC update - Update failed after timeout.'}
    [cluster-01: servprocd: sp.servprocd.upd.error:error]: SP update error: SP firmware update failure has been detected.
    [cluster-01: servprocd: sp.servprocd.upd.unexpt.evts:debug]: params: {'reason': 'BMC update pre-update checks failed.'}
    [cluster-01: servprocd: sp.servprocd.upd.error:error]: SP update error: SP firmware update failure has been detected.
- This results in AutoSupport notifications for missed and stopped SP heartbeats:
    [cluster-01: env_mgr: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
    [cluster-01: env_mgr: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
- After several days in this condition, the nodes halt and cannot be reached remotely through the BMC:
    [cluster-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
- A console connection to the node shows no BMC logs (system log console, system log console bak, and system log sel are all empty or contain only a single entry)
- Attempting to boot the node from LOADER results in:
    ***************************************************
    This platform is not supported in this release.
    The system will now halt
    ***************************************************
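The log entries in this issue follow a `[node: process: event.name:severity]: message` pattern. The following is a minimal Python sketch, not a NetApp tool, that groups these symptom events by node so you can see how far each node has progressed through the failure sequence. It assumes the EMS log has been exported as plain text with one entry per line. The event names come from this article; the function name `symptoms_by_node` and the stage labels are hypothetical.

```python
import re
from collections import defaultdict

# Matches one EMS-style entry per line, e.g.
# "[cluster-01: env_mgr: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED"
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*(?P<message>.*)"
)

# Event names are taken from the symptom description above;
# the stage labels on the right are hypothetical names for this sketch.
SYMPTOM_EVENTS = {
    "sp.servprocd.upd.error": "bmc_update_failed",
    "callhome.sp.hbt.missed": "sp_heartbeat_missed",
    "callhome.sp.hbt.stopped": "sp_heartbeat_stopped",
    "monitor.shutdown.emergency": "emergency_shutdown",
}


def symptoms_by_node(lines):
    """Return {node: set of symptom stages} found in the given log lines.

    Assumes one EMS entry per line; lines that do not match the
    pattern, or that carry unrelated events, are ignored.
    """
    found = defaultdict(set)
    for line in lines:
        match = EMS_LINE.search(line)
        if not match:
            continue
        stage = SYMPTOM_EVENTS.get(match.group("event"))
        if stage:
            found[match.group("node")].add(stage)
    return dict(found)


if __name__ == "__main__":
    sample = [
        "[cluster-01: servprocd: sp.servprocd.upd.error:error]: SP update error: "
        "SP firmware update failure has been detected.",
        "[cluster-01: env_mgr: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED",
    ]
    print(symptoms_by_node(sample))
```

A node whose set includes `emergency_shutdown` has reached the halt described above. Keying on event names instead of message text keeps the match independent of the firmware versions quoted in each message.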