Node reboots with Environmental Reason Shutdown (Temperature critical) due to SP firmware image reset
Applies to
- FAS
- AFF
Issue
- The node goes down with the following events:
Wed May 25 00:48:24 UTC [pv35p45im-filerm45003: env_mgr: monitor.temp.unreadable:info]: The controller temperature (CPU1 Temp Margin) is not readable.
Wed May 25 00:48:24 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVCCP CPU0 in the controller module is not readable.
Wed May 25 00:51:16 UTC [pv35p45im-filerm45003: spsm_listener: sp.heartbeat.stopped:warning]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: SysFan3 F1
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.chassisFan.stop:alert]: Chassis fan contains at least one stopped fan: SysFan3 F1 (failed)
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.temp.unreadable:info]: The controller temperature (In Flow Temp) is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.chassisTemperature.state.unknown:warning]: Chassis temperature state is unknown: Multiple Temp sensors are unreadable. System will be shutdown in 2 minutes.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PCH Hot in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P5V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P3.3V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P1.8V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P0.9V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVDDQ DDR3 AB in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVTT DDR3 AB in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Multiple chassis fans have failed.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: callhome.fans.failed:EMERGENCY]: Call home for MULTIPLE FAN FAILURE
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: statd: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Temperature critical)
- The node reboots and comes up in the waiting for giveback state.
- During the reboot, the Service Processor status shows as unknown:
Node>sysconfig -a
Service Processor
Status: Unknown
IPMI: unknown
PKT: unknown
- The SP logs show an SP reset event:
sp>events all
SP recovered successfully after a reset from primary FW image.
- The Service Processor rebooted automatically during the node reboot:
sp>sp uptime
02:17:43 up 2:27, load average: 1.29, 1.15, 1.10
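
The event pattern above (a stopped SP heartbeat, many sensors reported as unreadable, then an emergency shutdown) can be checked for in a saved copy of the event log. The following is a minimal sketch only, not a NetApp tool: the function names `summarize` and `matches_signature` are hypothetical, and the regular expression assumes that every line follows the same layout as the events quoted in this article.

```python
import re
from collections import Counter

# Assumed line layout, taken from the events quoted in this article:
# "<timestamp> [<node>: <process>: <event.name>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:]+): (?P<proc>[^:]+): "
    r"(?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)


def summarize(lines):
    """Count how often each event name appears (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            counts[match.group("event")] += 1
    return counts


def matches_signature(counts):
    """Return True if the counts resemble the pattern in this article.

    The three conditions are an illustrative heuristic, not an official
    diagnostic rule.
    """
    unreadable = (
        counts["monitor.temp.unreadable"] + counts["monitor.power.unreadable"]
    )
    return (
        counts["sp.heartbeat.stopped"] > 0
        and unreadable > 0
        and counts["monitor.shutdown.emergency"] > 0
    )
```

Passing the lines of a saved log to `summarize` and the result to `matches_signature` gives a quick first indication; the SP status, SP events, and SP uptime shown above should still be reviewed to confirm that the SP actually reset.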