AFF A700s reboots due to multiple DIMMs being at the warning low threshold
Applies to
- AFF A700s
- BMC FW versions 1.89 and 1.91
Issue
- The node reboots after the DIMMs repeatedly report that their temperature is at the warning low threshold.
Wed Apr 20 19:19:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm G0 Temp is warning low (16 C).
Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm A1 Temp is warning low (16 C).
Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm G1 Temp is warning low (16 C).
Wed Apr 20 19:26:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm A0 Temp is warning low (16 C).
Wed Apr 20 19:27:38 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm B0 Temp is warning low (16 C).
Wed Apr 20 19:45:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm H0 Temp is warning low (16 C).
Wed Apr 20 19:59:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm B1 Temp is warning low (16 C).
- The node then panics for under temperature.
Sun May 08 15:56:17 -0700 [node1: env_mgr: callhome.chassis.undertemp:EMERGENCY]: Call home for CHASSIS UNDER TEMPERATURE SHUTDOWN
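- The panic reported through AutoSupport (ASUP) messages and System Manager is recorded as over temp, even though the EMS event is an under-temperature shutdown.
- When the system sensors on the node are checked, all other sensors report the same temperature range as the DIMMs.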
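To confirm that several DIMMs are reporting the warning low threshold, the EMS messages can be summarized per DIMM. The following is a minimal sketch in Python, not a NetApp tool: the function name summarize_dimm_warnings and the regular expression are assumptions based only on the message format shown above, and the sample lines are copied from this article.

```python
import re
from collections import Counter

# Assumed pattern, derived from the "warning low" EMS lines in this article.
EVENT_RE = re.compile(
    r"\[(?P<node>[^:\]]+): env_mgr: monitor\.chassisTemperature\.cool:alert\]: "
    r"Chassis temperature is too cool: Dimm (?P<dimm>\w+) Temp is warning low "
    r"\((?P<temp>-?\d+) C\)"
)


def summarize_dimm_warnings(lines):
    """Return (event count per DIMM, last reported temperature per DIMM)."""
    counts = Counter()
    temps = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a DIMM warning-low event, skip it
        dimm = match.group("dimm")
        counts[dimm] += 1
        temps[dimm] = int(match.group("temp"))
    return counts, temps


# Sample lines taken from the Issue section above.
sample = [
    "Wed Apr 20 19:19:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: "
    "Chassis temperature is too cool: Dimm G0 Temp is warning low (16 C).",
    "Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: "
    "Chassis temperature is too cool: Dimm A1 Temp is warning low (16 C).",
    "Sun May 08 15:56:17 -0700 [node1: env_mgr: callhome.chassis.undertemp:EMERGENCY]: "
    "Call home for CHASSIS UNDER TEMPERATURE SHUTDOWN",
]

counts, temps = summarize_dimm_warnings(sample)
for dimm in sorted(counts):
    print(f"Dimm {dimm}: {counts[dimm]} event(s), last reading {temps[dimm]} C")
```

If many DIMMs appear in the summary with the same reading, that matches the pattern described in this article, where multiple DIMMs report the warning low threshold before the under-temperature shutdown.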