Multiple fan failure alerts reported on node due to incorrect sensor readings
- Category:
- aff-series
- Specialty:
- hw
- Last Updated:
- 12/23/2024, 11:46:44 AM
Applies to
- ONTAP 9
- AFF/FAS Systems
Issue
- The following errors are reported on only one node of the HA pair:
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-01: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high..
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: FanB2 F1
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: FanB2 F2
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: FanB2 F3
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: FanB2 F4
[Node-01: env_mgr: callhome.c.fan.fru.shut:error]: Call home for MULTIPLE FAN FAILED: System will shut down in 2 minutes
- The fan details may not be populated in the SYSCONFIG-M section of the AutoSupport logs:
!FAN1!031717000424!441-00058!B0!
!FAN2: Error reading FRU EEPROM
!FAN3!032037001370!441-00058!B0!
- The SP sensor values on the node reporting the errors are as follows:
Sensor Name                State         Current   Critical  Warning   Warning   Critical
                                         Reading   Low       Low       High      High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count         MULTI_FAILED
Chassis is Under Temp      invalid       --
Chassis is Over Temp       YES
PSU2 Bad                   invalid       --
PSU1 Bad                   invalid       --
PSU2                       invalid       --
PSU1                       invalid       --
PSU2 ON                    invalid       --
PSU1 ON                    invalid       --
PSU1 INFO                  FAILED
PSU1 INFO                  FAILED
PSU1 FRU                   MULTIFAULT
PSU2 FRU                   MULTIFAULT
Partner Status             failed        --
Module B Expander Temp     init_failed   -- C      --        --        --        --
Module A Expander Temp     init_failed   -- C      --        --        --        --
Midplane 4 Temp            failed        -- C      0 C       5 C       47 C      52 C
Midplane 3 Temp            failed        -- C      0 C       5 C       47 C      52 C
Midplane 2 Temp            failed        -- C      0 C       5 C       47 C      52 C
Midplane 1 Temp            failed        -- C      0 C       5 C       47 C      52 C
Ambient Temp               init_failed   -- C      --        --        --        --
Internal Shelf             failed        --
- The SP/BMC firmware is already on the latest version.
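When reviewing saved log text for this pattern, a small script can help confirm that the unreadable-temperature and fan-fault events come from the same node, and which FRU entries could not be read. The following is a minimal sketch, not a NetApp tool: it assumes the EMS lines and SYSCONFIG-M FRU lines have been copied into plain-text lists exactly in the format shown above, and the function names (summarize_ems, unreadable_frus, matches_symptom) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the EMS samples above:
#   [node: source: event.name:severity]: message
EMS_RE = re.compile(
    r"^\[(?P<node>[^:\]]+):\s*(?P<source>[^:]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)

# Assumed FRU error format, taken from the SYSCONFIG-M sample above:
#   !FAN2: Error reading FRU EEPROM
FRU_ERR_RE = re.compile(r"^!(?P<fru>[^!:]+):\s*Error reading FRU EEPROM")


def summarize_ems(lines):
    """Count EMS events per (node, event name); non-matching lines are skipped."""
    counts = defaultdict(int)
    for line in lines:
        m = EMS_RE.match(line.strip())
        if m:
            counts[(m.group("node"), m.group("event"))] += 1
    return dict(counts)


def unreadable_frus(lines):
    """Return the FRU names whose EEPROM could not be read."""
    found = []
    for line in lines:
        m = FRU_ERR_RE.match(line.strip())
        if m:
            found.append(m.group("fru"))
    return found


def matches_symptom(ems_counts, node):
    """Hypothetical heuristic: the node logged both unreadable temperature
    sensors and chassis fan FRU faults."""
    temp = ems_counts.get((node, "monitor.temp.unreadable"), 0)
    fan = ems_counts.get((node, "callhome.c.fan.fru.fault"), 0)
    return temp > 0 and fan > 0


if __name__ == "__main__":
    ems = [
        "[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.",
        "[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: FanB2 F1",
    ]
    fru = ["!FAN1!031717000424!441-00058!B0!", "!FAN2: Error reading FRU EEPROM"]
    print(matches_symptom(summarize_ems(ems), "Node-01"))
    print(unreadable_frus(fru))
```

The heuristic only flags that the log signature resembles the one described here; it does not distinguish a real fan failure from incorrect sensor readings.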