One node reports multiple fans have failed
Applies to
- FAS2650
- FAS2750
- FAS2720
- AFF-A220
- ONTAP 9
- Service Processor (SP)
- Baseboard Management Controller (BMC)
Issue
- One node in the HA pair reports multiple fan failures in event logs:
[Node-02: dsa_worker2: ses.status.temperatureWarning:alert]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 22 C (71 F). This module is on the rear of the shelf at the top left, on shelf module A.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: dsa_worker1: ses.status.temperatureInfo:info]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature information for Temperature sensor 12: normal status.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module B Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high..
[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The partner node does not trigger any such alerts.
- All power supply LEDs are blinking green, and an amber LED is lit on the front of the node.
- From the node that reports the errors, the PSU and fan sensors are as follows:
Sensor Name    State    Current Reading    Critical Low    Warning Low    Warning High    Critical High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count MULTI_FAILED
Chassis is Under Temp invalid --
Chassis is Over Temp YES
PSU1 INFO FAILED
PSU1 INFO FRU_AVAIL
PSU1 FRU MULTIFAULT
PSU2 FRU MULTIFAULT
Module B Expander Temp failed -- C 0 C 5 C 80 C 90 C
Module A Expander Temp failed -- C 0 C 5 C 80 C 90 C
Midplane 4 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 3 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 2 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 1 Temp failed -- C 0 C 5 C 47 C 52 C
Ambient Temp failed -- C 0 C 5 C 47 C 52 C
Internal Shelf not_available --
CPU0 Temp Margin init_failed -- C -- -- 0 C -1 C
PSU1 Present PRESENT
PSU1 5V not_available -- mV -- -- -- --
PSU1 12V not_available -- mV -- -- -- --
PSU1 5V Curr not_available -- mA -- -- -- --
PSU1 12V Curr not_available -- mA -- -- -- --
PSU1 Fan 1 not_available -- RPM -- -- -- --
PSU1 Fan 2 not_available -- RPM -- -- -- --
PSU1 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU1 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU2 Present PRESENT
PSU2 5V not_available -- mV -- -- -- --
PSU2 12V not_available -- mV -- -- -- --
PSU2 5V Curr not_available -- mA -- -- -- --
PSU2 12V Curr not_available -- mA -- -- -- --
PSU2 Fan 1 not_available -- RPM -- -- -- --
PSU2 Fan 2 not_available -- RPM -- -- -- --
PSU2 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU2 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU_FAN not_available --
Module B Expander Temp failed -- C 0 C 5 C 80 C 90 C
Module A Expander Temp failed -- C 0 C 5 C 80 C 90 C
Midplane 4 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 3 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 2 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 1 Temp failed -- C 0 C 5 C 47 C 52 C
Ambient Temp failed -- C 0 C 5 C 47 C 50 C
Internal Shelf not_available
- The affected node degrades from Multi-Path HA to Single-Path HA, and the internal module on the affected side shows its firmware revision as ----:
Shelf 0: DS212-12 Firmware rev. IOM12E A: 0230 B: ----
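
To confirm that only one node is logging the fan and temperature events, the event log lines can be tallied per node. The following is a minimal Python sketch, not a NetApp tool. It assumes the log lines have the shape shown in the samples above, `[<node>: <process>: <event>:<severity>]: <message>`, and the names `parse_ems_line` and `count_events_by_node` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the sample log lines in this article:
#   [<node>: <process>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*(?P<message>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events_by_node(lines):
    """Count how often each (node, event) pair appears in the given lines."""
    counts = Counter()
    for line in lines:
        record = parse_ems_line(line)
        if record is not None:
            counts[(record["node"], record["event"])] += 1
    return counts


# Sample lines copied from the Issue section above.
sample = [
    "[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating",
    "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module B Expander Temp) is not readable.",
    "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.",
    "[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed",
]

counts = count_events_by_node(sample)
for (node, event), total in sorted(counts.items()):
    print(node, event, total)
```

If the partner node is healthy, it should not appear in the tally for the `monitor.fan.*`, `monitor.temp.*`, or `callhome.c.fan.*` events.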