One node reports multiple fans have failed
Applies to
- FAS2650
- FAS2620
- FAS2750
- FAS2720
- AFF A220
- AFF A200
- ONTAP 9
- Service Processor (SP)
- Baseboard Management Controller (BMC)
Issue
- One node in the HA pair reports multiple fan failures in the event logs:
[Node-02: dsa_worker2: ses.status.temperatureWarning:alert]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 22 C (71 F). This module is on the rear of the shelf at the top left, on shelf module A.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: dsa_worker1: ses.status.temperatureInfo:info]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature information for Temperature sensor 12: normal status.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high.. 
[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module B Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high.. 
[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The following event may also be logged:
[Node-02: env_mgr:  power_low_monitor: callhome.chassis.power:error]: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1, PSU2.
- The partner node does not report any of these alerts.
- The LEDs on all power supplies blink green, and the amber LED on the front of the node is lit.
- From the node that reports the errors, the PSU and fan sensors are as follows:
Sensor Name              State          Current    Critical     Warning     Warning    Critical
                                        Reading       Low         Low         High       High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count                      MULTI_FAILED
Chassis is Under Temp    invalid            --
Chassis is Over Temp                       YES
PSU1 INFO                               FAILED
PSU1 INFO                               FRU_AVAIL
PSU1 FRU                                MULTIFAULT
PSU2 FRU                                MULTIFAULT
Module B Expander Temp   failed            -- C         0 C         5 C        80 C        90 C     
Module A Expander Temp   failed            -- C         0 C         5 C        80 C        90 C     
Midplane 4 Temp          failed            -- C         0 C         5 C        47 C        52 C     
Midplane 3 Temp          failed            -- C         0 C         5 C        47 C        52 C     
Midplane 2 Temp          failed            -- C         0 C         5 C        47 C        52 C     
Midplane 1 Temp          failed            -- C         0 C         5 C        47 C        52 C     
Ambient Temp             failed            -- C         0 C         5 C        47 C        52 C     
Internal Shelf           not_available      --
CPU0 Temp Margin         init_failed       -- C        --          --           0 C        -1 C    
PSU1 Present                            PRESENT
PSU1 5V                  not_available     -- mV       --          --          --          --
PSU1 12V                 not_available     -- mV       --          --          --          --
PSU1 5V Curr             not_available     -- mA       --          --          --          --
PSU1 12V Curr            not_available     -- mA       --          --          --          --
PSU1 Fan 1               not_available     -- RPM      --          --          --          --
PSU1 Fan 2               not_available     -- RPM      --          --          --          --
PSU1 Inlet Temp          not_available     -- C         0 C         5 C        57 C        62 C
PSU1 Hotspot Temp        not_available     -- C         0 C         5 C        90 C       100 C
PSU2 Present                            PRESENT
PSU2 5V                  not_available     -- mV       --          --          --          --
PSU2 12V                 not_available     -- mV       --          --          --          --
PSU2 5V Curr             not_available     -- mA       --          --          --          --
PSU2 12V Curr            not_available     -- mA       --          --          --          --
PSU2 Fan 1               not_available     -- RPM      --          --          --          --
PSU2 Fan 2               not_available     -- RPM      --          --          --          --
PSU2 Inlet Temp          not_available     -- C         0 C         5 C        57 C        62 C
PSU2 Hotspot Temp        not_available     -- C         0 C         5 C        90 C       100 C
PSU_FAN                  not_available      --
Module B Expander Temp   failed            -- C         0 C         5 C        80 C        90 C
Module A Expander Temp   failed            -- C         0 C         5 C        80 C        90 C
Midplane 4 Temp          failed            -- C         0 C         5 C        47 C        52 C
Midplane 3 Temp          failed            -- C         0 C         5 C        47 C        52 C
Midplane 2 Temp          failed            -- C         0 C         5 C        47 C        52 C
Midplane 1 Temp          failed            -- C         0 C         5 C        47 C        52 C
Ambient Temp             failed            -- C         0 C         5 C        47 C        50 C
Internal Shelf           not_available
- The affected node degrades from Multi-Path HA to Single-Path HA, and the internal shelf module on the affected side reports its firmware revision as ----.
  Shelf 0: DS212-12  Firmware rev. IOM12E A: 0230   B: ----  
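
When comparing the two nodes of the HA pair, it can help to tally the logged events per node and confirm that only one node reports the fan and temperature events listed above. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, and the regular expression is an assumption about the line layout, inferred only from the log excerpts in this article.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this article:
#   [<node>: <process>[: <subprocess>]: <event.name>:<severity>]: <message>
# The event name and severity are the last two fields before the closing "]".
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):.*?(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)


def summarize_events(lines):
    """Return {node: Counter({event_name: count})} for every line that parses."""
    summary = {}
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        summary.setdefault(match["node"], Counter())[match["event"]] += 1
    return summary


def nodes_with_multi_fan_failure(summary):
    """Return the sorted names of nodes that logged callhome.c.fan.fru.fault."""
    return sorted(
        node
        for node, events in summary.items()
        if events["callhome.c.fan.fru.fault"] > 0
    )


if __name__ == "__main__":
    # Lines copied from the event log excerpt in this article.
    sample = [
        "[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.",
        "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.",
        "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.",
        "[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed",
        "[Node-02: env_mgr:  power_low_monitor: callhome.chassis.power:error]: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1, PSU2.",
    ]
    summary = summarize_events(sample)
    for node, events in summary.items():
        print(node, dict(events))
    print("Nodes reporting multiple fan failure:", nodes_with_multi_fan_failure(summary))
```

If the log from both nodes is passed in together, a result that lists only one node matches the symptom described here, where the partner node logs none of these events.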
