Multiple fan failure alert with system making noise
Applies to
- AFF and FAS systems
- ONTAP 9
- Disk Shelves
Issue
- The following alerts are reported in the event logs frequently from both nodes:
[Node-01: statd: monitor.shelf.fault:debug]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[Node-01: statd: monitor.fan.failed:debug]: Multiple fans has failed.
[Node-01: env_mgr: monitor.fan.warning:debug]: multiple fans have failed. Replace it to avoid overheating
[Node-01: env_mgr: callhome.c.fan.fru.fault:debug]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The output of storage show fault reveals that one of the power supplies is not being detected:
::> system node run -node * -command storage show fault
Enclosure Status: unrecoverable
Channel: 0a
Shelf: 0
Shelf Type: DS224-12
Product Serial Number: 952240001855
Module Type: IOM12E
Power Supplies:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,00,00,20   RQSTED ON
  2: NOT INSTALLED     05,00,00,20   
Fans:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,02,EC,26   
  2: OK                01,02,EC,26   
  3: NOT INSTALLED     05,00,00,20   
  4: NOT INSTALLED     05,00,00,20   
Input Power Monitor:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,00,29,07   
  2: NOT INSTALLED     05,00,00,00   
Power Crest Factor:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,00,29,07   
  2: NOT INSTALLED     05,00,00,00
- The SP sensors still cannot report readings for PSU2, even after the PSU is replaced:
Sensor Name              State          Current    Critical     Warning     Warning    Critical
                                        Reading       Low         Low         High       High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count                      MULTI_FAILED
Chassis is Under Temp                       NO
Chassis is Over Temp                        NO
PSU2 Bad                 invalid            --
PSU1 Bad                                 FALSE
PSU2                     invalid            --
PSU1                                      GOOD
PSU2 ON                                     ON
PSU1 ON                                     ON
PSU1 INFO                               FRU_AVAIL
PSU1 INFO                               FRU_AVAIL
PSU1 FRU                                  GOOD
PSU2 FRU                                MULTIFAULT
Partner Status                          A_SIDE_PRESENT
PSU1 Present                            PRESENT  
PSU2 Present             not_available      --
PSU2 5V                  not_available     -- mV       --          --          --          --       
PSU2 12V                 not_available     -- mV       --          --          --          --       
PSU2 5V Curr             not_available     -- mA       --          --          --          --       
PSU2 12V Curr            not_available     -- mA       --          --          --          --       
PSU2 Fan 1               not_available     -- RPM      --          --          --          --       
PSU2 Fan 2               not_available     -- RPM      --          --          --          --       
PSU2 Inlet Temp          not_available     -- C         0 C         5 C        57 C        62 C     
PSU2 Hotspot Temp        not_available     -- C         0 C         5 C        90 C       100 C     
PSU_FAN                                 FAIL_2
- Because one PSU fan is not detected, the other PSU fans spin faster, which makes the system noisy.
- The SP/BMC is already on the latest firmware version.
- Reboot of the SP/BMC does not stop the alerts.
- The e0M port is not subject to the high traffic described in the KB article: CHASSIS FAN FRU FAILED: Multiple fans have failed even after upgrading SP/BMC
- The issue persists after a takeover/giveback of each node, performed one node at a time.
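
When comparing storage show fault output across nodes or over time, it can help to list only the elements that do not report OK. The following is a minimal sketch, not a NetApp tool: it assumes the output has been saved as plain text in the layout shown above, and the function name find_non_ok_elements and the SAMPLE text are hypothetical, introduced only for illustration.

```python
import re

# Section headers such as "Power Supplies:" or "Fans:" have nothing after the colon.
# Lines like "Channel: 0a" carry a value after the colon and are therefore skipped.
SECTION_RE = re.compile(r"^([A-Za-z][A-Za-z ]*):\s*$")

# Element rows look like "  2: NOT INSTALLED     05,00,00,20".
# Assumption: the status bytes are always four comma-separated hex pairs.
ELEMENT_RE = re.compile(
    r"^\s*(\d+):\s+(.+?)\s+((?:[0-9A-Fa-f]{2},){3}[0-9A-Fa-f]{2})"
)


def find_non_ok_elements(output):
    """Return {section: [(element_number, status), ...]} for elements not reporting OK."""
    faults = {}
    section = None
    for line in output.splitlines():
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1).strip()
            continue
        element = ELEMENT_RE.match(line)
        if element and section is not None:
            number = int(element.group(1))
            status = element.group(2).strip()
            if status != "OK":
                faults.setdefault(section, []).append((number, status))
    return faults


# Excerpt of the output shown in this article, used as sample input.
SAMPLE = """\
Enclosure Status: unrecoverable
Channel: 0a
Power Supplies:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,00,00,20   RQSTED ON
  2: NOT INSTALLED     05,00,00,20
Fans:
Element Status         Status Bytes  Status Descriptions
  1: OK                01,02,EC,26
  2: OK                01,02,EC,26
  3: NOT INSTALLED     05,00,00,20
  4: NOT INSTALLED     05,00,00,20
"""

if __name__ == "__main__":
    for section, elements in find_non_ok_elements(SAMPLE).items():
        for number, status in elements:
            print(f"{section} element {number}: {status}")
```

For the excerpt above, the sketch reports power supply element 2 and fan elements 3 and 4 as NOT INSTALLED, which matches the symptom described in this article.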
