Multiple fan failure alert with system making noise
Applies to
- AFF and FAS systems
- ONTAP 9
- Disk Shelves
Issue
- The following alerts are reported in the event logs frequently from both nodes:
[Node-01: statd: monitor.shelf.fault:debug]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[Node-01: statd: monitor.fan.failed:debug]: Multiple fans has failed.
[Node-01: env_mgr: monitor.fan.warning:debug]: multiple fans have failed. Replace it to avoid overheating
[Node-01: env_mgr: callhome.c.fan.fru.fault:debug]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The output of storage show fault reveals that one of the power supplies is not being detected:
::> system node run -node * -command storage show fault
Enclosure Status: unrecoverable
Channel: 0a
Shelf: 0
Shelf Type: DS224-12
Product Serial Number: 952240001855
Module Type: IOM12E
Power Supplies:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,20 RQSTED ON
2: NOT INSTALLED 05,00,00,20
Fans:
Element Status Status Bytes Status Descriptions
1: OK 01,02,EC,26
2: OK 01,02,EC,26
3: NOT INSTALLED 05,00,00,20
4: NOT INSTALLED 05,00,00,20
Input Power Monitor:
Element Status Status Bytes Status Descriptions
1: OK 01,00,29,07
2: NOT INSTALLED 05,00,00,00
Power Crest Factor:
Element Status Status Bytes Status Descriptions
1: OK 01,00,29,07
2: NOT INSTALLED 05,00,00,00
- The SP sensors still cannot report the PSU2 readings, even after the PSU is replaced:
Sensor Name State Current Reading Critical Low Warning Low Warning High Critical High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count MULTI_FAILED
Chassis is Under Temp NO
Chassis is Over Temp NO
PSU2 Bad invalid --
PSU1 Bad FALSE
PSU2 invalid --
PSU1 GOOD
PSU2 ON ON
PSU1 ON ON
PSU1 INFO FRU_AVAIL
PSU1 INFO FRU_AVAIL
PSU1 FRU GOOD
PSU2 FRU MULTIFAULT
Partner Status A_SIDE_PRESENT
PSU1 Present PRESENT
PSU2 Present not_available --
PSU2 5V not_available -- mV -- -- -- --
PSU2 12V not_available -- mV -- -- -- --
PSU2 5V Curr not_available -- mA -- -- -- --
PSU2 12V Curr not_available -- mA -- -- -- --
PSU2 Fan 1 not_available -- RPM -- -- -- --
PSU2 Fan 2 not_available -- RPM -- -- -- --
PSU2 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU2 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU_FAN FAIL_2
- Because one PSU fan is not detected, the remaining PSU fans spin faster, which produces the noise.
- The SP/BMC is already on the latest firmware version.
- Rebooting the SP/BMC does not stop the alerts.
- The e0M port is not subject to high traffic, the condition described in the KB article "CHASSIS FAN FRU FAILED: Multiple fans have failed even after upgrading SP/BMC".
- The issue persists after a takeover/giveback of each node in turn.
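When the storage show fault output is long, it can help to filter it down to the elements that are not reporting OK. The following is a minimal Python sketch, not a NetApp tool: the function name find_unhealthy_elements and the SAMPLE text are hypothetical, and the parser assumes the whitespace-separated layout shown in the output above (a section heading ending in a colon, followed by rows of the form "element: status status-bytes description").

```python
import re

# Matches element rows such as "2: NOT INSTALLED 05,00,00,20" or
# "1: OK 01,00,00,20 RQSTED ON". The layout is an assumption based on
# the output pasted in this article.
ELEMENT_ROW = re.compile(
    r"^(?P<element>\d+):\s+(?P<status>.+?)\s+"
    r"(?P<bytes>[0-9A-Fa-f]{2}(?:,[0-9A-Fa-f]{2}){3})"
    r"(?:\s+(?P<desc>.*))?$"
)


def find_unhealthy_elements(output):
    """Return (section, element, status) for every row whose status is not OK."""
    section = None
    findings = []
    for line in output.splitlines():
        stripped = line.strip()
        match = ELEMENT_ROW.match(stripped)
        if match:
            if match.group("status") != "OK":
                findings.append(
                    (section, int(match.group("element")), match.group("status"))
                )
        elif stripped.endswith(":"):
            # Section headings such as "Power Supplies:" or "Fans:".
            section = stripped[:-1]
    return findings


# Hypothetical sample, trimmed from the output shown in this article.
SAMPLE = """\
Power Supplies:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,20 RQSTED ON
2: NOT INSTALLED 05,00,00,20
Fans:
Element Status Status Bytes Status Descriptions
1: OK 01,02,EC,26
2: OK 01,02,EC,26
3: NOT INSTALLED 05,00,00,20
4: NOT INSTALLED 05,00,00,20
"""

for section, element, status in find_unhealthy_elements(SAMPLE):
    print(f"{section} element {element}: {status}")
```

With the sample above, the sketch reports Power Supplies element 2 and Fans elements 3 and 4 as NOT INSTALLED, which matches the undetected power supply and its fans in the output.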