NetApp Knowledge Base

Fan failure detected on both nodes even after replacing controller

Category: ontap-9
Specialty: hw

Applies to

FAS2750

Issue

  • The EMS events ses.status.fanError:EMERGENCY and monitor.globalStatus.critical:EMERGENCY are recorded on both nodes, reporting that multiple fans have failed:

[?]  Sun Jul 13 04:44:28 +0900 [node-A: dsa_worker3: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 2: critical status. This module is on the rear of the shelf on the lower left power supply.

[?]  Sun Jul 13 04:44:43 +0900 [node-A: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating

[?]  Sun Jul 13 04:44:46 +0900 [node-A: dsa_worker1: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 3: critical status. This module is on the rear of the shelf on the lower right power supply.

[?]  Sun Jul 13 04:45:00 +0900 [node-A: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Disk shelf fault.

[?]  Sun Jul 13 04:45:13 +0900 [node-A: dsa_worker3: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 4: critical status. This module is on the rear of the shelf on the lower right power supply.

[?]  Sun Jul 13 04:45:43 +0900 [node-A: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed

 

[?]  Sun Jul 13 04:44:28 +0900 [node-B: dsa_worker2: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 2: critical status. This module is on the rear of the shelf on the lower left power supply.

[?]  Sun Jul 13 04:44:33 +0900 [node-B: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating

[?]  Sun Jul 13 04:44:46 +0900 [node-B: dsa_worker0: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 3: critical status. This module is on the rear of the shelf on the lower right power supply.

[?]  Sun Jul 13 04:45:00 +0900 [node-B: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Disk shelf fault.

[?]  Sun Jul 13 04:45:13 +0900 [node-B: dsa_worker2: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 4: critical status. This module is on the rear of the shelf on the lower right power supply.

[?]  Sun Jul 13 04:45:33 +0900 [node-B: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
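To compare which cooling elements each node reports, the log lines above can be grouped per node. The following is a minimal Python sketch, not part of the original article. It assumes the EMS line layout shown in the excerpts; the function name `failed_elements_by_node`, the regular expressions, and the shortened sample lines are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpts above:
# "[?]  <timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<process>[^:]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)"
)
COOLING_ELEMENT = re.compile(r"Cooling element (\d+)")


def failed_elements_by_node(lines):
    """Map each node name to the set of cooling elements it reported in error."""
    result = defaultdict(set)
    for line in lines:
        match = EMS_LINE.search(line)
        # Only ses.status.fanError events name a specific cooling element.
        if not match or match.group("event") != "ses.status.fanError":
            continue
        element = COOLING_ELEMENT.search(match.group("message"))
        if element:
            result[match.group("node")].add(int(element.group(1)))
    return dict(result)


# Shortened copies of log lines from the excerpts above.
SAMPLE_LINES = [
    "[?]  Sun Jul 13 04:44:28 +0900 [node-A: dsa_worker3: ses.status.fanError:EMERGENCY]: "
    "DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for "
    "Cooling element 2: critical status.",
    "[?]  Sun Jul 13 04:44:43 +0900 [node-A: env_mgr: monitor.fan.warning:notice]: "
    "multiple fans have failed. Replace it to avoid overheating",
    "[?]  Sun Jul 13 04:44:46 +0900 [node-A: dsa_worker1: ses.status.fanError:EMERGENCY]: "
    "DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for "
    "Cooling element 3: critical status.",
    "[?]  Sun Jul 13 04:44:28 +0900 [node-B: dsa_worker2: ses.status.fanError:EMERGENCY]: "
    "DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for "
    "Cooling element 2: critical status.",
]

if __name__ == "__main__":
    for node, elements in sorted(failed_elements_by_node(SAMPLE_LINES).items()):
        print(node, sorted(elements))
```

If both nodes list the same elements at nearly the same timestamps, as in the excerpts, the reports most likely describe the same shared shelf hardware seen through two paths rather than two independent failures.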

  • Critical fan errors are also reported in STORAGE-FAULT and ENVIRONMENT on both nodes:

STORAGE-FAULT:

Enclosure Status: critical

Channel: 0a

Shelf: 0

Shelf Type: DS224-12

Product Serial Number: SHJXXXXXXXXXX32

Module Type: IOM12E

 

Fans:

Element Status         Status Bytes  Status Descriptions

  1: CRITICAL          02,02,EB,A7

  2: CRITICAL          02,02,EB,A7

  3: CRITICAL          02,02,EB,A7

  4: CRITICAL          02,02,EB,A7

ENVIRONMENT:

Channel: 0a
Shelf: 0
SES device path: local access: 0b.00.99
Module type: IOM12E; monitoring is active
Shelf status: critical condition


Cooling Unit installed element list: 1, 2, 3, 4; with error: 1, 2, 3, 4
Cooling Units by element:
   [1] 7470 RPM
   [2] 7470 RPM
   [3] 7470 RPM
   [4] 7470 RPM
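The ENVIRONMENT excerpt above lists every cooling unit as being in error while each one still reports a speed. A minimal Python sketch for pairing the two pieces of information follows. It assumes the exact line wording shown above; the function name `errored_units_with_rpm` is hypothetical and not part of any NetApp tool.

```python
import re

ELEMENT_LIST = re.compile(
    r"Cooling Unit installed element list:\s*([\d,\s]+);\s*with error:\s*([\d,\s]*)"
)
UNIT_SPEED = re.compile(r"\[(\d+)\]\s+(\d+)\s+RPM")


def errored_units_with_rpm(text):
    """Return (element, rpm) pairs for every cooling unit listed 'with error'.

    rpm is None when no speed line was found for that element.
    """
    errored, rpm = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        listed = ELEMENT_LIST.match(line)
        if listed:
            # The second group holds the comma-separated elements in error.
            errored = [int(x) for x in listed.group(2).split(",") if x.strip()]
            continue
        speed = UNIT_SPEED.match(line)
        if speed:
            rpm[int(speed.group(1))] = int(speed.group(2))
    return [(element, rpm.get(element)) for element in errored]


# Copied from the ENVIRONMENT excerpt above.
SAMPLE_ENVIRONMENT = """\
Cooling Unit installed element list: 1, 2, 3, 4; with error: 1, 2, 3, 4
Cooling Units by element:
   [1] 7470 RPM
   [2] 7470 RPM
   [3] 7470 RPM
   [4] 7470 RPM
"""

if __name__ == "__main__":
    for element, speed in errored_units_with_rpm(SAMPLE_ENVIRONMENT):
        print(f"element {element}: in error, reported speed {speed} RPM")
```

The sketch only lines up the reported values; it does not interpret why a unit is marked in error while reporting a speed.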

 

  • PSU fans show normal status in SP-LATEST-IPMI, but SNMP Bad Fan Count indicates MULTI_FAILED and PSU_FAN shows FAIL_4.

======================================

hsamcmd --fault-show-all

===============================

tag  origin  fld                                     fault reason                                    count  time

---- ------- ----                                    -------------                                   ------ -----

1    bmc     /chassis-1                              SAS Expander has set the Chassis LED ON         1      Sat Jul 12 19:43:36 2025

 

SNMP Bad Fan Count                      MULTI_FAILED

 

PSU1 Fan 1               normal          7470 RPM      --          --          --          --

PSU1 Fan 2               normal          7470 RPM      --          --          --          --

PSU1 Inlet Temp          normal            23 C         0 C         5 C        57 C        62 C

PSU1 Hotspot Temp        normal            24 C         0 C         5 C        90 C       100 C

PSU2 Present                            PRESENT

PSU2 5V                  normal          5110 mV       --          --          --          --

PSU2 12V                 normal         12260 mV       --          --          --          --

PSU2 5V Curr             normal          3350 mA       --          --          --          --

PSU2 12V Curr            normal          7770 mA       --          --          --          --

PSU2 Fan 1               normal          7470 RPM      --          --          --          --

PSU2 Fan 2               normal          7470 RPM      --          --          --          --

PSU2 Inlet Temp          normal            29 C         0 C         5 C        57 C        62 C

PSU2 Hotspot Temp        normal            30 C         0 C         5 C        90 C       100 C

PSU_FAN                                 FAIL_4
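The mismatch described above, where individual PSU fan sensors read normal while the aggregate indicators report a failure, can be checked mechanically. The following is a minimal Python sketch under stated assumptions: columns are separated by two or more spaces as in the excerpt, two-column lines are treated as aggregate indicators, and any aggregate value containing "FAIL" counts as a failure. The function names are hypothetical, and the "FAIL" rule is inferred only from the two values visible above.

```python
import re

FAN_SENSOR = re.compile(r"PSU\d+ Fan \d+")


def parse_sensors(text):
    """Split sensor output into per-fan states and two-column aggregate indicators."""
    fans, aggregates = {}, {}
    for raw in text.splitlines():
        # Columns are assumed to be separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", raw.strip())
        if len(fields) == 2:
            aggregates[fields[0]] = fields[1]
        elif len(fields) >= 3 and FAN_SENSOR.fullmatch(fields[0]):
            fans[fields[0]] = fields[1]
    return fans, aggregates


def aggregate_disagrees(fans, aggregates):
    """True when every fan sensor is 'normal' but an aggregate value contains 'FAIL'."""
    all_normal = bool(fans) and all(state == "normal" for state in fans.values())
    flagged = [name for name, state in aggregates.items() if "FAIL" in state]
    return all_normal and bool(flagged)


# Lines copied from the sensor excerpt above.
SAMPLE_SENSORS = """\
SNMP Bad Fan Count                      MULTI_FAILED
PSU1 Fan 1               normal          7470 RPM      --          --          --          --
PSU1 Fan 2               normal          7470 RPM      --          --          --          --
PSU2 Present                            PRESENT
PSU2 Fan 1               normal          7470 RPM      --          --          --          --
PSU2 Fan 2               normal          7470 RPM      --          --          --          --
PSU_FAN                                 FAIL_4
"""

if __name__ == "__main__":
    fans, aggregates = parse_sensors(SAMPLE_SENSORS)
    print("per-fan states:", fans)
    print("aggregate indicators:", aggregates)
    print("aggregate disagrees with sensors:", aggregate_disagrees(fans, aggregates))
```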

  • Actions taken that did not resolve the issue:
    1. Rebooted the BMC on both nodes.
    2. Replaced PSU1 and PSU2.
    3. Replaced Controller A.
    4. Reinserted Controller B.

 

