Fan failure detected on both nodes even after replacing controller
Applies to
FAS2750
Issue
- ses.status.fanError:EMERGENCY and monitor.globalStatus.critical:EMERGENCY ("Multiple fans has failed") are recorded on both nodes:
[?] Sun Jul 13 04:44:28 +0900 [node-A: dsa_worker3: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 2: critical status. This module is on the rear of the shelf on the lower left power supply.
[?] Sun Jul 13 04:44:43 +0900 [node-A: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[?] Sun Jul 13 04:44:46 +0900 [node-A: dsa_worker1: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 3: critical status. This module is on the rear of the shelf on the lower right power supply.
[?] Sun Jul 13 04:45:00 +0900 [node-A: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Disk shelf fault.
[?] Sun Jul 13 04:45:13 +0900 [node-A: dsa_worker3: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0b cooling fan error for Cooling element 4: critical status. This module is on the rear of the shelf on the lower right power supply.
[?] Sun Jul 13 04:45:43 +0900 [node-A: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
[?] Sun Jul 13 04:44:28 +0900 [node-B: dsa_worker2: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 2: critical status. This module is on the rear of the shelf on the lower left power supply.
[?] Sun Jul 13 04:44:33 +0900 [node-B: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[?] Sun Jul 13 04:44:46 +0900 [node-B: dsa_worker0: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 3: critical status. This module is on the rear of the shelf on the lower right power supply.
[?] Sun Jul 13 04:45:00 +0900 [node-B: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Disk shelf fault.
[?] Sun Jul 13 04:45:13 +0900 [node-B: dsa_worker2: ses.status.fanError:EMERGENCY]: DS224-12 (S/N SHJXXXXXXXXXX32) shelf 0 on channel 0a cooling fan error for Cooling element 4: critical status. This module is on the rear of the shelf on the lower right power supply.
[?] Sun Jul 13 04:45:33 +0900 [node-B: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
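When comparing what each node reports, it can help to reduce the EMS messages above to a per-node list of failed cooling elements. The following is a minimal sketch, not a NetApp tool: the function name `failed_elements_by_node` is hypothetical, and the regular expression assumes the message layout seen in this excerpt only.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
# [<node>: <process>: ses.status.fanError:EMERGENCY]: ... shelf <n> on channel <ch>
# cooling fan error for Cooling element <n>
FAN_ERROR = re.compile(
    r"\[(?P<node>[^:\]]+): [^:]+: ses\.status\.fanError:EMERGENCY\]: "
    r".*?shelf (?P<shelf>\d+) on channel (?P<channel>\w+) "
    r"cooling fan error for Cooling element (?P<element>\d+)"
)


def failed_elements_by_node(lines):
    """Group failed cooling elements by (node, channel, shelf)."""
    grouped = defaultdict(set)
    for line in lines:
        match = FAN_ERROR.search(line)
        if match:
            key = (match["node"], match["channel"], int(match["shelf"]))
            grouped[key].add(int(match["element"]))
    # Sort the element numbers so the result is stable and easy to compare.
    return {key: sorted(elements) for key, elements in grouped.items()}
```

If both nodes list the same elements for the same shelf, as they do in the messages above, both controllers are reporting the same shelf-level condition over their respective channels.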
- Critical fan errors are also reported in STORAGE-FAULT and ENVIRONMENT on both nodes:
STORAGE-FAULT:
Enclosure Status: critical
Channel: 0a
Shelf: 0
Shelf Type: DS224-12
Product Serial Number: SHJXXXXXXXXXX32
Module Type: IOM12E
Fans:
Element  Status    Status Bytes  Status Descriptions
1:       CRITICAL  02,02,EB,A7
2:       CRITICAL  02,02,EB,A7
3:       CRITICAL  02,02,EB,A7
4:       CRITICAL  02,02,EB,A7
ENVIRONMENT:
Channel: 0a
Shelf: 0
SES device path: local access: 0b.00.99
Module type: IOM12E; monitoring is active
Shelf status: critical condition
Cooling Unit installed element list: 1, 2, 3, 4; with error: 1, 2, 3, 4
Cooling Units by element:
[1] 7470 RPM
[2] 7470 RPM
[3] 7470 RPM
[4] 7470 RPM
- PSU fans show normal status in SP-LATEST-IPMI, but SNMP Bad Fan Count indicates MULTI_FAILED and PSU_FAN shows FAIL_4:
======================================
hsamcmd --fault-show-all
===============================
tag origin fld fault reason count time
---- ------- ---- ------------- ------ -----
1 bmc /chassis-1 SAS Expander has set the Chassis LED ON 1 Sat Jul 12 19:43:36 2025
SNMP Bad Fan Count MULTI_FAILED
PSU1 Fan 1 normal 7470 RPM -- -- -- --
PSU1 Fan 2 normal 7470 RPM -- -- -- --
PSU1 Inlet Temp normal 23 C 0 C 5 C 57 C 62 C
PSU1 Hotspot Temp normal 24 C 0 C 5 C 90 C 100 C
PSU2 Present PRESENT
PSU2 5V normal 5110 mV -- -- -- --
PSU2 12V normal 12260 mV -- -- -- --
PSU2 5V Curr normal 3350 mA -- -- -- --
PSU2 12V Curr normal 7770 mA -- -- -- --
PSU2 Fan 1 normal 7470 RPM -- -- -- --
PSU2 Fan 2 normal 7470 RPM -- -- -- --
PSU2 Inlet Temp normal 29 C 0 C 5 C 57 C 62 C
PSU2 Hotspot Temp normal 30 C 0 C 5 C 90 C 100 C
PSU_FAN FAIL_4
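The discrepancy in the output above (every PSU fan sensor reads normal, yet the summary fields report a failure) can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function name `fan_status_mismatch` is hypothetical, and the two patterns assume the row layout shown in this excerpt rather than any documented SP output format.

```python
import re

# Assumed row layouts, taken from the excerpt above:
#   "PSU1 Fan 1 normal 7470 RPM -- -- -- --"
#   "SNMP Bad Fan Count MULTI_FAILED" / "PSU_FAN FAIL_4"
SENSOR = re.compile(r"^(PSU\d+ Fan \d+)\s+(\w+)\s+(\d+) RPM")
SUMMARY = re.compile(r"^(SNMP Bad Fan Count|PSU_FAN)\s+(\S+)")


def fan_status_mismatch(lines):
    """Return (mismatch, sensors, summaries) for a sensor listing.

    mismatch is True when every PSU fan sensor reads "normal" while
    at least one summary field contains "FAIL".
    """
    sensors = {}
    summaries = {}
    for raw in lines:
        line = raw.strip()
        sensor = SENSOR.match(line)
        if sensor:
            sensors[sensor[1]] = (sensor[2], int(sensor[3]))
            continue
        summary = SUMMARY.match(line)
        if summary:
            summaries[summary[1]] = summary[2]
    all_normal = bool(sensors) and all(
        status == "normal" for status, _rpm in sensors.values()
    )
    summary_failed = any("FAIL" in value for value in summaries.values())
    return all_normal and summary_failed, sensors, summaries
```

For the listing above, a check like this would flag a mismatch, which is consistent with the observation that the fans are spinning at 7470 RPM while the summary fields still report a failure.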
- Actions taken that did not resolve the issue:
  - Rebooted the BMC on both nodes.
  - Replaced PSU1 and PSU2.
  - Replaced Controller A.
  - Reinserted Controller B.
