ShelfPSUFailure_Alert reported by the Health Monitor
- Category: disk-shelves
- Specialty: hw
- Last Updated: 3/19/2025, 1:03:45 PM
Applies to
- FAS/AFF Systems
- Disk shelves
- Power Supply Unit (PSU)
- Health Monitor process schm: ShelfPSUFailure_Alert
Issue
- Sporadic call-home messages for power issues are generated
- The following alerts are reported in the event logs:
[Node-02: schmd: hm.alert.raised:alert]: Alert Id = ShelfPSUFailure_Alert , Alerting Resource = 16350XXXXXXXX448 raised by monitor system-connect
[Node-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
- The storage show fault command output shows:
Enclosure Status: unrecoverable
Channel: 0b
Shelf: 11
Shelf Type: DS224-12
Product Serial Number: SHFFGXXXXXXXXX
Module Type: IOM12
Power Supplies:
Element  Status    Status Bytes  Status Descriptions
1:       CRITICAL  02,00,00,F3   DC FAIL, AC FAIL, OFF, RQSTED ON, FAIL
2:       OK        01,00,00,20   RQSTED ON
- The shelf logs report faults on the PSU (PCM 1):
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0050; M0; ENC_MGT; power_manager; 02; HAL indicates PSU TURNED OFF fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0053; M0; ENC_MGT; power_manager; 02; HAL indicates PSU AC FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 PCM FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting FAIL NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 TURNED OFF Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0072; M0; ENC_MGT; power_manager; 02; Setting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 AC FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 1 faults indicate loss of local fan power
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; power_manager; 04; PCM 1 local fan power restored
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm
- The issue persists even after reseating the PSU.
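When reviewing a long shelf log, it can help to split entries like those shown above into fields and pick out the ones that mention the affected power supply. The following is a minimal Python sketch, not a NetApp tool. The function name parse_shelf_log and the field names (stamp, code, module, subsystem, component, level, message) are assumptions inferred from the semicolon-separated layout of the sample entries, not documented names. The sketch handles entries that appear one per line as well as entries run together on a single line.

```python
import re

# Each entry starts with a timestamp such as "Tue Mar 5 02:05:34 2024 (".
STAMP_RE = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4} \("
)

# Field names are assumptions based on the sample entries.
FIELDS = ("stamp", "code", "module", "subsystem", "component", "level", "message")


def parse_shelf_log(text):
    """Split raw shelf log text into a list of dicts, one per entry."""
    starts = [m.start() for m in STAMP_RE.finditer(text)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        # The message is the last field, so split on the first six ";" only.
        parts = [p.strip() for p in text[begin:end].split(";", 6)]
        if len(parts) == len(FIELDS):
            entries.append(dict(zip(FIELDS, parts)))
    return entries


SAMPLE = (
    "Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; "
    "power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1 "
    "Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; "
    "power_manager; 04; PCM 1 faults indicate loss of local fan power "
    "Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; "
    "power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm"
)

entries = parse_shelf_log(SAMPLE)

# Keep only the entries whose message names PCM 1.
pcm1_entries = [e for e in entries if "PCM 1" in e["message"]]

for e in pcm1_entries:
    print(e["stamp"], e["code"], e["message"], sep=" | ")
```

The script only reorganizes text that is already in the log. It does not interpret the event codes or level values, whose meaning is not stated in this article.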