ShelfPSUFailure_Alert reported by the Health Monitor
Applies to
- FAS/AFF Systems
- Disk shelves
- Power Supply Unit (PSU)
- Health Monitor process schm: ShelfPSUFailure_Alert
Issue
- Sporadic call homes for shelf power issues are generated.
- The following alerts are reported in the event logs:
[Node-02: schmd: hm.alert.raised:alert]: Alert Id = ShelfPSUFailure_Alert , Alerting Resource = 16350XXXXXXXX448 raised by monitor system-connect
[Node-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
Or
[Node-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[Node-02: statd: callhome.shlf.fault:error]: Call home for SHELF_FAULT
[Node-02: dsa_worker4: ses.status.psError:alert]: DS212-12 (S/N SHFNCXXXXXXXXXX) shelf 0 on channel 0b power error for Power supply 2: critical status; AC Fail. This module is on the rear of the shelf at the bottom right.
[Node-02: dsa_worker4: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
- The output of storage show fault shows:
Enclosure Status: unrecoverable
Channel: 0b
Shelf: 11
Shelf Type: DS224-12
Product Serial Number: SHFFGXXXXXXXXX
Module Type: IOM12
Power Supplies:
Element  Status    Status Bytes  Status Descriptions
1:       CRITICAL  02,00,00,F3   DC FAIL, AC FAIL, OFF, RQSTED ON, FAIL
2:       OK        01,00,00,20   RQSTED ON
- The shelf logs report PSU faults on PCM 1:
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0050; M0; ENC_MGT; power_manager; 02; HAL indicates PSU TURNED OFF fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0053; M0; ENC_MGT; power_manager; 02; HAL indicates PSU AC FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 PCM FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting FAIL NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 TURNED OFF Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0072; M0; ENC_MGT; power_manager; 02; Setting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 AC FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 1 faults indicate loss of local fan power
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; power_manager; 04; PCM 1 local fan power restored
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm
- The issue persists even after the PSU is reseated.
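When many of these events accumulate, it can help to pull the shelf, channel, and power supply number out of the ses.status.psError message programmatically. The following is a minimal Python sketch, assuming the message wording matches the single sample quoted in the event logs above. The regular expression, the PsuEvent type, and parse_ps_error are hypothetical helpers, not part of ONTAP, and other ONTAP releases may phrase the message differently.

```python
import re
from typing import NamedTuple, Optional


class PsuEvent(NamedTuple):
    """Hypothetical record; field names are illustrative, not an ONTAP API."""
    node: str
    shelf_model: str
    serial: str
    shelf_id: int
    channel: str
    psu: int
    detail: str


# Pattern assumed from the one ses.status.psError sample in this article.
PS_ERROR_RE = re.compile(
    r"\[(?P<node>[^:\]]+): \S+: ses\.status\.psError:alert\]: "
    r"(?P<model>\S+) \(S/N (?P<serial>[^)]+)\) shelf (?P<shelf>\d+) "
    r"on channel (?P<channel>\S+) power error for Power supply (?P<psu>\d+): "
    r"(?P<detail>[^.]+)\."
)


def parse_ps_error(line: str) -> Optional[PsuEvent]:
    """Return the shelf and PSU named in a ses.status.psError line, else None."""
    m = PS_ERROR_RE.search(line)
    if m is None:
        return None
    return PsuEvent(
        node=m["node"],
        shelf_model=m["model"],
        serial=m["serial"],
        shelf_id=int(m["shelf"]),
        channel=m["channel"],
        psu=int(m["psu"]),
        detail=m["detail"],
    )


if __name__ == "__main__":
    line = ("[Node-02: dsa_worker4: ses.status.psError:alert]: DS212-12 "
            "(S/N SHFNCXXXXXXXXXX) shelf 0 on channel 0b power error for "
            "Power supply 2: critical status; AC Fail. This module is on the "
            "rear of the shelf at the bottom right.")
    print(parse_ps_error(line))
```

Lines from other EMS messages (for example callhome.shlf.fault) simply return None, so the function can be run over a whole event log export to list which shelf and PSU were reported.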
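The Status Bytes column in the storage show fault output appears to follow the SCSI Enclosure Services (SES) power supply status element layout: the low four bits of byte 0 carry the element status code, and byte 3 carries the fault flags. The Python sketch below decodes those two bytes under that assumption; with it, 02,00,00,F3 and 01,00,00,20 reproduce the CRITICAL and OK descriptions shown for power supplies 1 and 2. Bytes 1 and 2, and bit 7 of byte 3, are deliberately left undecoded, so check the SES specification before relying on them. decode_psu_status is a hypothetical helper.

```python
# Element status codes in the low four bits of byte 0 (per the SES specification).
ELEMENT_STATUS = {
    0x0: "UNSUPPORTED",
    0x1: "OK",
    0x2: "CRITICAL",
    0x3: "NONCRITICAL",
    0x4: "UNRECOVERABLE",
    0x5: "NOT INSTALLED",
    0x6: "UNKNOWN",
    0x7: "NOT AVAILABLE",
    0x8: "NO ACCESS ALLOWED",
}

# Byte 3 flags of a power supply status element, bits 0-6, lowest bit first.
# Bit 7 is intentionally not decoded in this sketch.
BYTE3_FLAGS = [
    (0x01, "DC FAIL"),
    (0x02, "AC FAIL"),
    (0x04, "TEMP WARN"),
    (0x08, "OVERTMP FAIL"),
    (0x10, "OFF"),
    (0x20, "RQSTED ON"),
    (0x40, "FAIL"),
]


def decode_psu_status(status_bytes: str) -> tuple[str, list[str]]:
    """Decode a 'Status Bytes' value such as '02,00,00,F3'.

    Returns the element status name and the byte 3 flags that are set.
    Bytes 1 and 2 are not interpreted.
    """
    raw = [int(b, 16) for b in status_bytes.split(",")]
    if len(raw) != 4:
        raise ValueError(f"expected 4 status bytes, got {len(raw)}")
    code = raw[0] & 0x0F
    status = ELEMENT_STATUS.get(code, f"RESERVED (0x{code:X})")
    flags = [name for mask, name in BYTE3_FLAGS if raw[3] & mask]
    return status, flags


if __name__ == "__main__":
    for element, value in [(1, "02,00,00,F3"), (2, "01,00,00,20")]:
        status, flags = decode_psu_status(value)
        print(f"{element}: {status} {', '.join(flags)}")
```

Listing flags from the lowest bit upward happens to match the order used in the Status Descriptions column, which makes the decoded result easy to compare against the output side by side.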
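Each shelf log entry quoted above has the same layout: a timestamp, an uptime counter in parentheses, an event code, three source fields, a level, and a message. A minimal Python sketch follows, assuming every entry sits on its own line with exactly that layout (inferred only from these samples). It measures how long a PCM stayed in a fault state before a "restored" or "Clearing" message appeared. parse_entries, fault_window, and ShelfLogEntry are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class ShelfLogEntry(NamedTuple):
    when: datetime
    code: str
    level: int
    message: str


# Layout inferred from the sample entries only (assumption):
# "<date> ( <uptime>); <code>; <f1>; <f2>; <f3>; <level>; <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+\([^)]*\);\s*"
    r"(?P<code>[0-9A-Fa-f]+);(?:[^;]*;){3}\s*(?P<level>\d+);\s*(?P<msg>.*)$"
)


def parse_entries(text: str) -> list[ShelfLogEntry]:
    """Parse one entry per line; lines that do not match the layout are skipped."""
    entries = []
    for line in text.splitlines():
        m = LOG_RE.match(line.strip())
        if m:
            entries.append(ShelfLogEntry(
                when=datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"),
                code=m["code"],
                level=int(m["level"]),
                message=m["msg"].strip(),
            ))
    return entries


def fault_window(entries: list[ShelfLogEntry], pcm: int) -> Optional[timedelta]:
    """Time from the first fault naming `pcm` to the next restore/clear message.

    The restore/clear message is not required to name the PCM, because the
    sample "Clearing PSU AC Missing" entry does not.
    """
    pcm_re = re.compile(rf"\bPCM {pcm}\b")
    start = next((e.when for e in entries
                  if pcm_re.search(e.message) and "fault" in e.message.lower()), None)
    if start is None:
        return None
    end = next((e.when for e in entries
                if e.when >= start
                and ("restored" in e.message or e.message.startswith("Clearing"))), None)
    return None if end is None else end - start


if __name__ == "__main__":
    SAMPLE = """\
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 1 faults indicate loss of local fan power
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; power_manager; 04; PCM 1 local fan power restored
"""
    print(fault_window(parse_entries(SAMPLE), 1))  # prints 0:00:05
```

On the entries above this yields a five-second window, matching the 02:05:34 to 02:05:39 span in which PCM 1 lost and regained local fan power. Repeated short windows like this across several log collections are one way to document that the failures are sporadic.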