NetApp Knowledge Base

Both nodes in HA pair reboot due to power loss

Category:
aff-series
Specialty:
hw

Applies to

  • FAS Systems
  • AFF Systems

Issue

  • Both nodes in an HA pair reboot at the same time.
  • EMS logs, repeated on both nodes at the same time, report DC undervoltage and AC Fail on both PSUs. Example:

[node_name: dsa_worker3: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 1: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom left.
[node_name: dsa_worker4: ses.status.psError:alert]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power error for Power supply 1: critical status; AC Fail. This module is on the rear of the shelf at the bottom left.
[node_name: dsa_worker4: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
[node_name: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
[node_name: power_low_monitor: monitor.chassisPower.degraded:alert]: Chassis power is degraded: Power Supply Status Critical: PSU1.
[node_name: power_low_monitor: callhome.chassis.power:error]: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1.
[node_name: monitor: monitor.globalStatus.critical:EMERGENCY]: Power Supply Status Critical: PSU1. Disk shelf fault.
[node_name: dsa_worker2: ses.status.psInfo:info]: DS224-12 (S/N 9872957495809) shelf 0 on channel 0b power supply information for Power supply 1: normal status.
[node_name: dsa_worker0: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
[node_name: dsa_worker2: callhome.shlf.ps.fault:error]: Call home for SHELF POWER SUPPLY WARNING
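When triaging a collected EMS log, it can help to pull out only the shelf PSU fault lines and check whether every PSU is affected, which is the pattern shown above. The following is a minimal sketch, not a NetApp tool: the line layout is inferred from the excerpt, the function names are hypothetical, and the default of two PSUs per shelf is an assumption that may not match every shelf model.

```python
import re

# Layout inferred from the excerpt above (an assumption, not a documented format):
# [node: process: event.name:severity]: message
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)
# Matches the two fault strings that appear in the excerpt.
PSU_FAULT = re.compile(
    r"Power supply (?P<psu>\d+): .*?(?P<fault>DC undervoltage|AC Fail)"
)


def shelf_psu_faults(lines):
    """Return (node, psu_number, fault) for each shelf PSU fault line."""
    faults = []
    for line in lines:
        parsed = EMS_LINE.match(line.strip())
        if not parsed:
            continue  # not an EMS line in the assumed layout
        fault = PSU_FAULT.search(parsed.group("message"))
        if fault:
            faults.append(
                (parsed.group("node"), int(fault.group("psu")), fault.group("fault"))
            )
    return faults


def all_psus_faulted(faults, expected_psus=(1, 2)):
    """True if every expected PSU number appears in at least one fault."""
    return set(expected_psus) <= {psu for _, psu, _ in faults}


# Lines copied from the EMS excerpt above.
SAMPLE_LINES = [
    "[node_name: dsa_worker3: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 1: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom left.",
    "[node_name: dsa_worker4: ses.status.psError:alert]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power error for Power supply 1: critical status; AC Fail. This module is on the rear of the shelf at the bottom left.",
    "[node_name: dsa_worker2: ses.status.psInfo:info]: DS224-12 (S/N 9872957495809) shelf 0 on channel 0b power supply information for Power supply 1: normal status.",
    "[node_name: dsa_worker0: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.",
]

if __name__ == "__main__":
    found = shelf_psu_faults(SAMPLE_LINES)
    for node, psu, fault in found:
        print(f"{node}: PSU {psu}: {fault}")
    print("all PSUs faulted:", all_psus_faulted(found))
```

Running the same check against the log from each node and comparing the results is one way to confirm that both nodes saw the faults, rather than a single controller.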

  • BMC/SP events report power loss (repeated on both nodes at the same time):

Record 2435: Mon Dec 05 22:33:43.000000 2022 [BMC.emergency]: System input power lost
Record 2436: Sun Jan 01 00:00:22.310000 2017 [IPMI.notice]: 05f2 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | BMC Power Reset
Record 2437: Sun Jan 01 00:00:22.330000 2017 [IPMI.notice]: 05f3 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)

OR

Record 1596: Sat Sep 11 08:03:16 2021 [SP.emergency]: System input power lost
Record 1597: Thu Jan  1 00:00:32 1970 [IPMI.notice]: ce01 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset
Record 1598: Thu Jan  1 00:00:32 1970 [IPMI.notice]: cf01 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
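To confirm that both controllers lost input power at the same moment, the "System input power lost" records from each node's BMC/SP event log can be compared by timestamp. The sketch below assumes the two record layouts shown in the excerpts above; the function names and the 60-second tolerance are hypothetical choices for illustration. In both excerpts the records that follow the power loss carry 2017 or 1970 dates, so only the power-loss record itself is used for the comparison.

```python
import re
from datetime import datetime

# Layout inferred from the excerpts above: Record N: <timestamp> [SOURCE.level]: message
SEL_RECORD = re.compile(
    r"^Record (?P<id>\d+): (?P<ts>.+?) \[(?P<source>\w+)\.(?P<level>\w+)\]: (?P<message>.*)$"
)
# The BMC excerpt includes microseconds, the SP excerpt does not.
TIMESTAMP_FORMATS = ("%a %b %d %H:%M:%S.%f %Y", "%a %b %d %H:%M:%S %Y")


def parse_timestamp(text):
    """Try each known timestamp layout; return None if none fits."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def power_loss_times(sel_lines):
    """Return the timestamps of all 'System input power lost' records."""
    times = []
    for line in sel_lines:
        record = SEL_RECORD.match(line.strip())
        if record and record.group("message") == "System input power lost":
            when = parse_timestamp(record.group("ts"))
            if when is not None:
                times.append(when)
    return times


def lost_power_together(node_a_lines, node_b_lines, tolerance_s=60):
    """True if any power-loss record on node A is within tolerance of one on node B."""
    times_b = power_loss_times(node_b_lines)
    return any(
        abs((a - b).total_seconds()) <= tolerance_s
        for a in power_loss_times(node_a_lines)
        for b in times_b
    )
```

A True result from `lost_power_together` points toward a shared cause upstream of the controllers, which fits the symptom of both nodes rebooting together.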

  • The BMC/SP system log reports power issues (repeated on both nodes at the same time). Example:

BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x32 dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x34 dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC hsam[1426]: FRU /chassis-1 LED on
BMC hsam[1426]: FRU /chassis-1/controller-b/cna-3 LED on
BMC hsam[1426]: HSAM OS(bmc):cmd(set) FLD(cna-4):fault(Overcurrent Protection Fault)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5b dir:3) match (15) ALERT
BMC hsam[1426]: FRU /chassis-1 LED on
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC hsam[1426]: FRU /chassis-1/controller-b/cna-4 LED on
BMC hsam[1426]: HSAM OS(bmc):cmd(set) FLD(cna-1):fault(Overcurrent Protection Fault)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5d dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5e dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
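A BMC system log like the one above can be long and repetitive, so a short tally of which sensors raised PEF alerts and which FRUs reported HSAM faults makes it easier to compare the two nodes. This is a rough sketch that assumes the message layouts seen in the excerpt; the function name is hypothetical, and the sensor numbers are reported as-is because the excerpt does not say what each sensor measures.

```python
import re
from collections import Counter

# Patterns inferred from the excerpt above (assumptions, not a documented format).
PEF_ALERT = re.compile(
    r"EventFilter: event on sensor\(#(?P<sensor>0x[0-9a-fA-F]+) dir:\d+\) "
    r"match \(\d+\) ALERT"
)
HSAM_FAULT = re.compile(r"FLD\((?P<fru>[^)]+)\):fault\((?P<fault>[^)]+)\)")


def summarize_bmc_log(lines):
    """Return (alerts per sensor, list of (fru, fault)) for a BMC system log."""
    sensors = Counter()
    faults = []
    for line in lines:
        alert = PEF_ALERT.search(line)
        if alert:
            sensors[alert.group("sensor")] += 1
            continue
        fault = HSAM_FAULT.search(line)
        if fault:
            faults.append((fault.group("fru"), fault.group("fault")))
    return sensors, faults
```

If the summaries from both nodes list alerts and faults in the same time window, that is consistent with the simultaneous power events described in this article.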

  • The issue persists after the PSUs and/or controllers are reseated or replaced.

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.