NetApp Knowledge Base

A400 node is unable to power on after Watchdog 2 Timer expired (OEM)

Category: fas-systems
Specialty: HW

Applies to

  • AFF A400
  • FAS8300
  • ONTAP 9
  • BMC (Baseboard Management Controller)

Issue

  • The node unexpectedly shuts down and the partner node takes over due to loss of heartbeat:
Tue Jun 17 17:38:08 [partner_node: kltp: clam.heartbeat.state.change:info]: Heartbeats to node (name=source_node, ID=1001) are Failing.
Tue Jun 17 17:38:19 [partner_node: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
Tue Jun 17 17:38:19 [partner_node: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
  • The BMC system log sel output shows repeated Watchdog 2 timer-expired events:
a7 | 06/17/2025 | 12:18:14 | Power Unit #0xb2 | Power on | Asserted | from channel 1
a8 | 06/17/2025 | 12:20:39 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
a9 | 06/17/2025 | 12:20:45 | Power Unit #0xb2 | Power on | Asserted | from channel 1
aa | 06/17/2025 | 12:21:10 | Power Unit #0xb2 | Power on | Asserted | from channel 1
ab | 06/17/2025 | 12:21:57 | Cli_Reboot #0xb8 | bmc cli command reboot | Asserted
ac | 01/01/2000 | 00:03:15 | Power Unit #0xb2 | Power on | Asserted | from channel 1
ad | 01/01/2000 | 00:05:39 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
ae | 01/01/2000 | 00:07:49 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
af | 01/01/2000 | 00:15:44 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b0 | 01/01/2000 | 00:18:10 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b1 | 01/01/2000 | 00:20:21 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b2 | 01/01/2000 | 00:29:53 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b3 | 01/01/2000 | 00:32:22 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b4 | 01/01/2000 | 00:34:33 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
  • Multiple sensors report No Reading after the shutdown, as shown in the BMC system log debug output:

PVCCIN_CPU0      | 01h | ns  | 21.1 | No Reading
PVCCIN_CPU1      | 02h | ns  | 21.1 | No Reading
PVDDQ_ABC        | 03h | ns  | 21.1 | No Reading
PVDDQ_DEF        | 04h | ns  | 21.1 | No Reading
PVDDQ_GHI        | 05h | ns  | 21.1 | No Reading
PVDDQ_KLM        | 06h | ns  | 21.1 | No Reading
P1V05_PCH        | 07h | ns  | 21.1 | No Reading
CX5_Temp1        | 14h | ns  |  7.5 | No Reading
CX5_Temp2        | 15h | ns  |  7.6 | No Reading

  • The node fails to power on after the system power on command is run from the BMC prompt.
  • The node still does not boot after a system power cycle is issued from the BMC.
  • The node fails to power on after the motherboard is reseated.
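
The symptoms above can be checked against saved BMC output with a short script. The following is a minimal sketch in Python, not a NetApp tool: it assumes the pipe-delimited layouts seen in the excerpts above, and the function names (parse_sel, summarize_sel, sensors_without_reading) and the sample strings are hypothetical. It counts SEL events per sensor, reports whether any timestamp moves backwards (as the entries do when they jump from 06/17/2025 to 01/01/2000 after the Cli_Reboot event), and lists sensors whose status is No Reading.

```python
from collections import Counter
from datetime import datetime

# Sample lines copied from the excerpts above; real input would come from saved BMC output.
SEL_SAMPLE = """\
a7 | 06/17/2025 | 12:18:14 | Power Unit #0xb2 | Power on | Asserted | from channel 1
a8 | 06/17/2025 | 12:20:39 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
ab | 06/17/2025 | 12:21:57 | Cli_Reboot #0xb8 | bmc cli command reboot | Asserted
ac | 01/01/2000 | 00:03:15 | Power Unit #0xb2 | Power on | Asserted | from channel 1
ad | 01/01/2000 | 00:05:39 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
"""

SENSOR_SAMPLE = """\
PVCCIN_CPU0      | 01h | ns  | 21.1 | No Reading
CX5_Temp1        | 14h | ns  |  7.5 | No Reading
"""


def parse_sel(text):
    """Parse pipe-delimited SEL lines into dicts; lines that do not fit are skipped."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue
        try:
            # Assumed layout: id | MM/DD/YYYY | HH:MM:SS | sensor | event | state
            stamp = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue
        events.append(
            {"id": fields[0], "time": stamp, "sensor": fields[3], "event": fields[4]}
        )
    return events


def summarize_sel(events):
    """Return (event count per sensor name, True if any timestamp goes backwards)."""
    # "Watchdog 2 #0xb1" becomes "Watchdog 2": drop the sensor number suffix.
    counts = Counter(e["sensor"].split("#")[0].strip() for e in events)
    clock_went_back = any(
        later["time"] < earlier["time"] for earlier, later in zip(events, events[1:])
    )
    return counts, clock_went_back


def sensors_without_reading(text):
    """Return the names of sensors whose last column is 'No Reading'."""
    names = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 5 and fields[-1] == "No Reading":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    counts, clock_went_back = summarize_sel(parse_sel(SEL_SAMPLE))
    print("Events per sensor:", dict(counts))
    print("Timestamps go backwards:", clock_went_back)
    print("Sensors with No Reading:", sensors_without_reading(SENSOR_SAMPLE))
```

A high Watchdog 2 count combined with backward-moving timestamps and No Reading voltage sensors matches the pattern described in this article; the script only summarizes the log and does not identify a cause.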

