Takeover after no heartbeat was detected from the partner node on AFF A400, FAS8700, FAS8300
Applies to
- ONTAP 9
- AFF A400
- FAS8700
- FAS8300
Issue
- A node goes down unexpectedly
- Partner node reports a takeover due to loss of heartbeat
[Node-02: kltp: clam.heartbeat.state.change:info]: Heartbeats to node (name=Node-01, ID=1000) are Failing.
[Node-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
- The BMC event log reports the following:
b8b | 03/19/2024 | 13:00:38 | Battery Learning #0xc4 | In progress | Asserted
b8c | 03/19/2024 | 17:14:26 | System Event #0xff | Timestamp Clock Sync | Asserted
b8d | 03/19/2024 | 17:14:29 | System Event #0xff | Timestamp Clock Sync | Asserted
b8e | 03/20/2024 | 00:36:49 | Battery Learning #0xc4 | In progress | Asserted
b8f | 03/20/2024 | 06:38:09 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b90 | 03/20/2024 | 06:40:34 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b91 | 03/20/2024 | 06:41:08 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b92 | 03/20/2024 | 06:42:45 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b93 | 03/20/2024 | 06:44:54 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b94 | 03/20/2024 | 07:11:18 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b95 | 03/20/2024 | 07:13:44 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b96 | 03/20/2024 | 07:15:54 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
- The node remains down after the motherboard is reseated
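
The pattern in the BMC event log (repeated "Power on" entries, each followed by "Watchdog 2" timer expirations) can be tallied with a short script. The following is a minimal sketch, not a NetApp tool: it assumes the pipe-separated layout seen in the excerpt above (ID | date | time | sensor | event | state), and the names `SEL_EXCERPT`, `parse_sel_line` and `summarize` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter

# Entries copied from the BMC event log excerpt in this article (b8f to b96).
SEL_EXCERPT = """\
b8f | 03/20/2024 | 06:38:09 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b90 | 03/20/2024 | 06:40:34 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b91 | 03/20/2024 | 06:41:08 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b92 | 03/20/2024 | 06:42:45 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b93 | 03/20/2024 | 06:44:54 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b94 | 03/20/2024 | 07:11:18 | Power Unit #0xb2 | Power on | Asserted | from channel 1
b95 | 03/20/2024 | 07:13:44 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
b96 | 03/20/2024 | 07:15:54 | Watchdog 2 #0xb1 | Timer expired (OEM) | Asserted
"""


def parse_sel_line(line):
    """Split one log line into named fields; return None if it does not fit.

    Assumption: at least six pipe-separated fields, in the order shown
    in the excerpt. Any extra trailing fields are ignored.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 6:
        return None
    return {
        "id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": fields[4],
        "state": fields[5],
    }


def summarize(text):
    """Count entries per (sensor name, event), dropping the '#0x..' sensor number."""
    counts = Counter()
    for line in text.splitlines():
        entry = parse_sel_line(line)
        if entry is None:
            continue
        sensor_name = entry["sensor"].split("#")[0].strip()
        counts[(sensor_name, entry["event"])] += 1
    return counts


if __name__ == "__main__":
    for (sensor, event), count in summarize(SEL_EXCERPT).most_common():
        print(f"{sensor}: {event} x{count}")
```

For the entries listed in this article, the tally is five "Watchdog 2 / Timer expired (OEM)" events against three "Power Unit / Power on" events, which matches the repeated failed boot attempts visible in the log.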