Multiple unexpected reboots of a single node due to watchdog power cycle
Applies to
Issue
- Multiple unexpected reboots of a compute node
- If the node has not been deployed, the Installer web page cannot be accessed
- From status.json in the /base-os-logs/var/local/gemini-system-status directory:
{
"id": "91",
"timestamp": "06-Feb-2025 23:59:36",
"sensor": "Watchdog 2 Watchdog",
"event": "Power Cycle",
"details": "Timer use at expiration = SMS/OS ; Interrupt type = none"
},
- From BMC logs:
Event ID Time Stamp Severity Sensor Name Sensor Type Description
91 Feb/6/2025 23:59:36 [Warning] [Watchdog] [Watchdog 2] Power Cycle(Timer use at expiration: SMS/OS) - Asserted
87 Mar/22/2024 07:47:25 [Warning] [Watchdog] [Watchdog 2] Power Cycle(Timer use at expiration: SMS/OS) - Asserted
84 Oct/20/2023 06:04:11 [Warning] [Watchdog] [Watchdog 2] Power Cycle(Timer use at expiration: SMS/OS) - Asserted
81 Jul/7/2023 09:56:48 [Warning] [Watchdog] [Watchdog 2] Power Cycle(Timer use at expiration: SMS/OS) - Asserted
36 Oct/14/2022 18:31:34 [Warning] [Watchdog] [Watchdog 2] Power Cycle(Timer use at expiration: SMS/OS) - Asserted
12 Jun/22/2021 00:07:33 [Information] [Watchdog] [Watchdog 2] Timer Expired(Timer use at expiration: SMS/OS) - Asserted
OR
5636 Sep/4/2025 02:24:50 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
5635 Sep/4/2025 02:11:59 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
5634 Sep/4/2025 01:59:08 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
5633 Sep/4/2025 01:46:14 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
5632 Sep/4/2025 01:33:24 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
5631 Sep/4/2025 01:20:21 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
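On a node with a long BMC event history, it can help to pull out only the watchdog entries and check how far apart they are. The following Python sketch assumes the SEL has been exported as plain text, one entry per line, in exactly the layout of the BMC log excerpts above. The regular expression and the function names parse_watchdog_events and gaps_in_minutes are hypothetical helpers for illustration, not part of any BMC tool.

```python
import re
from datetime import datetime

# Assumed layout: one SEL entry per line, as in the BMC log excerpts above.
WATCHDOG_LINE = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<ts>[A-Za-z]{3}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<severity>[^\]]+)\]\s+\[Watchdog\]\s+\[(?P<sensor>[^\]]+)\]\s+"
    r"(?P<action>[^(]+)\(Timer use at expiration: (?P<timer_use>[^)]+)\)"
)


def parse_watchdog_events(lines):
    """Extract watchdog entries from SEL text lines, oldest first."""
    events = []
    for line in lines:
        match = WATCHDOG_LINE.match(line.strip())
        if match is None:
            continue  # header rows, "OR" separators, non-watchdog sensors
        events.append({
            "id": int(match["id"]),
            "time": datetime.strptime(match["ts"], "%b/%d/%Y %H:%M:%S"),
            "severity": match["severity"],
            "sensor": match["sensor"],
            "action": match["action"].strip(),  # e.g. "Power Cycle", "Hard Reset"
            "timer_use": match["timer_use"],    # e.g. "SMS/OS", "OS Load"
        })
    return sorted(events, key=lambda event: event["time"])


def gaps_in_minutes(events):
    """Minutes elapsed between consecutive watchdog events."""
    return [
        (later["time"] - earlier["time"]).total_seconds() / 60
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    sample = [
        "Event ID Time Stamp Severity Sensor Name Sensor Type Description",
        "5636 Sep/4/2025 02:24:50 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted",
        "5635 Sep/4/2025 02:11:59 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted",
        "5634 Sep/4/2025 01:59:08 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted",
    ]
    events = parse_watchdog_events(sample)
    for event in events:
        print(event["id"], event["time"], event["action"], event["timer_use"])
    print([f"{gap:.1f}" for gap in gaps_in_minutes(events)])
```

Run against the second excerpt, the gaps between consecutive OS Load hard resets come out at roughly 13 minutes each, which stands out clearly against the SMS/OS power cycles in the first excerpt that are months apart.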
