StorageGRID appliance SG5700 reboots unexpectedly due to watchdog timeout
- Category:
- storagegrid
- Specialty:
- sgrid
- Last Updated:
- 4/25/2025, 10:22:32 PM
Applies to
- NetApp StorageGRID 11.5 and above
- StorageGRID Appliances SG5700
Issue
- An alert email reports an unexpected reboot of one or more nodes:
Unexpected node reboot (1 alert)
A node rebooted unexpectedly within the last 24 hours.
Recommended actions
1. Monitor this alert. The alert will be cleared after 24 hours. However, if the node reboots unexpectedly again, this alert will be triggered again.
2. If you cannot resolve the alert, there might be a hardware failure. Contact technical support.
________________________________________
dc1-sn1
Node dc1-sn1
Site DC1
Severity Major
Time triggered WKD MMM DD hh:mm:ss UTC YYYY
Job miscd
- High CPU utilization is reported on the affected node leading up to the reboot. CPU usage can be viewed in the Nodes view in StorageGRID Grid Manager, and in the Node or Node (Internal Use) views under Support > Metrics.
- The node was rebooted due to a watchdog timeout. To confirm this:
- Collect log files and system data from the affected nodes.
- Extract the .tar.gz log file.
- Locate the crash dmesg file:
base-os-logs\run\mount-tmp\pge-actv-root\var\log\storagegrid_crash_dmesg.YYYYMMDDhhmmss.log.gz
- Extract and open the crash dmesg file, and verify that the node rebooted due to a watchdog timeout:
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr fired, interruptCount = 1
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging memory usage
...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging CPU backtrace
...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging blocked tasks
...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging all ftrace buffers
...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr serviced watchdog
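The manual check above can be sketched as a small script for cases where several nodes or several crash files need to be reviewed. This is an illustrative sketch, not a NetApp tool: the function names are hypothetical, the marker strings are copied from the excerpt above, and treating "fpgaIsr fired" plus "fpgaIsr serviced watchdog" as sufficient evidence of a watchdog reboot is an assumption made for this example.

```python
import gzip
from pathlib import Path

# Marker strings copied from the crash dmesg excerpt above, in log order.
WATCHDOG_MARKERS = [
    "fpgaIsr fired",
    "fpgaIsr logging memory usage",
    "fpgaIsr logging CPU backtrace",
    "fpgaIsr logging blocked tasks",
    "fpgaIsr logging all ftrace buffers",
    "fpgaIsr serviced watchdog",
]


def find_watchdog_markers(lines):
    """Return the known markers seen in fpga_pci lines, in order of first appearance."""
    found = []
    for line in lines:
        if "fpga_pci: fpgaIsr:" not in line:
            continue
        for marker in WATCHDOG_MARKERS:
            if marker in line and marker not in found:
                found.append(marker)
    return found


def is_watchdog_reboot(lines):
    """Assumption: the first and last markers together indicate a watchdog reboot."""
    found = find_watchdog_markers(lines)
    return "fpgaIsr fired" in found and "fpgaIsr serviced watchdog" in found


def scan_crash_dmesg(path):
    """Read a gzipped crash dmesg file and report whether the markers are present."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return is_watchdog_reboot(handle)


def find_crash_dmesg_files(extracted_root):
    """List crash dmesg files anywhere under an already extracted log bundle."""
    pattern = "storagegrid_crash_dmesg.*.log.gz"
    return sorted(Path(extracted_root).rglob(pattern))
```

For example, after extracting the log bundle to a local directory, calling `scan_crash_dmesg(p)` for each `p` returned by `find_crash_dmesg_files("path/to/extracted/logs")` flags the files that contain the watchdog sequence. A positive result should still be confirmed by reading the file, since the script only matches strings.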