StorageGRID appliance SG5700 reboots unexpectedly due to watchdog timeout

Category: storagegrid

Specialty: sgrid

Applies to

NetApp StorageGRID 11.5 and above
StorageGRID Appliance SG5712

Issue

An alert email reports an unexpected reboot for one or more nodes:

Unexpected node reboot (1 alert) 

A node rebooted unexpectedly within the last 24 hours.

Recommended actions

1. Monitor this alert. The alert will be cleared after 24 hours. However, if the node reboots unexpectedly again, this alert will be triggered again.

2. If you cannot resolve the alert, there might be a hardware failure. Contact technical support.

________________________________________

dc1-sn1 

Node    dc1-sn1

Site    DC1

Severity    Major

Time triggered    WKD MMM DD hh:mm:ss UTC YYYY

Job    miscd

The node rebooted due to a watchdog timeout. To verify this:
- Collect log files and system data on the affected nodes
- Extract the .tar.gz log file
- Locate the crash dmesg file: base-os-logs\run\mount-tmp\pge-actv-root\var\log\storagegrid_crash_dmesg.YYYYMMDDhhmmss.log.gz
- Extract and open the crash dmesg file, and confirm that it contains watchdog entries like the following:

[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr fired, interruptCount = 1
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging memory usage ...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging CPU backtrace ...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging blocked tasks ...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr logging all ftrace buffers ...
[sss.uuuuuu] fpga_pci: fpgaIsr: fpgaIsr serviced watchdog
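
The manual check can also be scripted once the log bundle has been extracted. The following is a minimal Python sketch, not a NetApp tool: the function name find_watchdog_lines is hypothetical, and using the two strings "fpgaIsr fired" and "fpgaIsr serviced watchdog" as the watchdog signature is an assumption based only on the sample entries shown in this article. It searches an extracted directory for crash dmesg files and returns the matching lines from each.

```python
import gzip
from pathlib import Path

# Marker strings copied from the sample crash dmesg entries in this article.
# Treating them as the watchdog signature is an assumption of this sketch.
WATCHDOG_MARKERS = ("fpgaIsr fired", "fpgaIsr serviced watchdog")

# File name pattern from the article: storagegrid_crash_dmesg.YYYYMMDDhhmmss.log.gz
CRASH_DMESG_PATTERN = "storagegrid_crash_dmesg.*.log.gz"


def find_watchdog_lines(extracted_dir):
    """Return {crash dmesg path: [matching lines]} for files under extracted_dir.

    Files that contain none of the marker strings are left out of the result.
    """
    results = {}
    for path in sorted(Path(extracted_dir).rglob(CRASH_DMESG_PATTERN)):
        # Read the gzip-compressed log as text; undecodable bytes are replaced.
        with gzip.open(path, "rt", errors="replace") as handle:
            hits = [
                line.rstrip("\n")
                for line in handle
                if any(marker in line for marker in WATCHDOG_MARKERS)
            ]
        if hits:
            results[str(path)] = hits
    return results
```

Calling find_watchdog_lines on the directory where the .tar.gz bundle was extracted returns an empty dictionary when no crash dmesg file contains the markers. In that case, open the file manually before ruling out a watchdog timeout.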