NetApp Knowledge Base

Handling watchdog resets (WDR)

Category:
fas-systems
Specialty:
hw

Applies to

Watchdog reset

Answer

What is a watchdog reset?

A watchdog is an independent timer that monitors the progress of the main controller running Data ONTAP. Its function is to restart the system automatically if the system encounters an unrecoverable error.

The watchdog implemented by NetApp uses a two-level timer, with a different action associated with each level.

  • Level 1: Timeout: The storage appliance attempts to panic and dump core in response to a non-maskable interrupt. An L1 watchdog is issued if the timer is not reset within 1.5 seconds. After an L1 watchdog is successfully issued, a core file is written and the system returns to service, allowing NetApp to determine the root cause of the hang.

  • Level 2: Reset: The storage appliance is reset by a hard reset signal sent from the timer. An L2 watchdog is issued if the watchdog timer is not reset within two seconds after the L1 watchdog. An L2 watchdog does not generate a core dump.
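
The two-level behavior described above can be illustrated with a small model. This is only a sketch for reasoning about the thresholds, not NetApp's implementation: the function name `watchdog_state`, the state labels, and the treatment of the exact boundary values are assumptions, and the model assumes the timer is never reset after the L1 event. Only the 1.5-second and two-second intervals come from this article.

```python
# Illustrative model of the two-level watchdog timer (not NetApp code).
L1_TIMEOUT_S = 1.5  # from the article: L1 is issued if the timer is not reset within 1.5 s
L2_TIMEOUT_S = 2.0  # from the article: L2 is issued 2 s after L1 if the timer is still not reset

# Whether each event leaves a core file behind, per the article.
GENERATES_CORE = {"OK": False, "L1_TIMEOUT": True, "L2_RESET": False}


def watchdog_state(seconds_since_last_reset: float) -> str:
    """Return the highest watchdog level that would have fired (hypothetical helper).

    Assumes the timer was not reset at any point during the interval and
    treats the boundary values as already expired.
    """
    if seconds_since_last_reset < L1_TIMEOUT_S:
        return "OK"
    if seconds_since_last_reset < L1_TIMEOUT_S + L2_TIMEOUT_S:
        return "L1_TIMEOUT"  # NMI: panic and core dump attempted
    return "L2_RESET"        # hard reset signal, no core dump


if __name__ == "__main__":
    for t in (1.0, 2.0, 4.0):
        state = watchdog_state(t)
        print(f"{t:.1f} s without reset: {state}, core dump: {GENERATES_CORE[state]}")
```

In this model a hang that clears within 3.5 seconds of the last timer reset never reaches the L2 reset, which is why an L2 event implies that the L1 panic itself could not complete.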

It is not necessary to ‘recover’ from a watchdog timeout or watchdog reset, as both of these events are recovery mechanisms for other failures. The objective instead is to identify the failure(s) that caused the watchdog event.

What is the appropriate response to a watchdog timeout (L1 Watchdog Event)?

A watchdog timeout should be treated just like any other system panic. The associated backtrace and/or the core should be analyzed for the possible root cause(s). A giveback should be performed if necessary.

What is the appropriate response to a watchdog reset (L2 Watchdog Event)?

DO NOT SIMPLY PERFORM A GIVEBACK AND MONITOR, as data collection is required.

Please collect the following data to help diagnose the cause of a watchdog reset:

  • AutoSupport messages
  • Console logs before, during, and after the watchdog event (if possible)
  • ssram log (/etc/log/ssram/ssram.log or /mroot/etc/log/ssram/ssram.log) - FAS62xx, FAS80x0 only
  • SP and BMC logs (required for proper analysis)
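
The list above can be turned into a simple per-platform checklist. The following is a sketch only: the helper name `artifacts_to_collect` and the way platform names are matched (prefixes `FAS62` and `FAS80`) are assumptions made for illustration, not a NetApp tool or API. The items themselves are taken from the list above.

```python
from typing import List

# Log paths quoted from the article; only relevant on FAS62xx and FAS80x0.
SSRAM_LOG = "ssram log (/etc/log/ssram/ssram.log or /mroot/etc/log/ssram/ssram.log)"


def artifacts_to_collect(platform: str) -> List[str]:
    """Return the data to gather after an L2 watchdog reset (hypothetical helper)."""
    items = [
        "AutoSupport messages",
        "Console logs before, during, and after the watchdog event (if possible)",
        "SP and BMC logs",
    ]
    # Assumed matching rule: normalize the name, then compare model prefixes.
    model = platform.upper().replace(" ", "")
    if model.startswith("FAS62") or model.startswith("FAS80"):
        items.append(SSRAM_LOG)
    return items


if __name__ == "__main__":
    for name in ("FAS8040", "AFF A800"):
        print(name)
        for item in artifacts_to_collect(name):
            print("  -", item)
```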

Note: No hardware should be replaced unless the root cause is a hardware issue based on the available log analysis.

Platform                        Article
AFF A80X0, FAS80X0              Handling L2 Watchdog Resets on the FAS 80X0 and AFF A80X0 platforms
FAS25XX                         Handling L2 Watchdog Resets on the FAS 25XX platforms
AFF A700, FAS9000               Handling L2 Watchdog Resets on the AFF A700 and FAS9000 platforms
AFF A200, FAS26XX               Handling L2 Watchdog Resets on the FAS26XX and AFF A200 platforms
AFF A220, AFF C190, FAS27XX     Handling L2 Watchdog Resets on the FAS27XX, AFF A220, and AFF C190 platforms
AFF A400, FAS8300, FAS8700      Handling L2 Watchdog Resets on the AFF A400, FAS8300, and FAS8700
AFF A700s                       Handling L2 Watchdog Resets on the AFF A700s Platform
AFF A300, FAS8200               Handling L2 Watchdog Resets on the FAS8200 and AFF A300 platforms
AFF A800                        Handling L2 Watchdog Resets on the AFF A800 Platform
AFF A320                        Handling L2 Watchdog Resets on the AFF A320 Platform
AFF A900, FAS9500               Handling L2 Watchdog Resets on the AFF A900 and FAS9500 Platform
AFF A250, FAS500f, AFF C250     Handling L2 Watchdog Resets on the AFF A250, FAS500f, and AFF C250

 

Additional Information

For further assistance, contact NetApp Technical Support and reference this article along with the data collected.

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.