Handling L2 Watchdog Resets on the AFF A700s Platform
- Category: aff-series
- Specialty: HW
- Last Updated: 3/13/2025, 10:52:04 AM
Applies to
- AFF A700s
Issue
- The node reboots unexpectedly
- In some cases, the node does not reboot after the unexpected shutdown
- BMC logs on the impacted node show the following:
453 | 05/10/2022 | 23:21:58 | CriticalInt | Software NMI | Asserted
454 | 05/10/2022 | 23:21:58 | Watchdog2 | Timer interrupt | Asserted
455 | 05/10/2022 | 23:21:59 | Watchdog2 | Hard reset | Asserted
456 | 05/10/2022 | 23:21:59 | SysReset | State Asserted | Asserted
- If the node reboots, the following messages appear in the EMS log:
Wed May 11 00:21:59 +0100 [NetApp: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(n4-nht-fas-c03-02), system_down because l2_watchdog_reset.
Wed May 11 00:21:59 +0100 [NetApp: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(n4-nht-fas-c03-02), system_down because reset_via_sp.
Wed May 11 00:22:00 +0100 [NetApp: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
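The log signatures above can be checked for programmatically when reviewing saved log output. The following is a minimal sketch, not a NetApp tool. It assumes the BMC events are available as pipe-delimited text lines in the layout shown above (ID, date as MM/DD/YYYY, time, sensor, event, state) and that the EMS messages are available as plain text lines. The function names and the sample constants are hypothetical and exist only for illustration.

```python
import re
from datetime import datetime

# Sample lines copied from the log excerpts in this article.
SAMPLE_SEL = [
    "453 | 05/10/2022 | 23:21:58 | CriticalInt | Software NMI | Asserted",
    "454 | 05/10/2022 | 23:21:58 | Watchdog2 | Timer interrupt | Asserted",
    "455 | 05/10/2022 | 23:21:59 | Watchdog2 | Hard reset | Asserted",
    "456 | 05/10/2022 | 23:21:59 | SysReset | State Asserted | Asserted",
]
SAMPLE_EMS = [
    "Wed May 11 00:21:59 +0100 [NetApp: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: "
    "hw_assist: Received takeover hw_assist alert from partner(n4-nht-fas-c03-02), "
    "system_down because l2_watchdog_reset.",
    "Wed May 11 00:22:00 +0100 [NetApp: cf_main: cf.fsm.stateTransit:info]: "
    "Failover monitor: UP --> TAKEOVER",
]


def parse_sel_line(line):
    """Split one pipe-delimited BMC event line into a dict, or return None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None
    event_id, date, time, sensor, event, state = fields
    try:
        # Assumption: the date is MM/DD/YYYY, as in the excerpt above.
        when = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"id": event_id, "time": when, "sensor": sensor,
            "event": event, "state": state}


def find_watchdog_hard_resets(sel_lines):
    """Return the parsed events where Watchdog2 asserted a hard reset."""
    records = (parse_sel_line(line) for line in sel_lines)
    return [r for r in records
            if r and r["sensor"] == "Watchdog2"
            and r["event"] == "Hard reset" and r["state"] == "Asserted"]


# Captures the partner name and the system_down reason from hw_assist alerts.
EMS_REASON = re.compile(r"alert from partner\(([^)]+)\), system_down because (\w+)\.")


def find_takeover_reasons(ems_lines):
    """Return (partner, reason) pairs for each hw_assist takeover alert."""
    matches = (EMS_REASON.search(line) for line in ems_lines)
    return [(m.group(1), m.group(2)) for m in matches if m]


if __name__ == "__main__":
    for hit in find_watchdog_hard_resets(SAMPLE_SEL):
        print("Watchdog2 hard reset, event", hit["id"], "at", hit["time"])
    for partner, reason in find_takeover_reasons(SAMPLE_EMS):
        print("Takeover alert from", partner, "reason:", reason)
```

A Watchdog2 hard reset in the BMC events together with an l2_watchdog_reset reason in the EMS messages matches the signature described in this article. Lines that do not fit the assumed layout are skipped rather than raising an error.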