AFF A800 reboot due to L2 watchdog reset
Applies to
- AFF A800
- ONTAP 9
- BMC firmware 10.2P1 or later
Issue
- An L2 watchdog reset causes a node to reboot unexpectedly.
- The partner node may report the following event during takeover:
[Node-01: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(Node-02), system_down because l2_watchdog_reset.
- The BMC log shows a System_Watchdog timer interrupt, followed by an NMI and an L2 watchdog timeout hard reset.
Example:
Wed May 04 20:25:15.500000 2022 [IPMI.notice]: 03f6 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Wed May 04 20:25:15.940000 2022 [IPMI Event.critical]: NMI
Wed May 04 20:25:15.940000 2022 [IPMI.notice]: 03f7 | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Wed May 04 20:25:16.720000 2022 [IPMI.notice]: 03f8 | 02 | EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"
Wed May 04 20:25:16.920000 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Wed May 04 20:25:16.950000 2022 [IPMI Event.critical]: System reset
Wed May 04 20:25:16.950000 2022 [IPMI Event.critical]: L2 watchdog action completed
Wed May 04 20:25:16.950000 2022 [IPMI.notice]: 03f9 | 02 | EVT: 0301ffff | SysReset | Assertion Event, "State Asserted"
Wed May 04 20:25:16.950000 2022 [IPMI.notice]: L2 to L1 is 1(s) 10000(us)
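As a rough illustration, a saved copy of the BMC log could be searched for the signature shown above with a short Python sketch. The function name, the regular expression, and the choice of signature strings are assumptions derived only from the example lines in this article; they are not a NetApp tool or a documented log format.

```python
import re

# Assumed signature strings, copied from the example BMC log in this article.
SIGNATURES = (
    "L2 watchdog timeout hard reset",
    "L2 watchdog action completed",
)

# Assumed line layout: "<weekday> <month> <day> <time> <year> [<tag>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} [\d:.]+ \d{4}) \[(?P<tag>[^\]]+)\]: (?P<msg>.*)$"
)


def find_l2_watchdog_events(lines):
    """Return (timestamp, tag, message) for each line matching a signature."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(sig in match.group("msg") for sig in SIGNATURES):
            hits.append((match.group("ts"), match.group("tag"), match.group("msg")))
    return hits


if __name__ == "__main__":
    # Hypothetical file name; point this at wherever the BMC log was saved.
    with open("bmc.log", encoding="utf-8", errors="replace") as handle:
        for ts, tag, msg in find_l2_watchdog_events(handle):
            print(f"{ts} [{tag}] {msg}")
```

Any line returned by the sketch only indicates that the log contains the same messages as the example; it does not by itself identify why the watchdog fired.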