CONTAP-170136: FAS8200 and AFF A300 systems might experience non-responsive CPUs followed by multiple watchdog controller disruptions
Issue
- FAS8200 and AFF A300 storage systems might experience non-responsive CPUs followed by watchdog controller disruptions.
For example:
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
- This L2 watchdog (WDG) reset originates from the x86 CPU cores and might be caused by a transient CPU issue.
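The following is a minimal sketch, not a NetApp tool, for checking a saved copy of an event log for the signature shown in the example above. It assumes the log has been exported to text and that records follow the same "Record N: timestamp [source]: message" layout as the sample. The names `parse_record`, `find_l2_watchdog_resets`, and `SAMPLE` are hypothetical and exist only for this illustration. The match strings are copied from the sample records and might not cover every firmware version.

```python
import re
from datetime import datetime

# Layout modeled only on the sample records in this article (assumption).
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): "
    r"(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[(?P<source>[^\]]+)\]: (?P<message>.*)$"
)

# Strings copied from the example records 1110 and 1111.
WATCHDOG_MARKERS = ("L2 watchdog timeout hard reset", "l2_watchdog_reset")


def parse_record(line):
    """Return a dict for one 'Record N: ...' line, or None if it does not match."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    return {
        "num": int(match.group("num")),
        "time": datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y"),
        "source": match.group("source"),
        "message": match.group("message"),
    }


def find_l2_watchdog_resets(lines):
    """Collect the records whose message contains an L2 watchdog reset marker."""
    hits = []
    for line in lines:
        record = parse_record(line)
        if record and any(m in record["message"] for m in WATCHDOG_MARKERS):
            hits.append(record)
    return hits


# Sample input taken from the example in this article.
SAMPLE = """\
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
"""

if __name__ == "__main__":
    for rec in find_l2_watchdog_resets(SAMPLE.splitlines()):
        print(rec["num"], rec["time"].isoformat(), rec["source"], rec["message"])
```

Counting the hits and comparing their timestamps can help show whether the system experienced a single disruption or the repeated disruptions described in this article.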