CONTAP-170136: FAS8200 and AFF A300 systems might experience non-responsive CPUs followed by multiple watchdog controller disruptions
Issue
- Information for many sensors cannot be read correctly.
 
> system sensors
Sensor Name      | Current    | Unit       | Status     | LCR       | LNC       | UNC       | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
CPU0_Temp_Margin | na         | degrees C  | na         | na        | na        | -11.000   | -1.000  
In_Flow_Temp     | 20.000     | degrees C  | ok         | 0.000     | 5.000     | 50.000    | 55.000
Out_Flow_Temp    | 27.000     | degrees C  | ok         | 0.000     | 5.000     | 65.000    | 75.000
PCI_Slot_Temp    | 25.000     | degrees C  | ok         | 0.000     | 5.000     | 60.000    | 70.000
Smart_Bat_Temp   | 22.000     | degrees C  | ok         | 0.000     | 5.000     | 60.000    | 70.000
CPU0_Error       | 0x0        | discrete   | Asserted   | na        | na        | na        | na      
CPU0_Therm_Trip  | 0x0        | discrete   | Asserted   | na        | na        | na        | na     
Wrench_Port_Up   | 0x0        | discrete   | Enabled    | na        | na        | na        | na
Attn_Sensor1     | 0x0        | discrete   | Asserted   | na        | na        | na        | na

- FAS8200 and AFF A300 storage systems might experience non-responsive CPUs followed by watchdog controller disruptions.
 
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0 
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
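
When reviewing a saved copy of the event log, it can help to pick out the watchdog reset records automatically. The following is a minimal sketch in Python, assuming the log lines have the same shape as the excerpt above. The record pattern, the marker strings, and the names `RECORD_RE`, `WATCHDOG_MARKERS`, and `find_watchdog_resets` are illustrative assumptions derived from this excerpt only; they are not part of any NetApp tool or a documented log format.

```python
import re

# Assumed record shape, inferred from the excerpt above:
#   Record <id>: <timestamp ending in a 4-digit year> [<source>]: <message>
RECORD_RE = re.compile(
    r"^Record (?P<id>\d+): (?P<ts>.+? \d{4}) \[(?P<source>[^\]]+)\]: (?P<msg>.*)$"
)

# Message fragments treated as evidence of an L2 watchdog reset.
# These are taken from the sample records and are an assumption, not a full list.
WATCHDOG_MARKERS = ("L2 watchdog timeout hard reset", "l2_watchdog_reset")


def find_watchdog_resets(lines):
    """Return (record id, timestamp, message) for each watchdog reset record."""
    hits = []
    for line in lines:
        match = RECORD_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not numbered event records
        message = match.group("msg")
        if any(marker in message for marker in WATCHDOG_MARKERS):
            hits.append((int(match.group("id")), match.group("ts"), message))
    return hits


# Sample input copied from the excerpt above.
SAMPLE_LOG = """\
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
"""

if __name__ == "__main__":
    for record_id, timestamp, message in find_watchdog_resets(SAMPLE_LOG.splitlines()):
        print(f"{record_id}  {timestamp}  {message}")
```

Matching on message fragments rather than on the hexadecimal event codes keeps the sketch readable, but the markers would need to be extended if the log uses other wording for the same event.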
