CHW-185: AFF A800 controller might unexpectedly perform a power reset
Issue
An AFF A800 controller might experience an unexpected power cycle accompanied by an NMI and an "L2 watchdog reset" event. When this occurs, the system event log (SEL) contains entries similar to the following example:
Record 1185: Sun Oct 07 04:25:47.600000 2018 [IPMI.notice]: 00b3 | 02 | EVT: 01500202 | PVCCIN_CPU1 | Assertion Event, "Lower Non-critical going low " | Reading: 0.021 | Threshold: 0.021
Record 1186: Sun Oct 07 04:25:47.620000 2018 [IPMI.notice]: 00b4 | 02 | EVT: 01500003 | PVCCIO_CPU0 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 0.020
Record 1187: Sun Oct 07 04:25:47.620000 2018 [IPMI.notice]: 00b5 | 02 | EVT: 01520002 | PVCCIO_CPU0 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.014
Record 1188: Sun Oct 07 04:25:47.740000 2018 [IPMI.notice]: 00b6 | 02 | EVT: 01500003 | PVTT_ABC | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 0.020
Record 1189: Sun Oct 07 04:25:47.740000 2018 [IPMI.notice]: 00b7 | 02 | EVT: 01520002 | PVTT_ABC | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.014
Record 1190: Sun Oct 07 04:25:47.760000 2018 [IPMI.notice]: 00b8 | 02 | EVT: 01500003 | PVTT_DEF | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 0.020
Record 1191: Sun Oct 07 04:25:47.760000 2018 [IPMI.notice]: 00b9 | 02 | EVT: 01520002 | PVTT_DEF | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.014
Record 1192: Sun Oct 07 04:25:47.780000 2018 [IPMI.notice]: 00ba | 02 | EVT: 01500003 | PVTT_GHJ | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 0.020
Record 1193: Sun Oct 07 04:25:47.780000 2018 [IPMI.notice]: 00bb | 02 | EVT: 01520002 | PVTT_GHJ | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.014
Record 1194: Sun Oct 07 04:25:47.800000 2018 [IPMI.notice]: 00bc | 02 | EVT: 01500003 | PVTT_KLM | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 0.020
Record 1195: Sun Oct 07 04:25:47.800000 2018 [IPMI.notice]: 00bd | 02 | EVT: 01520002 | PVTT_KLM | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.014
Record 1196: Sun Oct 07 04:25:48.980000 2018 [IPMI.notice]: 00be | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1197: Sun Oct 07 04:25:49.420000 2018 [IPMI Event.critical]: NMI
Record 1198: Sun Oct 07 04:25:49.420000 2018 [IPMI.notice]: 00bf | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 1199: Sun Oct 07 04:25:50.080000 2018 [IPMI.notice]: 00c0 | 02 | EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"
Record 1200: Sun Oct 07 04:25:50.420000 2018 [IPMI Event.critical]: L2 watchdog timeout hard reset <<<<<
Record 1201: Sun Oct 07 04:25:50.450000 2018 [IPMI Event.critical]: System reset
Record 1202: Sun Oct 07 04:25:50.450000 2018 [IPMI Event.critical]: L2 watchdog action completed
Record 1203: Sun Oct 07 04:25:50.450000 2018 [IPMI.notice]: 00c1 | 02 | EVT: 0301ffff | SysReset | Assertion Event, "State Asserted"
Record 1204: Sun Oct 07 04:25:50.450000 2018 [IPMI.notice]: L2 to L1 is 1(s) 30000(us)
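As an illustration only, the signature in the log above can be checked programmatically. The following Python sketch assumes the SEL has been saved as plain text in exactly the format shown. The function `scan_sel` and the class `WatchdogResetSignature` are hypothetical names and are not part of any NetApp tool. The sketch collects the sensors that report "going low" assertions and reports a match when an NMI record is followed by an "L2 watchdog timeout hard reset" record.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches one SEL line in the format shown above, for example:
# Record 1197: Sun Oct 07 04:25:49.420000 2018 [IPMI Event.critical]: NMI
RECORD_RE = re.compile(r"^Record (\d+): (.+?) \[([^\]]+)\]: (.*)$")


@dataclass
class WatchdogResetSignature:
    # Hypothetical result type: which pieces of the signature were found.
    low_voltage_sensors: List[str] = field(default_factory=list)
    nmi_record: Optional[int] = None    # first "NMI" record number
    reset_record: Optional[int] = None  # first L2 watchdog hard reset after the NMI

    @property
    def matched(self) -> bool:
        # The signature requires an NMI followed by an L2 watchdog hard reset.
        return self.nmi_record is not None and self.reset_record is not None


def scan_sel(log_text: str) -> WatchdogResetSignature:
    sig = WatchdogResetSignature()
    for raw in log_text.splitlines():
        m = RECORD_RE.match(raw.strip())
        if not m:
            continue  # skip blank or non-record lines
        record_no = int(m.group(1))
        message = m.group(4).strip()
        # Sensor events are pipe-separated; field 3 is the sensor name and
        # field 4 is the assertion text (assumed from the excerpt above).
        fields = [f.strip() for f in message.split("|")]
        if len(fields) >= 5 and "going low" in fields[4]:
            if fields[3] not in sig.low_voltage_sensors:
                sig.low_voltage_sensors.append(fields[3])
        elif message == "NMI" and sig.nmi_record is None:
            sig.nmi_record = record_no
        elif (message.startswith("L2 watchdog timeout hard reset")
              and sig.nmi_record is not None
              and sig.reset_record is None):
            sig.reset_record = record_no
    return sig
```

For example, passing the text of Records 1185 through 1204 to `scan_sel` should report a match, with the PVCCIN, PVCCIO, and PVTT sensors listed as low-voltage assertions. Because the check relies on exact message strings, a different firmware or log export format may need a different pattern.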
