Handling L2 Watchdog Resets on the FAS8200 and AFF A300 platforms
- Category:
- aff-series
- Specialty:
- HW
- Last Updated:
- 3/5/2025, 12:15:12 PM
Applies to
- AFF A300
- FAS8200
Issue
- The node reboots unexpectedly.
- Alternatively, the node does not reboot after an unexpected shutdown.
- The Service Processor (SP) logs on the impacted node show entries like the following:
Record 454: Mon Feb 08 11:49:20.924775 2021 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 455: Mon Feb 08 11:49:20.984259 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 456: Mon Feb 08 11:49:23.000822 2021 [SP.critical]: Filer Reboot
- If the node reboots, the following error appears in the EMS log files:
[cluster-01:mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a watchdog reset.
- If the node is unable to reboot, the output of system sensors on the SP may show sensors as unavailable (na) or faulted (Fault):
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
SYSTEM:
System_FW_Status | na | discrete | na | na | na | na | na
System_Watchdog | 0x0 | discrete | | na | na | na | na
Wrench_Port_Up | na | discrete | na | na | na | na | na
CONTROLLER_A:
PCM_Status | 0x0 | discrete | Fault | na | na | na | na
Attn_Sensor1 | 0x0 | discrete | Asserted | na | na | na | na
CPU-1_DTS_Temp | na | degrees C | na | na | na | -10.000 | 0.000
CPU-2_DTS_Temp | na | degrees C | na | na | na | -10.000 | 0.000
CPU0_PVCCP | na | Volts | na | 1.580 | 1.670 | 1.920 | 2.010
CPU1_PVCCP | na | Volts | na | 1.580 | 1.670 | 1.920 | 2.010
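When SP and EMS output has already been saved to text, the checks above can be scripted for offline triage. The following is a minimal Python sketch, not a NetApp tool. It assumes the SP event log, EMS log, and system sensors output have been captured as plain text in the layouts shown in this article. The function names and the marker strings are hypothetical and are taken only from the samples above; other ONTAP or SP firmware versions may word these messages differently.

```python
from typing import Dict, List

# Hypothetical markers copied from the SP and EMS samples in this article.
# Exact wording on other ONTAP/SP versions is an assumption, not a guarantee.
WATCHDOG_MARKERS = (
    "L2 watchdog timeout hard reset",   # SP IPMI event
    "l2_watchdog_reset",                # SP trap event (hwassist)
    "mgr.boot.reason_abnormal",         # EMS message after a watchdog reboot
)


def find_watchdog_events(log_text: str) -> List[str]:
    """Return every log line that contains one of the watchdog markers."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in WATCHDOG_MARKERS)
    ]


def find_suspect_sensors(sensor_text: str) -> Dict[str, str]:
    """Map sensor name -> status for rows whose Status is 'na' or 'Fault'.

    Assumes the pipe-separated layout of system sensors shown above:
    Name | Current | Unit | Status | LCR | LNC | UNC | UCR
    The header row, the dashed separator, and section labels such as
    'SYSTEM:' do not split into 8 data columns (or are the header) and are skipped.
    """
    suspects: Dict[str, str] = {}
    for line in sensor_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 8 or cols[0] == "Sensor Name":
            continue
        name, status = cols[0], cols[3]
        if status in ("na", "Fault"):
            suspects[name] = status
    return suspects


if __name__ == "__main__":
    # Demo using the SP log records quoted in this article.
    sp_log = (
        "Record 454: Mon Feb 08 11:49:20.924775 2021 [IPMI Event.critical]: "
        "L2 watchdog timeout hard reset\n"
        "Record 455: Mon Feb 08 11:49:20.984259 2021 [Trap Event.critical]: "
        "hwassist l2_watchdog_reset (29)\n"
        "Record 456: Mon Feb 08 11:49:23.000822 2021 [SP.critical]: Filer Reboot\n"
    )
    for hit in find_watchdog_events(sp_log):
        print(hit)
```

Run against the sample sensor table above, find_suspect_sensors would flag System_FW_Status, Wrench_Port_Up, the CPU temperature and voltage rows (na) and PCM_Status (Fault). Attn_Sensor1 (Asserted) and System_Watchdog (empty status) are not flagged, so adjust the status set if those states also matter for your triage.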