Node down and multiple sensors display "No Reading"
Applies to
- FAS 8700
- ONTAP 9.7P8
Issue
A Watchdog2 timer expiration causes a node to reboot unexpectedly.
- The BMC log contains the following information:
c92 | 08/21/2024 | 06:08:35 | Entity Presence PSU1_VIN_TYPE_DC | Absent | Asserted
c93 | 08/21/2024 | 06:08:35 | Entity Presence PSU2_VIN_TYPE_DC | Absent | Asserted
c94 | 08/21/2024 | 06:08:35 | Event Logging Disabled SEL_Full | Log almost full | Asserted
c95 | 08/21/2024 | 06:08:36 | Event Logging Disabled SEL_Full | Log almost full | Asserted
c96 | 08/21/2024 | 06:08:58 | System Event | Timestamp Clock Sync | Asserted
c97 | 08/21/2024 | 06:08:58 | System Event #0xff | Timestamp Clock Sync | Asserted
c98 | 08/21/2024 | 06:09:01 | System Event #0xff | Timestamp Clock Sync | Asserted
c99 | 08/21/2024 | 06:09:01 | System Event | Timestamp Clock Sync | Asserted
c9a | 08/21/2024 | 06:09:01 | System Firmware Progress | Secondary CPU Initialization | Asserted
c9b | 08/21/2024 | 06:09:01 | System Firmware Progress | USB resource configuration | Asserted
c9c | 08/21/2024 | 06:09:07 | System Firmware Progress | PCI resource configuration | Asserted
c9d | 08/21/2024 | 06:09:08 | System Firmware Progress | Video initialization | Asserted
c9e | 08/21/2024 | 06:09:08 | System Firmware Progress | Keyboard controller initialization | Asserted
c9f | 08/21/2024 | 06:09:08 | System Firmware Progress | Hard-disk initialization | Asserted
ca0 | 08/21/2024 | 06:10:00 | Event Logging Disabled SEL_Full | Log almost full | Asserted
ca1 | 08/21/2024 | 06:10:02 | Event Logging Disabled SEL_Full | Log almost full | Asserted
ca2 | 08/21/2024 | 06:15:40 | Watchdog 2 Watchdog2 | Timer expired (BIOS FRB2) 0x00 | Asserted
ca3 | 08/21/2024 | 06:17:54 | Watchdog 2 Watchdog2 | Timer expired (OEM) | Asserted
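
When the SEL is long, the watchdog entries can be picked out programmatically. The following is a minimal Python sketch, assuming the log has been saved as text in the six-column, pipe-delimited layout shown above and that dates are in MM/DD/YYYY form; the function names `parse_sel` and `watchdog_expirations` are hypothetical helpers, not part of any NetApp or BMC tool.

```python
from datetime import datetime

def parse_sel(text):
    """Split pipe-delimited SEL lines into dicts; skip lines with another shape."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue
        rec_id, date, time, sensor, event, state = fields
        events.append({
            "id": rec_id,
            # Assumption: dates are MM/DD/YYYY, as in the excerpt above.
            "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return events

def watchdog_expirations(events):
    """Return only the watchdog 'Timer expired' records."""
    return [e for e in events
            if e["sensor"].startswith("Watchdog") and "Timer expired" in e["event"]]

sample = """\
c9f | 08/21/2024 | 06:09:08 | System Firmware Progress | Hard-disk initialization | Asserted
ca2 | 08/21/2024 | 06:15:40 | Watchdog 2 Watchdog2 | Timer expired (BIOS FRB2) 0x00 | Asserted
ca3 | 08/21/2024 | 06:17:54 | Watchdog 2 Watchdog2 | Timer expired (OEM) | Asserted
"""

hits = watchdog_expirations(parse_sel(sample))
for e in hits:
    print(e["id"], e["when"].isoformat(), e["event"])
```

In this excerpt the two expirations (records ca2 and ca3) are 134 seconds apart.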
- Multiple sensors display "No Reading":
Fan1_Present | 9Ch | ok | 29.1 | Present
Fan2_Present | 9Dh | ok | 29.2 | Present
Fan3_Present | 9Eh | ok | 29.3 | Present
Fan4_Present | 9Fh | ok | 29.4 | Present
CPU0_Present | A4h | ns | 3.1 | No Reading
CPU1_Present | A5h | ns | 3.2 | No Reading
RiserL_Present | A6h | ns | 11.1 | No Reading
RiserM_Present | A7h | ns | 11.2 | No Reading
RiserR_Present | A8h | ns | 11.3 | No Reading
Watchdog2 | B1h | ok | 6.1 |
SEL_Full | FEh | ok | 6.3 | Log almost full
Bat_Dsg_FET_Flt | C1h | ok | 40.1 | Presence Detected
Bat_Chg_FET_Flt | C2h | ok | 40.1 | Presence Detected
Bat_Pack_Invalid | C3h | ok | 40.1 | Presence Detected
Bat_Lrn_Active | C4h | ok | 40.1 | Not in progress
PSU1_VIN_TYPE_DC | B6h | ns | 0.0 | No Reading
PSU2_VIN_TYPE_DC | B7h | ns | 0.0 | No Reading
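
To list the affected sensors quickly, the sensor output can be filtered on its status and reading columns. This is a minimal Python sketch, assuming the five-column, pipe-delimited layout shown above (name, sensor ID, status, entity, reading); `no_reading_sensors` is a hypothetical helper name.

```python
def no_reading_sensors(text):
    """Return names of sensors whose status is 'ns' or whose reading is 'No Reading'."""
    names = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor row in the expected layout
        name, sensor_id, status, entity, reading = fields
        if status == "ns" or reading == "No Reading":
            names.append(name)
    return names

sample = """\
Fan1_Present | 9Ch | ok | 29.1 | Present
CPU0_Present | A4h | ns | 3.1 | No Reading
Watchdog2 | B1h | ok | 6.1 |
PSU1_VIN_TYPE_DC | B6h | ns | 0.0 | No Reading
"""

affected = no_reading_sensors(sample)
print(affected)
```

Rows with an empty reading but an `ok` status, such as Watchdog2 above, are deliberately not flagged.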
- The primary BIOS timed out, and the BMC switched to the backup BIOS:
[240804031137][BIOS]BIOSMonitorTask initalize to ENABLE_WATCHDOG (BMC is just on)
[240804031150][BIOS]BIOSMonitorTask change from ENABLE_WATCHDOG to NONE_TIMER because BIOS is completed
[240821060835][BIOS]TimerTask Go to ENABLE_WATCHDOG because power turns good
[240821060836][BIOS]BIOSMonitorTask change from ENABLE_WATCHDOG to MONITOR
[240821060836][BIOS]BIOS already set the timer (count = 3331, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=0
[240821060838][BIOS]Monitor: (count = 3307, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=330
[240821060847][BIOS]Monitor: (count = 3216, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=320
[240821060857][BIOS]Monitor: (count = 3115, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=310
[240821060910][BIOS]Monitor: (count = 3019, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=300
~
[240821061425][BIOS]Monitor: (count = 710, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=70
[240821061436][BIOS]Monitor: (count = 610, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=60
[240821061446][BIOS]Monitor: (count = 510, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=50
[240821061457][BIOS]Monitor: (count = 410, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=40
[240821061507][BIOS]Monitor: (count = 310, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=30
[240821061519][BIOS]Monitor: (count = 210, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=20
[240821061529][BIOS]Monitor: (count = 110, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=10
[240821061539][BIOS]Monitor: (count = 10, TmrUse = 0x41, ExpirationFlag = 0x0), t_bios_load_time=0
[240821061539][BIOS]BIOSMonitorTask change from MONITOR to CHANGE_BIOS because time out;
[240821061540][BIOS]BIOSMonitorTask change from CHANGE_BIOS to NONE_TIMER
[240821061540][BIOS]BIOSMonitorTask: Add SEL for BIOS FRB2 Timeout
[240821061541][BIOS]BIOSMonitorTask: ready to set up backup BIOS and power cycle
[240821061543][BIOS]PDK_PowerCycleChassis change to ENABLE_WATCHDOG
[240821061543][BIOS]BIOSMonitorTask: lock and stop watchdog
[240821061603][BIOS]BIOSMonitorTask change from ENABLE_WATCHDOG to MONITOR
[240821061603][BIOS]Set WDT for 100 sec
[240821061615][BIOS]Monitor: (count = 910, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=90
[240821061627][BIOS]Monitor: (count = 810, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=80
[240821061637][BIOS]Monitor: (count = 710, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=70
[240821061648][BIOS]Monitor: (count = 610, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=60
[240821061659][BIOS]Monitor: (count = 510, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=50
[240821061710][BIOS]Monitor: (count = 410, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=40
[240821061720][BIOS]Monitor: (count = 310, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=30
[240821061732][BIOS]Monitor: (count = 210, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=20
[240821061743][BIOS]Monitor: (count = 110, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=10
[240821061753][BIOS]Monitor: (count = 10, TmrUse = 0x45, ExpirationFlag = 0x4), t_bios_load_time=0
[240821061753][BIOS]BIOSMonitorTask change from MONITOR to CHANGE_BIOS because time out;
[240821061754][BIOS]BIOSMonitorTask change from CHANGE_BIOS to NONE_TIMER
[240821061754][BIOS]BIOSMonitorTask: Add SEL for OEM Timeout
[240821061754][BIOS]BIOSMonitorTask: already using backup BIOS
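
The state changes in this log can be reduced to a short timeline that shows how long each monitoring window lasted before it timed out. The sketch below is illustrative only. It assumes the bracketed timestamp is in YYMMDDhhmmss form (which is consistent with the SEL times for the same events) and that state changes always appear as "change from X to Y"; `transitions` and `monitor_windows` are hypothetical helper names.

```python
import re
from datetime import datetime

LINE = re.compile(r"^\[(\d{12})\]\[BIOS\](.*)$")
CHANGE = re.compile(r"change from (\w+) to (\w+)")

def transitions(text):
    """Return (timestamp, from_state, to_state) for each state-change line."""
    out = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        c = CHANGE.search(m.group(2))
        if c:
            # Assumption: timestamp layout is YYMMDDhhmmss.
            when = datetime.strptime(m.group(1), "%y%m%d%H%M%S")
            out.append((when, c.group(1), c.group(2)))
    return out

def monitor_windows(trans):
    """Pair each entry into MONITOR with the next exit from MONITOR."""
    windows, start = [], None
    for when, src, dst in trans:
        if dst == "MONITOR":
            start = when
        elif src == "MONITOR" and start is not None:
            windows.append((start, when, dst))
            start = None
    return windows

sample = """\
[240821060836][BIOS]BIOSMonitorTask change from ENABLE_WATCHDOG to MONITOR
[240821061539][BIOS]BIOSMonitorTask change from MONITOR to CHANGE_BIOS because time out;
[240821061540][BIOS]BIOSMonitorTask change from CHANGE_BIOS to NONE_TIMER
[240821061603][BIOS]BIOSMonitorTask change from ENABLE_WATCHDOG to MONITOR
[240821061753][BIOS]BIOSMonitorTask change from MONITOR to CHANGE_BIOS because time out;
"""

windows = monitor_windows(transitions(sample))
for start, end, exit_state in windows:
    print(start.time(), end.time(), int((end - start).total_seconds()), exit_state)
```

For the log above this yields two windows that both end in CHANGE_BIOS: one of 423 seconds on the first attempt and one of 110 seconds after the power cycle.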
- During bootup, the console log displays "NVDIMM in ERROR condition (status = 00000801)."
login: BIOS Version: 16.7
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
BIOS Version: 16.7
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
CPU reset.
BIOS Version: 16.7
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
DIMM F0: NVDIMM in ERROR condition (status = 00000801).
SPI FLASH: Primary BIOS
PEI end.
DXE start.
USB initialization.
PCI host bridge initialization.
CSM initialization.
PCI Bus initialization start.
BDS start.
Console output devices connect.
BIOS Version: 16.7
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
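
A captured console log can be summarized the same way, by counting how often the BIOS banner reappears and collecting any NVDIMM error lines. This is a rough Python sketch, assuming the message formats shown above; `summarize_console` is a hypothetical helper name, and the status value is reported as-is without any attempt to decode it.

```python
import re

NVDIMM_ERR = re.compile(
    r"DIMM (\w+): NVDIMM in ERROR condition \(status = ([0-9A-Fa-f]+)\)"
)

def summarize_console(text):
    """Return (number of BIOS banners seen, list of (dimm_slot, status) errors)."""
    banners = 0
    errors = []
    for line in text.splitlines():
        if "BIOS Version:" in line:
            banners += 1  # each banner marks a fresh BIOS start
        m = NVDIMM_ERR.search(line)
        if m:
            errors.append((m.group(1), m.group(2)))
    return banners, errors

sample = """\
login: BIOS Version: 16.7
PEI start.
BIOS Version: 16.7
Running full memory initialization.
DIMM F0: NVDIMM in ERROR condition (status = 00000801).
SPI FLASH: Primary BIOS
BIOS Version: 16.7
"""

banners, errors = summarize_console(sample)
print(banners, errors)
```

Several banners without a completed boot between them suggest the BIOS is restarting repeatedly, as in the console output above.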