CONTAP-416215: Disks fail after NSM module reboots due to watchdog reset
Issue
Disks are marked as failed because of a faulty NSM100 module.
EMS reports multiple module reboots caused by watchdog resets:
Tue Dec 17 2024 03:43:53 GMT [NETAPP-FLASH-02: storlog_admin: sla.shelf.message: DEBUG] params: {'type': 'SEVERITY', 'log': 'Tue Dec 17 03:42:44 2024 ( 0+00:00:53.210); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 4-Watchdog reset (regVal:0xC4)'}
Tue Dec 17 2024 03:55:55 GMT [NETAPP-FLASH-02: storlog_admin: sla.shelf.message: DEBUG] params: {'type': 'SEVERITY', 'log': 'Tue Dec 17 03:54:39 2024 ( 0+00:00:53.244); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 4-Watchdog reset (regVal:0xC4)'}
Tue Dec 17 2024 04:19:39 GMT [NETAPP-FLASH-02: storlog_admin: sla.shelf.message: DEBUG] params: {'type': 'SEVERITY', 'log': 'Tue Dec 17 04:18:12 2024 ( 0+00:00:53.239); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 4-Watchdog reset (regVal:0xC4)'}
Subsequently, drives are marked for failure:
Tue Dec 17 2024 04:22:49 GMT [NETAPP-FLASH-02: hamsg: scsi.: DEBUG] shm_setup_for_failure disk e1b.01.3.20 (S/N XXX3327) error 80000h
Tue Dec 17 2024 04:22:49 GMT [NETAPP-FLASH-02: disk_server_0: scsi.: DEBUG] shm_setup_for_failure disk e2a.01.2.20 (S/N XXX3327) error 40000000h
Tue Dec 17 2024 04:22:59 GMT [NETAPP-FLASH-02: disk_server_1: scsi.: DEBUG] shm_setup_for_failure disk e2a.01.2.8 (S/N XXX3322) error 40000000h
Tue Dec 17 2024 04:22:59 GMT [NETAPP-FLASH-02: disk_server_1: scsi.: DEBUG] shm_setup_for_failure disk e2a.01.2.2 (S/N XXX3318) error 40000000h
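To check whether a system shows the same pattern (watchdog-reset reboots followed shortly by shm_setup_for_failure entries), the EMS lines can be correlated by timestamp. The sketch below is a minimal, hypothetical helper and not a NetApp tool: it matches only the message text shown in the excerpts above, uses the EMS header timestamp (GMT) rather than the shelf-internal timestamp inside the 'log' field, and the 30-minute correlation window is an assumption you may need to adjust.

```python
import re
from datetime import datetime

# EMS header timestamp, e.g. "Tue Dec 17 2024 03:43:53 GMT"
EMS_TS = re.compile(r"^(\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) GMT")
# Shelf module reboot caused by a watchdog reset (text as seen in the EMS excerpt)
WATCHDOG = re.compile(r"Module Reboot: Startup type 4-Watchdog reset")
# Disk failure setup: captures disk name, serial number and error code
FAILURE = re.compile(r"shm_setup_for_failure disk (\S+) \(S/N (\S+)\) error (\w+)")


def parse_ts(line):
    """Return the EMS header timestamp as a datetime, or None if absent."""
    m = EMS_TS.match(line)
    return datetime.strptime(m.group(1), "%a %b %d %Y %H:%M:%S") if m else None


def scan(lines):
    """Split EMS lines into watchdog-reset times and disk-failure records."""
    resets, failures = [], []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if WATCHDOG.search(line):
            resets.append(ts)
        else:
            m = FAILURE.search(line)
            if m:
                failures.append((ts, m.group(1), m.group(2), m.group(3)))
    return resets, failures


def failures_after_reset(lines, window_minutes=30):
    """Hypothetical helper: list disk failures that occur within
    window_minutes after a watchdog reset, paired with the latest such reset."""
    resets, failures = scan(lines)
    result = []
    for ts, disk, serial, error in failures:
        prior = [r for r in resets
                 if 0 <= (ts - r).total_seconds() <= window_minutes * 60]
        if prior:
            result.append((disk, serial, error, max(prior)))
    return result
```

For example, `failures_after_reset(open("ems.log"))` (the file name is illustrative) would return tuples such as the disk name, serial number, error code and the preceding watchdog-reset time for each matching failure.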