E-Series BMC is unresponsive and might trigger false-positive hardware alerts
Applies to
- NetApp E-Series
- SANtricity OS version between 11.70.1R1 and 11.70.4 (BMC firmware earlier than 14.10)
- NetApp EF300 and EF600
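The version range above can be checked with a short script. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, the version string is assumed to have the form major.minor.patch with an optional R<n> suffix, the range is assumed to be inclusive, and a missing R suffix is assumed to sort before R1. Under these assumptions an R-suffixed 11.70.4 build falls outside the range, so confirm such builds by checking the BMC firmware version instead.

```python
import re


def parse_version(text):
    """Parse 'major.minor.patch' with an optional 'R<n>' suffix into a tuple.

    Assumption: a missing R suffix is treated as R0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:R(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(group or 0) for group in match.groups())


# Bounds taken from the "Applies to" list; treated as inclusive (assumption).
LOWEST_AFFECTED = parse_version("11.70.1R1")
HIGHEST_AFFECTED = parse_version("11.70.4")


def os_in_affected_range(os_version):
    """Return True if the SANtricity OS version lies within the listed range."""
    return LOWEST_AFFECTED <= parse_version(os_version) <= HIGHEST_AFFECTED
```

For example, os_in_affected_range("11.70.2") evaluates to True and os_in_affected_range("11.70.5") evaluates to False.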
Issue
- The MEL (Major Event Log) reports that the controller's BMC (Baseboard Management Controller) is unresponsive:
A:10/29/21, 12:35:33 PM (12:35:33) 2800 2868 The controller's BMC was unresponsive and the recovery process successfully recovered the BMC - Shelf 99, Bay A
A:10/29/21, 12:34:31 PM (12:34:31) 2799 2867 The controller's BMC is unresponsive - Shelf 99, Bay A
- The MEL may also report false-positive hardware alerts such as:
A:11/22/21, 11:16:25 AM (11:16:25) 1676 280b Controller shelf component failed - Shelf 99, Controller 1, Fan canister 5, Bay 1 <--CRITICAL
- The E-Series support bundle and AutoSupport (DOM0-BMC-LOGS-%.7Z) include the following BMC events (sp_system_event_log.txt), which indicate that a watchdog timeout reset was triggered:
740 | 01/01/2000 | 00:00:30 | Power Supply #0x72 | Presence detected | Asserted
741 | 01/01/2000 | 00:00:30 | Power Supply #0x73 | Presence detected | Asserted
742 | OEM record f2 | Watchdog1 Timeout
743 | OEM record f2 | Pilot Software reset
744 | 01/01/2000 | 00:00:36 | Battery #0x4f | State Deasserted
745 | 01/01/2000 | 00:00:38 | System Event #0xff | Timestamp Clock Sync | Asserted
746 | 11/16/2022 | 19:37:07 | System Event #0xff | Timestamp Clock Sync | Asserted
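To look for this signature in an extracted sp_system_event_log.txt, a small script can scan for the watchdog record. The following is a minimal sketch that assumes the pipe-delimited layout shown in the excerpt above, with the record number in the first field and the event text in the last field. The function name is hypothetical and the script is not part of any NetApp tooling.

```python
def find_watchdog_timeouts(sel_text):
    """Return the record numbers of BMC event log entries that report a watchdog timeout.

    Assumes pipe-delimited records whose first field is the record number
    and whose last field is the event description, as in the excerpt above.
    """
    record_ids = []
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip blank lines and anything that does not start with a record number.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        description = fields[-1].lower()
        if "watchdog" in description and "timeout" in description:
            record_ids.append(int(fields[0]))
    return record_ids


if __name__ == "__main__":
    # Path is an assumption: the file as extracted from the support bundle.
    with open("sp_system_event_log.txt", encoding="utf-8", errors="replace") as log:
        for record_id in find_watchdog_timeouts(log.read()):
            print(f"Watchdog timeout found in record {record_id}")
```

Run against the excerpt above, the function reports record 742. A match only shows that the BMC logged a watchdog timeout. Correlate it with the MEL entries before concluding that a hardware alert was a false positive.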