ONTAP upgrade failure on AFF A900 node due to BMC communication issue
Applies to
- AFF A900 all-flash storage system
- ONTAP 9.12.1P11 to 9.15.1P16 upgrade
- MetroCluster IP (MCC-IP) environments
- Baseboard Management Controller (BMC)
Issue
- During an ONTAP upgrade from 9.12.1P11 to 9.15.1P16, the second node became unresponsive at boot and the upgrade process stalled for over two hours.
Console log:
---<<BOOT>>---
NetApp Data ONTAP 9.15.1P16
random: registering fast source Intel Secure Key RNG
nvme0: doorbell stride #2.
nvme0: 0% of timeout was used waiting for RDY.
nvme0: 0% of timeout was used waiting for RDY.
nvme0: Waiting on ctrlr at end of enable.
nvme0: 0% of timeout was used waiting for RDY.
nvd0: <0X331511900503A0SAM000PM9A30002T00025000> NVMe namespace sn:(S668NE0T301378)
nvd0: 1831420MB (3750748848 512 byte sectors)
IPMI device unit 0 rev. 1, firmware rev. 16.08, version 2.0, device support mask 0xbf
IPMI device unit 1 rev. 1, firmware rev. 16.08, version 2.0, device support mask 0xbf
- A system power cycle was attempted from the BMC console, but the system failed to boot and displayed the same messages as before.
- After two and a half hours, the system eventually booted to the Waiting for Giveback state without any errors.
- The EMS logs show splog errors occurring repeatedly, and SP-related logs are missing from both the Weekly and Full AutoSupport reports.
EMS logs:
Mon Jan 26 05:30:00 +0900 [node-01: splog_main: splog_warnings_1:error]: params: {'sp_type': 'BMC', 'reason': 'splogd is running in degraded mode and having difficulty getting splogs from the SP FW'}
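To check how often this event recurs in a saved copy of the EMS log, the lines can be filtered with a short script. The following is a minimal sketch in Python, not a NetApp tool. The regular expression is inferred only from the single EMS line shown above, and the names EMS_LINE, SAMPLE, and find_splog_warnings are hypothetical. Other EMS lines may be laid out differently and would simply be skipped.

```python
import re

# Pattern inferred from the one EMS line shown in this article (an assumption,
# not an official format description):
#   <timestamp> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)

# The EMS line from this article, used as sample input.
SAMPLE = (
    "Mon Jan 26 05:30:00 +0900 [node-01: splog_main: splog_warnings_1:error]: "
    "params: {'sp_type': 'BMC', 'reason': 'splogd is running in degraded mode "
    "and having difficulty getting splogs from the SP FW'}"
)


def find_splog_warnings(lines):
    """Return the parsed fields of every line whose event name starts with 'splog_warnings'."""
    hits = []
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if match and match.group("event").startswith("splog_warnings"):
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    for hit in find_splog_warnings([SAMPLE]):
        print(hit["timestamp"], hit["node"], hit["severity"], hit["message"])
```

Passing the lines of an exported EMS log to find_splog_warnings returns one dictionary per matching event, which makes it easy to count occurrences per node or see when the errors started.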
