CONTAP-492512: Node reboot during NS224 shelf firmware upgrade
Issue
- After upgrading NSM100 firmware, a module may reboot into the new firmware without paths to disks.
- The storage port is up, but no disks are seen:
slot 4: Dual 40G/100G Ethernet Controller CX5
e4a MAC Address: 01:23:45:67:89:01 (auto-100g_sr4-fd-up)
QSFP Vendor: AVAGO
QSFP Part Number: xxxx-xxxxxxxx
QSFP Serial Number: xxxxxxxxxxxxx
- DIMM error on the NSM100 may be seen:
[Node1: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N SHxxxxxxxxx) shelf 10 on channel 0x DIMM failure for Dimm Element 4: not installed or failed. This element is on the DIMM slot 4 in the top shelf module (A).
- If the first module upgraded hits the issue, one or both nodes may reboot unexpectedly when the second module is upgraded:
[Node1: fmmbx_instanceWorker: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. Permanent errors on all HA mailbox disks (while marshalling header).
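The two event names quoted above can be searched for in a saved copy of the event log. The following is a minimal sketch, not a supported tool: the function name find_events and the regular expression are illustrative assumptions, and the line format is inferred only from the two messages shown in this article.

```python
import re

# Event names copied from the messages quoted in this article.
EVENTS_OF_INTEREST = ("ses.status.dimm.error", "cf.multidisk.fatalProblem")

# Assumed prefix layout, inferred from the quoted messages:
# "[<node>: <process>: <event.name>:<severity>]:"
EMS_PREFIX = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<proc>[^:\]]+):\s*"
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:"
)

def find_events(lines):
    """Return (node, event) pairs for lines that carry an event of interest."""
    hits = []
    for line in lines:
        match = EMS_PREFIX.search(line)
        if match is None:
            continue
        event = match.group("event").strip()
        if event in EVENTS_OF_INTEREST:
            hits.append((match.group("node").strip(), event))
    return hits
```

Passing the lines of a saved log to find_events returns one (node, event) pair per matching message. A ses.status.dimm.error hit followed by a cf.multidisk.fatalProblem hit is consistent with the sequence described in this article, but it does not confirm the issue on its own.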