Sudden shelf module reboot causing both nodes to panic
Applies to
- NS224 Disk Shelf
- NVMe shelf module (NSM) firmware version below 0151
Issue
- Shortly after an ONTAP upgrade to 9.8P6/P7, both nodes reboot due to a multi-disk panic
- Multiple NoPathToNSMA_Alert events reported on a regular basis
- NSM100 firmware upgrade 0141 -> 0151 causing panic
- Dec 26th: all NSM firmware is on rev 0141
Shelf 0: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0141
Shelf 1: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0141
- NSM-A firmware upgrade started 0141 -> 0151
Mon Dec 27 06:03:19 +0100 [node_name: dsa_worker0: sfu.downloadingController:info]: [storage download shelf]: Downloading NSM100.0151.SFW on disk shelf controller module A on 0x.shelf
- Node panic occurred
Mon Dec 27 06:05:25 +0100 [node_name: config_thread: sk.panic:alert]: Panic String: aggr aggr_root: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state NORMAL. 10 disks failed in the group.
- HA node rebooted
Mon Dec 27 06:11:48 +0100 [node_name2: send_boot_msg_thread: mgr.boot.reason_ok:notice]: System rebooted after power-on.
- Dec 28th: all NSM firmware is on rev 0151
Shelf 0: NS224NSM100 Firmware rev. NSM100 A: 0151 NSM100 B: 0151
Shelf 1: NS224NSM100 Firmware rev. NSM100 A: 0151 NSM100 B: 0151
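
The firmware lines above can be checked mechanically to see whether any shelf module is still below rev 0151. The following is a minimal Python sketch, not a NetApp tool: it assumes the "Shelf N: ... NSM100 A: xxxx NSM100 B: xxxx" layout shown in this article, and the names SHELF_LINE, FIXED_REV and modules_below are hypothetical, introduced only for illustration.

```python
import re

# Matches lines in the layout shown in this article, for example:
# "Shelf 0: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0141"
SHELF_LINE = re.compile(
    r"Shelf\s+(?P<shelf>\d+):.*?NSM100 A:\s*(?P<a>\d{4})\s+NSM100 B:\s*(?P<b>\d{4})"
)

# Assumption: threshold taken from "firmware version below 0151" in Applies to.
FIXED_REV = 151


def modules_below(text, minimum=FIXED_REV):
    """Return (shelf, module, rev) for every NSM whose revision is below minimum."""
    found = []
    for match in SHELF_LINE.finditer(text):
        for module in ("a", "b"):
            rev = match.group(module)
            if int(rev) < minimum:
                found.append((int(match.group("shelf")), module.upper(), rev))
    return found


if __name__ == "__main__":
    dec_26 = (
        "Shelf 0: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0141\n"
        "Shelf 1: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0141\n"
    )
    for shelf, module, rev in modules_below(dec_26):
        print(f"Shelf {shelf} NSM-{module} is on rev {rev}")
```

Run against the Dec 26th lines, the sketch lists all four modules (shelves 0 and 1, modules A and B); run against the Dec 28th lines, it returns an empty list.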