Node stuck in boot loop during ONTAP upgrade causes up-node reboot
Applies to
- AFF-A250
- ONTAP upgrade
Issue
- During an ONTAP upgrade, one node does not reboot properly and enters a boot loop.
- The node logs errors against a card that hosts disk shelves:
Device Bus:23 Dev:0 Fun:0 (slot 1) failed to train at max link speed/width
- Errors are also logged against disks behind that card:
[node1:diskown.errorDuringIO:error]: error 23 (adapter error prevents command from being sent to device) on disk 1d.00.11 (S/N xxxxxxxx) while reading reservation state
[node1:raid.config.filesystem.disk.not.responding:notice]: File system Disk 1a.00.11 Shelf 0 Bay 11 [NETAPP X357_S163A3T8ATE NA54] S/N [xxxxxxxx] is not responding.
[node1:scsi.cmd.abortedByHost:error]: Unknown device 1d.00.11: Command aborted by host adapter: HA status 0x4: cdb 0x12.
- Attempts to work on the disks or to boot the affected node into ONTAP cause the up-node to reboot unexpectedly. Example:
Node node2 encountered PANIC: aggr aggr0_node2: raid volfsm, fatal multi-disk error.
- EMS logs from the up-node show an SK halt:
[node2: shutdown_thread0: kern.shutdown.initiator:debug]: SK halt was initiated by "maytag.ko::shutdown_appliance_real+8270"
- The issue persists after card and motherboard replacements.
- The system is confirmed to have a proper power source.
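
The log signatures listed above can be checked against a saved console or EMS log with a short script. The following is a minimal sketch and not a NetApp tool: the function names and signature labels are hypothetical, and the patterns are taken only from the messages quoted in this article. It assumes the log has already been exported as plain text lines.

```python
import re

# Patterns copied from the messages quoted in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "pcie_link_training": re.compile(r"failed to train at max link speed/width"),
    "disk_io_adapter_error": re.compile(r"diskown\.errorDuringIO"),
    "disk_not_responding": re.compile(
        r"raid\.config\.filesystem\.disk\.not\.responding"
    ),
    "scsi_aborted_by_host": re.compile(r"scsi\.cmd\.abortedByHost"),
    "multi_disk_panic": re.compile(r"PANIC: .*fatal multi-disk error"),
    "sk_halt": re.compile(r"SK halt was initiated by"),
}


def match_signatures(lines):
    """Return a dict mapping each signature label to the lines that matched it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


def all_symptoms_present(hits):
    """True only if every signature in SIGNATURES matched at least one line."""
    return all(hits[name] for name in SIGNATURES)


if __name__ == "__main__":
    # Two of the messages quoted in this article, used as sample input.
    sample = [
        "Device Bus:23 Dev:0 Fun:0 (slot 1) failed to train at max link speed/width",
        "Node node2 encountered PANIC: aggr aggr0_node2: raid volfsm, fatal multi-disk error.",
    ]
    for name, matched in match_signatures(sample).items():
        print(f"{name}: {len(matched)} match(es)")
```

A match on these patterns only indicates that the log resembles the symptoms described here. It does not confirm the cause, and an absence of matches does not rule it out.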
