Upgrade from 10.3.x to 10.4 stalls
Applies to
StorageGRID Webscale 10.3.x
Issue
The upgrade stalls at the 'Upgrade Grid Nodes' stage in the Grid Management Interface (GMI).
One storage node is shown in gray in the grid topology.
'State Changed' alarms are reported for services such as DDS, SSM, and LDR, with a Trigger Value of Administratively Down.
/var/local/log/gdu-server.log on the admin node shows that the upgrade is waiting for one storage node to finish its node-scope upgrade job:
I, [2018-03-30T04:05:19.932678 0008183] INFO -- gdu-server: Attempting to upgrade from 10.3.0.4 to 10.4.0...
I, [2018-03-30T04:05:20.924058 0008183] INFO -- gdu-server: Stopping all services
I, [2018-03-30T04:06:15.996732 0008183] INFO -- gdu-server: Backing up resource files
I, [2018-03-30T04:06:16.051361 0008183] INFO -- gdu-server: Packing persistent data
I, [2018-03-30T04:06:16.736361 0008183] INFO -- gdu-server: No config dir found. Could not write version file.
I, [2018-03-30T04:06:16.736722 0008183] INFO -- gdu-server: Upgrade phase 1 is complete.
I, [2018-03-30T04:06:16.736979 0008183] INFO -- gdu-server: This platform base-os is owned by storagegrid. Updating base-os packages
I, [2018-03-30T04:06:17.283990 0008183] INFO -- gdu-server: Starting base-os upgrade. This node will be stopped in 30 seconds.
I, [2018-03-30T04:06:17.564760 0008183] INFO -- gdu-server: Executing command `rm -f /etc/DoNotStartNode` on <IP_Address>
I, [2018-03-30T04:06:17.568264 0008183] INFO -- gdu-server: updategrid completed. Waiting for node to update base-os and reboot
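To check whether a node's upgrade job is still at this waiting step, the last entry of the gdu-server log excerpt can be inspected programmatically. The following is a minimal Python sketch, not a NetApp tool: the function names (parse_line, last_message, is_waiting_for_reboot) are hypothetical, and the line layout is assumed to match the excerpt above.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
#   I, [<timestamp> <pid>] INFO -- gdu-server: <message>
# The optional "#" before the pid is tolerated as an assumption about other log variants.
LINE_RE = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>\S+)\s+#?(?P<pid>\d+)\]\s+"
    r"(?P<level>[A-Z]+) -- (?P<prog>[^:]+): (?P<msg>.*)$"
)

# Message text taken from the last line of the excerpt.
WAITING_MARKER = "Waiting for node to update base-os and reboot"


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f"),
        "level": m.group("level"),
        "program": m.group("prog"),
        "message": m.group("msg"),
    }


def last_message(lines):
    """Return the last parseable entry in an iterable of log lines."""
    last = None
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            last = entry
    return last


def is_waiting_for_reboot(lines):
    """True if the most recent entry is the 'waiting for reboot' message."""
    last = last_message(lines)
    return last is not None and WAITING_MARKER in last["message"]
```

For example, passing the lines of the excerpt (say, read with open(path).readlines()) to is_waiting_for_reboot returns True while no later entry has been logged for that node.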
The following errors appear in /var/log/upgrade.log on the storage node that is shown in gray in the grid topology:
Setting up pge-updater (10.4.0-20170324.1849.f3f1236) ...
/var/lib/dpkg/info/pge-updater.postinst:16:in `run_cmd': /sbin/pge_image_updater install failed: (RuntimeError)
lsblk: /dev/disk/by-label/pge-actv-root: not a block device
lsblk: /dev/disk/by-label/pge-actv-root: not a block device
from /var/lib/dpkg/info/pge-updater.postinst:58:in `<main>'
dpkg: error processing package pge-updater (--configure):
subprocess installed post-installation script returned error exit status 1
......
Errors were encountered while processing:
pge-updater
E: Sub-process /usr/bin/dpkg returned an error code (1)
Package install completed with exit code 100
Failed to install required base-os packages
Not cleaning up upgrade-stage
Rebooting...
WARNING: Ignoring /sbin/shutdown request while base-os upgrade flag exists.
Complete the base-os upgrade to restore /sbin/shutdown command behaviour
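The failure signature in this log can be summarized with a short script. The following is a minimal Python sketch under stated assumptions: the function name summarize_upgrade_log and the returned keys are hypothetical, and the patterns are taken only from the lines shown above, so other failure modes may not be detected.

```python
import re

# Patterns copied from the upgrade.log excerpt above.
PKG_ERROR_RE = re.compile(r"^dpkg: error processing package (\S+) \(--[\w-]+\):")
EXIT_CODE_RE = re.compile(r"^Package install completed with exit code (\d+)")
BASE_OS_FAILED = "Failed to install required base-os packages"


def summarize_upgrade_log(text):
    """Collect failed package names, the install exit code, and the base-os failure flag."""
    failed_packages = []
    exit_code = None
    base_os_failed = False
    for raw in text.splitlines():
        line = raw.strip()
        m = PKG_ERROR_RE.match(line)
        if m and m.group(1) not in failed_packages:
            failed_packages.append(m.group(1))
            continue
        m = EXIT_CODE_RE.match(line)
        if m:
            exit_code = int(m.group(1))
            continue
        if BASE_OS_FAILED in line:
            base_os_failed = True
    return {
        "failed_packages": failed_packages,
        "exit_code": exit_code,
        "base_os_failed": base_os_failed,
    }
```

Run against the contents of /var/log/upgrade.log on the affected storage node, this would report pge-updater as the failed package for the excerpt above, together with the install exit code.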