StorageGRID hotfix times out on primary admin node and storage nodes get stuck in queue
Applies to
- StorageGRID 11.8.0.6 and below
- StorageGRID 11.7.0.11 and below
Issue
- Applying a hotfix fails with a timeout error on the primary admin node, but completes successfully when it is reapplied.
- The remaining nodes stay stuck in the queue after the hotfix completes on the primary admin node.
- Removing and re-adding the nodes to apply the hotfix does not resolve the issue.
- An execution failure is logged in gdu-server.log:
Failed to execute sh command with 'run-host-command /var/local/dkr/0/tmp/hotfix-stage/hotfix/base-os-install --apply 2>&1 2>&1' on localhost - 127, bash: line 1: /var/local/dkr/0/tmp/hotfix-stage/hotfix/base-os-install: No such file or directory
[2024-07-23T14:59:15.467] ERROR -- Hotfix file execution failed:
Hotfix file execution failed:
E, [2024-07-23T14:59:15.473742 #7875] ERROR -- gdu-server: Hotfix file execution failed:
(Bycast::Tasks::GduServer::SoftwareUpdate::InstallError)
E, [2024-07-23T14:59:15.473791 #7875] ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:879:in `install'
E, [2024-07-23T14:59:15.473852 #7875] ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:798:in `apply_to_pa_node'
E, [2024-07-23T14:59:15.473863 #7875] ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:777:in `call'
E, [2024-07-23T14:59:15.473882 #7875] ERROR -- gdu-server: Failed to apply StorageGRID hotfix to samplean01: Hotfix file execution failed:
I, [2024-07-23T14:59:15.474082 #7875] INFO -- gdu-server: Software update task complete at 2024-07-23T14:59:15.474Z
E, [2024-07-23T14:59:15.474362 #7875] ERROR -- gdu-server: Failure occurred during Software Update
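
To check whether a given gdu-server.log shows this same signature, the excerpt above can be matched with a short script. This is a minimal sketch, not a NetApp tool: the function name scan_gdu_log and the result keys are hypothetical, and the patterns are taken only from the log lines shown in this article. The location of gdu-server.log on the node is not assumed here; the caller supplies the lines.

```python
import re
from typing import Dict, Iterable

# Patterns copied from the log excerpt in this article (not an official list).
MISSING_INSTALLER = re.compile(r"base-os-install.*No such file or directory")
EXEC_FAILED = re.compile(r"Hotfix file execution failed")
NODE_FAILED = re.compile(r"Failed to apply StorageGRID hotfix to (\S+):")


def scan_gdu_log(lines: Iterable[str]) -> Dict[str, object]:
    """Report which parts of the failure signature appear in the given log lines.

    Hypothetical helper for illustration: returns flags for the missing
    base-os-install file and the execution failure, plus the node names
    reported in "Failed to apply StorageGRID hotfix to <node>:" lines.
    """
    result: Dict[str, object] = {
        "missing_installer": False,
        "execution_failed": False,
        "failed_nodes": [],
    }
    for line in lines:
        if MISSING_INSTALLER.search(line):
            result["missing_installer"] = True
        if EXEC_FAILED.search(line):
            result["execution_failed"] = True
        match = NODE_FAILED.search(line)
        if match and match.group(1) not in result["failed_nodes"]:
            result["failed_nodes"].append(match.group(1))
    return result
```

To use it, open the log file from wherever it resides on the node and pass the file object, for example `with open(path) as f: print(scan_gdu_log(f))`, where `path` is supplied by the operator. If all three parts are reported, the log matches the failure described in this article.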
