StorageGRID hotfix upgrade stuck and unable to communicate with node due to unresponsive services: ssm
Applies to
- StorageGRID
Issue
When applying a hotfix, the process gets stuck on "Unresponsive services: ssm".
Node Alerts:
Hotfix update page reports:
gdu-server.latest.log:
INFO -- install: Executing command `/var/local/tmp/hotfix-stage/hotfix/post-hotfix-install --apply 2>&1` on localhost
ERROR -- install: Error processing hotfix
ERROR -- install: No such file or directory @ rb_sysopen - /var/local/tmp/hotfix-stage/install
ERROR -- install: /var/local/tmp/hotfix-stage/install:327:in `initialize'
ERROR -- install: /var/local/tmp/hotfix-stage/install:327:in `open'
ERROR -- install: /var/local/tmp/hotfix-stage/install:327:in `install'
ERROR -- install: /var/local/tmp/hotfix-stage/install:673:in `<main>'
No such file or directory @ rb_sysopen - /var/local/tmp/hotfix-stage/install
ERROR -- Hotfix file execution failed:
Hotfix file execution failed:
ERROR -- gdu-server: Hotfix file execution failed:
(Bycast::Tasks::GduServer::SoftwareUpdate::InstallError)
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:879:in `install'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:798:in `apply_to_pa_node'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:777:in `call'
ERROR -- gdu-server: Failed to apply StorageGRID hotfix to ul1sgan01: Hotfix file execution failed:
INFO -- gdu-server: Software update task complete at 2024-07-25T21:52:12.393Z
ERROR -- gdu-server: Failure occurred during Software Update
ERROR -- gdu-server: #<Bycast::Tasks::GduServer::SoftwareUpdate::LocalizedError: key: "software_update.error", options: {:type=>"StorageGRID hotfix", :reason=>"Failed to apply the update to one or more nodes"} >
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:477:in `finish'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:811:in `rescue in apply_to_pa_node'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:787:in `apply_to_pa_node'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/tasks/gdu-server/software-update.rb:777:in `call'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/gdu-server.rb:585:in `process_request'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/gdu-server.rb:293:in `block in serve'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/gdu-server.rb:292:in `each'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/gdu-server.rb:292:in `serve'
ERROR -- gdu-server: /usr/local/lib/site_ruby/bycast/gdu-server.rb:143:in `block (2 levels) in start'
WARN -- gdu-server: Unable to send response to the client because of a closed connection.
WARN -- gdu-server: Details: #<Errno::EPIPE: Broken pipe>
WARN -- gdu-server: [{"action":"Software Update","message":"","key":"software_update.error","options":{"type":"StorageGRID hotfix","reason":"Failed to apply the update to one or more nodes","developer_message":"Failure occurred during Software Update"},"status":1}]
INFO -- gdu-server: Executing command `/usr/sbin/execute-hotfix.py -src /var/local/hotfix/current-hotfix -destDir /var/local/tmp/hotfix-stage -cleanup -- --version 2>&1` on localhost
INFO -- gdu-server: Executing command `/usr/sbin/execute-hotfix.py -src /var/local/hotfix/current-hotfix -destDir /var/local/tmp/hotfix-stage -cleanup:
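To check whether a given log matches this signature, the two key messages in the excerpt above can be searched for programmatically. The following is a minimal sketch, not a NetApp tool: the function name `find_hotfix_failure` and the returned dictionary keys are hypothetical, and the patterns are taken only from the log lines shown in this article, so other StorageGRID versions may word these messages differently.

```python
import re

# Patterns copied from the log excerpt in this article (assumption: the
# wording is the same in the log being checked).
MISSING_FILE = re.compile(r"No such file or directory @ rb_sysopen - (\S+)")
FAILED_NODE = re.compile(r"Failed to apply StorageGRID hotfix to (\S+):")


def find_hotfix_failure(lines):
    """Collect missing file paths and failed node names from log lines.

    `lines` is any iterable of strings, for example an open file handle
    on a copy of gdu-server.latest.log.
    """
    missing_files = set()
    failed_nodes = set()
    for line in lines:
        match = MISSING_FILE.search(line)
        if match:
            missing_files.add(match.group(1))
        match = FAILED_NODE.search(line)
        if match:
            failed_nodes.add(match.group(1))
    # Sorted lists give stable, de-duplicated output.
    return {
        "missing_files": sorted(missing_files),
        "failed_nodes": sorted(failed_nodes),
    }
```

To use it, open a copy of the log with `open(path, errors="replace")` and pass the file handle to `find_hotfix_failure`. A non-empty `missing_files` list together with a non-empty `failed_nodes` list suggests the log shows the same failure described in this article.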