Upgrade to StorageGRID 11.7 stuck on a Storage appliance in "Preparing for upgrade" step
Applies to
NetApp StorageGRID 11.7
Issue
- During the upgrade to 11.7, a Storage Node goes into an unknown state and upgrade.log on the appliance reports the following error:
/var/local/log/upgrade.log:
[2023-06-09T12:15:41.748364 #33987] ERROR -- : Failed to open TCP connection to <Admin node>:9999 (Connection timed out - connect(2) for "<Primary Admin Node>" port 9999) (Errno::ETIMEDOUT)
- Network isolation events are reported by the remaining nodes when they attempt to communicate with the affected node.
dynip.log:
WARNING -- Possible network isolation: Node has no contact with other nodes. If this warning persists, use the /usr/sbin/add_node_ip.py command to tell this node the address of another node in the grid. See the Recovery and Maintenance Guide for details.
- The Storage Node cannot connect to any grid node on any grid port, and the IP address of the affected node is missing from the grid_ips element in the nft ruleset. To check, run the following command:
nft list ruleset
- Alternatively, run the following command from the Primary Admin Node to check whether any other node is experiencing the issue:
run-each-node "wc -l /etc/ssh/ssh_known_hosts"