Cloud Volumes ONTAP node became unreachable by its partner during upgrade to ONTAP 9.12.1
Applies to
- ONTAP 9.12.1
- CVO (Cloud Volumes ONTAP)
- AWS (Amazon Web Services)
- ANDU (Automated Non-Disruptive Upgrade)
- BlueXP
Issue
- ONTAP upgrade is started in BlueXP for an HA pair in AWS
- Node 2 upgrades normally and then node 1 upgrade is started
- Some time later, the BlueXP timeline states that the ONTAP upgrade failed and that node 1 is unhealthy
- Console logs show that node 1 is booting to the newly upgraded ONTAP version, but the cluster version is not upgraded yet
- On the node 1 CLI, the node is either waiting for giveback or online due to a partial (root only) giveback
- If the node is not waiting for giveback:
  - The node can be reached using its management IP address
  - There is a message that the upgrade failed and to run system node upgrade-revert upgrade, but doing this does not help
  - storage failover show states there is a partial giveback and that node 1 is waiting for local cluster applications to come online
  - cluster ring show states all of node 1's RDB apps are offline
  - cluster ping-cluster fails with no paths available to itself or the partner
  - halt and other commands fail with RPC timeout
- If the node is waiting for giveback:
  - The node isn't reachable using its management IP address
  - When connected to node 2, storage failover show states there is an active takeover; however, giveback can't be performed because the partner is unreachable
- The instance type of both nodes is supported
- When AWS support is engaged, they report that no issue is found on the backend
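
The two symptom branches above can be summarized as a small triage helper. This is only an illustrative sketch: the class, field names, and return strings are hypothetical and not part of ONTAP or BlueXP, and the values are assumed to be filled in by hand from what the operator observes on the console and in the command output named in this article.

```python
from dataclasses import dataclass


@dataclass
class Node1Observation:
    """Hypothetical record of operator observations; all field names are illustrative."""
    mgmt_ip_reachable: bool      # node 1 answers on its management IP address
    waiting_for_giveback: bool   # node 1 console shows it is waiting for giveback
    partial_giveback: bool       # storage failover show reports a partial giveback
    rdb_apps_offline: bool       # cluster ring show lists node 1's RDB apps as offline


def classify(obs: Node1Observation) -> str:
    """Map observations onto the two symptom branches described in this article."""
    # Branch 1: node 1 is held in takeover and cannot be reached directly.
    if obs.waiting_for_giveback and not obs.mgmt_ip_reachable:
        return "waiting-for-giveback"
    # Branch 2: node 1 is reachable after a partial (root only) giveback,
    # but its cluster applications never come online.
    if obs.mgmt_ip_reachable and obs.partial_giveback and obs.rdb_apps_offline:
        return "partial-giveback"
    # Anything else does not match the symptoms listed here.
    return "no-match"
```

For example, a node that is unreachable on its management IP and waiting for giveback falls into the first branch, where the remaining checks are made from node 2.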