Cloud Volumes ONTAP node became unreachable by its partner during upgrade to ONTAP 9.12.1
Applies to
- ONTAP 9.12.1
- CVO (Cloud Volumes ONTAP)
- AWS (Amazon Web Services)
- ANDU (Automated Non-Disruptive Upgrade)
- BlueXP
Issue
- ONTAP upgrade is started in BlueXP for an HA pair in AWS
- Node 2 upgrades normally and then node 1 upgrade is started
- Some time later, the BlueXP timeline states that the ONTAP upgrade failed and that node 1 is unhealthy
- Console logs show that node 1 is booting to the newly upgraded ONTAP version, but the cluster version is not upgraded yet
- On the node 1 CLI, the node is either waiting for giveback or online due to a partial (root only) giveback
- If the node is not waiting for giveback:
  - The node can be reached using its management IP address
  - A message states that the upgrade failed and to run `system node upgrade-revert upgrade`, but running it does not help
  - `storage failover show` states there is a partial giveback and that node 1 is waiting for local cluster applications to come online
  - `cluster ring show` states all of node 1's RDB apps are offline
  - `cluster ping-cluster` fails with no paths available to itself or the partner
  - `halt` and other commands fail with RPC timeout
- If the node is waiting for giveback:
  - The node isn't reachable using its management IP address
  - Connecting to node 2, `storage failover show` states there is an active takeover; however, giveback can't be performed because the partner is unreachable
- The instance type of both nodes is supported
- AWS support is engaged and reports that no issue is found on the backend
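
The checks named in the symptoms above can be run from whichever node's CLI is reachable, roughly as sketched below. This is an illustrative sequence, not output from an affected system. It assumes that `cluster ring show` and `cluster ping-cluster` need advanced privilege and that `-node local` targets the node you are logged in to. No output is shown because it varies by system.

```
storage failover show
set -privilege advanced
cluster ring show
cluster ping-cluster -node local
set -privilege admin
```

On node 1 in the partial giveback state, these are the commands that show the offline RDB apps and the missing cluster paths. On node 2, `storage failover show` is the command that reports the active takeover.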
