NetApp Knowledgebase

FAQ: Troubleshooting clustered data ONTAP upgrades

 

Applies to

Data ONTAP

Answer

This article lists the most common ONTAP upgrade operational and troubleshooting workflows. It is not a comprehensive list.
Use it to narrow your search to the more commonly used troubleshooting KBs, grouped by category.

Overview  

A Data ONTAP upgrade typically consists of a multi-step process. Every node in an ONTAP cluster that is upgraded must go through the following steps:

  • Download of the ONTAP software package file to the cluster from an accessible web server.
  • Installation of the ONTAP software package file as the secondary boot image on the storage controller's boot media.
  • Setting the newly installed boot image as the primary boot image.
  • Rebooting (via storage failover takeover process) the storage controller to load the newly installed primary boot image.
  • Automatic execution of the ONTAP upgrade tasks after the controller has completed its reboot.
  • Once the ONTAP upgrade tasks are completed, the storage controller is effectively upgraded.  
The above process is repeated on all storage controllers that are members of the cluster. Only after all controllers are upgraded is the entire cluster considered upgraded.

To resolve failures that occur during the upgrade process, it is crucial to identify which step in the process described above has failed.
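
As an illustration only, the per-node steps above can be modelled as an ordered list so that the first incomplete step is easy to name. This is a minimal sketch: the step names and the functions first_incomplete_step and cluster_upgraded are hypothetical and are not part of ONTAP or any NetApp tool.

```python
from enum import IntEnum
from typing import Dict, Optional, Set


class UpgradeStep(IntEnum):
    """Per-node upgrade steps, in the order listed in the Overview (names are illustrative)."""
    DOWNLOAD_PACKAGE = 1
    INSTALL_SECONDARY_IMAGE = 2
    SET_PRIMARY_IMAGE = 3
    REBOOT_VIA_TAKEOVER = 4
    RUN_UPGRADE_TASKS = 5
    NODE_UPGRADED = 6


def first_incomplete_step(completed: Set[UpgradeStep]) -> Optional[UpgradeStep]:
    """Return the earliest step that is not in `completed`, or None if all steps are done."""
    for step in UpgradeStep:
        if step not in completed:
            return step
    return None


def cluster_upgraded(per_node: Dict[str, Set[UpgradeStep]]) -> bool:
    """The cluster counts as upgraded only when every node has finished every step."""
    return all(first_incomplete_step(done) is None for done in per_node.values())
```

For a node that has downloaded and installed the package but gone no further, first_incomplete_step returns UpgradeStep.SET_PRIMARY_IMAGE, which points to the section of this article to consult.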
Methods of Performing ONTAP Upgrades
With Clustered Data ONTAP 8.3 and newer, there are two methods of performing ONTAP upgrades non-disruptively:
  • Manual Upgrade:  Sometimes referred to as a non-disruptive upgrade, or NDU, the manual upgrade process involves many steps run on each storage controller individually.  These steps are performed by the storage administrator.  Depending on the number of storage controllers configured in the cluster and the desired version of ONTAP that is being installed, the steps can be performed individually on each storage controller in a "rolling upgrade" manner or in parallel using the "batch upgrade" process.
  • Automated Upgrade: Referred to as an automated non-disruptive upgrade, or ANDU, the automated process greatly reduces the number of steps the administrator must perform for a cluster-wide ONTAP upgrade.  The automated upgrade system performs all the manual steps required for a cluster-wide ONTAP upgrade.  Depending on the number of storage controllers configured in the cluster and the desired version of ONTAP that is being installed, the automated upgrade process will perform the upgrade in a "rolling upgrade" manner or a "batch upgrade" manner.
The storage administrator must select which method to use before beginning the upgrade.
Documentation
It is highly recommended to review the ONTAP "Upgrade and Revert/Downgrade Guide", which fully documents the ONTAP upgrade process, as well as the "Release Notes", which document the changes in ONTAP for each version. Every ONTAP version has an "Upgrade and Revert/Downgrade Guide" and "Release Notes" that document any version-specific information.

The Data ONTAP 8 documentation can be found here: Data ONTAP 8 Product Documentation
The ONTAP 9 documentation can be found here: ONTAP 9 Product Documentation

Select the appropriate software version on those pages. The following page will contain links to the "Upgrade and Revert/Downgrade Guide" and "Release Notes" for that version.
Upgrade Advisor Action Plan
It is also recommended to generate an Upgrade Advisor action plan.  Upgrade Advisor action plans are custom generated for a given cluster.  Please visit the NetApp Active IQ website to generate these action plans.

For assistance on troubleshooting issues with generating an Upgrade Advisor action plan, please review the following KB:

Active IQ - Upgrade Advisor fails to generate
Troubleshooting Upgrade Problems
Complete Wipe And Re-initialization Of The Cluster
There are some limited cases where the storage administrator intends to completely wipe the ONTAP software, destroy all user data on all volumes and install a different version of ONTAP on the storage controller.  Instead of leveraging the ONTAP upgrade process to accomplish this, the ONTAP software can be installed, the disks attached to the storage controller can completely be wiped of data and re-initialized using the ONTAP special boot menu.  Please note that this procedure is disruptive and will wipe the storage controller of ALL data.

Refer to this KB article:
How to perform a software installation from the Data ONTAP boot menu


After the software installation is complete, wipe the storage controller of all data using this KB article:
How to wipe the configuration of a clustered Data ONTAP 8.x node and re-initialize it

Problems with Downloading ONTAP Software Package File To The Cluster From An Accessible Web Server
ONTAP uses each storage controller's node management logical interface (LIF) to connect to a reachable web server and download the ONTAP software package file.  If the "system image get", "system image update -package" or "cluster image package get" commands fail, this may indicate a problem in one of the following areas:
 
1) Looking up the IP address for the web server in DNS:  Verify that DNS servers able to resolve the IP address of the web server are configured for the admin SVM:
cluster1::> dns show -vserver
 

Test to see if the web server hostname is resolvable:
 
cluster1::> set advanced
cluster1::*> vserver services name-service getxxbyyy gethostbyname -node -vserver -hostname
 
 
2) Unable to connect to the web server:  Use the ping utility to ensure that the web server is accessible from the node management LIF
Workaround: Use the "system image get" with IPv6 or use "cluster image package get" with IPv4.
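
The two checks above can be roughly approximated from an administrator workstation before retrying the download. The sketch below uses only the Python standard library; package_host and resolve are hypothetical helper names, and the example URL is a placeholder. A lookup from a workstation does not prove that the node management LIF can resolve or reach the same server, so treat it as a first sanity check only.

```python
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def package_host(url: str) -> str:
    """Extract the web server hostname from a package URL (hypothetical helper)."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def resolve(host: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted addresses `host` resolves to, or an empty list if the lookup fails.

    `lookup` is injectable so the function can be exercised without network access.
    """
    try:
        infos = lookup(host, None)
    except socket.gaierror:
        return []
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

An empty result from resolve(package_host("http://webserver.example.com/image.tgz")) suggests a name resolution problem (check 1) rather than a connectivity problem (check 2).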
Cluster image package get fails
If downloading the ONTAP package file with the "cluster image package get" command fails, try the "system image get" command (manual upgrade method) to see whether the package can be downloaded that way.  If it can, this may indicate a failure in the ONTAP subsystem that manages the automated upgrade method.

To continue with the automated nondisruptive upgrade (ANDU), use the "cluster image update -version x.x" command. This requires the image to be saved in the cluster image repository.
To do this, run the following to move the image from the etc/software directory to the repository.

Download the system image to the cluster repository, for example:
::*> cluster image package get -url file:///mroot/etc/software/93P7_q_image.tgz

Check to ensure cluster image repository now shows the ONTAP 9.3P7 image:
::*> cluster image package show-repository

Check if each node has the image installed:
::>system node package show

If some nodes are missing the image, log in directly to the management interface of the affected node to download the cluster image.
For instance, log in to node02's management LIF:
::> set advanced
::*> cluster image package get -url file:///mroot/etc/software/93P7_q_image.tgz

Continue with the automated cluster upgrade using "cluster image update -version x.x".

While using the manual upgrade method can serve as a workaround for upgrading the cluster, it is recommended to contact NetApp Technical Support for further assistance with troubleshooting the failure with using the automated update method.
Troubleshooting Validation Warning Messages From "cluster image validate" Command
The "cluster image validate" (automated upgrade method) command performs a series of cluster-wide checks to ensure that the cluster can be upgraded non-disruptively.  Any errors or warnings that the validation operation reports will prevent the automated upgrade from beginning.  These must be resolved before continuing the upgrade.  Please refer to the "Error-Action" field in the "cluster image validate" output to identify the corrective action needed to resolve the errors or warnings.  The following command can be run once the storage administrator has determined that any remaining errors or warnings can be safely ignored:
cluster1::> cluster image update -ignore-validation-warning true

Errors:

Error: "Ensure the nodes being updated are running same version of Data ONTAP."
Description: Seen during upgrade from 9.3 to 9.x in an MC config
Solution: Bug 1142709

Troubleshooting Default Boot Image Setting
The ONTAP operating system is installed on the boot media device of the storage controller.  The default boot media device can store up to two ONTAP software images, one as the primary (default) boot image and the other as the secondary boot image.  Typically, when a system boots the default boot image, that image is the active (current) boot image in use.  The "system image show" command lists the information for each boot image, including whether it is the default and the current boot image.
cluster1::> system image show
                 Is      Is                                Install
Node     Image   Default Current Version                   Date
-------- ------- ------- ------- ------------------------- -------------------
cluster1-01
         image1  false   false   9.1P4                     8/12/2017 09:11:43
         image2  true    true    9.1P7                     8/31/2017 14:34:30
cluster1-02
         image1  false   false   9.1P4                     8/12/2017 09:15:21
         image2  true    true    9.1P7                     8/31/2017 14:34:52
4 entries were displayed.
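
Output in the format shown above can be checked programmatically. The following is a minimal sketch that assumes the column layout of this example (image name, Is Default, Is Current, Version); BootImage, parse_system_image_show and nodes_pending_reboot are hypothetical names and not part of any NetApp tooling, and other ONTAP releases may format the output differently.

```python
import re
from typing import List, NamedTuple, Optional


class BootImage(NamedTuple):
    node: str
    image: str
    is_default: bool
    is_current: bool
    version: str


# An image row is indented and starts with: image name, Is Default, Is Current, Version.
_ROW = re.compile(r"^\s+(image\d+)\s+(true|false)\s+(true|false)\s+(\S+)")
# A node name sits alone on an unindented line.
_NODE = re.compile(r"^(\S+)\s*$")


def parse_system_image_show(text: str) -> List[BootImage]:
    """Parse pasted "system image show" output into one record per boot image."""
    images: List[BootImage] = []
    node: Optional[str] = None
    for line in text.splitlines():
        row = _ROW.match(line)
        if row and node is not None:
            image, is_default, is_current, version = row.groups()
            images.append(BootImage(node, image, is_default == "true",
                                    is_current == "true", version))
            continue
        name = _NODE.match(line)
        # Skip a line made only of dashes (table separator).
        if name and not set(name.group(1)) <= {"-"}:
            node = name.group(1)
    return images


def nodes_pending_reboot(images: List[BootImage]) -> List[str]:
    """Nodes whose default image is not the currently running image."""
    pending: List[str] = []
    for img in images:
        if img.is_default and not img.is_current and img.node not in pending:
            pending.append(img.node)
    return pending
```

In the example output above, every default image is also the current image, so nodes_pending_reboot would return an empty list. A node appears in the list when the new image has been marked as default but the controller has not yet booted it.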

During an upgrade, the ONTAP software package is installed to the non-active boot image, which is then marked as the default boot image.  However, this only takes effect after a clean shutdown of the ONTAP operating system during storage failover takeover of the storage controller.  The "Setting default boot image to" message should appear on the console of the storage controller that is being upgraded just prior to ONTAP shutdown.  Here is an example of the messages seen:
Waiting for PIDS:  1244.
Terminated
.
Setting default boot image to image2... done.
Uptime: 7d2h51m23s

If the "Setting default boot image to" message never appears, this may indicate that ONTAP was not able to shut down cleanly.  The subsequent reboot will not load the image that was set as the default image, and the storage controller will not undergo the upgrade.  If this occurs, please contact NetApp Technical Support for further assistance in determining why the storage controller was not able to shut down cleanly.
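
If the console output was captured to a file, the presence of this message can be checked with a short script. This is a sketch only: it assumes the message has exactly the form shown in the example above, and default_image_from_console is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches e.g. "Setting default boot image to image2... done."
_SET_DEFAULT = re.compile(r"Setting default boot image to (\S+?)\.\.\.\s*done\.")


def default_image_from_console(console_text: str) -> Optional[str]:
    """Return the image named in the shutdown message, or None if the message is absent."""
    match = _SET_DEFAULT.search(console_text)
    return match.group(1) if match else None
```

A return value of None for the console log of a controller that was taken over during the upgrade corresponds to the unclean shutdown case described above.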
Resuming Automated Upgrades That Are Paused Due To Error
The automated update process will pause if it encounters an error.  For example, if storage giveback fails for a storage controller for a reason such as a giveback veto, the automated update process will show "pause-on-error".  The storage administrator must correct the error condition in order to continue the upgrade.  Use the "cluster image show-update-progress" command to identify why the automated update process was paused.  The "Comments" field will identify why the automated update process was paused and possibly suggest corrective actions to take.  Once the corrective action has been taken, the automated update process can be resumed by using the "cluster image resume-update" command.
Troubleshooting ONTAP Upgrade Task Failures
After a storage controller completes a reboot during an ONTAP upgrade, the system begins upgrading the controller's software configuration so that new software features can be made available once the entire cluster is completely upgraded.  These tasks run automatically in the background.  When logging into the storage controller after reboot, you may see a SYSTEM MESSAGE like this, which indicates the controller is running these background tasks:
The upgrade of this node is in progress or not completed. The ability to provide
data service to clients is not affected while the upgrade completes. You can
check on the status of the upgrade by running "system node upgrade-revert show"
in advanced privilege mode. The status for this node should be listed as
'complete'. If the upgrade has stopped, you can restart the upgrade by running
"system node upgrade-revert upgrade" in advanced privilege mode. If this command
does not complete the node's upgrade, contact technical support immediately. The
node will be ready for management operations once the upgrade is completed
successfully.

If these upgrade tasks were interrupted or encounter errors, the following SYSTEM MESSAGE may be observed:
The upgrade is not complete: an upgrade task aborted. This node is not fully
operational. Contact support personnel for the upgrade repair procedure.


or 

One or more upgrade tasks on this node failed. This node is not fully
operational. Contact support personnel for the upgrade repair procedure.

The status of these upgrade tasks can be found by running the following advanced privilege level commands:
cluster1::> set advanced
cluster1::*> system node upgrade-revert show
cluster1::*> system node upgrade-revert show -task-status

If there are any failed or aborted upgrade tasks, the following command may be used to restart or re-run those task(s):
cluster1::*> system node upgrade-revert upgrade -node

If the upgrade task continues to fail, then please contact NetApp Technical Support for further assistance.
Troubleshooting Mixed Version Message That Appears During Upgrade
When upgrading ONTAP on cluster configurations with more than two storage controllers, there is a period in which some storage controllers have completed the upgrade and others have yet to be upgraded.  During this period, the cluster is considered to be in a mixed version state.  When logging into the cluster, you may see the following SYSTEM MESSAGE displayed:
Warning: The cluster is in a mixed version state. Update all of the nodes to
the same version as soon as possible.

When the cluster is in a mixed version state, the cluster continues to operate and behave as the old version installed without the new features of the newer ONTAP version.  Only once all storage controllers have successfully upgraded to the new version is the entire cluster considered upgraded and new features are available for use.
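
The mixed version condition can be illustrated with a small check over per-node version strings. This is a sketch under stated assumptions: it only understands simple version strings such as "9.1P7" or "9.3" (optionally with a third numeric component), the function names are hypothetical, and using the lowest node version as a stand-in for "the old version" the cluster behaves as is an assumption made for illustration.

```python
import re
from typing import Dict, Tuple

# Assumed format: major.minor, optional third component, optional patch suffix ("9.1P7").
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:P(\d+))?$")


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a version string into a sortable tuple; missing parts count as 0."""
    match = _VERSION.match(version)
    if not match:
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_mixed_version(node_versions: Dict[str, str]) -> bool:
    """True when the nodes do not all report the same version."""
    return len({version_key(v) for v in node_versions.values()}) > 1


def lowest_version(node_versions: Dict[str, str]) -> str:
    """Lowest version among the nodes (assumed here to be the 'old' version)."""
    return min(node_versions.values(), key=version_key)
```

For example, with cluster1-01 at 9.1P7 and cluster1-02 still at 9.1P4, is_mixed_version returns True and lowest_version returns "9.1P4".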

The version of ONTAP software is tracked in three ways:
 
1) The version of the ONTAP software that is booted on the storage controller.  This can be checked with the following command:
cluster1::> node run -node * -command version

2) The effective version of ONTAP that the node configuration has been upgraded to.  This can be checked with the following command:
cluster1::> version -node *

3) The effective version of ONTAP that the cluster configuration has been upgraded to. This can be checked with the following command:
cluster1::> version


ONTAP is designed to remain operational and continue serving data during a mixed version state. However, it is not recommended to remain in a mixed version state for longer than the time it takes to upgrade the entire cluster.  Making any configuration changes to the cluster while it is in a mixed version state is also highly discouraged.

The cluster can also enter a mixed version state when a storage controller with a newer version of ONTAP is joined to a cluster that is running an older version.  If this occurs, upgrade the rest of the cluster to the newer ONTAP version.
