NetApp Knowledge Base

Troubleshooting ONTAP upgrades

Category:
ontap-9
Specialty:
core

Applies to

  • ONTAP 9
  • Clustered Data ONTAP 8

Answer

  • This article lists common ONTAP upgrade operational and troubleshooting workflows. It is not a comprehensive list.
  • Use it to narrow your search to the more commonly used troubleshooting KBs, grouped by category.
Troubleshooting upgrade problems
Complete wipe and re-initialization of cluster
  • There are limited cases where the storage administrator intends to completely wipe the ONTAP software, destroy all user data on all volumes and install a different version of ONTAP on the storage controller.
  • Instead of using the ONTAP upgrade process to accomplish this, the ONTAP software can be installed, and the disks attached to the storage controller can be completely wiped and re-initialized, using the ONTAP special boot menu.

This procedure is disruptive and will wipe the storage controller of all data.

For more information, see How to perform a software installation from the Data ONTAP boot menu

Problems with downloading ONTAP Software Package File to the cluster from accessible web server
  • ONTAP uses each storage controller's node management logical interface (LIF) to connect to a reachable web server and download the ONTAP software package file. Problems with the 'system image get', 'system image update -package' or 'cluster image package get' commands may indicate issues with the following:
  • Looking up the IP address for the web server in DNS:
    • Verify that DNS servers able to resolve the web server's hostname are configured for the admin SVM: cluster1::> dns show -vserver
    • Test to see if the web server hostname is resolvable:
cluster1::> set advanced
cluster1::*> vserver services name-service getxxbyyy gethostbyname -node -vserver -hostname
  • Unable to connect to the web server:
    • Use the ping utility to ensure that the web server is accessible from the node management LIF.
    • Workaround: Run the 'system image get' command with IPv6 or run the 'cluster image package get' command with IPv4.
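The DNS and connectivity checks above can also be approximated from an administrator workstation before retrying the download. The following is a minimal Python sketch, not a NetApp tool: the function names are hypothetical, and it runs from wherever the script is executed rather than from the node management LIF, so a successful result here does not prove that the storage controller itself can reach the web server. It reports which address families (IPv4, IPv6) the web server hostname resolves to, which may help when choosing between the IPv4 and IPv6 workaround.

```python
import socket
from urllib.parse import urlsplit


def split_package_url(url):
    """Return (hostname, port) for an http/https package URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL with a host: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def resolved_families(hostname, port):
    """Return the set of address families ('IPv4', 'IPv6') the name resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return {names[info[0]] for info in infos if info[0] in names}


def can_connect(hostname, port, timeout=5.0):
    """Return True if a TCP connection to hostname:port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A typical use would be to call split_package_url() on the package URL, pass the result to resolved_families() to see whether the name resolves at all, and then call can_connect() to confirm that the web server port is reachable.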
Cluster image package get fails
  • In cases where running the 'cluster image package get' command to download the ONTAP package file fails, try running the 'system image get' (manual upgrade method) command to see if the package can be downloaded via the manual method.
  • If so, this may indicate a failure with the ONTAP subsystem that manages the automated upgrade method.
  • To continue with the Automated Nondisruptive Upgrade (ANDU) using the 'cluster image update -version x.x' command, the downloaded image must first be saved to the cluster image repository.
  • To do this, run the following steps to move the image from the /mroot/etc/software directory to the repository:
1. Download the system image to the cluster repository
Example: ::*> cluster image package get -url file:///mroot/etc/software/93P7_q_image.tgz
2. Check to ensure cluster image repository now shows the ONTAP 9.3P7 image
::*> cluster image package show-repository
3. Check if each node has the image installed
::> system node package show
4. If some nodes are missing the image, log in directly to the node management LIF of each of those nodes and download the cluster image there. For example, log in to node02's management LIF:
::> set advanced
::*> cluster image package get -url file:///mroot/etc/software/93P7_q_image.tgz
5. Continue with the automated cluster upgrade: 'cluster image update -version x.x'
  • While using the manual upgrade method can serve as a workaround for upgrading the cluster, it is recommended to contact NetApp Technical Support for further assistance with troubleshooting the failure with using the automated update method.
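To keep the order of the steps above straight, the command sequence can be generated from the package file name. This is only an illustrative Python sketch: the helper name is hypothetical, the commands are copied verbatim from the steps above, and the package file name and version must be replaced with the values that apply to the cluster being upgraded.

```python
def repository_workaround_commands(package_file, version):
    """Build the CLI commands from the steps above, in order.

    package_file is the file name under /mroot/etc/software and version is
    the target passed to 'cluster image update' (both supplied by the caller).
    """
    url = f"file:///mroot/etc/software/{package_file}"
    return [
        f"cluster image package get -url {url}",      # step 1 (repeat per node in step 4)
        "cluster image package show-repository",      # step 2
        "system node package show",                   # step 3
        f"cluster image update -version {version}",   # step 5
    ]


if __name__ == "__main__":
    for command in repository_workaround_commands("93P7_q_image.tgz", "x.x"):
        print(command)
```

The sketch only prints the commands; they still need to be run at the advanced privilege level on the cluster, as shown in the steps.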
Troubleshooting validation warning messages from the 'cluster image validate' Command
  • The 'cluster image validate' (automated upgrade method) command performs a series of cluster-wide checks to ensure that the cluster can be upgraded non-disruptively.
  • Any errors or warnings that the validation operation reports will prevent the automated upgrade from beginning. These must be resolved before continuing the upgrade.
  • Refer to the 'Error-Action' field in the 'cluster image validate' output to identify the corrective action to take to resolve the errors or warnings.
  • The following command can be run once the storage administrator has determined that any remaining errors or warnings can be safely ignored:
cluster1::> cluster image update -ignore-validation-warning true

Errors:

Error: Ensure the nodes being updated are running same version of Data ONTAP
Description: Seen during upgrade from 9.3 to 9.x in an MC config
Solution: Bug 1142709

Troubleshooting default boot image setting
  • The ONTAP operating system is installed on the boot media device of the storage controller.
  • The default boot media device can store up to two ONTAP software images, one as the primary (default) boot image and the other as the secondary boot image. Typically a system boots the default boot image, which then becomes the active (current) boot image in use.
  • The 'system image show' command lists the information for each boot image and shows whether it is the default boot image and whether it is the current boot image.
cluster1::> system image show
                 Is      Is                                Install
Node     Image   Default Current Version                   Date
-------- ------- ------- ------- ------------------------- -------------------
cluster1-01
         image1  false   false   9.1P4                     8/12/2017 09:11:43
         image2  true    true    9.1P7                     8/31/2017 14:34:30
cluster1-02
         image1  false   false   9.1P4                     8/12/2017 09:15:21
         image2  true    true    9.1P7                     8/31/2017 14:34:52
4 entries were displayed.
  • During an upgrade, the ONTAP software package is installed to the non-active boot image, which is then marked as the default boot image. However, this only takes effect after a clean shutdown of the ONTAP operating system during storage failover takeover of the storage controller.
  • The 'Setting default boot image to' message should appear on the console of the storage controller that is being upgraded just prior to ONTAP shutdown.
Example of the messages seen:
Waiting for PIDS:  1244.
Terminated
.
Setting default boot image to image2... done.
Uptime: 7d2h51m23s
  • If the 'Setting default boot image to' message never appears, this may indicate that ONTAP was not able to cleanly shut down. The subsequent reboot will not load the image that was set as the default image and the storage controller will not undergo the upgrade.
  • If this occurs, contact NetApp Technical Support for further assistance to determine why the storage controller was not able to cleanly shut down.
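The 'system image show' output shown earlier in this section can be scanned for nodes whose default boot image is not the image currently running. The following Python sketch assumes the column layout of that example (node name alone on an unindented line, followed by indented image rows); the function names are hypothetical and other ONTAP releases may format the output differently. Note that a mismatch is expected between installing the new image and rebooting, so it is only a concern after the node was supposed to have rebooted into the new image.

```python
SAMPLE = """\
                 Is      Is                                Install
Node     Image   Default Current Version                   Date
-------- ------- ------- ------- ------------------------- -------------------
cluster1-01
         image1  false   false   9.1P4                     8/12/2017 09:11:43
         image2  true    true    9.1P7                     8/31/2017 14:34:30
cluster1-02
         image1  false   false   9.1P4                     8/12/2017 09:15:21
         image2  true    true    9.1P7                     8/31/2017 14:34:52
4 entries were displayed.
"""


def parse_system_image_show(text):
    """Parse 'system image show' output into a list of per-image dicts."""
    images = []
    node = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if not line[0].isspace():
            # A node name sits alone on an unindented line; header, separator
            # and summary lines have several tokens and are skipped.
            if len(fields) == 1:
                node = fields[0]
            continue
        flags = ("true", "false")
        if node and len(fields) >= 4 and fields[1] in flags and fields[2] in flags:
            images.append({
                "node": node,
                "image": fields[0],
                "is_default": fields[1] == "true",
                "is_current": fields[2] == "true",
                "version": fields[3],
            })
    return images


def default_not_current(images):
    """Return the sorted names of nodes whose default image is not the current one."""
    return sorted({i["node"] for i in images if i["is_default"] != i["is_current"]})
```

With the example output, default_not_current(parse_system_image_show(SAMPLE)) returns an empty list, because image2 is both default and current on both nodes.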
Resuming automated upgrades that are paused due to error
  • The automated update process pauses if it encounters an error. For example, if storage giveback fails for a storage controller, such as because of a giveback veto, the automated update process shows 'pause-on-error'.
  • The storage administrator must correct the error condition in order to continue the upgrade. Run the 'cluster image show-update-progress' command to identify why the automated update process was paused.
  • The 'Comments' field will identify why the automated update process was paused and possibly suggest corrective actions to take.
  • Once the corrective action has been taken, the automated update process can be resumed by running the 'cluster image resume-update' command.

Notes:

  • There are multiple reasons an upgrade can be in a 'pause-on-error' state. Make sure to log into the node currently being upgraded via its console or SP/BMC to confirm its status.

  • If the node is at the LOADER prompt, a boot_ontap command can be run to attempt to bring the node online. If the node will not boot or is in a boot/panic loop, contact NetApp Technical Support for further assistance.

Troubleshooting ONTAP upgrade task failures
  • After a storage controller completes a reboot during an ONTAP upgrade, the system begins upgrading the controller's software configuration so that new software features can be made available once the entire cluster is completely upgraded. These tasks automatically run in the background.
When logging in to the storage controller after the reboot, you may see the following SYSTEM MESSAGE, which indicates that the controller is running these background tasks:
The upgrade of this node is in progress or not completed. The ability to provide
data service to clients is not affected while the upgrade completes. You can
check on the status of the upgrade by running "system node upgrade-revert show"
in advanced privilege mode. The status for this node should be listed as
'complete'. If the upgrade has stopped, you can restart the upgrade by running
"system node upgrade-revert upgrade" in advanced privilege mode. If this command
does not complete the node's upgrade, contact technical support immediately. The
node will be ready for management operations once the upgrade is completed
successfully.
If these upgrade tasks are interrupted or encounter errors, one of the following SYSTEM MESSAGEs may be observed:
The upgrade is not complete: an upgrade task aborted. This node is not fully
operational. Contact support personnel for the upgrade repair procedure.

or 

One or more upgrade tasks on this node failed. This node is not fully
operational. Contact support personnel for the upgrade repair procedure.
To find the status of these upgrade tasks, run the following advanced privilege level commands:
cluster1::> set advanced
cluster1::*> system node upgrade-revert show
cluster1::*> system node upgrade-revert show -task-status
If there are any failed or aborted upgrade tasks, the following command can be used to restart or re-run those tasks:
cluster1::*> system node upgrade-revert upgrade -node
Troubleshooting mixed version message that appears during upgrade
  • When upgrading ONTAP on a cluster with more than two storage controllers, there is a period during which some storage controllers have completed the upgrade and others have not yet been upgraded. During this period, the cluster is considered to be in a mixed version state.
When logging in to the cluster, you may see the following SYSTEM MESSAGE:
Warning: The cluster is in a mixed version state. Update all of the nodes to
the same version as soon as possible.
  • While the cluster is in a mixed version state, it continues to operate and behave as the older installed version, without the features of the newer ONTAP version. Only after all storage controllers have successfully upgraded to the new version is the entire cluster considered upgraded and the new features available for use.

The version of ONTAP software is tracked in three ways:

  • The version of ONTAP software that is booted on the storage controller. This can be checked with the command: cluster1::> node run -node * -command version
  • The effective version of ONTAP that the node configuration has been upgraded to. This can be checked with the command: cluster1::> version -node *
  • The effective version of ONTAP that the cluster configuration has been upgraded to. This can be checked with the command: cluster1::> version
  • ONTAP is designed to remain operational and continue serving data during a mixed version state. However, it is not recommended to remain in a mixed version state for longer than the time it takes to upgrade the entire cluster. Making configuration changes to the cluster while it is in a mixed version state is also highly discouraged.
  • The cluster can also enter a mixed version state when a storage controller with a newer version of ONTAP is joined to a cluster that is running an older version. If this occurs, upgrade the rest of the cluster to the newer ONTAP version.
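Once the per-node versions have been collected with the commands above, a mixed version state can be spotted by comparing them. The Python sketch below is illustrative only: the function names, node names and version strings are hypothetical, and it compares the version strings exactly as they were collected, without interpreting ONTAP version semantics.

```python
def is_mixed_version(node_versions):
    """Return True if the nodes do not all report the same version string."""
    return len(set(node_versions.values())) > 1


def nodes_by_version(node_versions):
    """Group node names by the version string each node reports."""
    groups = {}
    for node, version in sorted(node_versions.items()):
        groups.setdefault(version, []).append(node)
    return groups


if __name__ == "__main__":
    # Hypothetical data, entered by hand from the per-node version output.
    versions = {
        "cluster1-01": "9.1P7",
        "cluster1-02": "9.1P7",
        "cluster1-03": "9.1P4",
        "cluster1-04": "9.1P4",
    }
    if is_mixed_version(versions):
        for version, nodes in nodes_by_version(versions).items():
            print(version, ", ".join(nodes))
```

Grouping the nodes by version makes it easy to see which storage controllers still need to be upgraded.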

Additional Information

N/A