
How to perform graceful shutdown and power up of all ONTAP nodes in a cluster?

Applies to

  • ONTAP 9
  • AFF models
  • FAS models
  • Excluding MetroCluster

Description

Introduction

Several events might require a graceful shutdown of ONTAP equipment, such as:

  • Scheduled site power outage
  • Data center wide maintenance
  • Physical system move
  • Preparation for future re-purposing of equipment
Prerequisites
  • This procedure is for non-MetroCluster configurations only
  • Local admin password for ONTAP 9
  • If using NetApp onboard key management (OKM), have the cluster-wide passphrase available
  • Ensure SP/BMC for each controller is accessible
  • Stop all clients/hosts from accessing data on the NetApp system
  • Suspend external backup jobs
  • Personnel onsite to perform physical equipment tasks
  • General preparation for onsite maintenance
Best practices prior to shutdown
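Before shutting down, confirm that the cluster is healthy and record a baseline to compare against after power up. The checks below are a suggested sketch, not part of the official procedure; adjust them to your environment:

cluster1::> cluster show
cluster1::> storage failover show
cluster1::> storage aggregate show -state !online
cluster1::> network interface show -is-home false
cluster1::> system health alert show

Every node should be healthy and eligible, takeover should be possible on all HA pairs, the aggregate query should return no entries, all LIFs should be on their home ports, and any outstanding health alerts should be reviewed before proceeding.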
Identifying hardware components

An ONTAP system is composed of one or more of the following components. Refer to the hardware documentation for details and images to help onsite personnel locate and identify the equipment.

Procedure

Performing graceful shutdown
Important: This procedure will shut down all nodes within the cluster and will make access to data on the cluster unavailable until the system is powered back up.
  1. Log in to the cluster via SSH. Otherwise, log in from any node in the cluster using a local console cable.
  2. Generate a case-suppression AutoSupport message covering the expected duration of the shutdown event, with any descriptive text:

cluster1::> system node autosupport invoke -node * -type all -message "MAINT=8h Power Maintenance"
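To confirm that the suppression message was generated and transmitted, you can review the AutoSupport history (a suggested check, assuming AutoSupport delivery is configured; it is not part of the official procedure):

cluster1::> system node autosupport history show -node *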

  3. Identify the SP/BMC IP address of all nodes:

cluster1::> system service-processor show -node * -fields address
node           address
-------------- ------------
cluster1-01    10.10.10.10
cluster1-02    10.10.10.20
cluster1-03    10.10.10.30
cluster1-04    10.10.10.40

  4. Exit the cluster shell:

cluster1::> exit

  5. Connect to the SP/BMC over SSH using the IP address of any node from step 3. Otherwise, connect a local console cable to the node. Log in using the same cluster administrator credentials.

If accessing via the SP/BMC prompt, switch to system console and supply the cluster administrator credentials:

login as: admin
admin@10.10.10.10's password:

SP cluster1-01> system console
Type Ctrl-D to exit.

SP-login: admin
Password:
cluster1::>

Note: Open an SSH session window to every SP for monitoring as described in this step.
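One way to keep a console window open to every SP at once is a terminal multiplexer on the administration host. The helper below is a hypothetical sketch: it assumes tmux is installed, that it is run from inside an existing tmux session, and that the IP addresses match the output of step 3.

#!/bin/sh
# Open one detached tmux pane per SP/BMC, logged in as admin.
# Replace the IP list with the addresses from "system service-processor show".
for ip in 10.10.10.10 10.10.10.20 10.10.10.30 10.10.10.40; do
    tmux split-window -d "ssh admin@${ip}"
    tmux select-layout tiled   # re-tile after each split so panes stay large enough
done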

  6. Halt all the nodes in the cluster:

For most cluster configurations:

cluster1::> system node halt -node * -skip-lif-migration-before-shutdown true -ignore-quorum-warnings true -inhibit-takeover true

For clusters with StrictSync SnapMirror relationships:

cluster1::> system node halt -node * -skip-lif-migration-before-shutdown true -ignore-quorum-warnings true -inhibit-takeover true -ignore-strict-sync-warnings true
  7. Respond to the prompt for each node:

Warning: Are you sure you want to halt node "cluster1-01"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-02"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-03"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-04"?
{y|n}: y

4 entries were acted on.

  8. Wait for each node to halt completely and reach the LOADER prompt:

LOADER-A/B>

  9. Connect to each node in the cluster via SP/BMC (if not already connected) or using a local console cable and confirm each node is at the LOADER prompt (as in step 8).
  10. (Optional) Power OFF each controller from the SP/BMC prompt:

SP> system power off

See the Additional Information section for more information and warnings.

Physical activity
These physical tasks ensure that no equipment is damaged while the system is powered down and that equipment is started in the correct order, so that the ONTAP system is ready to serve data after the event is complete.
  1. Make a note of any faults presently on the system - amber LEDs on controllers, shelves, IOMs, disks, PSUs, etc.
  2. Toggle each PSU rocker switch to the off position on each piece of equipment.

Note: Some PSUs do not have rocker switches.

  3. Remove the power cable connecting each PSU to the PDU.
  4. Visually confirm each component has successfully powered off.
  5. Ensure that all controllers, disk shelves, and switches associated with the cluster are powered down.
Performing system power up

The power-up procedure must be performed in the following order:

  1. Switches (network/FC/storage)
  2. Disk shelves
  3. Controllers
Switch power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable).
  3. Wait for the switch to power up.
  4. Check for any fault lights on the switch (both front and back).
  5. Connect to the switch via the management IP address.
  6. Confirm switch health (refer to the switch vendor documentation for more details; an illustrative example follows this list).
  7. Repeat for each switch until all are powered up and healthy.
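As an illustration only, the health check on a Cisco NX-OS based cluster switch (an assumption; the commands differ by switch vendor and model) might include show environment for power supply, fan, and temperature status and show interface brief for a port status summary:

switch# show environment
switch# show interface brief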
Disk shelves power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable).
  3. Wait for all the disk shelves to power up and for the drives to spin up.
  4. Ensure all shelf IDs are the proper values.
  5. Check for any fault lights on the disk shelf (both front and back) that did not exist before the shutdown.
Controllers power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable). HA pairs that are not in the same chassis should be powered up simultaneously.
  3. Wait for the controller(s) in the chassis to power up.
  4. Check for any fault lights on the chassis and controllers (both front and back).
  5. Repeat for each controller/chassis until all are powered up.
  6. Connect to the cluster management IP address via SSH.
  7. Perform additional system health checks (see the example checks after this list).
  8. Generate an AutoSupport message indicating the maintenance task is complete, ending the case suppression generated prior to shutdown:

cluster1::> system node autosupport invoke -node * -type all -message MAINT=end
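The example health checks referenced in step 7 follow; they are a suggestion rather than part of the official procedure. Compare the results, and any fault LEDs, against the baseline recorded before shutdown:

cluster1::> cluster show
cluster1::> storage failover show
cluster1::> storage aggregate show -state !online
cluster1::> network interface show -is-home false
cluster1::> system health alert show
cluster1::> event log show -severity ERROR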

Best practices after power up
Troubleshooting
Switch fails to power up
  • Refer to 3rd Party Support to contact the appropriate vendor for the impaired switch.
  • Do not continue with the power up procedure until the impaired switch has been repaired.
Disk shelf fails to power up
  • Please contact NetApp Technical Support and reference this article for further assistance in troubleshooting the shelf.
  • Do not continue with the power up procedure until the impaired shelf has been repaired.
Controller fails to power up

If one of the controllers fails to power up (for instance, due to a motherboard failure), the HA partner will not take over, because the -inhibit-takeover true flag was used at shutdown. The status of the system will look similar to this:

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    -        Unknown
cluster1-02    cluster1-01    false    Waiting for cluster1-01. Waiting
                                        for cluster applications to come
                                        online on the local node. Offline
                                        applications: mgmt, vldb, vifmgr,
                                        bcomd, crs., Takeover is not
                                        possible: Partner node halted after
                                        disabling takeover, Disk inventory
                                        not exchanged
2 entries were displayed.

If the controller cannot be booted, perform these steps to recover:

  1. Please contact NetApp Technical Support and reference this article for further assistance in troubleshooting the impaired controller.
  2. Enter advanced privilege:

cluster1::> set -privilege advanced

  3. Force a takeover of the impaired node:

cluster1::*> storage failover takeover -option force -ofnode cluster1-01 -skip-lif-migration-before-takeover true

  4. Any LIFs from the impaired node will eventually come up on the available node (if there are broadcast-domain ports available).
  5. Perform a normal giveback when the impaired node is repaired, as sketched below.
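A minimal sketch of the giveback, assuming the repaired node (cluster1-01 in this example) has booted and is waiting for giveback:

cluster1::> storage failover show-giveback
cluster1::> storage failover giveback -ofnode cluster1-01
cluster1::> storage failover show

storage failover show-giveback reports the giveback status of each aggregate; run storage failover show afterward to confirm that both nodes are up and takeover is possible again.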

Additional Information

Tips

To power off a controller remotely:

SP> system power off
This will cause a dirty shutdown of your appliance.  Continue? [y/n] y
SP> system power status
Chassis Power is off

The warning can be ignored only if the node was cleanly shut down and is at the LOADER prompt. Powering off in any other state can cause data loss.

Repeat from the other SP in the same chassis (where applicable).

 
