
How to perform graceful shutdown and power up of all ONTAP nodes in a cluster?

Applies to

  • ONTAP 9
  • AFF models
  • FAS models
  • Excluding MetroCluster

Description

Introduction

Several events might require a graceful shutdown of ONTAP equipment, such as:

  • Scheduled site power outage
  • Data center wide maintenance
  • Physical system move
  • Preparation for future re-purposing of equipment
Prerequisites
  • This procedure is for non-MetroCluster configurations only
  • Local admin password for ONTAP 9
  • If using NetApp onboard key management (OKM), have the cluster-wide passphrase available
  • Ensure SP/BMC for each controller is accessible
  • Stop all clients/hosts from accessing data on the NetApp system
  • Suspend external backup jobs
  • Personnel onsite to perform physical equipment tasks
  • General preparation for onsite maintenance
Best practices prior to shutdown
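Before shutting down, confirm that the cluster is healthy and record a baseline to compare against after power up. The checks below are a suggested sketch, not part of the official procedure; adjust them to your environment:

cluster1::> cluster show
cluster1::> storage failover show
cluster1::> storage aggregate show -state !online
cluster1::> network interface show -is-home false
cluster1::> system health alert show

Every node should be healthy and eligible, takeover should be possible on all HA pairs, the aggregate query should return no entries, all LIFs should be on their home ports, and any outstanding health alerts should be reviewed before proceeding.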
Identifying hardware components

An ONTAP system is composed of one or more of the following components. Refer to the hardware documentation for details and images to help onsite personnel locate and identify the equipment.

Procedure

Performing graceful shutdown
Important: This procedure will shut down all nodes within the cluster and will make access to data on the cluster unavailable until the system is powered back up.
  1. Log in to the cluster via SSH. Otherwise, log in from any node in the cluster using a local console cable.
  2. Generate a case-suppression AutoSupport message covering the expected duration of the shutdown event, with any descriptive text:

cluster1::> system node autosupport invoke -node * -type all -message "MAINT=8h Power Maintenance"
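To confirm that the suppression message was generated and transmitted, you can review the AutoSupport history (a suggested check, assuming AutoSupport delivery is configured; it is not part of the official procedure):

cluster1::> system node autosupport history show -node *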

  3. Identify the SP/BMC IP address of all nodes:

cluster1::> system service-processor show -node * -fields address
node           address
-------------- ------------
cluster1-01    10.10.10.10
cluster1-02    10.10.10.20
cluster1-03    10.10.10.30
cluster1-04    10.10.10.40

  4. Exit the cluster shell:

cluster1::> exit

  5. Connect to the SP/BMC over SSH using the IP address of any node from step 3. Otherwise, connect a local console cable to the node. Log in using the same cluster administrator credentials.

If accessing via the SP/BMC prompt, switch to system console and supply the cluster administrator credentials:

login as: admin
admin@10.10.10.10's password:

SP cluster1-01> system console
Type Ctrl-D to exit.

SP-login: admin
Password:
cluster1::>

Note: Open an SSH session window to every SP for monitoring as described in this step.
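One way to keep a console window open to every SP at once is a terminal multiplexer on the administration host. The helper below is a hypothetical sketch: it assumes tmux is installed, that it is run from inside an existing tmux session, and that the IP addresses match the output of step 3.

#!/bin/sh
# Open one detached tmux pane per SP/BMC, logged in as admin.
# Replace the IP list with the addresses from "system service-processor show".
for ip in 10.10.10.10 10.10.10.20 10.10.10.30 10.10.10.40; do
    tmux split-window -d "ssh admin@${ip}"
    tmux select-layout tiled   # re-tile after each split so panes stay large enough
done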

  6. Halt all the nodes in the cluster:

For most cluster configurations:

cluster1::> system node halt -node * -skip-lif-migration-before-shutdown true -ignore-quorum-warnings true -inhibit-takeover true

For clusters with StrictSync SnapMirror relationships:

cluster1::> system node halt -node * -skip-lif-migration-before-shutdown true -ignore-quorum-warnings true -inhibit-takeover true -ignore-strict-sync-warnings true
  7. Respond to the prompt for each node:

Warning: Are you sure you want to halt node "cluster1-01"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-02"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-03"?
{y|n}: y

Warning: Are you sure you want to halt node "cluster1-04"?
{y|n}: y

4 entries were acted on.

  8. Wait for each node to halt completely and reach the LOADER prompt:

LOADER-A/B>

  9. Connect to each node in the cluster via SP/BMC (if not already connected) or using a local console cable and confirm each node is at the LOADER prompt (as in step 8).
  10. (Optional) Power OFF each controller from the SP/BMC prompt:

SP> system power off

See the Additional Information section for more information and warnings.

Physical activity
These physical tasks ensure that no equipment is damaged while the system is powered down and that equipment is started in the correct order, so that the ONTAP system is ready to serve data after the event is complete.
  1. Make a note of any faults presently on the system - amber LEDs on controllers, shelves, IOMs, disks, PSUs, etc.
  2. Toggle each PSU rocker switch to the off position on each piece of equipment.

Note: Some PSUs do not have rocker switches.

  3. Remove the power cable connecting each PSU to the PDU.
  4. Visually confirm each component has successfully powered off.
  5. Ensure that all controllers, disk shelves, and switches associated with the cluster are powered down.
Performing system power up

The power-up procedure must be performed in the following order:

  1. Switches (network/FC/storage)
  2. Disk shelves
  3. Controllers
Switch power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable).
  3. Wait for the switch to power up.
  4. Check for any fault lights on the switch (both front and back).
  5. Connect to the switch via the management IP address.
  6. Confirm switch health (refer to the switch vendor documentation for more details; an illustrative example follows this list).
  7. Repeat for each switch until all are powered up and healthy.
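As an illustration only, the health check on a Cisco NX-OS based cluster switch (an assumption; the commands differ by switch vendor and model) might include show environment for power supply, fan, and temperature status and show interface brief for a port status summary:

switch# show environment
switch# show interface brief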
Disk shelves power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable).
  3. Wait for all the disk shelves to power up and for the drives to spin up.
  4. Ensure all shelf IDs are the proper values.
  5. Check for any fault lights on the disk shelf (both front and back) that did not exist before the shutdown.
Controllers power up
  1. Plug back in each power cable from PDU to PSU.
  2. Flip each rocker switch to the ON position (if applicable). HA pairs that are not in the same chassis should be powered up simultaneously.
  3. Wait for the controller(s) in the chassis to power up.
  4. Check for any fault lights on the chassis and controllers (both front and back).
  5. Repeat for each controller/chassis until all are powered up.
  6. Connect to the cluster management IP address via SSH.
  7. Perform additional system health checks (see the example checks after this list).
  8. Generate an AutoSupport message indicating the maintenance task is complete, ending the case suppression generated prior to shutdown:

cluster1::> system node autosupport invoke -node * -type all -message MAINT=end
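The example health checks referenced in step 7 follow; they are a suggestion rather than part of the official procedure. Compare the results, and any fault LEDs, against the baseline recorded before shutdown:

cluster1::> cluster show
cluster1::> storage failover show
cluster1::> storage aggregate show -state !online
cluster1::> network interface show -is-home false
cluster1::> system health alert show
cluster1::> event log show -severity ERROR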

Best practices after power up
Troubleshooting
Switch fails to power up
  • Refer to 3rd Party Support to contact the appropriate vendor for the impaired switch.
  • Do not continue with the power up procedure until the impaired switch has been repaired.
Disk shelf fails to power up
  • Please contact NetApp Technical Support and reference this article for further assistance in troubleshooting the shelf.
  • Do not continue with the power up procedure until the impaired shelf has been repaired.
Controller fails to power up

If one of the controllers fails to power up (for instance, due to a motherboard failure), the HA partner will not take over, because the -inhibit-takeover true flag was used at shutdown. The status of the system will look similar to this:

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    -        Unknown
cluster1-02    cluster1-01    false    Waiting for cluster1-01. Waiting
                                        for cluster applications to come
                                        online on the local node. Offline
                                        applications: mgmt, vldb, vifmgr,
                                        bcomd, crs., Takeover is not
                                        possible: Partner node halted after
                                        disabling takeover, Disk inventory
                                        not exchanged
2 entries were displayed.

If the controller cannot be booted, perform these steps to recover:

  1. Please contact NetApp Technical Support and reference this article for further assistance in troubleshooting the impaired controller.
  2. Enter advanced privilege:

cluster1::> set -privilege advanced

  3. Force a takeover of the impaired node:

cluster1::*> storage failover takeover -option force -ofnode cluster1-01 -skip-lif-migration-before-takeover true

  4. Any LIFs from the impaired node will eventually come up on the available node (if there are broadcast-domain ports available).
  5. Perform a normal giveback when the impaired node is repaired, as sketched below.
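A minimal sketch of the giveback, assuming the repaired node (cluster1-01 in this example) has booted and is waiting for giveback:

cluster1::> storage failover show-giveback
cluster1::> storage failover giveback -ofnode cluster1-01
cluster1::> storage failover show

storage failover show-giveback reports the giveback status of each aggregate; run storage failover show afterward to confirm that both nodes are up and takeover is possible again.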

Additional Information

Tips

To power off a controller remotely:

SP> system power off
This will cause a dirty shutdown of your appliance.  Continue? [y/n] y
SP> system power status
Chassis Power is off

The warning can be ignored only if the node was cleanly shut down and is at the LOADER prompt. Powering off in any other state can cause data loss.

Repeat from the other SP in the same chassis (where applicable).

 
