NetApp Knowledge Base

SHUTDOWN PENDING (degraded mode) CRITICAL - AutoSupport message

Category:
ontap-9
Specialty:
hw

Applies to

  • ONTAP 9
  • callhome.shutdown.pending
  • monitor.shutdown.brokenDisk
  • HA Group Notification from node_name (SHUTDOWN PENDING (degraded mode)) ALERT

Event Summary

callhome.shutdown.pending

This message occurs when an automatic shutdown sequence is initiated because a degraded RAID group cannot be reconstructed due to insufficient suitable spare disks; that is, the RAID group is completely degraded.

The definition of "degraded" depends on the RAID group types used by the aggregate:

  • raid4 - RAID group has one missing or failed disk
  • raid-dp - RAID group has two missing or failed disks
  • raid-tec - RAID group has three missing or failed disks
  • A mirrored aggregate is considered "degraded" if both plexes of the aggregate have missing or failed disks in the same positional RAID group.
  • In ONTAP versions earlier than 9.12.1, the system halts automatically to prevent a RAID group integrity failure and possible loss of data, if it runs in completely degraded mode for the defined timeout interval.
    • The default timeout is 24 hours.
  • If a spare drive becomes available while the system is running in degraded mode, the system immediately begins rebuilding the failed drive.
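
The definitions above can be expressed as a small lookup. The following Python is only an illustrative sketch: the function names and the dictionary layout are hypothetical and are not part of ONTAP, and the thresholds are copied from the list above.

```python
# Number of missing or failed disks at which a RAID group is "degraded"
# in the sense used by this article (values taken from the list above).
DEGRADED_THRESHOLD = {"raid4": 1, "raid-dp": 2, "raid-tec": 3}


def is_degraded(raid_type: str, failed_disks: int) -> bool:
    """Return True when the failed-disk count equals the article's threshold."""
    return failed_disks == DEGRADED_THRESHOLD[raid_type]


def mirrored_aggregate_degraded(plex0_failed: dict, plex1_failed: dict) -> bool:
    """Each argument maps a RAID group position (for example "rg0") to the
    number of missing or failed disks in that plex (hypothetical layout).

    The aggregate counts as degraded when the same positional RAID group
    has missing or failed disks in both plexes.
    """
    shared = plex0_failed.keys() & plex1_failed.keys()
    return any(plex0_failed[rg] > 0 and plex1_failed[rg] > 0 for rg in shared)


print(is_degraded("raid-dp", 2))
print(mirrored_aggregate_degraded({"rg0": 1}, {"rg1": 1}))
```

The equality check is a deliberate simplification: it answers only whether a RAID group sits exactly at the threshold described here and does not model a group that has lost more disks than its RAID type tolerates.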

Validate

Event Log

event log show -severity * -message-name callhome*

[node1: statd: callhome.shutdown.pending:alert]: Call home for SHUTDOWN PENDING (degraded mode)

event log show -severity * -message-name monitor.*brokenDisk*

[node1: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)

[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in RAID group "/aggregate_name/plex0/rg0" are broken. Halting system in 24 hours.
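
When event log output is collected into a file, lines like the ones above can be filtered with a short script. The sketch below assumes the layout `[<node>: <process>: <message name>:<severity>]: <text>`, inferred only from the sample lines in this article; the function names are hypothetical.

```python
import re

# Assumed line layout, based only on the samples shown in this article:
# [<node>: <process>: <message.name>:<severity>]: <text>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<name>[\w.]+):(?P<severity>\w+)\]: (?P<text>.*)$"
)


def parse_ems_line(line: str):
    """Return the fields of one event line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def is_shutdown_pending(line: str) -> bool:
    """True when the line is the callhome.shutdown.pending event."""
    event = parse_ems_line(line)
    return event is not None and event["name"] == "callhome.shutdown.pending"


sample = ("[node1: statd: callhome.shutdown.pending:alert]: "
          "Call home for SHUTDOWN PENDING (degraded mode)")
print(parse_ems_line(sample)["severity"])
print(is_shutdown_pending(sample))
```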

Command line

To verify aggregate status, run storage aggregate show-status

RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)

      RAID Disk    Device     HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------    ------     ------------- ---- ---- ---- ----- --------------    --------------
      dparity      0b.07.12   0b    7   12  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
      parity       0b.07.13   0b    7   13  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
      data         FAILED             N/A                        1713523/ -
      data         0b.07.15   0b    7   15  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
      data         FAILED             N/A                        1713523/ -
      data         0b.07.21   0b    7   21  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
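
To count failed positions per RAID group from saved storage aggregate show-status output, a script along these lines may help. It is a minimal sketch that assumes the text layout shown above (a "RAID group <path> (" header line, and FAILED in the Device column of failed positions); the function name is hypothetical.

```python
import re

# Matches the per-RAID-group header line, for example:
# "RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)"
RG_HEADER = re.compile(r"^\s*RAID group (\S+) \(")


def failed_disks_by_raid_group(output: str) -> dict:
    """Map each RAID group path to the number of rows marked FAILED."""
    counts = {}
    current = None
    for line in output.splitlines():
        header = RG_HEADER.match(line)
        if header:
            current = header.group(1)
            counts[current] = 0
            continue
        fields = line.split()
        # Failed positions show FAILED in the Device column (second field).
        if current is not None and len(fields) >= 2 and fields[1] == "FAILED":
            counts[current] += 1
    return counts


SAMPLE = """\
RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)

      RAID Disk    Device     HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      dparity      0b.07.12   0b    7   12  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
      data         FAILED             N/A                        1713523/ -
      data         0b.07.15   0b    7   15  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
      data         FAILED             N/A                        1713523/ -
"""

print(failed_disks_by_raid_group(SAMPLE))
```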

Run storage failover show to verify whether the aggregate containing the disk that needs to be reconstructed or replaced is in a partial giveback state:

::> storage failover show
                              Takeover
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2         true     Connected to Node-2, Partial giveback
Node-2           Node-1         true     Connected to Node-1.
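
The same check can be scripted against saved storage failover show output. This sketch assumes each node's row fits on one line in the column order shown above (node, partner, takeover possible, state description); real output may wrap long descriptions, which this does not handle. The function name is hypothetical.

```python
def nodes_in_partial_giveback(output: str) -> list:
    """Return the nodes whose state description mentions a partial giveback."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: <node> <partner> <true|false> <description...>
        if len(fields) >= 4 and fields[2] in ("true", "false"):
            if "partial giveback" in line.lower():
                nodes.append(fields[0])
    return nodes


SAMPLE = """\
::> storage failover show
                              Takeover
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2         true     Connected to Node-2, Partial giveback
Node-2           Node-1         true     Connected to Node-1.
"""

print(nodes_in_partial_giveback(SAMPLE))
```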

 

Resolution

  1. Check for unassigned disks. Assign them to the node that requires spares so that reconstruction can start (the status should clear once reconstruction starts):

::> storage disk show -container-type unassigned

::> storage disk assign -disk <stackID>.<shelfID>.<bayID> -owner <node name>

  2. If the node is in a partial giveback state, complete the giveback. Refer to Disk does not reconstruct or evacuate when in the partial giveback state
  3. Replace any failed drives. Refer to this KB article to check your part status: DISK FAILED - AutoSupport message
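
The order of the steps above can be captured in a small checklist helper. This is only a sketch: the function names and inputs are hypothetical, and the only CLI syntax used is the storage disk assign command shown in step 1. It builds command strings and reminders and does not run anything against a cluster.

```python
def disk_assign_command(disk: str, owner: str) -> str:
    """Format the disk assignment command from step 1 for one disk."""
    return f"storage disk assign -disk {disk} -owner {owner}"


def resolution_steps(unassigned_disks, owner, partial_giveback, failed_disks):
    """Return the applicable actions in the order given in this article.

    unassigned_disks: disk names reported as unassigned
    owner:            node that requires spares
    partial_giveback: True if the node is in a partial giveback state
    failed_disks:     number of failed drives still to be replaced
    """
    steps = [disk_assign_command(disk, owner) for disk in unassigned_disks]
    if partial_giveback:
        steps.append("complete the giveback")
    if failed_disks:
        steps.append(f"replace {failed_disks} failed drive(s)")
    return steps


for step in resolution_steps(["1.0.5"], "node1", True, 2):
    print(step)
```
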

For further assistance:

Contact NetApp Technical Support or log in to the NetApp Support Site to create a case, and reference this article.

