SHUTDOWN PENDING (degraded mode) CRITICAL - AutoSupport message
Applies to
- ONTAP 9
- callhome.shutdown.pending
- monitor.brokenDisk
- HA Group Notification from node_name (SHUTDOWN PENDING (degraded mode)) ALERT
Event Summary
This message occurs when a disk drive fails but there are no suitable spares available for reconstruction.
- To protect your data, the system enters degraded mode.
- If the system runs in degraded mode for the set time interval, it halts automatically to prevent a double disk drive failure and possible loss of data.
- The default timeout is usually 24 hours.
- If a spare drive becomes available while the system is running in degraded mode, the system immediately begins rebuilding the failed drive.
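As a rough illustration of the timer described above, the following Python sketch computes when the automatic halt would occur. It assumes the countdown starts when degraded mode begins and uses the 24-hour default. The function names are hypothetical and are not part of ONTAP.

```python
from datetime import datetime, timedelta

# Assumption: default timeout of 24 hours, as stated in the Event Summary.
DEFAULT_TIMEOUT_HOURS = 24


def halt_deadline(degraded_since, timeout_hours=DEFAULT_TIMEOUT_HOURS):
    """Return the time at which the system would halt if no spare appears."""
    return degraded_since + timedelta(hours=timeout_hours)


def time_remaining(degraded_since, now, timeout_hours=DEFAULT_TIMEOUT_HOURS):
    """Return the time left before the halt, never less than zero."""
    remaining = halt_deadline(degraded_since, timeout_hours) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 8, 0)
    print("Halt expected at:", halt_deadline(start))
    print("Time remaining:", time_remaining(start, datetime(2024, 1, 1, 20, 0)))
```

This is only a planning aid. The actual halt time is decided by the system and can differ, for example after a reboot or if the timeout has been changed.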
Validate
Event Log
event log show -severity * -message-name callhome*
[node1: statd: callhome.shutdown.pending:alert]: Call home for SHUTDOWN PENDING (degraded mode)
event log show -severity * -message-name monitor.brokenDisk*
[node1: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)
[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in RAID group "/aggregate_name/plex0/rg0" are broken. Halting system in 24 hours.
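If the event log output is collected for review, a small parser can help pick out the relevant events. The Python sketch below assumes each entry has the bracketed form shown in the examples above (node, source, event name, severity, then the message). The regular expression and function names are illustrative assumptions, not an official log format definition.

```python
import re

# Assumed entry layout, taken from the sample lines in this article:
# [<node>: <source>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:]+):\s*(?P<source>[^:]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)

# Event names that appear in this article for a pending shutdown.
SHUTDOWN_EVENTS = {
    "callhome.shutdown.pending",
    "monitor.shutdown.brokenDisk.pending",
}


def parse_ems_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def is_shutdown_pending(line):
    """Return True if the entry is one of the pending-shutdown events."""
    entry = parse_ems_line(line)
    return entry is not None and entry["event"] in SHUTDOWN_EVENTS


if __name__ == "__main__":
    sample = ("[node1: statd: callhome.shutdown.pending:alert]: "
              "Call home for SHUTDOWN PENDING (degraded mode)")
    print(parse_ems_line(sample))
    print(is_shutdown_pending(sample))
```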
Command line
Verify the aggregate status by running storage aggregate show-status:
RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)

RAID Disk Device   HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
--------- ------   ------------- ---- ---- ---- ----- --------------     --------------
dparity   0b.07.12 0b  7     12  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
parity    0b.07.13 0b  7     13  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
data      FAILED   N/A                                1713523/ -
data      0b.07.15 0b  7     15  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
data      FAILED   N/A                                1713523/ -
data      0b.07.21 0b  7     21  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
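To spot double-degraded RAID groups in saved command output, the failed entries can be counted per RAID group. The Python sketch below assumes the output has one disk per line, with the RAID group header line and the FAILED marker in the Device column as in the sample above. The function name and the parsing rules are illustrative assumptions.

```python
import re
from typing import Dict

# Matches the RAID group header, e.g.
# "RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)"
GROUP_RE = re.compile(r"^\s*RAID group (?P<name>\S+)")

SAMPLE = """\
RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0b.07.12 0b 7 12 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
parity 0b.07.13 0b 7 13 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data FAILED N/A 1713523/ -
data 0b.07.15 0b 7 15 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data FAILED N/A 1713523/ -
data 0b.07.21 0b 7 21 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
"""


def failed_disks_by_group(output: str) -> Dict[str, int]:
    """Count rows whose Device column reads FAILED, keyed by RAID group."""
    counts: Dict[str, int] = {}
    current = None
    for line in output.splitlines():
        header = GROUP_RE.match(line)
        if header:
            current = header.group("name")
            counts[current] = 0
            continue
        fields = line.split()
        # The second field is the Device column in the assumed layout.
        if current and len(fields) >= 2 and fields[1] == "FAILED":
            counts[current] += 1
    return counts


if __name__ == "__main__":
    for group, failed in failed_disks_by_group(SAMPLE).items():
        print(group, "failed disks:", failed)
```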
Verify the failover status to check whether the aggregate containing the disk that needs to be reconstructed or evacuated is in a partial giveback state:
storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
Node-1         Node-2         true     Connected to Node-2, Partial giveback
Node-2         Node-1         true     Connected to Node-1.
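The check for a partial giveback state can also be sketched in Python for saved command output. This assumes each node appears on a single line with the four columns shown above (node, partner, takeover possible, state description). Real output may wrap the description across lines, which this simplified sketch does not handle. The function name is hypothetical.

```python
from typing import List

SAMPLE = """\
Takeover
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
Node-1 Node-2 true Connected to Node-2, Partial giveback
Node-2 Node-1 true Connected to Node-1.
"""


def nodes_in_partial_giveback(output: str) -> List[str]:
    """Return the nodes whose state description mentions a partial giveback."""
    nodes = []
    for line in output.splitlines():
        # Split into node, partner, takeover-possible flag, and the rest.
        fields = line.split(None, 3)
        if len(fields) != 4 or fields[2] not in ("true", "false"):
            continue  # header, separator, or unrecognized line
        node, _partner, _possible, state = fields
        if "partial giveback" in state.lower():
            nodes.append(node)
    return nodes


if __name__ == "__main__":
    print(nodes_in_partial_giveback(SAMPLE))
```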
Resolution
- Check for unassigned disks, and assign them to the node which requires spares to start reconstruction:
::> storage disk show -container-type unassigned
::> storage disk assign -disk <stackID>.<shelfID>.<bayID> -owner <node name>
- If the system is in a partial giveback state, complete the giveback. Refer to the KB article "Disk does not reconstruct or evacuate when in the partial giveback state".
- Replace any failed drives. To check the status of the replacement part, refer to the KB article "DISK FAILED - AutoSupport message".
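When several unassigned disks need to be assigned, the assign command from the first step above can be generated for each disk. The Python sketch below only builds the command text for review before it is entered at the cluster prompt. It assumes disk names follow the <stackID>.<shelfID>.<bayID> pattern shown above, and the helper name is hypothetical.

```python
import re

# Assumed disk name pattern: <stackID>.<shelfID>.<bayID>, numeric parts only.
DISK_NAME = re.compile(r"^\d+\.\d+\.\d+$")


def assign_command(disk: str, owner: str) -> str:
    """Build the disk assign command line for one disk and one owner node."""
    if not DISK_NAME.match(disk):
        raise ValueError(f"unexpected disk name: {disk!r}")
    if not owner or " " in owner:
        raise ValueError(f"unexpected node name: {owner!r}")
    return f"storage disk assign -disk {disk} -owner {owner}"


if __name__ == "__main__":
    # Example disk names and node name are placeholders.
    for disk in ["1.0.5", "1.0.6"]:
        print(assign_command(disk, "node1"))
```

Review each generated line against the output of storage disk show -container-type unassigned before running it.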
Workaround
- Check whether the partner node has two or more spare disks of the same type available, and if so, reassign a spare disk. Follow: How to reassign spare disks from HA or DR partner node
Note: If you need assistance, contact NetApp Support.