Node shutdown with "monitor.shutdown.brokenDisk:EMERGENCY" error
Applies to
- ONTAP versions earlier than 9.12.1
- FAS / AFF models
Issue
The monitor.shutdown.brokenDisk message occurs when the automatic shutdown sequence announced by callhome.shutdown.pending expires, meaning the RAID group has been in degraded mode for the specified time interval (the default is usually 24 hours).
- Node shuts down with the following error and without performing a takeover:
Example:
[Node-01: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)
[Node-01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggr0_n1/plex0/rg0" are broken. Halting system now.
[Node-01: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to TRUE.
[Node-01: shutdown_thread0: kern.shutdown:notice]: System shut down because : "BROKEN DISK".
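When many EMS messages have been collected into a text file, it can help to pick out the shutdown event and the affected RAID group automatically. The following is a minimal sketch, assuming the log lines use the bracketed `[node: process: event:severity]: message` layout shown in the example. The function name `find_broken_disk_shutdowns` and both regular expressions are illustrative and not part of ONTAP.

```python
import re

# Matches EMS lines in the bracketed form shown in the example:
# [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:]+):\s*(?P<process>[^:]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)

# Pulls the quoted RAID group path out of the shutdown message text.
RAID_GROUP = re.compile(r'RAID group "(?P<rg>[^"]+)"')


def find_broken_disk_shutdowns(lines):
    """Return one dict per monitor.shutdown.brokenDisk event found in lines."""
    hits = []
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m or m.group("event") != "monitor.shutdown.brokenDisk":
            continue
        rg = RAID_GROUP.search(m.group("message"))
        hits.append({
            "node": m.group("node"),
            "severity": m.group("severity"),
            "raid_group": rg.group("rg") if rg else None,
        })
    return hits


if __name__ == "__main__":
    sample = [
        '[Node-01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,'
        'parity disk in RAID group "/aggr0_n1/plex0/rg0" are broken. '
        'Halting system now.',
        '[Node-01: shutdown_thread0: kern.shutdown:notice]: '
        'System shut down because : "BROKEN DISK".',
    ]
    for hit in find_broken_disk_shutdowns(sample):
        print(hit)
```

Only the monitor.shutdown.brokenDisk line is reported; the other messages in the sequence are skipped because their event names differ.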
- Two disks have failed in the same RAID-DP group, which is now in a "double degraded" state:
Example:
Aggregate aggr0_n1 (online, raid_dp, degraded) (block checksums)
Plex /aggr0_n1/plex0 (online, normal, active, pool0)
RAID group /aggr0_n1/plex0/rg0 (double degraded, block checksums)
RAID Disk Device   HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
--------- -------- --- ----- --- ---- ---- ---- ----- ------------------ ------------------
dparity   0d.03.0  0d  3     0   SA:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
parity    FAILED   N/A                                1142352/ -
data      FAILED   N/A                                1142352/ -
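To check several aggregates for the same condition, the RAID status output can be scanned for each RAID group's state and failed disks. The following is a rough sketch, assuming the output has been saved as plain text in the layout shown in the example. The function name `summarize_raid_groups` and the parsing rules are hypothetical helpers, not an ONTAP interface.

```python
import re

# Matches lines such as:
# RAID group /aggr0_n1/plex0/rg0 (double degraded, block checksums)
RAID_GROUP_LINE = re.compile(r"^RAID group (?P<path>\S+) \((?P<state>[^,)]+)")


def summarize_raid_groups(text):
    """Map each RAID group path to its state and the roles of FAILED disks."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RAID_GROUP_LINE.match(line)
        if m:
            current = m.group("path")
            groups[current] = {"state": m.group("state"), "failed": []}
            continue
        fields = line.split()
        # A failed disk row has its role first and FAILED in the Device column.
        if current and len(fields) >= 2 and fields[1] == "FAILED":
            groups[current]["failed"].append(fields[0])
    return groups


if __name__ == "__main__":
    sample = """\
RAID group /aggr0_n1/plex0/rg0 (double degraded, block checksums)
dparity 0d.03.0 0d 3 0 SA:A 0 SAS 10000 1142352/2339537408 1144641/2344225968
parity FAILED N/A 1142352/ -
data FAILED N/A 1142352/ -
"""
    for path, info in summarize_raid_groups(sample).items():
        print(path, info["state"], info["failed"])
```

A RAID group reported with the state "double degraded" and two entries in its failed list corresponds to the condition described in this article.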