Node shutdown with "monitor.shutdown.brokenDisk:EMERGENCY" error
- Category: fas-systems
- Specialty: HW
- Last Updated: 5/19/2025, 7:21:56 AM
Applies to
- ONTAP version earlier than 9.12.1
- FAS / AFF models
Issue
The monitor.shutdown.brokenDisk message is logged when the automatic shutdown sequence signaled by callhome.shutdown.pending expires. This happens when a RAID group has remained in degraded mode for the configured interval, which is usually 24 hours by default and can be changed with the raid.timeout option.
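A simplified model of this countdown can make the timing easier to reason about. The sketch below is an assumption, not ONTAP's implementation: it treats the deadline as the later of the moment the RAID group degraded and the last boot, plus the raid.timeout value in hours (the monitor.brokenDisk.notice message in the example log states that a reboot grants another 24 hours). The function name `shutdown_deadline` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical default: raid.timeout in hours, usually 24 per this article.
DEFAULT_RAID_TIMEOUT_HOURS = 24


def shutdown_deadline(degraded_since: datetime,
                      last_boot: datetime,
                      raid_timeout_hours: int = DEFAULT_RAID_TIMEOUT_HOURS) -> datetime:
    """Estimate when a node with a degraded RAID group halts itself.

    Simplified model (assumption): the countdown starts when the RAID
    group degrades, and a reboot restarts it, so the deadline is measured
    from whichever of the two events happened later.
    """
    start = max(degraded_since, last_boot)
    return start + timedelta(hours=raid_timeout_hours)


if __name__ == "__main__":
    degraded = datetime(2025, 5, 18, 6, 0)
    booted = datetime(2025, 5, 1, 9, 30)
    print(shutdown_deadline(degraded, booted))  # 2025-05-19 06:00:00
```

Under this model, rebooting at 05:00 on 5/19 would push the estimated halt to 05:00 on 5/20; only replacing the failed disks clears the degraded state that drives the countdown.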
- The node shuts down with the following errors, and no takeover occurs:
Example:
[Node-01: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)
[Node-01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggr0_n1/plex0/rg0" are broken. Halting system now.
[Node-01: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to TRUE.
[Node-01: shutdown_thread0: kern.shutdown:notice]: System shut down because : "BROKEN DISK".
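The log lines above follow a bracketed `[node: process: event:severity]: message` layout. The minimal sketch below pulls the event name, severity and affected RAID group out of such lines when scanning a saved console log. The regular expressions are assumptions derived only from the sample lines in this article, not from a documented EMS format, and `parse_ems` is a hypothetical helper.

```python
import re

# Assumed layout, based on the sample lines:
# [<node>: <process>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)
# The RAID group path appears in double quotes inside the message text.
RAID_GROUP = re.compile(r'RAID group "(?P<rg>[^"]+)"')


def parse_ems(line: str):
    """Return the EMS fields as a dict (plus raid_group, or None), or None if no match."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    rg = RAID_GROUP.search(fields["message"])
    fields["raid_group"] = rg.group("rg") if rg else None
    return fields


if __name__ == "__main__":
    sample = ('[Node-01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk '
              'in RAID group "/aggr0_n1/plex0/rg0" are broken. Halting system now.')
    ev = parse_ems(sample)
    print(ev["event"], ev["severity"], ev["raid_group"])
    # monitor.shutdown.brokenDisk EMERGENCY /aggr0_n1/plex0/rg0
```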
::> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node01    mgmt     0        2        13923732 -         offline
node01    vldb     0        2        556665   -         offline
node01    vifmgr   0        2        56       -         offline
node01    bcomd    0        2        11       -         offline
node01    crs      0        2        1        -         offline
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node01         node02         false    Waiting for node02, Takeover is not
                                       possible: Partner node halted after
                                       disabling takeover
node02         node01         -        Unknown
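The Takeover Possible column can also be read programmatically from saved `storage failover show` output. The minimal sketch below assumes that each data row starts with the node name, the partner name and a true, false or - value, as in the example above, and skips header and wrapped Description lines. The function name is hypothetical, and the parsing relies on whitespace splitting rather than any documented output contract.

```python
# Values the Takeover Possible column is assumed to take.
TAKEOVER_VALUES = ("true", "false", "-")


def takeover_possible(failover_output: str) -> dict:
    """Map each node name to its 'Takeover Possible' value."""
    result = {}
    for line in failover_output.splitlines():
        tokens = line.split()
        # Data rows start with: <node> <partner> <true|false|->.
        # Headers, dashes and wrapped Description lines fail this check.
        if len(tokens) >= 3 and tokens[2] in TAKEOVER_VALUES:
            result[tokens[0]] = tokens[2]
    return result


SAMPLE = """\
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node01         node02         false    Waiting for node02, Takeover is not
                                       possible: Partner node halted after
                                       disabling takeover
node02         node01         -        Unknown
"""

if __name__ == "__main__":
    print(takeover_possible(SAMPLE))  # {'node01': 'false', 'node02': '-'}
```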
- Two disks in the same RAID-DP group have failed, leaving the group in a "double degraded" state:
Example:
Aggregate aggr0_n1 (online, raid_dp, degraded) (block checksums)
  Plex /aggr0_n1/plex0 (online, normal, active, pool0)
    RAID group /aggr0_n1/plex0/rg0 (double degraded, block checksums)
      RAID Disk Device   HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
      --------- -------- ------------- ---- ---- ---- ----- ------------------ ------------------
      dparity   0d.03.0  0d      3   0 SA:A    0 SAS  10000 1142352/2339537408 1144641/2344225968
      parity    FAILED   N/A                                1142352/ -
      data      FAILED   N/A                                1142352/ -
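The two FAILED rows above are what make this a double-degraded group. RAID-DP keeps two parity disks (parity and dparity), so it can reconstruct data after at most two disk failures, and a group in this state has no redundancy left. The hypothetical helper below counts FAILED rows in a saved listing and maps the count to a state label. It assumes the column layout shown above, the "normal", "degraded" and "failed" labels are illustrative assumptions, and it is only a triage sketch, not a substitute for checking the aggregate status on the system.

```python
RAID_DP_ROLES = ("dparity", "parity", "data")


def raid_dp_state(listing: str) -> str:
    """Classify a RAID-DP group by counting disk rows marked FAILED."""
    failed = 0
    for line in listing.splitlines():
        tokens = line.split()
        # Disk rows start with the role, then the device name or FAILED.
        if len(tokens) >= 2 and tokens[0] in RAID_DP_ROLES and tokens[1] == "FAILED":
            failed += 1
    if failed == 0:
        return "normal"
    if failed == 1:
        return "degraded"
    if failed == 2:
        return "double degraded"
    # Hypothetical label: more failures than the two parity disks can cover.
    return "failed"


SAMPLE = """\
RAID Disk Device   HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
dparity   0d.03.0  0d      3   0 SA:A    0 SAS  10000 1142352/2339537408 1144641/2344225968
parity    FAILED   N/A                                1142352/ -
data      FAILED   N/A                                1142352/ -
"""

if __name__ == "__main__":
    print(raid_dp_state(SAMPLE))  # double degraded
```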