Node shutdown with "monitor.shutdown.brokenDisk:EMERGENCY" error

Category: fas-systems

Specialty: HW

Applies to

ONTAP versions earlier than 9.12.1
FAS / AFF models

Issue

The monitor.shutdown.brokenDisk message occurs when the automatic shutdown sequence expires per callhome.shutdown.pending, that is, when the RAID group has remained in degraded mode for the specified time interval (usually 24 hours by default).
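The timing described above is simple date arithmetic, and a small sketch may make it concrete. This is only an illustration, not an ONTAP tool: the function name `shutdown_deadline` and the start timestamp are hypothetical, and the 24-hour value is the default quoted in this article.

```python
from datetime import datetime, timedelta

# Default interval quoted in this article; the real value is configurable.
DEFAULT_TIMEOUT_HOURS = 24

def shutdown_deadline(degraded_since, timeout_hours=DEFAULT_TIMEOUT_HOURS):
    """Return the time at which the automatic shutdown would be due.

    degraded_since is a hypothetical input: the moment the countdown
    started (the RAID group became degraded, or the node was rebooted).
    """
    return degraded_since + timedelta(hours=timeout_hours)

# Made-up timestamp, for illustration only.
start = datetime(2024, 1, 10, 8, 30)
print(shutdown_deadline(start))  # 2024-01-11 08:30:00
```

Because a reboot restarts the countdown, passing the reboot time as `degraded_since` gives the next expected shutdown.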

The node shuts down with the following error and does not perform a takeover:

Example:

[node01: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)
[node01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggr0_n1/plex0/rg0" are broken. Halting system now.
[node01: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to TRUE.
[node01: shutdown_thread0: kern.shutdown:notice]: System shut down because : "BROKEN DISK".
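When searching a saved log for this event, the bracketed header seen in the example above (`[node: process: event:severity]: message`) can be split with a regular expression. The following is a minimal sketch under that assumption. The function names are hypothetical, the pattern is inferred only from the sample messages in this article, and it is not a NetApp-provided parser.

```python
import re

# Header layout inferred from the example messages in this article:
# [<node>: <process>: <event name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)

def parse_ems_line(line):
    """Return a dict of header fields, or None if the line does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None

def find_broken_disk_shutdowns(lines):
    """Yield parsed entries for monitor.shutdown.brokenDisk events."""
    for line in lines:
        entry = parse_ems_line(line)
        if entry and entry["event"] == "monitor.shutdown.brokenDisk":
            yield entry

# Sample lines copied from the example in this article.
sample = [
    '[node01: statd: monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk '
    'in RAID group "/aggr0_n1/plex0/rg0" are broken. Halting system now.',
    '[node01: shutdown_thread0: kern.shutdown:notice]: System shut down because : '
    '"BROKEN DISK".',
]

for entry in find_broken_disk_shutdowns(sample):
    print(entry["node"], entry["severity"], entry["message"])
```

Only the first sample line is reported, because the second carries a different event name.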

::> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node01    mgmt     0        2        13923732 -         offline
node01    vldb     0        2        556665   -         offline
node01    vifmgr   0        2        56       -         offline
node01    bcomd    0        2        11       -         offline
node01    crs      0        2        1        -         offline

::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node01         node02         false    Waiting for node02, Takeover is not
                                       possible: Partner node halted after
                                       disabling takeover
node02         node01         -        Unknown

Two disks have failed in the same RAID-DP group, leaving the RAID group in a "double degraded" state:

Example:

Aggregate aggr0_n1 (online, raid_dp, degraded) (block checksums)
  Plex /aggr0_n1/plex0 (online, normal, active, pool0)
    RAID group /aggr0_n1/plex0/rg0 (double degraded, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0d.03.0         0d    3   0   SA:A   0   SAS 10000 1142352/2339537408 1144641/2344225968
      parity    FAILED          N/A                        1142352/ -
      data      FAILED          N/A                        1142352/ -
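The link between the disk listing and the RAID group state can be shown by counting the rows whose Device column reads FAILED. The sketch below is illustrative only. The function names are hypothetical, the rows are taken from the example in this article, and the label returned for more than two failures is an assumption, since RAID-DP protects against at most two failed disks per RAID group.

```python
def count_failed_disks(rows):
    """Count listing rows whose second column (Device) reads FAILED."""
    failed = 0
    for row in rows:
        fields = row.split()
        if len(fields) >= 2 and fields[1] == "FAILED":
            failed += 1
    return failed

def raid_dp_state(failed):
    """Map a failed-disk count to a RAID-DP group state label."""
    if failed == 0:
        return "normal"
    if failed == 1:
        return "degraded"
    if failed == 2:
        return "double degraded"
    # Assumption: beyond two failures the group can no longer serve data.
    return "failed"

# Disk rows from the example in this article.
rows = [
    "dparity 0d.03.0 0d 3 0 SA:A 0 SAS 10000 1142352/2339537408 1144641/2344225968",
    "parity FAILED N/A 1142352/ -",
    "data FAILED N/A 1142352/ -",
]

failed = count_failed_disks(rows)
print(failed, raid_dp_state(failed))  # 2 double degraded
```

Header and separator rows are ignored because their second column is never the literal FAILED.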