- Data ONTAP 7 and earlier
- Disk Drive
- ONTAP 9
The following events occur when a disk fails in a filer that is not equipped with a hot spare disk:
- The storage system enters a state called degraded mode. In this state, the RAID feature allows the storage system to continue to run without losing data (although the storage system's performance might be affected).
- The event "callhome.spares.low" is logged in the EMS log.
- If a disk drive fails without a spare to reconstruct on, the system enters "degraded" mode.
- Depending on the RAID group type, the aggregate may go into the "completely degraded" state.
- raid4 - RAID group has one missing or failed disk
- raid-dp - RAID group has two missing or failed disks
- raid-tec - RAID group has three missing or failed disks
- A mirrored aggregate is considered "completely degraded" if both plexes of the aggregate have missing or failed disks in the same positional RAID group.
- If the system runs in "completely degraded" mode for the defined time interval, it halts automatically to prevent a RAID group integrity failure and possible loss of data.
- The default timeout (set by the raid.timeout option) is 24 hours in releases earlier than ONTAP 9.12.1.
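The per-RAID-type thresholds above can be summarized as a small classification function. This is a minimal, hypothetical sketch and not an ONTAP API: the function names, the "normal"/"failed" labels, and the per-plex list of failed-disk counts are all assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical model of the "completely degraded" rules listed above; this is
# not an ONTAP API. Value = failed/missing disks at which a RAID group of that
# type has no redundancy left.
COMPLETELY_DEGRADED_AT = {"raid4": 1, "raid-dp": 2, "raid-tec": 3}


def raid_group_state(raid_type: str, failed_disks: int) -> str:
    """Classify a single RAID group by its number of failed or missing disks."""
    limit = COMPLETELY_DEGRADED_AT[raid_type]
    if failed_disks == 0:
        return "normal"
    if failed_disks < limit:
        return "degraded"  # parity can still tolerate another failure
    if failed_disks == limit:
        return "completely degraded"  # one more failure loses data
    return "failed"  # assumption: beyond parity protection, data is lost


def mirrored_completely_degraded(plex0: Sequence[int], plex1: Sequence[int]) -> bool:
    """True if both plexes have failures in the same positional RAID group.

    Each argument lists failed-disk counts per RAID group, in position order.
    """
    return any(a > 0 and b > 0 for a, b in zip(plex0, plex1))


print(raid_group_state("raid-dp", 1))                # degraded
print(raid_group_state("raid4", 1))                  # completely degraded
print(mirrored_completely_degraded([0, 1], [0, 2]))  # True
print(mirrored_completely_degraded([1, 0], [0, 1]))  # False
```

The mirrored check compares RAID groups by position only: failures in different RAID groups of the two plexes leave each group with an intact copy, so the aggregate is not completely degraded.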
Warning: Replace all reported failed disks as soon as possible; additional disk failures will cause data loss within the affected RAID group.
4. The storage system writes one of the following warning messages to the /etc/messages file and the system console every hour:
Parity disk is broken. Halting in m hours.
Data disk n is broken. Halting in m hours.
where "n" is the disk ID number, and "m" is the number of hours before the system halts.
5. Immediately before the system halts, a message similar to the following is sent to /etc/messages and the console:
Sat Oct 29 13:26:42 PDT [statd]: When a disk is broken, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down.
6. If the system is still running in completely degraded mode after the specified time period, it shuts down. The shutdown ensures that you notice the disk failure. You can restart the storage system without replacing the disk, but it will continue to shut itself down at the specified intervals until the failed disk is replaced.
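The hourly countdown in step 4 can be illustrated with a short sketch that formats the two documented message shapes. It is hypothetical: the function names are invented, and the assumption that the first warning reports the full timeout and then counts down by one each hour is an illustration, not confirmed ONTAP timing.

```python
from typing import List, Optional


def broken_disk_warning(hours_left: int, data_disk_id: Optional[int] = None) -> str:
    """Format one hourly warning; data_disk_id=None means the parity disk broke."""
    if data_disk_id is None:
        return f"Parity disk is broken. Halting in {hours_left} hours."
    return f"Data disk {data_disk_id} is broken. Halting in {hours_left} hours."


def warning_schedule(timeout_hours: int, data_disk_id: Optional[int] = None) -> List[str]:
    """One warning per elapsed hour, counting down to the halt (hypothetical)."""
    return [broken_disk_warning(timeout_hours - h, data_disk_id) for h in range(timeout_hours)]


for line in warning_schedule(3, data_disk_id=5):
    print(line)
```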
- Starting with ONTAP 9.12.1 (Bug ID 944990, "System no longer halts if a RAID aggregate remains degraded for 24 hours"), the default behavior changed: the system no longer halts when an aggregate is completely degraded.
- To preserve the previous behavior, set the raid.timeout option to a nonzero value; the system then shuts down when the timeout period expires.
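The halt decision can be sketched as a simple function of how long the node has been completely degraded and the raid.timeout value in hours. This is an illustrative model only. It assumes, as the text above implies, that a raid.timeout of 0 disables the timed halt (the ONTAP 9.12.1 and later default), and the function name is hypothetical.

```python
def should_halt(hours_completely_degraded: float, raid_timeout_hours: int) -> bool:
    """Decide whether a completely degraded node halts (illustrative only)."""
    if raid_timeout_hours == 0:
        # Assumption drawn from the text: 0 disables the timed halt, which is
        # the ONTAP 9.12.1+ default behavior.
        return False
    return hours_completely_degraded >= raid_timeout_hours


print(should_halt(25, 24))   # True: pre-9.12.1 default, timeout expired
print(should_halt(23, 24))   # False: still inside the 24-hour window
print(should_halt(1000, 0))  # False: timed halt disabled
```

According to the console message in step 5, a reboot gives the system another full timeout period, so `hours_completely_degraded` is best read as hours since the last boot rather than hours since the disk failed.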