How does Data ONTAP respond when a medium error occurs during a disk reconstruction?


Category:: data-ontap-7

Specialty:: HW

Applies to

Data ONTAP

Answer

Media scrub proactively reads all disks to detect and fix media errors before they cause problems during reconstruction or lead to double errors.
Although this significantly reduces the number of issues caused by such errors, it cannot prevent all of them.

When a medium error does occur during reconstruction, the recovery sequence is as follows:

Data ONTAP marks the volume or aggregate involved as 'inconsistent' and ignores the medium error.
Next, Data ONTAP attempts to start 'wafliron' on that volume or aggregate.

If this does not succeed, Data ONTAP tries to restrict (unmount) the volume.
If both attempts fail, the storage system panics with a message similar to:

PANIC: raid volfsm: vol vol_8TB_u33: fatal multi-disk error. in process config_thread
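
When reviewing console logs, it can help to pull the affected volume name out of this panic string. The following is a minimal sketch, not a NetApp tool: the regular expression is an assumption derived only from the single example message above, and the function name panicked_volume is hypothetical.

```python
import re
from typing import Optional

# Assumed pattern, based only on the example panic message shown above.
# The colon after the volume name is treated as optional.
PANIC_RE = re.compile(
    r"PANIC: raid volfsm: vol (?P<volume>[^\s:]+):? fatal multi-disk error\."
)


def panicked_volume(log_line: str) -> Optional[str]:
    """Return the volume name from a matching panic line, or None."""
    match = PANIC_RE.search(log_line)
    return match.group("volume") if match else None


example = (
    "PANIC: raid volfsm: vol vol_8TB_u33: fatal multi-disk error. "
    "in process config_thread"
)
print(panicked_volume(example))
```

Lines that do not match the assumed pattern yield None, so the function can be applied to every line of a log without extra filtering.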

On the next boot, the volume that caused the panic will be restricted (not mountable).

If it is a root volume, the storage system will not boot.
If it is a non-root volume, the storage system will boot with the volume restricted.

Note: Data ONTAP will not allow the storage system to boot with this volume online at this point, as the medium error might cause metadata corruption.
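
The fallback order and the boot outcome described above can be summarized as a small decision model. This is only an illustrative sketch of the logic stated in this article; it does not call any Data ONTAP interface, and all names in it (Outcome, respond_to_medium_error, state_after_panic_reboot) are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    WAFLIRON_STARTED = "wafliron started on the volume or aggregate"
    VOLUME_RESTRICTED = "volume restricted (unmounted)"
    PANIC = "storage system panics"


def respond_to_medium_error(wafliron_starts: bool, restrict_succeeds: bool) -> Outcome:
    """Model the fallback order after a medium error during reconstruction.

    The volume or aggregate is first marked inconsistent and the medium
    error is ignored; that step is unconditional, so it is not modeled
    as a branch here.
    """
    if wafliron_starts:
        return Outcome.WAFLIRON_STARTED
    if restrict_succeeds:
        return Outcome.VOLUME_RESTRICTED
    return Outcome.PANIC


def state_after_panic_reboot(is_root_volume: bool) -> str:
    """Model the boot result after the panic, per the two cases above."""
    if is_root_volume:
        return "storage system does not boot"
    return "storage system boots with the volume restricted"


print(respond_to_medium_error(wafliron_starts=False, restrict_succeeds=False))
print(state_after_panic_reboot(is_root_volume=False))
```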

The user can manually start wafliron on the affected volume, or reboot the storage system and run wafl_check on the affected volume.
After the reconstruction completes, a scrub is started to clear any further double errors.
After the scrub completes, the 'ignore medium error mode' will be cleared on the volume.

The unrecoverable data is replaced with zeroed blocks. Some applications recognize zeroed blocks as bad data if that data is ever read.
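
For data whose format never legitimately contains all-zero regions, a host-side check can flag blocks that may have been zero-filled. The sketch below is an assumption-laden illustration, not a NetApp utility: the 4096-byte block size and the function name zeroed_block_offsets are assumptions, and a zeroed block is only a hint, since files can contain zeros for legitimate reasons.

```python
from typing import List

BLOCK_SIZE = 4096  # assumed block size in bytes; adjust as needed


def zeroed_block_offsets(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return byte offsets of full-size blocks that contain only zero bytes."""
    offsets = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Skip a trailing partial block; any() is False when every byte is 0.
        if len(block) == block_size and not any(block):
            offsets.append(offset)
    return offsets


sample = b"\x01" * BLOCK_SIZE + b"\x00" * BLOCK_SIZE + b"\x02" * 100
print(zeroed_block_offsets(sample))
```

An application with its own checksums or record headers is better placed to decide whether a flagged block is really bad data.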

Additional Information

For more information, see the KB article "Panic string: raid volfsm: vol volume_name fatal multi-disk error. in process config_thread on release NetApp Release 7.x.x".