NetApp Knowledge Base

How does Data ONTAP respond when a medium error occurs during a disk reconstruction?

Category:
data-ontap-7
Specialty:
hw

Applies to

Data ONTAP

Answer

  • Media scrub proactively reads all the disks to detect and fix media errors before they cause problems during a reconstruction or turn into double errors.
  • Although this significantly reduces the number of problems caused by such errors, it cannot prevent all of them.
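
To illustrate why scrubbing ahead of time matters, here is a minimal Python sketch of a scrub pass over a single-parity (RAID 4 style) group using XOR parity. The disk layout, the use of None to stand for a block that returns a medium error, and the `xor_blocks` and `scrub` helpers are hypothetical simplifications, not Data ONTAP code. A stripe with one unreadable block can be rebuilt from the remaining blocks; a stripe with two cannot. During a reconstruction the failed disk already accounts for one missing block in every stripe, so a latent medium error left on a surviving disk turns that stripe into an unrecoverable double error.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together (single-parity arithmetic)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def scrub(disks):
    """Read every stripe of a parity group and repair what parity allows.

    disks[d][s] holds the block of disk d in stripe s; None simulates a
    block that returns a medium error. The last disk holds XOR parity.
    Returns (repaired, unrecoverable) stripe counts.
    """
    repaired = unrecoverable = 0
    for s in range(len(disks[0])):
        stripe = [disk[s] for disk in disks]
        bad = [d for d, block in enumerate(stripe) if block is None]
        if len(bad) == 1:
            # XOR of all surviving blocks (data + parity) recreates the lost one.
            survivors = [block for block in stripe if block is not None]
            disks[bad[0]][s] = xor_blocks(survivors)
            repaired += 1
        elif len(bad) > 1:
            # Two missing blocks in one stripe: single parity cannot recover them.
            unrecoverable += 1
    return repaired, unrecoverable


# Usage: two data disks plus one parity disk, two stripes each.
data0 = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
data1 = [b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]
parity = [xor_blocks([a, b]) for a, b in zip(data0, data1)]
disks = [data0, data1, parity]

disks[1][0] = None               # latent medium error on disk 1, stripe 0
print(scrub(disks))              # (1, 0)
print(disks[1][0] == b"\x0f\x0e\x0d\x0c")  # True
```

A real scrub also verifies parity on stripes where every block is readable; the sketch only models the read-and-repair part.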

If a medium error does occur during a reconstruction, Data ONTAP follows this recovery sequence:

  1. Data ONTAP marks the affected volume or aggregate as 'inconsistent' and ignores the medium error.
  2. Data ONTAP then attempts to start 'wafliron' on that volume or aggregate.
     • If wafliron cannot be started, Data ONTAP tries to restrict (unmount) the volume.
     • If both attempts fail, the storage system panics with a message similar to:

PANIC: raid volfsm: vol vol_8TB_u33: fatal multi-disk error. in process config_thread

  3. On the next boot, the volume that caused the panic is restricted (not mountable).
     • If it is the root volume, the storage system does not boot.
     • If it is a non-root volume, the storage system boots with that volume restricted.

Note: At this point, Data ONTAP does not allow the storage system to boot with this volume online, because the medium error could cause metadata corruption.

  4. The user can either start wafliron manually on the affected volume, or reboot the storage system and run wafl_check on the affected volume.
  5. After the reconstruction completes, a scrub is started to clear any further double errors.
  6. After the scrub completes, 'ignore medium error mode' is cleared on the volume.
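
The branching in this sequence can be summarized as a small decision flow. The following Python sketch only models the steps listed above: the `Volume` class, its fields, and the `wafliron_starts` / `restrict_succeeds` inputs are hypothetical stand-ins for what actually happens on a storage system. Treating step 1 as entering 'ignore medium error mode' is an assumption inferred from step 6, which clears that mode.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical model of a volume; not a Data ONTAP data structure."""
    name: str
    is_root: bool
    state: str = "online"
    inconsistent: bool = False
    ignore_medium_errors: bool = False


def on_medium_error(vol: Volume, wafliron_starts: bool, restrict_succeeds: bool) -> str:
    """Steps 1-2: react to a medium error hit during reconstruction.

    wafliron_starts / restrict_succeeds are hypothetical inputs that stand in
    for whether each action succeeds on a real system.
    """
    vol.inconsistent = True            # step 1: mark the volume inconsistent...
    vol.ignore_medium_errors = True    # ...and ignore the medium error (assumed mode)
    if wafliron_starts:                # step 2: try to start wafliron
        return "wafliron"
    if restrict_succeeds:              # otherwise try to restrict (unmount) it
        vol.state = "restricted"
        return "restricted"
    return "panic"                     # both failed: fatal multi-disk error


def next_boot(vol: Volume) -> str:
    """Step 3: after a panic, the volume that caused it comes up restricted."""
    vol.state = "restricted"
    return "no boot" if vol.is_root else "boot, volume restricted"


def after_scrub(vol: Volume) -> None:
    """Step 6: once the post-reconstruction scrub completes, leave the mode."""
    vol.ignore_medium_errors = False


# Usage: a non-root volume where neither wafliron nor restrict succeeds.
vol = Volume("vol_8TB_u33", is_root=False)
print(on_medium_error(vol, wafliron_starts=False, restrict_succeeds=False))  # panic
print(next_boot(vol))  # boot, volume restricted
```

Steps 4 and 5 (running wafliron or wafl_check, then the scrub) are manual or background operations and are left out of the model.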

Unrecoverable data is replaced with zero-filled blocks. Some applications recognize these zeroed blocks as bad data if that data is ever read.
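
An application that wants to detect such blocks could scan its data for blocks that are entirely zero. The sketch below assumes a 4 KiB block size aligned to the start of the file; `find_zeroed_blocks` is a hypothetical helper, not part of any NetApp tool. It is only a heuristic: data that is legitimately all zeros (for example, preallocated or sparse regions) is reported too, so a match means "check this block", not "this block was lost".

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size and alignment; adjust to the application


def find_zeroed_blocks(path, block_size=BLOCK_SIZE):
    """Return the byte offsets of blocks in `path` that are entirely zero."""
    zero = bytes(block_size)
    offsets = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A short final block is compared against the same number of zeros.
            if block == zero[:len(block)]:
                offsets.append(offset)
            offset += len(block)
    return offsets


# Usage: one data block, one zeroed block, then a short non-zero tail.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 4096 + bytes(4096) + b"B" * 100)
try:
    print(find_zeroed_blocks(tmp.name))  # [4096]
finally:
    os.remove(tmp.name)
```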

 
