NetApp Knowledgebase

How does Data ONTAP respond when a medium error occurs during a disk reconstruction?

Category:
data-ontap-7
Specialty:
hw

Applies to

Data ONTAP

Answer

Media scrub proactively reads all disks to detect and fix media errors before they cause problems during a reconstruction or lead to double errors. Although this significantly reduces the number of issues caused by such errors, it cannot prevent all of them.

When a medium error does occur during a reconstruction, the recovery sequence is as follows:

  1. Data ONTAP marks the affected volume or aggregate as 'inconsistent' and ignores the medium error.
  2. Data ONTAP then attempts to start 'wafliron' on that volume or aggregate. If this does not succeed, it tries to restrict (unmount) the volume. If both attempts fail, the storage system panics with a message similar to the following:

PANIC: raid volfsm: vol vol_8TB_u33: fatal multi-disk error. in process config_thread

  3. On the next boot, the volume that caused the panic is restricted (not mountable). If it is a root volume, the storage system does not boot. If it is a non-root volume, the storage system boots with the volume restricted.

Note: Data ONTAP does not allow the storage system to boot with this volume online at this point, because the medium error might cause metadata corruption.

  4. The user can manually start wafliron on the affected volume, or reboot the storage system and run wafl_check on the affected volume.
  5. After the reconstruction completes, a scrub is started to clear any further double errors.
  6. After the scrub completes, the 'ignore medium error mode' is cleared on the volume.
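The branching in the sequence above can be summarized as a small decision model. The following Python sketch is illustrative only: it is not Data ONTAP code, and the names `Outcome`, `respond_to_medium_error`, and `next_boot_state` are hypothetical. It assumes the volume has already been marked 'inconsistent' and captures only the fallback order described above (wafliron, then restrict, then panic) and the boot behavior after a panic.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of the automatic response (hypothetical names)."""
    WAFLIRON_RUNNING = "wafliron started on the inconsistent volume"
    VOLUME_RESTRICTED = "volume restricted (unmounted)"
    PANIC = "storage system panic"


def respond_to_medium_error(wafliron_starts: bool, restrict_succeeds: bool) -> Outcome:
    """Model the fallback order: try wafliron, then restrict, then panic."""
    if wafliron_starts:
        return Outcome.WAFLIRON_RUNNING
    if restrict_succeeds:
        return Outcome.VOLUME_RESTRICTED
    return Outcome.PANIC


def next_boot_state(is_root_volume: bool) -> str:
    """Model the boot that follows a panic: the affected volume stays restricted."""
    if is_root_volume:
        return "system does not boot"
    return "system boots, volume restricted"


if __name__ == "__main__":
    # Both automatic attempts fail on a non-root volume.
    print(respond_to_medium_error(False, False).value)
    print(next_boot_state(is_root_volume=False))
```

The model makes one point easy to see: a panic occurs only when both automatic attempts fail, and the outcome of the following boot depends solely on whether the affected volume is the root volume.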

The unrecoverable data is replaced with zeroed blocks. At least some applications recognize zeroed blocks as bad data if they ever need that data.
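To illustrate why a medium error during reconstruction can leave data unrecoverable, here is a minimal Python sketch. It assumes a simple single-parity XOR stripe, which this article does not specify, and it is not how Data ONTAP is implemented. The names `xor_blocks`, `reconstruct_block`, and `BLOCK_SIZE` are hypothetical. A surviving block that cannot be read is represented as `None`, and the rebuilt block is zero-filled in that case.

```python
from functools import reduce
from typing import List, Optional, Tuple

BLOCK_SIZE = 4  # tiny illustrative block size (assumption, not a real value)


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_block(survivors: List[Optional[bytes]]) -> Tuple[bytes, bool]:
    """Rebuild the failed disk's block for one stripe.

    `survivors` holds the blocks read from the remaining disks (data and
    parity). None marks a read that returned a medium error. With single
    parity, one unreadable survivor means the stripe has two unknowns, so
    the block cannot be rebuilt and a zeroed block is returned instead.
    """
    if any(block is None for block in survivors):
        return bytes(BLOCK_SIZE), False
    return xor_blocks(survivors), True


if __name__ == "__main__":
    d0 = b"\x01\x02\x03\x04"
    d1 = b"\x10\x20\x30\x40"
    parity = xor_blocks([d0, d1])

    # The disk holding d1 has failed; all survivors are readable.
    print(reconstruct_block([d0, parity]))

    # Same failure, but reading d0 hits a medium error.
    print(reconstruct_block([None, parity]))
```

In the second call the function returns a zeroed block together with a `False` flag. The flag exists only in this sketch. An application reading the real volume sees just the zeroed data, which is why it matters whether the application can recognize such blocks as bad.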