What is Media Scan on E-Series storage systems?

Last updated

Mar 29, 2024

Category:: e-series-systems

Specialty:: esg

Applies to

E-Series Controller Firmware 6.xx
E-Series Controller Firmware 7.xx
E-Series Controller Firmware 8.xx

Answer

Media scan is a background process that, when enabled, runs during idle time to check the drives that make up a volume.
- It verifies that sectors are readable and, if redundancy check is enabled, also checks RAID parity for consistency.
- Any unreadable sectors or data/parity mismatches it finds are reported to the Major Event Log (MEL), so that the user is aware of them.
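The parity part of the redundancy check can be illustrated with a short sketch. It assumes a simple RAID 5-style layout in which parity is the byte-wise XOR of the data strips in a stripe, and it caps reporting at 10 mismatches to mirror the MEL table below. The names `xor_strips`, `check_redundancy`, and the message text are hypothetical and do not reflect controller firmware or real MEL entries.

```python
from functools import reduce

# Per the MEL table in this article, only the first 10 mismatches on a volume are reported.
MAX_REPORTED_MISMATCHES = 10


def xor_strips(strips: list[bytes]) -> bytes:
    """XOR equal-length strips byte by byte (RAID 5-style parity, an assumption)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def check_redundancy(stripes: list[tuple[list[bytes], bytes]]) -> list[str]:
    """Return hypothetical MEL-style messages for stripes whose parity does not match."""
    messages = []
    mismatches = 0
    for index, (data_strips, parity) in enumerate(stripes):
        if xor_strips(data_strips) != parity:
            mismatches += 1
            # Keep counting, but only report the first few mismatches.
            if mismatches <= MAX_REPORTED_MISMATCHES:
                messages.append(f"Redundancy mismatch in stripe {index}")
    return messages


good = ([b"\x01\x02", b"\x04\x08"], b"\x05\x0a")
bad = ([b"\x01\x02", b"\x04\x08"], b"\x00\x00")
print(check_redundancy([good, bad]))  # ['Redundancy mismatch in stripe 1']
```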
The scan runs at a fixed rate derived from the selected scan interval.
- For example, if a 30-day interval is selected when enabling it (the interval is configurable), the volume is scanned at a rate that would complete one full pass in 30 days.
- Because media scan runs only during idle time and gives priority to host I/O, the actual time to complete a pass can be longer than the selected interval.
- When a pass completes, the scan automatically starts over, so the drives are checked continuously in the background.
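The relationship between interval, capacity, and idle time is simple arithmetic, sketched below. The model is a simplifying assumption: the scan keeps a constant rate while the system is idle and pauses completely otherwise, with no catch-up. The function names and the 10 TiB / 75% idle figures are hypothetical examples, not values from E-Series documentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def scan_rate_mib_per_s(capacity_gib: float, interval_days: float) -> float:
    """Average rate needed to read capacity_gib once every interval_days."""
    return capacity_gib * 1024 / (interval_days * SECONDS_PER_DAY)


def pass_duration_days(interval_days: float, idle_fraction: float) -> float:
    """Estimated wall-clock time for one pass if the scan runs only while idle.

    Assumes the scan keeps its fixed rate while idle and pauses otherwise.
    """
    if not 0 < idle_fraction <= 1:
        raise ValueError("idle_fraction must be in (0, 1]")
    return interval_days / idle_fraction


# A hypothetical 10 TiB (10,240 GiB) volume on a 30-day interval:
print(round(scan_rate_mib_per_s(10240, 30), 2))  # 4.05
# If the system is idle 75% of the time, one pass stretches to:
print(pass_duration_days(30, 0.75))  # 40.0
```

The point of the second function is only that a busier system stretches each pass; the controller's actual pacing may differ.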
The limitation of this approach is that an error is not discovered until the scan reaches the part of the drive that contains it.
- Thus, if a drive develops bad sectors or corruption one day after a region was scanned, the problem is not detected until the next pass reaches that region (or until the error is encountered during some other operation).
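The detection delay can be estimated under an assumed model: the scan walks the volume's LBAs in order at a constant rate and wraps to LBA 0 after each pass. Whether the controller actually scans in strict LBA order is not stated here, so treat `days_until_scanned` as a hypothetical illustration of why a defect that appears just behind the scan position can go unnoticed for almost a full interval.

```python
def days_until_scanned(cursor_lba: int, defect_lba: int, total_lbas: int,
                       interval_days: float) -> float:
    """Days until a linear, wrapping scan at a constant rate next reaches defect_lba.

    cursor_lba is the block the scan will read next; the scan restarts at LBA 0
    after the last block, since media scan starts over after each pass.
    """
    if not (0 <= cursor_lba < total_lbas and 0 <= defect_lba < total_lbas):
        raise ValueError("LBAs must lie within the volume")
    blocks_to_go = (defect_lba - cursor_lba) % total_lbas
    return interval_days * blocks_to_go / total_lbas


# The scan has just passed LBA 500 when that block goes bad:
print(days_until_scanned(501, 500, 1000, 30))  # 29.97 (nearly a full pass)
# A defect ahead of the scan position is found sooner:
print(days_until_scanned(100, 600, 1000, 30))  # 15.0
```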
The performance impact on host I/O is negligible.
- Media scan pauses to give priority to host I/O, but the first host I/O may see a very slight delay in response time while the controller switches from media scan to servicing it.
- For most purposes, this delay is not noticeable.
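The priority rule can be sketched as a tick-based simulation: on each tick the controller serves a pending host I/O if there is one, and only advances the scan when the queue is empty. This is a hypothetical model (`schedule` and the tick abstraction are not controller internals); it simply counts the scan-to-host switches, which are the points where the slight initial delay described above can occur.

```python
def schedule(host_arrivals: dict[int, int], total_ticks: int) -> tuple[list[str], int]:
    """Simulate a controller that always serves host I/O before media scan work.

    host_arrivals maps a tick number to how many host I/Os arrive on that tick.
    Returns what the controller did on each tick and how many times it had to
    switch from scanning to host I/O (where a slight delay can occur).
    """
    pending = 0
    timeline: list[str] = []
    switches = 0
    for tick in range(total_ticks):
        pending += host_arrivals.get(tick, 0)
        if pending:
            pending -= 1
            if timeline and timeline[-1] == "scan":
                switches += 1
            timeline.append("host")
        else:
            timeline.append("scan")  # idle tick: advance the media scan
    return timeline, switches


print(schedule({1: 2}, 5))  # (['scan', 'host', 'host', 'scan', 'scan'], 1)
```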
Media Scan errors as reported in the MEL

| Reported error | Description | Result |
|---|---|---|
| Unrecovered media error | The data could not be read on the first attempt or on either of the 2 subsequent retries. | The controller makes up to 3 read attempts; if any attempt succeeds, the data is returned to the host. If all 3 attempts fail, error correction is attempted through VDD Repair, except on RAID 0 volumes, which have no redundancy. |
| Recovered media error | The drive could not read the requested data on its first attempt but succeeded on a subsequent attempt. | The data is rewritten to the drive and verified. |
| Redundancy mismatches | The redundancy check found data that does not match its parity or redundancy information. | Only the first 10 redundancy mismatches found on a logical drive are reported. Run operating system data checking operations on the affected volume. |
| Unfixable error | The data could not be read, and parity or redundancy information could not be used to regenerate it. | An error is reported. |
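The read-retry path in the first two rows can be sketched as follows. The sketch assumes a read either returns the sector's bytes or `None` on failure; `read_with_retries`, the outcome strings, and the callable-based interface are hypothetical and only mirror the rules in the table (3 attempts in total, no repair on RAID 0).

```python
from typing import Callable, Optional

MAX_READ_ATTEMPTS = 3  # the first read plus 2 retries, as described in the table


def read_with_retries(read_sector: Callable[[], Optional[bytes]],
                      raid_level: int) -> tuple[str, Optional[bytes]]:
    """Return (outcome, data) for one sector read, following the table's rules."""
    for attempt in range(MAX_READ_ATTEMPTS):
        data = read_sector()
        if data is not None:
            # A success after a failed first attempt is a recovered media error.
            return ("ok" if attempt == 0 else "recovered media error", data)
    if raid_level == 0:
        # RAID 0 has no redundancy, so there is nothing to repair from.
        return ("unrecovered media error, no redundancy", None)
    return ("unrecovered media error, attempt VDD Repair", None)


attempts = iter([None, b"\x2a"])
print(read_with_retries(lambda: next(attempts), raid_level=5))
# ('recovered media error', b'*')
```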

Additional Information

VDD Repair:

VDD Repair starts by reading the remaining data in the RAID stripe, plus the stripe's parity.
It then uses that data and parity to calculate the data that resided in the unreadable sector of the drive.
If the data is successfully reconstructed, it is returned to the host.
After a successful reconstruction, VDD Repair issues a 'Write Verify' SCSI operation, which writes the reconstructed data to the unreadable sector and then immediately reads it back.
In the background, this Write Verify to the 'bad' sector causes the drive firmware to reallocate the physical sector, transparently to the E-Series controllers.
If VDD Repair fails (the data cannot be reconstructed because a read from another drive in the stripe also failed, as can happen in RAID 5, or because the RAID group is degraded and lacks enough redundancy), the affected LBA in the RAID volume is marked as an 'Unreadable Sector' (recorded in the USM log), and an error is returned to the host. At this point, the data at that LBA is lost.
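The repair sequence above can be sketched for the single-parity case. This assumes a RAID 5-style stripe where parity is the byte-wise XOR of the data strips, models an unreadable strip as `None`, and uses a caller-supplied callable as a stand-in for the SCSI Write Verify. The names `vdd_repair`, `xor_blocks`, and the `unreadable_lbas` set (a stand-in for the USM log) are hypothetical, not E-Series internals.

```python
from functools import reduce
from typing import Callable, Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def vdd_repair(data_strips: list[Optional[bytes]], parity: Optional[bytes],
               bad_index: int, lba: int,
               write_verify: Callable[[int, bytes], None],
               unreadable_lbas: set[int]) -> Optional[bytes]:
    """Rebuild data_strips[bad_index] from the rest of a RAID 5-style stripe.

    Returns the rebuilt block for the host, or None when the data is lost.
    """
    others = [s for i, s in enumerate(data_strips) if i != bad_index]
    if parity is None or any(s is None for s in others):
        # Another bad read, or no parity (degraded group): nothing to rebuild from.
        unreadable_lbas.add(lba)  # stand-in for marking an Unreadable Sector
        return None
    rebuilt = xor_blocks([s for s in others if s is not None] + [parity])
    # Stand-in for Write Verify: write the rebuilt data, then read it back.
    write_verify(bad_index, rebuilt)
    return rebuilt


d0, d1, d2 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
parity = xor_blocks([d0, d1, d2])
written: dict[int, bytes] = {}
usm_log: set[int] = set()


def record_write(index: int, block: bytes) -> None:
    written[index] = block  # records what the Write Verify would have written


print(vdd_repair([d0, None, d2], parity, 1, 1000, record_write, usm_log) == d1)  # True
print(vdd_repair([None, None, d2], parity, 1, 2000, record_write, usm_log))  # None
print(usm_log)  # {2000}
```

RAID 6 uses a second, non-XOR syndrome and can survive one more failed read; that case is deliberately left out of this sketch.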