WAFL inconsistency due to medium errors on X412_S15K7560A15 drives
Applies to
- Drives X412_S15K7560A15 running firmware NA08
- WAFL user data inconsistency
Issue
- Unrecovered medium errors on multiple disks
[?] Thu Aug 26 00:49:34 CEST [node2: pmcsas_intrd_0: disk.ioMediumError:warning]: Medium error on disk 0b.12.16: op 0x28:00b73e00:0200 sector 12009271 SCSI:medium error - Unrecovered read error - If the disk is in a RAID group, the subsystem will attempt to reconstruct unreadable data (3 11 0 81) (6551) [NETAPP X412_S15K7560A15 NA08] S/N [3SL0PJ8Q0000904601FS]
- RAID has trouble reconstructing the stripe (while additional writes come in)
[?] Thu Aug 26 00:49:41 CEST [node2: raidio_thread: raid.tetris.media.err:notice]: Read error on Disk /Aggr1/plex0/rg2/0b.12.16 Shelf 12 Bay 16 [NETAPP X412_S15K7560A15 NA08] S/N [3SL0PJ8Q0000904601FS], block #1501159 during stripe write
[?] Thu Aug 26 00:49:41 CEST [node2: raidio_thread: raid.multierr.bad.block:critical]: Marking 'Disk /Aggr1/plex0/rg2/0b.12.16 Shelf 12 Bay 16 [NETAPP X412_S15K7560A15 NA08] S/N [3SL0PJ8Q0000904601FS]', block number 1501158, volume block number 8889490534, as a bad block.
[?] Thu Aug 26 00:49:45 CEST [node2: raidio_thread: raid.multierr.bad.missingBlk:debug]: Marking '/Aggr1/plex0/rg2', block number 1501158, volume block number 7025878758, as bad block.
- WAFL inconsistency errors reported
[?] Sun Aug 29 11:15:20 CEST [node2: wafl_hipri: wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at VBN 8895324051 (vvbn:1003059249 fbn:693434436 level:0) in public inode (fileid:104 snapid:0 file_type:15 disk_flags:0x841a error:120) in volume vol1.
[?] Sun Aug 29 11:15:20 CEST [node2: wafl_hipri: wafl.incons.userdata.vol:error]: WAFL inconsistent: volume vol1 has an inconsistent user data block. Note: Any new Snapshot copies might contain this inconsistency.
- If LUNs are impacted, they can report the following error
[?] Sun Aug 29 11:15:20 CEST [node2: scsitgt_admin: scsitarget.read.RAID.error:error]: Read of LUN /vol/vol1/lun1.lun failed due to a RAID error.
- System has drives with over 6 years of operating life
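
To check whether the medium errors really are spread across multiple disks of this model and firmware, the disk.ioMediumError messages can be tallied per disk. The following is a minimal Python sketch, not a NetApp tool: the regular expression is modeled only on the sample message shown above and may need adjusting for other ONTAP releases, and the function name count_medium_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the sample disk.ioMediumError line in this article.
# Assumption: other releases may format the message differently.
MEDIUM_ERROR = re.compile(
    r"disk\.ioMediumError:\w+\]: Medium error on disk (?P<disk>\S+):"
    r".*?\[NETAPP (?P<model>\S+) (?P<fw>\S+)\] S/N \[(?P<serial>[^\]]+)\]"
)


def count_medium_errors(lines):
    """Count medium-error events per (disk, model, firmware, serial)."""
    counts = Counter()
    for line in lines:
        match = MEDIUM_ERROR.search(line)
        if match:
            key = (match["disk"], match["model"], match["fw"], match["serial"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_medium_errors.py < saved_ems_log.txt
    for key, n in count_medium_errors(sys.stdin).most_common():
        disk, model, fw, serial = key
        print(f"{disk}  {model} {fw}  S/N {serial}: {n} medium error(s)")
```

Several distinct serial numbers that all report model X412_S15K7560A15 with firmware NA08 would match the pattern described in this article.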