CONTAP-207605: RAID detected lost writes across multiple drives
Issue
- Example ONTAP event messages:
Mon Jan 01 01:02:03 +0100 [node_name: disk_server_0: disk_lostwriteDetected_1:error]: params: {'diskName': '0n.1', 'bno': '5250', 'vol': 'aggr_name', 'fileid': '-1', 'block': '0', 'cksum': '0xd0215f58', 'cksum2': '0x725c56f9'}
Mon Jan 01 01:02:03 +0100 [node_name: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 0n.1 (S/N A123BC4D567890) error 10000h
Mon Jan 01 01:02:03 +0100 [node_name: disk_server_0: shm.disk.lostWriteError:error]: shm: Disk 0n.1 has detected and recovered from a lost write error; the system will fail the disk if possible.
- This issue can result in a WAFL inconsistency. Example messages:
Mon Jan 02 01:02:03 +0100 [node_name: wafl_exempt07: callhome_wafl_inconsistent_block_1:alert]: params: {'subject': 'WAFL INCONSISTENT BLOCK'}
Mon Jan 02 01:02:03 +0100 [node_name: wafl_exempt17: callhome_wafl_inconsistent_user_block_1:alert]: params: {'subject': 'WAFL INCONSISTENT USER DATA BLOCK'}
Mon Jan 02 01:02:03 +0100 [node_name: wafl_exempt15: sk_panic_1:alert]: params: {'reason': 'Unrecoverable metadata block (file 73, block 34356874, fbn 51, level 0, file type 1) in volume partner:vol_name. WAFL inconsistent. Contact NetApp technical support. in SK process wafl_exempt15 on release 9.13.1P4 (C)'}
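The symptoms above can be screened for in a saved copy of the event log. The Python sketch below is not a NetApp tool. It assumes the message layout shown in the examples ("[node: process: event:severity]: body") and recognizes only the event names that appear there: disk_lostwriteDetected_1, callhome_wafl_inconsistent_*, and sk_panic_1 when its reason mentions "WAFL inconsistent". The function names scan_ems and nodes_with_multi_disk_lost_writes are hypothetical. The sketch groups lost-write events by node and flags nodes where more than one distinct disk reported a lost write, which is the "across multiple drives" pattern in the title.

```python
import re
from collections import defaultdict

# Assumed header layout, taken from the example messages:
#   <timestamp> [<node>: <process>: <event>:<severity>]: <body>
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): (?P<event>[^:\]]+):(?P<sev>\w+)\]: (?P<body>.*)$"
)
# Extracts the disk name from the params dictionary of a lost-write event.
DISK_NAME = re.compile(r"'diskName': '(?P<disk>[^']+)'")


def scan_ems(lines):
    """Hypothetical helper: return (node -> disks with lost-write events, WAFL inconsistency lines)."""
    lost_write_disks = defaultdict(set)
    wafl_alerts = []
    for line in lines:
        m = EMS_HEADER.search(line)
        if m is None:
            continue  # line is not in the assumed EMS layout
        event, body = m.group("event"), m.group("body")
        if event == "disk_lostwriteDetected_1":
            d = DISK_NAME.search(body)
            if d:
                lost_write_disks[m.group("node")].add(d.group("disk"))
        elif event.startswith("callhome_wafl_inconsistent") or (
            event == "sk_panic_1" and "WAFL inconsistent" in body
        ):
            wafl_alerts.append(line.strip())
    return dict(lost_write_disks), wafl_alerts


def nodes_with_multi_disk_lost_writes(lost_write_disks):
    """Hypothetical helper: keep only nodes that logged lost writes on more than one distinct disk."""
    return {node: disks for node, disks in lost_write_disks.items() if len(disks) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python scan_ems.py < saved_ems.log
    disks, alerts = scan_ems(sys.stdin)
    for node, names in nodes_with_multi_disk_lost_writes(disks).items():
        print(f"{node}: lost writes on {len(names)} disks: {sorted(names)}")
    for alert in alerts:
        print(f"WAFL inconsistency message: {alert}")
```

A match only indicates that the pattern from this entry is present in the log; it does not confirm the issue.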