Host memory checksum mismatch on disk WRITE VERIFY
Applies to
- FAS storage nodes (AFF nodes not affected)
- ONTAP 9.8 or 9.9.1
- Local FabricPool configuration with the local tier and the cloud tier on the same node
- HA pairs with one node hosting the local tier and the HA partner node hosting the cloud tier (affected on takeover)
Issue
- A tiering policy other than none is assigned to a volume and tiering to the cloud tier is active.
- A node reboots unexpectedly with the PANIC message:
Host memory checksum mismatch on WRITE VERIFY: Disk <disk_ID>, Disk Block #XXXX: Volume <Volume_name>, FileId XXX,File Block #XXX: Expected 0xYYYYYYYY, Recomputed as 0xZZZZZZZZ in SK process disk_server_0 on release 9.X (C)
- A takeover can lead to the PANIC if it brings the cloud tier and the local (performance) tier onto one node.
- After the partner node issues an HA takeover, it can experience the same unexpected reboot, leading to an HA pair outage.
- If each node of an HA pair owns only the local tier or only the cloud tier, the PANIC is triggered only after a takeover is issued.
- RAID scrubbing after the PANIC uncovers parity errors referring to the cloud tier aggregate:
[node-02: raidio_thread: raid_rg_scrub_cksum_err_1:notice]: params: {'disk_rpm': '10000', 'vendor': 'NETAPP ', 'firmware_revision': 'NA01', 'shelf': '23', 'disk_info': 'Disk /<cloud_tier_aggregate>/plex0/rg1/0c.23.8 Shelf 23 Bay 8 [NETAPP X343_TA15E1T8A10 NA01] S/N [XXX] UID [5000039B:3840A21C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]', 'site': 'Local', 'bay': '8', 'carrier': '', 'serialno': 'XXX', 'owner': '', 'model': 'X343_TA15E1T8A10', 'disk_type': '4', 'blockNum': '17612'}
[node-02: raidio_thread: raid_rg_readerr_repair_cksum_stored_1:notice]: params: {'disk_rpm': '10000', 'vendor': 'NETAPP ', 'firmware_revision': 'NA01', 'shelf': '23', 'disk_info': 'Disk /<cloud_tier_aggregate>/plex0/rg1/0c.23.8 Shelf 23 Bay 8 [NETAPP X343_TA15E1T8A10 NA01] S/N [XXX] UID [5000039B:3840A21C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]', 'site': 'Local', 'bay': '8', 'carrier': '', 'serialno': 'XXX', 'owner': '', 'model': 'X343_TA15E1T8A10', 'disk_type': '4', 'blockNum': '17612'}
- Inconsistent user data blocks are detected, referring to a <volume_name> on the local tier aggregate:
[node-01: wafl_exempt12: wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at VBN XXX (vvbn:XXX fbn:XXX level:0) in public inode (fileid:XXX snapid:0 file_type:15 disk_flags:0x8402 error:120 raid_set:1) in volume <volume_name>@vserver:<Vserver_UUID>.
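
When searching collected logs for this signature, the PANIC string quoted above can be matched and broken into its fields. The following is a minimal Python sketch, not a NetApp tool: the function name parse_panic and the regular expression are assumptions derived only from the message format shown in this article, and other releases may word the message differently.

```python
import re

# Pattern built from the PANIC text quoted in this article (an assumption:
# the exact wording may vary between ONTAP releases).
PANIC_RE = re.compile(
    r"Host memory checksum mismatch on WRITE VERIFY: "
    r"Disk (?P<disk>\S+), Disk Block #(?P<disk_block>\w+): "
    r"Volume (?P<volume>\S+), FileId (?P<file_id>\w+),\s*"
    r"File Block #(?P<file_block>\w+): "
    r"Expected (?P<expected>0x[0-9A-Fa-f]+), "
    r"Recomputed as (?P<recomputed>0x[0-9A-Fa-f]+)"
)


def parse_panic(line):
    """Return the PANIC fields as a dict, or None if the line does not match.

    parse_panic is a hypothetical helper name used for illustration only.
    """
    match = PANIC_RE.search(line)
    return match.groupdict() if match else None
```

Calling parse_panic on each line of a saved log returns a dictionary with the disk, volume, and both checksum values for matching lines, which makes it easier to correlate the affected disk with the later scrub and WAFL inconsistency messages.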