CHW-2354: NVMe flash cache goes offline, potentially causing in-memory corruption of file system metadata
Issue
- NVMe flash cache reports an NVMe read context mismatch and then goes offline
::Fri May 17 15:03:31 +0000 [node-01: irq78: nvme1:io12: extCache.io.readError:notice]: WAFL external cache I/O read error: Unexpected nvme read context mismatch. Mismatch field: context checksum, expected: 3425374128, received: 0, code 1519064567.
::Fri May 17 15:03:31 +0000 [node-01: irq78: nvme1:io12: extCache.io.extInfo:notice]: Failed block information : fbn: 11105, pvbn: 70965920790, vvbn: 24296618993, inum: 78, ecbn: 1519064567, vol: vol1, aggr: aggr1.
::Fri May 17 15:03:55 +0000 [node-01: gemini-notif-thread: callhome.nvme.offline:alert]: Callhome for Flash Cache NVMe Offline.
- A few minutes later, a WAFL inconsistent error (wafl.raid.incons.buf) is logged
::Fri May 17 15:08:59 +0000 [node-01: wafl_exempt03: wafl.raid.incons.buf:error]: WAFL inconsistent: bad block at VBN 26672591486 (vvbn:8653375619 fbn:265118 level:0) in private inode (fileid:73 snapid:0 fixable:1 file_type:1 disk_flags:0x2 error:119 raid_set:1) in volume vol1@vserver:cd390d00-9817-11ee-8b48-00a098433dd8.
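To check a saved copy of the event log for this sequence, a small script along the following lines may help. This is only a sketch: the three event names are taken from the excerpts above, while the function names, the regular expression, and the assumption that every message header has the form "[node: ... event.name:severity]:" are illustrative and not part of any vendor tooling.

```python
import re

# Event names taken from the log excerpts above, in the order they appear.
SIGNATURE = (
    "extCache.io.readError",
    "callhome.nvme.offline",
    "wafl.raid.incons.buf",
)

# Assumed header shape: "[node: ... event.name:severity]:" (inferred from the excerpts).
HEADER = re.compile(
    r"\[(?P<node>[^:\]\s]+):[^\]]*?\s(?P<event>[\w.]+):(?P<severity>\w+)\]:"
)


def parse_events(lines):
    """Yield (node, event, severity) for each line that matches the assumed header."""
    for line in lines:
        m = HEADER.search(line)
        if m:
            yield m.group("node"), m.group("event"), m.group("severity")


def matches_signature(lines, signature=SIGNATURE):
    """Return the set of nodes on which the signature events appear in order."""
    progress = {}  # node -> index of the next signature event to look for
    for node, event, _severity in parse_events(lines):
        idx = progress.get(node, 0)
        if idx < len(signature) and event == signature[idx]:
            progress[node] = idx + 1
    return {node for node, idx in progress.items() if idx == len(signature)}
```

Calling `matches_signature` on the lines of an exported log returns the nodes that logged the read error, the offline callhome, and the WAFL inconsistency in that order. The check ignores how much time passes between the events, so a match is a prompt to review the log manually rather than a confirmation of this issue.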