HA pair down due to multi-disk failure
Applies to
AFF A250
Issue
- Automatic node shutdown after two disks failed while a third was still reconstructing, which left the reconstruction stalled ("reconstruct stalled"):
Aggregate aggregate1 (failed, raid_dp, partial, fast zeroed) (block checksums)
Plex /aggregate1/plex0 (offline, failed, inactive)
RAID group /aggregate1/plex0/rg0 (partial, block checksums)
RAID Disk Device  HA  SHELF BAY CHAN Pool Type    RPM Used (MB/blks)   Phys (MB/blks)
--------- ------- ------------- ---- ---- ------- --- ---------------- ----------------
dparity   0n.0P2  0n  0     0        0    SSD-NVM N/A 867680/222126208 867688/222128256 (reconstruct stalled)
parity    0n.1P2  0n  0     1        0    SSD-NVM N/A 867680/222126208 867688/222128256 (fast zeroed)
data      FAILED  N/A                                 867680/ -
data      0n.3P2  0n  0     3        0    SSD-NVM N/A 867680/222126208 867688/222128256
...
data      0n.19P2 0n  0     19       0    SSD-NVM N/A 867680/222126208 867688/222128256
data      FAILED  N/A                                 867680/ -
Raid group is missing 2 disks.
Mon Jul 05 2021 09:16:31 GMT [node_name1: statd: monitor.brokendisk.notice:NOTICE]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system, it will run for another 24 hours before shutting down.
- Multi-disk panic:
Panic_Message: aggr aggregate1: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state DEGRADEDRECONS. 1 disk failed in the gr...
- WAFL inconsistency error:
Sat Jul 03 02:27:13 +0000 [node_name1: wafl_exempt00: wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at VBN 729078144 (vvbn:69395562 fbn:69395562 level:0) in private inode (fileid:container snapid:0 file_type:6 disk_flags:0xc10000800800143 error:120 raid_set:1) in volume volume_name@vserver:ab0123c4-56de-78fg-9hi0-j123kl45m6n7.
Sat Jul 03 02:27:13 +0000 [node_name1: wafl_exempt00: wafl.incons.userdata.vol:alert]: WAFL inconsistent: volume volume_name@vserver:ab0123c4-56de-78fg-9hi0-j123kl45m6n7 has an inconsistent user data block. Note: Any new Snapshot copies might contain this inconsistency.
Sat Jul 03 02:27:13 +0000 [node_name1: wafl_exempt00: callhome.wafl.inconsistent.user.block:alert]: Call home for WAFL INCONSISTENT USER BLOCK
- PCIe link errors for NVMe SSD disks:
Fri Jun 25 2021 11:30:52 GMT [node_name1: kernel: nvme.link.error:ERROR]: PCIe link initialization error for NVMe SSD in slot 22.
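When comparing a collected log against the symptoms listed above, a small script can help confirm which of them are present. The following is a minimal Python sketch, not a NetApp tool. It assumes the EMS lines keep the "[node: process: event:severity]" layout shown in the excerpts and that RAID group rows are whitespace-separated as shown. The function names and symptom labels are hypothetical and chosen only for illustration. The panic string is not an EMS line in this layout, so the sketch does not match it.

```python
import re
from collections import Counter

# Event names are copied from the log excerpts in this article.
# The symptom labels on the right are illustrative, not NetApp terminology.
SYMPTOM_EVENTS = {
    "monitor.brokendisk.notice": "two broken disks, 24-hour shutdown notice",
    "wafl.raid.incons.userdata": "WAFL inconsistent user data block",
    "wafl.incons.userdata.vol": "WAFL inconsistent user data block",
    "callhome.wafl.inconsistent.user.block": "WAFL inconsistent user data block",
    "nvme.link.error": "NVMe PCIe link error",
}

# Matches the "[node: process: event.name:severity]" tag of an EMS line.
EMS_TAG = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): (?P<event>[^:\]]+):(?P<sev>[^\]]+)\]"
)


def classify_ems_lines(lines):
    """Count how often each symptom from this article appears in EMS lines."""
    seen = Counter()
    for line in lines:
        match = EMS_TAG.search(line)
        if match and match.group("event") in SYMPTOM_EVENTS:
            seen[SYMPTOM_EVENTS[match.group("event")]] += 1
    return seen


def count_failed_disks(raid_rows):
    """Count RAID group rows whose Device column reads FAILED."""
    failed = 0
    for row in raid_rows:
        fields = row.split()
        is_disk_row = len(fields) >= 2 and fields[0] in ("data", "parity", "dparity")
        if is_disk_row and fields[1] == "FAILED":
            failed += 1
    return failed
```

Feeding the EMS lines of a log file to classify_ems_lines and the RAID group rows to count_failed_disks gives a quick indication of whether a system shows the same combination of symptoms. A failed-disk count of 2 in a raid_dp group corresponds to the "Raid group is missing 2 disks." line in the output above.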