Disk missing on CVO causing system to panic
Applies to
- Cloud Volumes ONTAP (CVO)
- BlueXP (formerly Cloud Manager)
- Microsoft Azure
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Single Node or HA Pair
Issue
- One or more disks become unreachable due to an issue in the underlying infrastructure, causing a panic:
[Cluster-01: pha_remove000: mlm.array.lun.removed:notice]: Array LUN '0b.29' (00000000i3g268fHE60S) is no longer being presented to this node.
[Cluster-01: dmgr_thread: raid.disk.missing:info]: Disk /aggr04/plex0/rg0/0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] is missing from the system
[Cluster-01: config_thread: sk.panic:alert]: Panic String: aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: disk does not exist. in SK process config_thread on release 9.7P7 (C)
[Cluster-01: config_thread: sk.panic:alert]: params: {'reason': 'aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: adapter error prevents command from being sent to device. in SK process config_thread on release 9.7P7 (C)'}
- Under some circumstances, the system may instead panic with a "WAFL Hung" panic:
Panic String: WAFL hung for aggr1. in SK process wafl_exempt02 on release 9.9.0 (C)
- In AWS/GCP, this may result in a plex failure, and the node may come back in an "unknown" status. On Azure, it may result in a panic if disks (page blobs, in the case of an Azure HA root/data aggregate) are not reachable.
- A support case might be created automatically due to an "HA Group Notification (PARTNER DOWN, TAKEOVER IMPOSSIBLE) EMERGENCY" alert.
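When triaging EMS logs from many nodes, the raid.disk.missing event shown above can be matched with a simple pattern. The sketch below is illustrative only (the regex and helper name are assumptions, not a NetApp-provided tool); it extracts the disk path and serial number from lines of that shape:

```python
import re

# Illustrative pattern (an assumption, not an official EMS parser):
# matches the raid.disk.missing line and captures disk path and serial.
MISSING_DISK = re.compile(
    r"raid\.disk\.missing:info\]: Disk (?P<disk>\S+) "
    r"S/N \[(?P<serial>[^\]]+)\]"
)

def find_missing_disks(ems_lines):
    """Return (disk, serial) tuples for raid.disk.missing events."""
    hits = []
    for line in ems_lines:
        m = MISSING_DISK.search(line)
        if m:
            hits.append((m.group("disk"), m.group("serial")))
    return hits

log = [
    "[Cluster-01: dmgr_thread: raid.disk.missing:info]: Disk "
    "/aggr04/plex0/rg0/0b.29 S/N [00000000i3g268fHE60S] UID "
    "[00000000i3g268fHE60S] is missing from the system",
]
print(find_missing_disks(log))
# → [('/aggr04/plex0/rg0/0b.29', '00000000i3g268fHE60S')]
```

A hit confirms which disk the RAID layer lost sight of, which can then be cross-checked against the cloud provider's disk/page-blob attachment status.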