Disk missing on CVO causing system to panic
Applies to
- Cloud Volumes ONTAP (CVO)
- BlueXP (formerly Cloud Manager)
- Microsoft Azure
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Single Node or HA Pair
Issue
- One or more disks become unreachable due to an issue in the underlying infrastructure, causing a panic:
[Cluster-01: pha_remove000: mlm.array.lun.removed:notice]: Array LUN '0b.29' (00000000i3g268fHE60S) is no longer being presented to this node.
[Cluster-01: dmgr_thread: raid.disk.missing:info]: Disk /aggr04/plex0/rg0/0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] is missing from the system
[Cluster-01: config_thread: sk.panic:alert]: Panic String: aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: disk does not exist. in SK process config_thread on release 9.7P7 (C)
[Cluster-01: config_thread: sk.panic:alert]: params: {'reason': 'aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: adapter error prevents command from being sent to device. in SK process config_thread on release 9.7P7 (C)'}
- Under some circumstances, the system may instead panic with a "WAFL Hung" panic:
Panic String: WAFL hung for aggr1. in SK process wafl_exempt02 on release 9.9.0 (C)
- In AWS/GCP, this may result in a plex failure, and the node may come back in an "unknown" status. On Azure, it may result in a panic if disks (page blobs, in the case of an Azure HA root/data aggregate) are not reachable.
- A support case might be created automatically due to an "HA Group Notification (PARTNER DOWN, TAKEOVER IMPOSSIBLE) EMERGENCY" alert.
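When triaging EMS logs from many nodes, the raid.disk.missing event shown above can be matched with a simple pattern. The sketch below is illustrative only (the regex and helper name are assumptions, not a NetApp-provided tool); it extracts the disk path and serial number from lines of that shape:

```python
import re

# Illustrative pattern (an assumption, not an official EMS parser):
# matches the raid.disk.missing line and captures disk path and serial.
MISSING_DISK = re.compile(
    r"raid\.disk\.missing:info\]: Disk (?P<disk>\S+) "
    r"S/N \[(?P<serial>[^\]]+)\]"
)

def find_missing_disks(ems_lines):
    """Return (disk, serial) tuples for raid.disk.missing events."""
    hits = []
    for line in ems_lines:
        m = MISSING_DISK.search(line)
        if m:
            hits.append((m.group("disk"), m.group("serial")))
    return hits

log = [
    "[Cluster-01: dmgr_thread: raid.disk.missing:info]: Disk "
    "/aggr04/plex0/rg0/0b.29 S/N [00000000i3g268fHE60S] UID "
    "[00000000i3g268fHE60S] is missing from the system",
]
print(find_missing_disks(log))
# → [('/aggr04/plex0/rg0/0b.29', '00000000i3g268fHE60S')]
```

A hit confirms which disk the RAID layer lost sight of, which can then be cross-checked against the cloud provider's disk/page-blob attachment status.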