Cloud Provider Regional or Availability Zone outages result in CVO issues
Applies to
- Cloud Manager (Service Connector)
- Cloud Volumes ONTAP (CVO)
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
- Availability Zone (AZ)
Issue
- Cloud providers such as AWS, Azure, or GCP occasionally experience a power or network outage affecting a region or a single AZ. These unplanned outages affect CVO health partially or completely, depending on where the CVO node(s) are located.
- A power outage in the datacenter hosting the affected VM instances can cause unexpected node reboots and take aggregates offline. The following errors can be seen in the AutoSupport logs:
Sat Jan 10 09:42:18 -0800 [CVO: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
Sat Jan 10 09:43:11 -0800 [CVO: ThreadHandlerun: clam.takeover:info]: Local node (name=CVO, ID=0000) initiated takeover of partner node (name=CVO, ID=000, state=1) result=failed, type=panic.
Sat Jan 10 09:43:11 -0800 [CVO: kltp: clam.node.ooq:EMERGENCY]: Node (name=CVO, ID=000) is out of "CLAM quorum" (reason=heartbeat failure).
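
As a rough illustration, the three events in the excerpt above could be picked out of an exported log with a short script. The following Python sketch is not a NetApp tool or API: the function name `find_outage_events` and the regular expression are assumptions made for illustration, inferred only from the bracketed `[node: process: event:severity]` format visible in the excerpt.

```python
import re

# Event names copied from the log excerpt in this article.
OUTAGE_EVENTS = (
    "cf.fsm.takeover.noHeartbeat",
    "clam.takeover",
    "clam.node.ooq",
)

# Assumed header layout, inferred from the excerpt:
#   [node: process: event.name:severity]
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]"
)


def find_outage_events(text):
    """Return (event, severity) pairs for the outage-related events in text.

    Hypothetical helper: it scans the whole text rather than line by line,
    so it also works when several messages are run together on one line.
    """
    hits = []
    for match in EMS_HEADER.finditer(text):
        if match.group("event") in OUTAGE_EVENTS:
            hits.append((match.group("event"), match.group("severity")))
    return hits


# Sample input taken from the excerpt in this article.
SAMPLE = (
    "Sat Jan 10 09:42:18 -0800 [CVO: cf_main: cf.fsm.takeover.noHeartbeat:alert]: "
    "Failover monitor: Takeover initiated after no heartbeat was detected "
    "from the partner node.\n"
    "Sat Jan 10 09:43:11 -0800 [CVO: ThreadHandlerun: clam.takeover:info]: "
    "Local node (name=CVO, ID=0000) initiated takeover of partner node "
    "(name=CVO, ID=000, state=1) result=failed, type=panic.\n"
    "Sat Jan 10 09:43:11 -0800 [CVO: kltp: clam.node.ooq:EMERGENCY]: "
    'Node (name=CVO, ID=000) is out of "CLAM quorum" (reason=heartbeat failure).\n'
)

if __name__ == "__main__":
    for event, severity in find_outage_events(SAMPLE):
        print(f"{severity:>9}  {event}")
```

Matching on the event name rather than the free-text message keeps the check stable if the message wording differs between releases. The list of event names is limited to those shown in this article and may need extending for other symptoms.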
