NetApp Knowledge Base

AWS or GCP CVO rebooted due to multiple disks missing

Category:
cloud-volumes-ontap-cvo
Specialty:
cloud

Applies to

  • Cloud Volumes ONTAP (CVO)
  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)

Issue

  • An AWS or GCP CVO node reboots, and the surviving HA partner sends an AutoSupport message: HA Group Notification (MULTIPLE DISKS MISSING) ERROR.
  • The surviving node's EMS logs show that it has lost access to its mirrored Pool1 disks, which are attached to the failed node:

Mon Jun 03 16:23:02 +0000 [CVO-01: monitor: monitor.globalStatus.critical:EMERGENCY]: This node has taken over CVO-02. One or more mirrored aggregates are degraded.

Mon Jun 03 16:22:35 +0000 [CVO-01: dmgr_thread: raid.disk.missing:info]: Disk /aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] UID [00000000V9NeubcHXfRG] is missing from the system
Mon Jun 03 16:22:35 +0000 [CVO-01: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] UID [00000000V9NeubcHXfRG] is missing.

Note: These errors are logged for every disk owned by the affected node, CVO-02.
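
When many disks are reported missing at once, it can help to pull the disk paths and serial numbers out of the EMS log in one pass. The following is a minimal Python sketch, not a NetApp tool. The regular expression is inferred only from the two sample lines above, and the function name `missing_disks` is hypothetical.

```python
import re

# Pattern inferred from the sample raid.disk.missing line above (an assumption,
# not an official EMS grammar). It captures the reporting node, the disk path
# (aggregate/plex/RAID group/disk), and the serial number.
MISSING_DISK = re.compile(
    r"\[(?P<node>[^:\]]+): [^:]+: raid\.disk\.missing:\w+\]: "
    r"Disk (?P<path>\S+) S/N \[(?P<serial>[^\]]+)\]"
)

def missing_disks(lines):
    """Return (reporting node, disk path, serial) for each raid.disk.missing event."""
    found = []
    for line in lines:
        match = MISSING_DISK.search(line)
        if match:
            found.append((match["node"], match["path"], match["serial"]))
    return found

# The two EMS lines quoted above; only the first is a raid.disk.missing event.
sample = [
    "Mon Jun 03 16:22:35 +0000 [CVO-01: dmgr_thread: raid.disk.missing:info]: "
    "Disk /aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] "
    "UID [00000000V9NeubcHXfRG] is missing from the system",
    "Mon Jun 03 16:22:35 +0000 [CVO-01: config_thread: "
    "raid.config.filesystem.disk.missing:info]: File system Disk "
    "/aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] "
    "UID [00000000V9NeubcHXfRG] is missing.",
]

for node, path, serial in missing_disks(sample):
    print(f"{node} lost {path} (S/N {serial})")
```

The disk path includes the plex name (plex1 in the sample), so the output makes it easy to confirm that every missing disk belongs to the mirrored plex rather than the local one.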

  • The storage failover show output reports Previous giveback failed in module: raid, as seen below:

::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
CVO-01         CVO-02         false    Previous giveback failed in module:
                                             raid
CVO-02         CVO-01         -        Waiting for giveback
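
The State Description column wraps onto a second line, which makes the output awkward to check in a script. The sketch below shows one way to parse captured output and rejoin the wrapped text. It assumes the column layout shown above and that continuation lines start with whitespace; `parse_failover_show` is a hypothetical helper, not part of any NetApp SDK.

```python
SAMPLE = """\
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
CVO-01         CVO-02         false    Previous giveback failed in module:
                                             raid
CVO-02         CVO-01         -        Waiting for giveback
"""

def parse_failover_show(text):
    """Map node name -> (partner, takeover possible, state description)."""
    rows = {}
    last = None
    in_body = False
    for raw in text.splitlines():
        raw = raw.rstrip()
        if raw.startswith("---"):
            in_body = True  # data rows begin after the dashed separator
            continue
        if not in_body or not raw:
            continue
        if raw[0].isspace() and last is not None:
            # Indented line: wrapped remainder of the previous description.
            partner, possible, state = rows[last]
            rows[last] = (partner, possible, f"{state} {raw.strip()}")
            continue
        node, partner, possible, state = raw.split(None, 3)
        rows[node] = (partner, possible, state)
        last = node
    return rows

for node, (partner, possible, state) in parse_failover_show(SAMPLE).items():
    print(f"{node}: takeover possible={possible}, state={state}")
```

With the wrapped line rejoined, a script can match on the full string "Previous giveback failed in module: raid" instead of a truncated fragment.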

  • EMS logs (the errors below may repeat until the RAID resync completes):

Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: gb.cfo.abort.raid.fm:error]: Aggregate local:aggr8 is being resynced; canceling giveback.
Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.rsrc.givebackVeto:alert]: Failover monitor: raid: giveback canceled due to active state.
Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.fsm.autoGivebackVetoed:error]: Failover monitor: Automatic giveback has been deferred due to long running operations
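
Because these messages can repeat for as long as the resync runs, a per-aggregate tally shows which aggregates are still vetoing giveback. This is a rough Python sketch under the assumption that every veto follows the gb.cfo.abort.raid.fm wording shown above; `resync_vetoes` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the gb.cfo.abort.raid.fm line format quoted above (assumed stable).
RESYNC_ABORT = re.compile(
    r"gb\.cfo\.abort\.raid\.fm:\w+\]: Aggregate (?P<aggr>\S+) is being resynced"
)

def resync_vetoes(lines):
    """Count giveback cancellations per aggregate that is still resyncing."""
    counts = Counter()
    for line in lines:
        match = RESYNC_ABORT.search(line)
        if match:
            counts[match["aggr"]] += 1
    return counts

# The three EMS lines quoted above; only the first names an aggregate.
sample = [
    "Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: gb.cfo.abort.raid.fm:error]: "
    "Aggregate local:aggr8 is being resynced; canceling giveback.",
    "Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.rsrc.givebackVeto:alert]: "
    "Failover monitor: raid: giveback canceled due to active state.",
    "Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.fsm.autoGivebackVetoed:error]: "
    "Failover monitor: Automatic giveback has been deferred due to long running operations",
]

for aggr, count in resync_vetoes(sample).items():
    print(f"{aggr}: giveback canceled {count} time(s)")
```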

  • Shortly after this event, the following AutoSupport alerts may be generated as a residual symptom of missing disks:

HA Group Notification (SYNCMIRROR PLEX FAILED) ALERT

NODEOQ:HA Group Notification from CVO-02 (NODE(S) OUT OF CLUSTER QUORUM) EMERGENCY

  • After the node reboots, it reestablishes connectivity to the presented AWS / GCP disks, and giveback completes successfully.

