NetApp Knowledge Base

Takeover caused by multiple disk failures

Category:
metrocluster
Specialty:
7dot

Applies to

  • Data ONTAP (7-Mode) 8.2.5P5
  • FAS6250
  • 2 node Fabric attached MetroCluster

Issue

Multiple disk-related errors are logged in the messages file:
  • 'Invalid checksum entry during write operation' on multiple disks
  • 'Orphaning disk because not in consistent label set (CLS)' on multiple disks
  • 'Orphaning disk because it is more recent than the calculated plex consistent label set' on multiple disks
  • 'SYNCMIRROR PLEX FAILED' AutoSupport messages are triggered
  • 'diskown.ownerReservationMismatch' errors
 
Examples:
Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.  
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS). 
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.  
Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline  
Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.  
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP   X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.  
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
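
When many of these messages arrive within a few seconds, it can help to tally them by event name to see which errors dominate. The following is a minimal Python sketch, not a NetApp tool. It assumes each line follows the `timestamp [node:event.name:severity]: text` layout seen in the examples above. The names `EMS_LINE`, `parse_ems_line`, and `count_events` are hypothetical and introduced only for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the example lines above:
#   <timestamp> [<node>:<event.name>:<severity>]: <message text>
EMS_LINE = re.compile(
    r"^(?P<timestamp>.+?) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<text>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events(lines):
    """Count how often each event name occurs; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        parsed = parse_ems_line(line)
        if parsed is not None:
            counts[parsed["event"]] += 1
    return counts


# A few lines taken from the examples above, plus one console line that
# does not follow the bracketed layout and is therefore ignored.
SAMPLE_LINES = [
    "Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: "
    "Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, "
    "because not in consistent label set (CLS).",
    "Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: "
    "Plex /aggr_Node01_data/plex1 has failed.",
    "Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: "
    "Plex /aggr_Node01_data/plex1 has failed.",
    "HALT: HA partner has taken over disk reservations",
]

if __name__ == "__main__":
    for event, count in count_events(SAMPLE_LINES).most_common():
        print(f"{count:3d}  {event}")
```

The same functions could be pointed at a saved copy of the messages log by passing an open file object to `count_events`, since it only needs an iterable of lines.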
 
Shortly after these errors first appear, the partner takes over the node because of its degraded state.
 
Example:
A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: ddhhmmss
System rebooting...
 

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.