What happens if MetroCluster loses intercluster network and CRS is unable to sync?
Applies to
- MetroCluster
- ONTAP 9
Answer
This article describes what happens in a MetroCluster configuration if the intercluster network used by the Config Replication Service (CRS) stops functioning and the clusters are unable to synchronize their configuration.
Configuration updates are replicated between clusters with nearly synchronous semantics.
A configuration update in the source cluster is committed to local storage before the update is acknowledged to the user. Thus the capture of a configuration update has locally synchronous semantics.
A captured configuration update is then sent to the destination cluster asynchronously and applied to the destination cluster as soon as possible. Thus the real-time replication of a configuration update is nearly synchronous in the destination cluster.
On switchover, the surviving (destination) cluster is able to read the storage used by the disaster-stricken (source) cluster. Configuration updates captured and not yet sent by the source cluster are read and applied by the destination cluster during a switchover operation. Thus the captured and not yet sent configuration updates are replicated to the destination cluster as if the replication had been done synchronously. All the configuration updates are applied remotely at the time switchover completes.
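The capture, asynchronous send, and switchover replay described above can be sketched as a toy model. This is only an illustration of the semantics, not ONTAP code: the `ToyCluster` class, its attributes, and the sample update strings are all hypothetical names introduced here.

```python
from collections import deque


class ToyCluster:
    """Hypothetical model of one MetroCluster site; not an ONTAP API."""

    def __init__(self, name):
        self.name = name
        self.local_log = []    # updates committed to local storage
        self.unsent = deque()  # captured but not yet sent to the peer
        self.applied = []      # updates applied on behalf of the peer

    def commit(self, update):
        # Locally synchronous: persist first, then acknowledge.
        self.local_log.append(update)
        self.unsent.append(update)
        return "ack"

    def replicate_once(self, peer):
        # Asynchronous send: ship one pending update, if there is one.
        if self.unsent:
            peer.applied.append(self.unsent.popleft())

    def switchover_from(self, failed):
        # The survivor reads the failed site's storage and replays
        # every update that was captured there but never sent.
        while failed.unsent:
            self.applied.append(failed.unsent.popleft())


site_a = ToyCluster("A")
site_b = ToyCluster("B")
for update in ["create vol1", "create lif1", "modify policy1"]:
    site_a.commit(update)

site_a.replicate_once(site_b)   # only the first update is sent in time
site_b.switchover_from(site_a)  # site A fails; site B replays the rest
```

In this sketch, site B ends up holding every update that site A committed, in the same order, even though only one of them was sent before the failure. That is the sense in which the replication behaves as if it had been synchronous.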
MetroCluster can thus guarantee that, after switchover, the configuration updates made in the source cluster will have been replicated to the destination cluster as if the replication had been done remotely with real-time synchronous semantics. Even if site A fails immediately after configuration updates are made there, those updates should always be present at site B after switchover.
MetroCluster might fail to meet this guarantee if the recent configuration updates committed to the storage used by the source cluster cannot be read by the destination cluster on switchover. This can happen if the ISLs connecting the FC switches in the back-end fabric are down (or the links flap up and down) for an extended time while the network used for CRS is also down.
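The failure case can be illustrated with a small, self-contained sketch. It assumes a simplified model in which a CRS network outage means no updates are sent, and an ISL outage means the survivor cannot read the recent updates from the source cluster's storage. The function name and parameters are hypothetical and do not correspond to any ONTAP interface.

```python
def updates_after_switchover(committed, sent_count, storage_readable):
    """Return the updates the surviving cluster holds after switchover.

    committed:        updates committed at the source, in order
    sent_count:       how many of them CRS delivered before the failure
    storage_readable: whether the survivor can read the source's storage
    """
    applied = list(committed[:sent_count])
    if storage_readable:
        # Captured-but-unsent updates are replayed during switchover.
        applied.extend(committed[sent_count:])
    return applied


committed = ["u1", "u2", "u3"]

# CRS delivered one update; storage is readable, so nothing is lost.
healthy = updates_after_switchover(committed, 1, True)

# CRS network down (nothing sent) and ISLs down (storage unreadable).
degraded = updates_after_switchover(committed, 0, False)
missing = [u for u in committed if u not in degraded]
```

Under these assumptions, `healthy` contains all three updates, while `missing` lists every update made during the double outage. Either path alone is enough to preserve the updates; only losing both puts them at risk.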