NetApp Knowledgebase

What does "lag" mean for SnapMirror/SnapVault?

Category: snapmirror
Specialty: dp

Applies to

  • Clustered Data ONTAP 8
  • ONTAP 9
  • Data ONTAP 8 7-Mode
  • Data ONTAP 7 and earlier
  • SnapMirror

Answer

The word 'lag' simply means "the amount of time passed between two events".  Because of this definition, 'lag' is typically associated with performance.  However, 'lag' is often misunderstood in the context of replication and SnapMirror/SnapVault relationships.  The typical understanding of lag is the amount of time since the last successful update.  While this is not completely incorrect, it does not account for two other factors:

  • Time configured on the controllers
  • Duration of the transfer

How time is configured is important because that is the timestamp inherited by the snapshots and file system.  If time is configured incorrectly, timestamps will be inaccurate.  Since lag is calculated using timestamps, lag will also be incorrect in this case.
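As a rough illustration of why clock configuration matters, the sketch below models lag as a plain timestamp subtraction and shows how a clock error carries straight into the result. The function name `reported_lag`, the dates, and the 10-minute clock error are assumptions made up for this example; this is not how ONTAP is implemented.

```python
from datetime import datetime, timedelta

def reported_lag(snapshot_created: datetime, controller_now: datetime) -> timedelta:
    """Lag as the controller sees it: its own current time minus the snapshot timestamp."""
    return controller_now - snapshot_created

# Hypothetical values chosen only for illustration.
snapshot_created = datetime(2020, 1, 1, 9, 0)   # timestamp carried by the snapshot
actual_time = datetime(2020, 1, 1, 9, 30)       # the real wall-clock time
clock_error = timedelta(minutes=10)             # assumed: controller clock runs 10 minutes fast

accurate = reported_lag(snapshot_created, actual_time)
skewed = reported_lag(snapshot_created, actual_time + clock_error)

print(accurate)  # 0:30:00
print(skewed)    # 0:40:00
```

In this model, whatever error exists in the clock appears one-for-one in the reported lag.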

The duration of the transfer is also often overlooked because of the nature of replication.  Lag is not measured starting from the time the last transfer completes.
Lag is measured starting from the time the snapshot was first created on the Source/Primary controller.  The two may seem similar, but the difference can be significant.

Consider the following SnapMirror scenario:
Source              Destination
ControllerA:vol_1   ControllerB:vol_1_mir

A scheduled update is started @ 12:00pm.
A SnapMirror snapshot is created on the Source volume and a transfer is started.  The transfer takes 45 minutes to complete.  It is now 12:46pm; the transfer completed 1 minute ago.

Q:  What should the "lag" be for the relationship?
A:  The 'lag' in this scenario is 46 minutes (not 1 minute).

On the Destination controller, lag is measured by subtracting the creation time of the snapshot from the controller's current time.
Hence, if the time is not configured correctly on the Destination (or the Source) controller, the 'lag' time could be incorrect.
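The arithmetic for this SnapMirror scenario can be sketched as follows. The calendar date and the variable names are assumptions for illustration; only the times (12:00pm start, 45-minute transfer, 12:46pm now) come from the scenario above.

```python
from datetime import datetime, timedelta

# Times from the scenario; the calendar date itself is arbitrary.
snapshot_created = datetime(2020, 1, 1, 12, 0)   # snapshot taken when the update starts
transfer_duration = timedelta(minutes=45)
now = datetime(2020, 1, 1, 12, 46)

transfer_completed = snapshot_created + transfer_duration

# Lag is counted from snapshot creation, not from transfer completion.
lag = now - snapshot_created
since_completion = now - transfer_completed

print(lag)               # 0:46:00
print(since_completion)  # 0:01:00
```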

Consider the following SnapVault scenario:
Primary          Secondary
CIFS_SVM:vol_1   CIFS_DR:vol_1_dr


As per the snapshot policy on vol_1, a snapshot is created @ 5pm with the snapmirror-label of 'sv_daily'. 
A SnapVault update configured for the 'sv_daily' label is triggered at the scheduled time @ 1am the following morning.  The transfer takes 30 minutes to complete. 

Q:  What should the lag be for this relationship at the end of the transfer? 
A:  For this SnapVault scenario, the lag time is 8 hours, 30 minutes.
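The same subtraction applies to this SnapVault scenario, with the gap between snapshot creation and the scheduled update added on top of the transfer time. In the sketch below, the calendar dates and the helper `format_lag` are hypothetical and exist only to make the arithmetic visible.

```python
from datetime import datetime, timedelta

def format_lag(lag: timedelta) -> str:
    """Render a timedelta as whole hours and minutes."""
    total_minutes = int(lag.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"

# Times from the scenario; the calendar dates are arbitrary.
snapshot_created = datetime(2020, 1, 1, 17, 0)   # 'sv_daily' snapshot @ 5pm
update_triggered = datetime(2020, 1, 2, 1, 0)    # scheduled update @ 1am next morning
transfer_completed = update_triggered + timedelta(minutes=30)

lag_at_completion = transfer_completed - snapshot_created
print(format_lag(lag_at_completion))  # 8 hours, 30 minutes
```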

Again, lag is calculated using the time when the snapshot was created on the Primary, not when the transfer completed on the Secondary.
This is because in replication, the version of the file system is what matters.  Snapshot copies are point-in-time references and represent the file system when the snapshot was created.
The snapshot is the data being protected; therefore, the time when it is created is the point-in-time we use when calculating lag.
This effect can be magnified in other configurations like cascades, where the same version of the file system is replicated to many controllers.
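To see how a cascade can magnify the effect, consider a minimal sketch in which one snapshot travels through two hops. The A -> B -> C topology, the name ControllerC, and all of the times are assumptions invented for this example; the point is only that every hop measures lag against the same original creation time.

```python
from datetime import datetime, timedelta

snapshot_created = datetime(2020, 1, 1, 12, 0)  # created once, on the first source

# Hypothetical completion times for each hop of an assumed A -> B -> C cascade.
hop_completed = {
    "ControllerB": datetime(2020, 1, 1, 12, 45),
    "ControllerC": datetime(2020, 1, 1, 13, 30),
}

# Lag at the moment each hop finishes, measured from the original snapshot.
lag_at_completion = {name: done - snapshot_created for name, done in hop_completed.items()}

for name, lag in lag_at_completion.items():
    print(name, lag)
```

Under these assumed times, the last controller in the chain would show 1 hour 30 minutes of lag at the instant its transfer finishes, even though nothing has gone wrong.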

Keep in mind, lag is simply an indicator.  Like all indicators, it can alert you to a problem if understood correctly in context.  
Misunderstanding lag can lead to mistaking normal operations for a problem.  By itself, a high amount of lag for a given relationship does not necessarily indicate there is an issue.

Additional Information