
FAQ: Consistency Point

Applies to

  • Data ONTAP 7
  • Data ONTAP 8
  • ONTAP 9

Answer

What is a Consistency Point, and why does NetApp use it?

A storage controller has two separate memory buffers for storing write data. The size of these buffers is based on the amount of nonvolatile RAM (NVRAM) in a given system: 1/2 of the NVRAM size for stand-alone systems, 1/4 for clustered systems, and 1/8 for 4- and 8-node MetroCluster configurations. All writes into a storage controller are stored concurrently in the following locations:

  • Local memory buffer
  • Local NVRAM/NVMEM
  • Remote NVRAM (Clustered systems only)
  • Remote NVRAM of the DR partner (MetroCluster systems only)

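The sizing above amounts to a simple fraction of installed NVRAM. As a rough illustrative sketch only (the function and system-type labels are hypothetical, not an ONTAP command or API), it could be expressed as:

  # Hypothetical helper illustrating the per-buffer NVRAM sizing described above.
  # The fractions (1/2, 1/4, 1/8) come from this article; all names are illustrative.
  def usable_nvram_per_buffer(nvram_size_gb: float, system_type: str) -> float:
      fractions = {
          "standalone": 1 / 2,    # stand-alone system
          "clustered": 1 / 4,     # clustered system
          "metrocluster": 1 / 8,  # 4- or 8-node MetroCluster
      }
      return nvram_size_gb * fractions[system_type]

  # Example: a clustered node with 16 GB of NVRAM logs roughly 4 GB per buffer.
  print(usable_nvram_per_buffer(16, "clustered"))  # 4.0
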
As soon as the write is confirmed as logged to NVRAM, the controller acknowledges the write as completed to the client. At predefined triggers, this buffered write data is processed from the storage controller memory through the Write Anywhere File Layout (WAFL) and RAID layers and written to disk. The active file system pointers on disk are not updated to point to the new locations until the write to disk is complete.

Upon completion of a write to disk, the contents of NVRAM are cleared and made ready for the next batch of incoming write data. This act of writing data to disk and updating active file system pointers is called a Consistency Point (CP).

While one memory buffer is being written to disk, the second memory buffer and the NVRAM space (both local and remote) are used to store and acknowledge incoming writes.
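
A minimal sketch of this two-buffer write path, assuming simplified in-memory stand-ins for the memory buffers and the NVRAM log (all names are hypothetical; this is not ONTAP code):

  # Illustrative model of the two-buffer write path described above (not ONTAP code).
  class WritePath:
      def __init__(self):
          # Two memory buffers, each backed by its own section of the NVRAM log.
          self.buffers = [[], []]
          self.nvlog = [[], []]
          self.current = 0   # side currently accepting writes

      def write(self, data) -> str:
          self.buffers[self.current].append(data)   # stage in local memory
          self.nvlog[self.current].append(data)     # log to NVRAM (mirrored to the partner on clustered systems)
          return "ACK"   # acknowledged to the client once the NVRAM log entry is confirmed

      def consistency_point(self):
          # Swap sides: the other buffer keeps accepting and acknowledging writes
          # while this one is processed through WAFL/RAID and written to disk.
          flushing, self.current = self.current, 1 - self.current
          self.flush_through_wafl_and_raid(self.buffers[flushing])
          self.buffers[flushing] = []   # memory buffer becomes reusable
          self.nvlog[flushing] = []     # NVRAM section cleared only after the CP completes

      def flush_through_wafl_and_raid(self, data):
          # Placeholder: write blocks to new disk locations, then update the
          # active file system pointers as the last step.
          pass

  wp = WritePath()
  print(wp.write("block-1"))   # ACK, returned after the NVRAM log entry is made
  wp.consistency_point()       # flush logged data; the other buffer keeps accepting writes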

What are the benefits of Consistency Points versus Direct Writes?

The following characteristics are some of the primary benefits of using Consistency Points instead of passing writes directly to disk:

  • WAFL file system integrity: Even during a power loss, the WAFL file system remains in a consistent state. All operations acknowledged by the controller to the clients are confirmed consistent and are preserved intact. This allows for rapid boot-up no matter how the node was taken down, since WAFL file system consistency checking is not required.
  • Increase in client write performance: Since memory is significantly faster than spinning media, the end client receives faster acknowledgement of its writes and can continue processing the next write request. A performance gain is also attained on the disk subsystem since the write operations to disk can be grouped together into larger sequential writes instead of many smaller random writes.
  • Optimized data layout: Since the writes to disk are processed in bulk, the WAFL subsystem can allocate larger swaths of data in a contiguous space on disk. This minimizes the possibility of data layout randomization, and thus prevents the need for data reallocation.
  • Decreased requirement for non-volatile RAM (NVRAM) / memory buffer size: This is not obvious at first; however, logging write requests as they arrive consumes considerably less memory than would be required if the caching were done after processing by the Write Anywhere File Layout (WAFL) and RAID layers. This also prevents a failure of NVRAM from resulting in a damaged filesystem that normal file system consistency checks can neither detect nor correct.
Write interruption scenario

In the event of a power failure or other disruption during the write process, filesystem consistency is still maintained because the active pointers still point to the pre-change data. The write that was interrupted is still available in local NVRAM upon next boot. This data is checked for and, when found, is replayed into the local memory buffer, immediately re-processed through the WAFL and RAID layers, and written to disk during the next CP.

This next boot could occur when power is restored, when hardware failures are corrected, or, in a clustered scenario, when the node boots in takeover on the partner head. Since all write data for one node is also stored in the partner controller’s NVRAM, when the takeover occurs and the downed node boots virtually on the partner, all the writes that had been acknowledged are available for it to replay into its memory buffer, process through the WAFL and RAID layers, and write to disk. In this clustered environment, when the giveback is performed and the original node boots, it will find the NVRAM contents from when it went down. However, instead of replaying this log into memory and processing it for writing to disk, the node is aware that it had been taken over and that this data was already processed. The NVRAM contents are therefore cleared, since the data was already written to disk.
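
A minimal sketch of the boot-time decision described above, under the simplifying assumption that the node can tell whether a takeover partner already replayed its log (all names are hypothetical):

  # Illustrative boot-time NVRAM handling (hypothetical names, not ONTAP code).
  def handle_nvram_on_boot(nvlog, was_taken_over: bool):
      if not nvlog:
          return                            # nothing outstanding; clean shutdown
      if was_taken_over:
          nvlog.clear()                     # partner already replayed and committed this data
          return
      for entry in nvlog:                   # replay acknowledged writes into memory
          replay_to_memory_buffer(entry)
      run_consistency_point()               # process through WAFL/RAID and write to disk
      nvlog.clear()                         # clear only after the data is safely on disk

  # Placeholders for the steps named in the article.
  def replay_to_memory_buffer(entry):
      pass

  def run_consistency_point():
      pass

  handle_nvram_on_boot(["block-1", "block-2"], was_taken_over=False)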

WAFL attempts to place blocks that are likely to be accessed together in locations close together on disk. This is achieved by buffering multiple write requests into memory and logging the changes to the NVRAM. After a certain interval (normally 10 seconds), RAID stripes are created from the contents of the buffer, parity is computed, and these are then flushed to disk. A flush will also be initiated if the buffers are close to filling up before the normal interval has expired.
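
The trigger conditions in this paragraph reduce to a simple check. The 10-second interval comes from this article; the watermark fraction and the names below are assumptions for illustration only:

  # Illustrative CP trigger check (the 0.85 watermark and all names are assumptions).
  CP_INTERVAL_SECONDS = 10.0   # normal timer-driven CP interval (from this article)
  BUFFER_WATERMARK = 0.85      # assumed fill fraction that forces an early CP

  def should_start_cp(seconds_since_last_cp: float, buffer_used: float, buffer_size: float) -> bool:
      if seconds_since_last_cp >= CP_INTERVAL_SECONDS:
          return True          # timer-triggered CP ('T' in sysstat)
      if buffer_used >= BUFFER_WATERMARK * buffer_size:
          return True          # watermark-triggered CP (for example 'H' or 'F' in sysstat)
      return False

  print(should_start_cp(11.0, buffer_used=2.0, buffer_size=4.0))  # True (timer expired)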

ONTAP 9 - Per Aggregate CP

ONTAP 9 includes performance enhancements that lessen the impact of consistency points. Rather than processing a single consistency point for all aggregates at the same time (a global CP), ONTAP 9 allows each aggregate to commit CPs independently of the others.

It is still possible for Back-to-Back consistency points to occur; however, the negative performance impact on one aggregate should not affect other aggregates. This is important to keep in mind when troubleshooting performance on a cluster running ONTAP 9.
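
As a conceptual contrast only (hypothetical names; not how ONTAP is implemented internally), a per-aggregate scheduler commits each aggregate's CP based on its own state rather than waiting on one global CP:

  # Conceptual contrast between a global CP and per-aggregate CPs (illustrative only).
  class Aggregate:
      def __init__(self, name): self.name = name
      def cp_triggered(self) -> bool: return True            # placeholder trigger state
      def commit_cp(self): print(f"CP committed for {self.name}")

  def global_cp(aggregates):
      # Pre-ONTAP 9 behavior: one CP covers every aggregate at the same time,
      # so a slow aggregate delays the commit for all of them.
      for aggr in aggregates:
          aggr.commit_cp()

  def per_aggregate_cp(aggregates):
      # ONTAP 9 behavior: each aggregate decides and commits independently,
      # so a busy or slow aggregate does not hold back the others.
      for aggr in aggregates:
          if aggr.cp_triggered():
              aggr.commit_cp()

  per_aggregate_cp([Aggregate("aggr0"), Aggregate("aggr1")])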

With ONTAP 9 and Per-Aggregate CPs, the CP information in sysstat has changed. Instead of displaying individual CPs, a summary of CPs is now displayed:

         CP_Ty               CP_Ph    
[T--H--F--N--B--O--#--:]  [n--v--p--f]

The "CP_Ty" columns now display the number of aggregates that had a CP triggered (or continuing) due to the following reasons:

CP_Ty Definition
T CP caused by timer
H CP caused by high water mark; the amount of modified data in the storage system's memory cache is high enough that it is ideal to start a CP to force it out to disk
F CP caused by full NVLog; the amount of logged data in the storage system's NVRAM pool is high enough that it is ideal to start a CP to force it out to disk
N CP caused by the NV Log reaching a maximum number of entries
B Back to back CP
O All other types of CP
# Continuation of CP from previous interval and the storage system has determined it needs to commit the current data to disk (a watermark of some sort has been reached), so that the next CP will be of type B
: Continuation of CP from previous interval

The "CP_Ph" column represent the number of aggregates in the following phases:

CP_Ph Definition
n Processing normal files
v Flushing modified superblock to disk
p All other file processing
f Flushing modified data to disk
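
As an illustration only, the two summary fields could be unpacked into per-reason and per-phase counts roughly as below. The exact on-screen format of the data rows is not shown in this article, so the bracketed sample strings and the parser are assumptions based on the column legend above:

  # Hypothetical parser for the summarized CP columns (illustrative only).
  CP_TY_KEYS = ["T", "H", "F", "N", "B", "O", "#", ":"]
  CP_PH_KEYS = ["n", "v", "p", "f"]

  def parse_cp_summary(cp_ty: str, cp_ph: str):
      # Assume each bracketed field holds one count per position, e.g. "[0--1--0--0--2--0--0--0]".
      ty_counts = cp_ty.strip("[]").split("--")
      ph_counts = cp_ph.strip("[]").split("--")
      return (dict(zip(CP_TY_KEYS, map(int, ty_counts))),
              dict(zip(CP_PH_KEYS, map(int, ph_counts))))

  # Example with made-up numbers: two aggregates hit back-to-back CPs in this interval.
  ty, ph = parse_cp_summary("[0--1--0--0--2--0--0--0]", "[1--0--0--2]")
  print(ty["B"], ph["f"])  # 2 2
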
What is the Back-to-Back (B2B) Consistency Point Scenario?
Please see this article for more information.

A NetApp Storage Controller has two buffers for accepting and logging write data. Because of this buffered writing scheme, the Storage Controller can only process one Consistency Point (CP) per aggregate at a time. Under certain circumstances, while one CP is being processed and written to disk, the second memory buffer can reach a watermark that triggers a new CP before the previous CP has completed. Since the Consistency Point process is global within an aggregate (meaning ALL writes for that aggregate flow through this mechanism) and atomic (meaning all changed data that is part of the CP must be committed to disk in order to complete it), a Storage Controller in this situation must momentarily delay acknowledging ALL incoming write requests until the previous CP is completed and the corresponding Non-Volatile RAM (NVRAM) and local memory buffers are cleared.

In most instances of this specific scenario, the time at which the storage controller must pause incoming write requests is measured in milliseconds, and the environment is not significantly impacted. However, on storage controllers that fall into one or both of the categories below, the impact on overall performance might be undesirable.
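
A minimal sketch of the resulting back-pressure, building on the hypothetical two-buffer model sketched earlier (again, illustrative only):

  # Illustrative back-to-back CP back-pressure (hypothetical model, not ONTAP code).
  def accept_write(data, active_buffer, watermark: int, cp_in_progress: bool) -> str:
      if cp_in_progress and len(active_buffer) >= watermark:
          # The previous CP has not finished flushing and the active buffer has
          # already hit its trigger, so the controller must delay acknowledging
          # new writes until that CP completes (seen as a 'B' CP in sysstat).
          return "DELAYED"
      active_buffer.append(data)
      return "ACK"

  buf = ["block-1", "block-2", "block-3", "block-4"]
  print(accept_write("block-5", buf, watermark=4, cp_in_progress=True))  # DELAYED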

Aggravating circumstances for the Back-to-Back scenario:
  • The incoming workload (continuous or burst) is greater than what the storage appliance is configured to handle (commonly this includes using slower drives for production loads).
  • Inefficiencies in data layout that cause excessive internal system overhead (usually due to unaligned I/O in LUNs or VMDK storage objects).

If either of the above circumstances exists, it must be corrected in order to alleviate future performance issues with the storage controller. If the issue is due to unaligned I/O on LUNs or VMDK storage objects, that should be the initial focus for addressing the possible performance issues.

Note: Back-to-Back CPs are identified by a 'B' or 'b' in the CP Type column of sysstat output, not simply by seeing CPs start in the next sample after one completes.
If you are experiencing a performance problem with workload, throughput or latency, then please open a case with NetApp Global Support.

Legacy CP Information:
What are the different Consistency Point types and how are they measured?

Consistency Points are measured using the sysstat command. For more information on the available options and a sample of the output, see the sysstat Manual Page. The Consistency Point (CP) type is the reason for the CP that was started during the interval. The first character in the CP Type column identifies the type of CP as listed below. Multiple CPs display no cause character, just the count of CPs during the measurement interval. The CP types are described as follows:
 

CP Type Definition
'.' No CP started during sampling interval
B Back to back CPs (CP generated CP)
b Deferred back to back CPs (CP generated CP)
D CP caused by a low number of datavecs (static allocated buffer space for writes)
F CP caused by full NVLog; the amount of logged data in the storage system's NVRAM pool is high enough that it is ideal to start a CP to force it out to disk
H CP caused by high water mark; the amount of modified data in the storage system's memory cache is high enough that it is ideal to start a CP to force it out to disk
L CP caused by low water mark; the amount of memory available for routine housekeeping tasks is low enough that it is ideal to start a CP to release some more
M CP caused by low mbufs; writes data to the disk in order to prevent an out-of-memory buffer situation
N CP caused by the NV Log reaching a maximum number of entries
S CP caused by snapshot operation
T CP caused by timer
U CP caused by flush; one or more clients who have been issuing asynchronous writes (that is, writes that under the rules of the client protocol do not have to be committed to persistent storage immediately) has issued a request that all of its outstanding uncommitted I/Os should now be committed to persistent storage
V CP caused by low virtual buffers
Z CP caused by internal sync; the storage system wants to force a disk update, usually during snapshot processing
: Continuation of CP from previous interval
# Continuation of CP from previous interval and the storage system has determined it needs to commit the current data to disk (a watermark of some sort has been reached), so that the next CP will be of type B

Only CPs of type 'B' or 'b' will affect write latency.

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. The phases are as follows:

  • 0 Initializing
  • n Processing normal files
  • s Processing special files
  • f Flushing modified data to disk
  • v Flushing modified superblock to disk
  • q Processing quota files
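
As a small illustration of how these two characters combine in the legacy sysstat CP Type column (the helper below is hypothetical; the descriptions are condensed from the tables above):

  # Hypothetical decoder for a legacy sysstat CP Type field: cause character
  # followed by an optional phase character, per the tables above.
  CP_TYPES = {".": "No CP started", "B": "Back to back CP", "b": "Deferred back to back CP",
              "D": "Low datavecs", "F": "Full NVLog", "H": "High water mark",
              "L": "Low water mark", "M": "Low mbufs", "N": "NVLog entry limit",
              "S": "Snapshot", "T": "Timer", "U": "Flush", "V": "Low virtual buffers",
              "Z": "Internal sync", ":": "Continuation", "#": "Continuation, next CP will be B"}
  CP_PHASES = {"0": "Initializing", "n": "Processing normal files", "s": "Processing special files",
               "f": "Flushing modified data to disk", "v": "Flushing modified superblock to disk",
               "q": "Processing quota files", " ": "CP completed within the interval"}

  def decode_cp_field(field: str):
      cause = field[0]
      phase = field[1] if len(field) > 1 else " "
      return CP_TYPES.get(cause, "Unknown"), CP_PHASES.get(phase, "Unknown")

  print(decode_cp_field("Bf"))  # ('Back to back CP', 'Flushing modified data to disk')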
