- ONTAP 9
- Data ONTAP 8 7-mode
What is file history and how is it communicated?
- File history is generated during a Network Data Management Protocol (NDMP) backup of a volume hosted on NetApp storage using the
- File history enables a backup application or Data Management Application (DMA) to build an index database of all the files in a backup
- This database enables users to locate which backup contains a particular file, when that file was modified, and other useful metadata
- The purpose of collecting and storing file history is to:
- Provide a human-readable user interface to backup data
- Provide a basis for Direct Access Recovery (DAR). DAR allows a DMA to access files / directories directly on tape without having to traverse the entire backup. This allows for quicker file and directory recovery operations.
- How file history is communicated:
- During a backup, ONTAP's dump process generates file history in phases 3 and 4 of a backup. For more information on dump phases, see Network Data Management Protocol (NDMP) dump phases description.
- dump communicates file history information to the NDMP Server running in ONTAP. This communication is internal to the storage controller running the backup.
- The NDMP Server in ONTAP communicates the file history to the backup application / DMA through the NDMP control connection over the network.
- The backup application / DMA receives the file history from the network, ingests the data, and writes it to the file history index database.
What is file history back pressure? What impact does it have?
- Generating, communicating, and ingesting file history will always add some overhead to a backup.
- A backup typically runs faster with file history disabled, even if there are no other performance issues.
- A bottleneck in one step of file history communication can trigger latencies downstream.
- Due to the way NDMP and dump work together, latency in file history delivery or ingestion can cause a slow-down in the overall backup performance.
- In other words, dump cannot continue writing data to the backup stream until the associated file history is completely ingested and acknowledged by the backup application / DMA.
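The effect described above can be illustrated with a toy model. This is only a sketch under a simplifying assumption: dump strictly alternates between writing an entry and waiting for its file history to be acknowledged. The function name and the per-entry timings are hypothetical and are not taken from ONTAP.

```python
def backup_duration(num_entries, write_ms, fh_ack_ms):
    """Toy model of back pressure: every entry costs its write time plus
    the time dump waits for the file history acknowledgement.

    Returns (total_ms, fraction_of_time_spent_waiting_on_file_history).
    """
    write_total = num_entries * write_ms
    fh_total = num_entries * fh_ack_ms
    total = write_total + fh_total
    fraction = fh_total / total if total else 0.0
    return total, fraction


# Hypothetical numbers: 1000 entries, 10 ms of write time each.
fast_dma = backup_duration(1000, write_ms=10, fh_ack_ms=1)
slow_dma = backup_duration(1000, write_ms=10, fh_ack_ms=10)
print(fast_dma, slow_dma)
```

In this model, a DMA that takes as long to acknowledge file history as dump takes to write the data doubles the backup time, even though the data path itself is unchanged.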
How is file history back pressure identified?
- Check the dump backup log located in /etc/log/backup on the node which hosts the volume being backed up. The relevant log entries are printed after a backup has completed successfully. An aborted or incomplete backup will not provide sufficient logs to diagnose this problem.
- Calculate the duration of phase 3 and phase 4 of the backup in question. In the following example, phase 3 was 20 minutes long and phase 4 was 5 hours long:
dmp Thu Feb 27 12:01:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (III)
dmp Thu Feb 27 12:21:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (IV)
dmp Thu Feb 27 17:21:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (V)
- Find the total Dir to FH entry time stats and Node to FH entry time stats for the backup, which are presented in milliseconds. For example:
dmp Thu Feb 27 17:21:52 CET 2020 /vol/NdmpBackup/(0) Log_msg (Dir to FH entry time stats (msec) numEntries: 2000 min: 0 max: 1526 avg: 5 tot: 15236)
dmp Thu Feb 27 17:21:52 CET 2020 /vol/NdmpBackup/(0) Log_msg (Node to FH Entry time stats (msec) numEntries: 4000 min: 0 max: 1599 avg: 7 tot: 85569)
- If the total Dir to FH entry time is 15% or more of total phase 3 time, this is considered file history backpressure in phase 3. In the above example, 15.236 seconds is only ~1% of the 20 minute phase 3 time, so this is not considered backpressure.
- If the total Node to FH entry time is 15% or more of total phase 4 time, this is considered file history backpressure in phase 4. In the above example, 85.569 seconds is <1% of the 5 hour phase 4 time, so this is also not considered backpressure.
- ONTAP may warn of possible file history backpressure with the following log:
dmp Thu Feb 27 12:05:52 CET 2020 /vol/NdmpBackup/(0) Warning (Total Dir to FH time spent is greater than 15 percent of phase 3 total time. Please verify the settings of backup application and the network connectivity)
- In releases prior to ONTAP 9.6, these warnings can be inaccurate due to BUG 1177614.
- It is recommended to use the above manual calculations to confirm file history backpressure.
- In ONTAP 9.7 and later, Dir and Node to FH stats are displayed in seconds (sec) rather than milliseconds (msec)
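The manual calculation above can be sketched as a small script. This is a minimal, unofficial example: it assumes the log lines follow the format shown in the samples above, that all lines belong to a single completed backup, and that all timestamps share one time zone (the zone token is ignored). The function name fh_backpressure is hypothetical. The handling of the (sec) unit is an assumption based only on the note about ONTAP 9.7 and later; the exact format of those lines is not shown here.

```python
import re
from datetime import datetime

# Matches e.g. "dmp Thu Feb 27 12:01:36 CET 2020 /vol/x/(0) Phase_change (III)"
PHASE_RE = re.compile(
    r"dmp \w{3} (\w{3}) +(\d+) (\d\d:\d\d:\d\d) \S+ (\d{4}) .*Phase_change \((\w+)\)"
)
# Matches the "Dir to FH entry time stats" / "Node to FH Entry time stats" lines.
STATS_RE = re.compile(
    r"(Dir|Node) to FH entry time stats \((msec|sec)\).*?tot: ([\d.]+)",
    re.IGNORECASE,
)


def fh_backpressure(log_lines, threshold=0.15):
    """Return {"Dir": (percent, flagged), "Node": (percent, flagged)}.

    "Dir" is compared against phase 3 (III to IV) and "Node" against
    phase 4 (IV to V). Raises KeyError if a required line is missing,
    for example when the backup was aborted.
    """
    phase_start = {}
    fh_total_sec = {}
    for line in log_lines:
        m = PHASE_RE.search(line)
        if m:
            month, day, clock, year, phase = m.groups()
            phase_start[phase] = datetime.strptime(
                f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
            )
            continue
        m = STATS_RE.search(line)
        if m:
            kind, unit, total = m.groups()
            scale = 1000.0 if unit.lower() == "msec" else 1.0
            fh_total_sec[kind.capitalize()] = float(total) / scale

    report = {}
    for kind, start, end in (("Dir", "III", "IV"), ("Node", "IV", "V")):
        phase_sec = (phase_start[end] - phase_start[start]).total_seconds()
        ratio = fh_total_sec[kind] / phase_sec
        report[kind] = (round(ratio * 100, 2), ratio >= threshold)
    return report


SAMPLE_LOG = [
    "dmp Thu Feb 27 12:01:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (III)",
    "dmp Thu Feb 27 12:21:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (IV)",
    "dmp Thu Feb 27 17:21:36 CET 2020 /vol/NdmpBackup/(0) Phase_change (V)",
    "dmp Thu Feb 27 17:21:52 CET 2020 /vol/NdmpBackup/(0) Log_msg (Dir to FH entry time stats (msec) numEntries: 2000 min: 0 max: 1526 avg: 5 tot: 15236)",
    "dmp Thu Feb 27 17:21:52 CET 2020 /vol/NdmpBackup/(0) Log_msg (Node to FH Entry time stats (msec) numEntries: 4000 min: 0 max: 1599 avg: 7 tot: 85569)",
]

print(fh_backpressure(SAMPLE_LOG))
```

For the sample lines from this article, the script reports roughly 1.27% for phase 3 and 0.48% for phase 4, both below the 15% threshold, which agrees with the manual calculation.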
What are the common root causes and solutions of file history backpressure?
- File history ingestion at the backup application / DMA common causes and solutions:
- Cause #1: Resource contention on the server or VM hosting the backup application / DMA as seen in general performance statistics from the hosting OS.
- Solution #1: Add additional resources to the server / VM hosting the DMA or reduce resource contention to allow for faster ingestion and indexing.
- Cause #2: Poor performance writing to the file history index database.
- Solution #2:
- This is typically due to poor performance of the underlying storage or configuration / sizing issues with the DMA software.
- Ensure the storage hosting the file history index database has adequate performance.
- If required, contact your DMA vendor for sizing assistance or other tweaks to allow for faster indexing of file history.
- NDMP Server communication to the backup application common cause and solution:
- Cause: The NDMP control connection is using a high-latency or lossy network path.
- Solution:
- Ensure the network path hosting the NDMP control connection has adequate speed and throughput to support the NDMP control connection and file history messages.
- Check interface statistics for signs of packet loss or other issues.
- On a NetApp storage controller, the node-shell ifstat command can be used to view interface statistics.
- Finally, ensure the MTU configured in ONTAP and at the backup application is supported through the entire network path.
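One common way to check that an MTU is supported end to end is a ping with the "don't fragment" bit set. The sketch below only computes the ICMP payload size for a given MTU and builds the corresponding command for a Linux host with iputils ping; it assumes IPv4 with no IP options (20-byte IP header, 8-byte ICMP header). The host name is a placeholder, and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def df_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def linux_df_ping(host, mtu, count=3):
    """Build a Linux (iputils) ping command that forbids fragmentation."""
    return ["ping", "-M", "do", "-c", str(count), "-s", str(df_ping_payload(mtu)), host]


# "ontap-lif.example.com" is a placeholder for the LIF used by the NDMP control connection.
print(" ".join(linux_df_ping("ontap-lif.example.com", 9000)))
```

If a ping of this size fails from the backup server while a smaller one succeeds, some device in the path likely does not support the configured MTU.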