What is the method of troubleshooting SnapMirror, SnapVault, and OSSV performance issues?
- Data ONTAP 7 and earlier
- Open Systems SnapVault
The following methodology will help in troubleshooting SnapMirror, SnapVault, and OSSV performance issues.
Performance issues are mainly due to the following:
- Overloaded SnapMirror/SnapVault implementation.
- Non-optimal space & data layout management.
- High system resource utilization (CPU utilization, disk I/O, Common Internet File System protocol (CIFS) / Network File System (NFS) connections/transactions, etc.).
- Low network bandwidth.
Symptoms are as follows:
- Initialization or transfer updates lagging. Consequently, the lag is above expectations, and the transfer duration does not meet the Service-Level Agreement (SLA).
- The transfer duration meets the SLA, but the throughput is low.
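The first symptom can be checked mechanically by comparing the reported lag with the expected time between two scheduled updates. The following is a minimal sketch, assuming the lag has been copied by hand from the status output as an `HH:MM:SS` string; the function names are hypothetical and are not part of any NetApp tool.

```python
def lag_to_seconds(lag: str) -> int:
    """Convert a lag string in HH:MM:SS form (assumed format) to seconds."""
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def lag_exceeds(lag: str, expected_interval_s: int) -> bool:
    """Return True when the observed lag is longer than the expected interval."""
    return lag_to_seconds(lag) > expected_interval_s


if __name__ == "__main__":
    # Hourly schedule: the expected lag is at most 3600 seconds.
    print(lag_exceeds("02:10:05", 3600))
    print(lag_exceeds("00:45:00", 3600))
```

A relationship that reports `True` here is lagging behind its schedule and is a candidate for the analysis steps that follow.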
Using /etc/snapmirror.conf or the snapvault snap sched output, define the expected lag (the expected time between two scheduled updates).
Then review the snapmirror status -l or snapvault status -l output to get a high-level view of the mirror implementation:
- How many systems are involved?
- How many mirror/backup services are active?
- Which systems are a source and a destination at the same time?
- How many relationships are set per source and destination systems?
- Note the transfer lag and define the date/time the last transfer succeeded.
- Analyze the snapmirror logs and syslog messages to trace what happened before and after the last successful transfer completed: when the request was sent, when the transfer started and ended, and whether any errors were logged.
- Try to keep all relationships that take roughly the same transfer time in the same volume.
- Create multiple volumes on the destination with varying primary sizes and transfer requirements.
- Stagger SnapMirror or SnapVault transfer schedules to reduce the load on the target's resources. For example, if four transfers are required every hour, space the start times 15 minutes apart.
- Scheduling SnapMirror updates every minute is not advised. Check the minute field of the schedule in snapmirror.conf (a * in that field means the update request is triggered every minute). If the business requires a synchronized backup for critical data, then Sync SnapMirror is the suitable service to use instead of Async SnapMirror scheduled every minute.
- Ensure that the snapshot creation schedules of all active mirror/backup services do not overlap. When possible, schedule transfers at different times than the regular scheduled volume snapshot copies.
- For traditional volumes, ensure that the disk size/type and RAID group size are identical between the SnapMirror source and destination volumes.
- Throttle bandwidth on transfers in the /etc/snapmirror.conf file with the kbs argument:
- Default settings result in no throttling of transfers.
- Throttling is especially important in a high-speed LAN with many mirror relationships.
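The staggering recommendation above is simple arithmetic: divide the hour by the number of transfers and use the multiples as start minutes. The following is a minimal sketch of that calculation; the function name is hypothetical, and the resulting minutes still have to be entered by hand in the minute field of each schedule.

```python
def staggered_start_minutes(transfers_per_hour: int) -> list[int]:
    """Return evenly spaced start minutes within one hour."""
    if not 1 <= transfers_per_hour <= 60:
        raise ValueError("transfers_per_hour must be between 1 and 60")
    # Integer spacing keeps every start time on a whole minute.
    spacing = 60 // transfers_per_hour
    return [index * spacing for index in range(transfers_per_hour)]


if __name__ == "__main__":
    # Four transfers per hour start 15 minutes apart.
    print(staggered_start_minutes(4))
```

For four transfers per hour this yields start minutes 0, 15, 30, and 45, which matches the example in the list above.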
Pay close attention to space
When the OSSV primary install partition is short on space, the update fails with the following error:
Failed to sort inode records Database, Temporary and Trace directories have 0% space left (5Mb)
- Is there enough space on the source and destination volumes? (Use the df command to display the free space per volume.)
- If the volume is full, even snapshot creation may fail:
For flexible volume, increase volume size
For traditional volume, add disks (Minimum 3) to the volume
- Delete unnecessary unlocked snapshots
- Use the snap reclaimable command to display the amount of space you can reclaim by deleting snapshots
- If OFM (W2K & NT) is being used, make sure the file systems being backed up have at least 15% free disk space.
- Make sure the OSSV client has enough disk space to operate.
- Enable the Run estimator before each backup feature, available from OSSV 2.2.
- You can also run the svinstallcheck health check. It calculates and displays the free space in the database and tmp partition.
- Refer to the OSSV Release Notes for disk space requirements and consumption, especially when Block-Level Increment (BLI) is in use.
- To correct this problem, increase the amount of space in the partition containing the OSSV database, temporary, and trace directories. If the partition cannot be increased in size, move the directories to a location with more available free space. Note that if these directories are moved, the path in the SVConfigurator General tab must be updated, and the OSSV service on the Windows server must then be stopped and restarted.
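The 15% free-space guideline for OFM can be verified on the OSSV primary host before a backup runs. The following is a minimal sketch using only the Python standard library; it is not an OSSV feature, the function names are hypothetical, and the path passed in is whatever drive or directory is being backed up.

```python
import shutil


def free_space_percent(path: str) -> float:
    """Return the free space of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def has_enough_free_space(path: str, minimum_percent: float = 15.0) -> bool:
    """Check the file system against a minimum free-space threshold."""
    return free_space_percent(path) >= minimum_percent


if __name__ == "__main__":
    # "." is only a placeholder; use the drive being backed up.
    print(f"{free_space_percent('.'):.1f}% free")
    print(has_enough_free_space("."))
```

The same check can be pointed at the partition that holds the OSSV database, temporary, and trace directories, with a threshold taken from the OSSV Release Notes.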
System resources utilization
High system resources utilization (CPU% Util, Disks I/O, CIFS/NFS connections/transactions, etc.) may slow down transfer throughput.
- Collect and analyze the outputs of the following commands:
perfstat output from the source and destination (this includes sysstat output as well).
sysstat -m output while the transfer is in progress, on both the source and the destination.
- Network details (other jobs, bandwidth, failures, expected throughput, throttling in place).
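When comparing expected throughput with any throttling in place, it helps to estimate how long a transfer should take at a given rate. The following is a minimal sketch, assuming the kbs throttle value is in kilobytes per second and that 1 KB is counted as 1024 bytes; both function names are hypothetical.

```python
def estimated_transfer_seconds(data_bytes: int, kbs: int) -> float:
    """Estimate the transfer time for `data_bytes` at a rate of `kbs` KB/s."""
    if kbs <= 0:
        raise ValueError("kbs must be positive")
    # Assumption: 1 KB = 1024 bytes.
    return data_bytes / (kbs * 1024)


def observed_kbs(data_bytes: int, duration_s: float) -> float:
    """Compute the achieved rate in KB/s from the size and duration of a transfer."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return data_bytes / 1024 / duration_s


if __name__ == "__main__":
    ten_gib = 10 * 1024**3
    # 10 GiB throttled to 2000 KB/s takes roughly 87 minutes.
    print(estimated_transfer_seconds(ten_gib, 2000) / 60)
```

If the estimated time at the configured throttle is already longer than the interval between scheduled updates, the lag cannot meet expectations regardless of system load.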
For common SnapMirror/SnapVault problems, see KB: Top 10 SnapMirror/SnapVault issues and solutions.