NetApp Knowledgebase

What is the method of troubleshooting SnapMirror, SnapVault, and OSSV performance issues?

Category:
snapmirror
Specialty:
om

Applies to

  • Data ONTAP 7 and earlier 
  • SnapMirror
  • SnapVault
  • Open Systems SnapVault 

Answer

The following methodology will help in troubleshooting SnapMirror, SnapVault, and OSSV performance issues.

Performance issues are mainly due to the following:

  • An overloaded SnapMirror/SnapVault implementation.
  • Non-optimal space and data layout management.
  • High system resource utilization (CPU utilization, disk I/O, Common Internet File System (CIFS) / Network File System (NFS) connections and transactions, and so on).
  • Low network bandwidth.

Symptoms are as follows:

  • Initialization or update transfers are lagging. The lag exceeds expectations, and the transfer duration does not meet the Service-Level Agreement (SLA).
  • The transfer duration meets the SLA, but the throughput is low.

From the /etc/snapmirror.conf file or the snapvault snap sched output, determine the expected lag (the expected time between two scheduled updates).
Then review the snapmirror status -l or snapvault status -l output to get an overview of the mirror implementation:

  • How many systems are involved?
  • How many mirror/backup services are active?
  • Which systems are a source and a destination at the same time?
  • How many relationships are set per source and destination systems?
  • Note the transfer lag and determine the date and time at which the last transfer succeeded.
  • Analyze the SnapMirror logs and syslog messages to trace what happened before and after the last successful transfer: when was the request sent, when did the transfer start and end, and were any errors reported?
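
The lag comparison described above can be sketched in Python. This is only a minimal sketch: it assumes the lag value has been copied by hand from the status output in hours:minutes:seconds form, and that the expected interval between updates has already been worked out from the schedule. The function names, relationship names, and sample values are hypothetical and not part of any NetApp tool.

```python
from datetime import timedelta


def parse_lag(lag_text: str) -> timedelta:
    """Convert a lag string such as '26:15:02' (hours:minutes:seconds) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in lag_text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def lag_exceeds_expected(lag_text: str, expected_interval: timedelta,
                         tolerance: float = 1.0) -> bool:
    """Return True when the observed lag is larger than the expected interval.

    tolerance is a hypothetical safety factor: 1.0 flags any lag above the
    scheduled interval, 2.0 flags only lags above twice the interval.
    """
    return parse_lag(lag_text) > expected_interval * tolerance


# Hypothetical relationships with lag values copied from the status output.
observed = {
    "filerA:vol1 -> filerB:vol1": "00:40:00",
    "filerA:vol2 -> filerB:vol2": "26:15:02",
}
expected = timedelta(hours=1)  # assumed: updates are scheduled hourly

late = [name for name, lag in observed.items()
        if lag_exceeds_expected(lag, expected)]
```

Here `late` collects the relationships whose lag is above the scheduled interval; those are the ones to trace in the logs first.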

Recommendations:

  • Try to keep all relationships that take roughly the same transfer time in the same volume.
  • Create multiple volumes on the destination with varying primary sizes and transfer requirements.
  • Stagger SnapMirror or SnapVault transfer schedules to reduce the resource load on the target. For example, if four transfers are required every hour, space the start times 15 minutes apart.
  • Scheduling SnapMirror updates per minute is not advised. Check the snapmirror.conf schedule minute field (A * in that field means the update request is triggered each minute). If the business requires a synchronized backup for critical data, then Sync SnapMirror is the suitable service to use instead of Async SnapMirror scheduled per minute.
  • Ensure that snapshot creation schedules for all mirror/backup active services do not overlap. When possible, schedule transfers at different times than the scheduled regular volume snapshot copies.
  • For traditional volume SnapMirror, ensure that disk size/type and RAID group size are identical between the source and the destination volumes.
  • Throttle transfer bandwidth in the /etc/snapmirror.conf file with the kbs argument:
    • Default settings result in no throttling of transfers.
    • Throttling is especially important in a high-speed LAN with many mirror relationships.
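
The staggering, per-minute, and throttling recommendations above can be illustrated with a short Python sketch. It assumes the usual /etc/snapmirror.conf entry layout of source, destination, arguments, and a four-field schedule (minute, hour, day of month, day of week). The system names, volume names, and the kbs value are hypothetical, and the helper functions are illustrative only.

```python
from typing import List, Tuple


def staggered_minutes(transfer_count: int) -> List[int]:
    """Spread transfer_count start times evenly across one hour."""
    if not 1 <= transfer_count <= 60:
        raise ValueError("transfer_count must be between 1 and 60")
    step = 60 // transfer_count
    return [index * step for index in range(transfer_count)]


def build_entries(pairs: List[Tuple[str, str]], kbs: int) -> List[str]:
    """Build hourly entries, one per (source, destination) pair, with a kbs throttle."""
    minutes = staggered_minutes(len(pairs))
    return [f"{source} {destination} kbs={kbs} {minute} * * *"
            for (source, destination), minute in zip(pairs, minutes)]


def runs_every_minute(conf_line: str) -> bool:
    """Return True when the minute field of an entry is '*'."""
    stripped = conf_line.strip()
    if not stripped or stripped.startswith("#"):
        return False  # blank line or comment
    fields = stripped.split()
    # Assumed layout: source destination arguments minute hour day-of-month day-of-week
    return len(fields) == 7 and fields[3] == "*"


# Hypothetical relationships: four hourly transfers, throttled to 2000 KB/s each.
pairs = [(f"filerA:vol{n}", f"filerB:vol{n}") for n in range(1, 5)]
entries = build_entries(pairs, kbs=2000)
```

With four pairs the start minutes come out as 0, 15, 30, and 45, matching the example in the recommendation. `runs_every_minute` can be run over each line of an existing file to find entries that should be rescheduled or moved to Sync SnapMirror.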

Pay close attention to space

When the OSSV primary installation partition is short on space, updates fail with the following error: Failed to sort inode records Database, Temporary and Trace directories have 0% space left (5Mb)

  • Is there enough space on the source and destination volumes? Use the df command to display the free space per volume.
  • If the volume is full, even snapshot creation may fail:
    • For a flexible volume, increase the volume size.
    • For a traditional volume, add disks (minimum 3) to the volume.
    • Delete unnecessary unlocked snapshots.
    • Use the snap reclaimable command to display the amount of space you can reclaim by deleting snapshots.
  • If OFM (Windows 2000 and Windows NT) is being used, make sure each drive being backed up has at least 15% of its disk space free.
  • Make sure the OSSV client has enough disk space to operate.
    • Enable the Run estimator before each backup feature, available from OSSV 2.2.
    • You can also run the Health Check Utility or svinstallcheck. It calculates and displays the free space in the database and tmp partition.
    • Refer to the OSSV Release Notes for disk space requirements and consumption, especially when Block-Level Increment (BLI) is in use.
    • To correct this problem, increase the amount of space in the partition containing the OSSV database, temporary, and trace directories. If the partition cannot be increased in size, move the directories to a location with more free space. If these directories are moved, update the path in the SVConfigurator General tab, and then stop and restart the OSSV service on the Windows server.
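
As a rough illustration of the free-space checks above, the following Python sketch reports the percentage of free space on the partition holding a given path and compares it with a threshold. It is an assumption-laden example, not an OSSV utility: it would be run on the client host itself, the 15% default simply mirrors the OFM guideline above, and the paths passed in are whatever locations hold the data or the OSSV database, temporary, and trace directories on that host.

```python
import shutil


def free_percent(path: str) -> float:
    """Return the free space of the partition containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on pseudo file systems
    return 100.0 * usage.free / usage.total


def has_enough_free_space(path: str, minimum_percent: float = 15.0) -> bool:
    """Return True when the partition holding path has at least minimum_percent free."""
    return free_percent(path) >= minimum_percent


# Hypothetical usage: check the current directory's partition.
current_free = free_percent(".")
```

A scheduled job could call `has_enough_free_space` for each relevant path before the backup window and raise an alert when it returns False.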

System resources utilization

High system resource utilization (CPU utilization, disk I/O, CIFS/NFS connections and transactions, and so on) may slow down transfer throughput.

  • Collect and analyze the following:
    • perfstat output from the source and the destination (this includes statit and sysstat output as well).
    • statit and sysstat -m output captured while the transfer is running, on both the source and the destination.
    • Network details (other jobs, bandwidth, failures, expected throughput, throttling in place).
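
To compare observed and expected throughput, a simple calculation is often enough. The sketch below assumes that the transferred size and the transfer duration have been read from the status output or the logs, and that the limit (a throttle value or the usable link bandwidth) is expressed in the same KB/s unit. All names and figures are hypothetical.

```python
def throughput_kb_per_s(transferred_kb: float, duration_s: float) -> float:
    """Average throughput of one transfer in KB/s."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return transferred_kb / duration_s


def share_of_limit(observed_kb_per_s: float, limit_kb_per_s: float) -> float:
    """Fraction of the limit (throttle or usable bandwidth) actually achieved."""
    if limit_kb_per_s <= 0:
        raise ValueError("limit_kb_per_s must be positive")
    return observed_kb_per_s / limit_kb_per_s


# Hypothetical transfer: 7,200,000 KB moved in one hour, against an 8000 KB/s limit.
observed = throughput_kb_per_s(7_200_000, 3600)
fraction = share_of_limit(observed, 8000)
```

A fraction well below 1.0 with no throttle in place suggests looking at system resources or the network path rather than at the schedule.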

For common SnapMirror/SnapVault problems, see the KB article Top 10 SnapMirror/SnapVault issues and solutions.

Additional Information

N/A

 
