NetApp Knowledge Base

Active IQ Wellness: Up to High Impact - This system is nearing the limits of its performance capacity

Category: ontap-9
Specialty: perf

Applies to

ONTAP 9 

Answer

Value of reviewing this information:
  • Performance capacity, or headroom, measures how much work you can place on a node or aggregate before latency begins to affect workload performance.
  • Being aware of and managing available performance capacity helps ensure you provision and balance workloads to get expected response times. 
How is this wellness check validated?

Current performance capacity can be viewed using three different methods:

  • Follow the steps outlined in ONTAP 9 document: Identifying Remaining Performance Capacity 
    • This method leverages 1 month of heuristic data maintained on the ONTAP system.
  • Active IQ Unified Manager:
    • This method leverages 3 months of data collected by Active IQ Unified Manager in CM_Archive format. 
      (Screenshot: Active IQ Unified Manager)
  • Active IQ
    • (Node - CPU) CPU performance capacity is the difference between the peak_performance and current_utilization counters:
      (Screenshot: Active IQ Wellness)
    • (Local Tier - Aggr Util%) Active IQ does not provide a peak value, so it cannot be used to identify available performance capacity; however, current utilization spikes can be viewed:
      (Screenshot: Active IQ Unified Manager)
      • Be aware that systems with no data aggregates, or with backup/disaster recovery roles, may show low performance headroom for aggregate utilization because of a low drive count or periodic high-volume sequential I/O.
      • If increased per-I/O latency is not a concern for the system in question, instances of this risk can be ignored.

  • The risk is validated using AutoSupport Counter Manager data sent to NetApp in Daily Performance Data Notice AutoSupport messages.
    • The data assessed aligns with the ONTAP 9 CLI 1-month calculations.
  • Available performance capacity is reviewed across all existing NetApp systems to determine the level of impact for this alert:
    • Values greater than the 99.5th percentile (the top 0.5%) result in a High Risk.
    • Values from the 99th to the 99.5th percentile result in a Medium Risk.
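The CPU headroom calculation and the percentile-based impact levels described above can be sketched in Python. This is a minimal illustration, not NetApp's implementation. It assumes that peak_performance and current_utilization are expressed in the same units, that the value ranked across the installed base is the share of peak performance currently in use (higher means less headroom), and that the fleet sample and the function names `headroom`, `capacity_used`, and `risk_level` are hypothetical.

```python
from statistics import quantiles
from typing import Sequence


def headroom(peak_performance: float, current_utilization: float) -> float:
    """Remaining performance capacity: peak minus current (same units assumed)."""
    return peak_performance - current_utilization


def capacity_used(peak_performance: float, current_utilization: float) -> float:
    """Hypothetical ranking metric: percent of peak performance already in use."""
    return 100.0 * current_utilization / peak_performance


def risk_level(value: float, fleet_values: Sequence[float]) -> str:
    """Classify a system's value against fleet-wide percentiles.

    Assumption: a higher value means less headroom, so the top of the
    distribution is the riskiest.
    """
    # n=1000 yields 999 cut points; index k-1 is the k/1000 quantile.
    cuts = quantiles(fleet_values, n=1000, method="inclusive")
    p990 = cuts[989]  # 99.0th percentile
    p995 = cuts[994]  # 99.5th percentile
    if value > p995:
        return "High"      # top 0.5%
    if value >= p990:
        return "Medium"    # 99th to 99.5th percentile
    return "No risk"
```

For example, with a synthetic fleet of values 1 through 1000, a system at 999 is classified High and one at 992 is classified Medium.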
What should I do about the information provided by this Active IQ Wellness rule?  
  • If you already have a plan for this proactive Active IQ warning, acknowledge it within your Active IQ dashboard. 
  • This ensures that the Wellness warnings you still see are only issues you do not yet have a plan to address.
  • To address this type of scenario:
  1. Do not attempt to increase workload if available performance capacity is insufficient to handle it and the current workloads cannot tolerate increased latency. 
  2. Ensure that you are monitoring workload indicators such as throughput (KBps/MBps and IOPS) and utilization, so that you can respond and plan before workloads experience performance impact.
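The two steps above amount to a simple check before placing new work on a node: compare recent utilization against thresholds you choose, and add load only if it fits within the remaining headroom or the existing workloads can tolerate higher latency. A minimal Python sketch, assuming utilization and peak are on the same scale and that a new workload's cost can be estimated as additional utilization. The threshold values (70 and 85 percent of peak) and the function names are hypothetical examples, not NetApp defaults.

```python
from typing import Sequence

WARNING_PCT = 70.0   # hypothetical warning threshold, percent of peak
CRITICAL_PCT = 85.0  # hypothetical critical threshold, percent of peak


def threshold_state(samples: Sequence[float], peak: float) -> str:
    """Return 'ok', 'warning' or 'critical' for the highest recent sample."""
    worst = 100.0 * max(samples) / peak
    if worst >= CRITICAL_PCT:
        return "critical"
    if worst >= WARNING_PCT:
        return "warning"
    return "ok"


def can_add_workload(current: float, peak: float, extra: float,
                     latency_tolerant: bool = False) -> bool:
    """Add work only if it fits in the headroom (peak - current), unless the
    current workloads can tolerate increased latency."""
    return latency_tolerant or current + extra <= peak
```

Setting the warning threshold well below peak is what gives you time to respond and plan before workloads are affected.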

A good starting point is the Performance Management guidance, which covers using Active IQ Unified Manager and setting thresholds and alerts.
(Screenshot: Active IQ Unified Manager)

The following counters can be monitored: 
(Screenshots: Active IQ Unified Manager)
 

  1. If, while monitoring the selected thresholds, you receive a warning that a capacity threshold has been exceeded, reduce the workload or relocate it to less busy nodes as necessary to maintain expected performance.
  2. Use Unified Manager’s Usage Overview panel to identify the top-consuming workloads and, where possible, ensure that they do not share the same controller.
  3. Use Active IQ to review the difference between current and peak performance (CPU), or spikes in average utilization (aggregate), based on the performance capacity information provided by AutoSupport.
    If current utilization approaches peak performance or spikes are seen, review the workloads and, if there is an issue, relocate workloads to less busy nodes.
  4. Review KB: How to rectify performance issues using monitoring tools
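Step 2 above can be approximated offline once you have per-workload usage figures, for example values read from Unified Manager. A minimal Python sketch, assuming a hypothetical list of (workload, node, usage) records: it finds the top N consumers, reports nodes that host more than one of them, and suggests the least busy node as a relocation candidate. The record format, the value of N, and the function names are illustrative only; this does not call any Unified Manager API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Workload(NamedTuple):
    name: str
    node: str
    usage: float  # e.g. IOPS or percent utilization; any consistent unit


def crowded_nodes(workloads: List[Workload], top_n: int = 3) -> Dict[str, List[str]]:
    """Map node -> top-N consumers it hosts, for nodes hosting two or more."""
    top = sorted(workloads, key=lambda w: w.usage, reverse=True)[:top_n]
    by_node: Dict[str, List[str]] = defaultdict(list)
    for w in top:
        by_node[w.node].append(w.name)
    return {node: names for node, names in by_node.items() if len(names) > 1}


def least_busy_node(workloads: List[Workload]) -> str:
    """Node with the lowest total usage among nodes that host any workload."""
    totals: Dict[str, float] = defaultdict(float)
    for w in workloads:
        totals[w.node] += w.usage
    return min(totals, key=totals.get)
```

For example, if vol_a (900) and vol_b (800) both sit on node1 while node3 carries only 100, `crowded_nodes` flags node1 and `least_busy_node` returns node3.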

Additional Information

Where can I find more information on this topic?

ONTAP 9 document: Identifying Remaining Performance Capacity 

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.