NetApp Knowledgebase

Active IQ Wellness: Up to High Impact - This system is nearing the limits of its performance capacity


Applies to

ONTAP 9 

Answer

Value of reviewing this information:

Performance capacity, or headroom, measures how much work you can place on a node or aggregate before latency begins to affect workload performance. Knowing and managing the available performance capacity helps ensure that you provision and balance workloads to achieve the expected response times.
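
As a rough illustration, the relationship between utilization and headroom can be sketched in Python. The sketch assumes that "performance capacity used" is current utilization expressed as a percentage of the optimal-point utilization (the point beyond which latency starts to rise sharply), and that available performance capacity is 100% minus that value. The function names and sample numbers are hypothetical and are not part of any NetApp API.

```python
def performance_capacity_used(current_util_pct: float, optimal_point_util_pct: float) -> float:
    """Current utilization as a percentage of the optimal-point utilization (assumed definition)."""
    if optimal_point_util_pct <= 0:
        raise ValueError("optimal point utilization must be positive")
    return 100.0 * current_util_pct / optimal_point_util_pct


def available_performance_capacity(current_util_pct: float, optimal_point_util_pct: float) -> float:
    """Remaining headroom in percent; a negative value means the resource is past its optimal point."""
    return 100.0 - performance_capacity_used(current_util_pct, optimal_point_util_pct)


if __name__ == "__main__":
    # Hypothetical node: 60% busy, with its optimal point at 80% utilization.
    used = performance_capacity_used(60.0, 80.0)
    headroom = available_performance_capacity(60.0, 80.0)
    print(f"used={used:.1f}% available={headroom:.1f}%")
```

Under these assumptions, a node that is only 60% busy has already consumed three quarters of its performance capacity, which is why headroom, rather than raw utilization, drives this check.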

How is this wellness check validated?

Current performance capacity can be viewed using three methods:

  • ONTAP 9 CLI (advanced privilege):
      set -privilege advanced
      statistics show -object resource_headroom_cpu -raw -counter <counter_name>
      statistics show -object resource_headroom_aggr -raw -counter <counter_name>
    where <counter_name> is a counter such as ewma_daily, ewma_weekly, or ewma_monthly. This method uses one month of historical data maintained on the ONTAP system.

  • Active IQ Unified Manager:

    [Figure: performance capacity in Active IQ Unified Manager (1103633-1.png)]

    This method uses three months of data collected by Active IQ Unified Manager in CM_Archive format.

  • Active IQ:

    [Figure: performance capacity in Active IQ (1103633-2.png)]

    The risk is validated using AutoSupport Counter Manager data sent to NetApp in Daily Performance Data Notice AutoSupport messages. The assessed data aligns with the ONTAP 9 CLI one-month calculations.
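
The ewma_* counters named in the CLI method are exponentially weighted moving averages, which weight recent samples more heavily than older ones. The Python sketch below shows only the general EWMA technique. The smoothing factors, the sample interval, and the way ONTAP actually maintains these counters internally are assumptions for illustration, not a description of ONTAP's implementation.

```python
from typing import Iterable, List, Optional


def ewma(samples: Iterable[float], alpha: float) -> List[float]:
    """Return the running exponentially weighted moving average of samples.

    alpha is in (0, 1]: larger values react quickly to recent samples
    (a short horizon), smaller values smooth over a longer horizon.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    result: List[float] = []
    avg: Optional[float] = None
    for x in samples:
        # Seed with the first sample, then blend each new sample into the average.
        avg = x if avg is None else alpha * x + (1.0 - alpha) * avg
        result.append(avg)
    return result


if __name__ == "__main__":
    # Hypothetical utilization samples (%) containing one short spike.
    utilization = [40, 42, 41, 90, 43, 40]
    fast = ewma(utilization, alpha=0.5)  # short horizon: follows the spike closely
    slow = ewma(utilization, alpha=0.1)  # long horizon: the spike barely moves it
    print([round(v, 1) for v in fast])
    print([round(v, 1) for v in slow])
```

This is why daily, weekly, and monthly averages can disagree: a brief spike dominates a short-horizon average but is diluted in a long-horizon one.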

Available performance capacity is reviewed across all existing NetApp systems to determine the impact level for this alert:

  • Values above the 99.5th percentile (the top 0.5%) result in a High Risk.
  • Values from the 99th to the 99.5th percentile result in a Medium Risk.
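
The percentile thresholds above can be sketched in Python as follows. This assumes that the value being ranked is the system's performance capacity used, so higher values mean less headroom. It also assumes a linear-interpolation percentile method and a hypothetical list of fleet values. Active IQ's actual data set and calculation are not described in this article.

```python
import math
from typing import Optional, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0-100) using linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return ordered[lo]
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def risk_level(value: float, fleet: Sequence[float]) -> Optional[str]:
    """Classify one system's value against hypothetical fleet values."""
    p99 = percentile(fleet, 99.0)
    p995 = percentile(fleet, 99.5)
    if value > p995:
        return "High"  # top 0.5% of the fleet
    if value >= p99:
        return "Medium"  # between the 99th and 99.5th percentiles
    return None  # below the 99th percentile: no alert from this rule
```

For example, with a hypothetical fleet of values 1 through 1000, a value of 1000 is classified as High, 992 as Medium, and 500 raises no alert.
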

What should I do about the information provided by this Active IQ Wellness rule?

If you already have a plan for this proactive Active IQ warning, acknowledge it in your Active IQ dashboard. This ensures that the Wellness warnings you see are only for issues you do not yet have a plan to address.
To address this type of scenario:

  1. Do not increase the workload if the available performance capacity is insufficient to handle it and the current workloads cannot tolerate increased latency.
  2. Monitor workload indicators such as throughput (in xBps or IOPS) and utilization, so that you can respond and plan before performance is affected. A good starting point is the Performance Management guidance, which covers using Active IQ Unified Manager and setting thresholds and alerts.

    [Figure: 1103633-3.png]
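
The threshold-and-alert approach described in step 2 can be sketched in Python. This is not Unified Manager's implementation. The metric names, the warning and critical levels, and the `Threshold` and `check_thresholds` names are all hypothetical, so choose limits that fit your own workloads.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    warning: float
    critical: float


def check_thresholds(sample: Dict[str, float], thresholds: Dict[str, Threshold]) -> List[str]:
    """Return one alert message per metric at or above its warning or critical level."""
    alerts: List[str] = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is None:
            continue  # metric not reported in this sample
        if value >= limit.critical:
            alerts.append(f"CRITICAL {metric}={value} (>= {limit.critical})")
        elif value >= limit.warning:
            alerts.append(f"WARNING {metric}={value} (>= {limit.warning})")
    return alerts


if __name__ == "__main__":
    # Hypothetical metric names and limits.
    thresholds = {
        "throughput_mbps": Threshold(warning=800, critical=950),
        "iops": Threshold(warning=40000, critical=50000),
        "perf_capacity_used_pct": Threshold(warning=85, critical=100),
    }
    sample = {"throughput_mbps": 620, "iops": 42000, "perf_capacity_used_pct": 101}
    for alert in check_thresholds(sample, thresholds):
        print(alert)
```

Setting the warning level well below the critical level gives you time to plan before workloads are affected.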

 
The following counters can be monitored:

[Figure: counters to monitor (1103633-4.png)]

[Figure: counters to monitor (1103633-5.png)]

  3. If, while monitoring the selected thresholds, you receive a warning that a performance capacity threshold has been exceeded, reduce the workload or relocate it to less busy nodes as necessary to maintain expected performance.
  4. Use the Unified Manager Usage Overview Panel to identify the top-consuming workloads, and try to ensure that they do not share the same controller.

  5. Use Active IQ to review the difference between current and peak performance, which is based on the performance capacity information provided by AutoSupport.

    [Figure: current and peak performance in Active IQ (1103633-6.png)]

    If current performance approaches peak, review the workloads and relocate some of them to less busy nodes.

  6. Review KB: How to rectify performance issues using monitoring tools
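
The rebalancing steps above can be sketched in Python under several assumptions. The article does not quantify "approaches peak", so the 90% ratio is hypothetical. Workload names, demand figures, and controller names are also hypothetical. Greedy least-loaded placement is one simple heuristic for keeping top-consuming workloads on different controllers; it is not how Unified Manager or Active IQ make recommendations.

```python
from typing import Dict, List


def near_peak(current: float, peak: float, ratio: float = 0.9) -> bool:
    """True when current performance has reached `ratio` of the observed peak (ratio is hypothetical)."""
    return peak > 0 and current >= ratio * peak


def spread_top_workloads(workloads: Dict[str, float], controllers: List[str], top_n: int) -> Dict[str, str]:
    """Greedily place the top_n heaviest workloads, each on the currently least-loaded controller."""
    if not controllers:
        raise ValueError("need at least one controller")
    load = {c: 0.0 for c in controllers}
    placement: Dict[str, str] = {}
    heaviest = sorted(workloads.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    for name, demand in heaviest:
        # min() returns the first controller on ties, so placement is deterministic.
        target = min(controllers, key=lambda c: load[c])
        placement[name] = target
        load[target] += demand
    return placement


if __name__ == "__main__":
    demand = {"db": 900, "vdi": 700, "backup": 300, "web": 100}  # hypothetical IOPS per workload
    print(spread_top_workloads(demand, ["node-01", "node-02"], top_n=2))
    print(near_peak(current=95, peak=100))
```

When top_n does not exceed the number of controllers and every demand is positive, each of the top workloads lands on a different controller, which matches the goal of keeping them from sharing one.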

Additional Information