NetApp Knowledgebase

OnCommand Unified Manager - What is the "Node resources over-utilized" event and how do you address it?

Applies to

  • ONTAP 9
  • ONTAP 9.1
  • ONTAP 9.2
  • ONTAP 9.3
  • ONTAP 9.4
  • ONTAP 9.5
  • ONTAP 9.6
  • Clustered Data ONTAP 8.2
  • OnCommand Unified Manager 6.x
  • OnCommand Unified Manager 7.2+
  • OnCommand Unified Manager 7.3
  • OnCommand Unified Manager 9.4
  • OnCommand Unified Manager 9.5
  • AIQ
  • Unified Manager 9.6

Answer

What is the "Node resources over-utilized" event and how do you address it?

OnCommand Unified Manager provides a set of standard threshold policies that monitor performance and generate events automatically. These policies are enabled by default, and they generate warning events if the monitored thresholds are breached for 6 consecutive collection periods (30 minutes). The Node resources over-utilized policy identifies situations where a single node is operating above the bounds of its operational efficiency and is therefore potentially affecting workload latencies. It does this by looking for nodes that are using more than 85% of their CPU resources.
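The "6 consecutive collection periods" rule can be illustrated with a minimal sketch. It assumes one CPU utilization sample per collection period and is a simplification only: it is not Unified Manager's actual calculation, which combines several counters. The function and constant names are hypothetical.

```python
from typing import List, Sequence

CPU_THRESHOLD_PCT = 85.0   # from the article: more than 85% of CPU resources
CONSECUTIVE_PERIODS = 6    # from the article: 6 consecutive collection periods
PERIOD_MINUTES = 5         # implied by the article: 30 minutes / 6 periods


def breach_indices(
    samples: Sequence[float],
    threshold: float = CPU_THRESHOLD_PCT,
    required: int = CONSECUTIVE_PERIODS,
) -> List[int]:
    """Return the sample indices at which a warning would be raised.

    A warning is raised once, at the moment a run of samples above the
    threshold reaches the required length. The run must be broken before
    another warning can be raised.
    """
    events: List[int] = []
    streak = 0
    for index, value in enumerate(samples):
        if value > threshold:
            streak += 1
        else:
            streak = 0  # any sample at or below the threshold resets the run
        if streak == required:
            events.append(index)
    return events


if __name__ == "__main__":
    # Five busy periods, one quiet period, then six busy periods.
    cpu = [90.0] * 5 + [80.0] + [90.0] * 6
    for index in breach_indices(cpu):
        minutes = (index + 1) * PERIOD_MINUTES
        print(f"warning at sample {index} ({minutes} minutes into the data)")
```

In this sketch a short dip below the threshold restarts the count, which is why brief CPU spikes would not raise the event.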

In OnCommand Unified Manager this calculation is based upon complex algorithms using the following counters:

avg_processor_busy
cpu_elapsed_time1
total_cp_msecs
cp_phase_times[P2_FLUSH]
domain_busy[kahuna]
processor_elapsed_time 

This alert simply indicates that the storage controller has been busy for 30 minutes or more. Corrective action might not be required, and the node might continue to serve data without issues. However, in some situations, performance might be impacted for workloads on the controller when this alert is generated. Before opening a NetApp Technical Support case, confirm the following:

  • Are there any volumes / LUNs facing latency at the time when the alert is generated?
  • Is the latency more than acceptable thresholds for your environment / applications?
  • What type of operations are impacted? Reads, or writes?
  • Have there been any changes in the environment, including user workload and infrastructure?
  • Is the performance impact reproducible? Does an activity or workload trigger the performance impact?
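The first two checks above can be sketched in code, assuming you have already exported per-volume latency samples from your monitoring tool. The data layout, the `AlertWindow` type, and the `impacted_volumes` function are hypothetical and are not part of any NetApp API. The acceptable latency is a value you choose for your own environment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class AlertWindow:
    """Time range during which the alert was active (hypothetical type)."""
    start: datetime
    end: datetime


def impacted_volumes(
    latency_ms_by_volume: Dict[str, List[Tuple[datetime, float]]],
    window: AlertWindow,
    acceptable_ms: float,
) -> List[str]:
    """Return volumes whose latency exceeded acceptable_ms during the window."""
    impacted = []
    for name, samples in latency_ms_by_volume.items():
        if any(
            window.start <= timestamp <= window.end and latency > acceptable_ms
            for timestamp, latency in samples
        ):
            impacted.append(name)
    return sorted(impacted)


if __name__ == "__main__":
    window = AlertWindow(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30))
    data = {
        "vol_a": [(datetime(2020, 1, 1, 10, 5), 25.0)],  # high latency, in window
        "vol_b": [(datetime(2020, 1, 1, 10, 5), 2.0)],   # low latency, in window
        "vol_c": [(datetime(2020, 1, 1, 9, 0), 40.0)],   # high latency, outside window
    }
    print(impacted_volumes(data, window, acceptable_ms=10.0))
```

An empty result suggests that no workload was affected while the alert was active. A non-empty result gives you the volumes to examine against the remaining checks.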

If you can confirm that workloads are impacted when this alert is generated, open a NetApp Technical Support case for further investigation. At the same time, collect performance data for troubleshooting purposes.

If no workloads are impacted, or you are serving data normally, the alert can be safely ignored, though you should monitor node performance closely.

For more information on CPU scheduling and utilization in Data ONTAP, see the KB article CPU utilization in Data ONTAP: Scheduling and Monitoring.