How to measure CPU utilization
Applies to
- ONTAP 9
- Clustered Data ONTAP 8
- Data ONTAP 8 7-Mode
- Data ONTAP 7 and earlier
Answer
As part of a holistic view of the system, use the command line to view CPU utilization in real time:
Clustered Data ONTAP:
netapp::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
netapp::*> node run -node netapp-01 sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ ANY5+ ANY6+ ANY7+ ANY8+ ANY9+ ANY10+ ANY11+ ANY12+ ANY13+ ANY14+ ANY15+ ANY16+ AVG
100% 100% 100% 99% 98% 96% 94% 91% 86% 81% 76% 70% 64% 57% 48% 37% 81%
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
78% 76% 77% 83% 82% 83% 82% 82% 82% 82% 83% 84% 83% 82% 83% 82%
Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)
3% 2% 450% 0% 0% 49% 2% 136% 0% 4% 511%( 94%)
WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s CP
0% 0% 0% 112% 0% 28% 8% 47111 0%
In this example, the average CPU utilization is 81% across the 16 cores.
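As a quick check on the AVG column, the sketch below recomputes it two ways from the sample above: as the mean of the CPU0 to CPU15 columns, and as the sum of the ANYn+ columns divided by the core count. It assumes ANYn+ is the share of the interval during which at least n CPUs were busy. Under that reading, summing those shares counts total busy CPU time, so both methods should land near the same value. The function names are hypothetical helpers, not part of ONTAP.

```python
# Busy percentages copied from the clustered Data ONTAP sample above.
per_cpu = [78, 76, 77, 83, 82, 83, 82, 82, 82, 82, 83, 84, 83, 82, 83, 82]  # CPU0..CPU15
any_n = [100, 100, 100, 99, 98, 96, 94, 91, 86, 81, 76, 70, 64, 57, 48, 37]  # ANY1+..ANY16+

def average_from_cpus(cpus):
    """Mean of the per-CPU busy percentages."""
    return sum(cpus) / len(cpus)

def average_from_any(any_row):
    """Sum of the ANYn+ percentages divided by the number of CPUs.

    Assumes ANYn+ = % of the interval with at least n CPUs busy; summing
    over n then equals total busy CPU time (in CPU-percent).
    """
    return sum(any_row) / len(any_row)

print(f"{average_from_cpus(per_cpu):.2f}")  # 81.50
print(f"{average_from_any(any_n):.2f}")     # 81.06
```

Both results are within a percentage point of the 81% AVG. The displayed columns are whole percentages, so an exact match is not expected.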
Busiest domains:
- WAFL exempt (WAFL_Ex) at 511%
- Networking exempt (Nwk_Exmpt) at 450%
- RAID exempt (Raid_Ex) at 136%
- Exempt at 112%
- WAFL was active 98% of the sample interval: 4% in serial processing (Kahuna) and 94% in parallel processing (the value in parentheses after WAFL_Ex).
- Because WAFL serial processing is low, more work could likely be completed by parallelized WAFL.
- Being 98% active within the sample interval is not a concern without other contributing performance indicators.
- Overall, CPU resources are becoming scarce, which increases the likelihood of work queuing for CPU and can affect client latency.
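The busiest-domain ranking above can be reproduced from the domain columns with a short sketch. It assumes only the first WAFL_Ex(Kahu) value (511%) counts toward the total, and it leaves out Ops/s and CP, which are not CPU domains. In this sample the domain percentages add up to 1305%, close to the 1304% of busy CPU time implied by the CPU0 to CPU15 row (16 cores averaging 81.5%), which suggests the domain columns account for nearly all of the CPU time. The busiest helper is hypothetical.

```python
# Domain columns from the clustered Data ONTAP sample above.
# Only the first WAFL_Ex(Kahu) value (511%) is used; the parenthesized
# value, Ops/s, and CP are left out because they are not domain CPU time.
domains = {
    "Nwk_Excl": 3, "Nwk_Lg": 2, "Nwk_Exmpt": 450, "Protocol": 0,
    "Cluster": 0, "Storage": 49, "Raid": 2, "Raid_Ex": 136, "Target": 0,
    "Kahuna": 4, "WAFL_Ex": 511, "WAFL_XClean": 0, "SM_Exempt": 0,
    "Cifs": 0, "Exempt": 112, "SSAN_Ex": 0, "Intr": 28, "Host": 8,
}

def busiest(domain_pct, top=4):
    """Return the `top` domains sorted by utilization, highest first."""
    return sorted(domain_pct.items(), key=lambda kv: kv[1], reverse=True)[:top]

for name, pct in busiest(domains):
    print(f"{name:10s} {pct:4d}%")

# Cross-check against the CPU0..CPU15 row: 16 cores x 81.5% = 1304%.
print(sum(domains.values()))  # 1305
```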
Data ONTAP 7-Mode:
netapp> priv set diag
netapp*> sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3
93% 80% 36% 15% 56% 38% 32% 82% 72%
Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna
1% 68% 1% 0% 0% 4% 0% 19% 0% 11%
WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s CP
80%( 75%) 14% 0% 0% 24% 0% 1% 1% 0 83%
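To pull these numbers out of captured output rather than reading them by eye, a minimal Python sketch can pair each header line with the value line below it. It assumes the layout shown above: header and value lines alternate, and WAFL_Ex(Kahu) is the only column printed as a total followed by a parenthesized value. The parse_sysstat_m function is hypothetical and not a NetApp tool; other releases may print different columns.

```python
import re

# The 7-Mode "sysstat -M 1" sample above, as header/value line pairs.
SAMPLE = """\
ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3
93% 80% 36% 15% 56% 38% 32% 82% 72%
Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna
1% 68% 1% 0% 0% 4% 0% 19% 0% 11%
WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s CP
80%( 75%) 14% 0% 0% 24% 0% 1% 1% 0 83%
"""

# Matches "56%", "0" (Ops/s), or "80%(75%)".
VALUE = re.compile(r"^(\d+)%?(?:\((\d+)%\))?$")

def parse_sysstat_m(text):
    """Return {column: (value, parenthesized_value_or_None)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        names = header.split()
        # "80%( 75%)" -> "80%(75%)" so it splits as a single token.
        tokens = re.sub(r"\(\s+", "(", values).split()
        if len(names) != len(tokens):
            raise ValueError(f"column mismatch under header {header!r}")
        for name, token in zip(names, tokens):
            m = VALUE.match(token)
            if m is None:
                raise ValueError(f"cannot parse {token!r} for {name}")
            paren = int(m.group(2)) if m.group(2) is not None else None
            columns[name] = (int(m.group(1)), paren)
    return columns

cols = parse_sysstat_m(SAMPLE)
print(cols["AVG"][0], cols["Nwk_Lg"][0], cols["WAFL_Ex(Kahu)"])  # 56 68 (80, 75)
```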
In this example, the average CPU utilization is 56%, and the Nwk_Lg (network legacy) domain, which has a maximum concurrency of 1, is at 68%.
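The 68% reading matters even though the node average is only 56%: a domain with a maximum concurrency of 1 can run on at most one CPU at a time, so its ceiling is 100% rather than the 400% available across this 4-CPU node. The sketch below, using a hypothetical domain_load helper, compares a utilization figure against that ceiling.

```python
def domain_load(util_pct, max_concurrency):
    """Return utilization as a fraction of the ceiling.

    A domain allowed to run on at most `max_concurrency` CPUs at once
    cannot exceed 100% * max_concurrency, so that is the ceiling used.
    """
    return util_pct / (100 * max_concurrency)

# Nwk_Lg from the 7-Mode sample: 68% with a maximum concurrency of 1.
print(f"Nwk_Lg: {domain_load(68, 1):.0%} of its ceiling")  # Nwk_Lg: 68% of its ceiling
# For contrast, the whole node: 4 CPUs x 56% = 224% of a 400% ceiling.
print(f"Node:   {domain_load(56 * 4, 4):.0%} of its ceiling")  # Node:   56% of its ceiling
```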
- To check for a WAFL bottleneck, add Kahuna (11%) and WAFL_Ex (75%, the value in parentheses), for a total of 86%.
  - Because this is below 100%, WAFL is not a bottleneck. Even if the total nears 100%, that alone might not be a concern without other contributing performance indicators.
- Although Data ONTAP exposes both logical and physical CPU utilization, CPU utilization should not be used as a first-order metric for evaluating the overall performance of a system.
- Instead, use the inputs and outputs associated with the requested user work as the first-order metric.
- Focus on the actual latency of the work being serviced (response time) and the quantity of operations processed, measured in IO requests or bytes (throughput).
- These measures are relevant to a given workload and abstract away the complex nature of logical and physical CPU scheduling.
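Both WAFL readings in this article use the same arithmetic: the Kahuna (serial) percentage plus the value in parentheses after WAFL_Ex (parallel), compared against 100%. A minimal sketch with hypothetical helper names follows. As noted above, a flag from this check is at most a prompt to look at latency and throughput, not a verdict on its own.

```python
def wafl_active(kahuna_pct, wafl_ex_paren_pct):
    """Percent of the interval WAFL was active: serial (Kahuna) plus
    parallel (the parenthesized WAFL_Ex(Kahu) value)."""
    return kahuna_pct + wafl_ex_paren_pct

def wafl_bottleneck(active_pct, limit=100):
    """True once WAFL activity reaches the limit (100% of the interval)."""
    return active_pct >= limit

# (Kahuna, parenthesized WAFL_Ex) pairs from the two samples in this article.
samples = {"Clustered Data ONTAP": (4, 94), "Data ONTAP 7-Mode": (11, 75)}

for label, (kahuna, wafl_ex) in samples.items():
    active = wafl_active(kahuna, wafl_ex)
    verdict = "bottleneck" if wafl_bottleneck(active) else "not a bottleneck"
    print(f"{label}: {active}% -> {verdict}")
```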
Additional Information
What is CPU utilization in Data ONTAP: Scheduling and Monitoring?