

NetApp Knowledge Base

How to measure CPU utilization

Category:
clustered-data-ontap-8
Specialty:
perf

Applies to

  • ONTAP 9
  • Clustered Data ONTAP 8 
  • Data ONTAP 8 7-Mode 
  • Data ONTAP 7 and earlier 

Answer

As part of a holistic view of the system, use the command line to view CPU utilization in real time with sysstat -M (the examples below use a 1-second interval):

Clustered Data ONTAP: 

netapp::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
netapp::*> node run -node netapp-01 sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ ANY5+ ANY6+ ANY7+ ANY8+ ANY9+ ANY10+ ANY11+ ANY12+ ANY13+ ANY14+ ANY15+ ANY16+  AVG 
 100%  100%  100%   99%   98%   96%   94%   91%   86%    81%    76%    70%    64%    57%    48%   37%   81%

CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 
 78%  76%  77%  83%  82%  83%  82%  82%  82%  82%   83%   84%   83%   82%   83%   82% 

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)
      3%     2%      450%       0%      0%     49%   2%    136%     0%     4%    511%( 94%) 

WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host  Ops/s   CP
         0%        0%   0%   112%      0%  28%   8%  47111   0%

In this example, average CPU utilization (the AVG column) is 81% across the 16 cores.
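The AVG column appears to be the mean of the per-CPU columns. The following minimal Python sketch checks that arithmetic using the CPU0 to CPU15 figures from the sample above. The idle-capacity figure, expressed in core-equivalents, is an illustrative derived number and is not reported by sysstat.

```python
"""Recompute the sysstat -M AVG column from the per-CPU columns (illustrative)."""

# CPU0..CPU15 busy percentages copied from the clustered Data ONTAP sample.
CPU_BUSY = [78, 76, 77, 83, 82, 83, 82, 82, 82, 82, 83, 84, 83, 82, 83, 82]


def average_cpu(per_cpu):
    """Mean utilization across all cores, in percent."""
    return sum(per_cpu) / len(per_cpu)


def idle_cores(per_cpu):
    """Unused capacity as whole-core equivalents (a hypothetical helper metric)."""
    return (100 * len(per_cpu) - sum(per_cpu)) / 100


if __name__ == "__main__":
    print(f"{len(CPU_BUSY)} cores, average {average_cpu(CPU_BUSY):.1f}%")
    print(f"idle capacity: {idle_cores(CPU_BUSY):.2f} core-equivalents")
```

The per-CPU values sum to 1304, which gives 81.5%. The gap to the displayed 81% is presumably due to rounding or truncation in the per-CPU figures.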

Busiest domains:

  • WAFL exempt (WAFL_Ex) at 511%
  • Networking exempt (Nwk_Exmpt) at 450%
  • RAID exempt (Raid_Ex) at 136%
  • Exempt at 112%

WAFL analysis:

  • WAFL was active 98% of the sample interval: 4% in serial processing (Kahuna) and 94% in parallel processing (the value in parentheses after WAFL_Ex).
  • Because serial WAFL processing is low, parallelized WAFL can likely complete more work.
  • Being 98% active within the sample interval is not a concern by itself, without other contributing performance indicators.
  • As overall CPU resources become scarce, work is more likely to queue for CPU, which can increase client latency.

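The ranking of busiest domains above can be reproduced from the raw output. The following minimal Python sketch makes three assumptions. First, each header line is followed by one value line containing one token per column, as in the sample. Second, for WAFL_Ex(Kahu) only the first figure (total WAFL_Ex) is used for ranking. Third, Ops/s and CP are skipped because they are not processing domains. The function names are illustrative and are not part of any NetApp tool.

```python
"""Rank sysstat -M domains by CPU percentage (illustrative parser)."""
import re

# Header/value line pairs copied from the clustered Data ONTAP sample.
SAMPLE_ROWS = [
    ("Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)",
     "      3%     2%      450%       0%      0%     49%   2%    136%     0%     4%    511%( 94%)"),
    ("WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host  Ops/s   CP",
     "         0%        0%   0%   112%      0%  28%   8%  47111   0%"),
]

# One value token: "49%", "511%( 94%)", or a bare count such as Ops/s.
TOKEN = re.compile(r"\d+%(?:\(\s*\d+%\))?|\d+")
NOT_DOMAINS = {"Ops/s", "CP"}


def parse_domains(rows):
    """Map each domain name to its first percentage figure."""
    domains = {}
    for header, values in rows:
        names = header.split()
        tokens = TOKEN.findall(values)
        if len(names) != len(tokens):
            raise ValueError(f"column mismatch: {names} vs {tokens}")
        for name, tok in zip(names, tokens):
            if name in NOT_DOMAINS:
                continue
            # "511%( 94%)" -> 511; "49%" -> 49
            domains[name] = int(tok.split("%")[0])
    return domains


def busiest(domains, n=4):
    """Return the n domains with the highest percentage, highest first."""
    return sorted(domains.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, pct in busiest(parse_domains(SAMPLE_ROWS)):
        print(f"{name:14s}{pct:5d}%")
```

When run on the sample, this lists WAFL_Ex(Kahu), Nwk_Exmpt, Raid_Ex, and Exempt in that order. Because a regular expression matches each token, the space inside "511%( 94%)" does not shift the columns.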
Data ONTAP 7-Mode:

netapp> priv set diag
netapp*> sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3
  93%   80%   36%   15%  56%  38%  32%  82%  72%

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna
      1%    68%        1%       0%      0%      4%   0%     19%     0%    11%

WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s  CP
    80%( 75%)         14%        0%   0%    24%      0%   1%   1%     0 83%

In this example, average CPU utilization is 56%, and the Nwk_Lg (network legacy) domain, which has a maximum concurrency of 1, is at 68%.
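The ANYn+ columns and the per-CPU columns are two views of the same busy time. The following minimal Python sketch assumes that ANYn+ is the percentage of the interval during which at least n CPUs were busy. Under that assumption, the ANY columns must sum to the same total as the CPU columns, because the expected number of busy CPUs equals the sum over n of P(at least n busy). The sketch also assumes that a domain with maximum concurrency c saturates at c × 100%. The helper names are hypothetical.

```python
"""Cross-check ANYn+ against per-CPU columns, and a serial domain's headroom (illustrative).

Assumption: ANYn+ = percentage of the interval with at least n CPUs busy.
For a busy-CPU count X, E[X] = sum over n >= 1 of P(X >= n).
"""

# 7-Mode sample values, in percent.
ANY_N = [93, 80, 36, 15]     # ANY1+ .. ANY4+
PER_CPU = [38, 32, 82, 72]   # CPU0 .. CPU3


def busy_total_from_any(any_pct):
    """Total busy percentage (summed over CPUs) implied by the ANYn+ columns."""
    return sum(any_pct)


def domain_saturation(pct, max_concurrency):
    """Fraction of a domain's ceiling in use, assuming the ceiling is max_concurrency * 100%."""
    return pct / (100 * max_concurrency)


if __name__ == "__main__":
    total = busy_total_from_any(ANY_N)
    print(f"busy total from ANY columns: {total}%, from CPU columns: {sum(PER_CPU)}%")
    print(f"average: {total / len(PER_CPU):.0f}%")
    # Nwk_Lg at 68% with a maximum concurrency of 1, as stated above.
    print(f"Nwk_Lg saturation: {domain_saturation(68, 1):.0%}")
```

In the 7-Mode sample, both sums are 224%, and dividing by 4 CPUs gives the 56% AVG. In the clustered sample, the two sums differ slightly (1297% versus 1304%), presumably because of rounding.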

  • To check for a WAFL bottleneck, add Kahuna (11%) and the WAFL_Ex value in parentheses (75%): 86% in total.
    • Because this is below 100%, WAFL is not a bottleneck. Even a total nearing 100% might not be a concern without other contributing performance indicators.
  • Although Data ONTAP exposes both logical and physical CPU utilization, CPU utilization should not be used as a first-order metric for evaluating the overall performance of a system.
    • Instead, use the inputs and outputs associated with the requested user work as the first-order metric.
  • Focus on the actual latency of the work being serviced (response time) and on the quantity of operations processed, measured in I/O requests or bytes (throughput).
  • These measures are specific to a given workload and abstract away the complex variations of logical and physical CPU scheduling.
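The WAFL check in the first bullet above is simple arithmetic. The following minimal Python sketch applies it to both samples, under two assumptions. First, the figure in parentheses in the WAFL_Ex(Kahu) column is the value to add to Kahuna, as in the 7-Mode example. Second, a hypothetical 90% cut-off stands in for "nearing 100%", since the article gives no exact number. The verdict is only a CPU-side hint, and latency and throughput remain the first-order metrics.

```python
"""Kahuna + parenthesized WAFL_Ex(Kahu) check (illustrative)."""
import re

# Hypothetical cut-off for "nearing 100%"; not a NetApp-documented value.
NEAR_LIMIT = 90

WAFL_EX_TOKEN = re.compile(r"\s*(\d+)%\(\s*(\d+)%\)\s*")


def parse_wafl_ex(token):
    """Split a WAFL_Ex(Kahu) value such as '80%( 75%)' into (total, parenthesized)."""
    m = WAFL_EX_TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"unexpected WAFL_Ex(Kahu) value: {token!r}")
    return int(m.group(1)), int(m.group(2))


def wafl_check(kahuna_pct, wafl_ex_token):
    """Return (Kahuna + parenthesized WAFL_Ex, verdict)."""
    _, paren = parse_wafl_ex(wafl_ex_token)
    total = kahuna_pct + paren
    if total < NEAR_LIMIT:
        verdict = "not a bottleneck"
    elif total < 100:
        verdict = "nearing 100%: confirm with latency and throughput"
    else:
        verdict = "at 100%: confirm with latency and throughput"
    return total, verdict


if __name__ == "__main__":
    print("7-Mode:   ", wafl_check(11, "80%( 75%)"))
    print("Clustered:", wafl_check(4, "511%( 94%)"))
```

The 7-Mode sample yields 86% (not a bottleneck). The clustered sample yields 98%, which matches the "WAFL was active 98%" reading and is flagged only for corroboration.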
