NetApp Knowledge Base

How to measure CPU utilization

Applies to

  • ONTAP 9
  • Clustered Data ONTAP 8 
  • Data ONTAP 8 7-Mode 
  • Data ONTAP 7 and earlier 

Answer

As part of a holistic view of the system, use the command line to view CPU utilization in real time:

Clustered Data ONTAP: 

netapp::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
netapp::*> node run -node netapp-01 sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ ANY5+ ANY6+ ANY7+ ANY8+ ANY9+ ANY10+ ANY11+ ANY12+ ANY13+ ANY14+ ANY15+ ANY16+  AVG 
 100%  100%  100%   99%   98%   96%   94%   91%   86%    81%    76%    70%    64%    57%    48%   37%   81%

CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 
 78%  76%  77%  83%  82%  83%  82%  82%  82%  82%   83%   84%   83%   82%   83%   82% 

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)
      3%     2%      450%       0%      0%     49%   2%    136%     0%     4%    511%( 94%) 

WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host  Ops/s   CP
         0%        0%   0%   112%      0%  28%   8%  47111   0%

In this example, average CPU utilization is 81% across the 16 cores.

Busiest domains:

  • WAFL exempt at 511%
  • Networking exempt at 450%
  • RAID exempt at 136% and Exempt at 112%
  • WAFL was active 98% of the sample interval, with 4% spent in serial processing (Kahuna) and 94% in parallel processing (the parenthesized Kahu value).
  • Because WAFL serial processing is quite low, parallelized WAFL could likely complete more work.
  • Being 98% active within the sample interval is not a concern on its own, without other contributing performance indicators.
  • Overall CPU resources are becoming scarce, which increases the likelihood of work queuing for CPU and can impact client latency.
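
The ranking and the WAFL arithmetic above can be reproduced from captured output. The following is a minimal sketch in Python, assuming the output was saved as text with each header line followed by its value line, as in the example. The names `parse_sysstat_m`, `busiest_domains`, and `NOT_DOMAINS` are hypothetical helpers for illustration and are not part of ONTAP.

```python
import re

# Domain lines copied from the clustered Data ONTAP example above.
SAMPLE = """\
Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)
      3%     2%      450%       0%      0%     49%   2%    136%     0%     4%    511%( 94%)

WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host  Ops/s   CP
         0%        0%   0%   112%      0%  28%   8%  47111   0%
"""

# Assumption: these columns are not treated as domains, so they are left out of the ranking.
NOT_DOMAINS = {"Kahu", "Ops/s", "CP"}


def parse_sysstat_m(text):
    """Map each column name to its integer value (hypothetical helper)."""
    stats = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Non-blank lines come in pairs: a header line, then a value line.
    for header, values in zip(lines[0::2], lines[1::2]):
        # "WAFL_Ex(Kahu)" carries two numbers, e.g. "511%( 94%)", so split the name in two.
        names = header.replace("(Kahu)", " Kahu").split()
        numbers = [int(n) for n in re.findall(r"\d+", values)]
        if len(names) != len(numbers):
            raise ValueError(f"column mismatch in line: {header!r}")
        stats.update(zip(names, numbers))
    return stats


def busiest_domains(stats, count):
    """Return the `count` highest (name, percent) pairs, busiest first."""
    domains = [(name, pct) for name, pct in stats.items() if name not in NOT_DOMAINS]
    return sorted(domains, key=lambda item: item[1], reverse=True)[:count]


stats = parse_sysstat_m(SAMPLE)
print(busiest_domains(stats, 4))
# [('WAFL_Ex', 511), ('Nwk_Exmpt', 450), ('Raid_Ex', 136), ('Exempt', 112)]
print(stats["Kahuna"] + stats["Kahu"])  # 98: WAFL serial plus parallel time
```

Splitting `WAFL_Ex(Kahu)` into two names keeps the header and value counts equal, so a malformed capture fails loudly rather than shifting values into the wrong columns.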

Data ONTAP 7-Mode:

netapp> priv set diag
netapp*> sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3
  93%   80%   36%   15%  56%  38%  32%  82%  72%

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna
      1%    68%        1%       0%      0%      4%   0%     19%     0%    11%

WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s  CP
    80%( 75%)         14%        0%   0%    24%      0%   1%   1%     0 83%

In this example, average CPU utilization is 56%, and the nwk_legacy domain (Nwk_Lg, max concurrency of 1) is at 68%.

  • To check for a WAFL bottleneck, add Kahuna (11%) and the parenthesized (Kahu) value in the WAFL_Ex(Kahu) column (75%), which gives 86% in total:
    • As this is < 100%, it is not a bottleneck. Even a value nearing 100% might not be a concern without other contributing performance indicators.
  • While CPU (logical and physical) utilization is exposed by Data ONTAP, CPU utilization should not be used as a first-order metric for evaluating the overall performance of a system.
    • Instead, the inputs and outputs associated with the requested user work should be the first-order metric.
  • A focus on actual latency for work being serviced (Response Time) and the quantity of operations being processed in terms of IO requests or Bytes (Throughput) is recommended.
  • This measure of performance is relevant to a given workload and abstracts the complex nature of logical and physical CPU scheduling variations.
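
The WAFL check described in the list above is simple arithmetic and can be sketched as follows. This is an illustrative Python sketch, not a NetApp tool: the function names are hypothetical, and the 100% limit is taken from the "< 100%" comparison in the text.

```python
def wafl_utilization(kahuna_pct, kahu_pct):
    """Add serial WAFL time (Kahuna) to the parenthesized (Kahu) value, in percent."""
    return kahuna_pct + kahu_pct


def describe_wafl(kahuna_pct, kahu_pct, limit_pct=100):
    """Return a short, hedged reading of the combined WAFL figure."""
    total = wafl_utilization(kahuna_pct, kahu_pct)
    if total < limit_pct:
        return f"{total}%: below {limit_pct}%, not a WAFL bottleneck by this measure"
    # Even at the limit, the article advises checking other indicators first.
    return f"{total}%: at or above {limit_pct}%, check latency and throughput before concluding"


print(describe_wafl(11, 75))  # 7-Mode example: 11% + 75% = 86%
print(describe_wafl(4, 94))   # clustered example: 4% + 94% = 98%
```

Both examples fall under the limit. Following the guidance above, even a result close to 100% is only a prompt to look at response time and throughput, not a diagnosis on its own.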
