NetApp Knowledgebase

What is CPU utilization in Data ONTAP: Scheduling and Monitoring?


Applies to

  • ONTAP 9
  • Clustered Data ONTAP 8 
  • Data ONTAP 8 7-Mode 
  • Data ONTAP 7 and earlier 

Answer

Note: CPU utilization is only one part of a complex system in ONTAP. See this KB to determine if CPU utilization is a concern.

 

Each storage controller contains one or more multi-core CPUs. These physical CPU cores are the primary compute resource available to Data ONTAP for processing work. In addition to CPU cores, Data ONTAP interacts with other physical hardware such as Ethernet ports, FC ports, disks, and NVRAM. The use of all these physical resources occurs in a highly optimized and parallel manner, taking into account active requests, resource availability, and the overall activity level of the system. As the interaction between physical resources can be complex and inter-dependent, a measure of CPU being busy (CPU utilization) does not increase linearly with an increase in incoming requests from clients, nor can it be used alone as a measure of the overall system utilization. That said, CPU resources have several unique characteristics that can be useful to understand when analyzing the system as a whole.

This KB describes the following in more detail:

  • CPU as a compute resource
  • CPU as a metric of system performance
  • How to measure CPU utilization

CPU as a compute resource:

Data ONTAP uses a Coarse Symmetric Multiprocessing (CSMP) design which partitions system functions into logical processing domains. Each logical processing domain has a set of rules that govern how and when the logical CSMP domain can be scheduled across physical CPU cores. These rules are designed to ensure that all processing occurs in a safe and efficient manner.

The following table lists some of the common logical processing domains, their typical tasks, whether each domain can run on one or more CPU cores concurrently, and any specific scheduling rules:

| Domain name   | Typical Tasks in Domain | CPU Concurrency | Notes |
|---------------|-------------------------|-----------------|-------|
| nwk_exclusive | IP processing, NFS protocol processing | 1 | Networking code which can only run on a single CPU concurrently |
| nwk_exempt    | IP processing, NFS protocol processing (7-mode and cDOT), SMB processing (cDOT) | 1+ | Maximum number of CPUs is dependent on controller model and Data ONTAP release |
| nwk_legacy    | IP processing, NFS protocol processing | 1 | Networking code which can only run on a single CPU concurrently |
| storage       | SCSI communication with disks | 1+ | Concurrency of 1 prior to Data ONTAP 8.2.1 or if less than 6 CPUs |
| raid          | RAID subsystem | 1 | |
| raid_exempt   | RAID subsystem | 1+ | Introduced in Data ONTAP 8.2 |
| XOR_Ex        | RAID subsystem XOR parity processing | 1+ | Introduced in ONTAP 9.0 |
| target        | SCSI (FCP/iSCSI) processing | 1 | 7-mode only |
| ssan_exempt   | SCSI (FCP/iSCSI) processing | 1+ | Introduced in clustered Data ONTAP 8.2 |
| Kahuna        | Serialized WAFL and anything not in another domain | 1 | Exclusive with WAFL_Ex (i.e. either Kahuna can be active on 1 CPU or WAFL_Ex can be active on 1+ CPUs, but both cannot be active at the same time) |
| WAFL_Ex       | Parallelized WAFL | 1+ | Exclusive with Kahuna (i.e. either Kahuna can be active on 1 CPU or WAFL_Ex can be active on 1+ CPUs, but both cannot be active at the same time) |
| WAFL_XCleaner | WAFL | 1+ | |
| SM_Exempt     | SnapMirror | 1+ | |
| cifs          | SMB protocol processing (7-mode only) | 1 | Initial decoding only; majority of SMB processing occurs in WAFL |
| exempt        | General parallelized work | 1+ | |
| hostOS        | Tasks owned by the BSD layer including NTP, environmental sensor monitoring, ZAPI handling, autosupport | 1+ | |


Logical CSMP domains are scheduled to run on physical CPU cores by the Data ONTAP kernel. The scheduling logic is unique to a given Data ONTAP release and hardware platform and is tuned to maximize the overall system performance. As such, the level of parallelism seen for a given logical domain may vary based on a number of factors, including the incoming workload rate, the type of work being requested, and the Data ONTAP version.

Scheduling behaviors might include:
  • Pinning of heavily used logical domains to a physical CPU core to maximize cache efficiency.
  • Shutdown of physical CPU cores at various load points to optimize processing of the run queue. 

These optimizations might appear to result in an uneven balancing of workload across physical CPU cores. This behavior is by design and is optimized for each specific Data ONTAP release and platform combination.

CPU as a metric of system performance:

As mentioned earlier, CPU is just one of the physical resource types available to Data ONTAP. When analyzing system performance, it is crucial to look at the system holistically. A general strategy for analyzing the bottlenecks is to use both service metrics (protocol/volume/lun latency/workload) and component metrics (CPU, Disk IO, Network IO) to provide a complete view of the system and reduce the chance of coming to an incorrect conclusion.

Looking specifically at the CPU resource, work is classified into priorities and some types of work are identified as background or non-essential/opportunistic. This means that when background work is using one or more CPU cores, it will effectively yield to higher priority work as the requests arrive. Also, as the system load increases, it is likely that processing optimizations will result in non-linear scaling for the measure of both the physical CPU core utilization and the logical CSMP domain utilization. This is normal in a complex compute system.

The following three CPU bottleneck types are possible because of the CSMP model:
  • Average CPU core utilization: The average measure of CPU core utilization for all cores reaches 100%.
  • Logical domain bottleneck: A logical domain reaches its concurrency limit, for example, when a logical domain with a concurrency of 1 CPU core reaches 100% utilization.
  • Interactions between logical domains: Some logical domains are mutually exclusive and cannot run concurrently with another correlated logical domain. For example, WAFL_Ex represents parallel WAFL processing while Kahuna represents serial WAFL processing. These two logical domains are mutually exclusive, meaning either Kahuna can be active on 1 CPU, or WAFL_Ex can be active on 1+ CPUs, but both cannot be active at the same time. Depending on the workload, it is possible for Kahuna to limit the amount of work that can be performed by WAFL_Ex. This type of bottleneck is a variation on the previous condition.
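
The three conditions above can be sketched as a small check. This is only an illustration, not how Data ONTAP evaluates bottlenecks internally: the function name, the input dictionaries, the 100% limit parameter, and the sample values are all assumptions made for the example.

```python
def find_cpu_bottlenecks(avg_core_busy, domain_busy, concurrency,
                         exclusive_groups, limit=100.0):
    """Return the bottleneck conditions that the given inputs satisfy.

    avg_core_busy    -- average utilization across all cores, in percent
    domain_busy      -- {domain: utilization in percent, summed over cores}
    concurrency      -- {domain: max cores} for domains with a known limit
    exclusive_groups -- list of {domain: percent of interval active} dicts
                        for domains that cannot be active at the same time
    """
    findings = []
    # 1. Average CPU core utilization reaches the limit.
    if avg_core_busy >= limit:
        findings.append("average CPU core utilization")
    # 2. A logical domain reaches its concurrency limit.
    for name, busy in domain_busy.items():
        max_cores = concurrency.get(name)
        if max_cores is not None and busy >= max_cores * limit:
            findings.append(f"domain limit: {name}")
    # 3. Mutually exclusive domains together fill the sample interval.
    for group in exclusive_groups:
        if sum(group.values()) >= limit:
            findings.append("exclusive domains: " + "+".join(sorted(group)))
    return findings


# Hypothetical sample: nwk_legacy (concurrency 1) is saturated.
findings = find_cpu_bottlenecks(
    avg_core_busy=56.0,
    domain_busy={"nwk_legacy": 100.0, "Kahuna": 11.0},
    concurrency={"nwk_legacy": 1, "Kahuna": 1},
    exclusive_groups=[{"Kahuna": 11.0, "WAFL_Ex": 75.0}],
)
print(findings)
```

Domains marked 1+ in the table are left out of the concurrency map here because their maximum depends on the controller model and release.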

Note: A bottleneck on a physical CPU core is not possible without either reaching a domain bottleneck or average CPU bottleneck. Accordingly, the monitoring of physical CPU utilization as a direct measure is not effective.

Note: Beginning with Data ONTAP 8.2.1, the algorithm that sysstat uses to represent CPU utilization changed: it now reports the greater of avg_processor_busy and the utilization of the busiest domain that has a concurrency of 1.
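
The rule in the note above amounts to taking a maximum. The following is a minimal sketch of that arithmetic only: the function name, the domain set, and the sample percentages are hypothetical, and the actual sysstat implementation may differ in details such as rounding.

```python
def reported_cpu_busy(avg_processor_busy, domain_busy, single_core_domains):
    """Greater of the average core utilization and the busiest
    domain that has a concurrency of 1 (all values in percent)."""
    busiest_single = max(
        (domain_busy[d] for d in single_core_domains if d in domain_busy),
        default=0,
    )
    return max(avg_processor_busy, busiest_single)


# Hypothetical values: the average is 56%, but a concurrency-1 domain is at 68%.
single_core = {"Nwk_Lg", "Kahuna"}  # assumed set of concurrency-1 domains
busy = {"Nwk_Lg": 68, "Kahuna": 11, "WAFL_Ex": 80}
print(reported_cpu_busy(56, busy, single_core))
```

WAFL_Ex is ignored in this example even though it is the busiest domain, because it can run on more than one core.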

How to measure CPU utilization?

As part of a holistic view of the system, you can use the command line to view CPU utilization in real-time using the following:

Clustered Data ONTAP: 

netapp::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y
netapp::*> node run -node netapp-01 sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ ANY5+ ANY6+ ANY7+ ANY8+ ANY9+ ANY10+ ANY11+ ANY12+ ANY13+ ANY14+ ANY15+ ANY16+  AVG 
 100%  100%  100%   99%   98%   96%   94%   91%   86%    81%    76%    70%    64%    57%    48%   37%   81%

CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 
 78%  76%  77%  83%  82%  83%  82%  82%  82%  82%   83%   84%   83%   82%   83%   82% 

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu)
      3%     2%      450%       0%      0%     49%   2%    136%     0%     4%    511%( 94%) 

WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host  Ops/s   CP
         0%        0%   0%   112%      0%  28%   8%  47111   0%

In this example, average CPU utilization is 81% across the 16 cores. The busiest domains are WAFL_Ex at 511%, nwk_exempt at 450%, raid_exempt at 136%, and exempt at 112%. WAFL was active for 98% of the sample interval: 4% in serial processing (Kahuna) and 94% in parallel processing (WAFL_Ex). Because serial WAFL processing is quite low, it is likely that parallelized WAFL could complete more work, so being 98% active within the sample interval is not a concern without other contributing performance indicators. Overall, however, CPU resources are becoming scarce, which increases the likelihood that work will queue for CPU and impact client latency.

Data ONTAP 7-Mode:

netapp> priv set diag
netapp*> sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+  AVG  CPU0 CPU1  CPU2  CPU3
93%    80%  36%   15%    56%  38%   32%  82%   72%

Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna
1%         68%     1%       0%        0%      4%     0%  19%      0%    11%

WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s  CP
80%( 75%)      14%            0%      0%   24%    0%     1%    1%   0    83%

In this example, average CPU utilization is 56% and the nwk_legacy domain (maximum concurrency of 1) is at 68%. To check for a WAFL bottleneck, add Kahuna (11%) and WAFL_Ex (75%), which gives 86% in total. As this is below 100%, WAFL is not a bottleneck. Even if the total were nearing 100%, it might still not be a concern without other contributing performance indicators.
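
The WAFL check described above can be reproduced from the captured output. The sketch below pairs a header line with its value line from the 7-Mode sample; the function name and the `_interval` key suffix are hypothetical, the reading of the parenthesized figure as the share of the interval follows this article's examples, and the column layout may differ between releases.

```python
import re


def parse_sysstat_rows(header_line, value_line):
    """Pair one 'sysstat -M' header line with its value line."""
    headers = header_line.split()
    # A value such as "80%( 75%)" contains a space; close it up so that
    # it splits as a single token.
    tokens = re.sub(r"\(\s+", "(", value_line).split()
    if len(headers) != len(tokens):
        raise ValueError("header/value column count mismatch")
    row = {}
    for name, token in zip(headers, tokens):
        pair = re.fullmatch(r"(\d+)%\((\d+)%\)", token)
        if pair:
            base = name.split("(")[0]  # "WAFL_Ex(Kahu)" -> "WAFL_Ex"
            row[base] = int(pair.group(1))
            row[base + "_interval"] = int(pair.group(2))
        else:
            row[name] = int(token.rstrip("%"))
    return row


row = {}
row.update(parse_sysstat_rows(
    "Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Cluster Storage Raid Raid_Ex Target Kahuna",
    "1%         68%     1%       0%        0%      4%     0%  19%      0%    11%",
))
row.update(parse_sysstat_rows(
    "WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s  CP",
    "80%( 75%)      14%            0%      0%   24%    0%     1%    1%   0    83%",
))

# Serial WAFL (Kahuna) plus the parallel WAFL share of the interval.
wafl_total = row["Kahuna"] + row["WAFL_Ex_interval"]
print(wafl_total)  # 11 + 75 = 86
```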

While CPU (logical and physical) utilization is exposed by Data ONTAP, it is recommended that CPU utilization not be used as a first-order metric for evaluating the overall performance of a system. Instead, it is recommended that the inputs and outputs associated with the requested user work be the first-order metric.

In other words, a focus on the actual latency for work being serviced (Response Time) and the quantity of operations being processed in terms of IO requests or Bytes (Throughput) is recommended. This measure of performance is actually relevant to a given workload and abstracts the complex nature of logical and physical CPU scheduling variations.
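
As a rough illustration of these first-order metrics, the sketch below derives average response time and throughput from a list of completed operations. The record type, field names, and sample numbers are hypothetical and do not correspond to any Data ONTAP counter or API.

```python
from dataclasses import dataclass


@dataclass
class OpSample:
    latency_ms: float  # response time of one completed operation
    size_bytes: int    # payload transferred by that operation


def summarize(samples, interval_s):
    """Average latency, operations per second, and bytes per second."""
    if not samples or interval_s <= 0:
        raise ValueError("need at least one sample and a positive interval")
    count = len(samples)
    return {
        "avg_latency_ms": sum(s.latency_ms for s in samples) / count,
        "ops_per_s": count / interval_s,
        "bytes_per_s": sum(s.size_bytes for s in samples) / interval_s,
    }


# Two hypothetical 4 KiB operations observed over a 2-second window.
summary = summarize([OpSample(1.0, 4096), OpSample(3.0, 4096)], interval_s=2.0)
print(summary)
```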

Additional Information

N/A

 
