NetApp Knowledge Base

What are the metrics used to analyze system performance of CPU?

Applies to

  • ONTAP 9
  • Data ONTAP 8 7-Mode 
  • Data ONTAP 7 and earlier 

Answer

  • CPU is one of the physical resource types available to Data ONTAP.
  • When analyzing system performance, look at the system holistically.
  • A general strategy for analyzing bottlenecks is to use both service metrics (protocol, volume, and LUN latency and workload) and component metrics (CPU, disk I/O, network I/O).
    • This provides a complete view of the system and reduces incorrect conclusions.
  • Looking specifically at the CPU resource, work is classified into priorities:
    • Some types of work are identified as background or non-essential/opportunistic:
      • This means that when background work is using one or more CPU cores, it will effectively yield to higher priority work as the requests arrive. 
  • Also, as the system load increases, processing optimizations are likely to make both the physical CPU core utilization and the logical CSMP domain utilization scale non-linearly. This is normal in a complex compute system.

CPU bottleneck types

The following three CPU bottleneck types are possible because of the CSMP model:

  • Average CPU core utilization: The average measure of CPU core utilization for all cores reaches 100%.
  • Logical domain bottleneck:
    • A logical domain reaches its concurrency limit.
    • For example, a logical domain with a concurrency of 1 CPU core reaches 100% utilization.
  • Interactions between logical domains:
    • Some logical domains are mutually exclusive and cannot run concurrently with another correlated logical domain.
      • For example, WAFL_ex represents parallel WAFL processing while Kahuna represents serial WAFL processing.
      • These two logical domains are mutually exclusive: either Kahuna can be active on 1 CPU, or WAFL_ex can be active on 1 or more CPUs, but Kahuna and WAFL_ex cannot be active at the same time.
    • Depending on the workload, it is possible for Kahuna to limit the amount of work that can be performed by WAFL_ex.
      • Note: This type of bottleneck is a simple variation on the previous condition.

Note: A bottleneck on a physical CPU core is not possible without either reaching a domain bottleneck or average CPU bottleneck. Accordingly, the monitoring of physical CPU utilization as a direct measure is not effective.
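
The bottleneck types above can be sketched as a simple check. This is an illustrative Python sketch, not an ONTAP tool: the function names, the input shapes, and the `threshold` parameter are all hypothetical. It assumes domain utilization is expressed in percent of one core, so a domain with a concurrency of 4 reaches its limit at 400%. The second helper models the Kahuna and WAFL_ex interaction in the simplest possible way: WAFL_ex can only run during the share of time that Kahuna is idle.

```python
def find_bottlenecks(avg_core_busy, domains, threshold=100.0):
    """Return a list of bottleneck findings (hypothetical helper).

    avg_core_busy: average utilization across all cores, in percent.
    domains: dict of domain name -> (busy, concurrency), where busy is in
        percent of one core and concurrency is the number of cores the
        domain may use at once (assumed input format).
    threshold: percent of the limit treated as "reached" (assumption).
    """
    findings = []
    # Type 1: the average across all cores reaches the limit.
    if avg_core_busy >= threshold:
        findings.append("average CPU core utilization")
    # Type 2: a single logical domain reaches its concurrency limit.
    for name, (busy, concurrency) in domains.items():
        if busy >= threshold * concurrency:
            findings.append(f"logical domain limit: {name}")
    return findings


def wafl_ex_ceiling(kahuna_busy, wafl_ex_concurrency):
    """Upper bound on WAFL_ex utilization, in percent of one core.

    Simplified model of type 3: the two domains are mutually exclusive,
    so WAFL_ex is limited to the time that Kahuna is not active.
    """
    return (100.0 - kahuna_busy) * wafl_ex_concurrency


if __name__ == "__main__":
    sample = {"kahuna": (100.0, 1), "wafl_ex": (180.0, 4)}
    print(find_bottlenecks(45.0, sample))
    print(wafl_ex_ceiling(40.0, 4))
```

In this model, a Kahuna domain that is busy 40% of the time caps a 4-core WAFL_ex domain at 240% even though average core utilization may look low, which is why the average alone can be misleading.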

Note: Beginning with Data ONTAP 8.2.1, the algorithm for representing CPU utilization (cpu_busy) was changed, and it now depends on the total number of CPU cores.

  • <= 20 CPU cores: cpu_busy returns the higher of the two values below:
    • Average CPU utilization of all the CPU cores (avg_processor_busy)
    • CPU utilization of the busiest domain that has a concurrency of 1
  • >= 36 CPU cores: cpu_busy returns the highest of the three values below. On these platforms, the CPU cores are split evenly into two partitions.
    • Average CPU utilization of the first partition (non-WAFL partition)
    • Average CPU utilization of the second partition (WAFL partition)
    • CPU utilization of the busiest domain that has a concurrency of 1
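
The selection rule above can be illustrated with a short Python sketch. It is not the ONTAP implementation: the function signature and input format are hypothetical, and it assumes the per-core list is ordered so that the first half is the non-WAFL partition and the second half is the WAFL partition. Core counts between 21 and 35 are not covered by the description above, so the sketch rejects them rather than guessing.

```python
from statistics import mean


def cpu_busy(core_busy, serial_domain_busy):
    """Approximate cpu_busy from utilization values given in percent.

    core_busy: list of per-core utilization values. On partitioned
        platforms the first half is assumed to be the non-WAFL partition
        (an assumption of this sketch, not documented behavior).
    serial_domain_busy: dict mapping each domain with a concurrency of 1
        to its utilization (hypothetical input format).
    """
    # Busiest domain that can only run on one core at a time.
    busiest_serial = max(serial_domain_busy.values(), default=0.0)
    cores = len(core_busy)

    if cores <= 20:
        # Higher of: average of all cores, busiest serial domain.
        return max(mean(core_busy), busiest_serial)

    if cores >= 36:
        # Cores are split evenly into two partitions.
        half = cores // 2
        non_wafl_avg = mean(core_busy[:half])
        wafl_avg = mean(core_busy[half:])
        return max(non_wafl_avg, wafl_avg, busiest_serial)

    raise ValueError("core counts from 21 to 35 are not described in the article")


if __name__ == "__main__":
    # 4 cores averaging 25%, but a serial domain at 85%.
    print(cpu_busy([10, 20, 30, 40], {"kahuna": 85.0}))
```

In the 4-core example the reported value follows the serial domain rather than the core average, which matches the earlier point that a single-core domain can be the limiting factor while most cores are idle.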
