NetApp Knowledge Base

What are the metrics used to analyze system performance of CPU?

Applies to

  • ONTAP 9
  • Data ONTAP 8 7-Mode 
  • Data ONTAP 7 and earlier 

Answer

  • CPU is one of the physical resource types available to Data ONTAP.
  • When analyzing system performance, look at the system holistically.
  • A general strategy for analyzing bottlenecks is to use both service metrics (protocol, volume, and LUN latency and workload) and component metrics (CPU, disk I/O, network I/O).
    • This provides a complete view of the system and reduces incorrect conclusions.
  • Looking specifically at the CPU resource, work is classified into priorities:
    • Some types of work are identified as background or non-essential/opportunistic:
      • This means that when background work is using one or more CPU cores, it will effectively yield to higher priority work as the requests arrive. 
  • Also, as the system load increases, it is likely that processing optimizations will result in non-linear scaling for the measure of both the physical CPU core utilization and the logical CSMP domain utilization. This is normal in a complex compute system.

CPU bottleneck types

The following three CPU bottleneck types are possible because of the CSMP model:

  • Average CPU core utilization: The average measure of CPU core utilization for all cores reaches 100%.
  • Logical domain bottleneck:
    • A logical domain reaches its concurrency limit.
    • For example, a logical domain with a concurrency of 1 CPU core reaches 100% utilization.
  • Interactions between logical domains:
    • Some logical domains are mutually exclusive and cannot run concurrently with another correlated logical domain.
      • For example, WAFL_ex represents parallel WAFL processing while Kahuna represents serial WAFL processing.
      • These two logical domains are mutually exclusive: either Kahuna can be active on 1 CPU, or WAFL_ex can be active on 1 or more CPUs, but Kahuna and WAFL_ex cannot both be active at the same time.
    • Depending on the workload, it is possible for Kahuna to limit the amount of work that can be performed by WAFL_ex.
      • Note: This type of bottleneck is a simple variation on the previous condition.
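
As a rough illustration of the three conditions above, the sketch below reports each one from sampled utilization figures. Everything in it is hypothetical: the `find_bottlenecks` function, its input shape, and the 100% thresholds are assumptions made for illustration, and domain utilization is assumed to be expressed so that 100% equals one fully busy CPU core. It is not how ONTAP itself evaluates these conditions.

```python
def find_bottlenecks(avg_core_util, domains, exclusive_pairs=()):
    """Report the three CSMP bottleneck conditions from utilization samples.

    avg_core_util: average utilization across all cores, in percent.
    domains: dict of name -> (utilization_percent, concurrency), where
        100% utilization equals one fully busy core (an assumption).
    exclusive_pairs: (serial, parallel) domain-name pairs that cannot
        run at the same time, e.g. ("Kahuna", "WAFL_ex").
    """
    findings = []

    # Type 1: the average across all cores reaches 100%.
    if avg_core_util >= 100:
        findings.append("average CPU core utilization at 100%")

    # Type 2: a logical domain reaches its concurrency limit.
    for name, (util, concurrency) in domains.items():
        if util >= 100 * concurrency:
            findings.append(f"{name} at its concurrency limit")

    # Type 3: while the serial domain runs, the parallel one cannot, so
    # only the remaining share of wall-clock time is left for it.
    for serial, parallel in exclusive_pairs:
        serial_util = domains[serial][0]
        remaining = max(0, 100 - serial_util)
        findings.append(
            f"{serial} leaves {remaining}% of wall-clock time for {parallel}"
        )

    return findings
```

For example, with Kahuna sampled at 60% and listed as exclusive with WAFL_ex, the function reports that 40% of wall-clock time remains for WAFL_ex, which is the sense in which Kahuna can limit parallel WAFL work.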

Note: A bottleneck on a physical CPU core is not possible without reaching either a domain bottleneck or an average CPU bottleneck. Accordingly, monitoring physical CPU utilization as a direct measure is not effective.

Note: Beginning with Data ONTAP 8.2.1, the algorithm that calculates CPU utilization (cpu_busy) changed, and it now varies with the total number of CPU cores.

  • <= 20 CPU cores, cpu_busy returns the higher of the two values below:
    • Average CPU utilization of all the CPU cores (avg_processor_busy)

    • CPU utilization of the busiest domain that has a concurrency of 1

  • >= 36 CPU cores, cpu_busy returns the highest of the three values below. On platforms with 36 CPU cores or more, the CPU cores are split evenly into two partitions.

    • Average CPU utilization of the first partition (non-WAFL partition)

    • Average CPU utilization of the second partition (WAFL partition)

    • CPU utilization of the busiest domain that has a concurrency of 1
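
The selection logic above can be sketched as follows. This is a minimal illustration, not ONTAP code: the function name `cpu_busy`, its parameters, and the sample values are hypothetical, and it assumes the per-core list is ordered so that the first half is the non-WAFL partition. The article gives no rule for core counts between 20 and 36, so the sketch raises an error for that range rather than guess.

```python
from statistics import mean


def cpu_busy(core_util, busiest_serial_domain_util):
    """Return a cpu_busy-style percentage from utilization samples.

    core_util: per-core utilization percentages (0-100), one per core.
    busiest_serial_domain_util: utilization (0-100) of the busiest
        domain that has a concurrency of 1.
    """
    n = len(core_util)
    if n == 0:
        raise ValueError("core_util must not be empty")

    if n <= 20:
        # Higher of: average across all cores (avg_processor_busy) and
        # the busiest concurrency-1 domain.
        return max(mean(core_util), busiest_serial_domain_util)

    if n >= 36:
        # Assumption: the first half of the list is the non-WAFL
        # partition and the second half is the WAFL partition.
        half = n // 2
        non_wafl_avg = mean(core_util[:half])
        wafl_avg = mean(core_util[half:])
        return max(non_wafl_avg, wafl_avg, busiest_serial_domain_util)

    # The article does not describe core counts between 20 and 36.
    raise ValueError("no rule given for 21 to 35 CPU cores")
```

With four cores averaging 30% and a concurrency-1 domain at 85%, `cpu_busy([30, 40, 20, 30], 85)` returns 85. This shows how the reported value can sit well above the average core utilization when a single serial domain is the limiting factor.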

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.