What are the metrics used to analyze system performance of CPU?
Applies to
- ONTAP 9
- Data ONTAP 8 7-Mode
- Data ONTAP 7 and earlier
Answer
- CPU is one of the physical resource types available to Data ONTAP.
- When analyzing system performance, look at the system holistically.
- A general strategy for analyzing bottlenecks is to use both service metrics (protocol/volume/LUN latency/workload) and component metrics (CPU, disk I/O, network I/O).
- This provides a complete view of the system and reduces incorrect conclusions.
- Looking specifically at the CPU resource, work is classified into priorities:
- Some types of work are identified as background or non-essential/opportunistic:
- This means that when background work is using one or more CPU cores, it will effectively yield to higher priority work as the requests arrive.
- Also, as the system load increases, it is likely that processing optimizations will result in non-linear scaling for the measure of both the physical CPU core utilization and the logical CSMP domain utilization. This is normal in a complex compute system.
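The yielding behavior of background work described above can be illustrated with a toy model. This is a minimal sketch, not ONTAP's scheduler: the function name `allocate_cores`, its inputs, and the rule "foreground is always served first, background takes whatever cores remain" are all assumptions made for illustration.

```python
def allocate_cores(total_cores, foreground_demand, background_demand):
    """Toy model: foreground work is served first, background takes the rest.

    The names and the allocation rule are illustrative assumptions,
    not ONTAP's actual scheduling algorithm.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    if foreground_demand < 0 or background_demand < 0:
        raise ValueError("demands must be non-negative")
    # Higher-priority work gets cores first.
    foreground = min(foreground_demand, total_cores)
    # Background work only uses cores that would otherwise be idle.
    background = min(background_demand, total_cores - foreground)
    utilization = 100.0 * (foreground + background) / total_cores
    return foreground, background, utilization


# Light foreground load: background work fills the idle cores.
print(allocate_cores(8, 2, 10))  # (2, 6, 100.0)
# Heavier foreground load: background work yields, utilization is unchanged.
print(allocate_cores(8, 7, 10))  # (7, 1, 100.0)
```

In both cases the model reports 100% utilization, yet the share consumed by essential work differs widely, which is one reason a high CPU figure on its own does not indicate a bottleneck.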
CPU bottleneck types
The following three CPU bottleneck types are possible because of the CSMP model:
- Average CPU core utilization: The average measure of CPU core utilization for all cores reaches 100%.
- Logical domain bottleneck:
  - A logical domain reaches its concurrency limit.
  - For example, a logical domain with a concurrency of 1 CPU core reaches 100% utilization.
- Interactions between logical domains:
  - Some logical domains are mutually exclusive and cannot run concurrently with another correlated logical domain.
    - For example, WAFL_ex represents parallel WAFL processing while Kahuna represents serial WAFL processing.
    - These two logical domains are mutually exclusive: either Kahuna can be active on 1 CPU, or WAFL_ex can be active on 1 or more CPUs, but Kahuna and WAFL_ex cannot both be active at the same time.
  - Depending on the workload, it is possible for Kahuna to limit the amount of work that can be performed by WAFL_ex.
    - Note: This type of bottleneck is a simple variation on the previous condition.
Note: A bottleneck on a physical CPU core is not possible without either reaching a domain bottleneck or average CPU bottleneck. Accordingly, the monitoring of physical CPU utilization as a direct measure is not effective.
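The three bottleneck types can be expressed as simple checks. The following is a minimal sketch under stated assumptions: the function names, the input shapes, and the example concurrency of 4 for WAFL_ex are hypothetical, and the figures are invented for illustration rather than taken from a real system.

```python
from statistics import mean

SATURATED = 100.0  # percent; illustrative saturation threshold


def average_core_bottleneck(core_utils):
    """Type 1: the average utilization across all cores reaches 100%."""
    return mean(core_utils) >= SATURATED


def domain_bottlenecks(domain_utils):
    """Type 2: names of domains that have reached their concurrency limit.

    domain_utils maps a domain name to (busy_cores, concurrency), where
    busy_cores is the average number of cores the domain kept busy.
    """
    return [name for name, (busy, limit) in domain_utils.items()
            if busy >= limit]


def exclusive_ceiling(serial_busy_fraction, parallel_concurrency):
    """Type 3: cores left for a parallel domain that excludes a serial one.

    While the serial domain is active the parallel domain cannot run, so
    the parallel domain can average at most
    (1 - serial_busy_fraction) * parallel_concurrency cores.
    """
    if not 0.0 <= serial_busy_fraction <= 1.0:
        raise ValueError("serial_busy_fraction must be between 0 and 1")
    return (1.0 - serial_busy_fraction) * parallel_concurrency


cores = [60.0, 55.0, 70.0, 65.0]
domains = {"Kahuna": (1.0, 1), "WAFL_ex": (1.2, 4)}  # assumed values

print(average_core_bottleneck(cores))  # False: the average is 62.5%
print(domain_bottlenecks(domains))     # ['Kahuna']
print(exclusive_ceiling(0.5, 4))       # 2.0
```

In this invented example no physical core is near 100%, yet Kahuna is saturated, and with Kahuna active half of the time WAFL_ex can average no more than 2 of its 4 assumed cores. This mirrors the note above: per-core utilization alone does not reveal the limit.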
Note: Beginning with Data ONTAP 8.2.1, the algorithm for representing CPU utilization (cpu_busy) changed, and it now differs based on the total number of CPU cores.
- <= 20 CPU cores: cpu_busy returns the higher of the two values below:
  - Average CPU utilization of all the CPU cores (avg_processor_busy)
  - CPU utilization of the busiest domain that has a concurrency of 1
- >= 36 CPU cores: cpu_busy returns the highest of the three values below. On platforms with 36 or more CPU cores, the cores are separated evenly into two partitions.
  - Average CPU utilization of the first partition (non-WAFL partition)
  - Average CPU utilization of the second partition (WAFL partition)
  - CPU utilization of the busiest domain that has a concurrency of 1
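The selection rule for cpu_busy can be sketched as follows. This is an illustration of the rule as stated in the note, not ONTAP source code: the Python function and its parameters are hypothetical, the ordering of cores (first half non-WAFL, second half WAFL) is an assumption of the sketch, and core counts between 21 and 35 are rejected because the note does not describe them.

```python
from statistics import mean


def cpu_busy(core_utils, single_domain_utils):
    """Sketch of the cpu_busy selection rule (illustrative, not ONTAP code).

    core_utils: per-core utilization percentages. For 36 or more cores this
        sketch assumes the first half is the non-WAFL partition and the
        second half is the WAFL partition.
    single_domain_utils: utilization percentages of the domains that have
        a concurrency of 1.
    """
    cores = len(core_utils)
    if cores == 0:
        raise ValueError("at least one core is required")
    busiest_domain = max(single_domain_utils, default=0.0)

    if cores <= 20:
        # Higher of: average of all cores, busiest single-core domain.
        return max(mean(core_utils), busiest_domain)
    if cores >= 36:
        # Highest of: non-WAFL partition average, WAFL partition average,
        # busiest single-core domain.
        half = cores // 2
        return max(mean(core_utils[:half]),
                   mean(core_utils[half:]),
                   busiest_domain)
    raise ValueError("rule for 21 to 35 cores is not described in the note")


# 4 cores averaging 50%, busiest single-core domain at 85%.
print(cpu_busy([40.0, 50.0, 60.0, 50.0], [85.0]))    # 85.0
# 36 cores: non-WAFL partition at 30%, WAFL partition at 90%.
print(cpu_busy([30.0] * 18 + [90.0] * 18, [70.0]))   # 90.0
```

In the first example the reported value follows the saturated domain, not the core average, so a high cpu_busy reading can coexist with largely idle physical cores.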