High CPU due to user workload causing various issues
Applies to
ONTAP 9
Issue
- CPU utilization is near 100%.
- High read/write latency is reported on volumes, attributed to the D-blade (Data) component while the CPU is saturated.
- The EMS log reports the wafl.cp.toolong error event.
- Applications or jobs behave inconsistently or take longer than usual.
- An Active IQ Unified Manager alert may also be raised, for example:
High CPU utilization Error: cluster1:kernel:node1 on cluster1 is reporting high CPU utilization of 91.1024 %, placing the node into warn state
- Workload cannot be reduced.
Example: node1 has high CPU utilization due to user workload, while the other nodes in the cluster are idle or barely utilized, as seen in the output of the nodeshell command sysstat -x 1.
Note: Some columns have been removed from the output below to improve readability.
Cluster::> node run node1 sysstat -x 1
 CPU    NFS  CIFS  HTTP  Total      Net kB/s        Disk kB/s
                                    in    out     read    write
 97%  22453     0     0  22463  1491948   8098   664188  2631848
 91%  22448     0     0  22478  1492337   8121   607184   658216
 94%  22478     0     0  22509  1492134   8106    78844   101992
 96%  22453     0     0  23134  1492587   8108   810668  2736420

Cluster::> qos statistics volume latency show
Workload            ID    Latency    Network    Cluster       Data       Disk       QoS     NVRAM
--------------- ------ ---------- ---------- ---------- ---------- ---------- --------- ---------
-total-              -   136.49ms    99.00us    70.00us   136.17ms   153.00us       0ms       0ms
vserver1_vol1..   4201   206.05ms   130.00us        0ms   205.88ms    44.00us       0ms       0ms
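As a rough aid for reading the example output, the following Python sketch (an illustration only, not a NetApp tool) averages the CPU column of the pasted sysstat -x 1 samples and reports which latency component dominates each workload in the QoS output. It assumes the exact column layout shown in this article: CPU as the first sysstat field, and the QoS latency components in the order Network, Cluster, Data, Disk, QoS, NVRAM. The function names are hypothetical.

```python
# Illustrative sketch only: summarizes pasted ONTAP CLI output.
# Assumes the reduced column layout shown in this article; not a NetApp tool.

SYSSTAT = """\
 97%  22453     0     0  22463  1491948   8098   664188  2631848
 91%  22448     0     0  22478  1492337   8121   607184   658216
 94%  22478     0     0  22509  1492134   8106    78844   101992
 96%  22453     0     0  23134  1492587   8108   810668  2736420
"""

QOS = """\
-total-              -   136.49ms    99.00us    70.00us   136.17ms   153.00us       0ms       0ms
vserver1_vol1..   4201   206.05ms   130.00us        0ms   205.88ms    44.00us       0ms       0ms
"""

# Assumed order of the per-component latency columns after Workload, ID, Latency.
COMPONENTS = ["Network", "Cluster", "Data", "Disk", "QoS", "NVRAM"]


def to_ms(value: str) -> float:
    """Convert a latency string such as '136.49ms' or '99.00us' to milliseconds."""
    if value.endswith("ms"):
        return float(value[:-2])
    if value.endswith("us"):
        return float(value[:-2]) / 1000.0
    raise ValueError(f"unexpected latency unit: {value}")


def average_cpu(sysstat_text: str) -> float:
    """Average the CPU column (first field, e.g. '97%') over all sample rows."""
    samples = [
        float(line.split()[0].rstrip("%"))
        for line in sysstat_text.splitlines()
        if line.strip()
    ]
    return sum(samples) / len(samples)


def dominant_component(qos_text: str) -> dict:
    """Map each workload to the latency component that contributes the most time."""
    result = {}
    for line in qos_text.splitlines():
        fields = line.split()
        if len(fields) != 3 + len(COMPONENTS):
            continue  # skip headers, separators, or unexpected rows
        workload = fields[0]
        values = [to_ms(v) for v in fields[3:]]
        result[workload] = COMPONENTS[values.index(max(values))]
    return result


if __name__ == "__main__":
    print(f"average CPU: {average_cpu(SYSSTAT):.1f}%")
    for workload, component in dominant_component(QOS).items():
        print(f"{workload}: most latency in {component}")
```

For the sample data, the average CPU is 94.5%, and the Data (D-blade) component accounts for nearly all of the latency of both -total- and vserver1_vol1.., which matches the symptom pattern described in this article.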