Bully write workload causing performance issues in a MetroCluster environment
Applies to
- ONTAP 9
- MetroCluster (MCC) environment
Issue
- All volumes on the affected node report high latency.
- QoS statistics show latency in both the Data and NVRAM centers.
cluster::> qos statistics volume latency show
Workload     ID     Latency    Network    Cluster    Data       Disk       QoS        NVRAM      Cloud
------------ ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
volume_1     17882  10ms       0.01ms     0ms        4.5ms      1ms        0ms        3.4ms      0ms
volume_2     5232   12ms       0.02ms     0ms        5.05ms     1ms        0ms        5.90ms     0ms
volume_3     17160  14ms       0.05ms     0ms        4.25ms     1ms        0ms        8.75ms     0ms
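As an illustration, a minimal Python sketch (hypothetical helper code, not an ONTAP tool) can parse rows like the sample output above and flag workloads whose NVRAM latency makes up a large share of total latency, which points at remote NVLOG mirroring as the bottleneck:

```python
# Hypothetical sketch: parse sample "qos statistics volume latency show"
# rows and flag workloads whose NVRAM latency dominates total latency.
# Column order follows the sample output above.

SAMPLE = """\
volume_1 17882 10ms 0.01ms 0ms 4.5ms 1ms 0ms 3.4ms 0ms
volume_2 5232 12ms 0.02ms 0ms 5.05ms 1ms 0ms 5.90ms 0ms
volume_3 17160 14ms 0.05ms 0ms 4.25ms 1ms 0ms 8.75ms 0ms
"""

def ms(field: str) -> float:
    """Convert a '3.4ms'-style field to a float of milliseconds."""
    return float(field.rstrip("ms"))

def nvram_heavy(text: str, threshold: float = 0.4):
    """Return (workload, NVRAM share) pairs above the given share of total latency."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        workload, latency, nvram = fields[0], ms(fields[2]), ms(fields[8])
        if latency and nvram / latency > threshold:
            hits.append((workload, round(nvram / latency, 2)))
    return hits

print(nvram_heavy(SAMPLE))  # volume_2 and volume_3 exceed a 40% NVRAM share
```

With the sample data, volume_2 (5.90ms of 12ms) and volume_3 (8.75ms of 14ms) are flagged; the 40% threshold is an arbitrary illustrative cutoff.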
- CPU utilization has increased (approaching 100%) as the workload on a single volume increases.
- CPU utilization and top talkers can be observed with the sysstat and qos statistics commands.
Cluster::> node run node1 sysstat -x 1
CPU    NFS     CIFS  HTTP  Total   Net kB/s          Disk kB/s
                                   in       out      read     write
 79%   22453   0     0     22463   491948   8098     64188    631848
 92%   122448  0     0     122448  1492337  8121     07184    1158216 <<<
 95%   122578  0     0     122578  1492134  8106     78844    1501992
100%   123453  0     0     123453  1492587  8108     10668    1736420
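The CPU column above can be checked for sustained saturation rather than a transient spike. A minimal sketch (hypothetical parsing code, not an ONTAP tool) counting consecutive high-CPU intervals from the sample output:

```python
# Hypothetical sketch: scan the CPU column of "sysstat -x 1" output and
# report the longest run of consecutive intervals at or above a threshold,
# to distinguish sustained saturation from a one-off spike.

SYSSTAT = """\
79% 22453 0 0 22463 491948 8098 64188 631848
92% 122448 0 0 122448 1492337 8121 07184 1158216
95% 122578 0 0 122578 1492134 8106 78844 1501992
100% 123453 0 0 123453 1492587 8108 10668 1736420
"""

def sustained_high_cpu(text: str, threshold: int = 90) -> int:
    """Count the longest run of intervals with CPU >= threshold percent."""
    run = best = 0
    for line in text.splitlines():
        cpu = int(line.split()[0].rstrip("%"))
        run = run + 1 if cpu >= threshold else 0
        best = max(best, run)
    return best

print(sustained_high_cpu(SYSSTAT))  # 3 consecutive intervals at >= 90%
```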
Cluster1::> qos statistics workload resource cpu show -node node1 -iterations 100 -rows 3
Workload          ID     CPU
----------------- ------ ------
vs0-wid-102       102    60%
file-bigvmdk-..   121    2%
vs2_vol0-wid-..   212    2%
vs0-wid-102       102    5%
file-bigvmdk-..   121    2%
vs2_vol0-wid-..   212    1%
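To identify the bully from output like the above, the heaviest CPU consumer can be picked out programmatically. A minimal sketch (hypothetical helper code, not an ONTAP tool) over the sample rows:

```python
# Hypothetical sketch: pick the top CPU consumer from
# "qos statistics workload resource cpu show" rows to identify the
# bully workload. Row layout follows the sample output above.

ROWS = """\
vs0-wid-102 102 60%
file-bigvmdk-.. 121 2%
vs2_vol0-wid-.. 212 2%
"""

def top_cpu_workload(text: str):
    """Return (workload name, CPU percent) of the heaviest CPU consumer."""
    best = max(text.splitlines(),
               key=lambda line: int(line.split()[-1].rstrip("%")))
    fields = best.split()
    return fields[0], int(fields[-1].rstrip("%"))

print(top_cpu_workload(ROWS))  # ('vs0-wid-102', 60)
```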
- The majority of the workload on the bully volume consists of other and write operations, which must be logged to the remote NVRAM on the mirrored cluster over the ISL links.
- The increased ISL utilization leads to buffer-to-buffer credit starvation.