High Read or Write Latency due to disk bottleneck from user workload
Applies to
- ONTAP 9
- FAS systems with HDDs
Issue
- High latency for READ and WRITE operations is seen on volumes that reside on SATA or SAS disk aggregates whose disks are highly utilized (100%)
- Backups are slow because of high disk utilization
- Retrieval of data from the capacity tier to the performance tier can be slow
- Users see slower copies on one cluster than on another when using the Linux cp command
- Use statit to identify whether there is high disk utilization or latency in the aggregate that contains the volume. In the following output, every disk in rg0 except 0a.11.1 and 0a.10.1 shows 100% ut%:

    disk             ut%   xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
    /data_aggr1/plex0/rg0:
    0a.11.1           10   23.42   0.00   ....      .  11.21  61.56    133  12.21  57.15     50   0.00   ....      .   0.00   ....      .
    0a.10.1           11   23.80   0.00   ....      .  11.66  59.25    147  12.14  57.37     55   0.00   ....      .   0.00   ....      .
    1a.12.1          100  158.69 120.29   5.43   2299  10.26  34.39    553  28.15   9.93    839   0.00   ....      .   0.00   ....      .
    0a.11.2          100  156.09 118.85   5.39   2253   9.96  34.97    553  27.28  10.05    794   0.00   ....      .   0.00   ....      .
    0a.10.2          100  158.41 121.44   5.35   2337   9.95  35.11    542  27.01   9.93    835   0.00   ....      .   0.00   ....      .
    0a.12.2          100  153.05 115.90   5.59   2158   9.88  35.00    524  27.27   9.97    803   0.00   ....      .   0.00   ....      .
    0a.11.3          100  162.10 124.59   5.45   2262  10.05  35.10    528  27.45  10.10    809   0.00   ....      .   0.00   ....      .
    0a.10.3          100  158.72 121.88   5.39   2298   9.82  34.82    573  27.02  10.05    838   0.00   ....      .   0.00   ....      .
    1a.12.3          100  157.60 120.63   5.49   2274   9.85  34.90    571  27.11   9.82    875   0.00   ....      .   0.00   ....      .
    0a.11.4          100  157.67 120.74   5.39   2328   9.96  35.29    568  26.97  10.10    857   0.00   ....      .   0.00   ....      .
    0a.10.4          100  156.72 119.29   5.64   2180   9.96  34.65    556  27.47   9.79    825   0.00   ....      .   0.00   ....      .
- Clients experience performance degradation
- Active IQ Unified Manager (AIQUM), when monitoring the cluster, raises a high-latency alert
- Example:
  Latency value of XX.X ms/op on volname has triggered a WARNING event based on threshold setting of 20.0 ms/op.
- Example:
  - The read latency is 25 to 40 ms
  - Disk utilization is high according to sysstat -x (columns removed for readability):

        CPU  ...  HDD
                  util
        10%  ...   90%
         9%  ...  100%
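The statit check, finding which disks in a RAID group sit at or near 100% ut%, can be scripted against saved output. The Python sketch below is a minimal, hypothetical parser, not a NetApp tool: it assumes the whitespace-separated statit disk layout shown above (disk name, ut%, xfers, then the per-operation columns, with RAID group headings such as "/data_aggr1/plex0/rg0:"). The names parse_statit_disks, busy_disks and DiskStat, and the 90% cut-off, are illustrative assumptions rather than recommended values.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative cut-off (assumption, not a NetApp value): flag disks at or above this ut%.
BUSY_THRESHOLD = 90


@dataclass
class DiskStat:
    raid_group: str  # RAID group heading, e.g. "/data_aggr1/plex0/rg0"
    disk: str        # disk name, e.g. "0a.11.2"
    ut_pct: int      # value of the ut% column
    xfers: float     # value of the xfers column


def parse_statit_disks(text: str) -> list[DiskStat]:
    """Collect per-disk rows from statit disk statistics (hypothetical helper)."""
    stats: list[DiskStat] = []
    raid_group = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # RAID group headings look like "/aggr/plexN/rgN:".
        if fields[0].startswith("/") and fields[0].endswith(":"):
            raid_group = fields[0].rstrip(":")
            continue
        # Disk rows carry an integer ut% in the second field; the header
        # row ("disk ut% xfers ...") fails this test and is skipped.
        if len(fields) >= 3 and fields[1].isdigit():
            try:
                xfers = float(fields[2])
            except ValueError:
                continue
            stats.append(DiskStat(raid_group, fields[0], int(fields[1]), xfers))
    return stats


def busy_disks(stats: list[DiskStat], threshold: int = BUSY_THRESHOLD) -> list[str]:
    """Return the names of disks whose ut% is at or above the threshold."""
    return [s.disk for s in stats if s.ut_pct >= threshold]


# Excerpt of the statit output from this article.
SAMPLE = """\
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/data_aggr1/plex0/rg0:
0a.11.1 10 23.42 0.00 .... . 11.21 61.56 133 12.21 57.15 50 0.00 .... . 0.00 .... .
0a.10.1 11 23.80 0.00 .... . 11.66 59.25 147 12.14 57.37 55 0.00 .... . 0.00 .... .
1a.12.1 100 158.69 120.29 5.43 2299 10.26 34.39 553 28.15 9.93 839 0.00 .... . 0.00 .... .
0a.11.2 100 156.09 118.85 5.39 2253 9.96 34.97 553 27.28 10.05 794 0.00 .... . 0.00 .... .
"""

if __name__ == "__main__":
    stats = parse_statit_disks(SAMPLE)
    print(busy_disks(stats))  # prints ['1a.12.1', '0a.11.2']
```

Run against a saved copy of the full statit output, the same helpers list every saturated disk across all RAID groups, which makes it easy to see whether the saturation is limited to one disk or spread across the aggregate.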
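For sysstat -x, one way to summarize disk pressure is to check how many sampled intervals report high HDD utilization rather than judging from a single line. The Python sketch below is a hypothetical helper that assumes the HDD util value is the last field of each data row, as in the trimmed output above; for untrimmed sysstat -x output, the illustrative util_col parameter would need to point at the actual util column. The 90% threshold is an assumption chosen for illustration.

```python
from __future__ import annotations


def hdd_util_samples(lines: list[str], util_col: int = -1) -> list[int]:
    """Extract HDD util percentages from sysstat -x rows (hypothetical helper).

    Assumes the util value sits in field `util_col` (default: last field) and
    ends in '%'; header rows such as "CPU ... HDD" or "util" are skipped.
    """
    values: list[int] = []
    for line in lines:
        fields = line.split()
        try:
            field = fields[util_col]
        except IndexError:
            continue  # blank line or row with too few fields
        if field.endswith("%") and field[:-1].isdigit():
            values.append(int(field[:-1]))
    return values


def fraction_at_or_above(values: list[int], threshold: int = 90) -> float:
    """Share of sampled intervals whose HDD util is at or above the threshold."""
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


# Trimmed sysstat -x rows from this article.
ROWS = [
    "CPU ... HDD",
    "        util",
    "10% ... 90%",
    " 9% ... 100%",
]

if __name__ == "__main__":
    samples = hdd_util_samples(ROWS)
    print(samples)                        # prints [90, 100]
    print(fraction_at_or_above(samples))  # prints 1.0
```

Counting intervals at or above the threshold is a deliberately simple measure; it ignores how long each interval lasted, so it assumes sysstat was run with a fixed sampling interval.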