High Latency on disks with no HDD errors
Applies to
- ONTAP 9
- FAS systems
- Cloud Volumes ONTAP systems with hard disk drives (HDD)
Issue
- Increased latency for NAS and SAN end users
- qos statistics volume latency show points to seconds of latency at the disk layer, but the disk latency histogram in statistics shows 8ms or less (see the qos sketch after this list)
cluster1::> qos statistics volume latency show -vserver vs0 -volume vs0_vol0
Workload            ID    Latency    Network    Cluster       Data       Disk    Qos Max    Qos Min      NVRAM      Cloud  FlexCache    SM Sync         VA
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
-total-              -   455.00us   158.00us        0ms   297.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms
vs0_vol0-wid1..  15658   109.00ms   155.00us        0ms   273.00us    108.2ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms
- High io_queued is seen in the disk object for only one disk or a few disks; in most cases it is a single disk
- There are no hardware errors for that disk in the event log, nor any other hardware issue on the shelf or stack that could explain I/O queuing on a single drive (see the event log sketch after this list)
- Failing the drive may move the I/O queuing to another disk
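The Disk column should dominate the latency breakdown consistently, not just in a single sample. A minimal sketch of watching the breakdown over time, reusing the vserver and volume names from the example above (the -iterations count is arbitrary):

cluster1::> qos statistics volume latency show -vserver vs0 -volume vs0_vol0 -iterations 60

If Disk stays in the tens or hundreds of milliseconds while Network and Cluster remain in microseconds, the delay is at the disk layer rather than in the network path.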
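The no-hardware-errors condition can be cross-checked from the clustershell. A minimal sketch, assuming the affected node is node1; message names and output fields vary by ONTAP release:

Cluster::> event log show -node node1 -severity ERROR
Cluster::> storage shelf show -errors

In this scenario, neither command reports anything relevant for the suspect disk, its shelf, or its stack.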
Example: The statistics command's disk object shows high queuing on one disk, and statit shows that same disk, 0d.23.13, far busier (95% utilization) than the other disks in its RAID group
Cluster::> set -privilege diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y

Cluster::*> statistics start -object disk -counter io_queued
Statistics collection is being started for sample-id: sample_148

Cluster::*> statistics show -filter "io_queued>100"

Object: disk
Instance: 0d.23.13
Start-time: 12/5/2022 16:48:26
End-time: 12/5/2022 16:51:58
Elapsed-time: 212s
Scope: node1
Number of Constituents: 1 (complete_aggregation)

    Counter                                                     Value
    -------------------------------- --------------------------------
    io_queued                                                     818
1 entry was displayed.

Cluster::*> node run -node node1 -command statit -b

Note: After about 30 seconds of collection, the disk with the observed queuing (0d.23.13) stands out as the hot disk in the statit output:
Cluster::*> node run -node node1 -command statit -e
...
disk             ut%   xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/aggr1_node1/plex0/rg2:
0d.23.18          25   69.19    0.01  2.00  2033   31.50 60.67   175   37.69 51.19   169    0.00  ....     .    0.00  ....     .
0a.21.16          24   69.77    0.01  2.00  6964   32.11 59.56   194   37.66 51.18    90    0.00  ....     .    0.00  ....     .
0d.22.17          57  231.26  133.56  5.45  2075   26.91 29.41   573   70.79  9.87   641    0.00  ....     .    0.00  ....     .
0d.23.22          57  230.68  132.96  5.46  1845   26.83 29.56   646   70.90  9.74   604    0.00  ....     .    0.00  ....     .
0d.23.13          95  295.63  198.16  4.10  5472   26.83 29.76  1371   70.63  9.91  1975    0.01  ....     .    0.00  ....     .
0d.22.18          57  231.26  133.55  5.38  2080   26.84 29.60   561   70.86  9.73   634    0.00  ....     .    0.00  ....     .
0a.20.18          57  231.69  133.54  5.42  1846   27.00 29.46   647   71.15  9.78   608    0.00  ....     .    0.00  ....     .
0a.20.16          57  233.00  134.49  5.48  1879   27.08 30.09   634   71.43  9.83   594    0.00  ....     .    0.00  ....     .
0d.22.19          57  231.98  134.18  5.41  2099   26.87 29.67   567   70.93  9.87   646    0.00  ....     .    0.00  ....     .

Cluster::*> set admin
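Once the queuing data has been reviewed, the manually started sample can be stopped so it does not continue collecting in the background. A minimal sketch, assuming the sample-id reported by statistics start in the example above:

Cluster::*> statistics stop -sample-id sample_148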