Why does the volume latency or IOPS not match the aggregate in Active IQ Unified Manager or ONTAP?
Applies to
- OnCommand Unified Manager (OCUM)
- Active IQ Unified Manager (AIQUM)
- ONTAP 9
Answer
- ONTAP decouples frontend (volume) IOPS from backend (disk/aggregate) IOPS because of optimizations and background workloads
- Backend disk/aggregate IOPS should not be used as a metric for monitoring performance unless Performance Capacity reaches 100% on the aggregate in Active IQ Unified Manager or disk latency is seen on user workloads
How do background workloads impact disk IOPS?
- Some background operations run on disk but are not counted as part of the volume (frontend) IOPS, and they may elevate disk IOPS/transfers
- These include things such as:
- WAFL scanners
- Deduplication (inline or scheduled)
- Anything snapshot related
- Tiering/FabricPool
- Examples:
- The tiering scanner may generate 60,000 aggregate IOPS while the busiest volume has 2,000 IOPS
- A DR or backup filer with minimal frontend work will use all available CPU and disk I/O bandwidth to process SnapMirror/backup workloads as quickly as possible
- These background workloads yield to frontend work as the workload from clients increases
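As a rough illustration of the tiering scanner example, the gap between aggregate IOPS and the sum of volume IOPS can be computed as below. This is only a sketch: the function name `unaccounted_iops` and the two smaller volume values are hypothetical, and the subtraction is a simplification, because frontend operations do not map one-to-one to disk operations.

```python
def unaccounted_iops(aggregate_iops, volume_iops):
    """Return (gap, fraction) of aggregate IOPS not explained by volume IOPS.

    Hypothetical helper for illustration only; ONTAP does not expose
    background work through this calculation.
    """
    frontend_total = sum(volume_iops)
    # Clamp at zero: caching can make the frontend total exceed disk IOPS.
    gap = max(aggregate_iops - frontend_total, 0)
    fraction = gap / aggregate_iops if aggregate_iops else 0.0
    return gap, fraction


# 60,000 and 2,000 come from the example above; 500 and 250 are made up.
gap, fraction = unaccounted_iops(60_000, [2_000, 500, 250])
print(f"{gap} IOPS ({fraction:.1%}) not explained by frontend work")
# prints: 57250 IOPS (95.4%) not explained by frontend work
```

A large gap like this on an otherwise idle system points to background work rather than to a client-facing performance problem.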
What other factors may affect aggregates having more or less IOPS than all the volumes added up?
- Reads are prefetched through ONTAP's readahead engine
- Readahead reduces latency because it has been optimized over many years and accurately predicts which data will be needed
- Because of prefetching, the data is already in cache (RAM) when the read IOP arrives over the network
- Reads are also cached in RAM, and may be cached using Flash Cache or Flash Pool technology, which has lower latency than disk
- Writes are cached in RAM until written asynchronously to disk in a consistency point, delivering low latency on writes
- Other IOPS may not require going to disk as metadata structures are also cached in RAM as needed
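The factors above can be sketched as a toy model that shows why backend IOPS can be lower or higher than the frontend total. Everything in it is an assumption made for illustration: the function name, the `cache_hit_ratio` and `write_coalescing_factor` parameters, and the sample numbers are hypothetical, and ONTAP does not compute disk IOPS this way.

```python
def estimate_backend_iops(read_iops, write_iops, cache_hit_ratio,
                          write_coalescing_factor, background_iops=0.0):
    """Toy estimate of disk IOPS for a given frontend workload.

    cache_hit_ratio: fraction of reads served from RAM or flash cache (0..1).
    write_coalescing_factor: assumed number of frontend writes folded into
        one disk write at a consistency point (>= 1).
    background_iops: disk IOPS from scanners, tiering, SnapMirror, etc.
    """
    disk_reads = read_iops * (1 - cache_hit_ratio)        # cache misses only
    disk_writes = write_iops / write_coalescing_factor    # batched in a CP
    return disk_reads + disk_writes + background_iops


frontend = 8_000 + 2_000  # hypothetical frontend reads + writes

# Caching and write batching alone: backend is well below frontend.
quiet = estimate_backend_iops(8_000, 2_000, 0.75, 4)

# Same frontend load with a busy background scanner: backend exceeds it.
busy = estimate_backend_iops(8_000, 2_000, 0.75, 4, background_iops=60_000)

print(frontend, quiet, busy)  # prints: 10000 2500.0 62500.0
```

In both cases the frontend workload is identical, which is why the aggregate figure alone says little about what clients experience.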
Additional Information
- Why Aggregate Latency graph in UM, shows constantly higher latency for one aggregate?
- This article also shows that AIQUM calculates a weighted latency, so the value does not match what statit may show for latency
- Example: The first volume listed on the left has a latency of 0.569 ms/op, while aggregate average latency is approximately 10 ms
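To illustrate how a weighted latency can differ from a single volume's figure, here is a minimal sketch that assumes the weighting is by operation count. The exact weighting AIQUM uses is not stated here, so treat this as an assumption. The function name `weighted_latency` and the second sample (12 ms at 900 ops) are hypothetical, and only the 0.569 ms/op value comes from the example above.

```python
def weighted_latency(samples):
    """Ops-weighted mean latency; samples are (latency_ms, ops) pairs.

    Assumption: weighting by operation count, for illustration only.
    """
    total_ops = sum(ops for _, ops in samples)
    if total_ops == 0:
        return 0.0
    return sum(latency * ops for latency, ops in samples) / total_ops


# (latency in ms/op, ops). The second entry is made up.
samples = [(0.569, 100), (12.0, 900)]

simple_mean = sum(latency for latency, _ in samples) / len(samples)
weighted = weighted_latency(samples)

print(round(simple_mean, 3), round(weighted, 3))  # prints: 6.284 10.857
```

Under this assumption, a low-latency workload with few operations barely moves the weighted figure, so the aggregate value can sit far above the latency of any one quiet volume.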