Why is a workload's latency high when the IOPS are low?

ONTAP responds to requests as they come in, and a workload with few requests can appear to have higher latency while actually responding perfectly fine
Low-IOPS workloads (e.g., 5 IOPS and 32 kB/s) will:
- Not be in RAM cache, so they will need to go to disk more often
- Not have a large sample size, so their latency figures are statistically insignificant (more in Additional Information)
- Not have enough samples to average out any outliers
To put this another way: high reported latency on a low-IOPS workload is not a problem in the absence of other symptoms (errors, application not responding, network issues, etc.)
Low IOPS typically means below 500-600 IOPS, but this can vary. Reported latency can reach the range of seconds or tens of seconds due to the latency averaging skew
Increasing the workload on the volume with low IOPS can further help determine whether latency skew is the reason the latency shows an inflated number: if the reported average drops as the IOPS rise, skew is the likely cause
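
The averaging skew can be illustrated with a short Python sketch. It is not an ONTAP tool or API; the function name and the 1 ms / 100 ms values are assumptions chosen for illustration. It shows how a single slow operation inflates the mean less and less as the number of operations in the sampling period grows:

```python
def mean_latency_ms(op_count, normal_ms=1.0, outlier_ms=100.0):
    """Mean latency for op_count ops where exactly one op is a slow outlier.

    Hypothetical helper for illustration only: every other op is assumed
    to complete in normal_ms.
    """
    total = (op_count - 1) * normal_ms + outlier_ms
    return total / op_count


# The same single 100 ms outlier, averaged over more and more operations.
for ops in (3, 20, 600):
    print(f"{ops:>4} ops -> mean {mean_latency_ms(ops):.2f} ms")
```

With only a few operations the outlier dominates the average. With several hundred it barely registers, which is why raising the workload on the volume tends to bring the reported latency back toward its typical value.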

Definitions:
- mean: average, or the sum of all instance values divided by number of instances
- median: the instance value in the middle when values are ordered from smallest to largest
- mode: the instance value occurring most often
In statistics, the mean, median, and mode are used together to describe the typical value of a data set; comparing them shows whether the average is being skewed by outliers

Example 1: Latency observed across 3 instances in a period (say 3 ops in a minute): 1 ms, 100 ms, 1 ms

mean: (1 + 100 + 1) / 3 = 34 ms
median: 1 ms
mode: 1 ms
ONTAP will often report average (mean) latency, but in this case, the median and mode show that latency is actually very good for most operations

Example 2: Latency observed across 20 instances (7 ops/second): 1 ms, 1 ms, 1 ms, 1 ms, 100 ms, 1 ms, 1 ms ... 1 ms (19 @ 1 ms, 1 @ 100 ms)

mean: (19 + 100) / 20 = 5.95 ms
median: 1 ms
mode: 1 ms
In this case, the average latency is more accurate than in the prior example because there is enough data to have better confidence in the numbers
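
Both examples can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the variable names are illustrative, and the position of the 100 ms sample in the second list is arbitrary because ordering does not affect any of the three measures:

```python
from statistics import mean, median, mode

# Latency samples in milliseconds, taken from the two examples above.
example_1 = [1, 100, 1]        # 3 ops, one 100 ms outlier
example_2 = [1] * 19 + [100]   # 20 ops, one 100 ms outlier

for name, samples in (("Example 1", example_1), ("Example 2", example_2)):
    print(
        f"{name}: mean={mean(samples):.2f} ms, "
        f"median={median(samples)} ms, mode={mode(samples)} ms"
    )
```

In both cases the median and mode stay at 1 ms, while the mean falls from 34 ms to 5.95 ms once the same outlier is averaged over more samples.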