- Data ONTAP 8.X
- ONTAP 9.X
- Throughput - The rate at which data is actually transmitted over a communication channel; often used interchangeably with, or confused with, Bandwidth
- Bandwidth - The maximum rate at which data can be transmitted over a communication channel; often used interchangeably with, or confused with, Throughput
- Latency - The total time from when an input or command is issued until the response is received
- seconds (sec)
- milliseconds (ms)
- microseconds (us)
- Utilization - A measurement of the fraction of a sample period during which a given resource was busy; utilization is a useful performance metric, but for Data ONTAP it should not be the primary one
- Bottleneck - The point of congestion in a computing system that limits performance; an environment may contain more than one bottleneck
- NetApp Technical Support addresses first the bottleneck contributing the most to overall latency
- Concurrency - Measurement of the parallelism of workload in a computing system
- The more parallelism there is in a workload, the more simultaneous operations are “in flight” at any point in time
- This allows the system to process work more efficiently and complete more operations in less time, even with the same per-operation latency as a low-concurrency workload
- Little’s Law describes the relationship between throughput, latency, and concurrency in a steady state. Though it looks intuitively simple, it is quite a remarkable result:
Throughput = Concurrency / Latency
- Latency is controlled by Data ONTAP
- Concurrency is controlled by the clients/applications
- To achieve the best throughput, consider lowering the latency and/or increasing the concurrency
Assume a request that takes 1 millisecond (ms) to complete. An application using one thread with one outstanding read or write operation should achieve 1000 IOPS (1 second, or 1000 ms, / 1 ms per request). Theoretically, if the thread count is doubled, the application should be able to achieve 2000 IOPS. If the outstanding asynchronous read or write operations for each thread are also doubled, the application should be able to achieve 4000 IOPS. In practice, request rates do not always scale so linearly, due to overhead in the client from task scheduling, context switching, and so forth.
Note: This example shows how to optimize throughput by increasing concurrency from the client side, assuming that 1 ms latency is already good enough and there is no room for further improvement from a latency perspective.
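The arithmetic in the example above can be sketched directly from Little's Law. This is a hypothetical back-of-the-envelope calculator, not a measurement tool; the function name is an assumption for illustration:

```python
# Little's Law in steady state: Throughput = Concurrency / Latency

def throughput_iops(concurrency, latency_ms):
    """Estimate achievable IOPS given outstanding ops and per-op latency (ms)."""
    return concurrency / (latency_ms / 1000.0)  # convert ms to seconds

# Worked example: 1 ms per request
print(throughput_iops(1, 1.0))  # 1 thread, 1 outstanding op  -> 1000.0 IOPS
print(throughput_iops(2, 1.0))  # thread count doubled        -> 2000.0 IOPS
print(throughput_iops(4, 1.0))  # 2 outstanding ops/thread    -> 4000.0 IOPS
```

Note how the model predicts perfectly linear scaling; real clients fall short of this because of the scheduling and context-switching overhead mentioned above.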
- Randomness - Refers to a workload that is performed in an unpredictable sequence, with no discernible order or pattern
- Sequentiality - Refers to a workload that is performed in a predetermined, ordered sequence. Many patterns can be detected: forward, backward, skip counts, etc.
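To illustrate the patterns mentioned above, here is a minimal sketch that classifies a list of I/O offsets as forward, backward, skip-count sequential, or random. The function name and the constant-stride heuristic are assumptions for illustration, not the detection logic ONTAP actually uses:

```python
def classify_pattern(offsets):
    """Classify a list of I/O offsets as sequential (constant stride) or random."""
    if len(offsets) < 3:
        return "unknown"
    # Compute the stride between each consecutive pair of offsets
    strides = [b - a for a, b in zip(offsets, offsets[1:])]
    if all(s == strides[0] for s in strides):
        if strides[0] == 1:
            return "forward sequential"
        if strides[0] == -1:
            return "backward sequential"
        return f"skip sequential (stride {strides[0]})"
    return "random"

print(classify_pattern([0, 1, 2, 3, 4]))  # forward sequential
print(classify_pattern([10, 9, 8, 7]))    # backward sequential
print(classify_pattern([0, 4, 8, 12]))    # skip sequential (stride 4)
print(classify_pattern([5, 42, 7, 19]))   # random
```

Sequential patterns matter because a storage system that detects them can prefetch (read ahead) the next blocks, which is impossible for a truly random workload.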