NetApp Knowledgebase

Why is the reported iSCSI LUN latency higher than the volume latency?

Applies to

FAS Systems

Answer

It is often observed that the latency measured for iSCSI (and FCP) is significantly higher than that for the underlying volume, and the operation count at the volume level is higher than that measured on the contained LUNs.

For example, with a client issuing 256KB reads, measure the latency and operation count for the volume and its contained LUN:

fas01*> stats show -r -n 1 -i 5 volume:demo2:read_latency volume:demo2:read_ops lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops
volume:demo2:read_latency:10258.47us
volume:demo2:read_ops:83/s
lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency:45.13ms
lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops:20/s

Here, LUN latency = 45ms while volume latency = 10ms.

There are two main reasons for this:

  1. Volume operations are limited to 64KB.
    • Client operations larger than 64KB are broken up into multiple volume operations, which are executed serially.
    • Operation count for the volume will be correspondingly higher.
    • Expect blockLatency ~= volumeLatency * ROUNDUP(OperationSize / 64KB)
  2. Data transfers for a single operation might require multiple SCSI PDUs (Protocol Data Units)
    • Each iSCSI session or FCP login will negotiate settings for the session, specifying the amount of data that can be sent in a PDU, the size of subsequent data transfers for the request, the number of concurrent data transfers for a request, and so on.
    • For example, iSCSI session parameters (output edited to show only the relevant information):

      fas01*> iscsi session show -v
      Session 5
      Session Parameters
      ImmediateData=Yes
      FirstBurstLength=65536
      MaxBurstLength=65536

      Initiator MaxRecvDataSegmentLength=65536
      Target MaxRecvDataSegmentLength=65536

    • 7-Mode iSCSI will always limit the burst and segment length to 64KB. This might be further reduced by the initiator/client configuration.

    • iSCSI Latency is measured from when the first PDU of the command is fully received, until the time the last PDU of the response is sent to the output queue.

    • If the data path between the client and storage is slow (for example, due to congestion, low bandwidth elements, packet loss, and so on), then this latency will be reflected in the latency of larger (typically > 64KB) operations.

    • LUN Latency ~= VolumeLatency * ROUNDUP(OperationSize / 64KB) + NetworkRoundTripTime * (ROUNDUP(OperationSize / SegmentLength) - 1)
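The relationship in the last bullet above can be sketched in a few lines of Python. This is only an illustration of the approximation, not a NetApp tool: the function name, its parameters, and the sample values (0.05ms volume latency, 100ms round-trip time) are assumptions chosen for the example.

```python
import math

VOLUME_OP_LIMIT_KB = 64  # per-operation volume limit stated in this article


def estimate_lun_latency_ms(op_size_kb, volume_latency_ms, network_rtt_ms,
                            segment_length_kb=64):
    """Approximate LUN latency (ms) using the formula from this article.

    Hypothetical helper: all names and defaults are illustrative.
    """
    # Number of serial volume operations needed for one client operation.
    volume_ops = math.ceil(op_size_kb / VOLUME_OP_LIMIT_KB)
    # Every data segment after the first adds one network round trip.
    extra_round_trips = math.ceil(op_size_kb / segment_length_kb) - 1
    return volume_latency_ms * volume_ops + network_rtt_ms * extra_round_trips


if __name__ == "__main__":
    # Illustrative values only: 0.05ms volume latency, 100ms round-trip time.
    for size_kb in (64, 128, 256, 1024):
        est = estimate_lun_latency_ms(size_kb, volume_latency_ms=0.05,
                                      network_rtt_ms=100.0)
        print(f"{size_kb:>5} KB -> ~{est:.2f} ms")

    # First example in this article: 256KB reads, volume latency 10258.47us,
    # with the network round-trip time assumed to be negligible.
    print(f"{estimate_lun_latency_ms(256, 10.25847, 0.0):.2f} ms")
```

With the figures from the first example (256KB reads, volume latency of about 10.26ms) and a negligible round-trip time assumed, the sketch gives about 41ms, which is in line with the 45ms reported for the LUN.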

In the example below, a Linux iSCSI client is used with network latency artificially added by running the following command:

# tc qdisc add dev eth0 root netem delay 100ms

Initial test with 64KB read operations (host-measured latency = 103ms; controller LUN latency and volume latency are both ~0.06ms):

fas01*> stats show -r -n 1 -i 5 volume:demo2:read_latency volume:demo2:read_ops lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops
volume:demo2:read_latency:61.42us
volume:demo2:read_ops:9/s
lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency:0.04ms
lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops:9/s

Test with 128KB read operations (host measured latency = 205ms, controller LUN latency of 101ms includes 1 network round trip):

fas01*> stats show -r -n 1 -i 5 volume:demo2:read_latency volume:demo2:read_ops lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops
volume:demo2:read_latency:44.71us
volume:demo2:read_ops:9/s
lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency:101.58ms
lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops:4/s

Test with 256KB read operations (host measured latency = 408ms, controller LUN latency of 302ms includes 3 network round trips):

fas01*> stats show -r -n 1 -i 5 volume:demo2:read_latency volume:demo2:read_ops lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops
volume:demo2:read_latency:47.02us
volume:demo2:read_ops:10/s
lun:/vol/demo2/lun2-BLH0G?BQgK/T:avg_read_latency:302.62ms
lun:/vol/demo2/lun2-BLH0G?BQgK/T:read_ops:2/s

Most environments will include a mixture of operation sizes, and the effect of external latency will vary depending on the mixture.

A good test to see if there are external factors impacting latency is to calculate the operation time (latency * ops) at the volume and LUN level. If the LUN operation time is significantly higher than the volume operation time and there are operations > 64KB, then it is likely that external client or network factors are impacting performance.
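The operation-time comparison described above can be sketched as follows, using the counter values shown in the `stats show` outputs in this article. The helper names are hypothetical, and the threshold of 2 is only an assumed stand-in for "significantly higher", which this article does not quantify.

```python
SUSPECT_RATIO = 2.0  # assumed threshold; "significantly higher" is not quantified


def operation_time_ms_per_s(latency_ms, ops_per_s):
    """Operation time: latency multiplied by operations per second."""
    return latency_ms * ops_per_s


def lun_to_volume_time_ratio(volume_latency_us, volume_ops,
                             lun_latency_ms, lun_ops):
    """Ratio of LUN operation time to volume operation time."""
    # The volume counter is reported in microseconds, the LUN counter in ms.
    volume_time = operation_time_ms_per_s(volume_latency_us / 1000.0, volume_ops)
    lun_time = operation_time_ms_per_s(lun_latency_ms, lun_ops)
    if volume_time == 0:
        raise ValueError("volume operation time is zero; no ratio available")
    return lun_time / volume_time


# (volume latency in us, volume ops/s, LUN latency in ms, LUN ops/s),
# copied from the stats outputs in this article.
SAMPLES = {
    "first example, 256KB reads": (10258.47, 83, 45.13, 20),
    "64KB reads, 100ms delay": (61.42, 9, 0.04, 9),
    "128KB reads, 100ms delay": (44.71, 9, 101.58, 4),
    "256KB reads, 100ms delay": (47.02, 10, 302.62, 2),
}

if __name__ == "__main__":
    for label, counters in SAMPLES.items():
        ratio = lun_to_volume_time_ratio(*counters)
        verdict = "check client/network" if ratio > SUSPECT_RATIO else "ok"
        print(f"{label}: ratio {ratio:.2f} ({verdict})")
```

For the first example the ratio is close to 1, so the LUN latency is explained by the serial volume operations alone. For the 128KB and 256KB tests with the added 100ms delay, the ratio is on the order of 1000, which points to latency outside the controller.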

As another example, a 1MB FCP operation is segmented into 16 volume operations, so the latency of a 1MB FCP operation includes the latency of 16 volume operations.

In summary, iSCSI and FCP operations to a LUN are often split into multiple volume operations, which is why the reported LUN latency is higher than the volume latency.
