NetApp Knowledge Base

Why does packet loss impact performance?


Applies to

  • All NetApp products
  • TCP Communication
  • CIFS, NFS, and iSCSI

Answer

  • Packet loss can degrade performance in several ways.
  • This article describes how packet loss typically causes performance issues, not the reason(s) that the loss happens.
    • When packet loss is detected, TCP congestion control algorithms limit the amount of TCP data in flight on the network to prevent further loss.
    • The limit set by the algorithm is the congestion window (cwnd); in a lossy network, the average congestion window indicates how much data is sent before packet loss occurs.
    • The Bandwidth Delay Product (BDP) relates this congestion window to the round trip time, giving an average expected throughput.
    • When a packet is lost, the receiver cannot acknowledge later data in sequence until the lost packet is retransmitted, causing delays that could last as long as half a second.
  • To see which packets Wireshark has flagged as a retransmission:
    • tcp.analysis.retransmission
  • Some systems (such as ONTAP) use the TCP Selective Acknowledgment (SACK) option, which can be used to identify how many packets are missing at a time.
    • This Wireshark filter will let you see those packets: tcp.options.sack.count > 0
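
The relationship between the congestion window, the round trip time, and throughput described above can be illustrated with a small calculation. This is a minimal sketch in Python, assuming throughput is approximated as one average congestion window delivered per round trip. The function name and the sample window sizes are hypothetical and are not taken from any NetApp tool.

```python
def estimated_throughput_bytes_per_sec(avg_cwnd_bytes: float, rtt_seconds: float) -> float:
    """Approximate throughput as one congestion window delivered per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return avg_cwnd_bytes / rtt_seconds


if __name__ == "__main__":
    rtt = 0.001  # 1 ms round trip time (illustrative value)
    # Illustrative average congestion windows: a clean network vs. a lossy one
    for label, cwnd in (("clean", 256 * 1024), ("lossy", 32 * 1024)):
        rate = estimated_throughput_bytes_per_sec(cwnd, rtt)
        print(f"{label}: cwnd={cwnd} bytes -> ~{rate / 1e6:.1f} MB/s")
```

With the round trip time held constant, an average congestion window that is eight times smaller yields roughly one eighth of the throughput, which is why sustained loss is so costly.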

Additional Information

  • How to determine packet loss and the possible reasons it could be occurring

  • Definitions:

    • Bandwidth Delay Product:

      • The product of a data link's capacity (in bits per second) and its round-trip delay time (in seconds)

      • The result, an amount of data measured in bits (or bytes), is equivalent to the maximum amount of data on the network circuit at any given time, i.e., data that has been transmitted but not yet acknowledged

      • The bandwidth-delay product can be estimated by dividing the port's link speed (in bits per second) by 10 to approximate bytes per second, then multiplying by the round trip time under load across the switch, which is typically on the order of 1 millisecond: 40 Gbps / 10 ≈ 4 GB/sec; 4 GB/sec × 0.001 sec ≈ 4 MB of buffer memory

        • The round trip time includes not only the propagation delay of the wires, and the switch latency, but also any buffering within the switch, the host or the storage system while exchanging traffic

        • A switch that switches between different link speeds should provide buffer memory in this range on the participating ports.

    • Receive Window:

      • The throughput of a communication is limited by two windows: the congestion window and the receive window

      • The congestion window tries not to exceed the capacity of the network (congestion control); the receive window tries not to exceed the capacity of the receiver to process data (flow control)

      • The receiver may be overwhelmed by data if, for example, it is very busy (such as a heavily loaded web server)

      • Each TCP segment contains the current value of the receive window

      • If, for example, a sender receives an ack which acknowledges byte 4000 and specifies a receive window of 10000 (bytes), the sender will not send packets after byte 14000, even if the congestion window allows it

    • Congestion Window:

      • In TCP, the congestion window is one of the factors that determines the number of bytes that can be sent out at any time

      • The congestion window is maintained by the sender

      • This is not to be confused with the sliding (receive) window size, which is maintained by the receiver

      • The congestion window is a means of stopping a link between the sender and the receiver from becoming overloaded with too much traffic

      • It is calculated by estimating how much congestion there is on the link.

    • Round Trip Time (RTT):

      • The time needed for a sender to transmit bytes, the receiver to acknowledge them, and the sender to receive the acknowledgement

      • Typically described in milliseconds (ms)
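
The definitions above can be tied together with a short worked example. The following Python is a rough sketch, not ONTAP or switch vendor code, and the function names are hypothetical. It applies the rule-of-thumb BDP estimate (link speed divided by 10, multiplied by the round trip time) and the receive window example in which an acknowledgement of byte 4000 with a 10000-byte receive window caps the sender at byte 14000. The congestion window values are assumed purely for illustration.

```python
def estimate_bdp_bytes(link_speed_bps: float, rtt_seconds: float) -> float:
    """Rule-of-thumb BDP: bits per second divided by 10 approximates bytes per second."""
    bytes_per_second = link_speed_bps / 10
    return bytes_per_second * rtt_seconds


def highest_sendable_byte(acked_byte: int, receive_window: int, congestion_window: int) -> int:
    """The sender is limited by the smaller of the receive and congestion windows."""
    return acked_byte + min(receive_window, congestion_window)


if __name__ == "__main__":
    bdp = estimate_bdp_bytes(40e9, 0.001)  # 40 Gbps port, 1 ms round trip time
    print(f"Estimated BDP: {bdp / 1e6:.1f} MB")

    # The receive window is the limit, even though the congestion window would allow more
    print(highest_sendable_byte(4000, 10000, 20000))
    # After loss shrinks the congestion window (illustrative value), it becomes the limit
    print(highest_sendable_byte(4000, 10000, 3000))
```

Taking the minimum of the two windows reflects the point that whichever window is smaller governs how much unacknowledged data the sender may have outstanding.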

In ONTAP 9.1 and earlier (including Data ONTAP 8) and in ONTAP 9.5 and later, the netstat command output includes a retransmit column.

  • In ONTAP 9.1 and earlier, it is called Retransmits

  • In ONTAP 9.5 and later, it is called Rexmit

  • Checking for incrementing retransmits here can also be useful, and may be faster than capturing a packet trace, installing Wireshark, and reviewing the trace.

 
