Why do network speed mismatches create problems with shallow buffered switches?
Applies to
- All NetApp products
- TCP Communication
- CIFS, NFS, and iSCSI
Answer
- Switches often marketed as "Cut-Through" or "Ultra-low latency" switches have shallow port/ASIC buffers.
- A switch's buffer is considered shallow when it is smaller than the Bandwidth Delay Product (BDP) of the links it serves.
- When traffic moves from a faster medium to a slower one (100Gb/s to 10Gb/s for example, or 10Gb/s to 1Gb/s), the switch must buffer the excess data; a shallow buffer is insufficient to absorb these link speed transitions, so packets are dropped.
- Packet loss degrades performance because of how TCP operates: TCP interprets loss as congestion, retransmits the lost segments, and shrinks its congestion window, reducing throughput.
- To resolve this condition:
- Ensure the sender and receiver have equal link speeds and, if bonded with LACP, an equal number of ports in each bond.
- Also ensure the network path is not slower than the sender or receiver.
- If multiple senders feed into one receiver (multiple clients to a single storage system, for example), more bonded links may be needed between the connecting switches; see the sketch after this list.
- Open a ticket with the network vendor if more assistance is needed to resolve the issue.
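To illustrate the arithmetic behind these checks, here is a minimal Python sketch. All speeds, port counts, and the helper name bond_capacity_gbps are hypothetical examples for illustration, not NetApp recommendations:

def bond_capacity_gbps(link_speed_gbps, link_count):
    # Usable capacity of a LACP bond, ignoring hash-distribution imbalance
    return link_speed_gbps * link_count

# Hypothetical example: four 10Gb/s clients feeding one storage system
sender_speeds_gbps = [10, 10, 10, 10]
receiver_gbps = bond_capacity_gbps(10, 2)  # storage bonded at 2 x 10Gb/s

aggregate_gbps = sum(sender_speeds_gbps)
if aggregate_gbps > receiver_gbps:
    print(f"Senders can offer {aggregate_gbps} Gb/s but the receiver accepts"
          f" only {receiver_gbps} Gb/s; the switch must buffer or drop the excess.")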
Additional Information
- Bandwidth Delay Product
- The product of a data link's capacity (in bits per second) and its round-trip delay time (in seconds). The result, an amount of data measured in bits (or bytes), is equivalent to the maximum amount of data on the network circuit at any given time, i.e., data that has been transmitted but not yet acknowledged.
- The bandwidth-delay product can be estimated by dividing the port's link speed (in bits per second) by 10 (8 data bits per byte plus framing overhead), then multiplying by the round-trip time under load across the switch, typically on the order of 1 millisecond: 40 Gb/s / 10 ~= 4 GB/s, and 4 GB/s * 0.001 sec = 4 MB of buffer memory. The round-trip time includes not only the propagation delay of the wires and the switch latency, but also any buffering within the switch, the host, or the storage system while exchanging traffic. A switch that switches between different link speeds should provide buffer memory in this range on the participating ports.
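As a rough worked example, the same estimate in Python (a minimal sketch; the 1 millisecond round-trip time, the divide-by-10 bits-to-bytes approximation, and the helper name required_buffer_bytes follow the rule of thumb above and are illustrative only):

def required_buffer_bytes(link_speed_bps, rtt_seconds=0.001):
    # Dividing by 10 approximates 8 data bits per byte plus framing overhead
    bytes_per_second = link_speed_bps / 10
    return bytes_per_second * rtt_seconds

# 40 Gb/s port with ~1 ms round-trip time under load:
print(required_buffer_bytes(40e9) / 1e6, "MB")  # 4.0 MB of buffer memory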
- In Cisco switches, these drops are displayed in the show interface command output as input discards or output discards. For example:
Ethernet1/15 is up
Dedicated Interface
Hardware: 1000/10000 Ethernet, address: (omitted)
Description: Cluster Node 15
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 1week(s) 4day(s)
Last clearing of "show interface" counters 00:48:16
42 interface resets
30 seconds input rate 1028568 bits/sec, 504 packets/sec
30 seconds output rate 6245824 bits/sec, 856 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 919.41 Kbps, 417 pps; output rate 5.89 Mbps, 742 pps
RX
137789441038 unicast packets 1137881 multicast packets 168522 broadcast packets
137790747441 input packets 398042347738746 bytes
44682377059 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 465262 input discard    <------- 465k discards in 48 minutes, 16 seconds, or ~161 per second average
0 Rx pause
TX
181286566439 unicast packets 59885021 multicast packets 3752105 broadcast packets
181350203565 output packets 534820871246236 bytes
54004919525 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
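Discard counters like those above can also be pulled out of saved show interface output with a short script. This is a sketch only; the function name discard_counters is illustrative, and the regular expression assumes NX-OS-style counter lines such as "465262 input discard" in the example above:

import re

def discard_counters(show_interface_text):
    # Extract input/output discard counts from saved 'show interface' output
    counters = {}
    for direction in ("input", "output"):
        match = re.search(rf"(\d+)\s+{direction} discard", show_interface_text)
        if match:
            counters[direction] = int(match.group(1))
    return counters

sample = ("0 input with dribble 465262 input discard\n"
          "0 lost carrier 0 no carrier 0 babble 0 output discard\n")
print(discard_counters(sample))  # {'input': 465262, 'output': 0}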