On Systems Integration:
Tuning the Performance of a Commercial TCP Implementation

D. Leon Guerrero
Network Solutions Inc.
Herndon, Virginia

Ophir Frieder*
Dept. of Computer Science
George Mason University
Fairfax, Virginia

Abstract
We describe common TCP implementation pitfalls and provide novel solutions to these deficiencies. The performance enhancements described herein are implemented in Network Solutions' OPEN-Link TCP product. The results demonstrate significant performance improvements over the prior release. The techniques described improve overall network performance and, in some instances, also reduce CPU demands. The majority of the improvements are incorporated into the TCP window management. Although our study focuses specifically on the TCP protocol, the lessons learned are well suited to other network protocols.

1.0 Introduction
One of the more commonly deployed network protocol suites is the Transmission Control Protocol (TCP)/Internet Protocol (IP). Although the protocol specification for TCP is publicly available, incompatibilities between implementations still exist. Some of these compatibility problems are immediately noticeable, as error messages may be printed or the system may become non-functional. However, more subtle incompatibilities may also result in poor network performance.
Our experimental study describes common TCP implementation pitfalls and provides solutions to these deficiencies. The TCP enhancements described herein are implemented in Network Solutions' OPEN-Link™¹ TCP product. The results demonstrate significant performance improvements over the prior release. In certain areas, the performance improved by 600%. Section 2.1 provides detailed comparisons between OPEN-Link and other vendor implementations. The comparisons include our original implementation as well as the enhanced high performance version.

*This work was partially funded by a grant from the National Science Foundation under Grant Number CCR-9109804.
¹OPEN-Link is a trademark of Network Solutions Inc.

The early TCP implementations were designed to operate over the ARPAnet[14]. By its very nature, the ARPAnet is a low-speed, high-delay, wide area network. ARPAnet's limited network bandwidth disguised some performance pitfalls of TCP implementations. In today's demanding high speed Local Area Networks (LANs), however, TCP must operate in environments that require connectivity between machines ranging from desktop computers to supercomputers. Many TCP implementations address this problem by providing configuration parameters that may be manually adjusted to fine tune performance. Configuration parameters alone are not adequate; they work well only in a limited number of instances. Instead, adapting to real-time conditions requires intelligent, dynamic calculations. To be effective, these dynamic calculations must be performed with minimal latency and without the expense of significant CPU overhead.
We describe techniques that improve overall network performance and, in some instances, also reduce CPU demands. The majority of the improvements are incorporated into the TCP window management. Additionally, we describe checksumming algorithms as well as header prediction techniques. Although our study focuses specifically on the TCP protocol, the lessons learned are well suited to other network protocols.

2.0 Common Pitfalls
Measurements of interpacket latency and byte throughput rate are crucial to the evaluation of TCP performance. The interpacket delay introduced by the protocol layers provides the information needed to reveal where the bottlenecks occur. To acquire performance measurements, we used an EX-5000™² LAN analyzer. The EX-5000 provides the time stamped packet traces necessary to analyze acknowledgement delays and interpacket latency.
To isolate the problem areas, we conducted a series of tests to measure the throughput rate at various network layers. We examined three distinct layers: the ethernet layer (which includes the operating system overhead), the IP layer (connectionless), and the TCP layer (connection oriented with flow control). The tests are designed to continuously send datagrams at each protocol layer. The datagram size, initially set to 64 bytes (minimum ethernet frame size), is increased by 10 bytes after each test suite until it reaches 1512 bytes (maximum ethernet frame size). The purpose of this exercise is twofold: to determine the optimum packet size and to identify the protocol layer where bottlenecks may occur. Figure-1 and Figure-2 show the graphs of the packet and byte throughput rates, respectively, as the packet size varies.

²EX-5000 is a trademark of Novell Inc.

0-8186-269'7492 $03.00 © 1992 IEEE

INTEL EX. 1241.001

[Figure-1: Packet Throughput Rate At Protocol Layers. Plot of packets/sec vs. packet size; x-axis: Packet Byte Size; surviving legend entries: Ethernet, IP Layer.]

[Figure-2: Kbyte Throughput Rate At Protocol Layers. Plot of KBytes/sec vs. packet size; x-axis: Packet Byte Size; surviving legend entry: IP Layer.]

By examining the graphs in Figure-1 and Figure-2, it appears that the TCP and IP layers introduce significant performance degradation. In Figure-1, the TCP and IP curves remain relatively flat compared to the Ethernet curve, which depicts a direct correlation between packet size, packet rate, and byte rate. This observation implies that, regardless of the packet size, our TCP implementation requires the same amount of processing time. Intuitively, the packet size must affect the packet throughput rate, since operations such as checksumming, buffer moves, and CRC calculation must operate on larger buffers. Close observation of the Ethernet curve in Figure-1 validates this assumption.
In our implementation, the ethernet Network Interface Module (NIM), TCP, and IP are three separate processes communicating via interprocess communication (IPC). Suspecting the operating system overhead to be the limiting factor, we devised a modification which combined the NIM, TCP, and IP into one process. The results of this modification are shown in Figure-3 and Figure-4.
In Figure-3 and Figure-4, the IP curve shows improved throughput rates which coincide with the Ethernet curve. However, the TCP curve, although showing a performance increase, remains relatively flat compared to the Ethernet and IP curves.

[Figure-3: Packet Rate after IPC Enhancement. Plot of packet rate vs. packet size; x-axis: Packet Byte Size.]

[Figure-4: Kbyte Rate after IPC Enhancement. Plot of KBytes/sec vs. packet size; x-axis: Packet Byte Size; surviving legend entry: IP Layer.]

By examining the EX-5000 traces closely, several problems were diagnosed. The first occurred when communicating with a UNIX workstation. Independent of the datagram sizes, TCP eventually transmits only small datagrams (32-128 data bytes), resulting in the UNIX acknowledgements behaving erratically.
As Clark[1] points out, many TCP implementors blame the excessive processing overhead associated with TCP for poor network performance. Like many protocol implementors, we followed the TCP specifications verbatim. Our study confirms Clark's observation and concludes that the performance degradation is often attributable to poor protocol implementation rather than to protocol processing overhead or the protocol architecture. Pitfalls in much TCP software reside in the acknowledgement processing. One such pitfall is sending multiple acknowledgement packets where the same result can be accomplished by delaying the acknowledgement, compacting the information, and then sending one acknowledgement packet.
TCP has a subtle trap attributable to its byte sequenced architecture. A phenomenon known as the Silly Window Syndrome (SWS) [2] can result in poor performance in many TCP implementations. The SWS phenomenon occurs between two dissimilar flow control implementations, causing data transmission to thrash much like disk I/O on a badly fragmented disk. This occurrence results in data being badly fragmented and inefficiently transmitted in small packets.

2.1 Performance comparisons
Comparisons of other vendor products, as well as our original implementation and the enhanced high performance version, are presented in Table-1. Table-2 lists the platforms which were used for these benchmarks.

[Table-1: Performance Benchmarks. 2.5 MByte file transferred in binary mode. Measurements are KBytes/Second.]

[Table-2: Platform Specifications. Surviving column headings: Platform, Operating, MIPS, Ethernet, Main, TCP/IP. MIPS: Million Instructions per Second, based on manufacturer specification unless otherwise denoted. * Denotes Compute Index using Norton Utilities.]

3.0 TCP flow control
TCP has been well studied both in practice and in the literature. For brevity, we forgo a detailed description of TCP, referring the reader to the two volume series by Comer[2] and to RFC 793 and RFC 1122. This study extends the study by Clark[13] by introducing techniques that optimize frame-fill and avoid unnecessary fragmentation. To better understand our efforts, a very brief description of TCP is provided.
TCP provides a connection oriented transport service. Data delivery is guaranteed in byte sequenced order. TCP implements the positive acknowledgement technique, using acknowledgement and sequence numbers to synchronize datagram delivery. Similarly, the sliding window technique is used to manage flow control.
During the connection setup, each endpoint advertises its maximum receive window. The receive window restricts the number of unacknowledged bytes that may be in transit at any given time. The TCP sequence numbers and acknowledgement numbers are used to manage the byte order and flow control. The acknowledgment number identifies the data received so far by the sender of the packet; it is the sequence number of the next byte expected. Similarly, the sequence number indicates the position in the data stream of the first byte in the packet. Figure-5 illustrates an example of data transmission.

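[Figure-5: Acknowledgement Strategy & Flow Control. Shows the transmitted data byte stream (bytes 1-15) and three packets exchanged between two endpoints in the ESTABLISHED state: a 5-byte DATA/ACK packet from the client (sequence number 101, acknowledgement number 901), an ACK from the server (sequence number 901, acknowledgement number 106, length 0, window 0), and a second ACK from the server (sequence number 901, acknowledgement number 106, length 0, window 5).]
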
Figure-5 shows that the constraint which throttles the flow of data is the receive window (advertised as 5 bytes). The sender (TCP client) sends 5 bytes of data and fills the entire receive window. The receiver (TCP server) then replies with an acknowledgement that it has received 5 data bytes, along with an updated receive window. The TCP receiver's buffer is now full, which is reflected in the receive window becoming zero.
To form the acknowledgement number, the receiver adds the data length to the sequence number. Since the data are still in the server's TCP receive buffer, the server replies with a zero receive window to indicate that its receive buffer is full and that it is momentarily unable to receive additional data.
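The exchange described for Figure-5 can be sketched in a few lines of Python. This is a minimal illustration only: the `Receiver` class and its method names are hypothetical and are not part of OPEN-Link, and the model assumes in-order data that always fits in the buffer.

```python
class Receiver:
    """Toy model of a TCP receiver with a fixed-size receive buffer (illustrative only)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0  # bytes held in the buffer, not yet delivered to the application

    def on_data(self, seq_no, data_len):
        """Accept a data segment and return (ack_no, advertised_window)."""
        self.buffered += data_len
        ack_no = seq_no + data_len                  # sequence number + data length
        window = self.buffer_size - self.buffered   # space left in the receive buffer
        return ack_no, window

    def deliver_to_application(self):
        """The application consumes the buffered data; return the reopened window."""
        self.buffered = 0
        return self.buffer_size


receiver = Receiver(buffer_size=5)
print(receiver.on_data(seq_no=101, data_len=5))   # (106, 0): buffer full, zero window
print(receiver.deliver_to_application())          # 5: window update after delivery
```

The two printed results correspond to the ACK with a zero window and the later window update of 5 bytes in the example.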
At some later time, after the TCP receiver delivers the data to the recipient (an application program such as file transfer), the TCP receiver sends an updated window to the TCP sender. In this example, data are being transmitted in only one direction (client to server). Therefore, the server's sequence number (which is the counterpart of the client's acknowledgment number) remains unchanged.
When the TCP receiver finally delivers the data to the application, the TCP receive buffer becomes available to receive additional data. Upon receiving the window update, the sender's window slides based on the acknowledgement number and receive window. Figure-6 illustrates the sliding window concept.
In Figure-6, the sender's send window advances from bytes 1-5 to bytes 6-10. For simplicity of discussion, a receive window of 5 bytes is used. Realistically, the receive window must be large enough to accommodate multiple packets to allow continuous transmission. For example, if ethernet is used (maximum frame size 1512 bytes) and it takes 4 packets to provide a continuous transmission stream, then the receive window would be 4 x 1512 = 6048 bytes.

[Figure-6: TCP Sliding Window Concept. Shows the transmitted data byte stream (bytes 1-15) and two ACK packets sent toward the client in the ESTABLISHED state (sequence number 901, acknowledgement number 111).]

We revisit the TCP acknowledgment strategy in section 5 and describe approaches to reducing CPU overhead by implementing efficient algorithms.

4.0 Round trip time (RTT) calculations
The design of proper strategies for computing Round Trip Time (RTT) is a highly debated topic. The basic concept behind RTT is to calculate the amount of time it takes a packet to be transmitted and acknowledged. While many studies have been published on RTT, most conclude that an efficient RTT approximation is one that provides the calculation within the specified amount of time and with the accuracy needed for the particular application[8,9].
Our requirements mandate an RTT computation at least every 16ms with an accuracy within 30ms. These requirements were derived by carefully examining the performance of the SPARC-1™³ workstation using an EX-5000 analyzer. The EX-5000 analyzer provides a timestamped packet trace with an accuracy of 500 microseconds. The EX-5000 traces show two important facts. First, data packets are transmitted in bursts (restricted by the TCP receive window size) with an interpacket spacing of less than 500 microseconds (the EX-5000 timestamp limitation). It can be assumed that the interpacket spacing is not less than 9.6 microseconds[4]. Second, the acknowledgement turnaround occurred within 16ms after the end of the packet burst (hence the 16ms compute time). Given these known parameters, we then derived the accuracy requirement of 30ms (actually 32ms, but system clock limitations restrict us to 30ms), which is influenced by two samples. At this high frequency, RTT computation must use minimal CPU resources.
RTT provides the cornerstone upon which our acknowledgement strategy and back-off algorithms are based. We maintain two variables associated with RTT.

³SPARC is a trademark of Sun Microsystems Inc.

A rough estimate on a per-packet basis is referred to simply as RTT. A second variable, Smoothed Round Trip Time (SRTT), is maintained using a Decay Smoothing Algorithm[13]. SRTT computation is necessary to compensate for latency abnormalities associated with chaotic network behavior. SRTT provides a safety feature by restricting the fluctuation to within a predetermined range.
Our SRTT approximation is a modification of Karn's[9] approximation. It is our experience that using integer instead of floating point arithmetic allows faster computation and less CPU overhead. Although floating point arithmetic yields more accurate results, given our limited precision requirement, the CPU overhead does not justify its use.
The RTT and SRTT computations used in our implementation are shown in Figure-7.

RTT  = Ack-Time - Dispatch-Time
SRTT = (RTT / Delta-W) + (SRTT * (Delta-W - 1) / Delta-W)
Figure-7 Round Trip Time Computations

RTT is a rough approximation. Ack-Time is the timestamp taken when the TCP sender receives the acknowledgement, and Dispatch-Time is the timestamp taken when the TCP sender queued the datagram onto a device independent interface. Our implementation provides a hardware independent interface. RTT includes the operating system overhead.
SRTT is a weighted average of the previous SRTT and the latest RTT. The weight factor Delta-W determines how much the approximation may vary. "Choosing a large Delta-W makes the weighted average immune to changes that last for a short time (e.g., single segment that encounters a long delay). Choosing small Delta-W makes the weighted average respond to changes in delay very quickly."[2, Page-145]
Our goal in computing SRTT is to restrict the fluctuation to within a 25% range. This value was selected because the amount of buffer space allocated per connection is set to 4096 bytes (a memory constraint imposed by the need to support 32 simultaneous sessions). This yields 3-4 packets per burst. Assuming that the receiving TCP can accommodate the burst, the weight factor Delta-W must then be 8. This yields a 12.5 percent variance on a per-packet basis (1/8 = 12.5%). It is necessary to acquire at least 2 samples to interpolate; using this axiom, the 12.5% computation then yields a 25% variance.
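As a rough sketch of the Figure-7 computation with integer arithmetic and Delta-W = 8, consider the following Python. The function name `update_srtt` and the use of truncating (floor) division are assumptions made for illustration; the text does not state how the implementation rounds.

```python
DELTA_W = 8  # weight factor chosen in the text


def update_srtt(srtt_ms, rtt_ms, delta_w=DELTA_W):
    """Integer form of Figure-7: SRTT = RTT/Delta-W + SRTT*(Delta-W - 1)/Delta-W."""
    # Floor division is an assumption; each new sample contributes 1/Delta-W of its value.
    return rtt_ms // delta_w + (srtt_ms * (delta_w - 1)) // delta_w


srtt = 16
for rtt in (16, 48, 48):
    srtt = update_srtt(srtt, rtt)
    print(rtt, srtt)   # prints: 16 16, then 48 20, then 48 23
```

With Delta-W = 8, a single sample three times larger than the current estimate moves SRTT from 16 to 20, which shows how the weighted average damps short-lived delay spikes.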
All timestamp variables in our TCP implementation are kept in milliseconds since midnight. One drawback to this technique is that a special case for midnight rollover must be taken into account. Since the operating system timer service for this function is fast, we opted to use this technique.
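One way to handle the midnight rollover, sketched here in Python, is to take the timestamp difference modulo the number of milliseconds in a day. The helper name `elapsed_ms` is hypothetical, and the sketch assumes that the measured interval is always shorter than 24 hours; the text does not describe how the implementation treats this case.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds


def elapsed_ms(start_ms, end_ms):
    """Elapsed time between two milliseconds-since-midnight timestamps.

    Assumes at most one midnight rollover between start and end.
    """
    return (end_ms - start_ms) % MS_PER_DAY


print(elapsed_ms(1_000, 1_030))          # 30: no rollover
print(elapsed_ms(MS_PER_DAY - 10, 20))   # 30: interval spans midnight
```

Python's `%` operator returns a non-negative result for a positive modulus, so the negative difference produced by a rollover wraps to the correct elapsed time.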
`
Given an estimation of SRTT that meets the requirements outlined above, we present the enhanced acknowledgment strategies to increase the overall network performance.

5.0 Silly Window Syndrome (SWS)
Flow control is an essential requirement for any network protocol. When implemented properly, regulating the flow of data ensures high reliability and efficient data delivery. When implemented poorly, network performance degrades severely.
The occurrence of SWS causes poor TCP performance. The SWS problem is often attributed to a sender transmitting faster than the receiver can process the input data. SWS is identified by noticing that the transmission consists mostly of small packets when larger packets are supported. The transmission often exhibits thrashing much like I/O on a badly fragmented disk.
SWS causes the actual user data (data from applications such as file transfer) to be transmitted over an excessive number of fragmented packets. Because each packet requires protocol processing as well as operating system overhead, the occurrence of SWS results in excessive overhead processing.
To properly describe fragmentation, we must clarify our use of three terms: datagram, TCP segment, and packet. The term datagram will imply the block of data that the application program, e.g., file transfer, requests to be transmitted. The term packet will imply the actual ethernet frame transmitted on the media and recorded by the LAN analyzer. The term TCP segment will imply the data portion of a packet (excluding the TCP, IP, and Ethernet headers). In our context, fragmentation does not imply the IP fragmentation outlined in [11], but instead describes the relationship between a datagram and a packet. Specifically, datagrams may be fragmented over two or more packets.
Figures 8-9 illustrate EX-5000 analyzer traces recorded during a file transfer session. A 2.5 megabyte image file was transferred in binary mode between OPEN-Link (the old version, before the performance enhancements) and a Unix workstation. The EX-5000 traces are presented to describe SWS, inefficient acknowledgment, and excessive packet fragmentation. To describe SWS using the EX-5000 traces in Figures 8-9, we categorize each packet as a data, ack, or ack+winup packet.
Our premise, as in [1], is that bulk data transfer, such as file transfer, is the area of performance interest. Such applications often transmit data in only one direction. As a result, the acknowledgement number and receive window remain unchanged in the data packets. The relevant TCP parameters that affect the data packet are the sequence number, data length, and send window size.
The sequence number in each data packet specifies the data byte offset, relative to the Initial Sequence Number (ISN), of the first data byte in the data packet. The ISN in our example is 99; therefore, in Figure-8, data byte 1 in pkt-1 must be associated with sequence number 100. As data are transmitted, the sequence number maintains the cumulative data byte offset. The sequence number for pkt-2 must then be 1552, as shown in Figure-8. This is derived by adding the sequence number of pkt-1 (100) and the data byte length of pkt-1 (1452 bytes).
The maximum number of data bytes that may be transmitted in each packet is restricted by the maximum TCP segment size (1452 bytes) or the number of send window bytes available, whichever is less. The send window is derived by subtracting the amount of unacknowledged data from the TCP receiver's available window. For example, in Figure-8, when pkt-1 was transmitted, there were no unacknowledged data; therefore, the send window is 4096 bytes. When pkt-2 was transmitted, however, pkt-1 remained unacknowledged. Therefore, the send window for pkt-2 (2644 bytes) must reflect the unacknowledged data from pkt-1 (1452 bytes).
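The sequence number and send window arithmetic above can be replayed with a short Python sketch. The function `segment` is a hypothetical helper, and the sketch assumes that the application hands TCP 1452-byte datagrams and that no acknowledgement arrives while the three packets are sent.

```python
MSS = 1452   # maximum TCP segment size from the text
ISN = 99     # initial sequence number from the example


def segment(datagram_len, seq_no, send_window, mss=MSS):
    """Return (data_len, next_seq_no, remaining_send_window) for the next data packet."""
    data_len = min(datagram_len, mss, send_window)   # the lesser of MSS and send window
    return data_len, seq_no + data_len, send_window - data_len


seq_no, send_window = ISN + 1, 4096
for pkt in (1, 2, 3):
    window_before = send_window
    data_len, next_seq, send_window = segment(1452, seq_no, send_window)
    print(pkt, seq_no, data_len, window_before)
    seq_no = next_seq
# prints: 1 100 1452 4096
#         2 1552 1452 2644
#         3 3004 1192 1192
```

The third packet carries only 1192 bytes because that is all the send window allows, which leaves 260 bytes of the datagram to be sent after a window update.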
Packet  Packet            Seq    Ack    Data   Send          Receive
No.     Type              No.    No.    Len    Window        Window
1       DATA              100    ---    1452   4096          ---
2       DATA              1552   ---    1452   2644          ---
3       DATA (fragment)   3004   ---    1192   1192 (full)   ---
4       ack (1-2)         ---    3004   0      ---           1192
5       ack+winup (1-2)   ---    3004   0      ---           4096
Figure-8 EX-5000 SWS Data Packet Traces

As data are transmitted, the send window decreases until window updates are received. Pkt-3 (1192 bytes) in Figure-8 must be fragmented because there are only 1192 bytes available in the send window. After transmitting pkt-3, the TCP sender must refrain from sending more data since the send window has reached zero. This is flagged in the "Send Window" column of Figure-8, where the entry for pkt-3 reads "1192 (full)". Transmission of the remaining 260 bytes must be postponed until the TCP receiver returns a window update.
The acknowledgement and window update packets are denoted as ack and ack+winup, respectively, in Figure-8. The relevant TCP parameters that affect the ack and ack+winup packets are the acknowledgment number and the receive window.
The acknowledgement number in the ack packets identifies the data received so far. It is derived by adding the data packet's sequence number and the data byte length. The ack and ack+winup packets also reflect the currently available receive window. For example, in Figure-8, pkt-4 is the ack packet associated with the data contained in pkts 1 and 2. The receive window indicated in pkt-4 reflects the amount remaining: the receive window (initially 4096 bytes) is reduced by the amount of data contained in pkts 1 and 2 (4096 - 1452 - 1452 = 1192 bytes). As data are delivered to the application program, an ack+winup packet (pkt-5 in Figure-8) is returned to indicate that the TCP receiver is ready to receive more data.
The EX-5000 trace in Figure-8 shows 3 data packets (pkts 1, 2 and 3) transmitted, while the ack and ack+winup packets (pkts 4 and 5; the notation "(1-2)" denotes that these acknowledgments are the counterparts of data pkts 1 and 2) acknowledge only pkts 1 and 2. This occurrence is typical, owing to the latency associated with network transmission. As we discuss later, interpacket latency and inefficient acknowledgements are the primary causes of SWS.
Careful examination of Figure-9 shows that the TCP sender fragments over 50% of the datagrams transmitted, as denoted by the "fragment" notation (pkts 3, 6, 8, 10, 11, 14, 16 and 19 are datagram fragments). It is also clear that the frame fill technique is underutilized, as packets 6, 10, 11, 14, 16 and 19 are less than 75% filled. Another deficiency notable in Figure-9 is the poor acknowledgement implementation. As shown in our example, the TCP receiver sends two ack packets: the first (ack pkt-4) to indicate data reception, and the second (ack+winup pkt-5) to indicate that the TCP receive window has been reopened.
[Figure-9: Trace of SWS & Inefficient Acknowledgement. EX-5000 trace of packets 1-21 with the columns packet number, packet type, sequence number, acknowledgement number, data length, send window, and receive window. Packets 1-5 repeat the exchange of Figure-8; packet 6 is a 260-byte DATA (fragment) at sequence number 4196 with a send window of 2904. Later fragments in the trace carry 932, 520, and 260 bytes. Legend: "winup" denotes an acknowledgement with a window update; a marked Send Window entry denotes that the send window has reached zero.]

Figure-9 illustrates that fragmentation gets progressively worse (the fragments start at 1192 bytes, then decrease to 932, 520, and 260 bytes). A severe case of SWS will eventually cause the datagrams to be heavily fragmented. The following sections describe the solutions we used to avoid SWS while at the same time utilizing the frame fill technique and providing enhanced acknowledgement strategies.
To achieve a high byte throughput rate, it is necessary to send sufficiently large datagrams. Figure-1 illustrates the relationship between packet size and the byte throughput rate at the ethernet layer.
The SWS behavior in our example has caused the data transmission to be badly fragmented, resulting in fragmented data packets and excessive overhead. Thus, the overall performance is poor.
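To put the fragment sizes mentioned above in perspective, the following Python sketch computes how much of a maximum-size TCP segment each one fills. The helper `frame_fill_percent` is hypothetical, and measuring fill against the 1452-byte maximum segment size is an assumption about how the 75% figure was derived.

```python
MSS = 1452  # maximum TCP segment size used in the traces


def frame_fill_percent(data_len, mss=MSS):
    """Percentage of the maximum segment size occupied by data_len bytes."""
    return 100.0 * data_len / mss


for size in (1192, 932, 520, 260):
    fill = frame_fill_percent(size)
    label = "under 75%" if fill < 75 else "75% or more"
    print(size, round(fill, 1), label)
```

Under this assumption, only the 1192-byte fragment exceeds the 75% mark; the 932, 520, and 260 byte fragments fall below it.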
Two solutions for SWS avoidance are presented. First, the TCP send algorithm is enhanced to take advantage of frame fill and back-off mechanisms. Second, the TCP receive algorithm is enhanced to use an acknowledgment strategy that provides SWS avoidance using delayed acknowledgements.

5.1 SWS avoidance for sender logic
The TCP sender dictates the majority of the flow control activity. Three important concepts are presented: frame-fill, burst transmission, and back-off strategies. Frame-fill is a technique where the sender packs as much data into each frame as the hardware media allows. Burst transmission implies sending multiple packets each time the hardware media is accessed. Back-off strategies are necessary to allow the TCP receiver to process the data received, so that sufficiently large send windows are maintained to allow optimum frame-fill and burst transmission.
The back-off strategies are based on send window thresholds and dynamically calculated RTTs. We used a 23% window availability test to detect congestion. The 23% window test is derived from our buffer allocation scheme, which is restricted to 4096 bytes per connection and a maximum datagram size of 1024 bytes. The 4096 bytes of TCP buffer space is based on the requirement to support 32 simultaneous sessions. The maximum datagram size (1024 bytes) was selected based on our observation that most TCP implementations derived from the BSD software advertise a window size of 4096 bytes along with a maximum TCP segment size of 1024 bytes. Since a large number of TCP software ports are based on the BSD implementation, we optimized our implementation for the BSD parameters. RFC-813 recommends a 25% window availability test, which may be adequate for most implementations. It is our experience that when selecting a window threshold, the TCP buffer allocation, maximum frame size, and amount of backlog must, at a minimum, be taken into account.
An effective technique to remedy SWS is to transmit only frames that are even multiples of the maximum segment size. Although this solution is highly recommended, a solution for transmitting variable length messages must also be provided. In Figure-10, we present our solution to remedy SWS. Using our techniques, TCP data transmission will exhibit the behavior of a packet sequenced protocol, such as X.25, when in actuality TCP is a byte sequenced protocol.

window-used = seq-number - unack-bytes
max-window  = send-window + window-used
window-test = max-window / Window-factor
IF send-window > window-test THEN
    Send-Datagram
ELSE
    Start Back-off Timer (Based on RTT)
    Wait for ACK or Timer Expiration
Figure-10 Congestion Back-off Logic

Figure-10 shows that the maximum window must be computed each time. This is necessary for protection against implementations that shrink the window. Window shrinking is a situation where a TCP receiver decreases the available window size and does not reopen the usable window to its full offered window.
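The window test of Figure-10 might look as follows in Python. This is a sketch under stated assumptions: the function name `may_send` is hypothetical, the amount of window in use is passed in directly rather than derived from sequence numbers, and the division by Window-factor is expressed as the 23% threshold described in the text.

```python
WINDOW_PERCENT = 23  # the 23% window availability test from the text


def may_send(send_window, window_used, percent=WINDOW_PERCENT):
    """Return True if the send window exceeds the congestion threshold."""
    # The maximum window is recomputed on every call, which guards against
    # receivers that shrink their offered window.
    max_window = send_window + window_used
    window_test = max_window * percent // 100
    return send_window > window_test


print(may_send(send_window=2644, window_used=1452))   # True: send the datagram
print(may_send(send_window=260, window_used=3836))    # False: start the back-off timer
```

With a 4096-byte maximum window the threshold works out to 942 bytes, so a sender left with only 260 bytes of window backs off instead of emitting a small fragment.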
If congestion is detected, the TCP send logic schedules a timer. The timer value must be based on the SRTT computation outlined earlier. The period between the congestion detection and the next data transmission is referred to as the back-off period. During the back-off period, the following events may occur: an ack-pkt with a window update may be received; the transmit timer may expire, which causes retransmission of the data; or the back-off timer may expire, at which time a zero probe[12] should be performed. A zero probe is a packet with no data that is intended to cause the TCP receiver to send an acknowledgement and window update. If the maximum number of retransmissions occurs, the connection is aborted; however, if the acknowledgments are received, data transmission resumes. The results of these enhancements are shown in Figure-11.
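The three events that can end the back-off period can be summarized as a small dispatch routine. The Python below is only an illustrative sketch: the `Event` names, the action strings, and the `max_retransmissions` parameter are hypothetical and do not come from the OPEN-Link implementation.

```python
from enum import Enum, auto


class Event(Enum):
    WINDOW_UPDATE = auto()        # ack-pkt with a window update received
    RETRANSMIT_TIMEOUT = auto()   # transmit timer expired
    BACKOFF_TIMEOUT = auto()      # back-off timer expired


def on_backoff_event(event, retransmissions, max_retransmissions):
    """Return the action the sender takes for an event during the back-off period."""
    if event is Event.WINDOW_UPDATE:
        return "resume_transmission"
    if event is Event.RETRANSMIT_TIMEOUT:
        if retransmissions >= max_retransmissions:
            return "abort_connection"
        return "retransmit"
    # Back-off timer expired: probe the receiver for an acknowledgement and window update.
    return "send_zero_probe"


print(on_backoff_event(Event.BACKOFF_TIMEOUT, retransmissions=0, max_retransmissions=5))
```

Keeping the decision in one function makes it easy to see that a zero probe is sent only when the back-off timer, and not the retransmission timer, expires.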
Fragmentation requires more processing due to extra buffer moves and header processing. Recall that in Figure-9, packets 3, 6, 8, 10, 11, 14, 16 and 19 are datagram fragments. In contrast, in Figure-11 fragmentation is reduced by 37%: only packets 3, 6, 8, 10 and 20 are datagram fragments. More importantly, the network bandwidth is utilized more efficiently. Note that the datagram fragments of size 520 bytes and 932 bytes in Figure-9 are eliminated in Figure-11.
LAN media, such as ethernet and token ring[5], introduce delays due to access contention and token rotational delay, respectively. Synchronization, interframe spacing, and token rotational delay are a few examples of delays that factor into the overall performance. Our study recognizes these network characteristics. Because of these delays, we emphasize the importance of RTT calculations coupled with the frame-fill technique to overcome some of these problems. Additionally, our window test enhancement, which consists of three arithmetic statements and an If-Then-Else construct, achieves the necessary goals to optimize the packet size.
[Figure-11: Trace After SWS Sender Enhancements. EX-5000 trace of packets 1-21 with the columns packet number, packet type, sequence number, acknowledgement number, data length, send window, and receive window. Packets 1-5 repeat the exchange of Figure-8; packet 6 is a 260-byte DATA (fragment) at sequence number 4196 with a send window of 2904.]

5.2 Delayed acknowledgement
The TCP receive logic consists of two mechanisms: congestion detection and delayed acknowledgment. The congestion detection scheme for the receive logic is identical to that of the sender logic. Simply put, a predefined threshold is set to signify congestion detection. In our implementation, we used the 23% rule for both the sender and receiver logic. When this threshold is reached set a flag to indi
