`
`IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 3, JUNE 1997
`
`The Performance of TCP/IP for Networks with
`High Bandwidth-Delay Products and Random Loss
`
`T. V. Lakshman, Member, IEEE, and Upamanyu Madhow, Senior Member, IEEE
`
`Abstract— This paper examines the performance of TCP/IP,
`the Internet data transport protocol, over wide-area networks
`(WANs) in which data traffic could coexist with real-time traffic
`such as voice and video. Specifically, we attempt to develop a basic
`understanding, using analysis and simulation, of the properties
`of TCP/IP in a regime where: 1) the bandwidth-delay product of
`the network is high compared to the buffering in the network
`and 2) packets may incur random loss (e.g., due to transient
`congestion caused by fluctuations in real-time traffic, or wireless
`links in the path of the connection). The following key results
`are obtained. First, random loss leads to significant throughput
`deterioration when the product of the loss probability and the
`square of the bandwidth-delay product is larger than one. Second,
`for multiple connections sharing a bottleneck link, TCP is grossly
`unfair toward connections with higher round-trip delays. This
`means that a simple first in first out (FIFO) queueing discipline
`might not suffice for data traffic in WANs. Finally, while the
`recent Reno version of TCP produces less bursty traffic than
`the original Tahoe version, it is less robust than the latter when
`successive losses are closely spaced. We conclude by indicating
`modifications that may be required both at the transport and
`network layers to provide good end-to-end performance over
`high-speed WANs.
`
Index Terms—Flow control, congestion control, error recovery, Internet, TCP/IP, transport protocols.

Manuscript received June 20, 1995; revised February 25, 1997; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor D. Mitra. This work was supported in part by the U.S. Army Research Office under Grant DAAH04-95-1-0246. T. V. Lakshman is with the High Speed Networks Research Dept., Bell Laboratories, Lucent Technologies, Holmdel, NJ 07733 USA (e-mail: lakshman@research.bell-labs.com). U. Madhow is with the ECE Department and the Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801 USA (e-mail: madhow@uiuc.edu). Publisher Item Identifier S 1063-6692(97)04489-0.
`
`I. INTRODUCTION
`
MOST existing data transfer protocols have been designed for local-area network (LAN) applications in which buffer sizes far exceed the bandwidth-delay product (loosely defined as the product of the round-trip delay for a data connection and the capacity of the bottleneck link in its path). This assumption may not hold for the wide-area networks (WANs) formed by the interconnection of LANs using high-speed backbone networks. In addition, in the Internet of the future, data traffic will share the network with voice and video traffic. In this paper, we examine the impact of these changes on the performance of the most popular data transfer protocol in current use, TCP/IP. This is essential not only for network provisioning in the short term (since the rapid growth of Web applications has caused TCP traffic to grow correspondingly) but also for determining how TCP needs to be modified in the longer term.
`We study two versions of TCP: one is the popular Tahoe
`version developed by Jacobson in 1988 [11] (henceforth called
`TCP-tahoe); the other is the Reno version, which includes the
`fast retransmit option together with a method for reducing the
`incidence of slow start, suggested by Jacobson in 1990 [12]
`(we will refer to this as TCP-reno). We attempt to develop a
`basic understanding of these schemes by considering one-way
`traffic over a single bottleneck link with FIFO transmission.
`For LANs, the round-trip delay of the connection is small, so
`that the bandwidth-delay product could be much smaller than
`the buffering on the bottleneck link. We are more interested,
`however, in WANs with large round-trip delays, so that the
`buffering on the bottleneck link is typically of the same order
`of magnitude as, or smaller than, the bandwidth-delay product
`(this is what we mean by high bandwidth-delay products
`throughout this paper). The bottleneck link may be shared by
several TCP connections. In addition, we assume that
`packets may be lost randomly even after obtaining service at
`the bottleneck link.
`Random loss is a simple model for a scenario of particular
`interest in the context of networks with multimedia traffic,
where transient fluctuations in real-time traffic may cause
`irregularly spaced losses for data traffic. This would occur,
`for instance, for both the UBR and ABR service classes [1]
`in ATM networks. The only difference is that for ATM ABR,
`each connection would have a time-varying available rate de-
`termined by feedback from the switches, so that most random
`losses would occur at the interface of the source to the network,
`since that is where the available rate would be enforced. In
`addition to serving as a model for transient congestion, we
`note that random loss on the Internet has been reported [3],
`where it is conjectured to occur due to a variety of reasons,
`including intermittent faults in hardware elements such as
`Ethernet/FDDI adapters, and incorrect handling of arriving
`packets by routers. Finally, with the anticipated emergence
`of mobile computing over heterogeneous networks with both
`wireless and wireline links, losses and time variations due
`to wireless links in the path of the connection can also be
`accommodated via a random loss model. Since our purpose
`is to obtain a fundamental understanding of TCP, none of the
`preceding situations are explicitly considered in this paper.
`However, as discussed in Section VI, the results here should
`provide a basis for further work on developing network level
`design guidelines for supporting TCP.
`One of the simplifications of the model used for our analysis
`is that two-way traffic (and the accompanying ack compression
`[27]) is not considered. Feedback systems are notoriously
`difficult to analyze, so that even our simple model is not
`amenable to exact analysis. However, not only does our
`approximate analysis match simulation results for the idealized
`system model, but it also provides a close match to results for a
`detailed simulation that includes two-way traffic for multiple
TCP-reno connections over an ATM network (described in
`Section V).
`We obtain the following key results. Discussion of the
`implications of these results for system design is postponed
`to Section VI.
`1) While TCP-reno produces less bursty traffic than TCP-
`tahoe, it is much less robust toward “phase effects.”
`The latter term refers to unpredictability in performance
`resulting from very small differences in the relative tim-
`ings of packet arrivals for different connections sharing
`a link.
`2) Both versions of TCP appear to have significant draw-
`backs as a means of providing data services over mul-
`timedia networks, because random loss resulting from
`fluctuations in real-time traffic can lead to significant
`throughput deterioration in the high bandwidth-delay
`product regime. Roughly speaking, the performance is
`degraded when the product of the loss probability and
`the square of the bandwidth-delay product is large (e.g.,
`ten or more).
`3) For high bandwidth-delay products, TCP is grossly un-
`fair toward connections with higher propagation delays:
`for multiple connections sharing a bottleneck link, the
`throughput of a connection is inversely proportional to
`(a power of) its propagation delay.
`It is worth clarifying that random loss causes performance
`deterioration in TCP because it does not allow the TCP
`window to reach high enough levels to permit good link
`utilization. On the other hand, when the TCP window is
`already large and is causing congestion, random early drops of
`packets when the link buffer gets too full can actually enhance
`performance and alleviate phase effects [10].
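As an illustration of this idea only (with arbitrary thresholds, and not the algorithm of [10], which operates on an averaged queue length), an early-drop rule of the kind alluded to above might be sketched in Python as follows.

def early_drop_probability(queue_len, buffer_size, min_frac=0.5, max_frac=0.9, p_max=0.1):
    """Probability of dropping an arriving packet as the queue fills up (illustrative)."""
    lo, hi = min_frac * buffer_size, max_frac * buffer_size
    if queue_len < lo:
        return 0.0          # queue is comfortably short: never drop early
    if queue_len >= hi:
        return 1.0          # queue nearly full: drop every arrival
    return p_max * (queue_len - lo) / (hi - lo)   # ramp up in between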
`Early simulation studies of TCP-tahoe include [24], [26],
`[27]. Our model is similar to that used in [24], but the key
`differences between our paper and previous studies are that:
`1) the ratio of bandwidth-delay product to buffer size is much
`higher in our study and 2) the effect of random loss due
`to transient congestion (or other sources) is included. Thus,
`some of the undesirable features of TCP-tahoe which arise
`specifically for networks with high bandwidth-delay products
`(such as excessive buffering requirements and vulnerability to
`random loss) were not noticed in earlier studies. Furthermore,
`in contrast to previous studies, we place more emphasis on
detailed analytical insight into the effects of various parameters
`on performance.
`The bias of TCP-tahoe against connections with large round-
`trip delays and against connections traversing a large number
`of congested gateways has been noticed in other studies of
`TCP-tahoe [8], [9], [26]. A heuristic analysis in [8] shows
`that, for multiple connections sharing a bottleneck link, the
`throughput of a connection is inversely proportional to its
`
`round-trip time. While we consider a similar system in Section
`V, our analysis is more detailed, taking explicit account of
`the buffer size and the bandwidth-delay product. Oscillatory
`behavior and unfairness toward connections with larger prop-
`agation delays have also been noticed in a previous analytical
`study of feedback-based congestion control [2]. Other analyses
`of flow control schemes include [20], [22], [23], but these
`references do not address the specific concerns raised here in
`any detail.
`The system model is described in Section II. Analytical and
`simulation results for the evolution of a single connection in
`the absence of random loss are given in Section III. Section IV
`considers the effect of random loss. Section V contains results
`for multiple connections with and without random loss. We
`give our conclusions in Section VI.
`
`II. SYSTEM MODEL
`We consider infinite data sources which always have packets
`to send, so that the units of data are maximum sized packets
`(in general, packet sizes in TCP may be variable). We consider
a single bottleneck link with capacity $\mu$ packets per second and a FIFO buffer of size $B$ packets. Any packet arriving when
`the buffer is full is lost (random loss may cause additional
`losses). The number of connections, or sources, sharing the
`link is assumed to be constant. For each connection, all delays
`except for service time and queueing at the bottleneck link
`are lumped into a single “propagation delay,” which includes:
`1) the time between the release of a packet from the source
`and its arrival into the link buffer; 2) the time between the
`transmission of the packet on the bottleneck link and its arrival
`at its destination; and 3) the time between the arrival of the
`packet at the destination and the arrival of the corresponding
acknowledgment at the source. The propagation delay for a packet from the $i$th connection is denoted by $\tau_i$. The $\tau_i$ are taken to be deterministic, which implicitly assumes that deterministic propagation and processing delays
`are more significant than random queueing delays at all nodes
`and links other than the bottleneck link. Although such an
`assumption is overly simplistic even for a relatively simple
`system with two-way traffic [27], it suffices for our present
`purpose of arriving at a basic understanding of the interaction
`between different connections sharing a link.
`Each connection is assumed to use a window flow control
protocol. At time $t$, the window size for connection $i$ is denoted by $W_i(t)$, and is equal to the maximum allowed number of unacknowledged packets (retransmissions are not counted). It is assumed that each connection uses its allowable window to the fullest extent, i.e., that at time $t$ there are indeed $W_i(t)$ unacknowledged packets for connection $i$. The window varies dynamically in response to acknowledgments and detection
`of packet loss. Upon receiving a packet, the destination is
`assumed to send an acknowledgment back immediately. These
`acknowledgments are cumulative and indicate the next byte
`expected by the receiver.
`In the original version of TCP-tahoe, packet loss is detected
`by maintaining a timer based on an estimate of the round-
`trip time. When a packet is sent, a timeout value is computed
using the current round-trip time estimate and the timer is
`started. Expiry of this timer is taken to signal packet loss.
`For each retransmission following a timer expiry, the timer
`value used is twice the previous timer value. Estimates of the
`round-trip time are obtained by measuring the round-trip time
upon receipt of unambiguous acknowledgments (i.e., ignoring acknowledgments for retransmitted segments) and computing a
`weighted average of the old and new estimates. Refer to [15],
`[25] for a detailed description of round-trip time estimation.
`We will refer to a timer based on this estimate as a fine-grained
`timer, in order to distinguish it from the coarse-grained timers
`used in practice, which are typically multiples of 500 ms. In
`order to prevent a needlessly lengthy stoppage of transmission
`upon expiry of a coarse-grained timer, most current versions
`of both TCP-tahoe and TCP-reno incorporate a fast retransmit
option: if the number of duplicate acknowledgments (i.e., multiple acknowledgments with the same “next expected” packet number $n$) exceeds a threshold, packet $n$ is assumed
`to be lost. In this paper, we implement fine-grained timers
`in our simulations, in order to study the dynamic evolution
`of TCP (and to highlight possible shortcomings) in the most
`ideal setting. The original version of TCP-tahoe, without fast
`retransmit, is implemented. However, in simulation results not
`reported here, we have checked that coarse-grained timers with
`fast retransmit give virtually identical performance in most
`cases of interest for TCP-tahoe (unless almost all packets in a
`window are lost, fast retransmit detects loss very effectively).
`For TCP-reno, we implement fast retransmit with a fine-
`grained timer in our simulations. Because TCP-reno has a less
`robust congestion control mechanism, we have found in later
`work that the use of a coarse-grained timer does impact its
`performance even with fast retransmit, unlike for TCP-tahoe.
`Since either fine-grained timers or the fast retransmit option
`provide almost perfect loss detection, it is assumed in our
`analysis that packet losses are detected perfectly.
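For concreteness, the following Python sketch shows one way the round-trip time estimate and retransmission timer described above could be maintained; the smoothing weight and timeout multiplier are illustrative assumptions, not the constants used in our simulations.

class RttEstimator:
    """Round-trip-time estimator with exponential back-off on retransmission (sketch)."""
    def __init__(self, alpha=0.875, k=2.0):
        self.alpha = alpha      # weight given to the old estimate (assumed value)
        self.k = k              # timeout = k * smoothed RTT (assumed value)
        self.srtt = None
        self.rto = 1.0          # initial timeout, arbitrary units

    def on_ack(self, measured_rtt):
        # Only unambiguous samples (not acknowledgments of retransmissions) are fed here.
        if self.srtt is None:
            self.srtt = measured_rtt
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * measured_rtt
        self.rto = self.k * self.srtt

    def on_timeout(self):
        # Each retransmission after a timer expiry uses twice the previous timer value.
        self.rto *= 2.0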
`A simplified description of TCP-tahoe [11] and TCP-reno
`[12] follows.
`
`A. Description of TCP-tahoe
The algorithm followed by each connection has two parameters, the current window size $W(t)$ and a threshold $W_{th}(t)$, which are updated as follows.

1) After every acknowledgment of a new packet:
   if $W < W_{th}$, set $W \leftarrow W + 1$; Slow Start Phase
   else set $W \leftarrow W + 1/\lfloor W \rfloor$. Congestion Avoidance Phase
   ($\lfloor x \rfloor$ denotes the integer part of $x$).
2) After a packet loss is detected:
   set $W_{th} \leftarrow \lfloor W/2 \rfloor$;
   set $W \leftarrow 1$.
`
`The algorithm typically evolves as follows (although, as
`described in the next section,
`the evolution is somewhat
`different for relatively small buffer size): when packet loss is
`detected, the window is reduced to one. In the slow start phase
`that follows, the window grows rapidly for every successfully
`acknowledged packet until it reaches half of the window size
`
`at the last packet loss. The algorithm then switches to the
`congestion avoidance phase, probing for extra bandwidth by
`incrementing the window size by one for every window’s
`worth of acknowledged packets. This growth continues until
`another packet loss is detected, at which point another cycle
`begins. We use the term cycle to mean TCP evolution starting
`from the end of one congestion avoidance phase to the end
`of the next. In Section III, it turns out that, for our simple
`model, TCP evolution is periodic if there is no random loss,
`so that successive cycles are identical. In Section IV, on the
`other hand, where we consider random loss, the duration of,
`and window evolution within, different cycles is random.
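The update rules above can be summarized by the following Python sketch, which tracks the pair (W, W_th) in units of packets; it abstracts away sequence numbers, byte counts, and timers, and is not the simulator used in this paper.

def tahoe_on_ack(W, W_th):
    """Acknowledgment of a new packet."""
    if W < W_th:
        return W + 1, W_th             # slow start: window grows by one per acknowledgment
    return W + 1.0 / int(W), W_th      # congestion avoidance: about one per window

def tahoe_on_loss(W, W_th):
    """Packet loss detected (timeout or duplicate acknowledgments)."""
    return 1.0, max(int(W) // 2, 1)    # threshold set to half the window; the floor of 1 is a safeguard added in this sketch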
`
`B. Description of TCP-reno
`After the number of duplicate acknowledgments exceeds a
`threshold (typically three), TCP-reno retransmits the packet.
`However, instead of cutting the window back to one, it only
`reduces it by a factor of two. Further, in order to prevent a burst
`of packets from being transmitted when the retransmission is
`finally acknowledged, it temporarily permits new packets to be
`transmitted with each repeated acknowledgment until the “next
`expected” number in the acknowledgment advances. While
`these subtleties are essential to the working of the algorithm
`(see [12] for details) and are implemented in our simulations,
`the following simplified description is adequate for conveying
`an understanding of the algorithm’s behavior.
1) After every nonrepeated acknowledgment, the algorithm works as before:
   if $W < W_{th}$, set $W \leftarrow W + 1$; Slow Start Phase
   else set $W \leftarrow W + 1/\lfloor W \rfloor$. Congestion Avoidance Phase.
2) When the number of duplicate acknowledgments exceeds a threshold:
   retransmit the “next expected” packet;
   set $W_{th} \leftarrow \lfloor W/2 \rfloor$, then set $W \leftarrow W_{th}$ (i.e., halve the window);
   resume congestion avoidance using the new window once the retransmission is acknowledged.
3) Upon timer expiry, the algorithm goes into slow start as before:
   set $W_{th} \leftarrow \lfloor W/2 \rfloor$;
   set $W \leftarrow 1$.
`
`In this case, after an initial slow start transient, the typical
`cyclical evolution does not involve slow start, since the win-
`dow size is halved upon loss detection. Each cycle begins when
`a loss is detected via duplicate acknowledgment. Assuming
`that loss occurs at window size
`, the window size at the
`beginning of each cycle is
`. The algorithm resumes
`probing for excess bandwidth in congestion avoidance mode
`until the window size reaches
`again, at which point a
`loss occurs and a new cycle with window size
`begins.
`We will show that the throughput attained by this scheme is
`higher than that of TCP-tahoe, especially when the buffer size
`is small compared to the bandwidth-delay product. However,
`this algorithm is almost as vulnerable to random loss.
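A corresponding Python sketch of the TCP-reno rules is given below; like the TCP-tahoe fragment above, it is illustrative only and omits the temporary window inflation that an actual implementation performs during fast recovery.

def reno_on_ack(W, W_th):
    """Nonrepeated acknowledgment: same growth rules as TCP-tahoe."""
    if W < W_th:
        return W + 1, W_th
    return W + 1.0 / int(W), W_th

def reno_on_fast_retransmit(W, W_th):
    """Loss detected via duplicate acknowledgments: halve the window, skip slow start."""
    W_th = max(int(W) // 2, 1)
    return float(W_th), W_th

def reno_on_timeout(W, W_th):
    """Timer expiry still forces a full slow start, as in TCP-tahoe."""
    return 1.0, max(int(W) // 2, 1)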
For the remainder of this paper, we will use $W_{max}$ as a generic notation for the window size at which congestion
avoidance ends. The value of $W_{max}$ could therefore change
`from cycle to cycle if loss occurs randomly, or could be the
`same for all cycles if loss occurs periodically. It is worth
`relating our notation to that usually used in TCP code (see
[24], for instance): $W$ is usually referred to as the congestion window cwnd, and $W_{th}$ is denoted by ssthresh. The actual window for flow control purposes is taken to be the minimum of cwnd and the advertised window, where the latter is set by the receiver. For the purpose of this paper, the window size is assumed to be dictated by the capacity and buffering of the bottleneck link (i.e., the advertised window exceeds $\mu\tau + B$), so the actual window size equals the
`congestion window. Note that some form of window scaling
`(i.e., increasing the window size in bytes while using the same
`sequence number space, by scaling up the size of the data
`segment referred to by a given number) may be required to
`achieve this for large bandwidth-delay products [14].
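As a back-of-the-envelope illustration (the link speed and delay below are arbitrary example values, not taken from our experiments), an unscaled 16-bit TCP window field cannot express more than 65 535 bytes, which falls far short of such bandwidth-delay products.

link_bps = 155e6      # assumed bottleneck bandwidth in bits per second
rtt_s = 0.060         # assumed round-trip time in seconds
needed_window_bytes = link_bps / 8.0 * rtt_s
print(needed_window_bytes, needed_window_bytes > 65535)   # about 1.16e6 bytes, True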
`
`III. EVOLUTION WITHOUT RANDOM LOSS
`We consider the evolution of a single connection and derive
expressions for its long-term throughput. Define the normalized buffer size $\beta = B/(\mu\tau)$, where $\tau_p$ denotes the propagation delay for each packet of the connection and $\tau = \tau_p + 1/\mu$ denotes the propagation delay plus the service time. Since we are concerned with large bandwidth-delay products, we restrict attention to $\beta \le 1$ in this section. In contrast, simulations in earlier work [24] consider $\beta > 1$, for
`which the average throughput is close to the capacity of the
`bottleneck link. For brevity, expressions for the latter case are
`omitted.
`The maximum window size that can be accommodated in
steady state in the bit pipe is

$W_{max} = \mu\tau + B.$   (1)

In this case, the buffer is always fully occupied and there are $\mu\tau_p$ packets in flight. The cyclical evolution of TCP-tahoe consists of a slow start phase starting with $W = 1$ and continuing until the window size reaches $W_{th} = W_{max}/2$, followed by congestion avoidance until $W = W_{max}$. The next increase in window size leads to buffer overflow, at
`which point the window is reset to one and a new cycle
starts. We show that if the relative buffer size $\beta$ is not large enough, buffer overflow may occur even in the slow start phase, and the cyclical evolution is somewhat different from the preceding description. For TCP-reno, if the scheme functions as designed, slow start is eliminated from the cyclical evolution. In each cycle, the algorithm starts from $W_{max}/2$, does congestion avoidance until the window reaches $W_{max}$, and drops back to $W_{max}/2$ after a packet loss due to
`buffer overflow is detected via duplicate acknowledgment.
`In each case, if the number of packets successfully trans-
mitted during a cycle is $S$ and the duration of a cycle is $T$, then the periodic evolution implies that the average throughput is given by $\lambda = S/T$. In the following, we describe this
`evolution more carefully, and compute these quantities in
`sufficient detail to produce an excellent match with simulations
`(see Table II).
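For concreteness, the quantities just defined can be computed as follows; the numeric values are assumptions chosen for illustration and are not the parameters of Table II.

mu, tau_p = 100.0, 1.0        # assumed service rate (packets per unit time) and propagation delay
tau = tau_p + 1.0 / mu        # propagation delay plus service time
B = 10.0                      # assumed buffer size in packets
beta = B / (mu * tau)         # normalized buffer size
W_max = mu * tau + B          # maximum steady-state window, as in (1)
print(round(beta, 3), W_max)  # 0.099 and 111.0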
`
`TABLE I
`EVOLUTION DURING SLOW START PHASE
`
`A. Slow Start Phase
`The slow start phase must be considered in some detail
`to understand the advantage of TCP-reno over TCP-tahoe.
Starting from $W = 1$ with slow start threshold $W_{th}$, the window
`size is increased by one for every acknowledgment in this
`phase, so that two packets are released into the buffer for
`every acknowledgment. Table I shows the evolution of the
`window size and the queue length in this phase. For every
`acknowledgment, we indicate the number of the packet which
`was acknowledged (for convenience, we number the packets
`in increments of one rather than in increments equal to the
`number of bytes per packet).
`The evolution in Table I is best described by considering
mini-cycles of duration equal to the round-trip time $\tau$, where the $k$th mini-cycle refers to the time interval $[(k-1)\tau, k\tau)$ (the mini-cycles are separated by lines in the table). The acknowledgment for a packet released in mini-cycle $k$ arrives in mini-cycle $k+1$, and increases the window size by one.
`This leads to a doubling of the window in each mini-cycle.
Further, acknowledgments for consecutive packets served in mini-cycle $k$ arrive spaced by the service time $1/\mu$ during mini-cycle $k+1$, and two packets are released for each arriving acknowledgment, leading to a buildup of queue size. The
preceding evolution assumes implicitly that the normalized buffer size $\beta \le 1$, so that the window size during the slow start phase is smaller than $\mu\tau$ and the queue empties out by
the end of each mini-cycle. Denoting the window size at time $t$ by $W(t)$, we obtain that, during the $k$th mini-cycle,

$W(t) = 2^{k-2} + \mu\,[t - (k-1)\tau]$   (2)

where we have assumed that $W((k-1)\tau) = 2^{k-2}$. Similarly, letting
$Q(t)$ denote the queue length at time $t$, the queue build-up during the $k$th mini-cycle is given by

$Q(t) = \mu\,[t - (k-1)\tau].$   (3)
`
The maximum queue length during the $k$th mini-cycle is therefore approximately $2^{k-2}$, which is approximately half the maximum window size $2^{k-1}$ during that mini-cycle. For a buffer size $B$, we can use (2) and (3) to determine the window size at which the queue length exceeds the buffer
size as follows. Define the integer $k^*$ so that $2^{k^*-1} < B \le 2^{k^*}$. From (3), buffer overflow will occur in the $(k^*+2)$th mini-cycle (the largest queue length in the previous mini-cycle is $2^{k^*-1}$, which is smaller than $B$), with the queue length reaching $B$ roughly $B/\mu$ seconds into that mini-cycle. From (2), the window size $W_{ss}$ at which this happens is approximately

$W_{ss} = 2^{k^*} + B \approx 2B$   (4)

so that buffer overflow during a slow start phase with threshold $W_{th}$ thus occurs only if

$W_{th} > 2^{k^*} + B \approx 2B.$   (5)
`
A more explicit condition for buffer overflow can be derived as follows. Assuming that the packet loss causing the slow start phase occurred when the window size exceeded the value $W_{max} = \mu\tau + B$, the slow start threshold equals $W_{th} = (\mu\tau + B)/2$. Since $\beta = B/(\mu\tau)$, the condition for buffer overflow (5) is approximately equivalent to $\beta < 1/3$.

Fig. 1. Window and buffer evolution for a single connection: two slow starts. Prop. delay = 1 ms; $\beta$ = 0.1.

Fig. 2. Window and buffer evolution for a single connection: one slow start. $\tau_1$ = 1; $\tau_2$ = 3; $\beta$ = 0.8.
`Fig. 1 shows the simulated congestion window and buffer
`occupancy evolution for a single connection using TCP-tahoe
with parameters chosen so that the normalized buffer size is small ($\beta = 0.1$). The
`congestion window size is shown by the solid line and the
`buffer occupancy by the dotted line. As expected, the window
grows to $W_{max} = \mu\tau + B$ and the next increase in
`the window causes a packet to be dropped. Detection of this
`loss (upon expiry of the associated timer) causes the window
`to be reduced to one and initiation of the slow start phase. The
`
`figure clearly shows the rapid growth in window size during
the slow start phase. However, since $\beta < 1/3$, buffer overflow occurs during slow start, and is detected in the following mini-cycle (see the discussion
`later in this section). A second slow start phase is initiated
`at this point with threshold 25. This window size is reached
`without further loss, at which point slower window growth
`due to congestion avoidance commences. This lasts until the
window exceeds $W_{max}$, after which a new cycle begins.
When $\beta$ is greater than 1/3, the window evolution for TCP-tahoe is different. This is illustrated in Fig. 2, which shows the evolution of window sizes and buffer occupancy for $\beta = 0.8$.
Here, packet loss is seen to occur when the window reaches $W_{max}$. As before, detection of this loss causes
`the window size to be reduced to one and initiates slow start.
`However, there is no packet loss in the slow start phase, which
`terminates when the window reaches 91. In the congestion
`avoidance phase that follows, the window grows linearly and
`then more slowly (as explained in the next subsection) until the
window exceeds $W_{max}$. This results in a packet loss causing
`the cycle to repeat. The absence of the double slow start results
`in much higher throughput, since the initial window size for
`the congestion avoidance phase (which accounts for most of
`the packets transmitted) is higher.
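To make the bookkeeping behind (4) and (5) concrete, the following Python sketch steps through the mini-cycles of a slow start phase and reports whether the buffer overflows before the threshold is reached; it is a rough illustration under the simplifying assumptions of this subsection, not the simulator used for the figures.

def slow_start_overflows(mu_tau, B):
    """True if the buffer overflows before slow start (threshold W_max/2) ends."""
    W_th = (mu_tau + B) / 2.0
    W0 = 1.0
    while W0 < W_th:
        # Roughly W0 acknowledgments arrive in the mini-cycle that starts with
        # window W0, and the queue grows by about one packet per acknowledgment.
        acks_before_threshold = min(W0, W_th - W0)
        if acks_before_threshold >= B:
            return True
        W0 *= 2.0          # the window doubles every mini-cycle
    return False

for beta in (0.1, 0.2, 0.4, 0.8):
    print(beta, slow_start_overflows(100.0, beta * 100.0))
# Overflow (two slow starts per cycle) is reported only for the smaller values of beta.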
`We now compute the duration and number of packets
`transmitted during the slow start phase(s) in a given cycle
`for TCP-tahoe. Even though many subtleties in timing are
glossed over, the computations are accurate enough to re-
`sult in excellent agreement with the throughput obtained by
`simulations.
Case 1 ($\beta \ge 1/3$): There is only one slow start phase per cycle, which ends when the window size reaches $W_{th} = W_{max}/2$. We use a simplified version of (2), $W(t) \approx 2^{t/\tau}$, to approximate the duration of this phase as $T_{ss} \approx \tau \log_2(W_{max}/2)$. The number of packets transmitted in this phase is approximated by $S_{ss} \approx W_{max}/2$.
`(Since the window size grows by one
`for every acknowledgment during slow start, starting from
`an initial value of one, the number of packets transmitted
`successfully during slow start is well approximated by the
`window size at the end of the slow start period.)
Case 2 ($\beta < 1/3$): There are two slow start phases in a cycle in this case. Let $T_{ss1}$ denote the duration of the first slow start phase and let $S_{ss1}$ denote the number of packets successfully transmitted during that phase. Let $T_{ss2}$, $S_{ss2}$ denote the analogous quantities for the second slow start phase.
`Computation of average throughput requires the computation
`of these quantities.
In the first slow start phase (with threshold $W_{max}/2$), a buffer overflow occurs when the window size reaches $W_{ss}$, given by (4). The duration of this phase is the time taken to reach $W_{ss}$, approximated by $\tau \log_2 W_{ss}$, together with the time taken to detect the loss, which is taken to be one round-trip time $\tau$, so that $T_{ss1} \approx \tau \log_2 W_{ss} + \tau$. The number of packets transmitted in this phase is, reasoning as before, taken to be $S_{ss1} \approx W_{ss}$.
Since the buffer overflow in the first slow start phase is not detected till the mini-cycle after which it happens, the window size at which the overflow is detected, which we denote by $W_{det}$, can be shown by means of a more careful analysis to be somewhat larger than $W_{ss}$. This implies that the threshold for the second slow start phase is

$W_{th2} = \lfloor W_{det}/2 \rfloor.$   (6)

The duration and number of packets transmitted during the second slow start phase are now obtained as in Case 1 to be $T_{ss2} \approx \tau \log_2 W_{th2}$ and $S_{ss2} \approx W_{th2}$, respectively. The total time spent in slow start and the number of packets transmitted during slow start are then given by $T_{ss} = T_{ss1} + T_{ss2}$ and $S_{ss} = S_{ss1} + S_{ss2}$, respectively.
`
`B. Congestion Avoidance Phase
We can unify the analysis of this phase for TCP-tahoe and TCP-reno by assuming that the congestion avoidance phase starts from an arbitrary window size $W_0$ and terminates when the window size reaches $W_{max}$. In all cases considered in this section, $W_0 \le \mu\tau$. For TCP-tahoe, $W_0$ equals the slow start threshold for the slow start phase immediately preceding the congestion avoidance phase, and is given by

$W_0 = W_{max}/2$   (7)

since the initial window size was $W_{max}$ (for $\beta < 1/3$, the threshold of the second slow start phase, given by (6), is always less than $W_{max}/2$).

Remark: For $\beta < 1/3$, there are two slow start phases, so that the window size at the beginning of congestion avoidance is the slow start threshold for the second slow start phase, which is given in (6).

For TCP-reno, we have

$W_0 = W_{max}/2$   (8)

since the window size is halved after losing a packet.

In contrast to the slow start phase, the congestion avoidance phase, by virtue of its slower window growth, is well-modeled by a continuous-time approximation for window evolution. Such approximations have been used previously in [24] to explain simulation results. Let $dW/dt$ denote the rate of window growth with time, $dW/da$ the rate of window growth with arriving acknowledgments, and $da/dt$ the rate at which the acknowledgments are arriving. Then, during the congestion avoidance phase,

$dW/da = 1/W.$   (9)

The acknowledgments arrive back at a rate equal to the instantaneous throughput $\lambda(t)$, so that

$da/dt = \lambda(t).$   (10)

Combining (9) and (10), we obtain that

$dW/dt = \lambda(t)/W.$   (11)

Thus, for $W(t) \le \mu\tau$, the throughput is $\lambda(t) = W(t)/\tau$ and the window grows as $W(t) = W_0 + t/\tau$. The duration of this period of growth is therefore given by

$T_{ca1} = \tau\,(\mu\tau - W_0).$   (12)

The number of packets transmitted during this time is given by

$S_{ca1} = [(\mu\tau)^2 - W_0^2]/2.$   (13)

When $W(t) > \mu\tau$, we obtain from (11) that $dW/dt = \mu/W$. This growth period, and the cycle, terminates with buffer overflow when the window size exceeds $W_{max}$, and its duration is given by

$T_{ca2} = [W_{max}^2 - (\mu\tau)^2]/(2\mu)$   (14)

($T_{ca2} = [W_{max}^2 - W_0^2]/(2\mu)$ if $W_0 > \mu\tau$, although this does not occur for $\beta \le 1$). Since the bottleneck link is being fully utilized during this period, the number of packets transmitted is given by

$S_{ca2} = \mu T_{ca2}.$   (15)
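As a quick numerical sanity check on the continuous-time approximation, the following Python fragment integrates (11) with $\lambda(t) = \min(W/\tau, \mu)$ and compares the resulting time to reach $W_{max}$ with the sum of (12) and (14); the parameter values are assumptions chosen for illustration.

mu, tau, B, W0 = 100.0, 1.0, 10.0, 55.0    # assumed example parameters
W_max = mu * tau + B
dt, W, t = 1e-3, W0, 0.0
while W < W_max:
    lam = min(W / tau, mu)                 # instantaneous throughput
    W += lam / W * dt                      # window growth rate from (11)
    t += dt
closed_form = tau * (mu * tau - W0) + (W_max ** 2 - (mu * tau) ** 2) / (2.0 * mu)
print(round(t, 1), round(closed_form, 1))  # the two durations should nearly agree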
`
`
`TABLE II
`LINK UTILIZATION AS A FUNCTION OF
NORMALIZED BUFFER SIZE $\beta$ ($\mu$ = 100; $\tau_p$ = 1)
`
`C. Throughput Computation and Numerical Results
`Due to the periodic evolution, the long-run average through-
`puts for both TCP-tahoe and TCP-reno are equal to the average
throughputs in a cycle, and are given by

$\lambda_{tahoe} = (S_{ss} + S_{ca1} + S_{ca2}) / (T_{ss} + T_{ca1} + T_{ca2})$   (16)

$\lambda_{reno} = (S_{ca1} + S_{ca2}) / (T_{ca1} + T_{ca2})$   (17)
`
`where the preceding quantities are as computed in Sections
`III-A and III-B.
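The following self-contained Python sketch performs this bookkeeping for the case of a single slow start per cycle ($\beta \ge 1/3$); the expressions mirror (12)-(16) and the parameter values are illustrative assumptions, not the exact settings of Table II.

import math

def tahoe_cycle_utilization(mu, tau, B):
    """Link utilization over one cycle, per (16), assuming a single slow start."""
    mu_tau = mu * tau
    W_max = mu_tau + B
    W0 = W_max / 2.0
    T_ss = tau * math.log2(W0)                        # slow start duration (Case 1)
    S_ss = W0                                         # packets sent in slow start
    T_ca1 = tau * (mu_tau - W0)                       # linear growth period, (12)
    S_ca1 = (mu_tau ** 2 - W0 ** 2) / 2.0             # (13)
    T_ca2 = (W_max ** 2 - mu_tau ** 2) / (2.0 * mu)   # buffer-filling period, (14)
    S_ca2 = mu * T_ca2                                # (15)
    return (S_ss + S_ca1 + S_ca2) / (T_ss + T_ca1 + T_ca2) / mu

print(round(tahoe_cycle_utilization(100.0, 1.0, 50.0), 3))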
`Table II gives the link utilizations as a function of the
normalized buffer size $\beta$ for both TCP-tahoe and TCP-reno. The results obtained using (16) and (17) are compared with those obtained using simulation for an example with high bandwidth-delay product. The match is within 2%. For TCP-tahoe, a clear thresholding effect is seen near $\beta = 1/3$ (recall that the analysis predicted a thresholding effect around $\beta = 1/3$). The utilization for TCP-reno is uniformly higher for all values of $\beta$, and there is no thresholding effect for small $\beta$. The difference in utilization is small for large $\beta$, since the
`congestion avoidance phase for the two schemes is identical,
`and the duration of the slow start phase is small compared to
`the duration of the cycle.
`
`IV. EVOLUTION WITH RANDOM LOSS
We assume here that any given packet may be lost with probability $q$, and that these random losses are independent. As in the previous section, we consider a single connection, and show that, for both TCP-tahoe and TCP-reno, the throughput is strongly dependent on $q(\mu\tau)^2$, and deteriorates sharply
`compared to the lossless throughput when this quantity be-
`comes large. Roughly speaking, the throughput degradation
`occurs because packet losses relatively early in a cycle result
`in small initial values for the congestion avoidance phase, in
`which the bulk of the packets are transmitted. This results
`in small window sizes (determined by random losses rather
`than congestion), and therefore low link utilizations, during
`the cycle.
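The following crude Monte Carlo sketch (independent per-packet loss, idealized window halving on each loss, no timeouts or slow start) is included only to illustrate this qualitative dependence on $q(\mu\tau)^2$; it is not the analysis pursued below, nor the simulator used in this paper.

import random

def throughput_with_random_loss(mu, tau, q, packets=200000, seed=1):
    """Fraction of the link capacity achieved under independent loss probability q."""
    random.seed(seed)
    mu_tau = mu * tau
    W, sent, elapsed = 1.0, 0, 0.0
    for _ in range(packets):
        elapsed += 1.0 / min(W / tau, mu)     # time to send one packet at the current rate
        sent += 1
        if random.random() < q:
            W = max(W / 2.0, 1.0)             # window halved on a random loss
        else:
            W = min(W + 1.0 / W, mu_tau)      # congestion avoidance growth per packet
    return sent / elapsed / mu

for q in (1e-6, 1e-4, 1e-2):
    print(q, round(throughput_with_random_loss(100.0, 1.0, q), 2))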
`In principle, it is possible to exactly compute the throughput
`based on a Markov chain analysis, and we sketch this method
`in the following. However, insight into when random loss
`causes throughput deterioration is better obtained by means
`of an approximate analysis, which we pursue in more detail.
`In the absence of random loss, the evolution of a cycle
`in TCP-tahoe is completely determined by the slow start
threshold $W_{th}$ (which is half the window size at the