`
`Long Le Jay Aikat Kevin Jeffay F. Donelson Smith
`Department of Computer Science
`University of North Carolina at Chapel Hill
`http://www.cs.unc.edu/Research/dirt
`
`Abstract — Active queue management (AQM) in routers has
`been proposed as a solution to some of the scalability issues asso-
`ciated with TCP’s pure end-to-end approach to congestion control.
`A recent study of AQM demonstrated its effectiveness in reducing
`the response times of web request/response exchanges as well as
`increasing link throughput and reducing loss rates [10]. However,
`use of the ECN (explicit congestion notification) signaling protocol
`was required to outperform drop-tail queuing. Since ECN is not
`currently widely deployed on end-systems, we investigate an alter-
`native to ECN, namely applying AQM differentially to flows based
`on a heuristic classification of the flow’s transmission rate. Our
`approach, called differential congestion notification (DCN), distin-
`guishes between “small” flows and “large” high-bandwidth flows
`and only provides congestion notification to large high-bandwidth
`flows. We compare DCN to other prominent AQM schemes and
`demonstrate that for web and general TCP traffic, DCN outper-
`forms all the other AQM designs, including those previously de-
`signed to differentiate between flows based on their size and rate.
`
`1. Introduction
`Congestion control on the Internet has historically been per-
`formed end-to-end with end-systems assuming the responsi-
`bility for detecting congestion and reacting to it appropri-
`ately. Currently, TCP implementations detect instances of
`packet loss, interpret these events as indicators of conges-
`tion, and reduce the rate at which they are transmitting data
`by reducing the connection’s window size. This congestion
`reaction (combined with a linear probing congestion avoid-
`ance mechanism) successfully eliminated the occurrence of
`congestion collapse events on the Internet and has enabled
`the growth of the Internet to its current size.
`
`However, despite this success, concerns have been raised
`about the future of pure end-to-end approaches to conges-
`tion control [1, 5]. In response to these concerns, router-
`based congestion control schemes known as active queue
`management (AQM) have been developed and proposed for
`deployment on the Internet [1]. With AQM, it is now possi-
`ble for end-systems to receive a signal of incipient conges-
`tion prior to the actual occurrence of congestion. The signal
`can be implicit, realized by a router dropping a packet from
`a connection even though resources exist to enqueue and
`forward the packet, or the signal can be explicit, realized by
`a router setting an explicit congestion notification (ECN) bit
`in the packet’s header and forwarding the packet.
`
`In a previous study of the effects of prominent AQM de-
`signs on web performance, we argued that ECN was re-
`quired in order to realize the promise of AQM [10]. This
`
`was a positive result that showed a tangible benefit to both
`users and service providers to deploying AQM with ECN.
`When compared to drop-tail routers, the deployment of par-
`ticular AQM schemes with ECN (and with ECN support in
`end-system protocol stacks) allowed users to experience
`significantly reduced response
`times for web re-
`quest/response exchanges, and allowed service providers to
`realize higher link utilization and lower loss rates. Without
`ECN, certain AQM schemes could realize modest perform-
`ance improvements over simple drop-tail queue manage-
`ment, but the gains were small compared to those achiev-
`able with ECN.
`
`The positive ECN results, however, beg the question of
`whether or not all AQM inherently requires ECN in order to
`be effective, or if it simply is the case that only existing
`AQM designs require ECN in order to be effective. This is a
`significant issue because ECN deployment requires the par-
`ticipation of both routers and end-systems and hence raises a
`number of issues including the cost and complexity of im-
`plementing and deploying ECN, the incremental deploy-
`ability of ECN, and the (largely unstudied) issue of dealing
`with malicious end-systems that advertise ECN support but
`in fact ignore ECN signals or simply have not been config-
`ured appropriately. Other deployment issues include the fact
`that many firewalls and network address translators inten-
`tionally or unintentionally drop all ECN packets or clear
`ECN bits. In a study of TCP behavior, Padhye and Floyd
`found that less than 10% of the 24,030 web servers tested
`had ECN enabled, of which less than 1% had a compliant
`implementation of ECN [14]. More recent results (August
`2003) showed that only 1.1% of 441 web servers tested had
`correctly deployed ECN [15]. This clearly points to obvious
`difficulties in deploying and properly using ECN on the
`end-systems. Thus, AQM could be significantly more ap-
`pealing if ECN were not required for effective operation.
`
`In this paper, we present an AQM design that signals con-
`gestion based on the size and rate of the flow and does not
`require ECN for good performance. Our approach is to dif-
`ferentially signal congestion to flows (through the dropping
`of packets) based upon a heuristic classification of the
`length and rate of the flow. We classify traffic into “mice,”
`short connections that dominate on many Internet links
`(more than 84% of all flows in some cases [21]), and “ele-
`phants,” long connections that, while relatively rare, account
`for the majority of bytes transferred on most links (more
`than 80% of all bytes [21]). Our AQM design attempts to
`
`In: Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP),
`Berlin, Germany, October 2004, pages 118-128.
`
`Cloudflare - Exhibit 1040, page 1
`
`
`
`2
`
`notify only high-bandwidth “elephants” of congestion while
`allowing “slower” (and typically shorter) connections to
`remain unaware of incipient congestion. The motivation for
`this approach, borne out by our analysis of AQM schemes,
`is that providing early congestion notifications to mice only
`hurts their performance by forcing these short TCP connec-
`tions to simply wait longer to transmit their last few seg-
`ments. These short flows are often too short to have a
`meaningful transmission rate and are so short that slowing
`them down does not significantly reduce congestion. In con-
`trast, providing early congestion notification to elephants
`can lead to abatement of congestion and more efficient use
`of the network. Our form of differential congestion notifica-
`tion, called DCN, significantly improves the performance of
`the vast majority of TCP transfers and provides response
`times, link utilizations, and loss ratios that are better than
`those of existing AQM schemes including those that also
`attempt to differentiate between flows based on their size
`and rate.
`
`The remainder of the paper makes the case for differential
`congestion notification based on classification of flow-rate.
`Section 2 discusses previous related work in AQM schemes
`in general and in differential AQM specifically. Section 3
`presents our DCN scheme. Section 4 explains our experi-
`mental evaluation methodology and Section 5 presents the
`results of a performance study of DCN and several promi-
`nent AQM schemes from the literature. The results are dis-
`cussed in Section 6.
`
`2. Background and Related Work
`Several AQM designs have attempted to achieve fairness
`among flows or to control high-bandwidth flows. Here we
`give a description of the AQM designs most related to ours.
`
`The Flow Random Early Drop (FRED) algorithm protects
`adaptive flows from unresponsive and greedy flows by im-
`plementing per-flow queueing limits [11]. The algorithm
`maintains state for each flow that currently has packets
`queued in the router. Each flow is allowed to have up to
`minq but never more than maxq packets in the queue. Packets
`of flows that have more than minq but less than maxq packets
`in the queue are probabilistically dropped. When the number
`of flows is large and a high-bandwidth flow consumes only
`a small fraction of the link capacity, FRED maintains a large
`queue. However, a large queue results in high delay. Fur-
`thermore, the algorithm becomes less efficient with a large
`queue since the search time for flow state is proportional to
`queue length.
`
`The Stabilized Random Early Drop (SRED) algorithm con-
`trols the queue length around a queue threshold independent
`of the number of active connections [13]. The algorithm
`keeps the header of recent packet arrivals in a “zombie list.”
`When a packet arrives, it is compared with a randomly cho-
`sen packet from the zombie list. If the two packets are of the
`same flow, a “hit” is declared. Otherwise, the packet header
`
`in the zombie list is probabilistically replaced by the header
`of the new packet. The number of active connections is es-
`timated as the reciprocal of the average number of hits in a
`given interval (a large number of active connections results
`in a low probability of hits and vice versa). The drop prob-
`ability of a packet is a function of the instantaneous queue
`length and the estimated number of active connections. Hits
`are also used to identify high-bandwidth flows. Packets of
`high-bandwidth flows are dropped with a higher probability
`than other packets.
`
`The CHOKe algorithm heuristically detects and discrimi-
`nates against high-bandwidth flows without maintaining per
`flow state [17]. The algorithm is based on the assumption
`that a high-bandwidth flow is likely to occupy a large
`amount of buffer space in the router. When a new packet
`arrives, CHOKe picks a random packet in the queue and
`compares that packet’s header with the new packet’s header.
`If both packets belong to the same flow, both are dropped,
`otherwise, the new packet is enqueued. As with FRED,
`CHOKe is not likely to work well on a high-speed link and
`in the presence of a large aggregate of flows.
`
`The Stochastic Fair BLUE (SFB) algorithm detects and rate-
`limits unresponsive flows by using accounting bins that are
`organized hierarchically [4]. The bins are indexed by hash
`keys computed from a packet’s IP addresses and port num-
`bers and used to keep track of queue occupancy statistics of
`packets belonging to the bin. High-bandwidth flows can be
`easily identified because their bins’ occupancy is always
`high. These high-bandwidth flows are then rate-limited.
`SFB works well when the number of high-bandwidth flows
`is small. When the number of high-bandwidth flows in-
`creases, more bins become occupied and low bandwidth
`flows that hash to these bins are incorrectly identified as
`high-bandwidth and penalized.
`
`The Approximate Fairness through Differential Dropping
`(AFD) algorithm approximates fair bandwidth allocation by
`using a history of recent packet arrivals to estimate a flow’s
`transmission rate [16]. AFD uses a control theoretic algo-
`rithm borrowed from PI [7] to estimate the “fair share” of
`bandwidth that a flow is allowed to send. Packets of a flow
`are marked or dropped with a probability that is a function
`of the flow’s estimated sending rate. The algorithm uses a
`shadow buffer to store recent packet headers and uses these
`to estimate a flow’s rate. The estimated rate of a flow is
`proportional to the number of that flow’s headers in the
`shadow buffer. When a packet arrives, its header is copied
`to the shadow buffer with probability 1/s, where s is the
`sampling interval, and another header is removed randomly
`from the shadow buffer. Note that while sampling reduces
`implementation overhead, it also reduces the accuracy in
`estimating flows’ sending rate. This problem can be severe
`when most flows only send a few packets per RTT.
`
`The RED with Preferential Dropping (RED-PD) algorithm
`provides protection for responsive flows by keeping state
`
`Cloudflare - Exhibit 1040, page 2
`
`
`
`Generated response sizes
`
`100
`
`1000
`
`10000 100000 1e+06 1e+07 1e+08 1e+09
`Response size (bytes)
`Figure 1: CDF of generated response sizes.
`
`1
`
`0.9
`
`0.8
`
`0.7
`
`0.6
`
`0.5
`
`0.4
`
`0.3
`
`0.2
`
`0.1
`
`0
`
`10
`
`1
`
`0.9
`
`0.8
`
`0.7
`
`0.6
`
`0.5
`
`0.4
`
`0.3
`
`0.2
`
`0.1
`
`Cumulative probability
`
`Percentage of bytes transferred
`
`3
`
`0
`100
`
`1000
`
`10000
`
`1e+06
`100000
`Response size (bytes)
`Figure 2: CDF of percentage of total bytes transferred as a func-
`tion of response sizes.
`
`1e+07
`
`1e+08
`
`1e+09
`
`mission rate that is adaptable. By the time they receive a
`congestion notification signal they have either already com-
`pleted or have only one segment remaining to be sent. Fur-
`thermore, since short flows have a small congestion win-
`dow, they have to resort to timeouts when experiencing a
`packet loss. Thus giving these flows a congestion signal
`does not significantly reduce congestion and can only hurt
`the flows’ performance by delaying their completion. In
`contrast, high-bandwidth flows carrying large responses are
`capable of reducing their transmission rate and hence can
`have an impact on congestion. Unlike short flows, high-
`bandwidth flows do not have to resort to timeouts and in-
`stead can use TCP mechanisms for fast retransmission and
`fast recovery to recover from their packet losses. Our ap-
`proach will also police high-bandwidth non-TCP or non-
`compliant TCP flows that do not reduce their transmission
`rate when congestion occurs.
`
`Our observation about traffic characteristics is also con-
`firmed by other studies of Internet traffic. For example,
`Zhang et al. found that small flows (100KB or less) ac-
`counted for at least 84% of all flows, but carried less than
`15% of all bytes [21]. They also found that large flows ac-
`counted for a small fraction of the number of flows, but car-
`ried most of the bytes. Moreover, the flows that are “large”
`
`for just the high-bandwidth flows and preferentially drop-
`ping packets of these flows [12]. RED-PD uses the history
`of recent packet drops to identify and monitor high-
`bandwidth flows. The algorithm is based on the assumption
`that high-bandwidth flows also have a high number of
`packet drops in the drop history. Packets of a high-
`bandwidth flow are dropped with a higher probability than
`other packets. After being identified as high-bandwidth, a
`flow is monitored until it does not experience any packet
`drop in a certain time period. The absence of packet drops
`of a high-bandwidth flow in the drop history indicates that
`the flow has likely reduced its sending rate. In this case, the
`flow is deleted from the list of monitored flows.
`
`The RIO-PS scheme (Red with In and Out with Preferential
`treatment to Short flows) gives preferential treatment to
`short flows at bottleneck links [6]. With preferential treat-
`ment, short flows experience a lower drop-rate than long
`flows and can thus avoid timeouts. In RIO-PS, edge routers
`maintain per-flow state for flows entering the network. The
`first few packets of a flow are marked as “short” or “in.”
`Subsequent packets of that flow are marked as “long” or
`“out.” Core routers use the standard RIO algorithm [2] and
`drop long or out packets with a higher probability than short
`or in packets.
`
`Our DCN algorithm, described next, is an amalgam of ex-
`isting AQM mechanisms. Like AFD it uses a control theo-
`retic algorithm for selecting packets to drop and like RED-
`PD it maintains state for only the suspected high-bandwidth
`flows. However, we show empirically that our particular
`choice and construction of mechanisms results in better ap-
`plication and network performance than is possible with
`existing differential and non-differential AQM designs.
`
`3. The DCN Algorithm
`The design of DCN is based on the observation that on
`many networks, a small number of flows produce a large
`percentage of the traffic. For example, for the web traffic we
`have used to evaluate AQM designs, Figure 1 shows the cu-
`mulative distribution function (CDF) of the empirical distri-
`bution of HTTP response sizes [18]. Figure 2 shows a CDF
`of the percentage of total bytes transferred in an hour-long
`experiment as a function of HTTP response size. Together,
`these figures show that while approximately 90% of web
`responses are 10,000 bytes or less, these responses account
`for less than 25% of the bytes transferred during an experi-
`ment. Moreover, responses greater than 1 megabyte make
`up less than 1% of all responses, but account for 25% of the
`total bytes.
`
`These data suggest that providing early congestion notifica-
`tion to flows carrying responses consisting of a few TCP
`segments (e.g., flows of 2,000-3,000 bytes, approximately
`70% of all flows), would have little effect on congestion.
`This is because these flows comprise only 6-8% of the total
`bytes and because these flows are too short to have a trans-
`
`Cloudflare - Exhibit 1040, page 3
`
`
`
`4
`
`and “fast” (i.e., high-bandwidth, greater than 10KB/sec)
`account for less than 10% of all flows, but carrying more
`than 80% of all the bytes.
`
`The Differential Congestion Notification (DCN) scheme is
`based on identifying these “large” and “fast” flows, and
`providing congestion notification to only them. While the
`idea is simple, the challenge is to design an algorithm with
`minimal state requirements to identify the few long-lived,
`high-bandwidth flows from a large aggregate of flows and
`provide them with a congestion signal when appropriate. An
`important dimension of this problem is that of all the flows
`carrying large responses, we most want to signal flows that
`are also transmitting at a high-rate. These are the flows that
`are consuming the most bandwidth and hence will produce
`the greatest effect when they reduce their rate. Additionally,
`we must ensure that flows receiving negative differential
`treatment are not subject to undue starvation.
`
`Our DCN AQM design has two main components: identifi-
`cation of high-bandwidth flows and a decision procedure for
`determining when early congestion notification is in order.
`
`3.1 Identifying High-bandwidth Flows
`Our approach to identifying high-bandwidth, long-lived
`flows is based on the idea that packets of high-bandwidth
`flows are closely paced (i.e., their interarrival times are
`short) [3]. DCN tracks the number of packets that have been
`recently seen from each flow. If this count exceeds a thresh-
`old, the flow is considered to be a “long-lived and high-
`bandwidth” flow. The flow’s rate is then monitored and its
`packets are eligible for dropping. As long as a flow remains
`classified as high-bandwidth, it remains eligible for drop-
`ping. If a flow reduces its transmission rate, it is removed
`from the list of monitored flows and is no longer eligible for
`dropping.
`
`DCN uses two hash tables for classifying flows: HB (“high
`bandwidth”) and SB (“scoreboard”). The HB table tracks
`flows that are considered high-bandwidth and stores each
`flow’s flow ID (IP addressing 5-tuple) and the count of the
`number of forwarded packets. The SB table tracks a fixed
`number of flows not stored in HB. For these flows SB stores
`their flow ID and “recent” forwarded packet count.
`
`When a packet arrives at a DCN router, the HB table is
`checked to see if this packet belongs to a high-bandwidth
`flow. If the packet’s flow is found in HB, then it is handled
`as described below. If the packet’s flow ID is not in HB,
`then the packet is enqueued and its flow is tested to see if it
`should be entered into HB. The SB table is searched for the
`flow’s ID. If the flow ID is not present, it is added to SB.1 If
`
`1 If a flow ID hashes to an entry in SB for another flow, then the
`new flow overwrites the entry for the previous flow. This ensures
`that all SB operations can be performed in constant time. Thus, SB
`stores the packet counts for all currently active non-high-
`bandwidth flows, modulo hash function collisions on flow IDs.
`
`Arriving packet
`
`yes
`
`Is Flow ID
`in HB?
`
`Mark or drop
`probabilistically
`
`yes
`
`Has Tdec elapsed
`since last decrease?
`
`yes
`
`no
`
`Enqueue
`
`Is Flow ID
`in SB?
`
`no
`
`Decrease
`pktcount by pref
`
`no
`
`yes
`
`Last update less
`than clearing
`interval ago?
`
`no
`
`Overwrite the
`existing flow entry
`
`(Enqueue if
`not dropped)
`
`Increment pktcount
`
`Reset pktcount
`
`yes
`
`pktcount ≥ 4?
`
`Copy flow entry
`to HB
`
`Figure 3: High-level DCN flowchart.
`
`the flow ID is present in SB, the flow’s packet count is in-
`cremented.
`
`A flow is classified as long-lived and high-bandwidth if the
`number of packets from the flow arriving within a “clearing
`interval,” exceeds a threshold. Once the flow’s packet count
`in SB has been incremented, if the count exceeds the thresh-
`old, the flow’s entry in SB is added to HB.2 If no packets
`have been received for the flow within a clearing interval,
`the flow’s packet count is reset to 0.
`
`A high-level flow chart of the DCN algorithm is given in
`Figure 3. All operations on the SB table are performed in
`O(1) time. Since the number of flows identified as high-
`bandwidth is small (e.g., ~2000 for traffic generated during
`experiments reported herein), hash collisions in HB are rare
`for a table size of a few thousand entries. Thus, operations
`on the HB table are also usually executed in O(1) time.
`
`3.2 Early Congestion Notification
`Packets from a high-bandwidth flow are dropped with a
`probability 1 – pref / pktcount, where pktcount is the number
`of packets from that flow that have arrived at the router
`within a period of Tdec, and pref is the current “fair share” of a
`flow on a congested link. When congestion is suspected in
`the router we target high-bandwidth flows for dropping in
`proportion to their deviation from their fair share (pref pack-
`ets within an interval Tdec) [16, 19].
`DCN uses a simple control theoretic algorithm based on the
`well-known proportional integral controller to compute pref.
`The instantaneous length of the queue in the router is peri-
`odically sampled with period Tupdate. A flow’s fair share of
`the queue at the kth sampling period is given by:
`
`pref(kTupdate) = pref((k–1)Tupdate) + a (cid:1) (q(kTupdate) – qref) –
` b (cid:1) (q((k–1)Tupdate) – qref)
`
`2 If a collision occurs in HB when trying to insert the new flow,
`then a hash chain is used to locate a free table entry.
`
`Cloudflare - Exhibit 1040, page 4
`
`
`
`5
`
`Network Monitor
`
`ISP 1
`Router
`
`ISP 2
`Router
`
`Ethernet
`Switches
`
`Ethernet
`Switches
`
`...
`
`100
`Mbps
`
`1
`Gbps
`
`ISP 2
`Browsers/Servers
`
`100/1,000
`Mbps
`
`1
`Gbps
`
`100
`Mbps
`
`...
`
`ISP 1
`Browsers/Servers
`
`Network
`Monitor
`
`Figure 4: Experimental network setup.
`
`version of dummynet [8] to configure out-bound packet de-
`lays on machines on the edge of the network. These delays
`emulate different round-trip times on each TCP connection
`(thus giving per-flow delays). Our version of dummynet
`delays all packets from each flow by the same randomly-
`chosen minimum delay. The minimum delay in milliseconds
`assigned to each flow is sampled from a discrete uniform
`distribution on the range [10, 150] with a mean of 80 ms.
`The minimum and maximum values for this distribution
`were chosen to approximate a typical range of Internet
`round-trip times within the continental U.S. and the uniform
`distribution ensures a large variance in the values selected
`over this range.
`
`A TCP window size of 16K bytes was used on the end sys-
`tems because widely used OS platforms, e.g., most versions
`of Windows, typically have default windows of 16K or less.
`
`4.1 Synthetic Generation of TCP Traffic
`Two synthetically generated TCP workloads will be used to
`evaluate DCN. The first is an HTTP workload derived from
`a large-scale analysis of web traffic [18]. Synthetic HTTP
`traffic is generated according to an application-level de-
`scription of how the HTTP 1.0 and 1.1 protocols are used by
`web browsers and servers today. The specific model of
`synthetic web browsing is as described in [10], however,
`here we note that the model is quite detailed as it, for exam-
`ple, includes the use of persistent HTTP connections and
`distinguishes between web objects that are “top-level” (e.g.,
`HTML files) and objects that are embedded (e.g., image
`files).
`
`The second workload is based on a more general model of
`network traffic derived from measurements of the full mix
`of TCP connections present on Internet links. For the ex-
`periments here we emulate the traffic observed on an Inter-
`net 2 backbone link between Cleveland and Indianapolis
`[20]. Thus in addition to generating synthetic HTTP con-
`nections, this model will also generate synthetic FTP,
`SMTP, NNTP, and peer-to-peer connections. Details on this
`model can be found in [20].
`
`For both workloads, end-to-end response times for TCP data
`exchanges will be our primary measure of performance.
`Response time is defined as the time interval necessary to
`complete the exchange of application-level data units be-
`
`where a and b, a < b , are control coefficients (constants)
`that depend on the average number of flows and the average
`RTT of flows (see [7] for a discussion), q() is the length of
`the queue at a given time, and qref is a target queue length
`value for the controller. Since a < b, pref decreases when the
`queue length is larger than qref (an indication of congestion)
`and hence packets from high-bandwidth flows are dropped
`with a high probability. When congestion abates and the
`queue length drops below qref, pref increases and the prob-
`ability of dropping becomes low. Pan et al. and Misra et al.
`use the same equation in the design of AFD and PI respec-
`tively [16, 7].
`
`The flow ID of a high-bandwidth flow is kept in the HB
`table as long as the flow’s counter pktcount is positive. After
`each interval Tdec, the counter pktcount is decreased by pref.
`If a high-bandwidth flow’s packet count becomes negative,
`the flow is deleted from HB. We set Tdec to 800 ms in our
`experiments because the maximum RTT in our network can
`be up to 400 ms. Furthermore, we want to avoid situations
`where a new high-bandwidth flow is detected at the end of
`an interval Tdec and immediately removed from HB. We
`experimented with different parameter settings for a, b, Tup-
`date, and Tdec and here report only the results for our empiri-
`cally determined best parameter settings.
`
`4. Experimental Methodology
`To evaluate DCN we ran experiments in the testbed network
`described in [10]. The network, illustrated in Figure 4,
`emulates a peering link between two Internet service pro-
`vider (ISP) networks. The testbed consists of approximately
`50 Intel processor based machines running FreeBSD 4.5.
`Machines at the edge of the network execute one of a num-
`ber of synthetic traffic generation programs described be-
`low. These machines have 100 Mbps Ethernet interfaces and
`are attached to switched VLANs with both 100 Mbps and 1
`Gbps ports on 10/100/1000 Ethernet switches. At the core of
`this network are two router machines running the ALTQ
`extensions to FreeBSD [9]. ALTQ extends IP-output queu-
`ing at the network interfaces to include alternative queue-
`management disciplines. We used the ALTQ infrastructure
`to implement DCN, AFD, RIO-PS, and PI.
`
`Each router has sufficient network interfaces to create either
`a point-to-point 100 Mbps or 1 Gbps Ethernet network be-
`tween the two routers. The Gigabit Ethernet network is used
`to conduct calibration experiments to benchmark the traffic
`generators on an unloaded network. To evaluate DCN and
`compare its performance to other AQM schemes, we create
`a bottleneck between the routers by altering the (static)
`routes between the routers so that all traffic flowing in each
`direction uses a separate 100 Mbps Ethernet segment. This
`setup allows us to emulate the full-duplex behavior of a
`typical wide-area network link.
`
`So that we can emulate flows that traverse a longer network
`path than the one in our testbed, we use a locally-modified
`
`Cloudflare - Exhibit 1040, page 5
`
`
`
`6
`
`tween two endpoints. For example, for the HTTP workload,
`response time is defined as the time between when an
`(emulated) web browser makes a request for content to an
`(emulated) web server and the time when the last byte of the
`response is delivered from the server to the browser.
`
`4.2 Experimental Procedures
`To evaluate DCN we performed experiments on the two
`emulated ISP networks in our testbed connected with 100
`Mbps links. For the evaluation we congest these 100 Mbps
`links with varying degrees of traffic. To quantify the traffic
`load in each experiment we define offered load as the net-
`work traffic (in bits/second) resulting from emulating the
`behavior of a fixed-size population of users (e.g., a popula-
`tion of users browsing the web in the case of the HTTP
`workload). More specifically, load is expressed as the long-
`term average throughput on an uncongested link that would
`be generated by that user population. For example, to de-
`scribe the load offered by emulating a population of 20,000
`users evenly distributed on our network testbed, we would
`first emulate this user population with the two ISP networks
`connected with a gigabit/second link and measure the aver-
`age throughput in one direction on this link. The measured
`throughput, approximately 105 Mbps in the case of the
`HTTP workload, is our value of offered load.
`
`Although the TCP traffic we generate is highly bursty (e.g.,
`for the HTTP workload we generate a long-range dependent
`packet arrival process at the routers [10]), the range of loads
`we attempt to generate in our experiments (approximately
`50-120 Mbps) is such that congestion will never be present
`on the gigabit network. Since experiments are ultimately
`performed with the two ISP networks connected at 100
`Mbps, a calibration process (described in [10]) is used to
`determine the population of users required to achieve a par-
`ticular degree of congestion on the 100 Mbps network.
`
`Each experiment was run using offered loads of 90%, 98%,
`or 105% of the capacity of the 100 Mbps link connecting the
`two router machines. It is important to emphasize that terms
`like “105% load” are used as a shorthand notation for “a
`population of users that would generate a long-term average
`load of 105 Mbps on a 1 Gbps link.” As offered loads ap-
`proach saturation of the 100 Mbps link, the actual link utili-
`zation will, in general, be less than the intended offered
`load. This is because as utilization increases, response times
`become longer and emulated users have to wait longer be-
`fore they can generate new requests and hence generate
`fewer requests per unit time.
`
`Each experiment was run for 120 minutes to ensure very
`large samples (over 10,000,000 TCP data exchanges), but
`data were collected only during a 90-minute interval to
`eliminate startup effects at the beginning and termination
`synchronization anomalies at the end.
`
`The key indicator of performance we use in reporting our
`results are the end-to-end response times for each data ex-
`change (e.g., an HTTP request/response). We report these as
`CDFs of response times up to 2 seconds. We also report the
`fraction of IP datagrams dropped at the link queues, the link
`utilization on the bottleneck link, and the number of re-
`quest/response exchanges completed in the experiment.
`
`5. Experimental Results
`We compared the performance of DCN against a large
`number of AQM designs but report here only the results of
`DCN versus only PI, AFD and RIO-PS. PI was selected
`because it was the best performing non-differential AQM
`scheme from a previous study [10]. AFD and RIO-PS were
`chosen because they were the best performing differential
`schemes. We also include results from experiments with
`drop-tail FIFO queue management to illustrate the perform-
`ance of no AQM (i.e., the performance to beat), and results
`from drop-tail experiments on the uncongested gigabit net-
`work to illustrate the best possible performance. (The ex-
`periments on the uncongested network provide the best pos-
`sible response times as there is no queue present on either
`router at any of the load levels considered here.)
`
`For PI, DCN and AFD, an important parameter is the target
`queue length the algorithm attempts to maintain. After ex-
`tensive experimentation, we found that AFD perform best at
`a target queue length of 240 packets at all loads. DCN and
`PI obtain their best performance with a target queue length
`of 24 packets. We used the parameter settings for RIO-PS
`that were recommended by its inventors [6]. In all cases we
`set the maximum queue size to a number of packets that was
`large enough to ensure tail drops did not occur.
`
`5.1 Experimental Results for the HTTP Workload
`Figures 5-8 give the results for DCN, PI, AFD, and RIO-PS.
`They show the cumulative distribution functions (CDFs) for
`response times of HTTP request/response exchanges at of-
`fered loads of 90%, 98%, and 105% respectively. We also
`report other statistics for the experiments in Table 1.
`
`Figure 5 shows that at all loads, the per