An Analysis of Internet Content Delivery Systems

Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy
Department of Computer Science & Engineering, University of Washington
{tzoompy, gummadi, rdunn, gribble, levy}@cs.washington.edu

Abstract

In the span of only a few years, the Internet has experienced an astronomical increase in the use of specialized content delivery systems, such as content delivery networks and peer-to-peer file sharing systems. Therefore, an understanding of content delivery on the Internet now requires a detailed understanding of how these systems are used in practice.

This paper examines content delivery from the point of view of four content delivery systems: HTTP web traffic, the Akamai content delivery network, and Kazaa and Gnutella peer-to-peer file sharing traffic. We collected a trace of all incoming and outgoing network traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. From this trace, we isolated and characterized traffic belonging to each of these four delivery classes. Our results (1) quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks, (2) characterize the behavior of these systems from the perspectives of clients, objects, and servers, and (3) derive implications for caching in these systems.

1 Introduction

Few things compare with the growth of the Internet over the last decade, except perhaps its growth in the last several years. A key challenge for Internet infrastructure has been delivering increasingly complex data to a voracious and growing user population. The need to scale has led to the development of thousand-node clusters, global-scale content delivery networks, and, more recently, self-managing peer-to-peer structures.
These content delivery mechanisms are rapidly changing the nature of Internet content delivery and traffic; therefore, an understanding of the modern Internet requires a detailed understanding of these new mechanisms and the data they serve.

This paper examines content delivery by focusing on four content delivery systems: HTTP web traffic, the Akamai content delivery network, and the Kazaa and Gnutella peer-to-peer file sharing systems. To perform the study, we traced all incoming and outgoing Internet traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. For this paper, we analyze a nine-day trace that saw over 500 million transactions and over 20 terabytes of HTTP data. From this data, we provide a detailed characterization and comparison of content delivery systems, and in particular, the latest peer-to-peer workloads. Our results quantify: (1) the extent to which peer-to-peer traffic has overwhelmed web traffic as a leading consumer of Internet bandwidth, (2) the dramatic differences in the characteristics of objects being transferred as a result, (3) the impact of the two-way nature of peer-to-peer communication, and (4) the ways in which peer-to-peer systems are not scaling, despite their explicitly scalable design. For example, our measurements show that an average peer of the Kazaa peer-to-peer network consumes 90 times more bandwidth than an average web client in our environment. Overall, we present important implications for large organizations, service providers, network infrastructure, and general content delivery.

The paper is organized as follows. Section 2 presents an overview of the content delivery systems examined in this paper, as well as related work. Section 3 describes the measurement methodology we used to collect and process our data. In Section 4 we give a high-level overview of the workload we have traced at the University of Washington.
Section 5 provides a detailed analysis of our trace from the perspective of objects, clients, and servers, focusing in particular on a comparison of peer-to-peer and web traffic. Section 6 evaluates the potential for caching in content delivery networks and peer-to-peer networks, and Section 7 concludes and summarizes our results.

2 Overview of Content Delivery Systems

Three dominant content delivery systems exist today: the client/server-oriented world-wide web, content delivery networks, and peer-to-peer file sharing systems. At a high level, these systems serve the same role of distributing content to users. However, the architectures of these systems differ significantly, and the differences affect their performance, their workloads, and the role caching can play. In this section, we present the architectures of these systems and describe previous studies of their behavior.

USENIX Association, 5th Symposium on Operating Systems Design and Implementation, 315
Genius Sports Ex. 1042, p. 1

2.1 The World-Wide Web (WWW)

The basic architecture of the web is simple: using the HTTP [16] protocol, web clients running on users' machines request objects from web servers. Previous studies have examined many aspects of the web, including web workloads [2, 8, 15, 29], characterizing web objects [3, 11], and even modeling the hyperlink structure of the web [6, 21]. These studies suggest that most web objects are small (5-10 KB), but the distribution of object sizes is heavy-tailed and very large objects exist. Web objects are accessed with a Zipf popularity distribution, as are web servers. The number of web objects is enormous (in the billions) and rapidly growing; most web objects are static, but an increasing number are generated dynamically.

The HTTP protocol includes provisions for consistency management. HTTP headers include caching pragmas that affect whether or not an object may be cached, and if so, for how long. Web caching helps to alleviate load on servers and backbone links, and can also serve to decrease object access latencies. Much research has focused on web proxy caching [4, 5, 7, 11, 12] and, more recently, on coordinating state among multiple, cooperating proxy caches [13, 30, 33]; some of these proposals aim to create global caching structures [27, 34]. The results of these studies generally indicate that cache hit rates of 40-50% are achievable, but that hit rate increases only logarithmically with client population [36] and is constrained by the increasing amount of dynamically generated, and hence uncacheable, content.

2.2 Content Delivery Networks (CDNs)

Content delivery networks are dedicated collections of servers located strategically across the wide-area Internet. Content providers, such as web sites or streaming video sources, contract with commercial CDNs to host and distribute content. CDNs are compelling to content providers because the responsibility for hosting content is offloaded to the CDN infrastructure.
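The Zipf-like object popularity and the logarithmic growth of cache hit rate with client population described in Section 2.1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the object count, per-client request count, Zipf parameter, and the infinite-capacity shared cache are all illustrative choices, not values taken from any study cited here.

```python
import random
from itertools import accumulate

def simulate_hit_rate(num_clients, num_objects=10_000,
                      requests_per_client=100, alpha=1.0, seed=42):
    """Hit rate of a shared, infinite-capacity proxy cache when the
    i-th most popular object is requested with weight 1/i**alpha."""
    rng = random.Random(seed)
    # Cumulative Zipf weights let rng.choices draw by bisection.
    cum = list(accumulate(1.0 / i ** alpha
                          for i in range(1, num_objects + 1)))
    draws = rng.choices(range(num_objects), cum_weights=cum,
                        k=num_clients * requests_per_client)
    seen, hits = set(), 0
    for obj in draws:
        if obj in seen:       # every request after the first is a hit
            hits += 1
        seen.add(obj)
    return hits / len(draws)

# Hit rate climbs only slowly as the client population grows:
for n in (10, 100, 1000):
    print(f"{n:4d} clients: hit rate {simulate_hit_rate(n):.2f}")
```

Running this shows the diminishing returns the proxy-caching studies report: multiplying the client population by 100 raises the hit rate far less than 100-fold.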
Once in a CDN, content is replicated across the wide area, and hence is highly available. Since most CDNs have servers in ISP points of presence, clients can access topologically nearby replicas with low latency. The largest CDNs have thousands of servers dispersed throughout the Internet and are capable of sustaining large workloads and traffic hot-spots.

CDNs are tightly integrated into the existing web architecture, relying either on DNS interposition [19, 32] or on URL rewriting at origin servers to redirect HTTP requests to the nearest CDN replica. As with the web, the unit of transfer in a CDN is an object, and objects are named by URLs. Unlike the web, content providers need not manage web servers, since clients' requests are redirected to replicas hosted by the CDN. In practice, CDNs typically host static content such as images, advertisements, or media clips; content providers manage their own dynamic content, although dynamically generated web pages might contain embedded objects served by the CDN.

Previous research has investigated the use and effectiveness of content delivery networks [14], although the proprietary and closed nature of these systems tends to impede investigation. Two recent studies [22, 23] confirm that CDNs reduce average download response times, but that DNS redirection techniques add noticeable overhead because of DNS latencies. In another study [18], the authors argue that the true benefit of CDNs is that they help clients avoid the worst case of badly performing replicas, rather than routing clients to a truly optimal replica. To the best of our knowledge, no study has yet compared the workloads of CDNs with other content delivery architectures.

2.3 Peer-to-Peer Systems (P2P)

Peer-to-peer file sharing systems have surged in popularity in recent years. In a P2P system, peers collaborate to form a distributed system for the purpose of exchanging content.
Peers that connect to the system typically behave as servers as well as clients: a file that one peer downloads is often made available for upload to other peers. Participation is purely voluntary, and a recent study [31] has shown that most content-serving hosts are run by end-users, suffer from low availability, and have relatively low-capacity network connections (modems, cable modems, or DSL).

Users interact with a P2P system in two ways: they attempt to locate objects of interest by issuing search queries, and once relevant objects have been located, users issue download requests for the content. Unlike the web and CDN systems, the primary mode of usage for P2P systems is a non-interactive, batch-style download of content.

P2P systems differ in how they provide search capabilities to clients [37]. Some systems, such as Napster [28], have large, logically centralized indexes maintained by a single company; peers automatically upload lists of available files to the central index, and queries are answered using this index. Other systems, such as Gnutella [10] and Freenet [9], broadcast search requests over an overlay network connecting the peers. More recent P2P systems, including Kazaa [20], use a hybrid architecture in which some peers are elected as "supernodes" in order to index content available at peers in a nearby neighborhood.

P2P systems also differ in how downloads proceed once an object of interest has been located. Most systems transfer content over a direct connection between the object provider and the peer that issued the download request. A latency-improving optimization in some systems is to download multiple object fragments in parallel from multiple replicas. A recent study [24] has found the peer-to-peer traffic of a small ISP to be highly repetitive, showing great potential for caching.

3 Methodology

We use passive network monitoring to collect traces of traffic flowing between the University of Washington (UW) and the rest of the Internet. UW connects to its ISPs via two border routers; one router handles outbound traffic and the other inbound traffic. These two routers are fully connected to four switches on each of the four campus backbones. Each switch has a monitoring port that is used to send copies of the incoming and outgoing packets to our monitoring host.

Our tracing infrastructure is based on software developed by Wolman and Voelker for previous studies [35, 36]. We added several new components to identify, capture, and analyze Kazaa and Gnutella peer-to-peer traffic and Akamai CDN traffic. Overall, the tracing and analysis software is approximately 26,000 lines of code. Our monitoring host is a dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs and a Gigabit Ethernet SysKonnect SK-9843 network card, running FreeBSD 4.5.

Our software installs a kernel packet filter [26] to deliver TCP packets to a user-level process. This process reconstructs TCP flows, identifies HTTP requests within the flows (properly handling persistent HTTP connections), and extracts HTTP headers and other metadata from the flows. Because Kazaa and Gnutella use HTTP to exchange files, this infrastructure is able to capture P2P downloads as well as WWW and Akamai traffic. We anonymize sensitive information such as IP addresses and URLs, and log all extracted data to disk in a compressed binary representation.

3.1 Distinguishing Traffic Types

Our trace captures two types of traffic: HTTP traffic, which can be further broken down into WWW, Akamai, Kazaa, and Gnutella transfers, and non-HTTP TCP traffic, including Kazaa and Gnutella search traffic. If an HTTP request is directed to port 80, 8080, or 443 (SSL), we classify both the request and the associated response as WWW traffic.
Similarly, we use ports 6346 and 6347 to identify Gnutella HTTP traffic, and port 1214 to identify Kazaa HTTP traffic. A small part of our captured HTTP traffic remains unidentifiable; we believe that most of this traffic can be attributed to less popular peer-to-peer systems (e.g., Napster [28]) and to compromised hosts turned into IRC or web servers on ports other than 80, 8080, or 443. For non-HTTP traffic, we use the same Gnutella and Kazaa ports to identify P2P search traffic.

Some WWW traffic is served by the Akamai content delivery network [1]. Akamai has deployed over 13,000 servers in more than 1,000 networks around the world [25]. We identify Akamai traffic as any HTTP traffic served by an Akamai server. To obtain a list of Akamai servers, we collected a list of 25,318 unique authoritative name servers, and sent a recursive DNS query to each server for a name in an Akamai-managed domain (e.g., a388.g.akamaitech.net). Because Akamai redirects DNS queries to nearby Akamai servers, we were able to collect a list of 3,966 unique Akamai servers in 928 different networks.

For the remainder of this paper, we will use the following definitions when classifying traffic:

- Akamai: HTTP traffic on port 80, 8080, or 443 that is served by an Akamai server.
- WWW: HTTP traffic on port 80, 8080, or 443 that is not served by an Akamai server; thus, for all of the analysis within this paper, "WWW traffic" does not include Akamai traffic.
- Gnutella: HTTP traffic sent to ports 6346 or 6347 (this includes file transfers, but excludes search and control traffic).
- Kazaa: HTTP traffic sent to port 1214 (this includes file transfers, but excludes search and control traffic).
- P2P: the union of Gnutella and Kazaa.
- non-HTTP TCP traffic: any other TCP traffic, including protocols such as NNTP and SMTP, HTTP traffic to ports other than those listed above, traffic from other P2P systems, and control or search traffic on Gnutella and Kazaa.
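The classification rules above can be expressed as a small decision function. This is a simplified sketch, not the paper's actual tracing software: the function signature, and the `akamai_servers` set standing in for the DNS-derived list of Akamai IPs, are illustrative assumptions.

```python
HTTP_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORT = 1214

def classify(dst_port, is_http, server_ip, akamai_servers):
    """Label one flow using the paper's port-based definitions.
    `akamai_servers` is the set of Akamai server IPs found via DNS."""
    if is_http:
        if dst_port in HTTP_PORTS:
            return "Akamai" if server_ip in akamai_servers else "WWW"
        if dst_port in GNUTELLA_PORTS:
            return "Gnutella"
        if dst_port == KAZAA_PORT:
            return "Kazaa"
    # Everything else: NNTP, SMTP, HTTP on other ports,
    # and P2P search/control traffic.
    return "non-HTTP TCP"

akamai = {"203.0.113.7"}  # example IP, illustrative only
print(classify(80, True, "203.0.113.7", akamai))     # Akamai
print(classify(80, True, "198.51.100.9", akamai))    # WWW
print(classify(1214, True, "198.51.100.2", akamai))  # Kazaa
print(classify(6346, False, "198.51.100.2", akamai)) # non-HTTP TCP
```

Note that, per the definitions, a Gnutella or Kazaa flow that is not HTTP (i.e., search or control traffic) falls into the non-HTTP TCP bucket, matching the last example.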
3.2 The Traceability of P2P Traffic

Gnutella is an overlay network over which search requests are flooded. Peers issuing search requests receive a list of other peers that have matching content. From this list, the peer that issued the request initiates a direct connection with one of the matching peers to download content. Because the Gnutella overlay is not structured to be efficient with respect to the physical network topology, most downloads initiated by UW peers connect to external hosts, and are therefore captured in our traces.

Although the details of Kazaa's architecture are proprietary, some elements are known. The Kazaa network is a two-level overlay: some well-connected peers serving as "supernodes" build indexes of the content stored on nearby "regular" peers. To find content, regular peers issue search requests to their supernodes. Supernodes appear to communicate amongst themselves to satisfy queries, returning locations of matching objects to the requesting peer. Kazaa appears to direct peers to nearby objects, although the details of how this is done, or how successful the system is at doing it, are not known.

To download an object, a peer initiates one or more connections to other peers that have replicas of the object. The downloading peer may transfer the entire object in one connection from a single peer, or it may choose to download multiple fragments in parallel from multiple peers.
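A fragmented, parallel download of the kind just described can be sketched as carving an object into contiguous byte ranges, one per replica. This is an illustrative sketch, not Kazaa's actual (proprietary) scheduling logic; the even split across peers is an assumption.

```python
def fragment_ranges(object_size, num_peers):
    """Split [0, object_size) into one contiguous byte range per peer
    (inclusive start, exclusive end), as a parallel downloader might."""
    base, extra = divmod(object_size, num_peers)
    ranges, start = [], 0
    for i in range(num_peers):
        # The first `extra` peers take one additional byte each,
        # so the ranges tile the object exactly.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# A ~700 MB object (the size of Kazaa's top objects) split across 4 peers:
for start, end in fragment_ranges(700 * 2**20, 4):
    print(f"bytes {start}-{end - 1}")
```

From the tracing perspective, each such range would appear as a separate partial request, which is exactly why per-object accounting is complicated in the trace.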

Table 1. HTTP trace summary statistics, broken down by content delivery system. Inbound refers to transfers from Internet servers to UW clients; outbound refers to transfers from UW servers to Internet clients. The trace was collected over a nine-day period, from Tuesday, May 28th through Thursday, June 6th, 2002.

                     WWW          WWW          Akamai      Kazaa        Kazaa        Gnutella    Gnutella
                     inbound      outbound     inbound     inbound      outbound     inbound     outbound
HTTP transactions    329,072,253  73,001,891   33,486,508  11,140,861   19,190,902   1,576,048   1,321,999
unique objects       72,818,997   3,412,647    1,558,852   111,437      166,442      5,274       2,092
clients              39,285       1,231,308    34,801      4,644        611,005      2,151       25,336
servers              403,087      9,821        350         281,026      3,888        20,582      412
bytes transferred    1.51 TB      3.02 TB      64.79 GB    1.78 TB      13.57 TB     28.76 GB    60.38 GB
median object size   1,976 B      4,646 B      2,001 B     3.75 MB      3.67 MB      4.26 MB     4.08 MB
mean object size     24,687 B     82,385 B     12,936 B    27.78 MB     19.07 MB     19.16 MB    9.78 MB

(Akamai outbound entries are N/A and omitted.)

Figure 1. TCP bandwidth: total TCP bandwidth consumed by HTTP transfers for different content delivery systems. Each band is cumulative; this means that at noon on the first Wednesday, Akamai consumed approximately 10 Mbps, WWW consumed approximately 100 Mbps, P2P consumed approximately 200 Mbps, and non-HTTP TCP consumed approximately 300 Mbps, for a total of 610 Mbps.

The ability for a Kazaa peer to download an object in fragments complicates our trace. Download requests from external peers seen in our trace are often for fragments rather than entire objects.

4 High-Level Data Characteristics

This section presents a high-level characterization of our trace data. Table 1 shows summary statistics of object transfers.
This table separates statistics from the four content delivery systems, and further separates inbound data (data requested by UW clients from outside servers) from outbound data (data requested by external clients from UW servers). Despite its large client population, the University is a net provider rather than consumer of HTTP data, exporting 16.65 TB but importing only 3.44 TB. The peer-to-peer systems, and Kazaa in particular, account for a large percentage of the bytes exported and the total bytes transferred, despite their much smaller internal and external client populations. Much of this is attributable to a large difference in average object sizes between WWW and P2P systems.

The number of clients and servers in Table 1 shows the extent of participation in these systems. For the web, 39,285 UW clients accessed 403,437 Internet web servers, while for Kazaa, 4,644 UW clients accessed 281,026 external Internet servers. For Akamai, 34,801 UW clients downloaded Akamai-hosted content provided by 350 different Akamai servers. In the reverse direction, 1,231,308 Internet clients accessed UW web content, while 611,005 clients accessed UW-hosted Kazaa content.

Figure 1 shows the total TCP bandwidth consumed in both directions over the trace period. The shaded areas show HTTP traffic, broken down by content delivery system; Kazaa and Gnutella traffic are grouped together under the label "P2P." All systems show a typical diurnal cycle. The smallest bandwidth consumer is Akamai, which currently constitutes only 0.2% of observed TCP traffic. Gnutella consumes 6.04%, and WWW traffic is the next largest, consuming 14.3% of TCP traffic. Kazaa is currently the largest contributor, consuming 36.9% of TCP bytes. These four content delivery systems account for 57% of total TCP traffic, leaving 43% for other TCP-based network protocols (streaming media, news, mail, and so on). TCP traffic represents over 97% of all network traffic at UW.
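As a rough illustration of per-client load, the inbound figures from Table 1 can be turned into average per-client download rates over the nine-day trace. This is plain arithmetic on the table's values and counts inbound bytes only, so it does not reproduce the paper's 90x Kazaa-versus-web figure, which accounts for bandwidth differently; the binary interpretation of "TB" is also an assumption.

```python
def avg_client_rate_kbps(total_bytes, num_clients, days):
    """Average per-client download rate over the trace, in kbit/s."""
    return total_bytes * 8 / num_clients / (days * 24 * 3600) / 1000

TB = 2**40  # assuming binary terabytes; the paper does not specify
www_rate = avg_client_rate_kbps(1.51 * TB, 39_285, 9)    # Table 1, WWW inbound
kazaa_rate = avg_client_rate_kbps(1.78 * TB, 4_644, 9)   # Table 1, Kazaa inbound
print(f"WWW:   {www_rate:.2f} kbit/s per UW client")
print(f"Kazaa: {kazaa_rate:.2f} kbit/s per UW client "
      f"(~{kazaa_rate / www_rate:.0f}x the WWW rate)")
```

Even this inbound-only view shows an order-of-magnitude gap between the average Kazaa peer and the average web client.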
This closely matches published data on Internet2 usage [17].

Figure 2. UW client and server TCP bandwidth: bandwidth over time (a) accountable to web and P2P downloads from UW clients, and (b) accountable to web and P2P uploads from UW servers.

Figures 2a and 2b show inbound and outbound data bandwidths, respectively. From Figure 2a we see that while both WWW and Kazaa have diurnal cycles, the cycles are offset in time, with WWW peaking in the middle of the day and Kazaa peaking late at night. For UW-initiated requests, WWW and Kazaa peak bandwidths have the same order of magnitude; however, for requests from external clients to UW servers, the peak Kazaa bandwidth dominates WWW by a factor of three. Note that the Y-axis scales of the graphs are different; WWW peak bandwidth is approximately the same in both directions, while external Kazaa clients consume 7.6 times more bandwidth than UW Kazaa clients.

Figures 3a and 3b show the top 10 content types requested by UW clients, ordered by bytes downloaded and number of downloads. While GIF and JPEG images account for 42% of requests, they account for only 16.3% of the bytes transferred. On the other hand, AVI and MPG videos, which account for 29.3% of the bytes transferred, constitute only 0.41% of requests. HTML is significant, accounting for 14.6% of bytes and 17.8% of requests. The 9.9% of bytes labelled "HASHED" in Figure 3a are Kazaa transfers that cannot be identified; of the non-hashed Kazaa traffic that can be identified, AVI and MPG account for 79% of the bytes, while 13.6% of the bytes are MP3.

It is interesting to compare these figures with corresponding measurements from our 1999 study of the same population [35].
Looking at bytes transferred as a percentage of total HTTP traffic, HTML traffic has decreased 43% and GIF/JPG has decreased 59%. At the same time, AVI/MPG (and QuickTime) traffic has increased by nearly 400%, while MP3 traffic has increased by nearly 300%. (These percentages include an estimate of the appropriate portion of the hashed bytes contributing to all content types.)

Figure 3. Content types downloaded by UW clients: a histogram of the top 10 content types downloaded by UW clients, across all four systems, ordered by (a) size and (b) number of downloads.

In summary, this high-level characterization reveals substantial changes in content delivery system usage in the Internet, as seen from the vantage point of UW. First, the balance of HTTP traffic has changed dramatically over the last several years, with P2P traffic overtaking WWW traffic as the largest contributor to HTTP bytes transferred. Second, although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data. Finally, the mixture of object types downloaded by UW clients has changed, with video and audio accounting for a substantially larger fraction of traffic than three years ago, despite the small number of requests involving those data types.
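Breakdowns like those in Figure 3 come from aggregating transaction records by content type twice: once weighted by bytes and once by request count. A minimal sketch follows; the record format and the toy log are made-up illustrations, not trace data.

```python
from collections import Counter

def type_breakdown(transactions):
    """Per-content-type share of bytes and of requests, given
    (content_type, bytes) records: the two orderings of Figure 3."""
    bytes_by_type, count_by_type = Counter(), Counter()
    for ctype, nbytes in transactions:
        bytes_by_type[ctype] += nbytes
        count_by_type[ctype] += 1
    total_bytes = sum(bytes_by_type.values())
    total_count = len(transactions)
    return ({t: b / total_bytes for t, b in bytes_by_type.items()},
            {t: c / total_count for t, c in count_by_type.items()})

# Toy log: many small images, one large video (numbers are made up).
log = [("GIF", 5_000)] * 8 + [("AVI", 400_000_000)]
by_bytes, by_count = type_breakdown(log)
print(f"GIF: {by_count['GIF']:.0%} of requests, "
      f"{by_bytes['GIF']:.3%} of bytes")
```

Even this toy example reproduces the qualitative pattern in the trace: image types dominate the request count while video dominates the byte count.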
5 Detailed Content Delivery Characteristics

The changes in Internet workload that we have observed raise several questions, including: (1) what are the properties of the new objects being delivered, (2) how are clients using the new content delivery mechanisms, and (3) how do servers for new delivery services differ from those for the web? We attempt to answer these questions in the subsections below.

5.1 Objects

Data in Section 4 suggests that there is a substantial difference in typical object size between P2P and WWW traffic. Figure 4 illustrates this in dramatic detail. Not surprisingly, Akamai and WWW object sizes track each other fairly closely. The median WWW object is approximately 2 KB, which matches previous measurement studies [15]. The Kazaa and Gnutella curves are strikingly different from the WWW; the median object size for these P2P systems is

approximately 4 MB, a thousand-fold increase over the average web document size! Worse, we see that 5% of Kazaa objects are over 100 MB. This difference has the potential for enormous impact on Internet performance as these systems grow.

Figure 4. Object size distributions: cumulative distributions (CDFs) of object sizes.

Figure 5. Top bandwidth-consuming objects: a CDF of bytes fetched by UW clients for the top 1,000 bandwidth-consuming objects.

Figure 5 shows a cumulative distribution of bytes fetched by UW clients for the 1,000 highest bandwidth-consuming objects in each of the four CDNs. The Akamai curve rises steeply, with the top 34 objects accounting for 20% of the Akamai bytes transferred; Akamai traffic is clearly skewed to its most popular documents. For Kazaa, we see that a relatively small number of objects account for a large portion of the transferred bytes as well. The top 1,000 Kazaa objects (out of 111K objects accessed) are responsible for 50% of the bytes transferred. For the web, however, the curve is much flatter: the top 1,000 objects only account for 16% of bytes transferred.

To understand this better, we examined the 10 highest bandwidth-consuming objects for WWW, Akamai, and Kazaa, which are responsible for 1.9%, 25%, and 4.9% of the traffic for each system, respectively. The details are shown in Table 2. For WWW, we see that the top 10 objects are a mix of extremely popular small objects (e.g., objects 1, 2, and 4), and relatively unpopular large objects (e.g., object 3).
The worst offender, object 1, is a small object accessed many times. For Akamai, although 8 out of the top 10 objects are large and unpopular, 2 out of the top 3 worst offenders are small and popular. Kazaa's inbound traffic, on the other hand, is completely consistent; all of its worst offenders are extremely large objects (on the order of 700 MB) that are accessed only ten to twenty times.

Comparing Kazaa inbound and outbound traffic in Table 2 shows several differences. The objects that contribute most to bandwidth consumption in either direction are similarly sized, but UW tends to export these large objects more than it imports them. A small number of UW clients access large objects from a small number of external servers, but nearly thirty times as many external clients access similarly sized objects from a handful of UW servers, leading to approximately ten times as much bandwidth consumption. This suggests that a reverse cache that absorbs outbound traffic might benefit the University even more than a forward cache that absorbs inbound traffic.

Figure 6. Downloaded bytes by object type: the number of bytes downloaded from each system, broken into content type.

Figure 3 in the previous section showed a breakdown of all HTTP traffic by content type for UW-client-initiated traffic. Figure 6 shows a similar breakdown, by bytes, but for each individual CDN. Not surprisingly, the largest component of WWW traffic is text, followed closely by images, while Akamai is dominated by images (42% of bytes are GIF and JPEG). In contrast, Kazaa is completely dominated by video (80%), followed by 14% audio; Gnutella is more evenly split, with 58% video and 36% audio.
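The "top-N objects" curves of Figure 5 reduce to ranking objects by bytes consumed and taking a prefix sum. A minimal sketch over a made-up workload (the numbers are illustrative only, not trace values):

```python
def top_objects_share(bytes_per_object, top_n):
    """Fraction of all transferred bytes attributable to the top_n
    bandwidth-consuming objects (the quantity plotted in Figure 5)."""
    ranked = sorted(bytes_per_object, reverse=True)
    return sum(ranked[:top_n]) / sum(ranked)

# Toy workload: a few huge objects, many tiny ones (values are made up).
workload = [10_000] * 5 + [10] * 995
share = top_objects_share(workload, 5)
print(f"top 5 of 1,000 objects: {share:.0%} of bytes")
```

The steeper this prefix-sum curve rises, the more the workload is skewed toward a few objects, which is what makes Akamai and Kazaa traffic look so different from the flat web curve.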
5.2 Clients

The previous subsection considered the characteristics of what is transferred (the object view); here we consider who is responsible (the client view). Because WWW and Akamai are indistinguishable from a UW client's perspective, this section presents these two workloads combined.

Figure 7a shows a cumulative distribution of bytes downloaded by the top 1,000 bandwidth-consuming UW clients for each CDN. It is not surprising that the WWW+Akamai curve is lower; the graph shows only a small fraction of the 39K WWW+Akamai clients, but nearly a quarter of the 4,644 Kazaa clients. Nevertheless, in both cases, a small number of clients account for a large portion of the traffic. In the case of the WWW, the top 200 clients (0.5% of the

Table 2. Top 10 bandwidth-consuming objects: the size, bytes consumed, and number of requests (including partial and unsuccessful ones) for the top 10 bandwidth-consuming objects in each system. For Kazaa, instead of requests, we show the number of clients and servers that participated in (possibly partial) transfers of the object.

      WWW (inbound)                  Akamai                       Kazaa (inbound)                  Kazaa (outbound)
  #   size (MB)  GB     requests    size (MB)  GB     requests   size (MB)  GB    clients servers  size (MB)  GB      clients servers
  1   0.009      12.29  1,412,104   22.37      4.72   218        694.39     8.14  20      164      696.92     119.01  397     1
  2   0.002      6.88   3,007,720   0.07       2.37   45,399     702.17     6.44  14      91       699.28     110.56  1,000   4
  3   333        6.83   21          0.11       1.64   68,202     690.34     6.13  22      83       699.09     78.76   390     10
  4   0.005      6.82   1,412,105   9.16       1.59   2,222      775.66     5.67  16      105      700.86     73.30   558     2
  5   2.23       3.17   1,457       13.78      1.31   107        698.13     4.70  14      74       634.25     64.99   540     1
  6   0.02       2.69   126,625     82.03      1.14   23         712.97     4.69  17      120      690.34     64.97   533     10
  7   0.02       2.69   122,453     21.05      1.01   50         715.61     4.49  13      71       690.34     54.90   447     16
  8   0.03       1.92   56,842      16.75      1.00   324        579.13     4.30  14      158      699.75     49.47   171     2
  9   0.01       1.91   143,780     15.84      0.95   68         617.99     4.12  12      94       696.42     43.35   384     14
 10   0.04       1.86   47,676      15.12      0.80   57         167.18     3.83  39      247      662.69     42.28   151     2

Figure 7.
Top UW bandwidth-consuming clients: a CDF of bytes downloaded by the top 1,000 bandwidth-consuming UW clients (a) as a fraction of each system, and (b) as a fraction of the total HTTP traffic.

population) account for 13% of WWW traffic; for Kazaa, the top 200 clients (4% of the population) account for 50% of Kazaa traffic. The next 200 Kazaa clients account for another 16% of its traffic. Clearly, a very small number of Kazaa clients have a huge overall bandwidth impact.

To see the impact more globally, the curves in Figure 7b show the fraction of the total HTTP bytes downloaded by the most bandwidth-consuming clients for each CDN (the curves are cumulative). This allows us to quantify the impact of a particular CDN's clients on total HTTP traffic. Gnutella clients have almost no impact as consumers of HTTP bandwidth. In contrast, the Kazaa users are the worst offenders: the top 200 Kazaa clients are responsible for 20% of the total HTTP bytes downloaded. In comparison, the top 200 WWW+Akamai clients are responsible for only 7% of total HTTP bytes. Further out, the top 400 Kazaa and WWW clients
