P2P, the Gorilla in the Cable
Alexandre Gerber, Joseph Houle, Han Nguyen, Matthew Roughan, Subhabrata Sen
AT&T Labs - Research

Abstract

There is considerable interest in peer-to-peer (P2P) traffic because of
its remarkable increase over the last few years. By analyzing flow
measurements at the regional aggregation points of several cable
operators, we are able to study its properties. It has become a large
part of broadband traffic and its characteristics are different from
older applications, such as the Web. It is a stable, balanced traffic:
the peak to valley ratio during a day is around 2 and the
inbound/outbound traffic balance is close to one. Although P2P protocols
are based on a distributed architecture, they don’t show strong signs of
geographical locality. A cable subscriber is not much more likely to
download a file from a nearby region than from a distant one.
`
It is clear that most of the traffic is generated by heavy hitters who
abuse P2P (and other) applications, whereas most subscribers only use
their broadband connections to browse the web, exchange e-mails or chat.
However, it is not easy to directly block or limit P2P traffic, because
these applications adapt themselves to their environment: the users
develop ways of eluding the traffic blocks. The traffic that could once
be identified with five port numbers is now spread over thousands of TCP
ports, pushing port-based identification to its limits. More complex
methods to identify P2P traffic are not a long-term solution; the cable
industry should opt for a “pay for what you use” model like the other
utilities.
`
`
`
`
`
`
`
`
`
`INTRODUCTION
`
`
`File Sharing Applications
`
KaZaA, Gnutella and DirectConnect are all decentralized, self-organizing
file sharing systems with data and index information (metadata for
searching) distributed over a set of end-peers or peers, each of which
can be both a client and a server of content. Peers can join and leave
frequently, and organize in a distributed fashion into an
application-level overlay via point-to-point application-level
connections between a peer and a set of other peers (its neighbors). By
default, all the communications occur over well known ports.
`
The process of obtaining a file can be broadly divided into two phases:
a search followed by an object retrieval. First, a peer uses the P2P
protocol to search for the existence of a certain file in the P2P
system, receives one or more responses, and if the search is successful,
identifies one or more target peers from which to download that file.
The search queries as well as the responses are transmitted via the
overlay connections using protocol-specific application-level routing.
The details of how the signaling is propagated through the overlay are
protocol-dependent. In earlier P2P protocols, exemplified by Gnutella
version 0.4, a peer initiates a query by flooding it to all its
neighbors in the overlay. The neighboring peers in turn flood it to
their neighbors, using a scoping mechanism to control the query flood.
In contrast, for both KaZaA and DirectConnect as well as newer versions
of Gnutella, queries are forwarded to
`
`
`EX1010
`Palo Alto Networks v. Sable Networks
`IPR2020-01712
`
`

`

and handled by only a subset of special peers (called SuperNodes in
KaZaA, Hubs in DirectConnect, and UltraPeers in Gnutella). A peer
transmits an index of its content to the “special peer” to which it is
connected. The special peer then uses the corresponding P2P protocol to
forward the query to other such peers in the system.
`
Once search results are in, the requesting peer directly contacts the
target peer, typically using HTTP (the target peer runs an HTTP server
listening by default on a known, protocol-specific port), to get the
requested resource. Some newer systems, such as KaZaA and Gnutella, use
“file swarming”: a file download is executed by retrieving different
chunks from multiple peers.
`
Although the earlier P2P systems mostly used their default network ports
for communication, there is substantial evidence that a significant
share of P2P traffic is nowadays transmitted over a large number of
non-standard ports. This seems to be primarily motivated by the desire
to circumvent firewall restrictions as well as rate-limiting actions by
ISPs targeted at such applications; we discuss this further later in the
paper.
`
Another recent development has been the emergence of tools that allow an
end-user to explicitly select the SuperNode it connects to. This appears
to be an attempt to improve the quality of the best-effort search
process in the P2P system, for files that may exhibit locality in
storage. For instance, connecting to a SuperNode in Brazil may increase
the chances of locating Samba-related content.
`
`Data Collection
`
We have access to “flow-level” data at the regional aggregation points
for several broadband ISPs. Flow-level data is considerably more
detailed than data sets such as SNMP, and at least this level of detail
is needed to perform application classification. The regional
aggregation points provide the MSOs (multiple system operators, i.e. the
cable operators) with access to the backbone for traffic between regions
and to the rest of the Internet, where a region typically ranges from an
extended metropolitan area to a state.
`
By flow, we mean a sequence of packets exchanged by two applications.
More precisely, we define a flow to be a series of uni-directional
packets with the same IP protocol, source and destination address, and
source and destination ports (in the case of TCP and UDP traffic). The
flow measurements used here are called Cisco Netflow; they are
implemented in many of Cisco’s routers. The data collected about a flow
(apart from the information above) are the duration, the number of
packets and bytes transmitted, and which header flags (SYN, ACK, …) were
used in the flow. Measured flows are also constrained in time (Cisco
Netflow collection sends flows from the router at 15 minute intervals),
so there is a need to reconstruct the actual traffic of a single
“connection” from its exported pieces. After reconstruction there will
be one flow per connection, a potentially enormous volume of
information.
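
The flow definition and the reconstruction step can be sketched as follows. This is a minimal illustration, not the collection system used for the measurements: the class names, the field names and the `max_gap` merging rule are assumptions introduced here, chosen only to show how records that share the same five fields could be stitched back into one flow per connection.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class FlowKey:
    """The five fields that identify a uni-directional flow."""
    protocol: int
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


@dataclass
class FlowRecord:
    """One exported record (hypothetical layout, times in seconds)."""
    key: FlowKey
    start: float
    end: float
    packets: int
    octets: int


def reconstruct(records, max_gap=60.0):
    """Merge records with the same key that are at most max_gap apart.

    max_gap is an illustrative assumption, not a value from the paper.
    """
    merged = []
    ordered = sorted(records, key=lambda r: (astuple(r.key), r.start))
    for rec in ordered:
        last = merged[-1] if merged else None
        if last and last.key == rec.key and rec.start - last.end <= max_gap:
            # Same connection, split by the periodic export: accumulate.
            last.end = max(last.end, rec.end)
            last.packets += rec.packets
            last.octets += rec.octets
        else:
            merged.append(
                FlowRecord(rec.key, rec.start, rec.end, rec.packets, rec.octets)
            )
    return merged
```

Two records for the same key that straddle a 15 minute export boundary would come out of `reconstruct` as a single record whose packet and byte counts are the sums of the pieces.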
`
In order to minimize any performance impact on the routers collecting
the flow measurements, the measurements are based on sampled packets
collected on the routers, which then export the flows to aggregators. To
reduce the huge data volume, the aggregator further samples the flows
using the smart sampling algorithm [SAMP], which is better suited to
heavy-tailed distributions such as those typically found in Internet
flows. In addition, there is also uncontrolled sampling due to
measurement packet losses. These three types of sampling can be
estimated and corrected, and they don’t affect our results, which are
based on the weekly or monthly average traffic generated by hundreds of
thousands of cable subscribers.
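
To illustrate why size-dependent flow sampling can be corrected, the sketch below follows the threshold form of smart sampling as we understand it from the literature cited as [SAMP]: a flow of size x is kept with probability min(1, x/z), and a kept flow is reported with size max(x, z), which makes the total an unbiased estimate. The function name, the threshold value and the synthetic flow sizes are assumptions for illustration only; the parameters actually used at the aggregators are not given in the text.

```python
import random


def smart_sample(flow_sizes, z, rng):
    """Return size estimates for the flows kept by threshold sampling.

    flow_sizes: iterable of flow sizes in bytes
    z:          sampling threshold in bytes (illustrative value)
    rng:        a random.Random instance
    """
    estimates = []
    for x in flow_sizes:
        keep_probability = min(1.0, x / z)
        if rng.random() < keep_probability:
            # x / keep_probability equals z for small flows, x otherwise.
            estimates.append(max(x, z))
    return estimates


# Large flows are always kept; many small flows are thinned out but
# each survivor is scaled up, so the estimated total stays close.
rng = random.Random(2003)
sizes = [1_000] * 100_000 + [5_000_000] * 10
kept = smart_sample(sizes, z=10_000, rng=rng)
estimated_total = sum(kept)
true_total = sum(sizes)
```

The design point is that heavy flows, which carry most of the bytes in a heavy-tailed distribution, are never dropped, while the record count is dominated by small flows and shrinks roughly in proportion to x/z.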
`
More precisely, we used data ranging from May 2002 to February 2003 from
five different MSOs. When we were not collecting all the traffic coming
from a region, we used SNMP data to extrapolate the actual traffic.
However, when we analysed the behaviour per broadband user, we selected
only regional aggregation points for which we were collecting all the
flow-level measurements.
`
`Identifying Applications
`
There are a number of ways one could go about identifying individual
applications within IP traffic. However, as noted, Netflow only keeps
data on some aspects of flows. The most useful of these for application
breakdowns are the source and destination port numbers, and the IP
protocol number. The protocol numbers used are well documented [IANA1],
with TCP being protocol 6 and UDP being 17. TCP and UDP traffic also
define (16 bit) source and destination port numbers intended (in part)
for use by different applications. The port numbers are divided into
three ranges: the Well Known Ports (0-1023), the Registered Ports
(1024-49,151), and the Dynamic and/or Private Ports (49,152-65,535).

A typical TCP connection starts with a SYN/ACK handshake from a client
to a server. The client addresses its initial SYN packet to the server
port for a particular application, and uses a dynamic port as the source
port for the SYN. The server listens on its port for connections. UDP
uses ports similarly, though without connections. All future packets in
the TCP/UDP flow use the same pair of ports at the client and server
ends. Therefore, in principle the server port number can be used to
identify the higher layer application using TCP or UDP, by simply
identifying which port is the server port (the one from the well-known
or registered port range) and mapping this to an application using the
IANA list of registered ports [IANA2].
`
However, there are many barriers to determining applications from port
numbers:
1. Many implementations of TCP seem to use registered port ranges as
dynamic ports.

2. Privileged applications may use dynamic port numbers inside the
well-known port range (for instance, some old versions of bind use
source and destination port 53).

3. Well known and registered ports are not defined for all applications
(and this is typical of P2P applications).

4. An application may use ports other than its well-known port because
these can only be used with special privileges, e.g. WWW servers often
run on ports other than port 80, for instance ports 8080 and 8888.

5. An application may run on different ports to avoid blocking by
firewalls (e.g. non-WWW servers are sometimes run on port 80 to avoid
firewalls, and P2P applications are often run on alternate ports for the
same reason).

6. There are some ambiguities in port registrations, e.g. port 888,
which is used for both CDDBP (CD Database Protocol) and accessbuilder.

7. In some cases server ports are dynamically allocated as needed (for
instance, one might have a control connection on which a data port is
negotiated).

8. Trojans and other security attacks (e.g. DoS) will break the port
mapping.

Note that the use of firewalls to block unauthorized and/or unknown
applications from using a network has spawned workarounds that have made
the mapping from port number to application ambiguous.
`
` Despite this a great deal can be said about
`the mapping of port to application, though
`obviously there will still be some ambiguity,
`and chance for errors. Note that both ports
`must be considered as possible candidates for
`the server port, unless other data is available
`to rule out one port.
`
` The algorithm that we have adopted here
`chooses the server port by (1) looking for a
`well known port, (2) a registered port, or (3)
`an unregistered port which is known (from
`reverse engineering of protocols) to be used
`by a particular (unregistered) application. If
`both source and destination port could be the
`server, then we choose the most likely one
`through ranking applications by how prevalent
`they are in detailed (packet level) traffic
`studies – for instance, WWW is considered a
`high ranking application, as are email, and
`P2P applications.
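
The server-port choice described above can be sketched as a table lookup followed by a tie-break. This is a simplified illustration under stated assumptions: `KNOWN_PORTS` is a tiny hypothetical table that stands in for the combined well known, registered and reverse-engineered port lists, and `RANKING` is a hypothetical prevalence order (the text only says that WWW, email and P2P rank high). The three lookup tiers are collapsed into one table for brevity.

```python
# Hypothetical port-to-application table (a handful of entries only).
KNOWN_PORTS = {
    21: "FTP",
    25: "MAIL",
    80: "WEB",
    119: "NEWS",
    1214: "P2P",   # KaZaA default port, as cited in this paper
    6346: "P2P",   # commonly cited Gnutella default port
}

# Hypothetical prevalence ranking, most prevalent application first.
RANKING = ["WEB", "MAIL", "P2P", "NEWS", "FTP"]


def classify(src_port, dst_port):
    """Return (server_port, application), or None if neither port is known."""
    candidates = [p for p in (src_port, dst_port) if p in KNOWN_PORTS]
    if not candidates:
        return None
    # If both ports could be the server, prefer the higher-ranked application.
    server = min(candidates, key=lambda p: RANKING.index(KNOWN_PORTS[p]))
    return server, KNOWN_PORTS[server]
```

With this table, `classify(40000, 80)` picks port 80 as the server, and a flow between ports 1214 and 80 is attributed to the Web because Web is ranked above P2P in the assumed ordering; a flow between two unknown ports returns `None` and falls into the unknown category.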
`
` The result is a mapping from flows to
`applications, that while not perfect, has been
`shown to be reasonably effective. The biggest
`problem is that there are still a substantial
`number of flows which cannot be mapped to
`an application. We further classify these
`unknown flows by the size of the flows: the
`category of most interest here is “TCP-big”,
`which consists of unknown flows that transmit
`more than 100kB in less than 30 minutes.
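
The “TCP-big” rule lends itself to a one-line predicate. The sketch below is an assumption-laden illustration: it reads “100kB” as 100,000 bytes, represents an unmapped flow by `application is None`, and uses hypothetical parameter names.

```python
TCP = 6                    # IANA protocol number for TCP
BIG_BYTES = 100 * 1000     # "100kB", read here as decimal kilobytes (assumption)
BIG_SECONDS = 30 * 60      # 30 minutes


def is_tcp_big(protocol, application, octets, duration_s):
    """True for an unknown TCP flow with > 100 kB sent in < 30 minutes."""
    return (
        protocol == TCP
        and application is None
        and octets > BIG_BYTES
        and duration_s < BIG_SECONDS
    )
```

A flow that was mapped to any application, or that is UDP, is never counted as TCP-big, which keeps the category separate from the port-identified P2P traffic.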
`
We shall argue in this paper that the TCP-big traffic is primarily P2P
traffic that is using unregistered ports unknown to us. P2P applications
already use unregistered ports, and the structure of P2P protocols (with
separate control and data traffic) allows data traffic to be assigned to
arbitrary ports. In the past the major applications have typically used
default ports (for instance 1214 for KaZaA), but in the recent past many
efforts have been made to constrain P2P traffic through rate limiting
single ports or by blocking some ports at firewalls, with the result
that P2P users commonly use workarounds. Wherever we refer to P2P
traffic we are using the traffic on the ports known to be directly
associated with P2P applications: we shall keep this separate from
TCP-big except where explicitly noted. Also note that some P2P traffic
may be misclassified into other application classes (for instance WWW),
and so our estimates of the total volumes of P2P traffic are
conservative.
`
We should note that we are not collecting any information about URLs or
individual subscribers’ usage: IP addresses measured are not related to
individual subscribers, and we only view the bulk properties of the
traffic, such as its distributions.
`
`
`APPLICATION COMPOSITION
`
`
`Overview
`
Table 1 shows the application traffic composition for 2 MSOs in May 2002
and January 2003. For each MSO, we examine both the traffic coming from
outside the MSO to some IP address within the MSO (referred to as IN)
and the traffic sourced within the MSO and destined for outside the MSO
(OUT). For each time period and MSO, we display the per-application
traffic volume in each direction as a percentage of the total traffic in
that direction. For a given application we also show the traffic
normalized by dividing by its OUT traffic volume for May 2002 (so that
column reads 1), in order to show the IN/OUT ratio and the growth
between the two periods.
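
The normalization can be checked against the table's own figures. The sketch below takes the MSO X TCP-BIG percentages and the MSO X “All” normalized totals from Table 1 and recomputes the normalized TCP-BIG row, using the May 2002 OUT volume as the base, which is the convention the table's figures follow. The function and variable names are illustrative; because the published inputs are rounded, the results only approximate the published values.

```python
def normalized_consumption(shares, totals):
    """Scale per-column volumes so that the first column equals 1.

    shares: application share of traffic per column (percent)
    totals: total traffic per column, already normalized to column 0
    """
    base = shares[0] * totals[0]
    return [round(s * t / base, 2) for s, t in zip(shares, totals)]


columns = ["May 2002 OUT", "May 2002 IN", "Jan 2003 OUT", "Jan 2003 IN"]
tcp_big_share = [8.9, 10.5, 47.5, 32.5]   # MSO X, TCP-BIG row (percent)
all_traffic = [1.0, 1.65, 1.97, 3.2]      # MSO X, "All" row (normalized)

norm = normalized_consumption(tcp_big_share, all_traffic)
out_growth = norm[2] / norm[0]            # Jan OUT relative to May OUT
in_growth = norm[3] / norm[1]             # Jan IN relative to May IN
in_out_ratio_may = norm[1] / norm[0]
in_out_ratio_jan = norm[3] / norm[2]
```

Up to rounding, the recomputed row comes out close to the published 1, 1.94, 10.5 and 11.68, and the derived growth factors and IN/OUT ratios are close to the 10.5, 6.02, 1.94:1 and 1.12:1 quoted in the discussion of the table.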
`
We note that in either direction, for both MSOs, the P2P traffic forms a
much smaller percentage of the overall traffic in January 2003 than in
May 2002. TCP-big registered dramatic increases in traffic contribution
in both directions (10.5 times for Outgoing and 6.02 times for Incoming
for MSO X) over the same period. The normalized figures show that the
P2P incoming and outgoing traffic are very similar for either of the 2
months considered. For example, for MSO X, the ratio between incoming
and outgoing TCP-big traffic volumes changes from 1.94:1 in May 2002 to
a more balanced 1.12:1 in January 2003.

MSO X
               Application Mix (percentage)       Normalized Consumption
               May 2002        January 2003       May 2002       January 2003
Application    OUT     IN      OUT     IN         OUT    IN      OUT    IN
All            100.0%  100.0%  100.0%  100.0%     1      1.65    1.97   3.2
ESP/GRE        0.4%    0.5%    0.6%    0.5%       1      1.98    3.12   4.3
OTHER          4.4%    3.7%    5.7%    4.5%       1      1.37    2.54   3.23
TCP-BIG        8.9%    10.5%   47.5%   32.5%      1      1.94    10.5   11.68
AUDIO/VIDEO    0.2%    1.6%    0.2%    1.6%       1      16.61   2.77   32.64
CHAT           0.7%    1.3%    1.0%    1.7%       1      3.08    2.93   7.93
FTP            1.0%    1.3%    1.0%    0.7%       1      2.22    1.91   2.4
GAMES          1.6%    1.2%    3.6%    2.5%       1      1.29    4.54   5.15
MAIL           1.7%    0.6%    1.1%    0.7%       1      0.6     1.26   1.28
NEWS           0.3%    7.3%    0.2%    5.3%       1      38.52   1.51   54.55
P2P            75.2%   45.6%   32.9%   20.6%      1      1       0.86   0.87
WEB            5.6%    26.4%   6.2%    29.4%      1      7.8     2.2    16.88

MSO Y
               Application Mix (percentage)       Normalized Consumption
               May 2002        January 2003       May 2002       January 2003
Application    OUT     IN      OUT     IN         OUT    IN      OUT    IN
All            100.0%  100.0%  100.0%  100.0%     1      2.19    1.83   4.08
ESP/GRE        0.4%    0.5%    0.3%    0.4%       1      2.71    1.7    4.67
OTHER          4.6%    3.2%    5.4%    3.4%       1      1.53    2.16   2.97
TCP-BIG        9.5%    11.8%   45.3%   32.1%      1      2.71    8.71   13.72
AUDIO/VIDEO    0.1%    1.5%    0.2%    1.5%       1      23.71   3.1    44.29
CHAT           0.7%    1.2%    0.7%    1.4%       1      3.81    2.02   8.67
FTP            1.4%    1.4%    0.4%    0.9%       1      2.24    0.56   2.64
GAMES          1.3%    1.2%    3.4%    2.4%       1      1.92    4.73   7.43
MAIL           1.0%    0.5%    0.9%    0.5%       1      1.13    1.71   1.88
NEWS           0.7%    17.5%   0.7%    14.6%      1      54.99   1.76   85.33
P2P            75.1%   38.5%   36.7%   19.5%      1      1.12    0.9    1.06
WEB            5.2%    22.8%   5.9%    23.5%      1      9.53    2.06   18.27

Table 1: Application Composition of two MSOs in May 2002 and January 2003.

Time of Day Pattern

We next examine the diurnal behavior of P2P traffic. Figure 1 plots the
time series of the incoming and outgoing traffic volumes (P2P, web and
TCP-big) for a given MSO across a week in February 2003. For each
application, all the data values are normalized by the mean per-hour
incoming data volume for that application, averaged across that week.

[Figure 1 plot: normalized traffic volume (y-axis, 0 to 1.8) against time (x-axis, 2/3/2003 0:00 to 2/9/2003 18:00); series: Outbound P2P, Inbound P2P, Outbound Web, Inbound Web, Outbound TCP-big, Inbound TCP-big]

Figure 1: Time of day pattern of P2P and Web traffic.
`
`
All three applications exhibit similar diurnal behaviors with peak loads
(in either direction) around 2.00 AM GMT (10.00 PM EST, 7.00 PM PST).
The P2P traffic exhibits less variability across a day than Web traffic:
the peak load is about 2 times the minimum, as opposed to 5 times for
Web traffic. The smaller variance in P2P traffic across a day may be a
function of the programmed download feature in P2P applications, which
allows users to specify multiple files in advance that can be downloaded
asynchronously by the P2P application.

For Web, the outgoing traffic is significantly smaller than (at most 20%
of) the incoming traffic, suggesting that the MSOs’ clients are mostly
consumers of web data. In contrast, for P2P, the traffic volumes in the
2 directions track each other much more closely, across a day and across
the week. Another notable point here is that the TCP-big traffic
distribution across time is very similar to the P2P traffic. Also, just
like P2P, the TCP-big traffic volumes in the 2 directions are similar.
These behavioral similarities are another indicator that the TCP-big
traffic includes some P2P applications. Finally, for all 3 applications,
we do not see significant variations across days and between weekdays
and weekends.


P2P LOCALITY

One of the potential advantages of P2P applications is that by
distributing content, they provide the ability to download this content
from locations closer to a user. It is therefore interesting to consider
whether this really happens, and moreover to consider the question of
locality in P2P traffic in general.

We approach this question by considering the simplest possible counter
example to localized traffic: the simple gravity model [?]. In this
model, a packet entering the network at S makes its decision about its
destination D independent of the arrival point. That is, the packet is
drawn (as if by gravity) to destinations in proportion to the volume of
traffic departing at those locations.

The gravity model can be used to make predictions of the traffic volumes
between two regions based purely on the volumes entering and exiting at
those two regions, by the formula

\[ T_{S,D} = \frac{T_S^{\mathrm{in}} \, T_D^{\mathrm{out}}}{T} \]

where $T$ is the total volume of traffic across the network,
$T_S^{\mathrm{in}}$ is the traffic entering the network at region S, and
$T_D^{\mathrm{out}}$ is the traffic exiting the network at region D.
Figure 2 below shows a comparison of the gravity model predictions for
inter-regional traffic on one cable company. The plot is based on
Netflow traffic collected (from the May time interval where we have data
across a wider spread of regions and MSOs) above the regional
aggregation routers, and therefore shows traffic traversing the backbone
between regions. The figure shows a scatter plot of the real
inter-regional traffic versus the gravity model prediction, for both P2P
traffic and the total traffic to the cable company. One can see that in
both cases the gravity model predicts the true traffic within about
±20%.

What does that tell us? The main point is that the gravity model above
explicitly excludes any notion of geographic or topological distance.
Therefore, as the measured traffic fits this model to some extent, we
may believe that neither P2P traffic nor the traffic overall exhibits
strong locality at the regional level. A further, somewhat subjective
conclusion one might draw from the graph is that P2P traffic actually
seems to fit the gravity model slightly worse, and so we may hypothesize
that P2P traffic shows more locality than other traffic sources.
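
The gravity model prediction and the ±20% comparison can be sketched in a few lines. The region names and volumes below are made-up illustrative numbers, not measurements from the paper, and the function names are assumptions; the total T is taken as the sum of the entering volumes, which assumes that everything entering the network also exits it.

```python
def gravity_matrix(t_in, t_out):
    """Predict T[S][D] = T_in[S] * T_out[D] / T for every region pair.

    t_in:  mapping region -> traffic entering the network at that region
    t_out: mapping region -> traffic exiting the network at that region
    """
    total = sum(t_in.values())
    return {
        s: {d: t_in[s] * t_out[d] / total for d in t_out}
        for s in t_in
    }


def within_tolerance(actual, predicted, tolerance=0.2):
    """True if the actual volume is within +/- tolerance of the prediction."""
    return abs(actual - predicted) <= tolerance * predicted


# Illustrative volumes in arbitrary units (not from the paper).
t_in = {"R1": 60.0, "R2": 40.0}
t_out = {"R1": 50.0, "R2": 50.0}
predicted = gravity_matrix(t_in, t_out)
```

Because the prediction uses only the per-region totals, any systematic excess of measured over predicted traffic between nearby regions would be a sign of locality; a good fit suggests its absence.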
`
`

`

Figure 2: Comparison of the real matrix elements to the estimated
traffic matrix elements for one MSO. The circles represent purely P2P
traffic and crosses represent the total traffic. The blue solid diagonal
line shows equality and the green dashed lines show ±20%.

To examine this hypothesis in more detail we present Table 2, which
shows the normalized traffic volumes between regions for the P2P
traffic. The table shows the normalized probability that traffic
originating from a particular region in one cable company will depart
from each region in the same cable company (given it stays on the same
cable company’s network). Table 2 can be seen to have a number of almost
identical rows (for instance the group of regions R1, R2, and R5 are
very similar, as is the group R6, R7 and R8), indicating a complete lack
of locality of traffic with reference to these regions. Other regions
(specifically R3 and R4) are not dramatically far away, but rather fall
somewhere in between the other two groups.

However, the table also shows some disparity between the groups of rows.
This disparity is at its height when comparing the regions in the
Eastern Standard Timezone (EST) with those in the Pacific Timezone
(PST). This is an indication of some degree of weak locality in P2P
traffic, at the “super-regional” level.

From/To    R1 (PST)  R2 (PST)  R3 (MST)  R4 (MST)  R5 (CST)  R6 (CST)  R7 (EST)  R8 (EST)
R1 (PST)   -         0.18      0.14      0.126     0.174     0.128     0.124     0.127
R2 (PST)   0.172     -         0.141     0.126     0.19      0.132     0.118     0.12
R3 (MST)   0.132     0.12      -         0.189     0.135     0.145     0.139     0.14
R4 (MST)   0.107     0.111     0.182     -         0.124     0.163     0.155     0.158
R5 (CST)   0.161     0.18      0.136     0.132     -         0.135     0.127     0.129
R6 (CST)   0.107     0.108     0.145     0.155     0.125     -         0.187     0.173
R7 (EST)   0.107     0.106     0.137     0.157     0.127     0.182     -         0.184
R8 (EST)   0.109     0.111     0.127     0.161     0.128     0.178     0.185     -

Table 2: Normalized inter-regional traffic matrix of MSO X weighted
by P2P+TCP-big traffic (Longitude defined by the Timezone).

This super-regional locality could arise for a couple of reasons (other
than P2P applications explicitly taking advantage of content locality to
improve performance). Firstly, because of usage patterns (specifically
the times at which a user is connected to the P2P network), there is a
slight increase in the likelihood that a search will find content in a
local time zone. Secondly, there may be a group of people within a
super-region with content that is slightly more relevant to the local
super-region. However, the data so far suggests that neither of these
effects is dominant, and certainly there is no strong locality influence
such as might be seen if the main P2P applications exploited locality
information.

In both of the above examples the monitoring location (above the
regional aggregation router) limits our data to seeing only
inter-regional traffic. Thus, one might argue, we are missing the key
component in any study of traffic locality: the intra-regional traffic.

While the data limitations prevent us from seeing the intra-regional
traffic on a single cable company, we can gain a good view of this data
by considering the traffic between cable companies. If locality were
being exploited in P2P applications, then one would expect traffic from
company Y, region R to prefer going to company X, region R, rather than
the alternative regions.

Table 3 shows an example, giving the normalized probabilities that
traffic from company Y to X will go from regions M to R. Although the
regions for the two companies are slightly different, regions M3 and R7
are very closely matched, as are M4 and R8. However, we see only very
minor bias towards traffic from M3 to R7 (compared to other EST
regions), and similarly from M4 to R8.
`
`
From/To    R1 (PST)  R2 (PST)  R3 (MST)  R4 (MST)  R5 (CST)  R6 (CST)  R7 (EST)  R8 (EST)
M1 (MST)   0.133     0.121     0.157     0.125     0.118     0.111     0.089     0.146
M2 (CST)   0.121     0.095     0.114     0.158     0.117     0.145     0.094     0.156
M3 (EST)   0.12      0.114     0.12      0.138     0.119     0.128     0.14      0.122
M4 (EST)   0.11      0.115     0.109     0.137     0.135     0.119     0.133     0.142
M5 (EST)   0.129     0.117     0.115     0.133     0.135     0.129     0.12      0.121

Table 3: Normalized traffic matrix from MSO Y to MSO X weighted
by P2P+TCP-big traffic.
Our conclusion is that, although there is some evidence for weak
locality at a large spatial scale, P2P applications do not yet exploit
such information on a large scale, and consequently, P2P traffic does
not show strong signs of geographic locality. More recent developments
of KaZaA provide methods for selecting the super-node to which one
connects, and so more locality may be introduced in the future.
`
`
`
HEAVY HITTERS AND P2P

It is well known in the cable industry that some heavy hitters consume
most of the bandwidth. We shall divide subscribers into classes by their
total usage, and analyze their consumption characteristics such as the
application composition and the traffic balance per class. We define
three groups of users: the heavy users who consume more than 1 Gbyte/day
on average over a week, the medium users who consume between 50
Mbytes/day and 1 Gbyte/day, and the light users who consume less than 50
Mbytes/day.

User Distribution

We first compare the distribution of traffic per subscriber. In order to
see if there are consistent patterns we compare two regions of one MSO
with a region from another MSO, all at two different points in time:
during the week ending June 26th 2002 and during the week ending
February 9th 2003. In order not to bias the results, we choose two MSOs
that are not multi-homed and regions that have a decent size, i.e.
between 25,000 subscribers and 140,000 subscribers. By subscriber, we
mean an active IP address. Even though the IP address is not statically
assigned (the user obtains an IP address automatically via DHCP), in the
networks we examined it is “sticky”. That is, over a week a subscriber
maintains the same IP address in practice, because the DHCP lease
expires only after 4 days and the address is reassigned to the same
subscriber if it is still available. However, the IP address
distribution doesn’t reflect exactly the subscriber distribution since
it misses the inactive subscribers and the subscribers with a very low
usage that may not be sampled. For instance, for a given region, we
identified 107,000 unique IP addresses whereas the MSO was claiming that
there were 115,000 subscribers, i.e. a 7.5% difference.
The six distributions in Figures 3 and 4 are quite consistent, the two
most different distributions being the ones belonging to different MSOs.
In each case, the top 1% of the IP addresses account for 18.6% to 24.4%
of the total traffic and the top 20% of the active IP addresses account
for slightly more than 80% of the traffic. For one MSO the average total
consumption (the sum of IN and OUT traffic) went from 12.5 kbps per IP
address in June to 13.3 kbps in February in one region, and from 12.2
kbps to 13.5 kbps for the other region. The total consumption of the
second MSO remained stable at 14 kbps per unique IP address. For all
these regions, the median consumption was only between 2 and 3 kbps,
showing that the distribution was strongly skewed.
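
The quantities used in this section are easy to reproduce from per-IP volumes. The sketch below shows the unit conversion between average rate and daily volume, the heavy/medium/light thresholds defined earlier, and the share of traffic carried by the top fraction of addresses. It assumes decimal units (1 kbps = 1000 bit/s, 1 Gbyte = 1000 Mbytes); the function names and the sample volumes are illustrative assumptions, not data from the paper.

```python
import math


def kbps_to_mbytes_per_day(kbps):
    """Convert an average rate in kbps to a daily volume in Mbytes."""
    return kbps * 1000 * 86400 / 8 / 1e6


def user_class(mbytes_per_day):
    """Heavy: > 1 Gbyte/day, medium: 50 Mbytes/day to 1 Gbyte/day, else light."""
    if mbytes_per_day > 1000:
        return "heavy"
    if mbytes_per_day >= 50:
        return "medium"
    return "light"


def top_share(volumes, fraction):
    """Fraction of total traffic generated by the top `fraction` of users."""
    ordered = sorted(volumes, reverse=True)
    n = max(1, math.ceil(len(ordered) * fraction))
    return sum(ordered[:n]) / sum(ordered)


# Illustrative per-IP daily volumes in Mbytes (not measured data).
sample = [2500.0, 400.0, 120.0, 60.0, 30.0, 20.0, 10.0, 5.0, 3.0, 2.0]
classes = [user_class(v) for v in sample]
share_top_10_percent = top_share(sample, 0.10)
```

As a consistency check on the figures quoted above, an average of 13 kbps corresponds to about 140 Mbytes per day under these unit assumptions, and 2 to 3 kbps corresponds to roughly 22 to 32 Mbytes per day.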
`
`

`

`
Figure 3: Consumption per percentile of IP addresses of two regions
of MSO X and one region of MSO Y during a week in June 2002
and a week in February 2003. The mean consumptions are around
140 Mbytes/Day/IP and the medians are roughly 30 Mbytes/Day/IP.


Figure 4: Cumulative Consumption of two regions of MSO X and
one region of MSO Y during a week in June 2002 and a week in
February 2003.
`
`
`
`User Type
`Direction
`Normalized Traffic per Sub
`AUDIO/VIDEO
`CHAT
`NEWS
`MAIL
`FTP
`GAMES
`ESP/GRE
`P2P
`TCP-BIG
`WEB
`OTHER
`
`Week ending February 9th 2003
`Week ending June 26th 2002
`Heavy
`Heavy Medium Light
`Light
`Medium
`Heavy Medium Light
`Heavy
`Light
`Medium
`IN/OUT IN/OUT IN/OUT
`IN
`OUT
`OUT
`IN
`IN
`IN/OUT IN/OUT IN/OUT OUT
`IN
`OUT
`OUT
`IN
`IN
`OUT
`1.4
`1.8
`4.8
`5.2
`1.1
`26.1
`47.8
`415.1
`1.7
`1.8
`4.8
`288.3
`4.8
`1.0
`27.0
`48.9
`445.5
`266.8
`4.9
`17.3
`28.4
`2.6%
`0.4%
`0.2%
`2.2%
`0.5%
`3.2
`26.4
`29.8
`0.1%
`0.4% 2.7%
`0.1%
`1.9%
`0.3%
`0.1%
`3.0
`3.0
`4.1
`2.3%
`2.6%
`0.7%
`1.2%
`0.6%
`3.2
`2.4
`3.4
`0.3%
`2.9% 2.0%
`0.6%
`0.8%
`0.4%
`0.2%
`49.6
`46.6
`46.2
`1.4%
`0.1%
`0.4% 10.5%
`53.6
`54.1
`55.1
`1.0% 32.8%
`0.2% 2.1%
`0.5% 13.5%
`1.1% 34.9%
`2.7
`0.9
`1.6
`2.7%
`8.1%
`1.3%
`0.7%
`0.5
`0.5
`1.4
`0.1%
`0.3%
`8.3% 2.3%
`1.5%
`0.4%
`0.4%
`0.1%
`1.4
`2.8
`1.9
`0.2%
`0.6%
`0.5%
`0.8%
`2.2
`3.5
`1.7
`0.8%
`0.7%
`0.8% 0.3%
`0.6%
`1.1%
`0.7%
`0.9%
`0.8
`1.2
`1.7
`1.0%
`2.9%
`4.1%
`2.7%
`2.0
`1.7
`1.7
`3.3%
`1.9%
`2.8% 1.0%
`1.5%
`1.5%
`0.4%
`0.5%
`5.6
`2.5
`2.5
`3.1%
`6.0%
`1.0%
`1.4%
`6.9
`3.0
`2.6
`0.1%
`0.3%
`5.3% 2.8%
`0.7%
`1.1%
`0.0%
`0.2%
`0.9
`0.9
`1.6
`2.3%
`7.0%
`0.8
`1.0
`1.8
`37.7% 22.9% 29.5% 14.0%
`87.4% 44.0% 82.3% 43.2% 18.5% 6.8%
`0.9
`1.1
`2.5
`6.8%
`2.0
`3.4
`5.1
`51.2% 30.5% 47.6% 29.3% 13.1%
`6.9%
`8.4%
`3.3%
`6.3%
`2.4% 2.5%
`5.7
`9.0
`7.5
`10.1
`9.5
`7.5
`1.6%
`6.5%
`6.4% 31.5% 46.7% 72.3%
`0.9%
`5.3%
`5.1% 26.6% 46.2% 71.6%
`1.1
`1.3
`2.1
`4.3
`1.7
`2.3
`3.9%
`3.1%
`8.2%
`5.8% 12.5%
`5.3%
`2.0%
`5.1%
`4.0%
`3.7% 12.2% 5.7%
`
`
`
Table 4: Comparison of the application composition of the heavy, medium and light users of a region having more than 100 000 subscribers.

Consumption Characteristics

Since the median consumption is 4 to 5 times smaller than the average
consumption, it is clear that the average consumption doesn’t reflect
the behaviour of most of the subscribers. This still holds if we compare
the application composition of each group of users, as defined earlier,
with the average application composition that was studied earlier in
this paper. Indeed, in a close look at one of these regions, Table 4
shows that the light users (67% of the IP addresses) are still mainly
browsing the web, exchanging e-mail and chatting online. Their traffic
balance (the IN/OUT ratio) is 4.8, which is far from that of the heavy
and medium users at 1.4-1.7 and 1.8, respectively. Table 5 makes it
clear that they are not familiar with P2P or News, since only 12.6% of
these light users are lightly using one of these applications and they
generate less than 2% of the total traffic of these applications.
`
`
Week ending June 26th 2002
                            Outbound                  Inbound
User Class                  Heavy   Medium  Light     Heavy   Medium  Light
IP address Percentage       2.9%    30.1%   67.0%     2.9%    30.1%   67.0%
Traffic Percentage          46.6%   49.4%   4.1%      41.6%   47.9%   10.5%
NEWS                        68.6%   30.4%   1.0%      68.4%   30.5%   1.1%
P2P                         49.6%   49.5%   0.9%      46.2%   52.1%   1.8%
TCP-BIG                     64.9%   33.1%   2.0%      51.5%   44.5%   4.0%
WEB                         8.5%    52.2%   39.3%     9.8%    56.6%   33.6%
P2P Users in that Class     83.6%   63.4%   10.1%     83.6%   63.4%   10.1%
News Users in that Class    25.8%   12.4%   2.6%      25.8%   12.4%   2.6%
News or P2P Users           96.7%   71.6%   12.6%     96.7%   71.6%   12.6%

Table 5: P2P and News Users in a region having more than 100 000
subscribers.

On the other hand, the heavy users are mainly generating file sharing
traffic. Those who are using the popular P2P applications are now
becoming a new type of content provider since their P2P traffic balance
is below 1. Even though that subscriber group accounts for only 2.9% of
the subscriber population, it generates almost half of the P2P
`
`
`

`

However, P2P applications have evolved rapidly in a direction which
makes accurate accounting of the traffic more difficult. In particular,
the applications previously used default TCP ports, and it was possible
to account for the bulk of the P2P traffic by monitoring a relatively
small number of ports. However, the current widespread use of port
hopping makes such mapping exceedingly impractical. We next present
specific evidence of this trend and then discuss the implications for
managing this traffic.
`
`Kazaa Rate limiting Experiment
`
`
[Figure 5 chart: traffic to Region X of MSO Y (MBytes / Day / Subs) per application (web, p2p, TCP-big); series: week ending 07/28 (before), week ending 08/18 (one week later), week ending 09/08 (one month later), week ending 10/25 (two months later)]

Figure 5: Mutation of P2P traffic into TCP-big traffic.
` We first show an interesting case study
`which
