IPR2021-00909, No. 1047-47 Exhibit - Ex 1047 Karagiannis (P.T.A.B. May. 7, 2021)

Transport Layer Identification of P2P Traffic

Thomas Karagiannis, UC Riverside
Andre Broido, CAIDA, SDSC
Michalis Faloutsos, UC Riverside
Kc claffy, CAIDA, SDSC

ABSTRACT
Since the emergence of peer-to-peer (P2P) networking in the late ’90s, P2P applications have multiplied, evolved and established themselves as the leading ‘growth app’ of Internet traffic workload. In contrast to first-generation P2P networks which used well-defined port numbers, current P2P applications have the ability to disguise their existence through the use of arbitrary ports. As a result, reliable estimates of P2P traffic require examination of packet payload, a methodological landmine from legal, privacy, technical, logistic, and fiscal perspectives. Indeed, access to user payload is often rendered impossible by one of these factors, inhibiting trustworthy estimation of P2P traffic growth and dynamics. In this paper, we develop a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of P2P networks, and without relying on packet payload. We believe our approach is the first method for characterizing P2P traffic using only knowledge of network dynamics rather than any user payload. To evaluate our methodology, we also develop a payload technique for P2P traffic identification, by reverse engineering and analyzing the nine most popular P2P protocols, and demonstrate its efficacy with the discovery of P2P protocols in our traces that were previously unknown to us. Finally, our results indicate that P2P traffic continues to grow unabatedly, contrary to reports in the popular media.

Categories and Subject Descriptors
C.2.5 [Computer-Communication Networks]: Local and Wide-Area Networks

General Terms
Algorithms, Measurement

Keywords
Peer-to-peer, Measurements, Traffic classification

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC’04, October 25–27, 2004, Taormina, Sicily, Italy.
Copyright 2004 ACM 1-58113-821-0/04/0010 ...$5.00.
1. INTRODUCTION
Over the last few years, peer-to-peer (P2P) file-sharing has relentlessly grown to represent a formidable component of Internet traffic. P2P volume is sufficiently dominant on some links to incent increased local peering among Internet Service Providers [25], with an observable yet unquantified effect on the global Internet topology and routing system, not to mention on competitive market dynamics. Despite this dramatic growth, reliable profiling of P2P traffic remains elusive. We no longer enjoy the fleeting benefit of first-generation P2P traffic, which was relatively easy to classify due to its use of well-defined port numbers. Current P2P networks tend to intentionally disguise their generated traffic to circumvent both filtering firewalls and legal issues, most emphatically articulated by the Recording Industry Association of America (RIAA). Not only do most P2P networks now operate on top of nonstandard, custom-designed proprietary protocols, but current P2P clients can also easily operate on any port number, even HTTP’s port 80.

These circumstances portend a frustrating conclusion: robust identification of P2P traffic is only possible by examining user payload. Yet packet payload capture and analysis poses a set of often insurmountable methodological landmines: legal, privacy, technical, logistic, and financial obstacles abound, and overcoming them still leaves the task of reverse engineering a growing number of poorly documented P2P protocols. Further obfuscating workload characterization attempts is the increasing tendency of P2P protocols to support payload encryption. Indeed, the frequency with which P2P protocols are introduced and/or upgraded renders packet payload analysis not only impractical but also glaringly inefficient.

In this paper we develop a systematic methodology to identify P2P flows at the transport layer, i.e., based on flow connection patterns of P2P traffic, and without relying on packet payload. The significance of our algorithm lies in its ability to identify P2P protocols without depending on their underlying format, which offers a distinct advantage over payload analysis: we can identify previously unknown P2P protocols. In fact, during our analysis we detected traffic of three distinct P2P protocols previously unknown to us. To validate our methodology we also developed a payload-based technique for P2P traffic identification, by reverse engineering and analyzing the nine most popular P2P protocols.

Specifically, the highlights of our paper include:
• We develop a systematic methodology for P2P traffic profiling by identifying flow patterns and characteristics of P2P behavior, without examination of user payload.
• Our methodology effectively identifies 99% of P2P flows and more than 95% of P2P bytes (compared to payload analysis), while limiting false positives to under 10%.
• Our methodology is capable of identifying P2P flows missed by payload analysis. Using our methodology we identify approximately 10% additional P2P flows over payload analysis.
• Using data collected at an OC-48 (2.5 Gbps) link of a Tier 1 Internet Service Provider (ISP), we provide realistic estimates and trends of P2P traffic in the wide-area Internet over the last few years. We find that, contrary to claims of a sharp decline, P2P traffic has been growing steadily.

Cloudflare - Exhibit 1047, page 121

Table 1: Bulk sizes of OC-48 datasets

Set   Bb  Date        Day  Start  Dur    Dir      Src.IP  Dst.IP   Flows   Packets  Bytes   Aver.Util.  Ut.%
D09N  2   2003-05-07  Wed  10:00  2 h    Nbd (1)  904 K   2992 K   56.7 M  930.4 M  603 G   651 Mbps    26.2
D09S  2   2003-05-07  Wed  10:00  2 h    Sbd (0)  466 K   2527 K   47.3 M  624.2 M  340 G   376 Mbps    15.1
D10N  2   2004-01-22  Thu  14:00  60 m   Nbd (1)  812 K   2181 K   23.6 M  412.7 M  288 G   638.9 Mbps  25.7
D10S  2   2004-01-22  Thu  14:00  60 m   Sbd (0)  279 K   4177 K   18.6 M  252.7 M  117 G   260.4 Mbps  10.5
D11S  2   2004-02-25  Wed  10:00  2 h    Sbd (0)  410 K   7465 K   25.3 M  249.6 M  98.5 G  109.4 Mbps  4.4
D13N  2   2004-04-21  Wed  20:00  122 m  Nbd (1)  1971 K  6956 K   86.4 M  1263 M   852 G   930.6 Mbps  37.4
D13S  2   2004-04-21  Wed  20:00  122 m  Sbd (0)  306 K   10847 K  27.8 M  266.4 M  106 G   115.5 Mbps  4.6

Dur: trace duration. Dir: link direction (Nbd: northbound, Sbd: southbound). Src.IP/Dst.IP: distinct source/destination IP addresses. Aver.Util.: average link utilization. Ut.%: average utilization as a percentage of the OC-48 capacity.
Our methodology can be expanded to support profiling of various types of traffic. Since mapping applications to port numbers is no longer reliable, a generalized version of our algorithm can support traffic characterization tasks beyond P2P workload. Indeed, to minimize false positives in P2P traffic identification, we assess, and then filter by, connection features of numerous protocols and applications (such as mail or DNS).

The rest of this paper is structured as follows: Section 2 describes our backbone traces, which span from May 2003 to April 2004. Section 3 discusses previous work in P2P traffic estimation and analysis. Sections 4 and 5 describe in detail our payload and nonpayload methodologies for P2P traffic identification. Section 6 presents an evaluation of our algorithm by comparing the volumes of P2P traffic identified by our two methods. In Section 7 we challenge media claims that the pervasive litigation undertaken by the RIAA is causing an overall decline in P2P file-sharing activity. Section 8 concludes our paper.
2. DATA DESCRIPTION
Some of the traces analyzed in this paper are included in CAIDA’s Backbone Data Kit (BDK) [1], consisting of packet traces captured at an OC-48 link of a Tier 1 US ISP connecting POPs from San Jose, California to Seattle, Washington. Table 1 lists general workload dimensions of our datasets: counts of distinct source and destination IP addresses and the numbers of flows, packets, and bytes observed. We processed traces with CAIDA’s CoralReef suite [20].

We analyze traces taken on May 7, 2003 (D09), January 22, 2004 (D10), February 25, 2004 (D11) and April 21, 2004 (D13). We captured the traces with Dag 4 monitors [14] and packet capture software from the University of Waikato and Endace [12] that supports observation of one or both directions of the link.

For our older traces (D01-D10), our monitors captured 44 bytes of each packet, which includes IP and TCP/UDP headers and an initial 4 bytes of payload for some packets. However, approximately 60%-80% of the packets in these traces are encapsulated with an extra 4-byte MPLS label, which leaves no space for payload bytes.
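The byte budget behind this limitation can be made explicit with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes 20-byte IPv4 and TCP headers without options, 4 bytes per MPLS label, and that any label falls inside the 44-byte capture.

```python
def captured_payload_bytes(snaplen=44, ip_hdr=20, l4_hdr=20, mpls_labels=0):
    """Return how many payload bytes survive a truncated capture.

    Illustrative assumptions: each MPLS label is 4 bytes and sits in front of
    the IP header, so it consumes part of the captured snapshot.
    """
    overhead = 4 * mpls_labels + ip_hdr + l4_hdr
    return max(0, snaplen - overhead)


print(captured_payload_bytes())               # plain IPv4/TCP: 4 payload bytes
print(captured_payload_bytes(mpls_labels=1))  # one MPLS label: 0 payload bytes
print(captured_payload_bytes(l4_hdr=32))      # 12 bytes of TCP options: 0 payload bytes
```

The same arithmetic explains why TCP options alone can also push every payload byte out of a 44-byte snapshot.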
Fortunately, we were able to capture the February and April 2004 traces (D11 and D13) with 16 bytes of TCP/UDP payload, which allows us to evaluate our nonpayload methodology. To protect privacy, our monitoring system anonymized the IP addresses in these traces using the Cryptography-based Prefix-preserving Anonymization algorithm (Crypto-PAn) [33].

3. PREVIOUS WORK
Most P2P traffic research has thus far emphasized detailed characterization of a small subset of P2P protocols and/or networks [19] [15], often motivated by the dominance of that protocol in a particular provider’s infrastructure or during a specific time period. Typical data sources range from academic network connections [27], [21] to Tier 2 ISPs [22].

Other P2P measurement studies have focused on topological characteristics of P2P networks based on flow-level analysis [29], or have investigated properties such as bottleneck bandwidths [27], the possibility of caching [22], or the availability and retrieval of content [3] [13].

Recently, Sen et al. developed a signature-based payload methodology [28] to identify P2P traffic. The authors focus on TCP signatures that characterize file downloads in five P2P protocols based on the examination of user payload. The methodology in [28] is similar to our payload analysis and is discussed further in Section 4.

A number of Sprint studies [8] report on P2P traffic as observed in a major Tier 1 provider backbone. However, their volume estimates taxonomize applications based on fixed port numbers from CoralReef’s database [23], which captures a small and decreasing fraction of P2P traffic.

Our approach differs from previous work in three ways:
• We analyze traffic sources of exceptionally high diversity, from major Tier 1 ISPs at the Internet core.
• We study all popular P2P applications available: neither of our methodologies (payload and nonpayload) is limited to a subset of P2P networks. On the contrary, we study those P2P applications that currently contribute the vast majority of P2P traffic.
• We combine and cross-validate identification methods that use fixed ports, payload, and transport layer dynamics.
4. PAYLOAD ANALYSIS OF P2P TRAFFIC AND LIMITATIONS
Our payload analysis of P2P traffic is based on identifying characteristic bit strings in packet payload that potentially represent control traffic of P2P protocols. We monitor the nine most popular P2P protocols: eDonkey [10] (which also includes the Overnet and eMule [11] networks), Fasttrack (supported by the Kazaa client), BitTorrent [4], OpenNap and WinMx [32], Gnutella, MP2P [24], Soulseek [30], Ares [2] and Direct Connect [7].

Each of these P2P networks operates on top of nonstandard, usually custom-designed proprietary protocols. Hence, payload identification of P2P traffic requires separate analysis of the various P2P protocols to identify the specific packet format used in each case. This section describes limitations that inhibit accurate identification of P2P traffic at the link level and then presents our methodology to identify P2P flows.

4.1 Limitations
We had to carefully consider several issues throughout our study. While some of these restrictions are data related, others originate from the nature of P2P protocols. Specifically, these limitations are the following:

Captured payload size: CAIDA monitors capture the first 16 bytes of user payload¹ of each packet (see Section 2) for our February and April traces. While our payload heuristics would be capable of effectively identifying all P2P packets if the whole payload were available, this 16-byte payload restriction limits the number of heuristics that can reliably pinpoint P2P flows. Furthermore, our older traces (May 2003, January 2004) only contain 4 bytes of payload for a limited number of packets, since our monitors were configured to capture 44 bytes of each packet (e.g., TCP options push payload bytes out of the captured segment). Limitations for our older traces are described in detail in Section 7.

HTTP requests: Several P2P protocols use HTTP requests and responses to transfer files, and it can be impossible to distinguish such P2P traffic from typical web traffic given only 16 bytes of payload; e.g., “HTTP/1.1 206 Partial Content” could represent either HTTP or P2P.

Encryption: An increasing number of P2P protocols rely on encryption and SSL to transmit packets and files. Payload string matching misses all encrypted P2P packets.

Other P2P protocols: The widespread use of file-sharing and P2P applications yields a broad variety of P2P protocols. Thus our analysis of the top nine P2P protocols cannot guarantee identification of all P2P flows, especially given the diversity of the OC-48 backbone link. However, our experience with P2P applications and traffic analysis convinces us that these nine protocols represent the vast majority of current P2P traffic.

Unidirectional traces: Some of our traces reflect only one direction of the monitored link. In these cases we cannot identify flows that carry the TCP acknowledgment stream of a P2P download, since there is no payload. Even if we monitored both directions of the link, asymmetric routing makes it unlikely that both streams (data and acknowledgment) of a TCP flow appear on the same link.

We can overcome these limitations with our nonpayload methodology described in Section 5.

4.2 Methodology
Our analysis is based on identifying specific bit strings in the application-level user data. Since documentation for P2P protocols is generally poor, we empirically derived a set of distinctive bit strings for each case by monitoring both TCP and UDP traffic using tcpdump [31] after installing various P2P clients. Table 2 lists a subset of these strings for some of the analyzed protocols for TCP and UDP, together with the well-known (default) ports for these P2P protocols. The complete list of bit strings we used is in [18].

Table 2: Strings at the beginning of the payload of P2P protocols. The prefix “0x” denotes hex strings.

P2P Protocol    String                     Trans. prot.  Def. ports
eDonkey2000     0xe319010000               TCP/UDP       4661-4665
                0xc53f010000
Fasttrack       “Get /.hash”               TCP           1214
                0x270000002980             UDP
BitTorrent      “0x13Bit”                  TCP           6881-6889
Gnutella        “GNUT”, “GIV”              TCP           6346-6347
                “GND”                      UDP
MP2P            GO!!, MD5, SIZ0x20         TCP           41170 UDP
Direct Connect  “$MyN”, “$Dir”             TCP           411-412
                “$SR”                      UDP
Ares            “GET hash:”, “Get sha1:”   TCP           -

¹ Privacy issues and agreement with the ISP prohibit the examination of more bytes of user payload.
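A minimal sketch of M2-style prefix matching against these strings follows, assuming payloads are available as Python `bytes` and truncated to the 16 captured bytes. Only a subset of Table 2 is included; the “0x13Bit” entry is interpreted here as the byte 0x13 followed by the ASCII text “Bit”, and the dictionary and function names are hypothetical rather than the authors’ code.

```python
from typing import Iterable, Optional

CAPTURED_PAYLOAD = 16  # bytes of user payload kept per packet (D11, D13)

# Subset of Table 2; each entry is matched at the start of the payload.
SIGNATURES = {
    "eDonkey": [bytes.fromhex("e319010000"), bytes.fromhex("c53f010000")],
    "BitTorrent": [b"\x13Bit"],
    "Gnutella": [b"GNUT", b"GIV", b"GND"],
    "Direct Connect": [b"$MyN", b"$Dir", b"$SR"],
}


def match_packet(payload: bytes) -> Optional[str]:
    """Return the protocol whose string prefixes the captured payload, if any."""
    head = payload[:CAPTURED_PAYLOAD]
    for protocol, prefixes in SIGNATURES.items():
        if any(head.startswith(prefix) for prefix in prefixes):
            return protocol
    return None


def classify_flow_m2(payloads: Iterable[bytes]) -> Optional[str]:
    """M2: label the flow with the first matching protocol; None means non-P2P."""
    for payload in payloads:
        protocol = match_packet(payload)
        if protocol is not None:
            return protocol
    return None


print(classify_flow_m2([b"", b"GNUTELLA CONNECT/0.6"]))  # Gnutella
print(classify_flow_m2([b"GET /index.html HTTP/1.1"]))    # None
```

Because matching only inspects the first 16 bytes, a signature that starts later in the payload is never seen, which is exactly the capture limitation described in Section 4.1.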
We classify packets into flows, defined by the 5-tuple of source IP, destination IP, protocol, source port and destination port. We use the commonly accepted 64-second flow timeout [6], i.e., if no packet arrives in a specific flow for 64 seconds, the flow expires. To address the limitations described in the previous section, we apply three different methods to estimate P2P traffic, listed by increasing aggressiveness in which flows they classify as P2P:

M1: If a source or destination port number of a flow matches one of the well-known port numbers (Table 2), the flow is flagged as P2P.

M2: We compare the payload (if any) of each packet in a flow against our table of strings. In case of a match between the 16-byte payload of a packet and one of our bit strings, we flag the flow as P2P with the corresponding protocol, e.g., Fasttrack, eDonkey, etc. If none of the packets match, we classify the flow as non-P2P.

M3: If a flow is flagged as P2P, both source and destination IP addresses of this flow are hashed into a table. All flows that contain an IP address in this table are flagged as “possible P2P” even if there is no payload match. To avoid recursive misclassification of non-P2P flows as P2P, we perform this type of IP tracking only for host IPs that M2 identified as P2P.
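The flow definition used by all three methods (directional 5-tuple key, 64-second idle timeout) can be sketched as follows. The packet tuple layout and the `Flow` record are assumptions made for illustration; the sketch expects packets in timestamp order and treats a gap longer than the timeout as the start of a new flow.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

FLOW_TIMEOUT = 64.0  # seconds without a packet before a flow expires

# (timestamp, src_ip, dst_ip, proto, src_port, dst_port, size_in_bytes)
Packet = Tuple[float, str, str, int, int, int, int]


@dataclass
class Flow:
    key: Tuple[str, str, int, int, int]  # (src_ip, dst_ip, proto, src_port, dst_port)
    first_seen: float
    last_seen: float
    packets: int
    byte_count: int


def build_flows(packets: Iterable[Packet], timeout: float = FLOW_TIMEOUT) -> List[Flow]:
    """Group time-ordered packets into flows keyed by the 5-tuple."""
    active = {}
    finished = []
    for ts, src, dst, proto, sport, dport, size in packets:
        key = (src, dst, proto, sport, dport)
        flow = active.get(key)
        if flow is not None and ts - flow.last_seen > timeout:
            finished.append(flow)  # idle too long: close it and start a new flow
            flow = None
        if flow is None:
            active[key] = Flow(key, ts, ts, 1, size)
        else:
            flow.last_seen = ts
            flow.packets += 1
            flow.byte_count += size
    finished.extend(active.values())  # flows still open at the end of the trace
    return finished


pkts = [
    (0.0, "A", "B", 6, 4000, 5555, 100),
    (5.0, "C", "D", 17, 53, 53, 60),
    (10.0, "A", "B", 6, 4000, 5555, 200),
    (100.0, "A", "B", 6, 4000, 5555, 50),  # 90 s after the previous packet: new flow
]
print(len(build_flows(pkts)))  # 3
```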
In all P2P networks, P2P clients keep a large number of connections open even if there are no active file transfers. There is thus an increased probability that a host identified as P2P by M2 will participate in other P2P flows. These flows will be flagged as “possible P2P” in M3. On the other hand, a P2P user may be browsing the web or sending email while connected to a P2P network. Thus, to minimize false positives we exclude from M3 all flows whose source or destination port implies web, mail, FTP, SSH, SSL or DNS (i.e., ports 80, 8000, 8080, 25, 110, 21, 22, 443, 53) for TCP, and online gaming or DNS (e.g., 27015-27050, 53) for UDP².
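M1 and M3 can be sketched as below. The port sets come from Table 2 and from the exclusion list in the paragraph above; ignoring the transport protocol for the M1 ports (e.g., not restricting 41170 to UDP) is a simplification, and the function names are hypothetical. Flows are represented as (src_ip, dst_ip, proto, src_port, dst_port) tuples, with proto 6 for TCP and 17 for UDP.

```python
from typing import Dict, Iterable, Set, Tuple

FlowKey = Tuple[str, str, int, int, int]  # (src_ip, dst_ip, proto, src_port, dst_port)


def port_set(*items) -> Set[int]:
    """Expand single ports and inclusive (low, high) ranges into a set."""
    ports: Set[int] = set()
    for item in items:
        low, high = item if isinstance(item, tuple) else (item, item)
        ports.update(range(low, high + 1))
    return ports


# Default ports from Table 2 (transport protocol ignored for simplicity).
P2P_PORTS = port_set((4661, 4665), 1214, (6881, 6889), (6346, 6347), 41170, (411, 412))

# Ports whose flows are never promoted to "possible P2P" by M3.
M3_EXCLUDED = {
    6: port_set(80, 8000, 8080, 25, 110, 21, 22, 443, 53),  # TCP
    17: port_set((27015, 27050), 53),                        # UDP
}


def m1(flow: FlowKey) -> bool:
    """M1: P2P if either port is a well-known P2P port."""
    _, _, _, sport, dport = flow
    return sport in P2P_PORTS or dport in P2P_PORTS


def m3(flows: Iterable[FlowKey], m2_matches: Set[FlowKey]) -> Dict[FlowKey, str]:
    """M3: extend M2 matches to other flows of the same hosts.

    Only hosts seen in M2-matched flows seed the IP table, so the P2P label
    never spreads recursively through "possible P2P" flows.
    """
    p2p_hosts = {ip for src, dst, *_ in m2_matches for ip in (src, dst)}
    labels = {}
    for flow in flows:
        src, dst, proto, sport, dport = flow
        if flow in m2_matches:
            labels[flow] = "P2P"
        elif (src in p2p_hosts or dst in p2p_hosts) and not (
            {sport, dport} & M3_EXCLUDED.get(proto, set())
        ):
            labels[flow] = "possible P2P"
        else:
            labels[flow] = "non-P2P"
    return labels


flows = [
    ("A", "B", 6, 3000, 5555),   # matched by M2
    ("A", "C", 6, 3001, 80),     # same host as an M2 match, but a web port
    ("A", "D", 17, 4000, 4001),  # same host, no excluded port
    ("E", "F", 6, 1214, 2000),   # unrelated hosts on the Fasttrack default port
]
labels = m3(flows, {flows[0]})
for flow in flows:
    print(flow, m1(flow), labels[flow])
```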
In general, we believe that M3 will provide an estimate closer to the real intensity of P2P traffic, especially with limited 4-byte payload traces, while M2 provides a loose lower bound on P2P volume. M3 takes advantage of our ability to identify IPs participating in P2P flows as determined by M2, facilitating identification of flows for which payload analysis fails. M3 is used only in Section 7, where we examine the evolution of the volume of P2P traffic; there, we use M3 to overcome the problem of the limited 4-byte payload in our older traces. For all other analysis, payload P2P estimates are strictly based on payload string matching, namely M2.

² Since nothing prevents P2P clients from also using these ports, excluding specific protocols by looking at port numbers may result in underestimating P2P flows.

Recently, Sen et al. developed a similar signature-based payload methodology [28]. The authors concentrate on TCP signatures that characterize file downloads in five P2P protocols and identify P2P traffic based on the examination of all user payload bytes. The signatures in [28] are a subset of those included in our methodology, since we also use UDP-based as well as protocol signaling signatures for a larger number of P2P protocols/networks (e.g., the WinMx/OpenNap network is not analyzed in [28], although it corresponds to a significant portion of P2P traffic [17]). On the other hand, [28] has the advantage of examining all user payload bytes. While examining all bytes of the payload should increase the amount of identified P2P traffic, we expect only a minimal difference in the number of identified P2P flows between [28] and the methodology described in this section. First, characteristic signatures or bit strings of P2P packets appear at the beginning of user payload; thus, 16 bytes of payload should be sufficient to capture the majority of P2P flows. Second, we expect that flows missed due to the payload limitation will be identified by our M3 method and/or by TCP and UDP control traffic originating from the specific IPs.
5. NONPAYLOAD IDENTIFICATION OF P2P TRAFFIC
We now describe our nonpayload methodology for P2P traffic profiling (PTP). Our method examines only the packet header to detect P2P flows, and does not in any way examine user payload. To our knowledge, this is a first attempt to identify P2P flows on arbitrary ports without any inspection of user payload.

Our heuristics are based on observing connection patterns of source and destination IPs. While some of these patterns are not unique to P2P hosts, examining the flow history of IPs can help eliminate false positives and reveal distinctive features.

We employ two main heuristics that examine the behavior of two different types of pairs of flow keys. The first examines source-destination IP pairs that use both TCP and UDP to transfer data (TCP/UDP heuristic, Section 5.1). The second is based on how P2P peers connect to each other, by studying connection characteristics of {IP, port} pairs (Section 5.2). A high-level description of our algorithm is as follows:
• Data processing: We build the flow table as we observe packets cross the link, based on 5-tuples, similar to the payload method. At the same time we collect information on various characteristics of {IP, port} pairs, including the sets of distinct IPs and ports that an {IP, port} pair is connected to, packet sizes used and transferred flow sizes.
• Identification of potential P2P pairs: We flag potential flows as P2P based on TCP/UDP usage and the {IP, port} connection characteristics.
• False positives: We eliminate false positives by comparing flagged P2P flows against our set of heuristics that identify mail servers, DNS flows, malware, etc.

Table 3: Excluded ports for TCP/UDP IP pairs heuristic.

Ports                              Application
135, 137, 139, 445                 NETBIOS
53                                 DNS
123                                NTP
500                                ISAKMP
554, 7070, 1755, 6970, 5000, 5001  streaming
7000, 7514, 6667                   IRC
6112, 6868, 6899                   gaming
3531                               p2pnetworking.exe
5.1 TCP/UDP IP pairs
Our first heuristic identifies source-destination IP pairs that use both TCP and UDP transport protocols. Six out of nine analyzed P2P protocols use both TCP and UDP as layer-4 transport protocols: eDonkey, Fasttrack, WinMx, Gnutella, MP2P and Direct Connect. Generally, control traffic, queries and query-replies use UDP, while actual data transfers use TCP. To identify P2P hosts we can thus look for pairs of source-destination hosts that use both transport protocols (TCP and UDP).

While concurrent usage of both TCP and UDP is definitely typical for the aforementioned P2P protocols, it also occurs for other application-layer protocols such as DNS or streaming media. To determine which non-P2P applications in our traces use both transport protocols, we examined all source-destination host pairs for which both TCP and UDP flows exist. We found that besides P2P protocols, only a few applications use both TCP and UDP: DNS, NETBIOS, IRC, gaming and streaming, which collectively use a small set of port numbers such as 135, 137, 139, 445, 53, 3531, etc. Table 3 lists all such applications found, together with their well-known ports. Port 445 is related to the Microsoft NETBIOS service. Port 3531 is used by an application called p2pnetworking.exe, which is automatically installed by Kazaa. Although p2pnetworking.exe is related to P2P traffic, we choose to exclude it from our analysis since it is not under user control³ and is specific to the Kazaa client. Excluding flows using ports presented in Table 3, 98.5% of the remaining IP source-destination pairs that use both TCP and UDP in our traces are P2P, based on the payload analysis with M2 described in Section 4. In summary, if a source-destination IP pair concurrently uses both TCP and UDP as transport protocols, we consider flows between this pair P2P so long as neither the source nor the destination port is in the set in Table 3.
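The TCP/UDP rule can be sketched as follows. Treating each (source, destination) IP pair as ordered, and disqualifying a pair if any of its flows touches a Table 3 port, are our reading of the text rather than details it spells out; the names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

TCP, UDP = 6, 17

# Ports from Table 3 (NETBIOS, DNS, NTP, ISAKMP, streaming, IRC, gaming, p2pnetworking.exe).
TABLE3_PORTS = {
    135, 137, 139, 445, 53, 123, 500,
    554, 7070, 1755, 6970, 5000, 5001,
    7000, 7514, 6667, 6112, 6868, 6899, 3531,
}


def tcp_udp_pairs(flows: Iterable[Tuple[str, str, int, int, int]]) -> Set[Tuple[str, str]]:
    """Return source-destination IP pairs that use both TCP and UDP and no Table 3 port."""
    protocols = defaultdict(set)
    uses_excluded_port = set()
    for src, dst, proto, sport, dport in flows:
        pair = (src, dst)
        protocols[pair].add(proto)
        if sport in TABLE3_PORTS or dport in TABLE3_PORTS:
            uses_excluded_port.add(pair)
    return {
        pair
        for pair, seen in protocols.items()
        if {TCP, UDP} <= seen and pair not in uses_excluded_port
    }


flows = [
    ("A", "B", TCP, 3000, 4662), ("A", "B", UDP, 3001, 4672),  # P2P-like
    ("C", "D", TCP, 1000, 53), ("C", "D", UDP, 1001, 53),      # DNS: excluded
    ("E", "F", TCP, 1024, 8080),                               # TCP only
]
print(tcp_udp_pairs(flows))  # {('A', 'B')}
```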
5.2 {IP, port} pairs
Our second heuristic is based on monitoring connection patterns of {IP, port} pairs.

Since the lawsuit against Napster, the prevalence of centralized P2P networks has decreased dramatically, and distributed or hybrid P2P networks have emerged. To connect to these distributed networks, each P2P client maintains a starting host cache. Depending on the network, the host cache may contain the IP addresses of other peers, servers or supernodes/superpeers.⁴ This pool of hosts facilitates the initial connection of the new peer to the existing P2P network.

³ The user cannot change the port number or control its functionality, and all flows of p2pnetworking.exe use port 3531.

Figure 1: Initial connection from a new P2P host A to the P2P network. Host A connects to a superpeer picked from its host cache. Peer A informs the superpeer of its IP address and the port at which it is willing to accept connections from other peers. The superpeer propagates the {IP, port} pair to the rest of the P2P network. Peers willing to connect to host A use the advertised {IP, port} pair. For the {IP, port} pair {A,1}, the number of distinct IPs (C, B) connected to it is equal to the number of distinct ports (10, 15) used to connect to it. Our {IP, port} pair heuristic is based on such equality between the number of distinct ports and the number of distinct IPs affiliated with a pair in order to identify potential P2P pairs.
As soon as a connection exists to one of the IPs in the host cache (we will henceforth refer to these IPs as superpeers), the new host A informs that superpeer of its IP address and the port number at which it will accept connections from peers. Host A also provides other information specific to each P2P protocol but not relevant here. While in first-generation P2P networks the listening port was well-defined and specific to each network, simplifying P2P traffic classification, newer versions of all P2P clients allow the user to configure a random port number (some clients even advise users to change the port number to disguise their traffic). The superpeer must propagate this information, mainly the {IP, port} pair of the new host A, to the rest of the network. This {IP, port} pair is essentially the new host’s ID, which other peers need to use to connect to it. In summary, when a P2P host initiates either a TCP or a UDP connection to peer A, the destination port will be the advertised listening port of host A, and the source port will be an ephemeral random port chosen by the client.

Normally, peers maintain at most one TCP connection to each other peer, but there may also be a UDP flow to the same peer, as described previously. Keeping in mind that multiple connections between peers are rare in our data sets, we consider what happens when twenty peers all connect to peer A. Each peer will select a temporary source port and connect to the advertised listening port of peer A. The advertised {IP, port} pair of host A would thus be affiliated with 20 distinct IPs and 20 distinct ports⁵. In other words, for the advertised destination {IP, port} pair of host A, the number of distinct IPs connected to it will be equal to the number of distinct ports used to connect to it. Figure 1 illustrates the procedure whereby a new host connects to the P2P network and advertises its {IP, port} pair.

⁴ Superpeers/supernodes are P2P hosts that handle advanced functionality in the P2P network, such as routing and query propagation.
⁵ The probability that two distinct hosts pick the same random source port at the same time is extremely low.

On the other hand, consider what happens in the case of web and HTTP. As in the P2P case, each host connects to a pre-specified {IP, port} pair, e.g., the IP address of a web server W and port 80. However, a host connecting to the web server will usually initiate more than one concurrent connection in order to download objects in parallel. In summary, web traffic will have a higher ratio than P2P traffic of the number of distinct ports to the number of distinct IPs connected to the {IP, port} pair {W,80}.
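The contrast between the two cases can be reproduced with a toy example. The sketch counts, for each destination {IP, port} pair, the distinct source IPs and distinct source ports connected to it (source-side pairs would be handled symmetrically); host names, port numbers and connection counts are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Conn = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def pair_fan_in(conns: Iterable[Conn]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Map each destination {IP, port} pair to (#distinct IPs, #distinct ports)."""
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in conns:
        ips[(dst_ip, dst_port)].add(src_ip)
        ports[(dst_ip, dst_port)].add(src_port)
    return {pair: (len(ips[pair]), len(ports[pair])) for pair in ips}


# Twenty peers each open one connection to host A's advertised port.
p2p = [(f"peer{i}", 10000 + i, "A", 4321) for i in range(20)]
# Five web clients each open four parallel connections to server W on port 80.
web = [(f"client{c}", 20000 + 4 * c + k, "W", 80) for c in range(5) for k in range(4)]

stats = pair_fan_in(p2p + web)
print(stats[("A", 4321)])  # (20, 20): IPs == ports, P2P-like
print(stats[("W", 80)])    # (5, 20): far more ports than IPs, web-like
```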
5.3 Methodology
Our nonpayload methodology builds on insights from Sections 5.1 and 5.2. Specifically, for a time interval t we build the flow table for the link, based on the five-tuple key and 64-second flow timeout as with the payload methodology described in Section 4. We then apply our two primary heuristics:
• We look for source-destination IP pairs that concurrently use both TCP and UDP during t. If such IP pairs exist and they do not use any ports from Table 3, we consider them P2P.
• We examine all source {srcIP, srcport} and destination {dstIP, dstport} pairs during t (henceforth, “pairs” refers to both source and destination {IP, port} pairs). We seek pairs for which the number of distinct connected IPs is equal to the number of distinct connected ports. All pairs for which this equality holds are considered P2P. In contrast, if the difference between connected IPs and ports for a certain pair is large (e.g., larger than 10), we regard this pair as non-P2P.
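The decision rule for a single pair can be written down directly. The threshold of 10 follows the example in the text; returning “unclassified” for pairs that fall between the two cases is an assumption, since the text leaves them open, and the function name is hypothetical.

```python
def classify_pair(distinct_ips: int, distinct_ports: int, max_gap: int = 10) -> str:
    """Label an {IP, port} pair from the counts gathered during interval t."""
    if distinct_ips == distinct_ports:
        return "P2P"
    if abs(distinct_ports - distinct_ips) > max_gap:
        return "non-P2P"
    return "unclassified"  # assumption: left for other heuristics to decide


print(classify_pair(20, 20))  # P2P
print(classify_pair(5, 20))   # non-P2P
print(classify_pair(3, 6))    # unclassified
print(classify_pair(1, 1))    # P2P, although one connection is weak evidence
```

The last call shows why pairs with few connections are a main source of the false positives discussed in Section 5.4.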
These two simple heuristics efficiently classify most pairs as P2P or non-P2P. In particular, the {IP, port} heuristic can effectively distinguish P2P from non-P2P pairs given a sufficiently large sample of connections for the specific pair. For example, with a time interval t of 5 minutes there are no false positives for pairs with more than 20 connections in our February 2004 trace (D11 of Table 1). That is, for this specific trace, if an {IP, port} pair has more than 20 IPs connected to it, we can classify it with high confidence as P2P or non-P2P.

Whether a flow is considered P2P depends on the classification of its {IP, port} pairs. If one of the pairs in the 5-tuple flow key has been classified as P2P, the flow is deemed P2P. Similarly, if one of the pairs is classified as non-P2P, so is the flow. Additionally, if one of the IPs in a flow has been found to match the TCP/UDP heuristic, the flow is also considered P2P.
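Combining pair labels into a flow label might look like the sketch below. The text does not say what happens when one pair of a flow is labeled P2P and the other non-P2P; here P2P evidence is assumed to take precedence, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

FlowKey = Tuple[str, str, int, int, int]  # (src_ip, dst_ip, proto, src_port, dst_port)


def classify_flow(
    flow: FlowKey,
    pair_labels: Dict[Tuple[str, int], str],
    tcp_udp_ips: Set[str],
) -> str:
    """Derive a flow label from its two {IP, port} pairs and the TCP/UDP heuristic."""
    src, dst, _proto, sport, dport = flow
    labels = {pair_labels.get((src, sport)), pair_labels.get((dst, dport))}
    # Assumption: any P2P evidence wins over a non-P2P label on the other pair.
    if "P2P" in labels or src in tcp_udp_ips or dst in tcp_udp_ips:
        return "P2P"
    if "non-P2P" in labels:
        return "non-P2P"
    return "unclassified"


pair_labels = {("A", 4321): "P2P", ("W", 80): "non-P2P"}
print(classify_flow(("peer1", "A", 6, 10001, 4321), pair_labels, set()))  # P2P
print(classify_flow(("client0", "W", 6, 20000, 80), pair_labels, set()))  # non-P2P
print(classify_flow(("X", "Y", 17, 5000, 6000), pair_labels, {"Y"}))      # P2P
```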
5.4 False positives
We now describe heuristics developed to decrease the risk of false positives. Considering the diversity of backbone links, which feature a vast number of IPs and flows, we expect the previous methodology to yield false positives, i.e., to classify non-P2P pairs as P2P. False positives are most common in pairs with few connections, and are also more frequent for specific applications/protocols whose connection behavior matches the P2P profile of our heuristics (e.g., one connection per {IP, port} pair), such as e-mail (SMTP, POP), DNS and gaming.

To decrease the rate of false positives, we review the connection and flow history of all pairs for which the probability of a misclassification is high, e.g., when the source or destination port is 25, implying SMTP. Past flow history for these pairs enables accurate classification by investigating properties of specific IPs. In the following subsections, we describe heuristics that augment our basic methodology to limit the magnitude of false positives.
`
`5.4.1 Mail
`In our data sets, e-mail protocols such as Simple Mail
`Transfer Protocol (SMTP) or Post Oﬃce Protocol (POP)
`contribute most false positives. Mail false positives are not
`surprising since connection behavior resembles our {IP, port}
`heuristic. However, analysis of mail ﬂows and connection
`patterns allows for identiﬁcation of mail servers in our traces,
`forestalling misidentiﬁcation of traﬃc to such IP addresses
`as P2P.
`We examine all ﬂows where one of the port numbers is
`equal to 25 (SMTP), 110 (POP) or 113 (authentication ser-
`vice commonly used by mail servers). In fact we treat these
`three port numbers as one (we consider ports 110 and 113
`equal to 25), since for our purpose their behavior is the same.
We identify mail servers based on their port usage history, namely whether, within the same time interval t, they have some flows that use port 25 as the source port and others that use it as the destination port. The following observed flow pattern illustrates this characteristic behavior of mail servers through the usage of port 25 by IP 238.30.35.43:

src IP         dst IP           proto  srcport  dstport
238.30.35.43   115.78.57.213    6      25       3267
238.30.35.43   238.45.242.104   6      22092    25
238.30.35.43   0.32.132.109     6      25       50827
238.30.35.43   71.199.74.68     6      22175    25
238.30.35.43   4.87.3.29        6      21961    25
238.30.35.43   4.87.3.29        6      22016    25
238.30.35.43   4.170.125.67     6      25       3301
238.30.35.43   5.173.60.126     6      22066    25
238.30.35.43   5.173.60.126     6      22067    25
238.30.35.43   227.186.155.214  6      22265    25
238.30.35.43   227.186.155.214  6      22266    25
238.30.35.43   5.170.237.207    6      25       3872

This case shows flows for IP 238.30.35.43⁶ in which port 25 is the source port of some flows and the destination port of others. This behavior is characteristic of mail servers, which initiate connections to other mail servers to propagate e-mail messages. To identify this pattern, for each IP that appears in a source pair {IP,25}, we monitor the set of destination ports of all flows originating from that IP. If this set also contains port 25, we consider the IP a mail server and classify all of its flows as nonP2P. Symmetrically, for each IP that appears in a destination pair {IP,25}, we examine the set of source ports of all flows destined to that IP. In the above example, for the source pair {238.30.35.43,25}, the set of destination ports is [3267, 25, 50827, 3301, 3872]. Since port 25 appears in this set, we infer that IP 238.30.35.43 is a mail server and deem all of its flows nonP2P. We keep all IPs identified as mail servers in a mailserver list so that our heuristics are not applied to them in the future.

⁶Note that IP addresses are anonymized.

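The mail-server test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that flows arrive as 5-tuples grouped by time interval t and that the mailserver list is an in-memory Python set. The function names (`find_mail_servers`, `filter_interval`) are hypothetical, and the second, symmetric test reflects our reading of the text, using the source ports of flows destined to an IP. Applied to the flows of 238.30.35.43 listed above, the sketch flags that IP as a mail server, because 25 is among the destination ports of its flows.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

# Hypothetical flow layout: (src_ip, dst_ip, proto, src_port, dst_port).
FlowT = Tuple[str, str, int, int, int]

# 110 (POP) and 113 (auth) are treated as equal to 25, as in the text.
MAIL_PORT_ALIASES = {110: 25, 113: 25}

def canonical_port(port: int) -> int:
    return MAIL_PORT_ALIASES.get(port, port)

def find_mail_servers(flows: Iterable[FlowT]) -> Set[str]:
    """Find IPs that use port 25 as a source port in some flows and as a
    destination port in others; `flows` should cover one time interval t."""
    dst_ports_from = defaultdict(set)  # IP -> dst ports of flows it sends
    src_ports_to = defaultdict(set)    # IP -> src ports of flows it receives
    src_pair_25 = set()                # IPs seen in a source pair {IP,25}
    dst_pair_25 = set()                # IPs seen in a destination pair {IP,25}
    for src_ip, dst_ip, _proto, sport, dport in flows:
        sport, dport = canonical_port(sport), canonical_port(dport)
        dst_ports_from[src_ip].add(dport)
        src_ports_to[dst_ip].add(sport)
        if sport == 25:
            src_pair_25.add(src_ip)
        if dport == 25:
            dst_pair_25.add(dst_ip)
    # Test 1: an IP in a source pair {IP,25} that also sends flows to port 25.
    servers = {ip for ip in src_pair_25 if 25 in dst_ports_from[ip]}
    # Test 2 (assumed symmetric reading): an IP in a destination pair {IP,25}
    # that also receives flows whose source port is 25.
    servers |= {ip for ip in dst_pair_25 if 25 in src_ports_to[ip]}
    return servers

def filter_interval(flows: List[FlowT], mailserver_list: Set[str]) -> List[FlowT]:
    """Add newly found mail servers to the persistent list (updated in place) and
    return only the flows that touch no known mail server (the rest are nonP2P)."""
    mailserver_list.update(find_mail_servers(flows))
    return [f for f in flows if f[0] not in mailserver_list and f[1] not in mailserver_list]

# Flows of IP 238.30.35.43 from the example table (anonymized addresses).
S = "238.30.35.43"
EXAMPLE_FLOWS: List[FlowT] = [
    (S, "115.78.57.213", 6, 25, 3267), (S, "238.45.242.104", 6, 22092, 25),
    (S, "0.32.132.109", 6, 25, 50827), (S, "71.199.74.68", 6, 22175, 25),
    (S, "4.87.3.29", 6, 21961, 25), (S, "4.87.3.29", 6, 22016, 25),
    (S, "4.170.125.67", 6, 25, 3301), (S, "5.173.60.126", 6, 22066, 25),
    (S, "5.173.60.126", 6, 22067, 25), (S, "227.186.155.214", 6, 22265, 25),
    (S, "227.186.155.214", 6, 22266, 25), (S, "5.170.237.207", 6, 25, 3872),
]
```

Passing the same set to `filter_interval` on every interval mirrors the persistent mailserver list: once an IP is in it, its flows in later intervals are excluded without re-running the test.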
5.4.2 DNS
The Domain Name Serv