`
`-DocDtra lThese
`
`-Morp-hi x APPo eM APPo tbaPs dSaePy mMo xfMfSyMna
`ufePofPe xIIPaa
`
`xnepMoca()
`ihRRTrtnd ,rto
`
`Ant:hIbehMf lbeP)
`M220
`
`APoybfPfe DhfL)
`
`Tcc4ep::nDs/Dt.:g2/13M3:hcT9 r 220z23z0M
`
`khRpea g DhIPfaP)
`
`7R ID4Cts.Tc yDR IDNNhtosra meh UhtNscchn
`
`lTse 4r.h Pre .hRhtrchn rwcDNrcsoraaC w4DR nDPRaDrn utDN cTh
`sRuDtNrcsDR 4ahreh oDRewac cTh lhtNe Du weh /
`
`flE HwtsoT ihehrtoT IDaahocsDR / ZDt NDth
`
`flE FsLtrtC
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 1 of 307
`
`
`
`Diss. ETHNo. 15420
`
`MorphMix - A Peer-to-Peer-based
`System for Anonymous Internet
`Access
`
`A dissertation submitted to the
`
`SWISS FEDERAL INSTITUTE OF TECHNOLOGY
`
`ZURICH
`
`for the degree of
`Doctor of Technical Sciences
`
`presented by
`MARC RENNHARD
`
`Dipl. El.-Ing. ETH
`born 10th February 1972
`citizen of Böttstein (AG)
`
`accepted on the recommendation of
`Prof. Dr. Bernhard Plattner, examiner
`Dr. Laurent Mathy, co-examiner
`
`2004
`
`(Examination date 23rd January 2004)
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 2 of 307
`
`
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 3 of 307
`
`
`
`TIK-SCHRIFTENREIHE NR. 61
`
`Diss. ETHNo. 15420
`
`Marc Rennhard
`
`MorphMix-A Peer-to-Peer-based
`System for Anonymous
`Internet Access
`
`TIK-Schriftenreihe
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 4 of 307
`
`
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 5 of 307
`
`
`
`Abstract
`
`Contrary to popular belief, using the Internet is not anonymous at all. Since
`the Internet is a packet-switching network, every IP packet must carry the IP
`addresses of both communication endpoints. Consequently, anyone capable
`of observing at least one packet of a communication relationship can tell who
`is communicating with whom. The problem with this lack of anonymity is
`that it limits the privacy protection of Internet users. Today, privacy issues
`in the Internet are usually addressed by legislations that require operators of
`servers to clearly state their privacy practices and by encrypting the applica¬
`tion data exchanged between two communicating parties. In general, privacy
`practices are difficult to enforce and encrypting the application data does not
`hide the IP addresses in the IP packets. However, learning the endpoints of
`communications relationships often reveals a lot of information about indi¬
`vidual Internet users' preferences, habits, and problems; for instance when
`accessing web sites that provide medical information, religious sites, or the
`web site of a credit institution. These privacy issues can only be solved by
`enabling anonymous Internet communication.
`
`In this thesis, we work on the problem of achieving anonymous Internet
`access for low-latency applications such as web browsing. With anonymous
`Internet access, we mean that a client can connect to and communicate with
`a server such that the server does not learn the client's IP address and an at¬
`
`tacker interested in learning who is communicating with whom cannot find
`out the IP addresses of both client and server. Unlike encryption, anonymity
`cannot be "produced" by the communication endpoints themselves, but must
`be provided by a third party infrastructure. The concept of mix networks is
`widely considered to be the most promising approach for such an infrastruc¬
`ture, and consequently, we focus on mix networks in these thesis.
`
`The main contribution of our work is MorphMix, which fulfils the princi-
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 6 of 307
`
`
`
`11
`
`pal goal of this thesis: to develop a practical mix network that enables anony¬
`mous low-latency Internet access for a large number of users. With practical,
`we mean that (1) everybody owning a state-of-the-art computer connected to
`the Internet can use the system, (2) the performance it offers is good enough
`such that users won't turn away from it, (3) it provides good protection from
`attacks by a realistic adversary, and (4) it scales well and can handle a large
`number of users.
`
`We first analyse traditional mix networks that strictly separate between
`the mix network infrastructure and clients that access servers through the mix
`network. The conclusion is that traditional mix networks are not well suited
`
`to achieve our principal goal for various reasons. To overcome their limita¬
`tions, we propose MorphMix, which presents a novel way of operating and
`organising a mix network. In contrast to traditional mix networks, MorphMix
`does no longer distinguish between clients and the mix network. Rather, the
`clients themselves build the mix network infrastructure in a peer-to-peer fash¬
`ion. After describing the basic functionality of MorphMix, we give detailed
`analyses to show that MorphMix scales very well and provides good protec¬
`tion from a realistic adversary. To analyse the performance MorphMix offers
`to its users, we have implemented a simulator. The simulation results show
`that the expected performance of MorphMix is indeed good enough to attract
`users, and that the requirements to use MorphMix are modest. We have also
`specified the complete MorphMix protocol and have implemented a proto¬
`type. The main conclusion of our work is that with respect to our principal
`goal, MorphMix overcomes the limitations of traditional mix networks and is
`the first practical system that enables anonymous low-latency Internet access
`for a large number of users.
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 7 of 307
`
`
`
`Zusammenfassung
`
`Entgegen der weit verbreiteten Meinung ist die Benutzung des Internets nicht
`anonym. Weil das Internet ein paketvermittelndes Netwerk ist, muss jedes
`IP-Paketdie TP-Adressen beider Kommunikationsendpunkte enthalten. Folg¬
`lich kann jeder, der mindestens ein Paket einer Kommunikationsbeziehung
`beobachtet, sagen, wer mit wem kommuniziert. Dieses Problem der fehlen¬
`den Anonymität führt dazu, dass der erreichbare Schutz der Privatsphäre von
`Internetbenutzern limitiert wird. Derzeit wird die Privatsphäre im Internet
`üblicherweise so geschützt, dass Gesetze erlassen werden, welche die Be¬
`treiber von Servern verpflichten, ihre Praktiken im Umgang mit vertraulichen
`Benutzerdaten publik zu machen. Zusätzlich können die Anwendungsdaten,
`die zwischen den Kommunikationspartnern übertragen werden, mittels Ver¬
`schlüsselung geschützt werden. Im Allgemeinen ist es jedoch schwierig zu
`überprüfen, ob die Betreiber ihre publizierten Praktiken einhalten, und trotz
`Verschlüsselung der Anwendungsdaten sind die IP-Adressen der Kommu¬
`nikationspartner immer noch in den IP-Paketen sichtbar. Die Information,
`wer mit wem kommuniziert, liefert jedoch häufig bereits Erkenntnisse über
`die Vorlieben, Gewohnheiten und Probleme von individuellen Internetbenut¬
`zern, zum Beispiel wenn Daten von einem Webserver mit medizinischen oder
`religiösen Inhalten heruntergeladen werden oder wenn der Webserver eines
`Kreditinstituts kontaktiert wird.
`Solche Probleme betreffend des Schutzes
`
`der Privatsphäre können nur durch anonyme Internetkommunikation gelöst
`werden.
`
`In dieser Arbeit beschäftigen wir uns mit dem Problem der Anonymisie¬
`rung zeitkritischer Internetanwendungen wie Web-Browsing. Unter Anony¬
`misierung verstehen wir, dass ein Client eine Verbindung zu einem Server
`aufbauen und mit diesem kommunizieren kann, ohne dass der Server die
`IP-Adresse des Clients erfährt. Darüber hinaus darf ein Angreifer, der er-
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 8 of 307
`
`
`
`IV
`
`fahren möchte, wer mit wem kommuniziert, nicht zugleichbeide IP-Adressen
`von Client und Server herausfinden. Im Gegensatz zu Verschlüsselung kann
`Anonymität nicht von den Kommunikationspartnern selbst "erzeugt" wer¬
`den, sondern muss von einer Infrastruktur, welche von Dritten betrieben wird,
`gewährleistet werden. In der Forschungsgemeinde wird angenommen, dass
`das Konzept der Mix-Netzwerke am besten geeignet ist, eine solche Infra¬
`struktur bereit zu stellen. Folglich beschränken wir uns in dieser Arbeit auf
`Mix-Netzwerke.
`
`Der Hauptbeitrag dieser Arbeit ist das System MorphMix, welches unser
`Hauptziel erfüllt: ein praktikables Mix-Netwerk zu entwickeln, welches den
`anonymen Internetzugang für eine grosse Zahl von Benutzern ermöglicht.
`Unter praktikabel verstehen wir, dass (1) jeder, der einen zeitgemässen Com¬
`puter besitzt, von dem System Gebrauch machen kann, dass (2) die Perfor-
`manz des Systems für anwenderfreundliche Nutzung ausreicht, dass es (3)
`guten Schutz vor Attacken eines realistischen Angreifers bietet und dass es
`(4) gut skaliert und viele Benutzer gleichzeitig unterstützen kann.
`
`Wir analysieren zuerst traditionelle Mix-Netzwerke welche strikt zwi¬
`schen der Mix-Netzwerk Infrastruktur und den Clients, die mit Servern durch
`das Mix-Netzwerk kommunizieren, unterscheiden. Es wird gezeigt, dass sich
`traditionelle Mix-Netzwerke nicht gut eignen, um unser Hauptziel zu errei¬
`chen. Deshalb schlagen wir das System MorphMix vor, welches eine neue
`Art des Betriebs und der Organisation eines Mix-Netzwerks darstellt. Im
`Gegensatz zu traditionellen Mix-Netzwerken unterscheidet MorphMix nicht
`zwischen Clients und dem Mix-Netzwerk. Vielmehr bilden die Clients selbst
`
`die Mix-Netzwerk Infrastruktur auf Peer-to-Peer Basis. Nach der Beschrei¬
`
`bung der grundlegenden Funktionalität von MorphMix liefern wir detaillierte
`Analysen, welche zeigen, dass MorphMix sehr gut skaliert und guten Schutz
`vor einem realistischen Angreifer bietet. Um die Performanz von MorphMix
`zu analysieren, haben wir einen MorphMix Simulator implementiert. Die
`Simulationsresultate zeigen, dass die erwartete Performanz die Benutzerzu¬
`friedenheit gewährleisten kann, und dass die Hardwareanforderungen von
`MorphMix von jedem zeitgemässen Computer erfüllt werden können. Des
`Weiteren haben wir das vollständige MorphMix-Protokoll spezifiziert und
`einen Prototypen implementiert. Insgesamt wird gezeigt, dass MorphMix
`unter Berücksichtigung unseres Hauptziels signifikante Vorteile im Vergle¬
`ich zu traditionellen Mix-Netzwerken aufweist und das erste praktikable Sys¬
`tem darstellt, welches die Anonymisierung zeitkritischer Internetanwendun¬
`gen für eine grosse Benutzerbasis ermöglicht.
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 9 of 307
`
`
`
`Contents
`
`1
`
`Introduction
`
`1.1
`
`1.2
`
`Invading Privacy at the Application Level
`
`Invading Privacy at the Network Level
`
`1.3 Why do we need Anonymity
`
`1.4
`
`Benefits versus Drawbacks
`
`1.5
`
`Terminology and Definitions
`
`1.6
`
`Problem Statement and Contributions of this Work
`
`1.7
`
`Outline
`
`2
`
`The Mix Network Approach
`
`2.1
`
`The Mix Network Idea and Terminology
`
`2.2 Mix Networks based on Chaumian Mixes
`
`2.2.1
`
`2.2.2
`
`Basic Functionality
`
`Measures to Maintain the Sender's Anonymity ...
`
`2.2.3
`
`Basic Attacks on Mix Networks
`
`2.3
`
`Circuit-based Mix Networks
`
`2.3.1
`
`2.3.2
`
`Basic Functionality
`
`Measures to Maintain the Client's Anonymity . ...
`
`2.3.3
`
`Attacks on Circuit-based Mix Networks
`
`2.3.4
`
`Ways of Operating Circuit-based Mix Networks .
`
`.
`
`.
`
`2.4
`
`Summary
`
`3
`
`Related Work
`
`3.1 Mix Networks Designs and Implementations
`
`1
`
`3
`
`7
`
`8
`
`11
`
`12
`
`14
`
`15
`
`17
`
`17
`
`25
`
`26
`
`28
`
`29
`
`30
`
`31
`
`35
`
`35
`
`37
`
`40
`
`41
`
`41
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 10 of 307
`
`
`
`Contents
`
`3.1.1
`
`Chaumian Mix Networks
`
`3.1.2
`
`Circuit-Based Mix Networks
`
`3.2 Mix Networks Analysis and Attacks
`
`3.3
`
`Techniques beyond Mix Networks
`
`3.3.1
`
`3.3.2
`
`3.3.3
`
`3.3.4
`
`Simple Remailers
`
`Proxy Forwarders
`
`Broadcast-Based Approaches
`
`Anonymous Publishing
`
`Other Applications
`
`Economics of Anonymity and Reputation
`
`Measuring Anonymity
`
`Summary
`
`3.4
`
`3.5
`
`3.6
`
`3.7
`
`A Detailed Analysis of Mix Networks
`
`4.1 Why Anonymity is so Hard
`
`4.1.1
`
`Global Passive External Attackers
`
`4.1.2
`
`Partial Active Internal Attackers
`
`4.1.3
`
`Summary
`
`4.2 A Quantitative Analysis of Mix Networks
`
`4.2.1
`
`No Dummy Traffic
`
`4.2.2 Dummy Traffic between Clients and Mixes
`
`4.2.3
`
`End-to-End Dummy Traffic
`
`4.2.4
`
`Mix Cascades
`
`4.2.5
`
`Summary
`
`4.3 A Realistic Threat Model
`
`4.3.1
`
`The Passive External Attacker
`
`4.3.2
`
`The Active Internal Attacker
`
`4.3.3
`
`Summary
`
`4.4
`
`Comparison of Mix Network Approaches
`
`4.4.1
`
`Static Mix Networks as Commercial Services ....
`
`4.4.2
`
`4.4.3
`
`4.4.4
`
`Static Mix Networks Operated by Volunteers ....
`
`Dynamic, Peer-to-Peer-based Mix Networks ....
`
`Summary
`
`4.5
`
`Conclusions
`
`42
`
`44
`
`49
`
`54
`
`55
`
`55
`
`56
`
`57
`
`60
`
`61
`
`62
`
`62
`
`64
`
`64
`
`66
`
`70
`
`71
`
`72
`
`73
`
`75
`
`77
`
`78
`
`80
`
`82
`
`82
`
`85
`
`86
`
`87
`
`87
`
`90
`
`91
`
`92
`
`93
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 11 of 307
`
`
`
`Contents
`
`5 MorphMix
`
`5.1
`
`Motivation and Goals
`
`5.2
`
`Basic Functionality of MorphMix
`
`5.2.1
`
`Overview
`
`5.2.2
`
`5.2.3
`
`5.2.4
`
`Anonymous Tunnels and Anonymous Connections .
`
`Cells and Messages
`
`Anonymous End-to-End Communication
`
`5.3
`
`Requirements to Break the Anonymity
`
`5.4
`
`ThreatModel
`
`5.4.1
`
`The Passive External Attacker
`
`5.4.2
`
`The Active Internal Attacker
`
`5.4.3
`
`Summary
`
`5.5
`
`Establishing Anonymous Tunnels
`
`5.5.1
`
`5.5.2
`
`5.5.3
`
`5.5.4
`
`Anonymous Tunnel Setup
`
`Analysis of the Anonymous Tunnel Setup
`
`Policy For Using the Virtual Links to Neighbours .
`
`.
`
`Why Relaying Data for Other Nodes is Good ....
`
`5.6
`
`Collusion Detection Mechanism
`
`5.6.1
`
`Correlation and Correlation Distribution
`
`95
`
`96
`
`97
`
`98
`
`99
`
`101
`
`102
`
`106
`
`108
`
`109
`
`109
`
`112
`
`113
`
`113
`
`119
`
`122
`
`123
`
`123
`
`124
`
`5.6.2
`
`Selection Size and Size of Extended Selections List .
`
`128
`
`5.6.3
`
`Detecting Malicious Tunnels
`
`5.7
`
`Peer Discovery Mechanism
`
`5.7.1
`
`5.7.2
`
`5.7.3
`
`Initial Peer Discovery
`
`Continuous Peer Discovery
`
`Organising and Accessing Information about other
`Nodes
`
`5.8
`
`Scalability and Requirements to Run a Node
`
`5.8.1
`
`Scalability and General Requirements
`
`5.8.2 NAT Gateways and Dynamic IP Addresses
`
`5.9 An Outlook on IPv6
`
`5.10 Summary
`
`6
`
`Attacks on MorphMix
`
`6.1
`
`Basic Attack Model
`
`6.1.1
`
`The Node Simulator
`
`129
`
`131
`
`131
`
`133
`
`133
`
`140
`
`140
`
`142
`
`144
`
`148
`
`151
`
`151
`
`153
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 12 of 307
`
`
`
`vin
`
`Contents
`
`6.1.2
`
`Basic Scenario
`
`6.2
`
`Varying the Attack Level
`
`6.2.1
`
`6.2.2
`
`6.2.3
`
`The Adversary Attacks Always
`
`The Adversary Attacks Selectively
`
`Summary
`
`6.3
`
`Attacks Including Malicious Witnesses
`
`6.4
`
`Denial of Service Attacks
`
`6.5
`
`Exploiting the Peer Discovery Mechanism
`
`6.6 Why Counting the Occurrences of Subnets does not Work .
`
`.
`
`6.7
`
`Summary
`
`7
`
`Analysis of the Collusion Detection Mechanism
`
`7.1
`
`Joining MorphMix for the first Time
`
`7.2
`
`Honest and Malicious Nodes in the same/16 Subnet
`
`7.3
`
`Large Realistic Systems
`
`7.3.1
`
`The Nodes have Abundant Capabilities and are Con¬
`tinuously Participating in MorphMix
`
`7.3.2
`
`The Nodes have Different Capabilities and Up-Times
`
`7.4
`
`Optimising the Quality of Anonymous Tunnels
`
`154
`
`155
`
`155
`
`160
`
`161
`
`164
`
`165
`
`168
`
`170
`
`171
`
`174
`
`174
`
`177
`
`180
`
`180
`
`182
`
`188
`
`7.5
`
`The Subnets Contain Different Numbers of Honest Nodes .
`
`.
`
`193
`
`7.6
`
`7.7
`
`Varying the Tunnel Length
`
`Summary
`
`8 MorphMix Simulation and Results
`
`8.1
`
`8.2
`
`The MorphMix Simulator
`
`Basic Simulator Settings
`
`8.2.1
`
`8.2.2
`
`8.2.3
`
`8.2.4
`
`8.2.5
`
`Protocol Settings
`
`Virtual Link Settings
`
`Tunnel Settings
`
`Node Settings
`
`Web Browsing Scenario Settings
`
`8.3
`
`Simulation Results
`
`8.3.1
`
`8.3.2
`
`8.3.3
`
`Contacting the Web Server Directly
`
`Contacting the Web Server through MorphMix .
`
`.
`
`.
`
`Optimising the Throughput of Anonymous Tunnels .
`
`194
`
`194
`
`198
`
`198
`
`200
`
`200
`
`201
`
`201
`
`202
`
`204
`
`205
`
`205
`
`207
`
`208
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 13 of 307
`
`
`
`Contents
`
`8.3.4
`
`8.3.5
`
`8.3.6
`
`8.3.7
`
`8.3.8
`
`8.3.9
`
`Using Multiple Anonymous Tunnels in Parallel .
`
`.
`
`.
`
`Bandwidth Usage and Overhead
`
`The Influence of Failed Tunnel Setups and Rejected
`Tunnels
`
`The Influence of the Tunnel Length
`
`The Influence of the Cell Length
`Crashing Nodes and Blocked Virtual Links
`
`8.4
`
`Summary
`
`9
`
`Conclusions
`
`9.1
`
`9.2
`
`9.3
`
`Summary
`
`Achievement of Goals and Assessment
`
`Comparison with Other Systems
`
`9.3.1
`
`9.3.2
`
`Comparison with Crowds
`
`Comparison with Tarzan
`
`9.4
`
`FurtherWork
`
`Bibliography
`
`Acknowledgements
`
`Biography
`
`A MorphMix Protocol and Prototype Implementation
`
`A.l
`
`Notation
`
`A.2 Basic Protocol Properties
`
`A.2.1
`
`Cryptographic Algorithms
`
`A.2.2
`
`Cell Format
`
`A.2.3 Node Levels
`
`A.2.4
`
`Encoding
`
`A. 3 Messages between Neighbours
`
`A.3.1
`
`Establishing a Virtual Link
`
`A.3.2
`
`Appending a Node to a Tunnel
`
`A.3.3
`
`Peer Discovery Messages
`
`A.3.4
`
`Virtual Link Status Information Messages
`
`A.3.5
`
`Terminating an Anonymous Tunnel
`
`ix
`
`213
`
`214
`
`220
`
`221
`
`224
`
`225
`
`227
`
`231
`
`231
`
`234
`
`237
`
`238
`
`240
`
`242
`
`245
`
`257
`
`259
`
`260
`
`260
`
`261
`
`261
`
`262
`
`264
`
`264
`
`267
`
`267
`
`268
`
`269
`
`271
`
`271
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 14 of 307
`
`
`
`x
`
`Contents
`
`A.3.6
`
`Flow Control Messages
`
`A.3.7
`
`Virtual Link Data Messages
`
`A.4 End-to-End Messages
`
`A.4.1
`
`Appending a Node to a Tunnel
`
`272
`
`273
`
`274
`
`274
`
`A.4.2
`
`Initiating and Terminating an Anonymous Connection 275
`
`A.4.3
`
`End-to-End Status Information Messages
`
`A.4.4
`
`End-to-end Data Messages
`
`A.5 Virtual Link and Tunnel Usage
`Virtual Links and Tunnel Lifetimes
`
`A.5.1
`
`A.5.2
`
`Policy for Using Virtual Links
`
`A.6 Quantitative Analysis of the Data Overhead
`
`A.6.1
`
`A.6.2
`
`A.6.3
`
`Tunnel Setup and Teardown Overhead
`Virtual Link Setup Overhead
`Virtual Link Status Information Overhead
`
`A.6.4
`
`End-to-End Status Information Overhead
`
`A.6.5
`
`Other Protocol Overhead
`
`A.6.6
`
`Protocol Overhead Summary
`A. 7 MorphMix Prototype Implementation
`
`277
`
`278
`
`278
`
`279
`
`280
`
`281
`
`281
`
`284
`
`285
`
`285
`
`285
`
`286
`
`286
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 15 of 307
`
`
`
`List of Figures
`
`2.1
`
`2.2
`
`2.3
`
`2.4
`
`The mix overlay network and the underlying physical network.
`
`Sending application data directly from hc to hs
`
`Sending application data via the mix network from h c to hs.
`
`19
`
`20
`
`22
`
`Sending application data AD through a Chaumian mix network. 26
`
`2.5 A circuit-based mix network
`
`2.6
`
`Different ways of operating circuit-based mix networks. .
`
`.
`
`.
`
`Traffic analysis at a mix
`
`End-to-end Traffic analysis
`
`End-to-end Traffic analysis by an internal attacker
`
`Basic idea of MorphMix
`
`32
`
`37
`
`66
`
`68
`
`71
`
`98
`
`4.1
`
`4.2
`
`4.3
`
`5.1
`
`5.2
`
`5.3
`
`5.4
`
`5.5
`
`Multiple anonymous connections within one anonymous tunnel. 100
`
`Virtual links and layers of encryption along an anonymous
`tunnel
`
`Anonymous connections and cell forwarding
`
`Appending a node to a tunnel and establishing the layer of
`encryption
`
`5.6
`
`Correlation distribution with 10000 nodes
`
`5.7 Node Lookup list
`
`6.1
`
`6.2
`
`fam if the adversary attacks always with the same attack level. 156
`
`Correlation distribution when varying the attack level from
`0-14
`
`6.3
`
`fam if the adversary attacks with different attack levels. .
`
`.
`
`.
`
`101
`
`103
`
`117
`
`127
`
`135
`
`157
`
`159
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 16 of 307
`
`
`
`xii
`
`List of Figures
`
`6.4
`
`6.5
`
`6.6
`
`6.7
`
`6.8
`
`fam if the adversary attacks always with the same attack level
`but only if he controls the first intermediate node
`
`fam if the adversary attacks with different attack levels but
`only if he controls the first intermediate node
`
`fam if the adversary attacks with different attack levels but
`only if he controls the first intermediate node, assuming he
`does not always guess correctly
`
`fam if the adversary hopes for a malicious witness when the
`final node is appended
`
`fam if the adversary attacks always with the same attack level
`and refuses to forward data along any tunnel where he con¬
`trols at least one node but not both the first intermediate and
`
`the final node
`
`161
`
`162
`
`163
`
`165
`
`166
`
`6.9
`
`Occurrences of/16 subnets in the extended selections list. .
`
`.
`
`171
`
`6.10 fam without and with collusion detection, and if the adversary
`plays fair.
`
`7.1
`
`Performance with 10000 nodes
`
`7.2
`
`Performance with 50000 nodes
`
`7.3
`
`7.4
`
`7.5
`
`7.6
`
`7.7
`
`7.8
`
`7.9
`
`fam depending on the fraction controlled by the adversary in
`subnets with malicious nodes
`
`100000 honest, 10000 malicious nodes (abundant capabili¬
`ties, always participating)
`
`1000000 honest, 10000 malicious nodes (abundant capabili¬
`ties, always participating)
`
`100000 honest, 10000 malicious nodes (different capabilities
`and participation probabilities)
`
`1000000 honest, 10000 malicious nodes (different capabili¬
`ties and participation probabilities)
`
`100000 honest, 10000 malicious nodes (with tunnel optimi¬
`sation)
`
`1000000 honest, 10000 malicious nodes (with tunnel optimi¬
`sation)
`
`7.10 1000000 honest, 10000 malicious nodes (different numbers
`of honest nodes in the/16 subnets)
`
`173
`
`175
`
`176
`
`179
`
`181
`
`182
`
`185
`
`188
`
`190
`
`192
`
`193
`
`7.11 1000000 honest, 10000 malicious nodes (diff. tunnel lengths). 195
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 17 of 307
`
`
`
`List of Figures
`
`Download times, accessing the web server directly
`
`Download times, accessing the web server through MorphMix. 207
`
`Download times with optimised tunnel throughput
`
`208
`
`Download times with even more optimised tunnel throughput. 209
`
`xni
`
`206
`
`8.1
`
`8.2
`
`8.3
`
`8.4
`
`8.5
`
`8.6
`
`8.7
`
`8.8
`
`8.9
`
`Ratio between the download times when accessing the web
`server through MorphMix and directly using HTTP 1.1. .
`
`.
`
`.
`
`Download times for a single file
`
`Download times for a single file (more detailed illustration).
`
`Download times using multiple tunnels in parallel
`
`Bandwidth usage
`
`210
`
`211
`
`212
`
`213
`
`216
`
`217
`
`219
`
`8.10 Data sent and received by the nodes
`
`8.11 Download times bandwidth usage with reading time = 0.
`
`.
`
`.
`
`8.12 Data sent and received by the nodes with reading time = 0.
`
`220
`
`.
`
`8.13 Download times with failed and rejected tunnels
`
`8.14 Bandwidth usage with failed and rejected tunnels
`
`8.15 Download times depending on the tunnel length
`
`8.16 Bandwidth usage depending on the tunnel length
`
`8.17 Download times using different cell lengths
`
`8.18 Download times when nodes crash or are temporarily blocked
`from their neighbours
`
`A.l
`
`Cell format.
`
`221
`
`221
`
`222
`
`223
`
`225
`
`227
`
`262
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 18 of 307
`
`
`
`List of Tables
`
`4.1 Minimum capacities to support 100000 users (web browsing,
`5 MB per day and user)
`
`Realistic bandwidth distribution of MorphMix nodes
`
`Acceptable intermediate and final nodes
`
`81
`
`183
`
`189
`
`7.1
`
`7.2
`
`8.1
`
`8.2
`
`Data volume depending on the cell length (all lengths in bytes) 224
`
`Percentage of pages that failed during their first download. .
`
`228
`
`A. 1 Node levels in MorphMix
`
`A.2 Encoding of fields in the pay load
`
`A.3 Encoding of message types
`
`A.4 Fields and cells to establish a virtual link
`
`A. 5
`
`Fields and cells (corresponding to messages 4-9) to append a
`node to an anonymous tunnel
`
`A.6 Fields and cells to learn about other nodes
`
`A.7 Fields and cells to exchange status mformationbetween neigh¬
`bours
`
`A. 8
`
`Cell to terminate an anonymous tunnel
`
`A. 9
`
`Cell to reset the credit of a tunnel on a virtual link
`
`A. 10 Cell to transport end-to-end messages
`
`A. 11 Fields and cell payloads (corresponding to messages 1-3 and
`10) to append a node to an anonymous tunnel
`
`A. 12 Fields and cell payloads to initiate and terminate anonymous
`connections
`
`264
`
`265
`
`266
`
`268
`
`270
`
`271
`
`272
`
`272
`
`273
`
`273
`
`276
`
`277
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 19 of 307
`
`
`
`List of Tables
`
`A. 13 Fields and cell payloads to exchange status information be¬
`tween endpoints of a tunnel
`
`A. 14 cell payloads to transport end-to-end data
`
`A. 15 Overhead for the initiator to set up a tunnel
`A. 16 Overhead for the ith intermediate node to set up a tunnel. .
`A. 17 Overhead for the final node to set up a tunnel
`
`A. 18 Overhead for a witness to set up a tunnel
`
`A. 19 Overhead summary to set up and tear down a tunnel with
`lengthfive
`
`xv
`
`278
`
`278
`
`281
`
`283
`
`283
`
`284
`
`282
`
`.
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 20 of 307
`
`
`
`Chapter 1
`
`Introduction
`
`Until the early 1990s, the Internet was mainly an academic research network
`where security and privacy issues were of little importance. However, driven
`by the huge popularity of the World Wide Web (WWW) due to graphical web
`browsers, the Internet has become a platform used by hundreds of millions of
`people everyday for activities that often have been shifted from the physical
`to the online world.
`
`Soon, it was recognised that especially the growth of e-commerce calls
`for some security mechanisms. There is no universal definition of computer
`or network security because it always depends on what must be protected, but
`in the Internet context, security often means secure communication, which
`can be defined as follows1 :
`
`Definition 1 Secure Communication between two parties A andB is defined
`as a communication relationship with the following three properties:
`
`• Confidentiality: data exchanged between A and B cannot be read by
`an eavesdropper
`• Integrity: data exchanged between A and B cannot be altered (acci¬
`dentally or intentionally) in transit in a way that is not detectable by
`the recipient
`• Authentication: A (or B) can be sure she is indeed communicating
`with B (or A)
`
`1 Other security properties include availability and non-repudiation, which are less important
`for secure communication
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 21 of 307
`
`
`
`2
`
`1 Introduction
`
`Applied to an e-commerce scenario, a customer should be sure she is
`indeed communicating with the e-store she intends to and transmitting the
`payment information from the user to the e-store should be protected from
`modification or observation by third parties. Using the secure socket layer
`(SSL) protocol [51] or its successor, the transport layer security (TLS) pro¬
`tocol [33], together with X.509 digital certificates [60] solves these problems
`and brings confidentiality, authenticity, and integrity to the Internet. Another
`protocol for secure communication is IPSec [66], which "patches" the IP pro¬
`tocol [40] with security mechanisms to enable, for example, virtual private
`networks (VPNs) [108]. To put it briefly, regarding e-commerce and other
`business transactions in the Internet and leaving out mobile and ad-hoc net¬
`working scenarios, the basic security problems are well understood, solved,
`and widely accepted and deployed.
`
`Beyond security, there is privacy. Applied to the Internet context, privacy
`can be defined as follows [54] :
`
`Definition 2 Privacy refers to the ability of an individual to control the in¬
`formation about herself. This does not necessarily mean that no information
`is revealed to anyone. Rather, a system that respects the privacy of its users
`allows them to select what information about them is revealed, and to whom.
`
`Unlike security, mechanisms to bring more privacy to the Internet are
`not yet widely deployed. One proposal is the World Wide Web Consortium
`(W3C)'s Platform for Privacy Preferences (P3P) project [93]. Its goal is to
`clarify a web site's privacy practices to its users. When a web site or e-
`shop is contacted, the user is informed of the privacy practices of that site.
`The user can accept these practices, reject them and ask for an alternative
`proposal, or send another proposal herself. If an agreement between user and
`web site is reached, the communication continues, otherwise it is terminated.
`However, P3P only specifies the protocol for exchanging structured data to
`reach an agreement, but it cannot do anything to enforce the privacy practices
`a web site has proposed: even if an e-shop has promised not to give away
`information about the user to third parties, there is no way for the user to
`check if the e-shop complies with the rules.
`
`This thesis is primarily about anonymity, which is closely related to pri¬
`vacy. We introduce the term anonymity more formally in Section 1.5, but for
`now, anonymity can be defined as follows [54] :
`
`Definition 3 Anonymity means privacy of identity. A system that offers
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 22 of 307
`
`
`
`1.1 Invading Privacy at the Application Level
`
`3
`
`anonymity is one where the user gets control who learns her identity. In the
`Internet context, identity not only means the true name of the user, but also
`her e-mail or IP address.
`
`For completeness, we also introduce the term pseudonymity, which is a
`special case of anonymity:
`
`Definition 4 Pseudonymity enhances anonymity with pseudonymous iden¬
`tities. Such identities are also called nyms [13] and make it possible for an
`Internet user to have an identity while her true name is kept secret. There are
`nyms that are completely unrelated to the individual's real identity (for in¬
`stance a self-chosen alias) and other nyms that make it possible to unambigu¬
`ously uncover its owner's identity if certain conditions are met (for instance
`if a court order has been issued).
`
`In the remainder of this chapter, we first discuss common practices in use
`today, which invade the privacy of Internet users. We also point out possibili¬
`ties and limitations for a user to protect herself from such invasions and show
`that anonymous Internet access is one way to overcome these limitations. We
`discuss benefits and drawbacks of anonymity and finally, we state the main
`goal of our work and describe our contributions.
`
`1.1
`
`Invading Privacy at the Application Level
`
`One way to invade the privacy of Internet users is to do so at the applica¬
`tion level. A prominent example is tracking users as they navigate through
`the WWW, which is possible by combining several mechanisms of how web
`browsers access web pages using the hypertext transfer protocol (HTTP) [8,
`45] and its secure version HTTPS [105]. We briefly describe these mecha¬
`nisms below:
`
`• The HTTP référer2 field in the header of HTTP requests field tells a
`web server or an eavesdropper the uniform resource locator (URL) of
`the web object the user has downloaded before. For instance, if a web
`site is accessed from the results page of a search engine, it contains the
`entire search string the user entered.
`
`2Note that référer (for referrer) was spelt incorrectly in the original standard and made it
`into the first implementations of the HTTP protocol For backward compatibility, the misspelled
`word is still used in newer implementations of the protocol as of today
`
`Code200, UAB v. Bright Data Ltd.
`Code200's Exhibit 1008
`Page 23 of 307
`
`
`
`4
`
`1 Introduction
`
`• Cookies [69] were originally invented to create a session over several
`HTTP request/reply pairs, thereby allowing a web server to track a user
`as she navigates through different web pages at the same site. A cookie
`is a small piece of information that a web server sends to the browser
`within an HTTP reply. If a page at the same site is requested later using
`the same browser, the cookie is sent to the web server as part of the
`HTTP request, which allows the web server to recognise subsequent
`If a user ever registers at the web site, the
`visits by the same user.
`server can associate her identity with the pages she has visited and form
`a profile of her browsing interests.
`• Embedded objects of a web page are automatically downloaded by
`the web browser from their respective servers. Embedded objects must
`not reside on the same server as the page containing them, but can
`be locates on any server. One type of embedded objects are banners
`for advertisement, and many institutions allow third parties to place
`banners on their web pages in return for monetary compensation.
`
`Combining HTTP referers, cookies, and embedded objects (usually in
`the form of banners) make it possible for a third party to track users across
`different web sites and accumulate detailed profiles. To do so, a company C
`interested in collecting data about users places banners on several web pages
`If a user downloads a page containing a banner from C,
`at different sites.
`the browser automatically requests the banner form C's web server. Since the
`browser also includes the HTTP référer in the request, C learns the URL of
`the page the user is downloading. When sending the HTTP reply containing
`the banner, C's web server includes a cookie, which is stored in the user's
`browser. If the user later visits the same or another web site that contains a
`
`banner from C, the browser includes this cookie in the HTTP requests to fetch
`the banner, which allows C to recognise the user.
`
`While this does not seem to be a significant loss of privacy at first glance,
`it gets much more serious when looking at a real example. We visit Health-
`Central, com, a site providing medical information, and enter "cancef ' in the
`search form on their entry page, which results in an HTTP request sent to the
`web server that contains the following fields:
`
`GET /search asp'?query=cancer HTTP/1 1
`Host search healthcentral com
`
`Référer http //www healthcentral com/home/home cfm
`
`Code200,