`
`Michael J. Freedman
`NYU Dept of Computer Science
`715 Broadway #715
`New York, NY 10003 USA
`mfreed@cs.nyu.edu
`
`Robert Morris
`MIT Lab for Computer Science
`200 Technology Sq. #509
`Cambridge, MA 02139 USA
`rtm@lcs.mit.edu
`
`ABSTRACT
`Tarzan is a peer-to-peer anonymous IP network overlay. Because
`it provides IP service, Tarzan is general-purpose and transparent
`to applications. Organized as a decentralized peer-to-peer overlay,
`Tarzan is fault-tolerant, highly scalable, and easy to manage.
`Tarzan achieves its anonymity with layered encryption and multi-
`hop routing, much like a Chaumian mix. A message initiator
`chooses a path of peers pseudo-randomly through a restricted topol-
`ogy in a way that adversaries cannot easily influence. Cover traffic
`prevents a global observer from using traffic analysis to identify an
`initiator. Protocols toward unbiased peer-selection offer new direc-
`tions for distributing trust among untrusted entities.
`Tarzan provides anonymity to either clients or servers, without
`requiring that both participate. In both cases, Tarzan uses a net-
`work address translator (NAT) to bridge between Tarzan hosts and
`oblivious Internet hosts.
`Measurements show that Tarzan imposes minimal overhead over
`a corresponding non-anonymous overlay route.
`
`1.
`
`INTRODUCTION
`The ultimate goal of Internet anonymization is to allow a host to
`communicate with an arbitrary server in such a manner that nobody
`can determine the host’s identity. Toward this goal, we envision a
`system that uses an Internet-wide pool of nodes, numbered in the
`thousands, to relay each others’ traffic to gain anonymity.
`Different entities may be interested in exposing the host’s iden-
`tity, each with varying capabilities to do so: curious individuals
`or groups may run their own participating machines to snoop on
`traffic; parties skirting legality may break into a limited number of
`others’ machines; and large, powerful organizations may tap and
`monitor Internet backbones.
`Clearly, each type of adversary suggests different design criteria
`for an anonymizing system. Prior systems have either underesti-
`mated the ease of cracking or crashing individual machines, or dis-
`counted the prevalence of wide-spread eavesdropping capabilities,
`exemplified by the “Great Firewall of China” [30], the FBI’s Carni-
`vore system [11], or subpoenas of Tier-1 ISP traffic for copyright-
`protection compliance [22].
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`CCS’02, November 18–22, 2002, Washington, DC, USA.
`Copyright 2002 ACM 1-58113-612-9/02/0011 ...$5.00.
`
`This paper describes Tarzan, a practical system aimed at realiz-
`ing anonymity against all three flavors of adversary. First, however,
`we discuss why less ambitious approaches are not adequate.
`In the simplest alternative, a host sends messages to a server
`through a proxy, such as Anonymizer.com [1]. This system fails
`if the proxy reveals a user’s identity [18] or if an adversary can
`observe the proxy’s traffic. Furthermore, servers can easily block
`these centralized proxies and adversaries can prevent usage with
`denial-of-service attacks.
`To overcome this single point of failure, a host can connect
`to a server through a set of mix relays [3]. The anonymous re-
`mailer system [10], Onion Routing [26], and Zero-Knowledge’s
`Freedom [13] offer such a model, providing anonymity through a
`small, fixed core set of relays. However, if a corrupt relay receives
`traffic from a non-core node, the relay can identify that node as
`the ultimate origin of the traffic. Colluding entry and exit relays
`can use timing analysis to determine both source and destination.
`Even an external adversary can mount the same attack. Therefore,
`the connecting host remains vulnerable to individual relay failures,
`and these relays provide obvious targets for attacking or blocking.
`Few of these systems attempt to provide anonymity against an
`adversary that can passively observe all network traffic. Such pro-
`tection requires fixing traffic patterns or using cover traffic to make
`such traffic analysis more difficult. Proposals that do exist have
`several shortcomings, however. Some protect only the core of the
`static mix network and thus allow traffic analysis on its edges [2,
`26]. Some simulate full synchrony and thus trivial DoS attacks halt
`their operation in entirety [7]. And some require central control and
`knowledge of the entire network [15].
`Tarzan, on the other hand, does not suffer from these same weak-
`nesses. Its main contributions are two-fold.
`First, Tarzan extends known mix-net designs to a peer-to-peer
`environment. Tarzan nodes communicate over sequences of mix
`relays chosen from an open-ended pool of volunteer nodes, without
`any centralized component. We present techniques to securely dis-
`cover and select other nodes as communication relays: All peers are
`potential originators of traffic; all peers are potential relays. Such a
`scalable design lessens the significance of targeted attacks and in-
`hibits network-edge analysis, as a relay cannot tell if it is the first
`hop in a mix path. Furthermore, we leverage our new concept of a
`domain to remove potential adversarial bias: An adversary may run
`hundreds of virtual machines, yet is unlikely to control hundreds of
`different IP subnets.
`Second, Tarzan introduces a scalable and practical technique
`for cover traffic that uses a restricted topology for packet routing:
`Packets can be routed only between mimics, or pairs of nodes as-
`signed by the system in a secure and universally-verifiable manner.
`This technique is practical in that it does not require network syn-
`
`CODE200 ET AL. EXHIBIT 1033
`Page 1 of 14
`
`
`
`chrony and consumes only a small factor more bandwidth than the
`data traffic to be hidden, and it is powerful as it shields all network
`participants, not only core routers.
`Tarzan allows client applications on participating hosts to talk to
`non-participating Internet servers through special IP tunnels. The
`two ends of a tunnel are a Tarzan node running a client application
`and a Tarzan node running a network address translator; the lat-
`ter forwards the client’s traffic to its ultimate Internet destination.
`Tarzan is transparent to both client applications and servers, though
`it must be installed and configured on participating nodes.
`Tarzan supports a systems-engineering position: anonymity can
`be built-in at the transport layer, transparent to most systems, triv-
`ial to incorporate, and with a tolerable loss of efficiency compared
`to its non-anonymous counterpart. This approach immediately
`reduces the effort required for application writers to incorporate
`anonymity into existing designs, and for users to add anonymity
`without changing existing non-anonymous applications. In the long
`term, the ability for individual anonymizing relays to easily partic-
`ipate in multiple kinds of traffic may make it easier to achieve a
`critical mass of anonymizing relays.
`The rest of this paper is structured as follows. Section 2 explains
`Tarzan’s design goals and threat models. Section 3 describes the de-
`sign of Tarzan: its tunneling architecture, peer discovery and selec-
`tion protocols, and restricted topology and cover traffic mechanism.
`Section 4 presents an analysis of Tarzan’s anonymity properties.
`Section 5 describes Tarzan’s implementation, and Section 6 evalu-
`ates its performance. Section 7 discusses integration transparency,
`Section 8 describes related work, and Section 9 concludes.
`
`2. DESIGN GOALS AND NETWORK MODEL
`This paper uses the following terminology. A node is an Internet
`host’s virtual identity in the system, created by running an instan-
`tiation of the Tarzan software on a single IP address. A tunnel is
`a virtual circuit for communication spread across an ordered se-
`quence of nodes. A relay is a node acting as a packet forwarder as
`part of a tunnel.
`We designed Tarzan to meet a number of goals. Ordered by pri-
`ority, these goals are the following:
`
`1. Application independence: Tarzan should be transparent to
`existing applications and allow users to interact with existing
`services. To achieve this, Tarzan should provide the abstrac-
`tion of an IP tunnel.
`
`2. Anonymity against malicious nodes: Tarzan should pro-
`vide sender or recipient anonymity against colluding nodes.
`That is, a particular host should not be uniquely linkable
`as the sender (recipient) of any message, or that a message
`should not be linkable to any sender (recipient) [20]. We
`consider these properties in terms of an anonymity set: the
`set of possible senders of a message. The larger this set, the
`“more” anonymous an initiator remains.
`
`These properties implies the weaker relationship anonymity:
`an adversary should not be able to identify a pair of hosts as
`communicating with each other, irrespective of which host is
`running Tarzan.
`
`3. Fault-tolerance and availability: Tarzan should resist an
`adversary’s attempts to overload the entire system or to block
`system entry or exit points. Tarzan should minimize the dam-
`age any one adversary can cause by running a few compro-
`mised machines.
`
`unswitched
`LAN
`
`local subnet
`
`border gateway
`
`X
`
`honest routers
`malicious routers
`corrupted domains
`
`honest nodes
`malicious nodes
`spoofed nodes
`
`Figure 1: Tarzan network model. In relation to node X, ad-
`versarial machines can control address spaces and can spoof
`virtual nodes within corrupted domains.
`
`4. Performance: Tarzan should maximize the performance of
`tunnel transmission, subject to our anonymity requirements,
`to make Tarzan a viable IP-level communication channel.
`
`5. Anonymity against a global eavesdropper: An adversary
`observing the entire network should be unable to determine
`which Tarzan relay initiates a particular message. Therefore,
`a node’s traffic patterns should be statistically independent of
`it originating data traffic.
`
`Because anyone can join Tarzan, the system will likely be tar-
`geted by misbehaving users. While a correct host runs only one
`honest node—which forwards packets properly, does not log ad-
`dressing or timing information, and so on—an adversary can run
`potentially many malicious nodes or spoof many fake addresses. A
`node is malicious if it modifies, drops, or records packets, analyzes
`traffic patterns, returns incorrect network information, or otherwise
`does not properly follow the protocols.
`From a naive viewpoint, the fraction of Tarzan nodes that are ma-
`licious determines the probability that a tunnel relay is malicious.
`Yet, a single compromised computer may operate on multiple IP
`addresses and thus present multiple Tarzan identities.
`To defend against such a situation, we make the observation that
`a single machine likely controls only a contiguous range of IP ad-
`dresses, typically by promiscuously receiving packets addressed to
`any IP address on a particular LAN or by acting as a gateway router.
`This observation is useful in bounding the damage each mali-
`cious node can cause. We will call this subnet controllable by a
`single malicious machine a domain.1
`A node belongs to a /d domain if the node’s d-bit IP prefix
`matches that of the domain. Figure 1 shows the dependence of
`intra-domain node failure: a malicious machine “owns” all of the
`address space behind it.
`Domains capture some notion of fault-independence: While an
`adversary can certainly subvert nodes within the same domain in
`
`1Our domain notion is completely unrelated to DNS and applies to
`both IPv4 and IPv6.
`
`CODE200 ET AL. EXHIBIT 1033
`Page 2 of 14
`
`
`
`client
`app
`
`initiator
`
`(App
`
`Priv)
`
`kernel
`divert
`
`src: App
`dst: Dest
`
`relayd
`
`relayd
`
`relayd
`
`(31
`
`17)
`
`(17
`
`59)
`
`Tag: 31
`
`src: Priv
`dst: Dest
`
`Tag: 17
`src: Priv
`dst: Dest
`
`Tag: 59
`src: Priv
`dst: Dest
`
`PNAT
`(Priv
`PNAT)
`
`Internet
`Dest
`
`src: PNAT
`dst: Dest
`
`relayd
`
`Figure 2: Tarzan Architecture Overview: An IP packet is diverted to the local tunnel initiator, which NATs it to a private address
`space, wraps it in several layers of encryption, and sends it to the first relay in UDP. Based on the packet’s flow tag, the relay decrypts
`one layer of the encryption and sends the result to the next relay. The PNAT decrypts the last layer, extracts the original IP packet,
`NATs the packet to its own public address, and writes the raw packet to the Internet.
`
`a dependent fashion, nodes in different domains may fail indepen-
`dently. Therefore, when selecting relays, Tarzan should consider
`the notion of distinct domains, not that of distinct nodes.
`Ideally, we would know the actual size of each domain in address
`space and count all nodes within that address space as a single en-
`tity. However, this internetwork topology is non-uniform and dif-
`ficult to measure. Therefore, Tarzan chooses some fixed IP prefix
`size as its granularity for counting domains: first among /16 sub-
`net masks, then among /24 masks. We believe that this provides a
`reasonable notion of distinct physical and administrative control.2
`
`3. ARCHITECTURE AND DESIGN
`This section describes Tarzan’s design: its basic tunnel mecha-
`nism, its peer-discovery protocol, and its cover-traffic technique.
`Figure 2 shows a simple Tarzan overlay network. All participat-
`ing nodes run software that 1) discovers other participating nodes,
`2) intercepts packets generated by local applications that should be
`anonymized, 3) manages tunnels through chains of other nodes to
`anonymize these packets, 4) forwards packets to implement other
`nodes’ tunnels, and 5) operates a NAT (network address translator)
`to forward other participants’ packets onto the ordinary Internet.
`Typical use proceeds in three stages. First, a node running an
`application that desires anonymity selects a set of nodes to form a
`path through the overlay network. Next, this source-routing node
`establishes a tunnel using these nodes, which includes the distri-
`bution of session keys. Finally, it routes data packets through this
`tunnel. The exit point of the tunnel is a NAT. This NAT forwards
`the anonymized packets to servers that are not aware of Tarzan, and
`it receives the response packets from these servers and reroutes the
`packets over this tunnel.
`Tarzan restricts route selection to pairs of nodes that use cover
`traffic to maintain traffic levels independent of data rates. The sys-
`tem enforces this topology by assigning neighbors in a decentral-
`ized yet verifiable manner.
`Tarzan operates at the IP (Internet Protocol) level and offers a
`best-effort delivery model. End hosts must provide functionality
`like reliability or authentication.
`Tarzan uses layered encryption similar to Chaumian mixes [3]:
`each leg of the tunnel removes or adds a layer of encryption, de-
`pending upon the packet’s direction of traversal. The tunnel initia-
`tor sanitizes IP headers, as well as TCP headers if applicable.
`
`3.1 Packet relay
`A Tarzan tunnel passes two distinct types of messages between
`nodes: data packets, to be relayed through existing tunnels, and
`control packets, containing commands and responses that establish
`and maintain these virtual circuits. Tarzan encapsulates both packet
`types inside UDP.
`A flow tag (similar to MPLS [23]) uniquely identifies each link
`of each tunnel. A relay rapidly determines how to route a packet
`tag. Symmetric encryption hides data, and a MAC protects its in-
`tegrity, on a per-relay basis. Separate keys are used in each direc-
`tion of each relay.
`In the forward path, the tunnel initiator clears each IP packet’s
`source address field, performs a nested encoding for each tun-
`nel relay, and encapsulates the result in a UDP packet. More
`precisely, consider a tunnel that consists of a sequence of nodes
`T = (h1, h2, . . . , hl, hpnat).3 Let the forward encryption and in-
`tegrity keys for each node be ekhi and ikhi , respectively, and let
`seq be the packet sequence number, initiated to zero at the time
`of tunnel establishment. Then, an initiator produces the following
`block Bi for each relay hi in the tunnel, starting with hpnat:
`ci = ENC(ekhi ,{Bi+1})
`ai = MAC(ikhi ,{seq, ci})
`Bi = {seq, ci, ai}
`The origin tags block B1 with the first relay’s flow identifier and
`forwards the result to h1. The first relay extracts the packet’s pay-
`load, determines the relevant keys by its flow identifier, checks the
`block’s integrity, decrypts the block (i.e., strips off one layer of en-
`cryption), retags the resulting block B2, encapsulates it in a new
`UDP packet, and forwards the packet on to the next relay. The
`node drops any packet that fails its integrity check. This process
`continues until the packet reaches the last relay, which strips off the
`innermost layer of encryption, revealing the initiator’s IP packet.
`On the reverse path, each successive relay performs a single en-
`cryption with its appropriate key for the reverse direction, re-tags
`and forwards the packet back towards the origin. This process
`wraps the packet in layers of encryption, which the origin of the
`tunnel must unwrap by performing l +1 decryptions. This design
`places the bulk of the encryption workload on the node seeking
`anonymity. Nodes that are merely relaying perform only a single
`symmetric key operation per packet that is processed.4
`
`2Even years since the introduction of CIDR, active Internet ad-
`dresses still disproportionally belong to network prefixes that re-
`flect classful addressing [17].
`
`3Section 3.7 explains our strange bookkeeping with the last relay.
`4Section 3.7 describes an additional encryption-decryption used
`between immediate nodes.
`
`CODE200 ET AL. EXHIBIT 1033
`Page 3 of 14
`
`
`
`anonymity, any two users can communicate by each creating a tun-
`nel to a different PNAT; each user’s application connects to the
`other’s PNAT to form a double-blinded channel.
`3.4 Tunnel failure and reconstruction
`A tunnel fails if one of its relays stops forwarding packets. To de-
`tect failure, the initiator regularly sends ping messages to the PNAT
`through the tunnel and waits for acknowledgments. Upon multiple
`unsuccessful retries, the initiator attempts to determine the point-
`of-failure by sending pings through the tunnel to each relay.
`If the PNAT is the point-of-failure, i.e., hl still responds to pings,
`the initiator selects a new hpnat for the tunnel. Otherwise, it at-
`tempts to rebuild the tunnel to the original PNAT, so that higher-
`level connections, such as TCP, do not die upon tunnel failure.
`If the furthest node to reply to the ping is hi, for i < l, interme-
`diate relay hi+1 is unreachable. So, the initiator attempts to rebuild
`(cid:2)
`the tunnel from hi forward, T
`, hpnat).
`+1,. . ., h
`= (h1,. . ., hi, h
`Upon multiple unsuccessful attempts, the initiator decrements i by
`one and reattempts reconstruction.
`3.5 Peer discovery
`A Tarzan node requires some means to learn about all other
`nodes in the network, knowing initially only a few other nodes.
`Anything less than near-complete network information allows an
`adversary to bias the distribution of a node’s neighbor set towards
`malicious peers, leaks information through combinatorial profiling
`attacks, and results in inconsistencies during relay selection. Sec-
`tion 4.1 discusses these attacks in more depth.
`Tarzan uses a simple gossip-based protocol for peer discovery.
`Tarzan’s goal—to learn about all network resources—differs from
`recent peer-to-peer lookup protocols [25], which spend great effort
`to achieve immediate information propagation and load balancing
`in a flat namespace, often at the cost of security.
`Gossiping offers a simple mechanism for nodes to learn about
`new neighbors.5 A node can prune inactive neighbors lazily when
`they do not respond to cover traffic establishment requests, which
`we explain further in Section 3.7.
`This problem can be modeled as a directed graph: vertices rep-
`resent Tarzan nodes; edges correspond to the relation that node a
`knows about, and thus can communicate with, node b. Edges are
`added to the graph as nodes discover other peers. We assume that
`the graph is initially weakly connected; otherwise, nodes in sepa-
`rate network partitions could never learn of one another. Tarzan’s
`peer discovery goal is to make this graph fully connected.
`Our technique to grow this network graph is similar to the Name-
`Dropper resource discovery protocol [16]. In each round of Name-
`Dropper, node a simply contacts one neighbor at random and trans-
`fers its entire neighbor set.
`The Tarzan discovery protocol supports three related operations:
`initialization, redirection, and maintenance. Initialization provides
`the bulk-transfer functionality of Name-Dropper, which allows fast
`information propagation. Redirection allows nodes to shed load
`by redirecting new nodes to random neighbors. As this protocol
`progresses, nodes sending entire neighbor sets will transmit many
`elements already known to their recipients, wasting bandwidth.
`In response, maintenance messages provide an incremental up-
`date P of a node’s peer database with only new information
`(P ∩ db = ∅). Tarzan calculates these set differences efficiently
`by performing k-ary searches on prefix-aggregated hashes of the
`set elements. This mechanism is briefly described in Section 4.1.
`5“Gossiping” is a slight misnomer. Traditional gossiping protocols
`assume a fully-connected or fixed network and seek to optimize the
`broadcast of extra information, such as link state.
`
`(cid:2)i
`
`(cid:2)l
`
`h0 = initiator;
`h1 ∈R {h0.neighbors};
`for i = 1 to l
`hi+1 ∈R {hi.neighbors}
`send establish request(hi−1, hi+1) to hi via tunnel;
`rc =wait for establish response;
`if rc ∈ {!ok, timeout}
`i = i − 1;
`while rc ∈ {!ok, timeout}
`if max retries exceeded
`decrement i and break;
`hi+1 ∈R {hi.neighbors};
`send reset forward request(hi+1) to hi;
`rc =wait for reset forward response;
`send establish response(hl) to hpnat via tunnel;
`
`Figure 3: Pseudocode for tunnel establishment protocol
`
`3.2 Tunnel setup
`When forming a tunnel, a Tarzan node pseudo-randomly selects
`a series of nodes from the network based on its local topology (see
`Section 3.7). The initiator is responsible for iteratively setting up
`the entire tunnel, one relay at a time. This process consists mainly
`of generating and distributing the symmetric keys, encrypted un-
`der the relays’ public keys. Section 3.5 describes how an initiator
`discovers nodes and their corresponding public keys. Each node
`generates its public key locally the first time it enters the network.
`The high-level establishment algorithm is shown in Figure 3. An
`establish request sent to node hi is relayed as a normal data packet
`from h1 through hi−1. Node hi cannot distinguish whether the
`packet originated from node hi−1 or from one of that node’s prede-
`cessors; node hi−1 cannot distinguish successive establish requests
`from ordinary tunneled data.
`The initiating node creates an establish request by using the pub-
`lic key of node hi to encrypt the initial forward session key, there-
`after used to decrypt packets received from hi−1. This session key
`encrypts the forward integrity key, the subsequent reverse keys for
`packets from hi+1, the addresses of hi−1 and hi+1, and the flow
`identifiers that will be used to tag packets going in each direction.
`When hi has successfully stored the state for this request, it re-
`sponds to the origin for an end-to-end check of correctness.
`For path length l, this algorithm takes O(l) public-key opera-
`tions and O(l2) inter-relay messages to complete. This overhead is
`sufficiently small for realistic choices of l.
`3.3 IP packet forwarding
`Tarzan provides a client IP forwarder and a server-side pseudony-
`mous network address translator (PNAT) to create a generic anony-
`mizing IP tunnel. The IP forwarder diverts certain packets from
`the client’s network stack that matches user-specified IP firewall
`rules and ships them over a Tarzan tunnel. The client forwarder
`replaces its real address in the packets with a random address as-
`signed by the PNAT from the reserved private address space. The
`PNAT translates this private address to one of its real addresses.
`Remote hosts can communicate with PNAT normally, as if it origi-
`nated the traffic. Correspondingly, response packets are deNAT’ed
`twice, once at each end of the tunnel.
`The IP forwarder only hides the Internet Protocol address and
`special header fields, such as origin port numbers, for TCP and
`UDP packets. Section 7 discusses ways of coping with applications
`that require more work than this to anonymize.
`The pseudonymous NAT also offers port forwarding to allow or-
`dinary Internet hosts to connect through Tarzan tunnels to anony-
`mous servers.
`In fact,
`to achieve both sender and recipient
`
`CODE200 ET AL. EXHIBIT 1033
`Page 4 of 14
`
`
`
`// Let Ua be the set of a’s unvalidated known peers
`// Let Va be the set of a’s validated known peers
`a.gossip()
`while true
`if (Ua = ∅), Ua = Va;
`b ∈R Ua;
`|Vb|)
`if (|Va| < 1
`c
`b.busy ? a.redirect(b) : a.initialize(b);
`else if (|Vb| < 1
`|Va|)
`c
`a.busy ? b.redirect(a) : b.initialize(a);
`else
`a.maintain(b);
`
`b.maintain(a);
`
`Figure 4: Pseudocode for the peer discovery protocol
`
`Tarzan differentiates between unvalidated addresses (Ua) and
`validated addresses (Va) in a node’s peer database. A node learns
`{ipaddr, port, hash(pubkey)} tuples through gossiping: these un-
`validated values can easily be forged.
`A node validates a tuple once its corresponding peer correctly re-
`sponds to a discovery request sent directly to its gossiped address.
`The request includes a random nonce. This two-way network hand-
`shake is a weak yet practical authentication mechanism to show a
`node “speaks for” its address. This validation distinction stops an
`adversary from injecting arbitrary tuples into a peer database and
`later impersonating a streak of invalid addresses following it in a
`tunnel (see Section 4.1).
`Figure 4 shows the main gossip protocol. To join the system, a
`new node a contacts some existing node b to discover a new set
`of unvalidated addresses. Node a validates b once a receives a
`response. Node a successively contacts the new neighbors in Ua
`before retrying neighbors in Va.
`Running the discovery protocol, a node learns about and vali-
`dates all other nodes in the network in O(n) connections.
`
`3.6 Peer selection
`This section describes Tarzan’s method for selecting nodes from
`this peer database.
`One may be tempted to simply choose nodes completely at ran-
`dom from Va. This approach is problematic: if an adversary runs as
`many Tarzan nodes as IP addresses to which it has access, a user is
`very likely to select malicious nodes. However, these addresses are
`rarely scattered uniformly through the IP address space. Instead,
`they are often located in the same IP prefix space. Thus, we choose
`among distinct IP prefixes, not among all known IP addresses.
`We select nodes by choosing randomly among the populated do-
`mains at each level of the table in Figure 5. Tarzan uses a three-level
`hierarchy: first among all known /16 subnets, then among /24 sub-
`nets belonging to this 16-bit address space, then among the relevant
`IP addresses.
`A node generates this table by inserting all peers in Va into their
`corresponding identifier rings. The leading d-bits of a node’s IP
`address are transformed to an identifier via hash(ipaddr /d , date),
`where hash is a cryptographic hash function and date is the day-
`of-the-month according to GMT. Identifiers are ordered on their
`|id|
`corresponding rings modulo 2
`.
`Therefore, Tarzan’s lookup(key) method selects peers as fol-
`lows: Node a first generates identifier id16 via hash(key/16 , date)
`and finds the smallest identifier ≥ id16 (with wrap-around) on the
`/0 identifier ring; it repeats this process recursively with id24 and
`id32 on their corresponding rings of increased specificity.
`Note that a node executes lookup completely locally, based on in-
`formation already accumulated in its peer database. Therefore, two
`
`/0
`
`18D3
`
`3CB8
`
`49A1
`
`58E2
`
`712F
`
`9D37
`
`B541
`
`CA13
`
`F72A
`
`18.26/16
`
`21F8
`
`3A25
`
`45F1
`
`5212
`
`7C38
`
`94D1
`
`B1E3
`
`E436
`
`18.26.4/24
`
`23A5
`
`4F9A
`
`61D1
`
`974F
`
`B11A
`
`Shown is a
`Figure 5: Peer selection on validated nodes.
`lookup(key) with id16 = 541A, id24 = 82F1, and id32 = 261B.
`This ultimately maps to the hash value 4F9A, which yields a
`node with IP address 18.26.4.9.
`
`nodes may have slightly different lookup structure replicas, which
`can yield temporarily inconsistent results. We return to the impact
`of inconsistencies in the next section.
`Tarzan includes the date in identifier hashes to daily reorder of
`ring elements. This randomization stops any particular domain or
`address from owning a larger space in the ring for any duration.
`Furthermore, this rebalancing reorders the validated set daily, ran-
`domizing how nodes propagate their neighbors during maintain.
`3.7 Cover traffic and link encoding
`If the pattern of inter-node Tarzan traffic varied with usage, a
`wide-spread eavesdropper could analyze the patterns to link mes-
`sages to their initiators. Prior work has suggested the use of cover
`traffic to provide more time-invariant traffic patterns independent of
`bandwidth demands [3]. Such traffic provides a node with stronger
`plausible deniability that it is the actual message initiator.
`Our key contributions include introducing the concept of a traffic
`mimic. We propose traffic invariants between a node and its mimics
`that protect against information leakage. These invariants require
`some use of cover traffic and yield an anonymity set exponential in
`path length.
`
`3.7.1 Selecting mimics
`Upon joining the network, node a asks k other nodes to exchange
`mimic traffic with it. Similarly, an expected k nodes select a as they
`look for their own mimics. Thus, each node has κ mimics, where
`E(κ) = 2k for some global parameter k. Mimics are assigned
`verifiably at random from the set of nodes in the network.
`A node establishes a bidirectional, time-invariant packet stream
`with a mimic node, into which real data can be inserted, indistin-
`guishable from the cover traffic.
`This mimic relationship must be symmetric for three reasons.
`First, a otherwise would send data only on its outgoing links, not
`trusting its incoming mimic connections. This practice halves a’s
`anonymity set on average. Second, and related, variations in host
`density behind different IP prefixes may account for some nodes
`receiving few incoming connections. Second, a otherwise would
`not be incentivized to provide cover traffic on its incoming links.
`
`CODE200 ET AL. EXHIBIT 1033
`Page 5 of 14
`
`
`
`PNAT
`
`Figure 6: Mimic topology and traffic flows for k = 3: Each node
`has κ ≈ 6 mimics (shown connected by solid lines). A node es-
`tablishes its tunnel over mimic links (arrows in bold) and com-
`pletes it with a random PNAT (dotted line). Nodes inject cover
`traffic with data traffic to yield a uniform total traffic T that
`flows bidirectionally over the solid lines. All packets on these
`links are pair-wise encrypted and padded to the same size.
`
`This mimic relationship must be universally verifiable to stop an
`adversary from selecting more than k mimics. As mimics will be
`used for tunnel establishment, an adversary could otherwise bias a
`node’s choice of tunnel relays.
`Node a chooses its ith mimic, or Ma
`i , as the peer returned by
`lookupi(a.ipaddr). This function is similar to that in Section 3.6,
`except that the identifier idi
`d is generated by recursively applying
`the cryptographic hash function i times to {a.ipaddr /d , date},
`i ≤ (k +1). Therefore, we map domains to domains. We explain
`this design choice in Section 4.1.
`Node a sends a mimic request, including the tuple {a.ipaddr , i},
`to Ma
`i , which we refer to as b. Node b accepts a mimic establish-
`ment request from node a if and only if the following hold:
`• 1 < i ≤ (k+1)
`• b.lookupi(a.ipaddr) = b
`If the lookupi check fails, we must consider two cases. First,
`node b can have a different network view than a:
`its execution
`b.lookupi(a.ipaddr) maps to a different node. This inconsistency
`occurs when b knows a ring identifier between a.ipaddr/d and b/d,
`unknown to a. Node b declines the mimic request but returns the
`identifier’s corresponding node, to which a sends a new request.
`Second, a may initially contact node c and receive no response,
`signifying c’s failure. Node a removes c from Va, and a.lookupi
`now maps to b. However, b is not aware of this failure, thus a’s new
`request includes the message that c has failed. To verify, b pings
`c and waits for the acknowledgment to timeout, at which time b
`removes c from Vb and thus accepts a’s mimic request.
`If a looses connectivity to its ith mimic, a removes it from Va
`and replaces it with the new node mapped from lo