OASIS: Anycast for Any Service

Michael J. Freedman∗‡, Karthik Lakshminarayanan†, David Mazières‡
∗New York University, †U.C. Berkeley, ‡Stanford University
http://www.coralcdn.org/oasis/

`Abstract
`
`Global anycast, an important building block for many dis-
`tributed services, faces several challenging requirements.
`First, anycast response must be fast and accurate. Sec-
`ond, the anycast system must minimize probing to re-
`duce the risk of abuse complaints. Third, the system must
`scale to many services and provide high availability. Fi-
`nally, and most importantly, such a system must integrate
`seamlessly with unmodified client applications. In short,
`when a new client makes an anycast query for a service,
`the anycast system must ideally return an accurate reply
`without performing any probing at all.
`This paper presents OASIS, a distributed anycast sys-
`tem that addresses these challenges. Since OASIS is
`shared across many application services, it amortizes de-
`ployment and network measurement costs; yet to facil-
`itate sharing, OASIS has to maintain network locality
`information in an application-independent way. OASIS
`achieves these goals by mapping different portions of the
`Internet in advance (based on IP prefixes) to the geo-
`graphic coordinates of the nearest known landmark. Mea-
`surements from a preliminary deployment show that OA-
`SIS, surprisingly, provides a significant improvement in
`the performance that clients experience over state-of-the-
`art on-demand probing and coordinate systems, while in-
`curring much less network overhead.
`
1 Introduction
`
`Many Internet services are distributed across a collec-
`tion of servers that handle client requests. For example,
`high-volume web sites are typically replicated at mul-
`tiple locations for performance and availability. Con-
`tent distribution networks amplify a website’s capacity by
`serving clients through a large network of web proxies.
`File-sharing and VoIP systems use rendezvous servers to
`bridge hosts behind NATs.
`The performance and cost of such systems depend
`highly on the servers that clients select. For example,
`file download times can vary greatly based on the local-
`ity and load of the chosen replica. Furthermore, a service
`provider’s costs may depend on the load spikes that the
`
`server-selection mechanism produces, as many data cen-
`ters charge customers based on the 95th-percentile usage
`over all five-minute periods in a month.
`Unfortunately, common techniques for replica selec-
`tion produce sub-optimal results. Asking human users to
`select the best replica is both inconvenient and inaccurate.
`Round-robin and other primitive DNS techniques spread
`load, but do little for network locality.
`More recently, sophisticated techniques for server-
`selection have been developed. When a legacy client ini-
`tiates an anycast request, these techniques typically probe
`the client from a number of vantage points, and then use
this information to find the closest server. While efforts
such as virtual coordinate systems [6, 28] and on-demand
probing overlays [40, 46] seek to reduce the probing
overhead, these savings come at the cost of the system's
accuracy.
`Nevertheless, significant on-demand probing is still
`necessary for all these techniques, and this overhead is
`reincurred by every new deployed service. While on-
`demand probing potentially offers greater accuracy, it has
`several drawbacks that we have experienced first-hand in
`a previously deployed system [10]. First, probing adds
`latency, which can be significant for small web requests.
`Second, performing several probes to a client often trig-
`gers intrusion-detection alerts, resulting in abuse com-
`plaints. This mundane problem can pose real operational
`challenges for a deployed system.
`This paper presents OASIS (Overlay-based Anycast
`Service InfraStructure), a shared locality-aware server se-
`lection infrastructure. OASIS is organized as an infras-
`tructure overlay, providing high availability and scalabil-
`ity. OASIS allows a service to register a list of servers,
`then answers the query, “Which server should the client
`contact?” Selection is primarily optimized for network
`locality, but also incorporates liveness and load. OA-
`SIS can, for instance, be used by CGI scripts to redi-
`rect clients to an appropriate web mirror. It can locate
`servers for IP anycast proxies [2], or it can select dis-
`tributed SMTP servers in large email services [26].
`To eliminate on-demand probing when clients make
`anycast requests, OASIS probes clients in the back-
`ground. One of OASIS’s main contributions is a set of
techniques that makes it practical to measure the entire
Internet in advance. By leveraging the locality of the IP
prefixes [12], OASIS probes only each prefix, not each
client; in practice, IP prefixes from BGP dumps are used
as a starting point. OASIS delegates measurements to the
service replicas themselves, thus amortizing costs
(approximately 2–10 GB/week) across multiple services,
resulting in an acceptable per-node cost.

Keyword      Threads   Msgs        Keyword      Threads   Msgs
abuse            198    888        ICMP              64    308
attack            98    462        IDS               60    222
blacklist         32    158        intrusion         14    104
block            168    898        scan             118    474
complaint        216    984        trojan            10     56
flood              4     30        virus             24     82

Figure 1: Frequency count of keywords in PlanetLab
support-community archives from 14-Dec-04 through 30-Sep-05,
comprising 4682 messages and 1820 threads. Values report
number of messages and unique threads containing keyword.

`To share OASIS across services and to make back-
`ground probing feasible, OASIS requires stable network
`coordinates for maintaining locality information. Unfor-
`tunately, virtual coordinates tend to drift over time. Thus,
`since OASIS seeks to probe an IP prefix as infrequently as
`once a week, virtual coordinates would not provide suf-
`ficient accuracy.
`Instead, OASIS stores the geographic
`coordinates of the replica closest to each prefix it maps.
`OASIS is publicly deployed on PlanetLab [34] and
`has already been adopted by a number of services, in-
`cluding ChunkCast [5], CoralCDN [10], Na Kika [14],
`OCALA [19], and OpenDHT [37]. Currently, we have
`implemented a DNS redirector that performs server se-
`lection upon hostname lookups, thus supporting a wide
`range of unmodified client applications. We also provide
`an HTTP and RPC interface to expose its anycast and
`locality-estimation functions to OASIS-aware hosts.
Experiments from our deployment have shown, rather
surprisingly, that the accuracy of OASIS is competitive
`with Meridian [46], currently the best on-demand probing
`system. In fact, OASIS performs better than all replica-
`selection schemes we evaluated across a variety of met-
`rics, including resolution and end-to-end download times
`for simulated web sessions, while incurring much less
`network overhead.
`
`2 Design
`
`An anycast infrastructure like OASIS faces three main
`challenges. First, network peculiarities are fundamen-
`tal to Internet-scale distributed systems. Large latency
`fluctuations, non-transitive routing [11], and middleboxes
`such as transparent web proxies, NATs, and firewalls can
`
`produce wildly inaccurate network measurements and
`hence suboptimal anycast results.
`Second, the system must balance the goals of accu-
`racy, response time, scalability, and availability. In gen-
`eral, using more measurements from a wider range of
`vantage points should result in greater accuracy. How-
`ever, probing clients on-demand increases latency and
`may overemphasize transient network conditions. A bet-
`ter approach is to probe networks in advance. However,
services do not know which clients to probe a priori, so
`this approach effectively requires measuring the whole
`Internet, a seemingly daunting task.
`A shared infrastructure, however, can spread measure-
`ment costs over many hosts and gain more network van-
`tage points. Of course, these hosts may not be reliable.
`While structured peer-to-peer systems [39, 42] can, the-
`oretically, deal well with unreliable hosts, such protocols
`add significant complexity and latency to a system and
`break compatibility with existing clients. For example,
`DNS resolvers and web browsers deal poorly with un-
`available hosts since hosts cache stale addresses longer
`than appropriate.
`Third, even with a large pool of hosts over which to
`amortize measurement costs, it is important to minimize
`the rate at which any network is probed. Past experi-
`ence [10] has shown us that repeatedly sending unusual
`packets to a given destination often triggers intrusion de-
`tection systems and results in abuse complaints. For ex-
`ample, PlanetLab’s support-community mailing list re-
`ceives thousands of complaints yearly due to systems that
`perform active probing; Figure 1 lists the number and
`types of complaints received over one ten-month period.
`They range from benign inquiries to blustery threats to
`drastic measures such as blacklisting IP addresses and en-
`tire netblocks. Such measures are not just an annoyance;
`they impair the system’s ability to function.
`This section describes how OASIS’s design tackles the
`above challenges. A two-tier architecture (§2.1) com-
`bines a reliable core of hosts that implement anycast with
`a larger number of replicas belonging to different services
`that also assist in network measurement. OASIS mini-
`mizes probing and reduces susceptibility to network pe-
`culiarities by exploiting geographic coordinates as a ba-
`sis for locality (§2.2.2). Every replica knows its latitude
`and longitude, which already provides some information
`about locality before any network measurement. Then,
`in the background, OASIS estimates the geographic co-
`ordinates of every netblock on the Internet. Because the
`physical location of IP prefixes rarely changes [36], an
`accurately pinpointed network can be safely re-probed
`very infrequently (say, once a week). Such infrequent,
`background probing both reduces the risk of abuse com-
`plaints and allows fast replies to anycast requests with no
`need for on-demand probing.
`
Figure 2: OASIS system overview
`
`2.1 System overview
`
`Figure 2 shows OASIS’s high-level architecture. The sys-
`tem consists of a network of core nodes that help clients
`select appropriate replicas of various services. All ser-
`vices employ the same core nodes; we intend this set of
`infrastructure nodes to be small enough and sufficiently
`reliable so that every core node can know most of the oth-
`ers. Replicas also run OASIS-specific code, both to report
`their own load and liveness information to the core, and
`to assist the core with network measurements. Clients
`need not run any special code to use OASIS, because the
`core nodes provide DNS- and HTTP-based redirection
`services. An RPC interface is also available to OASIS-
`aware clients.
`Though the three roles of core node, client, and replica
`are distinct, the same physical host often plays multiple
`roles. In particular, core nodes are all replicas of the OA-
`SIS RPC service, and often of the DNS and HTTP redi-
`rection services as well. Thus, replicas and clients typi-
`cally use OASIS itself to find a nearby core node.
`Figure 3 shows various ways in which clients and
`services can use OASIS. The top diagram shows an
`OASIS-aware client, which uses DNS-redirection to se-
`lect a nearby replica of the OASIS RPC service (i.e., a
`core node), then queries that node to determine the best
`replica of Service 1.
`The middle diagram shows how to make legacy clients
`select replicas using DNS redirection.
`The service
`provider advertises a domain name served by OASIS.
`When a client looks up that domain name, OASIS first
`redirects the client’s resolver to a nearby replica of the
`DNS service (which the resolver will cache for future ac-
`cesses). The nearby DNS server then returns the address
`of a Service 2 replica suitable for the client. This result
`can be accurate if clients are near their resolvers, which
`is often the case [24].
`The bottom diagram shows a third technique, based on
`service-level (e.g., HTTP) redirection. Here the replicas
`of Service 3 are also clients of the OASIS RPC service.
`Each replica connects to a nearby OASIS core node se-
`lected by DNS redirection. When a client connects to a
`replica, that replica queries OASIS to find a better replica,
then redirects the client. Such an approach does not require
that clients be located near their resolvers in order to
achieve high accuracy.

Figure 3: Various methods of using OASIS via its DNS or RPC
interfaces, and the steps involved in each anycast request.

`This paper largely focuses on DNS redirection, since it
`is the easiest to integrate with existing applications.
`
`2.2 Design decisions
`
`Given a client IP address and service name, the primary
`function of the OASIS core is to return a suitable service
`replica. For example, an OASIS nameserver calls its core
`node with the client resolver’s IP address and a service
`name extracted from the requested domain name (e.g.,
`coralcdn.nyuld.net indicates service coralcdn).
`Figure 4 shows how OASIS resolves an anycast re-
`quest. First, a core node maps the client IP address to
`a network bucket, which aggregates adjacent IP addresses
`into netblocks of co-located hosts. It then attempts to map
`the bucket to a location (i.e., coordinates). If successful,
`OASIS returns the closest service replica to that location
`(unless load-balancing requires otherwise, as described
`in §3.4). Otherwise, if it cannot determine the client’s
`location, it returns a random replica.
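
To make this lookup path concrete, the following Python sketch walks an
anycast request through the four tables described next. It is illustrative
only: the table objects, the Replica record, and the planar distance
stand-in are our own assumptions, not OASIS's actual code or data structures.

import random
from collections import namedtuple

# Hypothetical record for a registered replica (illustration only).
Replica = namedtuple("Replica", "addr coords load capacity")

def flat_distance(a, b):
    # Planar stand-in for geographic distance between (lat, lon) pairs;
    # OASIS uses the geographic coordinates of the nearest known landmark.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def resolve_anycast(client_ip, service, buckets, proximity, liveness):
    """Sketch of one anycast resolution: IP -> prefix -> coords -> replica."""
    prefix = buckets.lookup(client_ip)      # bucketing table (IP -> prefix)
    coords = proximity.get(prefix)          # proximity table (prefix -> coords)
    replicas = liveness[service]            # liveness table (live replicas)

    if coords is None:
        # Location unknown: fall back to a random live replica.
        return random.choice(replicas)

    # Otherwise return the replica closest to the estimated location.
    # (Load balancing, per the service policy, is elided; see §3.4.)
    return min(replicas, key=lambda r: flat_distance(coords, r.coords))
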
`The anycast process relies on four databases main-
`tained in a distributed manner by the core: (1) a service
`table lists all services using OASIS (and records policy
`information for each service), (2) a bucketing table maps
`IP addresses to buckets, (3) a proximity table maps buck-
`ets to locations, and (4) one liveness table per service in-
cludes all live replicas belonging to the service and their
corresponding information (e.g., coordinates, load, and
capacity).

Figure 4: Logical steps to answer an anycast request

Figure 5: Correlation between round-trip-times and geographic
distance across all PlanetLab hosts [43].
`
`2.2.1 Buckets: The granularity of mapping hosts
`
`OASIS must balance the precision of identifying a
`client’s network location with its state requirements. One
`strawman solution is simply to probe every IP address
`ever seen and cache results for future requests. Many
`services have too large a client population for such an
`approach to be attractive. For DNS redirection, probing
`each DNS resolver would be practical if the total num-
`ber of resolvers were small and constant. Unfortunately,
`measurements at DNS root servers [23] have shown many
`resolvers use dynamically-assigned addresses, thus pre-
`cluding a small working set.
`Fortunately, our previous research has shown that IP
`aggregation by prefix often preserves locality [12]. For
`example, more than 99% of /24 IP prefixes announced
`by stub autonomous systems (and 97% of /24 prefixes
`announced by all autonomous systems) are at the same lo-
`cation. Thus, we aggregate IP addresses using IP prefixes
`as advertised by BGP, using BGP dumps from Route-
`Views [38] as a starting point.1
`However, some IP prefixes (especially larger prefixes)
`do not preserve locality [12]. OASIS discovers and
`adapts to these cases by splitting prefixes that exhibit
`poor locality precision,2 an idea originally proposed by
`IP2Geo [30]. Using IP prefixes as network buckets not
`only improves scalability by reducing probing and state
`requirements, but also provides a concrete set of targets
`to precompute, and hence avoid on-demand probing.
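
As a small illustration of bucketing by BGP prefix, the sketch below maps an
address to its most specific covering prefix. The linear scan and the
documentation-range example addresses are our simplification; OASIS itself
uses a PATRICIA trie for longest-prefix matching (§3.1).

import ipaddress

def longest_prefix_match(ip, prefixes):
    """Return the most specific prefix (bucket) covering 'ip', or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

# Example with documentation prefixes standing in for a BGP table.
table = ["192.0.2.0/24", "192.0.0.0/8", "198.51.100.0/24"]
print(longest_prefix_match("192.0.2.55", table))   # -> 192.0.2.0/24
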
`
`2.2.2 Geographic coordinates for location
`
`OASIS takes a two-pronged approach to locate IP pre-
`fixes: We first use a direct probing mechanism [46] to
find the replica closest to the prefix, regardless of service.
Then, we represent the prefix by the geographic coordinates
of this closest replica and its measured round-trip-time to
the prefix. We assume that all replicas know their latitude
and longitude, which can easily be obtained from a variety of
online services [13]. Note that OASIS's shared infrastructure
design helps increase the number of vantage points and thus
improves its likelihood of having a replica near the prefix.

1 For completeness, we also note that OASIS currently supports
aggregating by the less-locality-preserving autonomous system number,
although we do not present the corresponding results in this paper.
2 We deem that a prefix exhibits poor locality if probing different IP
addresses within the prefix yields coordinates with high variance.

`While geographic coordinates are certainly not optimal
`predictors of round-trip-times, they work well in practice:
`The heavy band in Figure 5 shows a strong linear cor-
`relation between geographic distance and RTT. In fact,
`anycast only has the weaker requirement of predicting a
`relative ordering of nodes for a prefix, not an accurate
`RTT estimation. For comparison, we also implemented
`Vivaldi [6] and GNP [28] coordinates within OASIS; §5
`includes some comparison results.
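
For concreteness, here is one way to rank replicas by great-circle distance
from a prefix's estimated coordinates, which is the relative ordering anycast
actually needs. The haversine formula is standard; the sample replica
coordinates are invented for illustration.

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def rank_by_distance(prefix_coords, replicas):
    """Order (name, (lat, lon)) pairs by distance to the prefix's location."""
    return sorted(replicas, key=lambda r: great_circle_km(prefix_coords, r[1]))

# A prefix pinned near Paris, with three hypothetical replicas.
replicas = [("nyc", (40.7, -74.0)), ("sfo", (37.8, -122.4)), ("ams", (52.4, 4.9))]
print(rank_by_distance((48.9, 2.4), replicas))     # ams first, then nyc, sfo
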
`
`Time- and service-invariant coordinates. Since geo-
`graphic coordinates are stable over time, they allow OA-
`SIS to probe each prefix infrequently. Since geographic
`coordinates are independent of the services, they can be
`shared across services—an important requirement since
`OASIS is designed as a shared infrastructure. Geographic
`coordinates remain valid even if the closest replica fails.
`In contrast, virtual coordinate systems [6, 28] fall short of
`providing either accuracy or stability [40, 46]. Similarly,
`simply recording a prefix’s nearest replica—without its
`corresponding geographic coordinates—is useless if that
`nearest replica fails. Such an approach also requires a
`separate mapping per service.
`
`Absolute error predictor. Another advantage of our
`two-pronged approach is that the RTT between a prefix
`and its closest replica is an absolute bound on the accu-
`racy of the prefix’s estimated location. This bound sug-
`gests a useful heuristic for deciding when to re-probe a
`prefix to find a better replica.
`If the RTT is small (a
`few milliseconds), reprobing is likely to have little ef-
`fect. Conversely, reprobing prefixes having high RTTs
`to their closest replica can help improve accuracy when
`previous attempts missed the best replica or newly-joined
`replicas are closer to the prefix. Furthermore, a prefix’s
`geographic coordinates will not change unless it is probed
`by a closer replica. Of course, IP prefixes can physically
`move, but this happens rarely enough [36] that OASIS
`only expires coordinates after one week. Moving a net-
`work can therefore result in sub-optimal predictions for
`at most one week.
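
One way to act on this error bound is sketched below. The specific RTT
thresholds and intervals are invented; the paper says only that the RTT to
the closest replica bounds the location error, that accurate coordinates can
be reprobed more slowly, and that coordinates expire after one week.

ONE_WEEK = 7 * 24 * 3600   # seconds; coordinates expire after this long

def reprobe_interval(rtt_to_closest_ms):
    """Pick a reprobe interval from the error bound (illustrative values)."""
    if rtt_to_closest_ms < 5:       # essentially pinpointed: reprobing
        return ONE_WEEK             # is unlikely to improve the estimate
    if rtt_to_closest_ms < 50:
        return ONE_WEEK // 2
    return ONE_WEEK // 7            # poorly located: try again daily
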
`
`Sanity checking. A number of network peculiarities
`can cause incorrect network measurements. For exam-
`ple, a replica behind a transparent web proxy may erro-
`neously measure a short RTT to some IP prefix, when in
`fact it has only connected to the proxy. Replicas behind
`firewalls may believe they are pinging a remote network’s
`firewall, when really they are probing their own. OASIS
`employs a number of tests to detect such situations (see
`§6). As a final safeguard, however, the core only accepts
`a prefix-to-coordinate mapping after seeing two consis-
`tent measurements from replicas on different networks.
`In hindsight, another benefit of geographic coordinates
`is the ability to couple them with real-time visualization
`of the network [29], which has helped us identify, debug,
`and subsequently handle various network peculiarities.
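
The final safeguard can be phrased as a small acceptance test. The
representation below (reports carrying coordinates plus the measuring
replica's own prefix) and the agreement threshold are our assumptions about
one reasonable realization, not the exact check OASIS performs.

def coords_agree(a, b, tol_degrees=1.0):
    """Crude consistency test: latitudes and longitudes within a tolerance."""
    return abs(a[0] - b[0]) <= tol_degrees and abs(a[1] - b[1]) <= tol_degrees

def accept_mapping(candidate_coords, candidate_net, prior_reports):
    """Accept a prefix-to-coordinate mapping only once two measurements
    from replicas on different networks agree on the location."""
    for coords, net in prior_reports:
        if net != candidate_net and coords_agree(coords, candidate_coords):
            return True
    return False

# Two agreeing reports from different networks -> accepted.
print(accept_mapping((48.9, 2.4), "203.0.113.0/24",
                     [((48.8, 2.3), "198.51.100.0/24")]))   # True
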
`
`2.2.3 System management and data replication
`
`To achieve scalability and robustness, the location infor-
`mation of prefixes must be made available to all core
`nodes. We now describe OASIS’s main system manage-
`ment and data organization techniques.
`
`Global membership view. Every OASIS core node
`maintains a weakly-consistent view of all other nodes in
`the core, where each node is identified by its IP address, a
`globally-unique node identifier, and an incarnation num-
`ber. To avoid O(n2) probing (where n is the network
`size), core nodes detect and share failure information co-
`operatively: every core node probes a random neighbor
`each time period (3 seconds) and, if it fails to receive a
`response, gossips its suspicion of failure.
`Two techniques suggested by SWIM [7] reduce false
failure announcements. First, several intermediaries are
`chosen to probe this target before the initiator announces
`its suspicion of failure. Intermediaries alleviate the prob-
`lems caused by non-transitive Internet routing [11]. Sec-
`ond, incarnation numbers help disambiguate failure mes-
`sages: alive messages for incarnation i override anything
`for j < i; suspect for i overrides anything for j ≤ i. If a
`node learns that it is suspected of failure, it increments its
`incarnation number and gossips its new number as alive.
`A node will only conclude that another node with incar-
`nation i is dead if it has not received a corresponding alive
`message for j > i after some time (3 minutes). This ap-
proach provides live nodes with sufficient time to respond
to and correct false suspicions of failure.

Figure 6: OASIS system components

`Implicit in this design is the assumption that nodes
`are relatively stable; otherwise, the system would incur
`a high bandwidth cost for failure announcements. Given
`that OASIS is designed as an infrastructure service—to
`be deployed either by one service provider or a small
`number of cooperating providers—we believe that this
`assumption is reasonable.
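
The precedence rules for failure messages can be written down directly. The
(status, incarnation) encoding below is just one illustrative representation
of the rules quoted above.

def supersedes(update, current):
    """Does 'update' replace 'current'? Both are (status, incarnation)
    pairs, with status either "alive" or "suspect".

    Per the text: alive at incarnation i overrides anything with j < i;
    suspect at i overrides anything with j <= i."""
    status, inc = update
    cur_status, cur_inc = current
    if status == "alive":
        return inc > cur_inc
    return inc > cur_inc or (inc == cur_inc and cur_status == "alive")

def refute_suspicion(my_incarnation):
    """A node that learns it is suspected bumps its incarnation and
    gossips itself as alive, which then overrides the suspicion."""
    return ("alive", my_incarnation + 1)

# A suspicion at the same incarnation beats "alive"; the refutation wins back.
assert supersedes(("suspect", 3), ("alive", 3))
assert supersedes(refute_suspicion(3), ("suspect", 3))
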
`
`Consistent hashing. OASIS tasks must be assigned to
`nodes in some globally-known yet fully-decentralized
`manner. For example, to decide the responsibility of
`mapping specific IP prefixes, we partition the set of pre-
`fixes over all nodes. Similarly, we assign specific nodes
`to play the role of a service rendezvous to aggregate in-
`formation about a particular service (described in §3.3).
`OASIS provides this assignment through consistent
`hashing [20]. Each node has a random identifier; several
`nodes with identifiers closest to a key—e.g., the SHA-1
`hash of the IP prefix or service name—in the identifier
`space are assigned the corresponding task. Finding these
`nodes is easy since all nodes have a global view. While
`nodes’ views of the set of closest nodes are not guaran-
`teed to be consistent, views can be easily reconciled using
`nodes’ incarnation numbers.
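
A minimal sketch of this assignment follows. The distance metric (absolute
difference in the SHA-1 identifier space) is our choice for illustration; the
paper specifies only that the several nodes whose identifiers are closest to
the key are responsible, which each node can compute locally from its global
view.

import hashlib

def ident(name):
    """160-bit identifier from SHA-1 of a name (e.g., a prefix or service)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def responsible_nodes(key, node_ids, k=3):
    """Pick the k node identifiers closest to hash(key) in identifier space."""
    target = ident(key)
    return sorted(node_ids, key=lambda n: abs(n - target))[:k]

# Which of five hypothetical core nodes should probe prefix 192.0.2.0/24?
nodes = [ident("core-%d" % i) for i in range(5)]
print(responsible_nodes("192.0.2.0/24", nodes, k=3))
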
`
`Gossiping. OASIS uses gossiping to efficiently dissem-
`inate messages—about node failures, service policies,
`prefix coordinates—throughout the network [7]. Each
`node maintains a buffer of messages to be piggybacked
`on other system messages to random nodes. Each node
`gossips each message O(log n) times for n-node net-
`works; such an epidemic algorithm propagates a message
`to all nodes in logarithmic time with high probability.3
`
3 While structured gossiping based on consistent hashing could reduce
the bandwidth overhead needed to disseminate a message [3], we use a
randomized epidemic scheme for simplicity.

Soft-state replica registration. OASIS must know all
replicas belonging to a service in order to answer corre-
sponding anycast requests. To tolerate replica failures ro-
bustly, replica information is maintained using soft-state:
replicas periodically send registration messages to core
nodes (currently, every 60 seconds).
`Hosts running services that use OASIS for anycast—
`such as the web server shown in Figure 6—run a sepa-
`rate replica process that connects to their local application
`(i.e., the web server) every keepalive period (currently set
`to 15 seconds). The application responds with its current
`load and capacity. While the local application remains
`alive, the replica continues to refresh its locality, load,
`and capacity with its OASIS core node.
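
A core node's side of this soft-state protocol might look like the sketch
below. The 60- and 15-second periods come from the text; the expiry TTL and
the record layout are our assumptions.

import time

REGISTER_PERIOD = 60     # replica -> core registration period (from the text)
KEEPALIVE_PERIOD = 15    # replica -> local application check (from the text)
SOFT_STATE_TTL = 180     # illustrative expiry; not specified in the paper

class ReplicaRecord:
    """One registered replica's soft state, as a core node might keep it."""

    def __init__(self, coords, load, capacity):
        self.coords, self.load, self.capacity = coords, load, capacity
        self.last_seen = time.time()

    def refresh(self, load, capacity):
        # Called on each registration message; locality rarely changes.
        self.load, self.capacity = load, capacity
        self.last_seen = time.time()

    def expired(self, now=None):
        # If registrations stop, the entry simply times out, so failed
        # replicas disappear without any explicit teardown message.
        return ((now if now is not None else time.time())
                - self.last_seen) > SOFT_STATE_TTL
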
`
`Closest-node discovery. OASIS offloads all measure-
`ment costs to service replicas. All replicas, belonging
`to different services, form a lightweight overlay, in or-
`der to answer closest-replica queries from core nodes.
`Each replica organizes its neighbors into concentric rings
`of exponentially-increasing radii, as proposed by Merid-
`ian [46]: A replica accepts a neighbor for ring i only if
`its RTT is between 2i and 2i+1 milliseconds. To find the
`closest replica to a destination d, a query operates in suc-
`cessive steps that “zero in” on the closest node in an ex-
`pected O(log n) steps. At each step, a replica with RTT
r from d chooses neighbors to probe d, restricting its se-
lection to those with RTTs (to itself) between (1/2)r and (3/2)r.
`The replica continues the search on its neighbor returning
`the minimum RTT to d. The search stops when the latest
`replica knows of no other potentially-closer nodes.
`Our implementation differs from [46] in that we per-
`form closest routing iteratively, as opposed to recursively:
`The first replica in a query initiates each progressive
`search step. This design trades overlay routing speed for
`greater robustness to packet loss.
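
The ring-based search can be sketched as follows. The callbacks probe()
(a replica's RTT to the destination) and ring_neighbors() (a replica's
neighbors with our RTTs to them) are hypothetical stand-ins for the overlay's
actual RPCs.

def candidate_neighbors(neighbors, r):
    """Neighbors worth probing: their RTT to us lies in [r/2, 3r/2]."""
    return [n for n, rtt in neighbors.items() if r / 2 <= rtt <= 1.5 * r]

def find_closest(start, probe, ring_neighbors):
    """Iteratively zero in on the replica closest to the destination.

    'probe(node)' returns that node's RTT to the destination;
    'ring_neighbors(node)' returns {neighbor: our_rtt_to_neighbor}. The first
    replica drives every step (iterative, not recursive), trading routing
    speed for robustness to packet loss."""
    current, r = start, probe(start)
    while True:
        candidates = candidate_neighbors(ring_neighbors(current), r)
        measured = {n: probe(n) for n in candidates}
        if not measured:
            return current
        best = min(measured, key=measured.get)
        if measured[best] >= r:
            return current          # nobody closer: stop
        current, r = best, measured[best]
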
`
`3 Architecture
`
`In this section, we describe the distributed architecture of
`OASIS in more detail:
`its distributed management and
`collection of data, locality and load optimizations, scala-
`bility, and security properties.
`
`3.1 Managing information
`
`We now describe how OASIS manages the four tables
`described in §2.2. OASIS optimizes response time by
`heavily replicating most information. Service, bucketing,
`and proximity information need only be weakly consis-
`tent; stale information only affects system performance,
`not its correctness. On the other hand, replica liveness
`information must be more fresh.
`
`Service table. When a service initially registers with
OASIS, it includes a service policy that specifies its
`service name and any domain name aliases, its desired
`server-selection algorithm, a public signature key, the
`
`maximum and minimum number of addresses to be in-
`cluded in responses, and the TTLs of these responses.
`Each core node maintains a local copy of the service table
`to be able to efficiently handle requests. When a new ser-
`vice joins OASIS or updates its existing policy, its policy
`is disseminated throughout the system by gossiping.
`The server-selection algorithm specifies how to order
`replicas as a function of their distance, load, and total
`capacity when answering anycast requests. By default,
`OASIS ranks nodes by their coordinate distance to the
`target, favoring nodes with excess capacity to break ties.
`The optional signature key is used to authorize replicas
`registering with an OASIS core node as belonging to the
`service (see §3.5).
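
The default ordering can be expressed as a sort key. The dictionary layout
and the planar distance below are illustrative; the point is simply distance
first, with spare capacity breaking ties.

def default_order(replicas, client_coords):
    """Rank replicas: nearest first; among ties, prefer more spare capacity."""
    def sort_key(r):
        dx = r["coords"][0] - client_coords[0]
        dy = r["coords"][1] - client_coords[1]
        distance = (dx * dx + dy * dy) ** 0.5
        spare = r["capacity"] - r["load"]
        return (distance, -spare)       # smaller distance wins, then more spare
    return sorted(replicas, key=sort_key)

# Two equally distant replicas: the less-loaded one is listed first.
replicas = [
    {"addr": "10.0.0.1", "coords": (40.7, -74.0), "load": 80, "capacity": 100},
    {"addr": "10.0.0.2", "coords": (40.7, -74.0), "load": 20, "capacity": 100},
]
print([r["addr"] for r in default_order(replicas, (42.0, -71.0))])
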
`
`Bucketing table. An OASIS core node uses its buck-
`eting table to map IP addresses to IP prefixes. We boot-
`strap the table using BGP feeds from RouteViews [38],
`which has approximately 200,000 prefixes. A PATRICIA
`trie [27] efficiently maps IP addresses to prefixes using
`longest-prefix matching.
`When core nodes modify their bucketing table by split-
`ting or merging prefixes [30], these changes are gossiped
`in order to keep nodes’ tables weakly consistent. Again,
`stale information does not affect system correctness: pre-
`fix withdrawals are only used to reduce system state,
`while announcements are used only to identify more pre-
`cise coordinates for a prefix.
`
`Proximity table. When populating the proximity table,
`OASIS seeks to find accurate coordinates for every IP
`prefix, while preventing unnecessary reprobing.
`OASIS maps an IP prefix to the coordinates of its clos-
est replica. To discover the closest replica, a core node
`first selects an IP address from within the prefix and is-
`sues a probing request to a known replica (or first queries
`a neighbor to discover one). The selected replica tracer-
`outes the requested IP to find the last routable IP address,
`performs closest-node discovery using the replica overlay
`(see §2.2.3), and, finally, returns the coordinates of the
`nearest replica and its RTT distance from the target IP.
`If the prefix’s previously recorded coordinate has either
`expired or has a larger RTT from the prefix, the OASIS
`core node reassigns the prefix to these new coordinates
`and starts gossiping this information.
`To prevent many nodes from probing the same IP pre-
`fix, the system assigns prefixes to nodes using consistent
`hashing. That is, several nodes closest to hash(prefix) are
`responsible for probing the prefix (three by default). All
`nodes go through their subset of assigned prefixes in ran-
`dom order, probing the prefix if its coordinates have not
`been updated within the last Tp seconds. Tp is a function
`of the coordinate’s error, such that highly-accurate coor-
`dinates are probed at a slower rate (see §2.2.2).
`
Figure 7: Steps involved in a DNS anycast request to OASIS
using rendezvous nodes.
`
`Liveness table. For each registered service, OASIS
`maintains a liveness table of known replicas. Gossip-
`ing is not appropriate to maintain these liveness tables
`at each node: stale information could cause nodes to re-
`turn addresses of failed replicas, while high replica churn
`would require excessive gossiping and hence bandwidth
`consumption.
`Instead, OASIS aggregates liveness information about
`a particular service at a few service rendezvous nodes,
`which are selected by consistent hashing. When a replica
`joins or leaves the system, or undergoes a significant load
`change, the OASIS core node with which it has regis-
`tered sends an update to one of the k nodes closest to
`hash(service). For scalability, these rendezvous nodes
`only receive occasional state updates, not each soft-state
`refresh continually sent by replicas to their core nodes.
`Rendezvous nodes can dynamically adapt the parameter
`k based on load, which is then gossiped as part of the ser-
`vice’s policy. By default, k = 4, which is also fixed as a
`lower bound.
`Rendezvous nodes regularly exchange liveness infor-
`mation with one another, to ensure that their liveness ta-
`bles remain weakly consistent. If a rendezvous node de-
tects that a core node fails (via OASIS's failure detec-
`tion mechanism), it invalidates all replicas registered by
`that node. These replicas will subsequently re-register
`with a different core node and their information will be
`re-populated at the rendezvous nodes.
`Compared to logically-decentralized systems such as
`DHTs [39, 42], this aggregation at rendezvous nodes al-
`lows OASIS to provide faster response (similar to one-
`hop lookups) and to support complex anycast queries
`(e.g., as a function of both locality and load).
`
`3.2 Putting it together: Resolving anycast
`
`Given the architecture that we have presented, we now
`describe the steps involved when resolving an anycast re-
`quest (see Figure 7). For simplicity, we limit our discus-
`sion to DNS redirection. When a client queries OASIS
`for the hostname coralcdn.nyuld.net for the first time:
`
`1. The client queries the DNS root servers, finding an
`OASIS nameserver I for nyuld.net to which it sends
`the request.
`2. Core lookup: OASIS core node I finds other core
`nodes near the client that support the DNS interface
`by executing the following steps:
`
`(a) I locally maps the client’s IP address to IP pre-
`fix, and then prefix to location coordinates.
`(b) I queries one of the k rendezvous nodes for ser-
`vice dns, call this node SI, sending the client’s
`coordinates.
`(c) SI responds with the best-suited OASIS name-
`servers for the specified coordinates.
`(d) I returns this set of DNS replicas to the client.
`Let this set include node J.
`
`3. The client resends the anyc
