United States Patent [19]
Rabinovich et al.

US006167427A

[11] Patent Number: 6,167,427
[45] Date of Patent: Dec. 26, 2000
[54] REPLICATION SERVICE SYSTEM AND METHOD FOR DIRECTING THE REPLICATION OF INFORMATION SERVERS BASED ON SELECTED PLURALITY OF SERVERS LOAD

[75] Inventors: Irina Rabinovich; Michael Rabinovich, both of Gillette, N.J.

[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.

[21] Appl. No.: 08/979,611

[22] Filed: Nov. 28, 1997

[51] Int. Cl.7 .......... G06F 15/16; G06F 15/167
[52] U.S. Cl. .......... 709/201; 709/203; 709/217; 709/218; 709/219; 707/204; 707/205
[58] Field of Search .......... 709/203, 201, 202, 316, 217, 218, 219, 221, 223, 226, 229, 235, 238, 249; 707/204, 205, 200

[56] References Cited

U.S. PATENT DOCUMENTS

4,974,256  11/1990  Cyr et al. .................. 379/113
5,539,883   7/1996  Allon et al. ................ 395/675
5,588,147  12/1996  Neeman et al. ............... 707/1
5,708,812   1/1998  Van Dyke et al. ............. 395/712
5,745,683   4/1998  Lee et al. .................. 709/250
5,774,668   6/1998  Choquier .................... 395/200.53
5,787,442   7/1998  Hacherl et al. .............. 707/201
5,832,219  11/1998  Pettus et al. ............... 395/200.33
5,923,837   7/1999  Matias ...................... 714/28

OTHER PUBLICATIONS

Microsoft Press Computer Dictionary, 3rd Ed., Microsoft Press, Sep. 1997.
Baumgartner et al., "Global load balancing strategy for distributed computer system," IEEE, Jul. 1988.
Dandamudi et al., "A hierarchical load sharing policy for distributed systems," IEEE, Sep. 1997.

Primary Examiner-Le Hien Luu
Assistant Examiner-Beatriz Prieto
Attorney, Agent, or Firm-William Ryan

[57] ABSTRACT

A system and method for efficiently providing access by a large number of clients to objects located at a large number of information servers. A non-bottleneck solution to sharing load among servers is provided by migrating or replicating objects from highly loaded servers to less highly loaded servers. Objects that experience low loading are deleted to make room for more highly used objects and to make space for new objects. A naming service is provided to give rapid access to a replica of a requested object, while avoiding directing access requests to servers from which replicas of requested objects have been deleted. Hierarchical ordering of replication and naming functions permits a variety of particular access methods to be realized.

13 Claims, 5 Drawing Sheets
[Cover figure: the FIG. 7 load-balancing flowchart, reproduced at Sheet 4 of 5]
[Sheet 1 of 5: FIG. 1, an overall view of an illustrative network 100 with clients 110-1 ... 110-N, replication service 130, and hosting servers 140-1 ... 140-M; FIG. 2, a replication service arrangement comprising a hierarchy with root replicator 210, subordinate replicators 220-i, and hosting sets 230-i]
[Sheet 2 of 5: FIG. 3, an application of the replication service arrangement of FIG. 2 (nodes 210, 220-2); FIG. 4, a naming service arrangement for use in the system of FIG. 1]
[Sheet 3 of 5: FIG. 5, load-balancing flowchart: 501 read load from subordinate servers; 502 more servers?; 503 root server?; if root, 504 send messages to the high- and low-load servers to perform a distribution event; if not, 505 pass low and high loads to the higher server. FIG. 6, modified flowchart: 600 start; 601 read load from subordinate servers; 602 more servers?; 603 root server?; if root, 604 send messages to subordinate high and low servers to perform a distribution event; if not, 606 send messages to subordinate high and low hosting sets to perform a distribution event and 605 pass low and high loads to the higher server]
[Sheet 4 of 5: FIG. 7, flowchart of a preferred load-balancing round at a replicator R over subordinates S1 ... Sn: 701 receive load information; 702 determine Smax, Smin; 703 test loadmax,Smax - loadmin,Smin > d, and if so 704 send OFFLOAD to pmax,Smax and 705 send additional OFFLOAD messages; 708 set loadmax,R = loadmin,R = avLoadR; 709 test whether an OFFLOAD message was sent; if so, 711 find S'max, S'min such that loadmax,S'max = max{i=1..n, i not in {Smax, Smin}} loadmax,Si and loadmin,S'min = min{i=1..n, i not in {Smax, Smin}} loadmin,Si, set pmax,R = pmax,S'max and pmin,R = pmin,S'min, then 713 send report to parent; if not, 712 set loadmax,R = loadmax,Smax, loadmin,R = loadmin,Smin, pmax,R = pmax,Smax, pmin,R = pmin,Smin, then 714 send report to parent]
[Sheet 5 of 5: FIG. 8, flowchart for deleting a replica: 802 host p sends DELETEREPL(x) to the redirector; 803 the redirector deletes the host from the replica set and responds with RESOLVED_CNT(xp); 804 resolved_cnt = satisfied_cnt?; if not, 805 wait and retest; if so, 806 p deletes x; 807 end]
REPLICATION SERVICE SYSTEM AND METHOD FOR DIRECTING THE REPLICATION OF INFORMATION SERVERS BASED ON SELECTED PLURALITY OF SERVERS LOAD

FIELD OF THE INVENTION

The present invention relates generally to the field of distributed information systems. More particularly, the present invention relates, in one aspect, to selective replication and distribution of data objects and services between and among a plurality of information systems. Still more particularly, aspects of the present invention relate to distributed systems and methods for efficient provisioning of objects and services to clients in large (including global) networks.

BACKGROUND OF THE INVENTION

As networked computers and databases, and users of these systems, have proliferated in numbers and geographic spread, interest has grown in efficiently providing access to information objects and services (hereinafter "objects") at host computers or information servers. Presently, for example, thousands of Internet servers provide a very large number of objects to millions of user clients worldwide. These servers are located in many countries around the world and typically at many locations in each country. In other particular cases, network service providers and private corporations locate server nodes at widely separated points in their networks.

A particular challenge faced by developers of these networks of servers is that of providing access to a widely distributed set of clients without overloading particular hosts. The overload may occur, e.g., because a server stores objects that are in high demand and/or the server is a repository for large numbers of objects. Meeting this challenge proves especially difficult when the demand for particular objects varies considerably with time. Thus, while a straightforward replication of all objects at all servers would generally improve availability of a particular object to a range of clients, the cost of such replication is prohibitive. In fact, the economics of replication, distribution and storage do not usually permit design for a worst-case predicted demand condition in such large networks.

The load on each server is an important consideration in adequately meeting demand from clients for objects; in general, it is desirable to balance the load among servers. In many existing replicated object systems, with relatively few servers, this question is quite tractable: system administrators or a dedicated computer system process can monitor the load of servers and decide on selective replica placement. When the number of servers increases, however, such a human or dedicated process cannot be expected to efficiently direct the creation and deletion of replicas.

Current object-location techniques used in distributed server networks assume that sets of object replicas are well known to clients. However, when the number and geographic distribution of servers increases to dense national and international proportions, this assumption proves unrealistic; the ability of clients to efficiently locate desired objects at servers decreases markedly.

One possible solution to the scalability problem of the replication service would be to use a localized "greedy" approach. For example, each hosting server might choose another hosting server at random, and perform load comparisons with this other server. If the load difference exceeds a certain distribution threshold d, the less-loaded server would obtain some replicas of objects kept on the higher-loaded server, thus taking up a portion of its load. Due to the randomness of choice, all pairs of servers would be involved. An approach similar to this is presented in "Adaptive load sharing in homogeneous distributed systems," by T. L. Casavant and J. G. Kuhl, IEEE Trans. on Software Eng., vol. 2(14), pp. 141-154, Feb. 1988. The Casavant et al. paper also defines a threshold such that nodes with a load below the threshold are constrained not to initiate load comparison.
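For illustration, one round of this greedy scheme might look as sketched below. The server objects and the replica-transfer step are hypothetical stand-ins, since the cited work describes the policy only abstractly; only the random pairing and the threshold test come from the description above.

```python
import random

def greedy_round(servers, d):
    """One round of the localized "greedy" scheme: each hosting server
    compares its load with one randomly chosen peer, and when the
    difference exceeds the distribution threshold d, the less-loaded
    server takes over some replicas from the more-loaded one."""
    for s in servers:
        peer = random.choice([t for t in servers if t is not s])
        if abs(s.load - peer.load) > d:
            hot, cold = (s, peer) if s.load > peer.load else (peer, s)
            cold.take_replicas_from(hot)  # hypothetical transfer step
```

As the passage that follows notes, the cost of this scheme is that every server must initiate such comparisons ever more frequently as the system grows.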
However, as the number of servers grows, each server must initiate load comparison more frequently. Otherwise, the average time between load distribution events for any given pair of nodes will grow linearly with the number of nodes, as will the lag between load changes and detection of these changes. Since a server has a limit on how frequently it can perform load distribution, this solution is not scalable to large systems.

Another approach is to organize hosting servers in a connected graph structure, with neighboring nodes performing pair-wise load distribution. Since the graph is connected, the load distribution involves all servers. This technique is similar to an algorithm described in "Simulations of three adaptive, decentralized controlled, job scheduling algorithms," by J. A. Stankovic, Computer Networks, 8, pp. 199-217, August 1984. One difference is that in the Stankovic paper, when a node sends its load data to a neighbor, it includes its information about all other nodes. These techniques also prove to not be scalable as required for large networks.

Another important consideration in establishing a global or other large information system is that of a naming service for objects and servers. In general, naming services are used to map a logical name of an object into the physical name of a replica. The main limiting factors for the name service are the number of clients (which determines the number of requests for name resolution) and the number of objects (which determines the size of the name-mapping database). Another factor is the number of requests from hosting servers for updates of name mappings.

Name services are well known in the art, including the Domain Name Service (DNS) used in today's Internet and described, e.g., in P. V. Mockapetris, "Domain Names - Concepts and Facilities," Request for Comments 1034, DDN Network Information Center, SRI International, November 1987. Also used in Internet naming is CCITT (now ITU) Recommendation X.500.

However, mappings between host name and IP address seldom change in the DNS scheme. Rather, DNS is primarily an append-only database that permits the addition of new host name/IP address mappings; current DNS implementations support little or no dynamic mapping of host name to IP address.

Using current DNS naming service techniques to map logical object names to physical replicas, name server responses cached by clients become incorrect much more quickly. Thus, clients must query the name service much more often, greatly increasing the burden on the name service. Weak-consistency schemes for replicating the mapping database, such as that described in B. W. Lampson, "Designing a global name service," Proc. of ACM Conf. on Principles of Distributed Systems, pp. 1-10, 1986, result in many incorrect responses to clients. These incorrect responses result in the use of an incorrect physical name to access an object, with resulting failure and request renewal.
World Wide Web (Web) syntax and semantics for object names (URLs) are quite different from those in DNS or X.500. Use of DNS or X.500-compliant symbolic names in networks like the Web would require extensive changes in Web browsers.

SUMMARY OF THE INVENTION

Limitations of the prior art are overcome and a technical advance is made in accordance with the present invention described in illustrative embodiments herein.

In one illustrative embodiment, a method is described for achieving the number and placement of object replicas in a network of servers. This result is achieved without the need for bottleneck-causing global decisions. Using the present inventive methods and resulting network, a server network is realized in which creation and deletion of replicas is minimized when a steady demand for all objects persists for an appropriate period of time. Moreover, when such steady demand exists the load can be distributed among servers in a substantially equal manner.

In accordance with an illustrative embodiment, the decision on the number and placement of replicas is made within the network. Moreover, the process is dynamic, with replicas being created and deleted as demand and geographic origins of requests change. The illustrative replication is advantageously transparent to end-user clients, except in improved network response.

In accordance with an aspect of the illustrative embodiment, no high demand bottlenecks arise which might require an increase of processing power at any node. Rather, as load increases, the number of nodes can be increased to handle the increased demand. This result obtains whether the high demand arises from an increase in the number of clients requesting objects or from an increase in the number of objects.

In accordance with one aspect of the present invention, a new naming service is introduced for finding object replicas. Advantageously, the existing DNS name services can be used as a first level of indirection in the new naming service.

In other particular contexts, such as when the characteristics of languages for specifying object content or when certain network protocols are used, the illustrative inventive techniques, or their application, may be modified as required to achieve desired results.

BRIEF DESCRIPTION OF THE DRAWING

The above-summarized description of illustrative embodiments of the present invention will be more fully understood upon a consideration of the following detailed description and the attached drawing, wherein:

FIG. 1 is an overall view of an illustrative network system embodiment of the present invention.

FIG. 2 shows a replication service arrangement comprising a hierarchy of replicators for use in the network of FIG. 1.

FIG. 3 shows an application of the replication service arrangement of FIG. 2.

FIG. 4 shows a naming service arrangement for use in the system of FIG. 1.

FIG. 5 is a flowchart illustrating a load balancing system and method in accordance with one embodiment of the present invention.

FIG. 6 is a flowchart for a modified version of the system and method of FIG. 5.

FIG. 7 is a flowchart for a preferred illustrative load balancing arrangement.

FIG. 8 is a flowchart showing an illustrative method for deleting a replica.
DETAILED DESCRIPTION

Illustrative System Overview

FIG. 1 shows an illustrative application of the present invention. There, a wide area network 100 comprises a large number of servers 140-i, i = 1, 2, ... M, and clients 110-j, j = 1, 2, ... N. Servers 140-i host objects publicly accessible to clients 110-j. A node may play a dual role as a server for some objects and a client for other objects.

Clients access objects by their names. The name of an object allows a client to infer the identity of the hosting server (e.g., its domain name or IP address) and the access method (i.e., the protocol to be used to access the server). For illustrative purposes, it will be assumed that object names embed a domain name of the server that hosts the object. One instance of this environment is the Internet, which comprises, inter alia, Web servers and Web browsers. Such servers typically include computers and related equipment for controlling (often very large) databases, while browsers are typically associated with client terminals or (usually smaller) computers, all as is well known in the art.

The network of FIG. 1 is a network of information systems, where objects provide access to information, and thus do not change as a result of an access by a client. However, an access may result in extensive processing by the server to compute the result of the client's query. For example, in a geographical database, a request for the map of a specified area in a specified scale may require generation of the map, which typically is an expensive operation in terms of server resources.

A load measure exists for each server 140-i in FIG. 1 to allow comparison of load on each server. For example, when the servers operate using the UNIX operating system, the length of the input queue (as measured, e.g., by the output of the uptime command) proves to be convenient for this purpose. In other particular environments, other measures of load may be preferable.

In addition, each individual server can estimate the fraction of its total load due to a given object on the server. In typical operation this is accomplished by monitoring resource consumption (e.g., CPU time, I/O operations, etc.) due to requests for individual objects and dividing up the total load between objects in proportion to their consumption. While the illustrative servers 140 in FIG. 1 may be quite different in total system resources, a server with an average queue length of 1.5 is more heavily loaded than a server with an average queue length of 0.8, regardless of the quantity of processing needed at each server to achieve these queue lengths.
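The per-object accounting just described can be sketched as a simple proportional split. The attribution of load by resource consumption comes from the text above; the function and field names here are illustrative assumptions.

```python
def per_object_loads(total_load, consumption):
    """Apportion a server's total load among its objects in proportion
    to the resources (CPU time, I/O operations, ...) each consumed.
    `consumption` maps object name -> measured resource units."""
    total = sum(consumption.values())
    if total == 0:
        return {x: 0.0 for x in consumption}
    return {x: total_load * used / total for x, used in consumption.items()}

# Example: a server with UNIX run-queue length 1.5 attributes it between
# two objects that consumed 300 and 100 resource units, respectively.
loads = per_object_loads(1.5, {"map-service": 300, "docs": 100})
# -> {'map-service': 1.125, 'docs': 0.375}
```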
In achieving purposes of the present invention, the illustrative embodiment of FIG. 1 provides for replicating an object located on a particular server, say server p, on another server, server q. Alternatively, an object may be migrated from server p to server q. x_p will denote a replica of object x on server p; load(p) denotes the load of node p, and load(x_p) denotes the load on node p due to object x. In general, if x is migrated from node p to node q, the reduction of load on p may not be equal to the increase of load on q, due to differences in processing power of p and q.

Typically, there are two ways by which a system can balance the load: directing client requests to less loaded servers (among those with replicas of the requested object), and migrating or replicating objects between servers.
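The asymmetry noted above, where p's load reduction need not equal q's load increase, can be made concrete with a toy model. The linear power-ratio scaling below is an assumption introduced for illustration only; the patent states the asymmetry without prescribing a model.

```python
def post_migration_loads(load_p, load_q, load_xp, power_p, power_q):
    """Estimate loads after migrating object x from server p to server q,
    assuming the work behind load(x_p) scales with the ratio of the two
    servers' processing powers when it moves to q."""
    load_xq = load_xp * power_p / power_q
    return load_p - load_xp, load_q + load_xq

# Example: p is twice as powerful as q, so x costs twice as much on q.
print(post_migration_loads(1.5, 0.8, 0.4, power_p=2.0, power_q=1.0))
# -> (1.1, 1.6): p sheds 0.4 of load while q gains 0.8.
```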
Server selection for a particular client request is typically based on the geographic origin of the request (i.e., the closest replica is chosen), but the selection may be based on another criterion, or at random. Thus, the load is illustratively balanced by replication or migration of objects. The event of migrating an object from server p to server q, or creating a new replica of an object on q by copying it from p, is called a distribution event. Servers p and q are, respectively, the source and the recipient in the distribution event.

It is typically desired that, when the demand for all objects in a network system like that of FIG. 1 does not change, the system stabilizes into a state in which the loads are distributed equally among all hosting servers. Thus, when the demand for different objects changes and stabilizes at different levels, the system will eventually re-distribute the load so that it is again distributed equally among hosts. However, small changes in the demand for objects should not trigger load re-distribution; otherwise, the system will hardly ever be stable.

It proves convenient to consider a mechanism for load balancing to be stabilizing if there exist two constants, demandDiff and loadDiff, such that if the variation in time of the request rate for every object x stays within demandDiff, the system eventually reaches a state where no replicas are created or dropped, and the difference between the load of any two hosting servers does not exceed loadDiff.

Likewise, it proves advantageous to avoid load balancing of a type in which isolated regions are created with autonomous load distribution in each region. Thus, a load balancing mechanism desirably avoids a condition in which nodes from different regions have significantly different load, even if individual regions are not in the stable state. More formally, for some L and l < L, let all nodes be partitioned into three sets: A(L,l) = {p | load(p) ≥ L}, B(L,l) = {p | l ≤ load(p) < L}, and C(L,l) = {p | load(p) < l}. A mechanism for load balancing is called contiguous if there exist some constants d and t such that, for any L and l < L - d, if no node moves between sets A(L,l), B(L,l) and C(L,l) for time t, then there will be a distribution event either with a source in A and recipient outside A, or with a source outside C and recipient inside C.

The contiguity criterion may be illustrated by a system with four servers i, j, k, and l. A load balancing mechanism can be derived that balances the load between nodes in node pair i and j, and separately between nodes in node pair k and l, with no balancing performed between the node pairs. The demand for objects hosted by the first pair (i, j) will be assumed to be very high, but unstable (so that there are continuous distribution events occurring between i and j). Also, it will be assumed that there is unstable but overall low demand for objects on k and l. Thus the load on i and j greatly exceeds the load on k and l, while fluctuating within each pair. This example shows the importance of both the contiguity criterion and the stabilization criterion.
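The contiguity definition can be restated as a runnable check. The sketch below assumes a simple in-memory load snapshot and a log of observed distribution events; it is a reading of the prose definition above, not code from the patent.

```python
def partition(loads, L, l):
    """Split nodes into the sets A, B, C of the contiguity definition:
    A holds load >= L, B holds l <= load < L, C holds load < l."""
    A = {p for p, ld in loads.items() if ld >= L}
    B = {p for p, ld in loads.items() if l <= ld < L}
    C = {p for p, ld in loads.items() if ld < l}
    return A, B, C

def violates_contiguity(events, A, C):
    """After the sets stay fixed for time t, the definition demands a
    distribution event from inside A to outside A, or from outside C to
    inside C. `events` holds (source, recipient) pairs observed during
    that quiet period; returns True when no qualifying event exists."""
    return not any((src in A and dst not in A) or
                   (src not in C and dst in C)
                   for src, dst in events)
```

In the four-server example above, events only ever flow within {i, j} and within {k, l}, so the check fails for any partition that places i, j in A and k, l in C.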
Returning to the system of FIG. 1, when an object is created, it is placed on one of the hosting servers and is registered with a name service 120. Registering of a related type is known in a so-called "persistent URL" proposal for use in World Wide Web (hereinafter "Web") operation in the Internet, but such registering of persistent URLs is not part of the present invention. This registration in accordance with the illustrative embodiment involves sending a message to name service 120 informing that service of the physical name of the object and assigning it a symbolic name. The physical name of an object is the name by which the object can be accessed by the clients (in the Web context, this would be the object's URL). The symbolic name uniquely identifies the object. However, it resolves into the name service identity rather than the identity of the server hosting the object.

The symbolic name is advertised to the users; it is the name known to the clients. When a client wants to access the object, it uses its symbolic name. Since this name resolves into the name service identity, the request for the object actually arrives at the name service 120, as shown by the link between typical client 110-1 and name service 120. Name server 120 finds a corresponding physical name (due to replication, a symbolic name can map into multiple physical names) and sends it as a special "re-direct" response to the client. The client, such as 110-1 in FIG. 1, then uses the physical name received to access the object at the identified server, such as 140-1 in FIG. 1.

Asynchronously, hosting servers periodically report their load to the replication service unit 130. The replication service 130 uses these reports to migrate or replicate objects so that the load is shared among all hosting servers, and replicas are placed close to a majority of requests for the corresponding objects.

When a hosting server 140-j creates or drops a replica of an object, it records this fact at the name service 120. Thus, the name service keeps a mapping of the symbolic name of an object to a set of physical names of currently available replicas. When resolving a symbolic name, the name service chooses one of the current set of physical names, applying a fair procedure and taking into account the geographical location of the requesting client.
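A minimal sketch of this bookkeeping follows. The class and method names are illustrative assumptions, and since the patent leaves the "fair procedure" unspecified, round-robin among the replicas nearest the client's region stands in for it here.

```python
class NameService:
    """Maps a symbolic object name to the set of physical replica names,
    updated as hosting servers register and drop replicas (unit 120)."""

    def __init__(self):
        self.replicas = {}   # symbolic name -> {physical name: region}
        self.cursors = {}    # symbolic name -> round-robin position

    def register(self, symbolic, physical, region):
        self.replicas.setdefault(symbolic, {})[physical] = region

    def drop(self, symbolic, physical):
        self.replicas[symbolic].pop(physical, None)

    def resolve(self, symbolic, client_region):
        """Pick one currently available replica: prefer replicas in the
        client's region, rotating among candidates for fairness. The
        result is returned to the client as a "re-direct" response."""
        reps = self.replicas[symbolic]
        near = [p for p, r in reps.items() if r == client_region] or list(reps)
        cursor = self.cursors.get(symbolic, 0)
        self.cursors[symbolic] = cursor + 1
        return near[cursor % len(near)]
```

Because hosts record every replica creation and deletion here, resolution never returns a server from which the replica has already been deleted, which is the failure mode the background section attributes to weakly consistent name databases.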
The Replication Service

One potential bottleneck in the system of FIG. 1 is the replication service. Indeed, as the number of hosting servers and hosted objects increases, the number of load reports the replication service must process grows, and so does the search space for deciding on replica placement. This section describes a scalable distributed architecture for the replication service 130. Scalability is of interest to the present discussion because important target networks are assumed to be large.

The following description of a policy for replica placement is based on the objective of load distribution. A challenge in designing such a policy is that the policy must ensure globally desirable system behavior (according to criteria such as the stabilizing and contiguity criteria discussed above) without creating any bottleneck-producing global decision points. Those skilled in the art will develop other detailed policies and heuristics within the scope of the presently described architecture and infrastructure.

An illustrative structure for the replication service 130 of FIG. 1 is shown in FIG. 2. There, it will be seen that replication service 130 comprises a hierarchy of replicator servers. Each replicator server in FIG. 2 comprises standard network computer functionality, including a central processing unit, main and auxiliary memory and input/output facilities, all arranged to receive information from information servers and other replicators, to perform, inter alia, analyses of load information, and to generate control messages relating to these analyses and to creation and deletion of object replicas. The highest level replicator is at node 210 and replicators 220-j appear at nodes directly below node 210. Only two levels are shown explicitly in the hierarchy of FIG. 2, but other levels of replicators will appear as required for particular networks at respective nodes at other levels in the hierarchy; for very large networks, the hierarchy may extend over many such levels. Direct descendants of a node in the hierarchy of replicators in FIG. 2 are referred to as subordinates.
For a server S, let the hosting set H(S) be the set of hosting servers in the subtree rooted at S. In particular, for a hosting server p, H(p) = {p}. The hierarchical structure of FIG. 2 can be used to implement a variety of load re-distribution methods, as will now be described in more detail.

A first solution based on the structure of FIG. 2 is to use the replicator hierarchy only to find the hosting servers with the highest and lowest load. In this solution, hosting servers report their load to the lowest level replicators. These lowest level replicators then choose the servers with the highest and lowest load and pass this information up to their parents. The parents then choose hosting servers with the highest and lowest load among those reported by their subordinate replicators. Thus, each replicator reports up the information about the hosting servers with highest and lowest load in its hosting set. This process continues until it reaches the root, which will identify the hosting servers with the highest and lowest load in the system and re-distribute the load between them. Since non-root replicators work in parallel, the whole process takes place in logarithmic time.

The flowchart of FIG. 5 illustrates this process. Starting at block 500 in the flowchart, high and low load information is read from subordinate nodes as indicated by process block 501. This is continued at block 501 until the decision at block 502 indicates that all immediately subordinate nodes have input their information. A further test is made at decision block 503 to determine if the information has been input to the root node from its subordinate nodes. If the node processing inputs is not the root node, the high and low load information is passed to the next higher node at block 505. This process continues until the root server node is the node receiving the inputs. Then, messages are sent to the high and low load nodes as described above and indicated by process block 504 in FIG. 5 to cause a distribution event to occur. This solution can be shown to satisfy the stability and contiguity criteria described above.
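A compact sketch of this first solution follows. The recursive walk stands in for the bottom-up message passing of FIG. 5, and the node interface (is_hosting_server, subordinates, load, offload_to) is assumed for illustration.

```python
def report_extremes(node):
    """Return ((max_server, max_load), (min_server, min_load)) over the
    hosting set under `node`, mimicking the upward reports of FIG. 5."""
    if node.is_hosting_server():
        return (node, node.load), (node, node.load)
    reports = [report_extremes(child) for child in node.subordinates]
    highest = max((hi for hi, _ in reports), key=lambda r: r[1])
    lowest = min((lo for _, lo in reports), key=lambda r: r[1])
    return highest, lowest

def root_balance(root, d):
    """At the root, trigger one distribution event between the globally
    highest- and lowest-loaded hosting servers (block 504)."""
    (hot, hot_load), (cold, cold_load) = report_extremes(root)
    if hot_load - cold_load > d:
        hot.offload_to(cold)  # hypothetical distribution-event message
```

Each level reduces many subordinate reports to one, which is why the end-to-end identification of the global extremes takes time logarithmic in the number of hosting servers.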
Certain inefficiencies can be identified in the process shown in FIG. 5 and described above that may be important in some network applications. First, the work of lower level replicators whose hosting sets do not contain nodes with the globally highest and lowest load is wasted, even if nodes with the load difference over the threshold were found. Second, the load between only two (or, more gener

In accordance with this illustrative preferred embodiment, a protocol is employed which uses three parameters. One of these parameters is a system-wide parameter, the distribution threshold, d, which reflects the load difference between servers required to trigger distribution events. Another parameter, the deletion threshold, u_p, is host-specific. A hosting server p deletes replicas (except for the last replica) of an object x such that load(x_p) < u_p. As discussed above, the same rate of requests for the same object may result in different values of the object load at different servers. However, u_p is adjusted for every server such that if object x qualifies for deletion on one server under a certain request rate, it would qualify for deletion on all servers under about the same rate. The third parameter employed by this illustrative protocol is a system-wide stability factor s that reflects the level of variation in server load that should not cause distribution events. The three parameters are constrained by the condition 2*u_max + s < d, where u_max is the maximum deletion threshold across all hosting servers. Each replicator is advantageously arranged to have at least three subordinates.

In accordance with the illustrative protocol, a hosting server p periodically examines all of its objects, attempting to delete object replicas whose load is below u_p. It does not, however, delete the sole replica of an object in the system. It then sends a load report to its parent replicator. Every replicator collects load reports from its subordinates and sends the load report to the higher-level replicator.

The load report from node S (either a hosting server or a replicator) has the form (avLoad_S, H(S), p_max,S, p_min,S, loadmax,S, loadmin,S), where avLoad_S is the average load of hosting servers from H(S). Thus, in the case of a hosting server, the average load is the actual load. H(S) includes information about the number of these servers; p_max,S and p_min,S are the identities of hosting servers chosen among H(S). If S is a leaf, both p_min,S and p_max,S are the identity of S itself. When S is a leaf, i.e., a hosting server, loadmax,S and loadmin,S are the same and equal to the actual load of S. For a non-leaf server S, p_max,S, p_min,S, loadmax,S and loadmin,S are calculated in the protocol based on reports from subordinates of S.

Upon collecting reports from all subordinates, a replicator R executes the DistributeLoad protocol appearing in Listing
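The listing itself is not reproduced in this excerpt, but the flowchart of FIG. 7 (Sheet 4 of 5) suggests the shape of the computation at a replicator R. The sketch below is a reconstruction from that flowchart under stated assumptions: the Report type, the callback names, and the exact offload addressing are illustrative, not the patent's listing.

```python
from dataclasses import dataclass

@dataclass
class Report:
    """Load report (avLoad_S, |H(S)|, p_max,S, p_min,S, loadmax,S, loadmin,S)."""
    av_load: float
    size: int
    p_max: str
    p_min: str
    load_max: float
    load_min: float

def distribute_load(reports, d, send_offload, send_report):
    """One round at replicator R over subordinate reports S1..Sn,
    reconstructed from FIG. 7 (blocks 701-714)."""
    smax = max(reports, key=lambda r: r.load_max)              # 702
    smin = min(reports, key=lambda r: r.load_min)
    size = sum(r.size for r in reports)
    av_load = sum(r.av_load * r.size for r in reports) / size
    if smax.load_max - smin.load_min > d:                      # 703
        # 704/705: direct the most-loaded server to shed load toward
        # the least-loaded one (interpretation of the OFFLOAD messages).
        send_offload(smax.p_max, smin.p_min)
        # 708/711: the old extremes are now stale, so report the average
        # load and the next-most-extreme subordinates. At least one
        # remains, since each replicator has at least three subordinates.
        rest = [r for r in reports if r is not smax and r is not smin]
        hi = max(rest, key=lambda r: r.load_max)
        lo = min(rest, key=lambda r: r.load_min)
        report = Report(av_load, size, hi.p_max, lo.p_min, av_load, av_load)
    else:                                                      # 712
        report = Report(av_load, size, smax.p_max, smin.p_min,
                        smax.load_max, smin.load_min)
    send_report(report)                                        # 713/714
```

Note how the constraint 2*u_max + s < d would keep replica deletions (each perturbing a server's load by less than u_max) and ordinary load noise (within s) from pushing a pair of servers over the distribution threshold by themselves.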
