with four sites and full replication. Suppose that transactions T1 and T2 wish to lock data item Q in exclusive mode. Transaction T1 may succeed in locking Q at sites S1 and S3, while transaction T2 may succeed in locking Q at sites S2 and S4. Each then must wait to acquire the third lock; hence, a deadlock has occurred.
18.4.1.4 Biased Protocol

The biased protocol is based on a model similar to that of the majority protocol. The difference is that requests for shared locks are given more favorable treatment than are requests for exclusive locks. The system maintains a lock manager at each site. Each manager manages the locks for all the data items stored at that site. Shared and exclusive locks are handled differently:

• Shared locks: When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.
• Exclusive locks: When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q.

As before, the response to the request is delayed until the request can be granted.
The scheme has the advantage of imposing less overhead on read operations than does the majority protocol. This advantage is especially significant in the common case in which the frequency of reads is much greater than the frequency of writes. However, the additional overhead on writes is a disadvantage. Furthermore, the biased protocol shares the majority protocol's disadvantage of complexity in handling deadlock.
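The asymmetry between the two lock modes is easy to see in code. The following is a minimal sketch of the biased protocol's request logic; the LockManager class and its request_lock method are hypothetical stand-ins for a real lock-manager interface, and a real manager would delay (block) the response rather than grant immediately.

```python
# Minimal sketch of the biased protocol (hypothetical LockManager API).
class LockManager:
    """One per site; manages locks for all data items stored at that site."""
    def __init__(self):
        self.locks = {}  # item -> list of (mode, transaction)

    def request_lock(self, item, mode, txn):
        # Placeholder grant; a real manager delays the response until
        # the request can be granted.
        self.locks.setdefault(item, []).append((mode, txn))
        return True

def lock_biased(item, mode, txn, replicas):
    """replicas: the lock managers of all sites holding a replica of item."""
    if mode == "S":
        # Shared lock: ask any ONE site containing a replica.
        return replicas[0].request_lock(item, "S", txn)
    # Exclusive lock: ask ALL sites containing a replica.
    return all(s.request_lock(item, "X", txn) for s in replicas)
```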
18.4.1.5 Primary Copy

In the case of data replication, we may choose one of the replicas as the primary copy. Thus, for each data item Q, the primary copy of Q must reside in precisely one site, which we call the primary site of Q.
When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. As before, the response to the request is delayed until the request can be granted.
Thus, the primary copy enables concurrency control for replicated data to be handled in a manner similar to that for unreplicated data. This method of handling allows for a simple implementation. However, if the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible.
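Reusing the hypothetical LockManager sketch above, the primary-copy protocol reduces to a single lookup; primary_site_of is an assumed mapping from each data item to the lock manager at its primary site.

```python
def lock_primary_copy(item, mode, txn, primary_site_of):
    """primary_site_of: dict mapping each data item to the LockManager
    at its designated primary site."""
    # Concurrency control now looks exactly like the unreplicated case,
    # but if this one site fails, item is unlockable even though other
    # replicas may be reachable.
    return primary_site_of[item].request_lock(item, mode, txn)
```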

18.4.2 Timestamping

The principal idea behind the timestamping scheme of Section 6.9 is that each transaction is given a unique timestamp that the system uses in deciding the serialization order. Our first task, then, in generalizing the centralized scheme to a distributed scheme is to develop a scheme for generating unique timestamps. Once this scheme has been developed, our previous protocols can be applied directly to the nonreplicated environment.

18.4.2.1 Generation of Unique Timestamps

There are two primary methods for generating unique timestamps, one centralized and one distributed. In the centralized scheme, a single site is chosen for distributing the timestamps. The site can use a logical counter or its own local clock for this purpose.
In the distributed scheme, each site generates a unique local timestamp using either a logical counter or the local clock. The global unique timestamp is obtained by concatenation of the unique local timestamp with the site identifier, which also must be unique (Figure 18.2). The order of concatenation is important! We use the site identifier in the least significant position to ensure that the global timestamps generated in one site are not always greater than those generated in another site. Compare this technique for generating unique timestamps with the one presented in Section 18.1.2 for generating unique names.

[Figure 18.2 Generation of unique timestamps: the local unique timestamp is concatenated with the site identifier to form the global unique identifier.]

We may still have a problem if one site generates local timestamps at a faster rate than do other sites. In such a case, the fast site's logical counter will be larger than that of other sites. Therefore, all timestamps generated by the fast site will be larger than those generated by other sites. What is needed is a mechanism to ensure that local timestamps are generated fairly across the system. To accomplish the fair generation of timestamps, we define within each site Si a logical clock (LCi), which generates the unique local timestamp. The logical clock can be implemented as a counter that is
incremented after a new local timestamp is generated. To ensure that the various logical clocks are synchronized, we require that a site Si advance its logical clock whenever a transaction Ti with timestamp <x,y> visits that site and x is greater than the current value of LCi. In this case, site Si advances its logical clock to the value x + 1.
If the system clock is used to generate timestamps, then timestamps are assigned fairly, provided that no site has a system clock that runs fast or slow. Since clocks may not be perfectly accurate, a technique similar to that used for logical clocks must be used to ensure that no clock gets far ahead of or far behind another clock.
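A sketch of a per-site generator follows. Representing the concatenation as a Python tuple (counter, site_id) is an assumption of this sketch; tuples compare lexicographically, which puts the site identifier in the least significant position, and observe implements the LCi advancement rule just described.

```python
class TimestampGenerator:
    """Per-site generator of globally unique timestamps <LCi, site id>."""
    def __init__(self, site_id):
        self.site_id = site_id  # must be unique across the system
        self.lc = 0             # logical clock LCi

    def next_timestamp(self):
        self.lc += 1
        return (self.lc, self.site_id)  # site id in least significant position

    def observe(self, ts):
        """When a transaction with timestamp <x, y> visits this site and
        x > LCi, advance the logical clock to x + 1."""
        x, _ = ts
        if x > self.lc:
            self.lc = x + 1

s1, s2 = TimestampGenerator(1), TimestampGenerator(2)
t = s1.next_timestamp()   # (1, 1)
s2.observe(t)             # keeps a slow site's clock from falling behind
assert s2.next_timestamp() > t
```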
18.4.2.2 Timestamp-Ordering Scheme

The basic timestamp scheme introduced in Section 6.9 can be extended in a straightforward manner to a distributed system. As in the centralized case, cascading rollbacks may result if no mechanism is used to prevent a transaction from reading a data item value that is not yet committed. To eliminate cascading rollbacks, we can combine the basic timestamp scheme of Section 6.9 with the 2PC protocol of Section 18.3 to obtain a protocol that ensures serializability with no cascading rollbacks. We leave the development of such an algorithm to you.
The basic timestamp scheme just described suffers from the undesirable property that conflicts between transactions are resolved through rollbacks, rather than through waits. To alleviate this problem, we can buffer the various read and write operations (that is, delay them) until a time when we are assured that these operations can take place without causing aborts. A read(x) operation by Ti must be delayed if there exists a transaction Tj that will perform a write(x) operation but has not yet done so, and TS(Tj) < TS(Ti). Similarly, a write(x) operation by Ti must be delayed if there exists a transaction Tj that will perform either a read(x) or a write(x) operation and TS(Tj) < TS(Ti). There are various methods for ensuring this property. One such method, called the conservative timestamp-ordering scheme, requires each site to maintain a read and a write queue consisting of all the read and write requests, respectively, that are to be executed at the site and that must be delayed to preserve the above property. We shall not present the scheme here. Rather, we leave the development of the algorithm to you.
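As a starting point for that exercise, the delay test itself can be written down directly. The queue representation below (lists of (transaction, item) pairs) and the TS map are assumptions of this sketch; it illustrates only the delay conditions, not the full conservative scheme.

```python
def must_delay_read(ti, x, write_queue, TS):
    """Delay read(x) by Ti if an older Tj (TS(Tj) < TS(Ti)) still has a
    write(x) pending in the site's write queue."""
    return any(item == x and TS[tj] < TS[ti] for tj, item in write_queue)

def must_delay_write(ti, x, read_queue, write_queue, TS):
    """Delay write(x) by Ti if an older Tj still has a read(x) or a
    write(x) pending."""
    return any(item == x and TS[tj] < TS[ti]
               for tj, item in list(read_queue) + list(write_queue))

TS = {"T1": 5, "T2": 9}
assert must_delay_read("T2", "x", [("T1", "x")], TS)       # T1 is older
assert not must_delay_read("T1", "x", [("T2", "x")], TS)   # T2 is younger
```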
18.5 • Deadlock Handling

The deadlock-prevention, deadlock-avoidance, and deadlock-detection algorithms presented in Chapter 7 can be extended so that they can also be used in a distributed system. In the following, we describe several of these distributed algorithms.
18.5.1 Deadlock Prevention

The deadlock-prevention and deadlock-avoidance algorithms presented in Chapter 7 can also be used in a distributed system, provided that appropriate modifications are made. For example, we can use the resource-ordering deadlock-prevention technique by simply defining a global ordering among the system resources. That is, all resources in the entire system are assigned unique numbers, and a process may request a resource (at any processor) with unique number i only if it is not holding a resource with a unique number greater than i. Similarly, we can use the banker's algorithm in a distributed system by designating one of the processes in the system (the banker) as the process that maintains the information necessary to carry out the banker's algorithm. Every resource request must be channeled through the banker.
These two schemes can be used in dealing with the deadlock problem in a distributed environment. The first scheme is simple to implement and requires little overhead. The second scheme can also be implemented easily, but it may require too much overhead. The banker may become a bottleneck, since the number of messages to and from the banker may be large. Thus, the banker's scheme does not seem to be of practical use in a distributed system.
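The resource-ordering rule can be stated as a one-line guard, sketched below; the held set of resource numbers is a hypothetical representation of a process's current holdings.

```python
def may_request(held, i):
    """A process may request resource number i (at any processor) only
    if it holds no resource with a number greater than i."""
    return not any(j > i for j in held)

# Holding resources 3 and 7, a process may request 9 but not 5.
assert may_request({3, 7}, 9) and not may_request({3, 7}, 5)
```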
In this section, we present a new deadlock-prevention scheme that is based on a timestamp-ordering approach with resource preemption. For simplicity, we consider only the case of a single instance of each resource type.
To control the preemption, we assign a unique priority number to each process. These numbers are used to decide whether a process Pi should wait for a process Pj. For example, we can let Pi wait for Pj if Pi has a priority higher than that of Pj; otherwise, Pi is rolled back. This scheme prevents deadlocks because, for every edge Pi → Pj in the wait-for graph, Pi has a higher priority than Pj. Thus, a cycle cannot exist.
One difficulty with this scheme is the possibility of starvation. Some processes with extremely low priority may always be rolled back. This difficulty can be avoided through the use of timestamps. Each process in the system is assigned a unique timestamp when it is created. Two complementary deadlock-prevention schemes using timestamps have been proposed (both rules are sketched in code after this list):
• The wait-die scheme: This approach is based on a nonpreemptive technique. When process Pi requests a resource currently held by Pj, Pi is allowed to wait only if it has a smaller timestamp than does Pj (that is, Pi is older than Pj). Otherwise, Pi is rolled back (dies). For example, suppose that processes P1, P2, and P3 have timestamps 5, 10, and 15, respectively. If P1 requests a resource held by P2, P1 will wait. If P3 requests a resource held by P2, P3 will be rolled back.
• The wound-wait scheme: This approach is based on a preemptive technique and is a counterpart to the wait-die scheme. When process Pi requests a resource currently held by Pj, Pi is allowed to wait only if it has a larger timestamp than does Pj (that is, Pi is younger than Pj). Otherwise, Pj is rolled back (Pj is wounded by Pi). Returning to our previous example, with processes P1, P2, and P3, if P1 requests a resource held by P2, then the resource will be preempted from P2 and P2 will be rolled back. If P3 requests a resource held by P2, then P3 will wait.
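Both rules reduce to a single timestamp comparison. The sketch below encodes them side by side; the ts map and the returned strings are illustrative assumptions, and the asserts replay the examples above.

```python
def on_conflict(scheme, pi, pj, ts):
    """Pi requests a resource held by Pj; a smaller timestamp means older.
    Returns who waits or who is rolled back."""
    if scheme == "wait-die":       # nonpreemptive: only an older Pi waits
        return "Pi waits" if ts[pi] < ts[pj] else "Pi dies (rolled back)"
    if scheme == "wound-wait":     # preemptive: only a younger Pi waits
        return "Pi waits" if ts[pi] > ts[pj] else "Pj wounded (rolled back)"
    raise ValueError(scheme)

ts = {"P1": 5, "P2": 10, "P3": 15}
assert on_conflict("wait-die", "P1", "P2", ts) == "Pi waits"
assert on_conflict("wait-die", "P3", "P2", ts) == "Pi dies (rolled back)"
assert on_conflict("wound-wait", "P1", "P2", ts) == "Pj wounded (rolled back)"
assert on_conflict("wound-wait", "P3", "P2", ts) == "Pi waits"
```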
Both schemes can avoid starvation, provided that, when a process is rolled back, it is not assigned a new timestamp. Since timestamps always increase, a process that is rolled back will eventually have the smallest timestamp. Thus, it will not be rolled back again. There are, however, significant differences in the way the two schemes operate.

• In the wait-die scheme, an older process must wait for a younger one to release its resource. Thus, the older the process gets, the more it tends to wait. By contrast, in the wound-wait scheme, an older process never waits for a younger process.
• In the wait-die scheme, if a process Pi dies and is rolled back because it requested a resource held by process Pj, then Pi may reissue the same sequence of requests when it is restarted. If the resource is still held by Pj, then Pi will die again. Thus, Pi may die several times before acquiring the needed resource. Contrast this series of events with what happens in the wound-wait scheme. Process Pi is wounded and rolled back because Pj requested a resource it holds. When Pi is restarted and requests the resource now being held by Pj, Pi waits. Thus, there are fewer rollbacks in the wound-wait scheme.

The major problem with these two schemes is that unnecessary rollbacks may occur.
18.5.2 Deadlock Detection

The deadlock-prevention algorithm may preempt resources even if no deadlock has occurred. To prevent unnecessary preemptions, we can use a deadlock-detection algorithm. We construct a wait-for graph describing the resource-allocation state. Since we are assuming only a single resource of each type, a cycle in the wait-for graph represents a deadlock.
The main problem in a distributed system is deciding how to maintain the wait-for graph. We elaborate on this problem by describing several common techniques to deal with this issue. These schemes require that each site keep a local wait-for graph. The nodes of the graph correspond
to all the processes (local as well as nonlocal) that are currently holding or requesting any of the resources local to that site. For example, in Figure 18.3 we have a system consisting of two sites, each maintaining its local wait-for graph. Note that processes P2 and P3 appear in both graphs, indicating that the processes have requested resources at both sites.

[Figure 18.3 Two local wait-for graphs (site A and site B).]

These local wait-for graphs are constructed in the usual manner for local processes and resources. When a process Pi in site A needs a resource held by process Pj in site B, a request message is sent by Pi to site B, and the edge Pi → Pj is then inserted in the local wait-for graph of site B.
Clearly, if any local wait-for graph has a cycle, deadlock has occurred. On the other hand, the fact that there are no cycles in any of the local wait-for graphs does not mean that there are no deadlocks. To illustrate this problem, we consider the system depicted in Figure 18.3. Each wait-for graph is acyclic; nevertheless, a deadlock exists in the system. To prove that a deadlock has not occurred, we must show that the union of all local graphs is acyclic. The graph (shown in Figure 18.4) that we obtain by taking the union of the two wait-for graphs of Figure 18.3 does indeed contain a cycle, implying that the system is in a deadlock state.

[Figure 18.4 Global wait-for graph for Figure 18.3.]

There are a number of different methods for organizing the wait-for graph in a distributed system. We shall describe several common schemes.
18.5.2.1 Centralized Approach

In the centralized approach, a global wait-for graph is constructed as the union of all the local wait-for graphs. It is maintained in a single process: the deadlock-detection coordinator. Since there is communication delay in the system, we must distinguish between two types of wait-for graphs. The real graph describes the real but unknown state of the system at any instance in time, as would be seen by an omniscient observer. The constructed graph is an approximation generated by the controller during the execution of its algorithm. Obviously, the constructed graph must be generated such that, whenever the detection algorithm is invoked, the
reported results are correct in the sense that, if a deadlock exists, it is reported properly, and if a deadlock is reported, then the system is indeed in a deadlock state. As we shall show, it is not easy to construct such correct algorithms.
There are three different options (points in time) at which the wait-for graph may be constructed:
1. Whenever a new edge is inserted in or removed from one of the local wait-for graphs
2. Periodically, when a number of changes have occurred in a wait-for graph
3. Whenever the coordinator needs to invoke the cycle-detection algorithm
Let us consider option 1. Whenever an edge is either inserted or removed in a local graph, the local site must also send a message to the coordinator to notify it of this modification. On receiving such a message, the coordinator updates its global graph. Alternatively, a site can send a number of such changes in a single message periodically. Returning to our previous example, the coordinator process will maintain the global wait-for graph as depicted in Figure 18.4. When site B inserts an edge in its local wait-for graph, it also sends a message to the coordinator. Similarly, when site A deletes the edge P5 → P1, because P1 has released a resource that was requested by P5, an appropriate message is sent to the coordinator.
When the deadlock-detection algorithm is invoked, the coordinator searches its global graph. If a cycle is found, a victim is selected to be rolled back. The coordinator must notify all the sites that a particular process has been selected as victim. The sites, in turn, roll back the victim process.
Note that, in this scheme (option 1), unnecessary rollbacks may occur as a result of two situations:
• False cycles may exist in the global wait-for graph. To illustrate this point, we consider a snapshot of the system as depicted in Figure 18.5. Suppose that P2 releases the resource it is holding in site A, resulting in the deletion of the edge P1 → P2 in A. Process P2 then requests a resource held by P3 at site B, resulting in the addition of the edge P2 → P3 in B. If the insert P2 → P3 message from B arrives before the delete P1 → P2 message from A, the coordinator may discover the false cycle P1 → P2 → P3 → P1 after the insert (but before the delete). Deadlock recovery may be initiated, although no deadlock has occurred.
• Unnecessary rollbacks may also result when a deadlock has indeed occurred and a victim has been picked, but at the same time one of the processes was aborted for reasons unrelated to the deadlock (such as the process exceeding its allocated time). For example, suppose that site A in Figure 18.3 decides to abort P2. At the same time, the coordinator has discovered a cycle and picked P3 as a victim. Both P2 and P3 are now rolled back, although only P2 needed to be rolled back.

[Figure 18.5 Local and global wait-for graphs (site A, site B, and the coordinator).]

Note that the same problems are inherited by solutions employing the other two options (that is, options 2 and 3).
Let us now present a centralized deadlock-detection algorithm, using option 3, which detects all deadlocks that actually occur, and does not detect false deadlocks. To avoid the report of false deadlocks, we require that requests from different sites be appended with unique identifiers (timestamps). When process Pi, at site A, requests a resource from Pj, at site B, a request message with timestamp TS is sent. The edge Pi → Pj with the label TS is inserted in the local wait-for graph of A. This edge is inserted in the local wait-for graph of B only if B has received the request message and cannot immediately grant the requested resource. A request from Pi to Pj in the same site is handled in the usual manner; no timestamps are associated with the edge Pi → Pj. The detection algorithm is as follows:
1. The controller sends an initiating message to each site in the system.
2. On receiving this message, a site sends its local wait-for graph to the coordinator. Note that each of these wait-for graphs contains all the local information the site has about the state of the real graph. The graph reflects an instantaneous state of the site, but it is not synchronized with respect to any other site.
3. When the controller has received a reply from each site, it constructs a graph as follows:

a. The constructed graph contains a vertex for every process in the system.
b. The graph has an edge Pi → Pj if and only if (1) there is an edge Pi → Pj in one of the wait-for graphs, or (2) an edge Pi → Pj with some label TS appears in more than one wait-for graph.

We assert that, if there is a cycle in the constructed graph, then the system is in a deadlock state. If there is no cycle in the constructed graph, then the system was not in a deadlock state when the detection algorithm was invoked as a result of the initiating messages sent by the coordinator (in step 1).
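A sketch of step 3 and the cycle check follows. The edge encoding is an assumption: each site's reply is a set of (Pi, Pj, TS) triples, with a TS of None for a same-site edge. Rule (1) is read here as covering those unlabeled same-site edges, and rule (2) as requiring a labeled cross-site edge to be reported by both endpoints; this is one plausible reading of the rules above, not a definitive one.

```python
from collections import Counter, defaultdict

def build_constructed_graph(site_graphs):
    """site_graphs: one edge set per site; an edge is (pi, pj, ts),
    with ts=None for a same-site (unlabeled) request."""
    labeled = Counter()
    edges = set()
    for g in site_graphs:
        for pi, pj, ts in g:
            if ts is None:
                edges.add((pi, pj))   # rule (1): same-site edge
            else:
                labeled[(pi, pj, ts)] += 1
    # Rule (2): a labeled edge counts only if it appears in more than
    # one local wait-for graph (the requester's site and the holder's).
    edges |= {(pi, pj) for (pi, pj, ts), n in labeled.items() if n > 1}
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the constructed graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    color = defaultdict(int)          # 0 unvisited, 1 on stack, 2 done
    def dfs(u):
        color[u] = 1
        for v in adj[u]:
            if color[v] == 1 or (color[v] == 0 and dfs(v)):
                return True
        color[u] = 2
        return False
    return any(color[u] == 0 and dfs(u) for u in list(adj))
```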
18.5.2.2 Fully Distributed Approach

In the fully distributed deadlock-detection algorithm, all controllers share equally the responsibility for detecting deadlock. In this scheme, every site constructs a wait-for graph that represents a part of the total graph, depending on the dynamic behavior of the system. The idea is that, if a deadlock exists, a cycle will appear in (at least) one of the partial graphs. We present one such algorithm, which involves construction of partial graphs in every site.
Each site maintains its own local wait-for graph. A local wait-for graph in this scheme differs from the one described earlier in that we add one additional node Pex to the graph. An arc Pi → Pex exists in the graph if Pi is waiting for a data item in another site being held by any process. Similarly, an arc Pex → Pj exists in the graph if there exists a process at another site that is waiting to acquire a resource currently being held by Pj in this local site.
To illustrate this situation, we consider the two local wait-for graphs of Figure 18.3. The addition of the node Pex in both graphs results in the local wait-for graphs shown in Figure 18.6.

[Figure 18.6 Augmented local wait-for graphs of Figure 18.3 (sites S1 and S2).]

If a local wait-for graph contains a cycle that does not involve node Pex, then the system is in a deadlock state. If, however, there exists a cycle involving Pex, then this implies that there is a possibility of a deadlock. To
ascertain whether a deadlock does exist, we must invoke a distributed deadlock-detection algorithm.
Suppose that, at site Si, the local wait-for graph contains a cycle involving node Pex. This cycle must be of the form

Pex → Pk1 → Pk2 → ... → Pkn → Pex,

which indicates that transaction Pkn in site Si is waiting to acquire a data item located in some other site, say, Sj. On discovering this cycle, site Si sends to site Sj a deadlock-detection message containing information about that cycle.
When site Sj receives this deadlock-detection message, it updates its local wait-for graph with the new information. Then, it searches the newly constructed wait-for graph for a cycle not involving Pex. If one exists, a deadlock is found, and an appropriate recovery scheme is invoked. If a cycle involving Pex is discovered, then Sj transmits a deadlock-detection message to the appropriate site, say, Sk. Site Sk, in turn, repeats the procedure. Thus, after a finite number of rounds, either a deadlock is discovered, or the deadlock-detection computation halts.
To illustrate this procedure, we consider the local wait-for graphs of Figure 18.6. Suppose that site S1 discovers the cycle

Pex → P2 → P3 → Pex.

Since P3 is waiting to acquire a data item in site S2, a deadlock-detection message describing that cycle is transmitted from site S1 to site S2. When site S2 receives this message, it updates its local wait-for graph, obtaining the wait-for graph of Figure 18.7. This graph contains the cycle

P2 → P3 → P4 → P2,

which does not include node Pex. Therefore, the system is in a deadlock state, and an appropriate recovery scheme must be invoked.

[Figure 18.7 Augmented local wait-for graph in site S2 of Figure 18.6.]

Note that the outcome would be the same if site S2 discovered the cycle first in its local wait-for graph and sent the deadlock-detection message to site S1. In the worst case, both sites will discover the cycle at about the same time, and two deadlock-detection messages will be sent: one by S1 to S2, and another by S2 to S1. This situation results in unnecessary message transfer and overhead in updating the two local wait-for graphs and searching for cycles in both graphs.
To reduce message traffic, we assign to each transaction Pi a unique identifier, which we denote by ID(Pi). When site Sk discovers that its local wait-for graph contains a cycle involving node Pex of the form

Pex → Pk1 → Pk2 → ... → Pkn → Pex,

it sends a deadlock-detection message to another site only if

ID(Pkn) < ID(Pk1).

Otherwise, site Sk continues its normal execution, leaving the burden of initiating the deadlock-detection algorithm to some other site.
To illustrate this scheme, we consider again the wait-for graphs maintained at sites S1 and S2 of Figure 18.6. Suppose that both sites discover these local cycles at about the same time. The cycle in site S1 is of the form

Pex → P2 → P3 → Pex.

Since ID(P3) > ID(P2), site S1 does not send a deadlock-detection message to site S2.
The cycle in site S2 is of the form

Pex → P3 → P4 → P2 → Pex.

Since ID(P2) < ID(P3), site S2 does send a deadlock-detection message to site S1, which, on receiving the message, updates its local wait-for graph. Site S1 then searches for a cycle in the graph and discovers that the system is in a deadlock state.
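A sketch of one local detection step, combining the Pex test with the ID-based rule, is given below. The PEX sentinel, the edge-list representation, and the id_of map are assumptions of this sketch; message passing between sites is left abstract, and the asserts replay the example above.

```python
PEX = "Pex"   # sentinel node standing for processes at other sites

def find_cycle(edges, start):
    """Return one cycle through `start` as a node list, or None (DFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    stack = [(start, [start])]
    while stack:
        u, path = stack.pop()
        for v in adj.get(u, []):
            if v == start and len(path) > 1:
                return path
            if v not in path:
                stack.append((v, path + [v]))
    return None

def local_deadlock_step(edges, id_of):
    """One detection step at a site: ('deadlock', cycle) for a cycle that
    avoids Pex, ('send', cycle) when the ID rule says to forward a
    deadlock-detection message, or ('nothing', None)."""
    internal = [(u, v) for u, v in edges if PEX not in (u, v)]
    for node in {u for u, _ in internal}:
        c = find_cycle(internal, node)
        if c:
            return ("deadlock", c)
    c = find_cycle(edges, PEX)            # c = [Pex, Pk1, ..., Pkn]
    if c and id_of[c[-1]] < id_of[c[1]]:  # send only if ID(Pkn) < ID(Pk1)
        return ("send", c)
    return ("nothing", None)

ids = {"P2": 2, "P3": 3, "P4": 4}
s1 = [(PEX, "P2"), ("P2", "P3"), ("P3", PEX)]
assert local_deadlock_step(s1, ids)[0] == "nothing"   # S1 leaves it to S2
# S2 after merging S1's reported cycle (the situation of Figure 18.7):
s2 = [(PEX, "P3"), ("P3", "P4"), ("P4", "P2"), ("P2", PEX), ("P2", "P3")]
assert local_deadlock_step(s2, ids)[0] == "deadlock"
```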
18.6 • Election Algorithms

As we pointed out in Section 18.3, many distributed algorithms employ a coordinator process that performs functions needed by the other processes in the system. These functions include enforcing mutual exclusion, maintaining a global wait-for graph for deadlock detection, replacing a lost token, or controlling an input or output device in the system. If the coordinator process fails due to the failure of the site at which it resides, the system can continue execution only by restarting a new copy of the coordinator on some other site. The algorithms that determine where a new copy of the coordinator should be restarted are called election algorithms.
Election algorithms assume that a unique priority number is associated with each active process in the system. For ease of notation, we assume that the priority number of process Pi is i. To simplify our discussion, we assume a one-to-one correspondence between processes and sites, and thus refer to both as processes. The coordinator is always the process with the largest priority number. Hence, when a coordinator fails, the algorithm must elect that active process with the largest priority number. This number must be sent to each active process in the system. In addition, the algorithm must provide a mechanism for a recovered process to identify the current coordinator.
In this section, we present two interesting examples of election algorithms for two different configurations of distributed systems. The first algorithm is applicable to systems where every process can send a message to every other process in the system. The second algorithm is applicable to systems organized as a ring (logically or physically). Both algorithms require n² messages for an election, where n is the number of processes in the system. We assume that a process that has failed knows on recovery that it indeed has failed and thus takes appropriate actions to rejoin the set of active processes.
18.6.1 The Bully Algorithm

Suppose that process Pi sends a request that is not answered by the coordinator within a time interval T. In this situation, it is assumed that
the coordinator has failed, and Pi tries to elect itself as the new coordinator. This task is completed through the following algorithm.
Process Pi sends an election message to every process with a higher priority number. Process Pi then waits for a time interval T for an answer from any one of these processes.
If no response is received within time T, Pi assumes that all processes with numbers greater than i have failed, and elects itself the new coordinator. Process Pi restarts a new copy of the coordinator and sends a message to inform all active processes with priority numbers less than i that Pi is the new coordinator.
However, if an answer is received, Pi begins a time interval T', waiting to receive a message informing it that a process with a higher priority number has been elected. (Some other process is electing itself coordinator, and should report the results within time T'.) If no message is sent within T', then the process with a higher number is assumed to have failed, and process Pi should restart the algorithm.
If Pi is not the coordinator, then, at any time during execution, Pi may receive one of the following two messages from process Pj:

1. Pj is the new coordinator (j > i). Process Pi, in turn, records this information.
2. Pj started an election (j < i). Process Pi sends a response to Pj and begins its own election algorithm, provided that Pi has not already initiated such an election.

The process that completes its algorithm has the highest number and is elected as the coordinator. It has sent its number to all active processes with smaller numbers. After a failed process recovers, it immediately begins execution of the same algorithm. If there are no active processes with higher numbers, the recovered process forces all processes with lower numbers to let it become the coordinator process, even if there is a currently active coordinator with a lower number. For this reason, the algorithm is termed the bully algorithm.
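A compressed sketch of the election logic follows. It is single-threaded and synchronous: the alive predicate is a hypothetical stand-in for "answered within time T", and the waiting periods T and T' are collapsed into direct recursion, so this illustrates the selection rule rather than the messaging.

```python
def bully_election(i, processes, alive):
    """Process i runs an election. processes: all priority numbers;
    alive(j): whether process j answers within time T."""
    higher = [j for j in processes if j > i and alive(j)]
    if not higher:
        # No higher-numbered process answered: i elects itself and
        # notifies all active lower-numbered processes.
        return i
    # Some higher-numbered process answered; it now runs the same
    # algorithm (i would wait up to T' for its result).
    return bully_election(min(higher), processes, alive)

procs = [1, 2, 3, 4]
alive = lambda j: j != 4                      # P4 (the coordinator) failed
assert bully_election(2, procs, alive) == 3   # P3 becomes coordinator
```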
Let us demonstrate the operation of the algorithm with a simple example of a system consisting of processes P1 through P4. The operations are as follows:

1. All processes are active; P4 is the coordinator process.
2. P1 and P4 fail. P2 determines that P4 has failed by sending a request that is not answered within time T. P2 then begins its election algorithm by sending a request to P3.
3. P3 receives the request, responds to P2, and begins its own algorithm by sending an election request to P4.

4. P2 receives P3's response, and begins waiting for an interval T'.
5. P4 does not respond within an interval T, so P3 elects itself the new coordinator, and sends the number 3 to P2 and P1 (which P1 does not receive, since it has failed).
6. Later, when P1 recovers, it sends an election request to P2, P3, and P4.
7. P2 and P3 respond to P1 and begin their own election algorithms. P3 will again be elected, using the same events as before.
8. Finally, P4 recovers and notifies P1, P2, and P3 that it is the current coordinator. (P4 sends no election requests, since it is the process with the highest number in the system.)
18.6.2 Ring Algorithm

The ring algorithm assumes that the links are unidirectional, and that processes send their messages to their right neighbors. The main data structure used by the algorithm is the active list, a list that contains the priority numbers of all active processes in the system when the algorithm ends; each process maintains its own active list. The algorithm works as follows (a sketch in code appears after the list):

1. If process Pi detects a coordinator failure, it creates a new active list that is initially empty. It then sends a message elect(i) to its right neighbor, and adds the number i to its active list.
2. If Pi receives a message elect(j) from the process on the left, it must respond in one of three ways:

a. If this is the first elect message it has seen or sent, Pi creates a new active list with the numbers i and j. It then sends the message elect(i), followed by the message elect(j).
b. If i ≠ j (that is, the message received does not contain Pi's number), then Pi adds j to its active list and forwards the message to its right neighbor.
c. If i = j (that is, Pi receives the message elect(i)), then the active list for Pi now contains the numbers of all the active processes in the system. Process Pi can now determine the largest number in the active list to identify the new coordinator process.
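The following sketch simulates these rules for a single initiator. Modeling the ring as a Python list in ring order, and the elect messages as (receiver, j) pairs in a FIFO queue, are assumptions of this sketch.

```python
from collections import deque

def ring_election(ring, initiator):
    """ring: active process numbers in ring order (messages go to the
    next index, i.e., the right neighbor). Returns the new coordinator."""
    n = len(ring)
    right = {ring[k]: ring[(k + 1) % n] for k in range(n)}
    active = {p: None for p in ring}       # None: no elect message seen yet
    active[initiator] = [initiator]        # step 1: start list, send elect(i)
    msgs = deque([(right[initiator], initiator)])
    while msgs:
        pi, j = msgs.popleft()             # pi receives elect(j)
        if active[pi] is None:             # step 2a: first elect seen
            active[pi] = [pi, j]
            msgs.append((right[pi], pi))   # send elect(i), then elect(j)
            msgs.append((right[pi], j))
        elif pi != j:                      # step 2b: record j and forward
            if j not in active[pi]:
                active[pi].append(j)
            msgs.append((right[pi], j))
        # step 2c (pi == j): the message has come full circle; drop it.
    return max(active[initiator])          # largest number = coordinator

assert ring_election([3, 1, 4, 2], initiator=1) == 4
```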
This algorithm does not specify how a recovering process determines the number of the current coordinator process. One solution would be to require a recovering process to send an inquiry message. This message is forwarded around the ring to the current coordinator, which in turn sends a reply containing its number.
18.7 • Reaching Agreement

For a system to be reliable, we need a mechanism that allows a set of processes to agree on a common "value." There are several reasons why such an agreement may not take place. First, the communication medium may be faulty, resulting in lost or garbled messages. Second, the processes themselves may be faulty, resulting in unpredictable process behavior. The best we can hope for, in this case, is that processes fail in a clean way, stopping their execution without deviating from their normal execution pattern. In the worst case, processes may send garbled or incorrect messages to other processes, or even collaborate with other failed processes in an attempt to destroy the integrity of the system.
This problem has been expressed as the Byzantine generals problem. Several divisions of the Byzantine army, each commanded by its own general, surround an enemy camp. The Byzantine generals must reach a common agreement on whether or not to attack the enemy at dawn. It is crucial that all generals agree, since an attack by only some of the divisions would result in defeat. The various divisions are geographically dispersed, and the generals can communicate with one another only via messengers who run from camp to camp. There are at least two major reasons why the generals may not be able to reach an agreement:

• Messengers may get caught by the enemy and thus may be unable to deliver their messages. This situation corresponds to unreliable communication in a computer system, and is discussed further in Section 18.7.1.
• Generals may be traitors, trying to prevent the loyal generals from reaching an agreement. This situation corresponds to faulty processes in a computer system, and is discussed further in Section 18.7.2.
18.7.1 Unreliable Communications

Let us assume that, if processes fail, they do so in a clean way, and that the communication medium is unreliable. Suppose that process Pi at site A, which has sent a message to process Pj at site B, needs to know whether Pj has received the message so that it can decide how to proceed with its computation. For example, Pi may decide to compute a function S if Pj has received its message, or to compute a function F if Pj has not received the message (because of some hardware failure).
To detect failures, we can use a time-out scheme similar to the one described in Section 16.4.1. When Pi sends out a message, it also specifies a time interval during which it is willing to wait for an acknowledgment message from Pj. When Pj receives the message, it immediately sends an acknowledgment to Pi.
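Pi's side of this time-out scheme can be sketched as follows. The transport object and its send and recv_ack operations are hypothetical, and a bounded retry count stands in for whatever retransmission policy the system adopts.

```python
def send_with_timeout(transport, msg, timeout, max_tries):
    """Pi retransmits msg until an acknowledgment from Pj arrives within
    the specified time interval, then proceeds with computation S."""
    for _ in range(max_tries):
        transport.send(msg)
        if transport.recv_ack(timeout):   # Pj acks each message it receives
            return "S"   # Pj is known to have received the message
    return "F"           # no ack: proceed as if the message was not received
```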
