`
`Peter A. Alsberg and John D. Day
`Center for Advanced Computation
`University of Illinois at Urbana-Champaign
`
`resilient protocols, resource sharing, dis-
`Keywords:
`tributed control, distributed computer systems, resil-
`ient resource sharing
`
`A technique is described which permits distributed
`resources to be shared (services to be offered) in a
`resilient manner.
`The essence of the technique is to a
`priori declare one of the server hosts primary and the
`others backups. Any of the servers can perform the
`primary duties.
`Thus the role of primary can migrate
`around the set of servers.
`The concept of n-host resil-
`iency is introduced and the error detection and recov-
`ery schemes for two-host resiliency are presented.
`The
`single primary, multiple backup technique for resource
`sharing is shown to have minimal delay.
`In the general
`case, this is superior to multiple primary techniques.
`
`Introduction
`
`be completely dissimilar (e.g., weather data may be
`stored on the ARPANET datacomputer and processed on the
`ILLIAC IV). Between these two extremes lie the re-
`source sharing concerns of interest to most users.
`
`The user expects a tolerable, as well as tolerant,
`resource sharing environment.
`The user we are inter-
`ested in wants a maximum degree of automation and
`transparency in his resource sharing. He wishes the
`resource sharing to be resilient to host failures and,
`when catastropic failures occur, he would like a "best
`effort" recovery to be automatically initiated by the
`resource sharing system.
`
`The concept of resiliency applies to
`Resiliency.
`A resilient ser-
`the use of a resource as a service.
`vice has four major attributes.
`
`The development of large packet switched networks
`servicing wide geographic areas has generated a great
`deal of interest in distributed resource sharing.
`A
`communications network is a necessary but, by itself,
`is not a sufficient basis to make automated distributed
`resource sharing facilities generally available. High-
`level protocols must be provided to allow cooperation
`in other than an ad hoc manner and techniques must be
`developed to provide resilient service to the user.
`This paper discusses one means by which resilient ser-
`vice may be provided to the user for a wide variety of
`situations, e.g., synchronization, data base access,
`and load sharing.
`
`For our purposes we will consider a distributed
`resource sharing environment which requires the sharing
`of resources dispersed over a large number of possibly
`heterogeneous host computers. Large packet switched
`computer networks like the ARPANET, CYCLADES, and EIN
`represent examples of this environment.
`Since the
`hosts in this environment may be separated by very
`large distances,
`there is a significant and unavoidable
`message delay between hosts. Hence, a major considera-
`tion when choosing a resource sharing strategy is to
`reduce, as much as possible,
`the number of message
`delays required to effect the sharing of resources.
`
`In these networks, some of the resources to be
`shared will be identical (e.g. duplicate copies of data
`bases may be maintained for reliability). Others will
`
`
`* This work was performed as part of Contract DCA100-75-
`C~0021 with the Command and Control Technical Center -
`WWMCCS ADP Directorate of the Defense Communications
`Agency.
`
`1.
`
`2.
`
`3.
`
`4.
`
`It is able to detect and recover from a given
`maximum number of errors.
`
`It is reliable to a sufficiently high degree
`that a user of the resilient service can
`ignore the possibility of service failure.
`
`If the service provides perfect detection and
`recovery from n errors,
`the (ntl)st error is
`not catastrophic.
`A "best effort" is made to
`continue service.
`
`The abuse of the service by a single user
`should have negligible effect on other users
`of the service.
`
`What we are trying to describe here are concepts of ex-
`treme reliability and serviceability.
`The user of a
`resilient service should not have to consider the fail-
`ure of the service in his design. He should be able to
`assume that the system will make a “best-effort" to
`continue service in the event that perfect service can-
`not be supported; and that the system will not fall
`apart when he does something he is not supposed to.
`
`In this paper we discuss a
`Resiliency Criteria.
`technique for providing resilient services. This tech-
`nique is resilient to communication system and host
`failures. Host failures include not only complete
`failure (e.g., a major hardware failure) but also par-
`tial failure (e.g., a malfunctioning host operating
`system). Resiliency cannot be perfect in the large
`network environments we are considering.
`It is, for
`instance, possible but not likely that all 50 of the
`hosts on a large computer network will simultaneously
`fail and all services will be disrupted. What is of
`interest is the establishment of criteria for accept-—
`able resiliency in this environment. We introduce the
`
`562
`
`EMCVMW 1030
`
`
`
`In order for service to
`concept of n-host resiliency.
`be disrupted, n hosts must simultaneously fail in a
`critical phase of service. We point out that it may
`be possible for n or more hosts to fail outside of such
`a critical phase without disrupting service.
`The re-
`siliency techniques discussed in this paper assume a
`two-host resiliency criterion. Expansion of the tech-
`niques to treat three-host or greater resiliency is
`straightforward.
`A two-host resiliency criterion has
`been used because it appears sufficient to provide an
`adequate level of service in most situations and to
`illustrate the principle.
`
`Examples. Examples of the kind of resilient ser-
`vices we envision are ntework synchronization primi-
`tives or a network virtual file system.
`The techniques
`discussed below can support synchronization primitives
`like P and V,
`lock and unlock, and block and wakeup in
`a resilient fashion on a network. Network virtual file
`systems which provide directory services and data
`access services can be provided in an automated and re-
`silient fashion.
`The network virtual file system would
`appear to be a single file system to the user, but
`would in fact be dispersed over a large number of pos~
`sibly heterogeneous hosts on a packet switched network.
`
`Related Work in Distributed Systems
`
`There are two main problems that are addressed by
`the technique we are presenting here:
`synchronization
`of the users of the service and the resiliency of the
`service. Other researchers have proposed techniques to
`achieve the synchronization but haven't treated the re-
`siliency issue carefully.
`
`Perhaps the first work in this area was by Johnson
`{1974].
`Johnson proposed that updates to a data base
`be timestamped by the host which generates the update.
`The updates are then broadcast to the copies of the
`data base.
`The data base managers then apply the up-
`dates in chronological order, as determined by time-
`stamps.
`(Ties are broken by an arbitrary ordering of
`the hosts.)
`Johnson's model introduces the problem
`that during some time interval the copies may be mutu-
`ally inconsistent due to message delays, etc. This
`system was primarily intended for an accounting file,
`in which updates are restricted to assignments of
`values to single fields.
`From the resiliency stand-
`point, it is difficult to ensure that the n~host crite-
`rion has been met and that all copies of the data base
`will eventually receive all of the updates.
`
`Bunch [1975] attempted to avoid some of the diffi-~
`culties of Johnson's scheme by introducing a central
`name (sequence number) generator. This approach has
`the additional problem of introducing a potential
`bottleneck. Grapa [1975] was able to avoid this prob-
`lem in his “reservation center" model. Grapa's model
`is somewhat more general than either the Bunch or
`Johnson model and in a sense includes them as limiting
`cases.
`
`Despite the fact none of these models treat the
`resiliency issues (they were never really intended to),
`there are also several problems that might be encoun-
`tered in more general data base environments. We have
`already mentioned the problem that for some time inter-
`val the data base may be inconsistent. This may cause
`problems for some applications. Also, an update opera-
`tion on one field may use values of other fields to
`In
`compute the new value (in an irreversible manner).
`this case,
`the Johnson and Grapa models must include a
`time delay before applying the updates to guarantee
`that there are no delayed updates with earlier time-
`stamps than those already received. Similarly, it is
`difficult for these models to provide a quick response
`
`The
`time for updates that modify multiple fields.
`It
`technique we describe here avoids these problems.
`provides the minimum response time allowed by the n-
`host resiliency criterion but requires a somewhat more
`complex mechanism.
`
`A Technique for a Resilient Service
`
`The pacing
`Consider synchronization on a network.
`item in a network synchronization operation is network
`message delay time. Network message delay is on the
`order of 100 milliseconds.
`The execution of a process
`synchronization primitive in the typical single site
`environment is on the order of .1 to 1 milliseconds.
`The processing incurred at the site is expected to be
`the same for both network and local operation. Asa
`result, an appropriate measure of the efficiency of a
`network scheme is the number of message delays incurred.
`
`As we have indicated above, what we are interested
`in is a method by which we can provide resilient sup-
`port for some distributed resource sharing activity.
`For purposes of illustration, let us assume we have
`some sort of data base (in the general sense) which is
`being read and modified by a group of network users.
`Let us consider, at least for purposes of description,
`that there is a set of server hosts which do nothing
`but perform the updates and mediate the synchronization
`of these updates generated by user processes.
`(This
`may appear to be somewhat excessive for the practical
`case; but if one is really concerned about having a
`reliable service, it is unwise to make it susceptible
`to the kind of environment found in the typical appli-
`cation host. However,
`there is nothing about this
`scheme that requires that the synchronizing function be
`in a devoted host.) One of the hosts of this set is
`designated as the primary and the rest are backups.
`The backups are ordered in a linear fashion. We will
`discuss recovery schemes in a subsequent section.
`For
`now, let us consider how the resiliency scheme works
`without failures.
`
`to the primary or to
`Update operations may be sent
`any backup.
`The user process then blocks, waiting for
`either a response from the service or a timeout indi-
`cating that the message has been lost and should be
`retransmitted.
`
`For the purposes of this discussion we will ignore
`to some extent the details of the end~to-end transmis-
`sion.
`Some of the ACK's and timeouts mentioned below
`may be provided by an end-to-end protocol such as those
`described in Cerf and Kahn [1974] and Cerf et al.
`[1975].
`Im addition,
`the communication between the
`user and the service could be a single message connec-
`tion to the service.
`Such a connection would take more
`than one message to convince both sides that no mes-
`sages have been lost or duplicated [Belsnes, 1975].
`However, for our purposes we are mainly interested in
`the delays incurred. Although multiple parallel mes-
`sages may be generated,
`the number of sequential mes-
`sage delays will be inherent to any system performing
`this service.
`
`Dedicated Servers. Figure 1 shows the message
`flow for an update operation which has been transmitted
`to the primary server host of a data base.
`The first
`network message delay is incurred in figure la.
`The
`application host transmits the update to the primary
`server host.
`
`The second network message delay is incurred in
`figure 1b.
`The primary server host requests cooperation
`in executing the update operation from the first backup
`server host.
`The primary server host has already up-
`dated its data base.
`The first backup synchronization
`
`563
`
`
`
`server
`host ,
`
`server
`host,
`
`server
`host 3
`
`.
`
`°
`
`server
`hostp
`
`
`
`
`
`update
`request
`
`la: Application host transmits update request to primary server host.
`
`cooperate
`
`request
`
`
`
`server
`host,
`
`server
`host»
`
`server
`host 3
`
`7s
`
`applic.
`host
`
`lb:
`
`Primary server host requests cooperation from the first
`backup in executing the update request.
`
`3: cooperate
`
`1: backup
`
` request
`
`server
`host 5
`
`server
`host 1
`
`server
`host 5
`
`server
`host
`
`“ee
`
`lc: First backup issues three messages in the following order:
`1.
`A backup for an update request is sent to the next backup host.
`2.
`An acknowledgement message is sent to the application host.
`3.
`An acknowledgement of the cooperate message is sent to
`the primary server host.
`
`Figure 1
`
`Update request sent to a primary server host
`
`564
`
`
`
`The backup host
`host will perform the same update.
`will be expected to issue the update ACK message to the
`application host.
`
`In figure lc the third network message delay is
`incurred. Three messages are transmitted by the first
`backup server host.
`In terms of network delay,
`these
`messages are essentially simultaneously transmitted.
`Small
`improvements in resiliency can be achieved by
`issuing them in the designated order. First,
`the back-
`up server host passes a backup update message to the
`next backup server host. At this time only two server
`hosts,
`the primary and the first backup, have positive
`knowledge of the existence of the update operation.
`Should the backup message be successfully received at
`the second backup server host, a third server host
`would also be aware of the update operation.
`The third
`host would be able to assist in recovery should the
`first backup server host or network fail to transmit
`the next two messages.
`The second "simultaneous" mes-
`sage would be the update ACK message to the application
`host.
`The third "simultaneous" message would be trans-
`mitted back to the primary server host to acknowledge
`that the cooperation request on an update operation has
`been received.
`
`Once the primary server host has received the co-
`operation acknowledgement, it is certain that the two-
`host resiliency criterion has been met. Similarly,
`once the application host has received the update ACK
`message it is also certain that the two-host resiliency
`criterion has been met.
`Should the primary server host
`fail to receive the cooperation acknowledgement, appro-
`priate retry and recovery techniques will be initiated.
`
`Figure 2 shows the message flow for an update
`operation which has been transmitted to a backup server
`host.
`The first network message delay is incurred in
`figure 2a.
`The application host transmits the update
`to a backup server host.
`
`The second network message delay is incurred in
`figure 2b.
`The backup server host forwards the update
`operation to the primary server host.
`The application
`hosts have no knowledge of the ordering of server hosts.
`However, each of the server hosts is assumed to have
`explicit knowledge of the ordering.
`The backup server
`host performs no updates on the data base. All updates
`must be initiated by the primary server host. However,
`the backup now has knowledge of the existence of the
`update request from the application host.
`It will not
`discard this request until a backup message referring
`to that same update operation ripples down the backup
`chain and through it.
`
`In figure 2c the third network message delay is
`incurred. Three messages are transmitted by the pri-
`mary server host. As was the case previously,
`these
`messages are essentially simultaneous but a specific
`ordering can provide some small improvements in resil-
`iency. First, a backup message is sent to the first
`backup server host.
`Second, an update ACK message is
`transmitted to the application host since the two~host
`eriterion has now been met. Third, a forward message
`acknowledgment is transmitted to the forwarding backup
`host.
`The message flow is summarized in figure 3.
`
`In a service environment
`Participating Servers.
`where there is no special set of hosts dedicated to the
`service, updates from a user on one of the hosts parti-
`cipating in the service will only experience two net-
`work delays as opposed to the three found in the dedi-
`cated host case. Figure 4 shows that the first delay
`is generated when the host in which the update was gen-
`erated sends the update to the primary as a forward re-
`quest.
`(Note that since members of the service will
`most likely maintain the necessary connections among
`
`each other, many of the single message connection dif-
`ficulties can be avoided in this case.)
`The second
`delay is incurred when the primary responds with a for-
`ward ACK message to the originating backup host.
`The
`primary also sends the backup request to the first
`backup server.
`From this point on,
`the procedure is
`identical to the dedicated server scheme.
`
`The backup ser-
`Alternative Backup Architectures.
`vers have been arranged in a linear, ordered string.
`This is not essential. We have used the linear archi-
`tecture in this paper for several reasons.
`It is easy
`to describe.
`It is one example of the single primary,
`multiple backup strategy for resilient resource sharing.
`It is also a minimum delay scheme for two~host resil-
`iency.
`An example of a non-linearly ordered backup
`scheme is a broadcast scheme.
`In this scheme the pri-
`mary broadcasts backup messages simultaneously to all
`backups.
`The broadcast scheme also has minimum delay.
`It requires fewer total messages than the linearly
`ordered scheme, but error recovery is more complex.
`Grapa is currently investigating the range of feasible
`backup architectures.
`
`Summary. Resiliency is achieved in this scheme by
`a combination of techniques.
`The basic organization of
`the resiliency scheme provides the skeleton on which to
`construct the resilient service.
`The additional mecha-
`nisms used for a particular application will depend
`heavily on the degree of resiliency required. This
`additional resiliency is gained by applying a combina-
`tion of sequence numbering schemes and ACK and time-out
`mechanisms.
`For instance,
`to get two-host resiliency
`for updates being passed down the chain, a "Backup for-
`warded ACK" is used in the following way:
`
`When a backup server host receives the "backup
`ACK" corresponding to the backup message sent to its
`right~hand neighbor (see figure 3), it sends a "backup
`forwarded ACK" to its left-hand neighbor. This assures
`that neighbor that the update has progressed to at
`least the second backup beyond itself.
`
`Also, for most applications one sequence number
`scheme can be applied to the messages to detect lost or
`duplicate messages.
`A second sequence number scheme
`can be applied to the requests themselves. This allows
`proper recovery in the event of failures.
`It also de-
`fines the order in which requests will be applied to
`the data base.
`
`There are two properties of this scheme that should
`be noted. First, regardless of where the user process
`sends the update request, he will get a response in
`three message delay times.
`(If the synchronizing scheme
`is moved into the application hosts,
`this delay can be
`cut to two message times.)
`Second,
`two nearly simul-
`taneous host falures during a small critical interval
`are required to disrupt the scheme.
`
`Failure Detection and Recovery
`
`The detection of failures may
`Failure Detection.
`be accomplished in a variety of ways. Clearly,
`the
`time-outs associated with the ACK's will allow the sys-
`tem to detect a failure during the course of performing
`a request.
`If there are relatively long idle periods
`between requests, and if one wants to avoid the delays
`required to recover from a failure, it may be useful,
`for some applications,
`to have a low level "are you
`alive" protocol among the members of the chain. Other-
`wise,
`the error will not be detected until the next
`request is sent.
`
`There are basically two kinds of failures which
`must be handled:
`1) host failure and 2) network parti-
`tion. Recovery from a host failure is relatively
`
`565
`
`
`
`cee
`
`See
`
`server
`hosty
`
`server
`host,
`
`server
`hosts
`
`hosts
`
`
`server
`
`2a: Application host transmits update request to a backup server.
`
`forward
`
` request
`
` server
`host
`
`server
`host,
`
`server
`host,
`
`
`
`server
`host,
`
`applic.
`host
`
`2b: Backup host forwards update request to the primary host.
`
`3:
`
`forward
`
`
`
` 1: backup
`
`eee server
`
`
` server
`host,
`
`host 5
`
`server
`
`host 5
`
`2c: Primary host issues three messages in the following order:
`
`1.
`A backup for an update request is sent to the first backup host.
`2.
`An acknowledgement message is sent to the application host.
`3. An acknowledgement of the forwarding message is sent to the
`forwarding backup host.
`
`Figure 2
`
`Update request sent to a backup server host
`
`566
`
`
`
` cooperate
`
`
`
`ee
`
`server
`
`Figure 3
`
`Summary of the message flow for the resiliency scheme.
`(BF ACK refers to the Backup Forwarded ACK mentioned in the text. )
`
`forward
`
`request
`
`
`
`server
`hosty
`
`server
`host,
`
`server
`host
`
`server
`hostg
`
`4a:
`
`An update request generated by the third host journalizes the request
`and sends a forward request message to the primary.
`
`forward
`ACK
`
`server
`
` hosto
`
`
`
`server
`host;
`
`server
`hosto
`
`
`
`server
`hostz
`
`server
`hosta
`
`4b:
`
`The two host criteria has now
`The primary records the update.
`been fulfilled.
`The primary sends two messages:
`1.
`a forward acknowledgement back to the third host.
`2.
`a backup request to next backup host.
`
`backup
`
`host,
`
`server
`hosty
`
`server
`host,
`
`server
`host,
`
`server
`hostz
`
`server
`
`ACK
`
`4c:
`
`The next backup host records the update and sends a backup request
`to next server and acknowledges the one he received.
`
`Application of the Resiliency Scheme for Undedicated Hosts
`
`Figure 4
`
`567
`
`
`
`straightforward and will be discussed in the next sec-
`tion. However, operation during a network partition is
`much more difficult to handle and for a majority of
`applications will probably consist of providing very
`degraded service.
`
`To give the reader an idea of the complexity of
`providing service across partitions, let us consider
`the case where as close to full service as possible is
`provided. First, each side must organize itself into a
`resilient system and have a way to rectify the exis-—
`tence of the two primaries when the partition is re-
`paired.
`It must be possible to restore the data base
`to the state it was just before the partition and to
`journalize all updates made during the partition. When
`the partition is repaired the update journals of both
`sides must be merged according to the chronological
`order in which the updates were generated.
`If the same
`event has been observed and entered by groups on both
`sides of the partition,
`the journals may contain dupli-
`cate entries. Duplicates must be recognized and all
`but one discarded. It should be further noted that
`answers to queries submitted by a partitioned subset
`may be inconsistent with answers given queries after
`the partition has been repaired.
`
`Since network partitions are so difficult to han~
`dle it is highly desirable that they be very infrequent.
`It may appear on the surface that this problem is
`easily solved by proper network topology.
`To a degree
`this is the case. But the solution is also highly
`dependent on how much information the subnet returns
`about failures.
`Suppose the communications subnet only
`indicated whether or not it could deliver a message.
`Then every apparent failure would have to be treated as
`a possible partition, and the rather expensive parti-
`tioned mode of operation would have to be initiated.
`However, if the subnet distinguishes between "I was
`unable to deliver the message" and "I got the message
`to the destination node, but the host is not servicing
`the interface"; it would be possible to classify many
`of the failures as host failures and take a less expen-
`sive recovery procedure. There would still be a small
`group of failures that would have to be treated as
`partitions until the communications were restored and
`it was determined whether or not a partition had
`actually occurred.
`
`Host Failure Recovery. Although much of the
`detail for failure recovery will depend heavily on the
`application and the degree of resiliency desired, it is
`possible to describe the basic mechanism by which host
`failures and network partitions are recovered. Let us
`consider the case of a host failure first (see figure
`5). Assume that the subnet has notified a service host
`that messages to that host cannot be delivered because
`the host is dead.
`In figure 5 the dead host is a back-
`up host.
`(If the primary dies a new primary must be
`elected. There are a variety of criteria that could be
`used.
`A simple algorithm would be to designate the
`first backup host as the new primary.)
`
`In figure 5 we assume that the adjacent upstream
`host is notified of the failure.
`It will notify the
`primary of the failure so that the primary may delete
`the host from the backup table.
`The primary will then
`pass a "structure modification" message along the chain.
`(The "restructuring host",
`the one that started this
`recovery, will set a time-out waiting for the "struc~
`ture modification" message to ripple down the chain.)
`After the "structure modification" has rippled back to
`the "restructuring host", it will attempt to establish
`communication with the next live downstream host and
`continue the propagation of the “structure modification
`message".
`The "restructuring host" will set a time-out
`and wait for the return ACK's to propagate. Once the
`“restructuring host" has received the restructuring
`
`ACK from the downstream host, it will then send any
`“backup requests" it has been holding on down the chain
`and normal operation has resumed.
`
`When a host comes up after a crash it will send an
`“initialization request" to some host in the service.
`If that host is not the primary, it will forward the
`request on to the primary.
`The primary will add the
`new host to its tables and pass a message down the
`chain indicating that the other hosts in the service
`should add the new host to their tables.
`The primary
`will also assign one host (possibly the last one in the
`chain) to bring the newcomer up-to-date.
`How the new
`host is brought up to date depends on the application.
`It may be done by transferring to that host the journal
`of all updates since the host went down.
`It may re-
`quire transferring the data base.
`
`Note that there is no protection from a malicious
`backup server declaring itself to be the new primary,
`While malicious users can be addressed by this resil-
`iency approach, servers must be benevolent.
`The
`approach to primary recovery discussed here is not
`the
`resilient to single host error when, for example,
`single host declares itself primary. Further work is
`required to make host failure recovery two-host
`resilient.
`
`Apparent Network Partition Recovery. Let us now
`consider the problem of an apparent network partition.
`In this case the subnet has notified the host that it
`could not deliver a message.
`For some reason the mes-
`sage did not get as far as the destination node. Per-
`haps, after some number of retries, this host has a
`reasonable suspicion that the network has partitioned.
`It will then broadcast “are you alive" messages to
`everyone in the service. After some time period, it
`will assume that all responses that can arrive have
`arrived.
`It will then modify its structure tables
`according to the responses, and send messages to the
`other members with which it can communicate to do the
`same.
`If this fragment of the partition has the old
`primary in it, the primary will coordinate partitioned
`mode operation.
`If not, a new primary may be chosen by
`whatever algorithm is fitting, depending on the level
`of partitioned service that is desired.
`The service
`then enters partitioned operation mode. As mentioned
`above, what
`the service does in this case will depend
`heavily on the application,
`the degree of resiliency
`desired, and the frequency of partition.
`In the genera]
`case,
`two or more partitions can produce incompatible
`states that cannot be joined later.
`Thus,
`the opera-
`tion of a service, while the network is partitioned,
`ean easily span the entire spectrum from doing nothing
`to the rather complex scheme described above.
`
`Alternative Resource Sharing Strategies
`
`We have proposed the use of a single primary with
`multiple backups to support resilient resource sharing.
`The alternative to this approach is to share primary
`duties among several members of the resource set. This
`can take the form of designating all members of the
`resource set as primary or some subset as the group of
`primaries and another subset as the group of backups.
`In the case of two-host resiliency, it has been shown
`that the single primary, multiple backup strategy pro-
`duces the theoretically minimum message delay that en-
`sures the resiliency criteria have been met.
`
`Let us consider the case where there is more than
`one primary.
`In the general case the primary which re-
`ceives a service request must synchronize the execution
`of that service request with all other primaries.
`Otherwise,
`the system cannot guarantee that service re-
`quests are executed in the same order at all resource
`sites.
`(This requirement is essential in the general
`
`568
`
`
`
`server
`
`
`
`
`hosto
`
`struc.
`mod.
`
`server
`
`host,
`
`hosto
`
`server
`
`
`
`
`
`server
`hostg
`
`server
`hosts
`
`5a: Host g sends the primary a structure modification message to
`notify it that host, has failed.
`
`
`
`server
`server
`
`hosto
`host,
`
`
`
`.mod.
`mod.
`
`server
`host 2
`
` server
`
`host4
`
`
`
`server
`hosts
`
`5b:
`
`The modification propagates back down to the instigating host who
`then establishes a connection with the next available host and
`notifies it of the change,
`
`server
`host
`
`server
`host,
`
`server
`host,
`
`
`
`ack.
`
`ack.
`
`struc.
`mod.
`
`mod.
`
`server
`host,
`
`server
`host,
`
`Sc:
`
`The modification propagates to the end of the chain while acknowledge-
`ments are used guarantee that the messages arrived safely.
`
`Figure 5
`
`Restructuring after a Host Failure
`(Host , has failed and it has been detected by host
`
`2”
`
`case. There may be specific applications where the
`nature of the service permits the out of order pro-
`cessing of requests. An example is an inventory system
`where only increments and decrements to data fields are
`permitted and where instantaneous consistency of the
`data base is not a requirement.)
`The synchronization
`of multiple processes reduces to the execution of an
`algorithm in each of the processes that will result in
`distinguishing one process.
`The distinguished process
`then establishes, for example,
`the order in which opera-
`tions will be performed, notifies the other primaries
`of its decision and then relinquishes its distinguished
`role,
`
`In the single primary case the distinguished re~
`source is designated a priori. Hence, any additional
`message traffic, processing load, or protocol complexity
`to distinguish a primary is avoided.
`Instead emphasis
`is placed on electing a new primary should the original
`primary fail.
`
`An alternative strategy may require all members of
`a resource set to be primary or only some of those mem-
`bers to be primary. However,
`the requirement for syn~
`chronization tends to increase processing load at each
`host, message traffic in the communications subnet, and
`the complexity of the service protocols. At the same
`time,
`there is no increase in resiliency or decrease in
`delay.
`Thus a multiple primary strategy can never be
`superior,
`in the general case,
`to a single primary
`
`the single primary, multiple backup
`strategy. Hence,
`strategy is, in a sense,
`fundamental to resilient, dis-
`tributed resource sharing.
`
`Range of Application
`
`The resilient resource sharing strategy discussed
`above can be applied to a wide range of distributed
`system services.
`In particular,
`the authors have
`studied the questions of resilient network synchroniza-
`tion, resource directories, data access and load
`sharing.
`In all cases the resiliency technique seems
`to provide a convenient framework to support automated
`distributed resource sharing.
`
`The application of
`Synchronization Primitives.
`the resiliency technique to the support of synchroniza-
`tion primitives is straightforward. Service requests
`are transmitted to the synchronization service host
`exactly as shown in figures 1 and 2.
`The synchroni-
`zation primitives can be traditional P and V, block and
`wakeup,
`lock and unlock, and similar primitives. When
`a process requests synchronization service (e.g., a P,
`a lock, or a block) it transmits this primitive request
`to one of the synchronization service hosts.
`The
`acknowledgment returned by a synchronization host will
`be either a block or proceed message. This tells the
`requesting process whether it is prevented from or per-
`mitted to enter its critical section.
`If the process
`
`569
`
`
`
`References
`
`Belsnes, Dag
`IEEE Transactions
`"Single-Message Commu