`
Theory of Computing Systems
© 1998 Springer-Verlag New York Inc.
`
Verification of Cache Coherence Protocols by
Aggregation of Distributed Transactions∗
`
`S. Park and D. L. Dill
`
`Department of Computer Science, Stanford University,
`Stanford, CA 94305, USA
`spark@cs.stanford.edu
`dill@cs.stanford.edu
`
`Abstract. This paper presents a method to verify the correctness of protocols and
`distributed algorithms. The method compares a state graph of the implementation
`with a specification which is a state graph representing the desired abstract behavior.
`The steps in the specification correspond to atomic transactions, which are not atomic
`in the implementation.
`The method relies on an aggregation function, which is a type of abstraction
`function that aggregates the steps of each transaction in the implementation into a
`single atomic transaction in the specification. The key idea in defining the aggre-
`gation function is that it must complete atomic transactions which have committed
`but are not finished.
`This paper illustrates the method on a directory-based cache coherence pro-
tocol developed for the Stanford FLASH multiprocessor. The coherence protocol,
consisting of more than a hundred different kinds of implementation steps, has been
reduced to a specification with six kinds of atomic transactions. Based on the reduced
behavior, it is very easy to prove crucial properties of the protocol, including
`data consistency of cached copies at the user level. This is the first correctness proof
`verified by a theorem-prover for a cache coherence protocol of this complexity. The
`aggregation method is also used to prove that the reduced protocol satisfies a desired
`memory consistency model.
`
`⁄
`
`2-891.
`
`This research was supported by the Advanced Research Projects Agency through NASA Grant NAG-
`
`
`
`
`356
`
`1.
`
`Introduction
`
`1.1. Basic Idea
`
`S. Park and D. L. Dill
`
`Protocols for distributed systems often simulate atomic transactions in environments
`where atomic implementations are impossible. This observation can be exploited to make
`formal verification of protocols and distributed algorithms using a computer-assisted
`theorem-prover much easier than it would otherwise be [34]. Indeed, the techniques de-
`scribed below have been used to verify safety properties of significant examples: the cache
`coherence protocol for the FLASH multiprocessor which is currently being designed at
`Stanford [15], [20], a majority consensus algorithm for multiple copy databases [41],
`[18], and a distributed list protocol [9].
`The method proves that an implementation state graph is consistent with a speci-
`fication state graph that captures the abstract behavior of the protocol, in which each
`transaction appears to be atomic. The method involves constructing an abstraction func-
`tion which maps the distributed steps of each transaction to the atomic transaction in the
`specification. We call this aggregation, because the abstraction function reassembles the
`distributed transactions into atomic transactions.
`This method addresses the primary difficulty with using theorem-proving for ver-
`ification of real systems, which is the amount of human effort required to complete a
`proof, by making it easier to create appropriate abstraction functions. Although our work
`is based on using the PVS system from SRI International [33], the method is useful with
`other mechanical theorem-provers, or manual proofs.
`Although finite-state methods (e.g., [32], [10], [17], and [19]) can solve many of the
`same problems with even less effort, they are basically limited to finite-state protocols.
`Finite-state methods have been applied to non-finite-state systems in various ways [38],
`but these techniques typically require substantial pencil-and-paper reasoning to justify.
`Moreover, it is not obvious how to apply these extensions to the examples we verified
`using aggregation. Theorem-provers make sure that such manual reasoning is indeed
`correct, in addition to making available the full power of formal mathematics for proof,
`so they can routinely deal with problems that cannot yet be solved by any finite-state
`methods.
`For our method to be applicable, the description must have an identifiable set of
`transactions. Each transaction must have a unique commit point [16], from which a
`state change cannot be aborted (usually, it is the point at which a state change first
`becomes visible to the specification). The most important idea in the method is that the
`aggregation function can be defined by completing transactions that have committed but
`not yet completed. In general, the steps to complete separate transactions are independent,
`which simplifies the definition of this function. In our experience, this guideline greatly
`simplifies the definition of an appropriate aggregation function.
`The same idea of aggregating transactions can be applied to reverse engineer a spec-
`ification where none exists, because the specification with atomic transactions is usually
`consistent with the intuition of the system designer. We extract a specification model
`which performs transactions atomically at their commit steps in the implementation,
`and does nothing at other steps. The extracted specification provides an illusion that the
`transactions take effect instantaneously at the commit steps in the implementation.
`
`
`
`
`If the extracted specification is not obviously complete or correct, it can instead be
`regarded as a model of the protocol having an enormously reduced number of states. The
amount of reduction is much greater than with other reduction methods used in model checking,
`such as partial order reduction, mainly because the state variables of the reduced system
`are only those relevant to the specification, without variables such as local states and
communication buffers.
`The major contribution of this work is to reduce the effort required to prove the
`correctness of certain classes of cache coherence protocols (and possibly other types
`of distributed algorithms). The methodology requires extracting or defining transaction-
`oriented specifications. Once the specifications are in this form, aggregation provides a
`simple method for defining abstraction functions based on completing unfinished trans-
`actions across distributed processing elements. The effectiveness of aggregation is shown
`by the example we present below: FLASH is the most complex multiprocessor cache
`coherence protocol that has been formally verified using a theorem-prover.
`
`1.2. Related Work
`
`1.2.1. Verification of Cache Coherence Protocols. One widely used technique for val-
`idating cache coherence protocols is finite-state methods (e.g., model checking). Finite-
`state methods enumerate the states of the reachable state graph of the system, searching
`for states that violate a specified property [31], [5], [40], [32], [17], [19]. These methods
`suffer from the state explosion problem: the number of states for nontrivial numbers of
`processors and cache lines is very large. Another problem with model checkers is that
it is very difficult to specify correctness conditions of the protocol using notations such
as Murϕ or temporal logic. The specification is the corresponding memory model of the
protocol, so one would have to encode a full memory model in temporal logic.
`Symbolic state models proposed by Pong and Dubois [38], [37] reduce the state
`explosion problem by using symbolic states which abstract away from the exact number
`of configurations of replicated identical components by recording only whether there
are zero, one, or more than one of the replicated components. However, as in model checking,
`there still remains the problem of specifying the protocol: It is not easy to find a set
`of properties (in their notation), which completely describes the correct behavior of the
`protocols. Moreover, their method requires the user to write an abstract description of
`the protocol to be verified, which raises another verification problem: Are the abstract
`description and the actual protocol equivalent?
`Another approach to formal verification is computer-assisted theorem-proving.
`Theorem-provers make available the full power of formal mathematics for proof, so
`they can routinely deal with problems that cannot yet be solved by any finite-state meth-
`ods. However, the major problem with theorem-proving is that considerable labor is
`required. Consequently, previous theorem-proving approaches have not been able to
`verify a problem of the scale of a full multiprocessor cache coherence protocol. The
`most significant result before our work is a manual proof of “lazy caching,” a simple
`and abstract cache coherence algorithm [2], [13], [21]. It should be noted that using a
`theorem-prover typically increases the labor required to complete a proof compared with
`manual proof—however, the results are much more likely to be correct.
`
`
`
`358
`
`S. Park and D. L. Dill
`
`1.2.2. Abstraction Function. The idea of using abstraction functions to relate imple-
`mentation and specification state graphs is very widely used, especially when manual or
`automatic theorem-proving is used [30], [29], [22] (indeed, whole volumes have been
`written on the subject [8]). The idea has also been used with finite-state techniques [19],
`[11].
`Ladkin et al. [21] have used a refinement mapping [1] to verify a simple caching algo-
`rithm. Their refinement mapping hides some implementation variables, which may have
`the effect of aggregating steps if the specification-visible variables do not change. Our
`aggregation functions generalize on this idea by merging steps even when specification-
`visible variables change more than once. This happens in most cache coherence protocols
`for distributed systems. For example, in the FLASH protocol described later in this paper,
`the cache states (specification variables) can change twice during a write transaction:
`first, if another cache has a copy in exclusive state, the state must transition to invalid;
`then the requesting cache state changes from invalid to exclusive when the cache receives
`data. In this case, our aggregation function modifies the cache state and data to complete
`the transaction.
`A more limited notion of aggregation than ours is found in [24] and [25], where a
`state function undoes or completes an unfinished process. The method only aggregates
`sequential steps within a local process. The idea of an aggregated transaction has been
`used to prove a protocol for database systems [36], where aggregation is obtained also in
`a local process by showing the commutativity of actions from simple syntactic analysis.
`Ours is the first method that aggregates steps across distributed components.
`In program verification, proofs can be simplified by pretending that a statement is
`atomic if its execution contains at most one access of a shared variable. This is the so-
`called “single-action rule” [28], [12], [26]. The single-action rule is generalized in [27].
`This method classifies program statements as “left-movers” or “right-movers” depend-
`ing on their commutative properties. Using these properties, the statements are permuted
`to obtain a coarser-grained version of the program, for which safety properties can be
`checked.
`Cohen used an idea similar to aggregation to prove global progress properties by
`combining progress properties of local processes [6]. The idea of how to construct our
`aggregation function was inspired by a method of Burch and Dill for defining abstraction
`functions when verifying microprocessors [3].
`
`1.3. Contents of the Paper
`This paper is organized as follows. Section 2 defines the verification goal and Section 3
`presents the verification method. Section 4 describes the FLASH cache coherence proto-
`col in two ways: in terms of transactions and per-node-based steps. Section 5 illustrates
`the method on the FLASH protocol. Using the reduced model obtained by aggrega-
`tion, Section 6 proves that one of the two distinct modes supported by the protocol
`implements a sequential consistency memory model. Finally, Section 7 concludes and
`proposes possible lines of future research.
`
`2. Verification Objective
`
`The goal of formal verification is usually to show that two alternative descriptions of the
`same behavior are consistent. The notion of consistency varies according to the details
`
`
`
`
`of the verification problem. The aim of this section is to define the objective of formal
`verification precisely for transaction-oriented protocols. To do so, we must first describe
`our model of these protocols.
`The verification method begins with two logical descriptions: a description of the
`state graph of the implementation, and a description of the state graph of the specification.
`The implementation description contains a set of state variables, which is partitioned
`into specification variables and implementation variables. The set Q of states of the
`implementation is the set of assignments of values to state variables. The description of
`the implementation also includes a logical formula defining the relation between a state
`and its possible successors. The next state choices are represented by a set of functions F
`from states to states. Each function in F maps a given implementation state to a possible
successor state. The implementation is nondeterministic if F has more than one function.
`We could also have represented the next state choices as a relation, but the “set of
`functions” representation is much more suitable for the verification method we describe
`below. It is also easy to represent protocols in this way. For example, languages of iterated
guarded commands, such as Murphi [10] and UNITY [4], can be translated directly into
`the above representation.
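As an illustration, a small sketch (our encoding, not from the paper) of the "set of functions" representation: each iterated guarded command becomes a total function on states that applies its action when its guard holds and stutters otherwise. The rules and variable names below are invented for the example.

```python
# Sketch: representing next-state choices as a set of functions F,
# in the style of iterated guarded commands (Murphi/UNITY).
# States are dicts of state variables; a rule is a guard plus an action.

def make_rule(guard, action):
    """Turn a guarded command into a total state-to-state function:
    apply the action when the guard holds, otherwise stutter."""
    def step(q):
        return action(dict(q)) if guard(q) else dict(q)
    return step

# Two toy rules over a single state variable "x".
inc = make_rule(lambda q: q["x"] < 2, lambda q: {**q, "x": q["x"] + 1})
reset = make_rule(lambda q: q["x"] == 2, lambda q: {**q, "x": 0})

F = [inc, reset]   # the set of next-state functions; nondeterminism is
                   # the choice of which function to apply next
```

Each function in F is total, so a disabled rule simply leaves the state unchanged, matching the "set of functions" view of a nondeterministic next-state relation.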
`The description of the specification state graph is similar to the implementation
`description. A specification state is an assignment of values to the specification variables
`of the implementation (implementation variables do not appear in the specification).
`Also, every state in the specification has a transition to itself, which we call an idle
`step. The idle steps are necessary to represent implementation steps that do not change
`specification variables.
`The verification method relies on there being a set of transactions which the com-
`putation is supposed to implement. A transaction is atomic at the specification level,
`meaning that it occurs in a single state transition in the specification. However, transac-
`tions in the implementation are nonatomic; they may involve many steps that are executed
`in several different components of the implementation.
`Each transaction in the implementation must have an identifiable commit step. In-
`tuitively, when tracing through the steps of a transaction, the commit step is the im-
`plementation step that first makes an inevitable change in the specification variables.
`Implementation states that occur before the transaction or during the transaction but be-
`fore the commit step are called precommit states for that transaction. The transaction is
`complete when the last specification variable change occurs. The states after the commit
`step but before the completion of the transaction are called postcommit states for the
`transaction. A state where every committed transaction has completed is called a clean
`state.
`Formally, all of the above concepts can be derived once the postcommit states are
`known for each transaction. The precommit states for the transaction are the states that
`are not postcommit; the commit step for a transaction is the transition from a precommit
`state to a postcommit state for that transaction; and the completion step is the transition
`from a postcommit state to a precommit state. A state is clean if it is a precommit state
`for every transaction.
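These derivations can be phrased directly in terms of a postcommit predicate; the following is a minimal sketch in our own encoding (states as dicts, steps as functions, predicate names ours):

```python
# Sketch: deriving commit steps, completion steps, and clean states
# from a postcommit predicate given per transaction t.

def is_commit_step(postcommit, t, q, step):
    """Commit step for t: a transition from a precommit state
    to a postcommit state."""
    return not postcommit(t, q) and postcommit(t, step(q))

def is_completion_step(postcommit, t, q, step):
    """Completion step for t: a transition from a postcommit state
    back to a precommit state."""
    return postcommit(t, q) and not postcommit(t, step(q))

def is_clean(postcommit, transactions, q):
    """Clean state: a precommit state for every transaction."""
    return all(not postcommit(t, q) for t in transactions)
```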
`We can now describe our objective in formally verifying transaction-oriented pro-
`tocols. We suppose that the designer of a protocol understands how the implementation
`steps correspond to the transactions he or she is trying to implement. This correspon-
`dence is represented formally as a function that maps each implementation step to the
`
`
`
`360
`
`S. Park and D. L. Dill
`
`transaction that it implements. We call this the transaction mapping M. Specifically, the
`transaction mapping maps each commit step to the appropriate atomic transaction; steps
`that are not commit steps for any transaction map to the idle specification step.
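A transaction mapping can be sketched as a table from implementation steps to specification steps, defaulting to the idle step; the step and transaction names below are illustrative, not from the FLASH protocol:

```python
# Sketch of a transaction mapping M (our encoding): commit steps map
# to their atomic specification transactions, every other step maps
# to the idle specification step.

def idle(s):
    return dict(s)   # the idle specification step changes nothing

def make_transaction_mapping(commits):
    """commits: dict from an implementation step's name to the atomic
    specification transaction that the step commits."""
    return lambda step_name: commits.get(step_name, idle)

# Hypothetical example: only "put_reply" commits a read transaction.
read_txn = lambda s: {**s, "cache": "shared"}
M = make_transaction_mapping({"put_reply": read_txn})
```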
There is a natural correspondence between an implementation state q and the spec-
ification state proj(q) that is obtained by projecting away the implementation variables
(and leaving the specification variables untouched). When q is clean, proj(q) is the spec-
ification state that q implements. However, when q is not clean, proj(q) is not necessarily
meaningful, and the specification state implemented by q is nonobvious.
An execution of the implementation consists of a clean initial implementation state
q and a finite sequence σ of implementation steps from F. The terminal implementation
state is [σ](q), where [σ] is the functional composition of the elements of σ in sequence.
Corresponding to the implementation execution is an execution of the specification.
The initial state of the execution of the specification is proj(q), and the sequence of
specification steps is M(σ) (the sequence formed by applying the transaction mapping
M elementwise to σ). The terminal state of the specification sequence is [M(σ)](proj(q))
(the functional composition of the specification steps, applied to the initial specification
state).
`Our verification goal is to show that if the terminal implementation state is clean,
`projecting it onto the specification variables yields the terminal specification state. In
`other words,
    ∀σ ∈ F*, ∀q ∈ Q: Clean(q) ∧ Clean([σ](q)) ⇒ proj([σ](q)) = [M(σ)](proj(q)).        (1)
`This definition of the verification objective may raise several questions. First, why
`do we only require that clean states map to specification states? Extending the definition
to unclean states seems to result in more or less the definition of an aggregation function that we
`describe below. The goal of this definition is to establish a simpler and more basic notion
`of correctness that can be used to justify the soundness of the aggregation method.
`Second, what if an implementation sequence does not terminate in a clean state? If we
`verify the above property for all finite sequences, there will either be an extended sequence
`ending in a clean state where we can check the result, or there is no such sequence. Since
`it is always possible to stop initiating transactions, the latter condition indicates there is
`no way to complete the unfinished transactions (e.g., because of deadlock or livelock).
`In either case, a transaction is not executed incorrectly, although it may fail to complete.
`We have limited our attention to safety properties in this paper, so failure to complete
`transactions is beyond our scope of concerns. (Although we have not done it, we believe
`that a more general treatment of liveness as well as safety would not be difficult.)
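The verification objective can be exercised on a toy model; the sketch below (entirely our construction, far simpler than a real protocol) has one transaction that commits by incrementing the specification variable and completes by clearing an implementation flag, and checks the objective over all short runs:

```python
from itertools import product

# Toy model: spec variable "v", implementation variable "pending".
def commit(q):
    if q["pending"]:
        return dict(q)                      # disabled: stutter
    return {"v": q["v"] + 1, "pending": True}

def finish(q):
    return {**q, "pending": False}          # completes the transaction

proj = lambda q: {"v": q["v"]}
clean = lambda q: not q["pending"]
spec_inc = lambda s: {"v": s["v"] + 1}      # the atomic spec transaction

q0 = {"v": 0, "pending": False}
for seq in product([commit, finish], repeat=4):
    q, s = q0, proj(q0)
    for f in seq:
        qn = f(q)
        if clean(q) and not clean(qn):      # this step is a commit step,
            s = spec_inc(s)                 # so it maps to spec_inc;
        q = qn                              # all other steps map to idle
    if clean(q):                            # check the objective only at
        assert proj(q) == s                 # clean terminal states
```

The check of whether a step is a commit step is done per transition (precommit to postcommit), which matches the state-dependent definition of commit steps above.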
`
`(1)
`
`3. The Verification Method
`
`In this section we define aggregation functions and show that the existence of an ag-
`gregation function is a sufficient condition implying property (1) above. An aggregation
`function consists of two parts: a completion function which changes the state as though
`the transaction had completed, and a projection which hides the implementation vari-
`ables, leaving only the specification variables.
`
`
`
`
`(2)
`
`(3)
`
`An aggregation function aggr maps implementation states to specification states. In
`addition, aggr must satisfy two conditions. First, it must have a commutative property:
`8q 2 Q;
`8N 2 F:
`aggr.N .q// D M.N /.aggr.q//:
`This resembles the basic definition of an abstraction function, so we consider an aggre-
`gation function to be a particular type of abstraction function.
`The second property is that an aggregation function must be the same as projection
`for clean states:
`Clean.q/ ) aggr.q/ D proj.q/:
`8q 2 Q:
`A clean state is one where all transactions are complete. Therefore, the completion
`function in aggr should have no effect, leaving only the projection.
Existence of an aggregation function implies the verification objective (1). Because
the aggregation function has the commutative property (2), it is easy to prove by induction
that

    ∀σ ∈ F*, ∀q ∈ Q: aggr([σ](q)) = [M(σ)](aggr(q)).
`This and property (3) directly imply verification objective (1).
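On the toy one-transaction model used earlier in our sketches (our construction, not FLASH), an aggregation function and spot checks of properties (2) and (3) look like this:

```python
# Toy aggregation function: complete any committed-but-unfinished
# transaction, then project away the implementation variables.

def commit(q):
    if q["pending"]:
        return dict(q)
    return {"v": q["v"] + 1, "pending": True}

def finish(q):
    return {**q, "pending": False}

proj = lambda q: {"v": q["v"]}
clean = lambda q: not q["pending"]
spec_inc = lambda s: {"v": s["v"] + 1}
idle = lambda s: dict(s)

def aggr(q):
    # Completing the transaction here just clears "pending": the
    # specification-visible effect already happened at the commit step.
    return proj(finish(q))

# Property (3): on clean states aggr coincides with proj.
q = {"v": 3, "pending": False}
assert aggr(q) == proj(q)

# Property (2) checked from a clean state: commit maps to spec_inc
# and finish maps to the idle step.
assert aggr(commit(q)) == spec_inc(aggr(q))
assert aggr(finish(q)) == idle(aggr(q))
```

In a real protocol the completion part of aggr is far richer, but the shape is the same: finish what has committed, then project.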
Once a purported aggregation function has been defined, the user must prove that it
meets the commutative requirement (2). The proof can be done completely automatically
or consists of a sequence of standard steps.1 The initial ∀q and ∀N can be eliminated
automatically by substituting new symbolic constants throughout (when we do this in
this presentation, we will not change the name of the quantified variable). This yields a
subgoal of the form

    (N ∈ F) ⇒ aggr(N(q)) = M(N)(aggr(q)).        (4)
The number of subgoals is equal to the number of transition functions in the
implementation. In most cases, the required specification step M(N) is the idle step;
indeed, the only nonidle step is that which corresponds to the commit step in the
implementation. We have no global strategy for proving these theorems, although most
are very simple.
`The above discussion omits an important point, which is that not all states are
`worthy of consideration. Theorem (2) will generally not hold for some absurd states that
`cannot actually occur during a computation. Hence, it is usually necessary to provide
`an invariant predicate, which characterizes a superset of all the reachable states. If the
invariant is Inv, Theorem (2) can then be weakened to

    ∀q ∈ Q, ∀N ∈ F: Inv(q) ⇒ aggr(N(q)) = M(N)(aggr(q)).        (5)

In other words, aggr only needs to commute when q satisfies Inv. Use of an invariant
`incurs some additional proof obligations. First, we must prove that the initial states of
`the protocol satisfy Inv, and, second, that the implementation transition functions all
`preserve Inv.
`
`(4)
`
`(5)
`
`1 We base this comment on our use of the PVS theorem-prover, but we believe the same basic method
`would be used with others.
`
`
`
`362
`
`S. Park and D. L. Dill
`
`4. FLASH Cache Coherence Protocol
`
`The aggregation method has been used to verify the cache coherence protocol used in
`the Stanford FLASH multiprocessor [20], [15]. This section informally describes that
`protocol.
`The cache coherence protocol is directory-based so that it can support a large number
`of distributed processing nodes. Each cache line-sized block in memory is associated
`with a directory header which keeps information about the line. For a memory line, the
`node on which that piece of memory is physically located is called the home; the other
`nodes are called remote. The home maintains all the information about memory lines in
`its main memory in the corresponding directory headers.
`The system consists of a set of nodes, each of which contains a processor, caches, and
`a portion of the global memory. The distributed nodes communicate using asynchronous
messages through a point-to-point network. The state of a cached copy is either invalid,
shared (readable), or exclusive (readable and writable).
`
`4.1.
`
`Informal Description of the Protocol
`
`If a read miss occurs in a processor, the corresponding node sends out a GET request to the
`home (this step is not necessary if the requesting processor is in the home). Receiving the
`GET request, the home consults the directory corresponding to the memory line to decide
`what action the home should take. If the line is pending, meaning that another request
`is already being processed, the home sends a NAK (negative acknowledgment) to the
`requesting node. If the directory indicates there is a dirty copy in a remote node, then the
`home forwards the GET to that node. Otherwise, the home grants the request by sending
`a PUT to the requesting node and updates the directory properly. When the requesting
`node receives a PUT reply, which returns the requested memory line, the processor sets
`its cache state to shared and proceeds to read.
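The home's decision on an incoming GET, as described above, can be sketched as a small dispatch function (the dict encoding and return convention are ours, not FLASH code):

```python
# Sketch of the home processing a GET request for one memory line.

def home_on_get(dir_entry, requester):
    """Return the (message, destination) the home sends in response."""
    if dir_entry["pending"]:
        return ("NAK", requester)               # line busy: refuse
    if dir_entry["dirty"]:
        dir_entry["pending"] = True             # a third node must answer
        return ("GET", dir_entry["head_pointer"])
    dir_entry["sharer_list"].add(requester)     # update the directory
    return ("PUT", requester)                   # grant from main memory
```

The "update the directory properly" step is abbreviated here to recording the new sharer; the real handler maintains several more directory fields.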
`For a write miss, the corresponding node sends out a GETX request to the home.
`Receiving the GETX request, the home consults the directory. If the line is pending,
`the home sends a NAK to the requesting node. If the directory indicates there is a dirty
`copy in a third node, then the home forwards the GETX to that node. If the directory
`indicates there are shared copies of the memory line in other nodes, the home sends
`INVs (invalidations) to those nodes. At this point, the protocol depends on which of two
`modes the multiprocessor is running in: EAGER or DELAYED. In EAGER mode, the home
`grants the request by sending a PUTX to the requesting node; in DELAYED mode, this
`grant is deferred until all the invalidation acknowledgments are received by the home. If
`there are no shared copies, the home sends a PUTX to the requesting node and updates
`the directory properly. When the requesting node receives a PUTX reply which returns
`an exclusive copy of the requested memory line, the processor sets its cache state to
`exclusive and proceeds to write.
`During the read miss transaction, an operation called a sharing write-back is neces-
`sary in the following “three hop” case. This occurs when a remote processor in node R1
`needs a shared copy of a memory line, an exclusive copy of which is in another remote
`node R2. When the GET request from R1 arrives at the home H, the home consults the
`directory to find that the line is dirty in R2. Then H forwards the GET to R2 with the
`
`
`
`
`Fig. 1. Processing of a read miss (a GET request) in the FLASH protocol.
`
`source of the message faked as R1 instead of H. When R2 receives the forwarded GET,
`the processor sets its copy to shared state and issues a PUT to R1. Unfortunately, the
`directory in H does not have R1 on its sharer list yet and the main memory does not have
`an updated copy when the cached line is in the shared state. The solution is for R2 to
`issue an SWB (sharing write-back) conveying the dirty data to H with the source faked
`as R1. When H receives this message, it writes the data back to main memory and puts
`R1 on the sharer list. Figure 1 shows the processing of a read miss in the protocol.
`When a remote node receives an INV, it invalidates its copy and then sends an
`acknowledgment to the home. There is a subtle case with an invalidation. A processor
which is waiting for a PUT reply may, if that reply is delayed, get an INV for the shared
copy of the memory line before the copy itself arrives. In such a case, the
`requested line is marked as invalidated, and the PUT reply is ignored when it arrives.
`A valid cache line may be replaced to accommodate other memory lines. A shared
`copy is replaced by issuing a replacement hint to the home, which removes the remote
`from its sharers list. An exclusive copy is written back to main memory by a WB (write-
`back) request to the home. Receiving the WB, the home updates the line in main memory
`and the directory properly.
`
`4.2. Detailed Description of the Protocol
`
`Each cache line-sized block in main memory is associated with a directory header which
`keeps information about the line. The directory header consists of several Boolean flags:
`Local, Dirty, Pending, Head Valid, and List; a pointer to another node Head Pointer; a
`
`
`
`364
`
`S. Park and D. L. Dill
`
`set of pointers Sharer List;2 and the number of sharers in Real Pointers. The Local bit
`indicates if the local processor (in the home) contains a cached copy of the line in either
`shared or exclusive state. The Dirty bit is set if the home thinks that there is a dirty copy of
`the line in the system. The Pending bit is set if the current request for the memory line is
`being processed by a third node. The Head Valid bit indicates whether the Head Pointer
`contains a valid pointer to a node. The Head Pointer entry is simply a cache pointer that
`is stored in the directory header as an optimization. It keeps a pointer to a remote cache
`with a dirty copy if there is one, or one of the nodes with a shared copy. The List bit
`indicates whether Sharer List contains one or more pointers. Sharer List is represented
`abstractly as a set of pointers to the nodes that have a shared copy of the memory line.
`Real Pointers contains a count of the number of sharers in the Sharer List. This count
`excludes the Head Pointer and is mainly used to count invalidation acknowledgments.
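The directory header fields listed above can be summarized as a record; the field names follow the text, but this encoding is ours, not FLASH's actual data layout:

```python
from dataclasses import dataclass, field

# Sketch of one directory header, per cache line-sized memory block.
@dataclass
class DirectoryHeader:
    local: bool = False          # home's processor holds a cached copy
    dirty: bool = False          # a dirty copy exists in the system
    pending: bool = False        # a request is being processed elsewhere
    head_valid: bool = False     # head_pointer holds a valid node id
    list_bit: bool = False       # the List bit: sharer_list is nonempty
    head_pointer: int = 0        # dirty remote, or one sharing node
    sharer_list: set = field(default_factory=set)  # nodes with shared copies
    real_pointers: int = 0       # sharer count, excluding head_pointer
```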
`The FLASH protocol consists of a set of rules which are called handlers. Each
`handler is prefixed with NI (network interface) or PI (processor interface) to indicate
`where the requests are generated from. PI handlers are initiated by a requesting processor
`and NI handlers are initiated by a message from the network. The additional notation
`“Local” or “Remote” indicates whether the processing node is the home of the requested
`memory address or not. In the following, some handlers of the protocol for processing
`a read miss are listed; the rest of the handlers are shown in the Appendix.
• PI.Local.Get: this handler describes actions of the home when the local processor
needs a shared copy of a memory line. If Pending,3 the local processor is NAKed.
Otherwise, if Dirty, the home sends a GET request to Head Pointer and Pending is
set. Otherwise, the data in main memory is copied into the local cache (in shared
state) and Local is set.
• NI.Remote.Get: this handler describes actions of a remote node receiving a GET
request. If the cached data in the remote node is in exclusive state, it is changed
to shared and the node sends a PUT reply to the source of the GET (and also an SWB
to the home if the source is not the home). Otherwise, the node sends a NAK to
the source and a NAKC (NAK clear) to the home.
• NI.Local.Put: this handler processes a PUT reply to the home. Local is set, Dirty
and Pending are reset, and the shared copy is put into the local cache.
• NI.SharingWriteback: this handler describes actions of the home receiving an
SWB. Dirty and Pending are reset, List is set, Real Pointers is incremented, the
source is added to Sharer List, and the data is written back into main memory.
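As a concrete sketch, the NI.SharingWriteback handler described above can be written against a dict-encoded directory header and memory (the encoding is ours, not FLASH code):

```python
# Sketch of NI.SharingWriteback: the home absorbs a sharing write-back.

def ni_sharing_writeback(directory, memory, addr, source, data):
    """Reset Dirty and Pending, set List, add the source to the sharer
    list, increment Real_Pointers, and write the data back to memory."""
    d = directory[addr]
    d["dirty"] = False
    d["pending"] = False
    d["list"] = True
    d["sharer_list"].add(source)
    d["real_pointers"] += 1
    memory[addr] = data
```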
`
`5. Verification by Aggregating Distributed Transactions
`
`Using the aggregation method, we have formally verified the protocol at the level of
its formal description [35]. The protocol, consisting of more than a hundred different
implementation steps, has been reduced to a model with only six kinds of atomic
transactions. Based on the reduced atomic behavior, it is very easy to reason about the
protocol, checking safety properties and data consistency of cached copies.
In the following, we illustrate how the protocol is reduced to an atomic model by an
aggregation function. The detailed proofs are confirmed by a theorem-prover, and some
techniques to simplify the proof are presented.

2 Actually, the set of pointers is implemented as a linked list for sharers (within the home) by dynamic
pointer allocation [39].
3 That is, the Pending bit is set in the directory.
`
`5.1. Extracting Reduced Model of the Protocol
`
`Verification requires two descriptions of the same behavior: an implementation and a
`specification. Sometimes, there is an a priori specification as in the memory model
`verification in the next section. However, in most practical instances, there is only an
`implementation. In such cases, we extract a reduced model of the implementation using
`aggregation. The reduced model concisely captures the behavior of the implementation
`and serves as a specification.
`Recall that to use the aggregation method, we first decide which state variables
`should be considered specification variables. In cache coherence protocols, the consis-
`tency of multiple copies of a memory line is a function of the values and states of cached
`copies, and the corresponding value in main memory. Therefore, the specification vari-
`ables should be the state variables representing the data and states of cached copies and
`the data in main memory.
`We construct a reduced model of the protocol, which we use for a specification. The
`procedure is to trace through a transaction: (1) concatenating the implementation steps,
`(2) simplifying by substituting values forw