decode transactions, may handle conflicts, may interface with the snoop filter 204, and may process transactions. In one embodiment, the protocol logic 202 may comprise distributed protocol logic (DPL) 210 and centralized protocol logic (CPL) 212. In one embodiment, each port 200 has an associated DPL 210 to locally implement portions of the protocol logic 202 for the respective port 200. In particular, the DPL 210 may comprise decode logic to decode incoming transactions and may comprise one or more buffers or queues to store data and/or other information associated with incoming and outgoing transactions while being processed by the protocol logic 202 and/or awaiting responses from cache nodes 102, 104.")
`
`•
`
`[4:61-5: 13] ("The CPL 212 may provide global functions for processing
`transactions regru·dless of which p01i 200 the transaction originated. For exrunple,
`the CPL 212 for each p01i 200 may check for-transaction conflicts, may prevent
`transaction struvation, and may maintain data coherency in accordance with a
`snoop-based protocol. In prui iculru·, the CPL 212 in response to processing a read,
`snoop, or invalidate request may check the state of the line of the request in the
`caches 114 and may issue requests to one or more cache nodes 102, 104 based upon
`the state of the line. In one embodiment, the CPL 2 12 may use the coherency data
`of the snoop filter 204 to reduce the number of requests sent to the cache nodes 102,
`104 and may update coherency data in the snoop filter 204 based upon the
`transaction type and snoop responses received from the cache nodes 102, 104. The
`CPL 2 12 may fmiher comprise logic to bypass the snoop filter 204 and to maintain
`data coherency without using the coherency data of the snoop filter 204. Moreover,
`the CPL 212 may be divided into fom interleaves 2 14, and a separate CPL
`interleave 2 14 may be associated with each of the fom SF interleaves 208.").
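
For illustration only, the following C sketch reflects the snoop-filter behavior described in the quoted passage: the protocol logic consults the filter's coherency data to limit the snoop requests sent to the cache nodes, and a bypass path maintains coherency without the filter by snooping every node. All identifiers and sizes are hypothetical and are not drawn from the quoted reference.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_CACHE_NODES 4
    #define SF_ENTRIES      1024

    typedef struct {
        bool     valid;
        uint64_t tag;       /* address tag of the tracked cache line       */
        uint8_t  presence;  /* bit i set => cache node i may hold the line */
    } sf_entry_t;

    /* Return a bitmask of cache nodes that must receive a snoop for addr.
     * With the filter in use, only recorded sharers are snooped; a filter
     * miss means no node caches the line. The bypass path maintains
     * coherency without the filter by snooping every node. */
    uint8_t nodes_to_snoop(const sf_entry_t *sf, uint64_t addr, bool bypass)
    {
        if (bypass)
            return (uint8_t)((1u << NUM_CACHE_NODES) - 1);

        for (int i = 0; i < SF_ENTRIES; i++)
            if (sf[i].valid && sf[i].tag == addr)
                return sf[i].presence;

        return 0;
    }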
`
In addition, the claimed interconnection controller comprises protocol engines for processing transactions in accordance with a cache coherence protocol. See, e.g., '206 patent claim 1.4. A cache coherence protocol facilitates cache coherency; thus the claimed interconnection controller is necessarily operable to facilitate cache coherency in the computer system.
`
Furthermore, to the extent not disclosed, a person of ordinary skill in the art at the time of the alleged invention of the Asserted Claims would have been motivated to modify the prior art references identified in Section III and Exhibits A-1 - A-9; B-1 - B-19; C-1 - C-8; D-1 - D-14; and Exhibits E-1 - E-14 to include an interconnection controller operable to facilitate cache coherency across the computer system, at least under Memory Integrity's apparent infringement theories. See, e.g., Exhibits D-1 - D-14, claim 15.1. For example, it would have been obvious for an interconnection controller that processes coherence transactions to process those transactions so as to facilitate cache coherency across the computer system as described above with respect to the "cache coherence controller."
`
`"Cache Coherence Controller" "Constructed to Act As An Aggregate
`8.
`Remote Cache"
`
Some of the Asserted Claims are directed to a "cache coherence controller" "constructed to act as an aggregate remote cache." For example, claim 18.1 of the '409 patent recites "the cache coherence controller is constructed to act as an aggregate remote cache." See also, e.g., '409 patent claim 47.1; and '636 patent claim 29.1. At least under Memory Integrity's apparent infringement theories, cache coherence controllers "constructed to act as an aggregate remote cache" were well-known in the art before the priority dates of the Asserted Patents. See, e.g., Exhibits A-1 - A-9, claims 18.1 and 47.1; and Exhibits B-1 - B-19, claim 29.1. The following discussion further shows that, at least under Memory Integrity's apparent infringement theory, it was well known and conventional to implement cache coherence controllers "constructed to act as an aggregate remote cache" in multiprocessor systems.
`
At least under Memory Integrity's apparent infringement theories, there are many examples of prior art references that disclose implementing cache coherence controllers "constructed to act as an aggregate remote cache." Examples of prior art references that disclose and further demonstrate that such was well known include:
`
`•
`
`"The Direct01y-Based Cache Coherence Protocol for the DASH Multiprocessor."
`Lenoski (1990): See, e.g. , p. 1, pru·a. 2 ("We ru·e cunently building a prototype of a
`scalable shru·ed mem01y multiprocessor. The system provides high processor
`perfonnance and scalability though the use of coherent caches and a direct01y-based
`coherence protocol. The high-level organization of the prototype, called DASH
`(Direct01y Architecture for SHared mem01y) [1 7]. is shown in Figure 1. The
`ru·chitecture consists of a number of processing nodes connected through a
`high-bandwidth low-latency interconnection network. The physical mem01y in the
`
`- 151 -
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
`machine is disu·ibuted among the nodes of the multiprocessor, with all mem01y
`accessible to each node. Each processing node, or cluster, consists of a small number of
`high-perfonnance processors with their individual caches, a p01iion of the
`shru·ed-mem01y, a common cache for pending remote accesses, and a direct01y
`controller interfacing the cluster to the network. A bus-based snoopy scheme is used to
`keep caches coherent within a cluster, while inter-node cache consistency is maintained
`using a distributed direct01y-based coherence protocol.")
`
Figure 1: General architecture of DASH.
`
`Lenoski (1990), Figure 1
`
• Lenoski (1990): See, e.g., p. 1, para. 4 ("In DASH, each processing node has a directory memory corresponding to its portion of the shared physical memory. For each memory block, the directory memory stores the identities of all remote nodes caching that block. Using the directory memory, a node writing a location can send point-to-point invalidation or update messages to those processors that are actually caching that block.")
`
• Lenoski (1990): See, e.g., p. 3, para. 5 ("The directory controller (DC) contains the directory memory corresponding to the portion of main memory present within the cluster. It also initiates out-bound network requests and replies. The pseudo-CPU (PCPU) is responsible for buffering incoming requests and issuing such requests on the cluster bus. It mimics a CPU on this bus on behalf of remote processors except that responses from the bus are sent out by the directory controller. The reply controller (RC) tracks outstanding requests made by the local processors and receives and buffers the corresponding replies from remote clusters. It acts as memory when the local processors are allowed to retry their remote requests. The network interface and the local portion of the network itself reside on the directory card. The interconnection network consists of a pair of meshes. One mesh is dedicated to the request messages while the other handles replies. These meshes utilize wormhole routing [9] to minimize latency. Finally, the board contains hardware monitoring logic and miscellaneous control and status registers. The monitoring logic samples a variety of directory board and bus events from which usage and performance statistics can be derived.")
`
`- 152 -
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
• Lenoski (1990): See, e.g., p. 4.5, para. 4 ("In the protocol, invalidation acknowledges are sent to the local cluster that initiated the memory request. An alternative would be for the home cluster to gather the acknowledges, and, when all have been received, send a message to the requesting cluster indicating that the request has been completed. We chose the former because it reduces the waiting time for completion of a subsequent fence operation by the requesting cluster and reduces the potential of a hot spot developing at the memory.")
`
`•
`
`"The Direct01y-Based Cache Coherence Protocol for the DASH Multiprocessor."
`Lenoski (I992): See, e.g. , p. I 50 ("A DASH system consists of a number of modified
`4D/240 systems that have been supplemented with a direct01y controller boru·d. This
`direct01y conu·oller boru·d is responsible for maintaining the cache coherence across the
`nodes and serving as the interface to the interconnection network.")
`
`Figure 2: Block diagram of sample 2 x 2 DASH system.
`
Lenoski (1992), Figure 2
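
To make the quoted mechanism concrete, the following hypothetical C sketch models the per-block directory state Lenoski describes, in which the home node records which remote nodes cache a block so that a write triggers point-to-point invalidations only to actual sharers. All names, sizes, and the send routine are invented for illustration and do not appear in Lenoski.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_NODES  16
    #define NUM_BLOCKS 4096

    typedef struct {
        uint16_t sharers;  /* bit i set => node i caches this memory block */
    } dir_entry_t;

    static dir_entry_t directory[NUM_BLOCKS];

    /* Stand-in for injecting an invalidation message into the request mesh. */
    static void send_invalidate(int node, int block)
    {
        printf("invalidate block %d -> node %d\n", block, node);
    }

    /* On a write by node `writer`, send point-to-point invalidations to the
     * recorded sharers only (no broadcast), then record the writer as the
     * sole holder of the block. */
    void on_write(int block, int writer)
    {
        uint16_t others = directory[block].sharers & (uint16_t)~(1u << writer);
        for (int n = 0; n < MAX_NODES; n++)
            if (others & (1u << n))
                send_invalidate(n, block);
        directory[block].sharers = (uint16_t)(1u << writer);
    }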
`
• U.S. Patent No. 6,055,610 to Smith: See, e.g., 11:44-55 ("A flow chart of the basic method M1 of handling a data request is flow charted in FIG. 3. At step S1, processor P11 issues a read request of data stored in main memory MM0. At step S2, caches C10-C13 of requester cell MC1 are examined to determine if the request can be met locally. First, associated cache C11 is checked. A hit allows the request to be met locally. A miss refers the request to the requestor's coherency controller CC1. Coherency controller CC1 initiates a local snoop while referring the request to owner cell MC0. If the snoop results in a hit, the request can be met locally. If the data is held privately by another local processor, e.g., processor P12, coherency controller requests that the data be made public so that the request can be met. Only if the local snoop misses is involvement of the owner cell MC0 required.")
`
`- 153 -
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
• Smith: See, e.g., 11:56-63 ("At step S3, coherency controller CC0 of owner cell MC0 initiates a local snoop of its caches, accesses fast directory FD0, and initiates access of main memory MM0. Coherency controller CC0 determines whether or not the fast-directory data calls for a recall and whether the directory cache data is consistent with the local snoop results. If the directory data is consistent with the snoop results and if a recall is indicated, it is initiated at step S4.")
`
• Smith: See, e.g., 12:3-8 ("Once the recall process is complete, the requested data is transferred to the requester cell MC1, coherency controller CC1, cache C11, and processor P11, at step S6. State information in cache C11, fast directory FD0, and the coherency directory of main memory MM0 is updated as necessary. This completes method M1.")
`
[Smith, FIG. 3: flow chart of method M1. S1: processor requests data read. S2: check if request can be met by requestor cell. S3: check owner directory cache and snoop owner caches. S4: issue predictive recalls. S5: check prediction against main directory; take corrective action and issue new recall if necessary. S6: provide data to requestor and update states and fast directory.]

Smith, Figure 3
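
The three-level lookup of Smith's method M1 can be sketched as a cascade, as in the hypothetical C fragment below; the stub functions merely stand in for the hardware steps described in the quoted passages and are not taken from the patent.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stubs for the hardware lookups; each is written to miss
     * so the request falls through to the owner cell, the worst case of M1. */
    static bool own_cache_lookup(int cpu, uint64_t addr, uint64_t *data)
    { (void)cpu; (void)addr; (void)data; return false; }          /* step S2 */

    static bool local_snoop(int cell, uint64_t addr, uint64_t *data)
    { (void)cell; (void)addr; (void)data; return false; }         /* step S2 */

    static uint64_t owner_cell_access(int owner_cell, uint64_t addr)
    { (void)owner_cell; (void)addr; return 0; }               /* steps S3-S6 */

    /* A read is satisfied locally when possible; only a complete local miss
     * involves the owner cell's coherency controller. */
    uint64_t handle_read(int cpu, int cell, int owner_cell, uint64_t addr)
    {
        uint64_t data;
        if (own_cache_lookup(cpu, addr, &data))   /* hit in requester's cache */
            return data;
        if (local_snoop(cell, addr, &data))       /* hit in a sibling cache   */
            return data;
        return owner_cell_access(owner_cell, addr);
    }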
`
• U.S. Patent No. 6,085,295 to Ekanadham: See, e.g., 2:25-33 ("In a node where a remote line is brought into the cache of a processor, but not into the node's memory, the adapter acts as a proxy memory representing the remote memory that the line is mapped onto. More specifically, when a memory command is issued from a local processor to a remote memory, the memory command is directed to the adapter which is responsible for insuring that the command is executed at that remote memory.")
`
`- 154 -
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
• Ekanadham: See, e.g., 3:37-45 ("The preferred embodiment of our system that is based on a network of switch-based SMP nodes with an adapter attached to each node. FIG. 1 illustrates a high-level diagram of such a multiprocessing system. Each node has a plurality of processors P1, P2, ..., PN interconnected to each other by a switch (SW). The switch also interconnects the memory modules M1, M2, ..., MN and adapters A. The nodes in turn, are connected to each other through a network as shown.")
`
• Ekanadham: See, e.g., 3:49-56 ("The adapter connects to the switch and plays the role of either a memory or a processor. The behavior of the adapter is different for different memory lines. When a line is homed at the local memory of the node, the adapter behaves as a proxy processor for that line. When a line is homed at the memory of a remote node, the adapter behaves as a proxy memory for that line. These roles are illustrated in FIGS. 3A-3C and are elaborated further below.")
`
• Ekanadham: See, e.g., 4:7-24 ("In a node in which a line is homed in the local memory, the adapter plays the role of a proxy processor representing the accesses to the line made by the processors in other nodes of the system. In this role, the adapter maintains a state for the line and the list of all nodes sharing that line. The state can be I (indicating that no other node has this line), E (indicating that some other node has exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy processor, the adapter receives requests from other adapters and performs the reads and writes in this node on their behalf. Whenever a local processor requires exclusive access to the line while it is in shared state, it communicates with other adapters and invalidates the line in all other nodes. When another node requests for exclusive copy of the line, the adapter not only invalidates the copies in all other nodes, but also requests the local memory to grant the exclusive access. The memory controller treats the adapter as another processor.")
`
• Ekanadham: See, e.g., 4:26-41 ("In a node in which a line is homed at a remote memory, the adapter acts as a proxy memory. It captures all the transactions for the corresponding address and runs the memory protocol. In this role, the adapter maintains a state for the line and the list of local caches sharing the line. The state can be I (indicating that no local cache has this line), E (indicating that some local cache has exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy memory, the adapter responds to all requests to the line and obtains the contents of the line from the remote node (where that line is backed by memory) and supplies the contents to the local caches. It performs the usual coherence control operations in the node and coordinates with other adapters. In order to maintain global coherence, it may have to issue some bus transactions as a master, as illustrated later.")
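
The adapter roles Ekanadham describes reduce to a small amount of per-line state, as in the following hypothetical C sketch mirroring the quoted I/E/S states and the rule that the adapter's role depends on where the line is homed; all identifiers are invented for illustration.

    typedef enum { LINE_I, LINE_E, LINE_S } line_state_t; /* invalid/exclusive/shared */

    typedef enum {
        PROXY_PROCESSOR,  /* line homed locally: represents remote accessors  */
        PROXY_MEMORY      /* line homed remotely: stands in for remote memory */
    } adapter_role_t;

    typedef struct {
        adapter_role_t role;
        line_state_t   state;    /* I, E, or S for this line                  */
        unsigned       sharers;  /* remote nodes (proxy processor) or local   */
                                 /* caches (proxy memory) holding the line    */
    } adapter_line_t;

    /* The adapter's role for a line follows from where the line is homed. */
    adapter_role_t role_for_line(int home_node, int this_node)
    {
        return (home_node == this_node) ? PROXY_PROCESSOR : PROXY_MEMORY;
    }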
`
`- 155 -
`
`
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
`cache line which is owned by a processor within node 420, and sends the data line to
`the global interface 415 of the requesting node 410 via global interconnect 450.")
`
• Loewenstein: See, e.g., 11:60-64 ("Upon receiving the data line, global interface 415 forwards the data line to cache 413a, which provides the data to the requesting processor 411a.")
`
`Fig. 4
`
`Loewenstein, Figure 4
`
• U.S. Patent No. 6,751,721 to Webb: See, e.g., Abstract ("A directory-based multiprocessor cache control scheme for distributing invalidate messages to change the state of shared data in a computer system. The plurality of processors are grouped into a plurality of clusters. A directory controller tracks copies of shared data sent to processors in the clusters. Upon receiving an exclusive request from a processor requesting permission to modify a shared copy of the data, the directory controller generates invalidate messages requesting that other processors sharing the same data invalidate that data. These invalidate messages are sent via a point-to-point transmission only to master processors in clusters actually containing a shared copy of the data. Upon receiving the invalidate message, the master processors broadcast the invalidate message in an ordered fan-in/fan-out process to each processor in the cluster. All processors within the cluster invalidate a local copy of the shared data if it exists and once the master processor receives acknowledgements from all processors in the cluster, the master processor sends an invalidate acknowledgment message to the processor that originally requested the exclusive rights to the shared data. The cache coherency is scalable and may be implemented using the hybrid point-to-point/broadcast scheme or a conventional point-to-point only directory-based invalidate scheme.")
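
Webb's hybrid scheme pairs point-to-point delivery to cluster masters with an intra-cluster broadcast and acknowledgement fan-in, a two-level structure the following hypothetical C sketch illustrates; all names and constants are invented and not drawn from the patent.

    #include <stdio.h>

    #define NUM_CLUSTERS     4
    #define CPUS_PER_CLUSTER 4

    /* Stand-in for delivering one invalidate and collecting its
     * acknowledgement; real hardware would wait for the ack rather
     * than assume it. */
    static int invalidate_cpu(int cluster, int cpu)
    {
        printf("cluster %d: invalidate cpu %d\n", cluster, cpu);
        return 1;
    }

    /* Point-to-point delivery to each cluster that actually holds a shared
     * copy (bit c of cluster_has_copy); within a cluster the master fans
     * the message out, fans the acks back in, then acks the requester. */
    void distribute_invalidates(unsigned cluster_has_copy, int requester)
    {
        for (int c = 0; c < NUM_CLUSTERS; c++) {
            if (!(cluster_has_copy & (1u << c)))
                continue;
            int acks = 0;
            for (int p = 0; p < CPUS_PER_CLUSTER; p++)
                acks += invalidate_cpu(c, p);
            if (acks == CPUS_PER_CLUSTER)
                printf("cluster %d master: ack -> requester %d\n",
                       c, requester);
        }
    }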
`
As illustrated by the prior art references above, it was well known before the priority dates of the Asserted Patents to implement a "cache coherence controller" "constructed to act as an aggregate remote cache" in multiprocessor systems, at least under Memory Integrity's apparent infringement theories. Indeed, a person of ordinary skill would have been motivated to implement a "cache coherence controller" "constructed to act as an aggregate remote cache" in a multiprocessor system as described below:
`
• Ekanadham: See, e.g., 1:23-35 ("Technology considerations limit the size of an SMP node to a small number of processors. A method for building a shared-memory multiprocessor with a larger number of processors is to connect a number of SMP nodes with a network, and provide an adapter to extend the SMP's memory across the SMP nodes (see FIG. 1). Existing adapter designs plug into the memory bus of bus-based SMP nodes and collectively provide shared memory across the system, so that any processor in any node can access any location in any memory module in the system. Resources within a node are termed local and resources on other nodes are termed remote.")
`
• Ekanadham: See, e.g., 2:37-41 ("By appearing as either a local processor or a local memory, the adapter uses the local SMP coherence protocol within a node to accomplish the above tasks, without any changes to the memory controllers.")
`
• Loewenstein: See, e.g., 5:1-8 ("Since global interface 115 is also responsible for maintaining global cache coherency, global interface 115 includes a hardware and/or software implemented cache-coherency mechanism for maintaining coherency between the respective caches and main memories of nodes 110, 120, ... 180. Cache coherency is essential in order for the system 100 to properly execute shared-memory programs correctly.")
`
Accordingly, it would have been obvious to implement a "cache coherence controller" "constructed to act as an aggregate remote cache" in a multiprocessor system having multiple clusters of processors while maintaining coherency with a reasonable expectation of success. Furthermore, it would have been obvious to implement a "cache coherence controller" "constructed to act as an aggregate remote cache" because such a modification would simply be the use of a known technique (e.g., a cache coherence controller "constructed to act as an aggregate remote cache") to improve similar devices (e.g., multiprocessor systems) in the same way (e.g., improve performance and scalability while maintaining coherency).
`
`"Cache Coherence Controller" "Constructed to Act As A Probing Agent
`
`9.
`Pair"
`
Some of the Asserted Claims are directed to a "cache coherence controller" "constructed to act as a probing agent pair." For example, claim 19.1 of the '409 patent recites "the cache coherence controller is constructed to act as a probing agent pair." See also, e.g., '409 patent claim 48.1; and '636 patent claim 30.1. At least under Memory Integrity's apparent infringement theories, cache coherence controllers "constructed to act as a probing agent pair" were well-known in the art before the priority dates of the Asserted Patents. See, e.g., Exhibits A-1 - A-9, claims 19.1 and 48.1; and Exhibits B-1 - B-19, claim 30.1. The following discussion further shows that, at least under Memory Integrity's apparent infringement theory, it was well known and conventional to implement cache coherence controllers "constructed to act as a probing agent pair" in multiprocessor systems.
`
At least under Memory Integrity's apparent infringement theories, there are many examples of prior art references that disclose implementing cache coherence controllers "constructed to act as a probing agent pair." Examples of prior art references that disclose and further demonstrate that such was well known include:
`
`•
`
`"The Direct01y-Based Cache Coherence Protocol for the DASH Multiprocessor."
`Lenoski (1990): See, e.g. , p. 1, pru·a. 2 ("We ru·e cunently building a prototype of a
`scalable shared mem01y multiprocessor. The system provides high processor
`perfonnance and scalability though the use of coherent caches and a direct01y-based
`coherence protocol. The high-level organization of the prototype, called DASH
`(Direct01y Architecture for SHru·ed mem01y) [1 7]. is shown in Figure 1. The
`ru·chitecture consists of a number of processing nodes connected through a
`high-bandwidth low-latency interconnection network. The physical mem01y in the
`machine is distributed among the nodes of the multiprocessor, with all mem01y
`accessible to each node. Each processing node, or cluster, consists of a small number of
`high-perfonnance processors with their individual caches, a p01iion of the
`shru·ed-memOiy, a common cache for pending remote accesses, and a direct01y
`
`- 159 -
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
`controller interfacing the cluster to the network. A bus-based snoopy scheme is used to
`keep caches coherent within a cluster, while inter-node cache consistency is maintained
`using a distributed direct01y-based coherence protocol.")
`
Figure 1: General architecture of DASH.
`
`Lenoski (1990), Figure 1
`
• Lenoski (1990): See, e.g., p. 1, para. 4 ("In DASH, each processing node has a directory memory corresponding to its portion of the shared physical memory. For each memory block, the directory memory stores the identities of all remote nodes caching that block. Using the directory memory, a node writing a location can send point-to-point invalidation or update messages to those processors that are actually caching that block.")
`
• Lenoski (1990): See, e.g., p. 3, para. 5 ("The directory controller (DC) contains the directory memory corresponding to the portion of main memory present within the cluster. It also initiates out-bound network requests and replies. The pseudo-CPU (PCPU) is responsible for buffering incoming requests and issuing such requests on the cluster bus. It mimics a CPU on this bus on behalf of remote processors except that responses from the bus are sent out by the directory controller. The reply controller (RC) tracks outstanding requests made by the local processors and receives and buffers the corresponding replies from remote clusters. It acts as memory when the local processors are allowed to retry their remote requests. The network interface and the local portion of the network itself reside on the directory card. The interconnection network consists of a pair of meshes. One mesh is dedicated to the request messages while the other handles replies. These meshes utilize wormhole routing [9] to minimize latency. Finally, the board contains hardware monitoring logic and miscellaneous control and status registers. The monitoring logic samples a variety of directory board and bus events from which usage and performance statistics can be derived.")
`
• Lenoski (1990): See, e.g., p. 4.5, para. 4 ("In the protocol, invalidation acknowledges are sent to the local cluster that initiated the memory request. An alternative would be for the home cluster to gather the acknowledges, and, when all have been received, send a message to the requesting cluster indicating that the request has been completed. We chose the former because it reduces the waiting time for completion of a subsequent fence operation by the requesting cluster and reduces the potential of a hot spot developing at the memory.")
`
`•
`
`"The Direct01y-Based Cache Coherence Protocol for the DASH Multiprocessor."
`Lenoski (I 992): See, e.g. , p. I 50 ("A DASH system consists of a number of modified
`4D/240 systems that have been supplemented with a direct01y controller boru·d. This
`direct01y conu·oller board is responsible for maintaining the cache coherence across the
`nodes and serving as the interface to the interconnection network.")
`
`Figure 2: Block diagram of sample 2 x 2 DASH system.
`
Lenoski (1992), Figure 2
`
• U.S. Patent No. 6,055,610 to Smith: See, e.g., 11:44-55 ("A flow chart of the basic method M1 of handling a data request is flow charted in FIG. 3. At step S1, processor P11 issues a read request of data stored in main memory MM0. At step S2, caches C10-C13 of requester cell MC1 are examined to determine if the request can be met locally. First, associated cache C11 is checked. A hit allows the request to be met locally. A miss refers the request to the requestor's coherency controller CC1. Coherency controller CC1 initiates a local snoop while referring the request to owner cell MC0. If the snoop results in a hit, the request can be met locally. If the data is held privately by another local processor, e.g., processor P12, coherency controller requests that the data be made public so that the request can be met. Only if the local snoop misses is involvement of the owner cell MC0 required.")
`
• Smith: See, e.g., 11:56-63 ("At step S3, coherency controller CC0 of owner cell MC0 initiates a local snoop of its caches, accesses fast directory FD0, and initiates access of main memory MM0. Coherency controller CC0 determines whether or not the fast-directory data calls for a recall and whether the directory cache data is consistent with the local snoop results. If the directory data is consistent with the snoop results and if a recall is indicated, it is initiated at step S4.")
`
• Smith: See, e.g., 12:3-8 ("Once the recall process is complete, the requested data is transferred to the requester cell MC1, coherency controller CC1, cache C11, and processor P11, at step S6. State information in cache C11, fast directory FD0, and the coherency directory of main memory MM0 is updated as necessary. This completes method M1.")
`
[Smith, FIG. 3: flow chart of method M1. S1: processor requests data read. S2: check if request can be met by requestor cell. S3: check owner directory cache and snoop owner caches. S4: issue predictive recalls. S5: check prediction against main directory; take corrective action and issue new recall if necessary. S6: provide data to requestor and update states and fast directory.]

Smith, Figure 3
`
• U.S. Patent No. 6,085,295 to Ekanadham: See, e.g., 2:25-33 ("In a node where a remote line is brought into the cache of a processor, but not into the node's memory, the adapter acts as a proxy memory representing the remote memory that the line is mapped onto. More specifically, when a memory command is issued from a local processor to a remote memory, the memory command is directed to the adapter which is responsible for insuring that the command is executed at that remote memory.")
`
• Ekanadham: See, e.g., 3:37-45 ("The preferred embodiment of our system that is based on a network of switch-based SMP nodes with an adapter attached to each node. FIG. 1 illustrates a high-level diagram of such a multiprocessing system. Each node has a plurality of processors P1, P2, ..., PN interconnected to each other by a switch (SW). The switch also interconnects the memory modules M1, M2, ..., MN and adapters A. The nodes in turn, are connected to each other through a network as shown.")
`
• Ekanadham: See, e.g., 3:49-56 ("The adapter connects to the switch and plays the role of either a memory or a processor. The behavior of the adapter is different for different memory lines. When a line is homed at the local memory of the node, the adapter behaves as a proxy processor for that line. When a line is homed at the memory of a remote node, the adapter behaves as a proxy memory for that line. These roles are illustrated in FIGS. 3A-3C and are elaborated further below.")
`
• Ekanadham: See, e.g., 4:7-24 ("In a node in which a line is homed in the local memory, the adapter plays the role of a proxy processor representing the accesses to the line made by the processors in other nodes of the system. In this role, the adapter maintains a state for the line and the list of all nodes sharing that line. The state can be I (indicating that no other node has this line), E (indicating that some other node has exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy processor, the adapter receives requests from other adapters and performs the reads and writes in this node on their behalf. Whenever a local processor requires exclusive access to the line while it is in shared state, it communicates with other adapters and invalidates the line in all other nodes. When another node requests for exclusive copy of the line, the adapter not only invalidates the copies in all other nodes, but also requests the local memory to grant the exclusive access. The memory controller treats the adapter as another processor.")
`
• Ekanadham: See, e.g., 4:26-41 ("In a node in which a line is homed at a remote memory, the adapter acts as a proxy memory. It captures all the transactions for the corresponding address and runs the memory protocol. In this role, the adapter maintains a state for the line and the list of local caches sharing the line. The state can be I (indicating that no local cache has this line), E (indicating that some local cache has exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy memory, the adapter responds to all requests to the line and obtains the contents of the line from the remote node (where that line is backed by memory) and supplies the contents to the local caches. It performs the usual coherence control operations in the node and coordinates with other adapters. In order to maintain global coherence, it may have to issue some bus transactions as a master, as illustrated later.")
`
`- 163 -
`
`
`
`
`
`Public Version - Confidential Infonnation Redacted and Confidentiality Designation Removed Per Agreement
`Between the Pruiies
`
`cache line which is owned by a processor within node 420, and sends the data line to
`the global interface 415 of the requesting node 410 via global interconnect 450.")
`
• Loewenstein: See, e.g., 11:60-64 ("Upon receiving the data line, global interface 415 forwards the data line to cache 413a, which provides the data to the requesting processor 411a.")
`
`Fig. 4
`
`Loewenstein, Figure 4
`
• U.S. Patent No. 6,751,721 to Webb: See, e.g., Abstract ("A directory-based multiprocessor cache control scheme for distributing invalidate messages to change the state of shared data in a computer system. The plurality of processors are grouped into a plurality of clusters. A directory controller tracks copies of shared data sent to processors in the clusters. Upon receiving an exclusive request from a processor requesting permission to modify a shared copy of the data, the d