US006088769A

United States Patent [19]
Luick et al.

[11] Patent Number: 6,088,769
[45] Date of Patent: *Jul. 11, 2000
[54] MULTIPROCESSOR CACHE COHERENCE DIRECTED BY COMBINED LOCAL AND GLOBAL TABLES

[75] Inventors: David Arnold Luick; John Christopher Willis; Philip Braun Winterfield, all of Rochester, Minn.

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/724,628

[22] Filed: Oct. 1, 1996

[51] Int. Cl.7 ...................................................... G06F 12/00
[52] U.S. Cl. ................... 711/141; 711/118; 711/146; 711/144; 711/124
[58] Field of Search ................... 711/118, 130, 711/165, 141, 146-148, 124, 152, 144

[56] References Cited

U.S. PATENT DOCUMENTS

4,747,043   5/1988  Rodman ............................ 395/925
4,843,542   6/1989  Dashiell et al. .................... 364/200
4,928,225   5/1990  McCarthy et al. ................... 364/200
5,025,365   6/1991  Mathur et al. ..................... 711/121
5,303,362   4/1994  Butts, Jr. et al. ................. 711/121
5,490,261   2/1996  Bean et al. ....................... 711/121
5,537,569   7/1996  Masubushi ......................... 395/448
5,577,204  11/1996  Brewer et al. ..................... 710/132
5,590,308  12/1996  Shih .............................. 395/463
5,604,882   2/1997  Hoover et al. ..................... 395/448
5,623,628   4/1997  Brayton et al. .................... 395/468
5,710,907   1/1998  Hagersten et al. .................. 711/148
5,749,095   5/1998  Hagersten ......................... 711/141
5,751,995   5/1998  Sarangdhar ........................ 395/472
5,778,429   7/1998  Sukegawa et al. ................... 711/129
5,822,763  10/1998  Baylor et al. ..................... 711/118
5,864,854   1/1999  Boyle ............................. 707/10
5,892,970   4/1999  Hagersten ......................... 711/141
5,895,487   4/1999  Boyd et al. ....................... 711/122
5,897,664   4/1999  Nesheim et al. .................... 711/206

OTHER PUBLICATIONS

Leonidas I. Kontothanassis et al., "Lazy Release Consistency for Hardware-Coherent Multiprocessors", University of Rochester, 1994.
Sandra Johnson Baylor et al., "An Evaluation of Cache Coherence Protocols for MIN-Based Multiprocessors", International Symposium on Shared Memory Multiprocessing, pp. 230-241, Apr. 1991.
Daniel Lenoski et al., "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor", Proceedings of the 17th International Symposium on Computer Architecture, Jun. 1990, pp. 148-159.
Daniel Lenoski et al., "The Stanford Dash Multiprocessor", IEEE Computer, vol. 25, No. 3, Mar. 1992, pp. 63-79.

Primary Examiner-John W. Cabeca
Assistant Examiner-Pierre-Michel Bataille
Attorney, Agent, or Firm-Terrance A. Meador; David A. Hall; Karuna Ojanen

[57] ABSTRACT

A method and apparatus for maintaining coherence between shared data stored within a plurality of memory devices, each memory device residing in a different node within a tightly coupled multiprocessor system. Each node includes a "local coherence unit" and an associated processor. A cache unit is associated with each memory/processor pair. Each local coherence unit maintains a table which indicates whether the most current copy of data stored within the node resides in the local memory, in the local cache, or in a non-local cache. The present invention includes a "global coherence" unit coupled to each node via the logical interconnect. The global coherence unit includes an interconnect monitoring device and a global coherence table. When data which resides within the memory of a first node is transferred to a second node, the interconnect monitoring device updates the global coherence table to indicate that the data is being shared. The global coherence table also preferably indicates in which node a copy of the most current data resides.

6 Claims, 7 Drawing Sheets

[Sheet 1 of 7, FIG. 1: functional block diagram of the tightly coupled multiprocessor system 100. Each node 101 contains a processor 103, a cache with cache controller 113, first level cache 115, and L2 cache 117, a memory 105 with memory control, an LCU 109 (control processor 119 and LCU table 121), and a bus interface 111. The nodes 101 are coupled by the interconnect bus 124 to the GCU 123, which contains a bus interface 125, a bus monitor 127, and a GCU table 129.]

[Sheet 2 of 7, FIG. 2A: local coherence table with a single bit 201 for each of cache line 1 through cache line n. FIG. 2B: local coherence table with a two-bit entry 203 for each of cache line 1 through cache line n. FIG. 2C: global coherence table entries 205, indexed by virtual address (000000 ... 010100, 010101, ... 011110), with one bit per node (node 1, node 2, ..., node n) indicating which nodes share each virtual address.]

[Sheet 3 of 7, FIG. 3: flow chart of a read operation. Labels recovered: read MCD from local cache (307); read MCD from local memory and update LCU (311); send request for MCD over global interconnect (313); receive request in GCU (315); send request from GCU to node (319); respond by sending MCD to requesting node (321); step 317 and a YES branch also appear.]

[Sheet 4 of 7, FIG. 4: flow chart of an alternative read operation. Labels recovered: step 301; read MCD from local memory and update LCU (311); read MCD from local cache (307); send request for MCD over global interconnect (313); receive request in GCU (315); determine node containing MCD (401); send request to transfer MCD (403); transfer data (405).]

[Sheet 5 of 7, FIG. 5: flow chart of yet another read operation. Labels recovered: read MCD from local memory and update LCU (311); read MCD from local cache (307); send request for MCD over global interconnect (313); steps 315, 317, 501, and 503; a final step directing the GCU to transfer data to the requesting node (partial label "... GCU TO TRANSFER DATA TO NODE").]

[Sheet 6 of 7, FIG. 6: flow chart of a write operation. Labels recovered: update LCU; copy data from cache to local memory; copy data to non-local cache in node 2; copy data to each other such node; copy data to memory in node 2 (step numbers 607, 611, 617, 619, 621, 623).]

[Sheet 7 of 7, FIG. 7: flow chart of an alternative write operation. Labels recovered: write to local cache (701); update LCU (703); update GCU (705); update LCUs in shared nodes (707, 709).]

MULTIPROCESSOR CACHE COHERENCE DIRECTED BY COMBINED LOCAL AND GLOBAL TABLES

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to computer cache schemes, and more particularly to a method and apparatus for maintaining coherence between memories within a system having distributed shared memory.

2. Description of the Related Art

Due to the demand for greater processing power, and due to the desire to have relatively small processors work cooperatively as a multi-processing system, there have been many attempts over the last several years to solve the problems inherent in maintaining coherence between memory devices which are accessible to more than one processing device. The coherence problem is exemplified by a system in which a single logical interconnect channel is shared by a plurality of memory devices, each paired with an associated processor to form a "tightly-coupled" multiprocessor system. References from a processor may be to addresses that are within the memory associated with the requesting processor or within a memory associated with another of the processors within the system.

Each processor is also associated with a local cache. The cache is a relatively small high speed memory device which allows data to be accessed rapidly. The cache is typically relatively small because the cost of such high speed memory is relatively high. However, in systems in which the processors are very long instruction word processors (VLIWs), the caches are unusually large, increasing the likelihood that information stored in a cache associated with a first processor may also be stored in the cache associated with at least a second processor. It should be understood that there is a tradeoff between the size and the speed of the cache.

Consider a cache that is a two level cache in which the first level of the cache is smaller and faster than the second level of the cache. Therefore, the data which is most likely to be required is maintained in the first level of the cache, and data which is less likely to be required is maintained in the second level of the cache. If data is required which is not present in either the first or the second level of the cache, then the data must be retrieved from the slower memory device. A number of different algorithms have been devised for determining what data to maintain in each level of the cache (i.e., anticipating what data is most likely to be required soon). However, a detailed understanding of such algorithms is not necessary for the purposes of the present discussion.

Since each processor has an associated cache, and each cache may have more than one level, care must be taken to ensure the coherence of the data that is maintained throughout the system. That is, if more than one cache may contain a copy of the same interval of shared, writable memory, a reasonable approach to parallel programming requires some provision for ensuring that all copies reflect a coherent value. One way to ensure that coherence is maintained is to maintain a directory (or table) which points to each non-local cache in which the data resides. By knowing the location of each copy of the data, each copy can either be updated, or a notation can be made within the table to indicate that the data at one or more locations is out-of-date. Such tables require pointers to multiple nodes which are caching data. However, maintaining one table which points to each copy within each node increases the complexity and the width of directory entries within such tables, making the table relatively large and complex.

Accordingly, it is an object of the present invention to provide a system and method for maintaining coherence of data stored in multiple caches within a multiprocessor system by storing relatively small and simple entries in coherence tables.

SUMMARY OF THE INVENTION

The present invention is a method and apparatus for maintaining coherence between shared data stored within a plurality of memory devices, each memory device residing in a different node within a tightly coupled multiprocessor system. Each of the nodes is coupled via a logical interconnect (such as a bus). Each node further includes a "local coherence unit", and a processor associated with the memory device and forming a memory/processor pair therewith. In accordance with one embodiment of the present invention, a cache unit is associated with each memory/processor pair to aid in reading and writing to each memory device. Each processor is capable of sharing data with each other processor, and reading and writing to addresses within any of the memory devices within the system. Data is considered to be "shared" when that data is stored in more than one node. For example, shared data may be stored in a memory device within a first node and within a cache within a second node. In addition, shared data may be stored in a memory device within a first node and within a memory device within a second node. Reading and writing to memory devices is preferably accomplished through the cache. However, in an alternative embodiment, no such cache is provided.

Each local coherence unit maintains a table which indicates whether the most current copy of data stored within the node resides in the local memory (i.e., the memory within the node in which the local coherence table resides), in the local cache (i.e., a cache within the node in which the local coherence table resides), or in a non-local cache (i.e., a cache within a node other than the node in which the local coherence table resides).

In addition to the plurality of nodes, the multiprocessor system of the present invention includes a "global coherence" unit coupled to each node via the logical interconnect. The global coherence unit includes an interconnect monitoring device and a global coherence table. The interconnect monitoring device monitors communications between the nodes and detects transfers of data from one node to another. When data which resides within the memory of a first node is transferred to a second node, the interconnect monitoring device updates the global coherence table to indicate that the data is being shared. The global coherence table also preferably indicates that a copy of the most current data resides within the second node. Data which resides within the memory of a first node and within the cache of a second node is referred to generally as shared data, and more particularly as "non-locally cached data". In accordance with one embodiment of the present invention, when data is transferred from a first node to a second, the local coherence table within the first node is updated to reflect that the most current copy resides in a non-local cache. If an attempt is made to modify the data, the modification will always be directed to be performed on only one copy of the data (the most current copy), and that copy will be well identified.

In an alternative embodiment of the present invention, the local coherence table is only updated if data shared by the node in which the table resides is modified.
Accordingly, in accordance with such an embodiment, a processor which modifies shared data must send a message to each node with which the data is shared to indicate that the data stored therein is no longer current. The local coherence table within each such node is then updated to reflect the fact that the shared data is no longer current.

The use of local coherence tables and a global coherence table reduces the size and complexity of the coherence table that would otherwise be required. In addition, the present invention provides a means by which direct cache to cache transfers can be accomplished. Still further, by maintaining a local coherence table to indicate when shared data is current, use of the logical interconnect is reduced. That is, there is no need to inquire outside the node when the most current copy of requested data presently resides within the node.

The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a functional block diagram of a tightly coupled multiprocessor system in accordance with the present invention;

FIG. 2a is an illustration of the organization of a local coherence table in accordance with one embodiment of the present invention;

FIG. 2b is an illustration of the organization of a local coherence table in accordance with an alternative embodiment of the present invention;

FIG. 2c is an illustration of the organization of a global coherence table in accordance with another alternative embodiment of the present invention;

FIG. 3 is a flow chart of one embodiment of the inventive method for reading data;

FIG. 4 is a flow chart of an alternative embodiment of the inventive method for reading data;

FIG. 5 is a flow chart of yet another alternative embodiment of the inventive method for reading data;

FIG. 6 is a flow chart of one embodiment of the inventive method for writing data; and

FIG. 7 is a flow chart of an alternative embodiment of the inventive method for writing data.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is a method and apparatus that uses local coherence tables and a global coherence table to maintain coherence between data that is distributed throughout a tightly coupled computer system. The use of local coherence tables and a global coherence table reduces the size and complexity of the coherence table that would otherwise be required. In addition, the present invention provides a means by which direct cache to cache transfers can be accomplished. Still further, by maintaining a local coherence table to indicate when shared data is current, use of the logical interconnect is reduced. That is, there is no need to inquire outside the node when the most current copy of requested data presently resides within the node.

Functional Configuration of the Present Invention

FIG. 1 is a functional block diagram of a tightly coupled multiprocessor system 100 configured in accordance with the present invention.
In accordance with the present invention, a plurality of nodes 101 are provided, each of which preferably includes a processor 103, a memory 105, a cache 107, a local coherence unit (LCU) 109, and a bus interface 111. However, it should be understood that nodes in accordance with an alternative embodiment of the present invention may be provided without a cache. The processor 103 and memory 105 are collectively referred to as a memory/processor pair. The cache 107 is preferably a conventional cache including a cache controller 113, a first level cache memory 115, and a second level cache memory 117. However, it should be understood that the cache may be any system or device which increases the speed at which information can be read from and/or written to memory by storing information in a faster and smaller memory device. For example, the cache 107 may be a single level cache, or may have more than two levels. Furthermore, the cache may be implemented using any type of memory device which has a read and write cycle that is fast with respect to the memory 105. Still further, any size "cache line" may be used. A cache line is defined as the minimum amount of data that is transferred into or out of the cache in a single transfer operation.

The LCU 109 includes a control processor 119 and a Local Coherence Table (LCT) 121. The LCT 121 preferably has one entry associated with each cache line stored within the memory 105. Alternatively, the size of each portion of memory that is associated with an entry to the LCT 121 may be smaller or larger than a cache line. The address of the portion of memory is used as an index into the LCT 121. This index may be either the virtual address or the effective address of the particular portion of data. Virtual addresses are uniquely assigned to each unique individually addressable portion of data within the system. Effective addresses are the physical addresses within each local memory 105 in each node. Therefore, the same effective address may be used in several different nodes 101 throughout the system to access different portions of data. Nonetheless, each portion of memory within a particular node has a unique effective address. Effective addresses from different nodes can each be mapped to one virtual address so that the data can be shared by different nodes. For example, data stored at effective address "101101" in a first node and effective address "10001" within a second node can each be mapped to virtual address "100000111101". In this way, data can be shared by the two processors within the two nodes, and yet may be stored locally within each node. The coherence tables of the present invention provide the means by which such shared data remains coherent. Accordingly, it will be understood that each virtual address is mapped to a unique effective address within only one node 101 unless two or more nodes share the same data, in which case each effective address at which such shared data is stored will be mapped to the same virtual address.
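As an illustrative sketch only, and not part of the patent's disclosure, the following C fragment shows one plausible way a node might record the effective-to-virtual mapping just described; the structure and function names (vaddr_map_entry, effective_to_virtual) and the table sizes are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the effective-to-virtual mapping described above.
 * Each node keeps its own map; the same effective address may exist in
 * several nodes, but shared data maps to one system-wide virtual address.
 */
typedef struct {
    uint32_t effective; /* physical address within this node's memory 105 */
    uint64_t virt;      /* system-wide virtual address shared across nodes */
} vaddr_map_entry;

/* Example: the same virtual address backed by different effective addresses
 * in two nodes (cf. "101101" and "10001" both mapping to "100000111101").
 */
static const vaddr_map_entry node1_map[] = { { 0x2D /* 101101 */, 0x83D /* 100000111101 */ } };
static const vaddr_map_entry node2_map[] = { { 0x11 /* 10001  */, 0x83D } };

/* Look up the virtual address for an effective address within one node. */
static int effective_to_virtual(const vaddr_map_entry *map, size_t n,
                                uint32_t effective, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].effective == effective) { *out = map[i].virt; return 1; }
    }
    return 0; /* not mapped */
}

int main(void)
{
    uint64_t v1 = 0, v2 = 0;
    effective_to_virtual(node1_map, 1, 0x2D, &v1);
    effective_to_virtual(node2_map, 1, 0x11, &v2);
    printf("node 1 and node 2 share data: %s\n", v1 == v2 ? "yes" : "no");
    return 0;
}
```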
Each entry in the LCT 121 preferably also has one bit which indicates whether the contents of the memory 105 at the address associated with the entry are the most current copy of that data. The cache controller 113 maintains a record as to whether a copy of the contents of the memory 105 at each particular location is present within the cache 107. In accordance with the preferred embodiment of the present invention, the cache controller 113 is always consulted when data is requested by the processor 103 from memory 105. Therefore, if the LCT 121 indicates that the most current copy of the data stored at a particular address is locally available, the cache controller 113 will determine whether the most current copy of that data is present in the cache 107 or the memory 105.
In accordance with one embodiment of the present invention, the table maintained by the cache controller 113 and the LCT 121 may be combined. Accordingly, a single table may be maintained which indicates both whether the contents of a particular memory location are cached and whether the local copy stored in either the memory 105 or the cache 107 is the most current copy. Thus, the functions of the cache controller 113 and the control processor 119 may be combined, such that each entry in the LCT 121 indicates whether the most current copy of a particular cache line is available from the cache 107, the memory 105, or a source outside the node (i.e., either storage or a non-local cache, that is, a cache within another node).
The bus interface 111 provides an interface between the node 101, other nodes 101, and a Global Coherence Unit (GCU) 123 via a logical interconnect 124 (such as a bus). The GCU 123 includes a bus interface 125, an interconnect monitor 127, and a Global Coherence Table (GCT) 129. The interconnect monitor 127 passively monitors the logical interconnect 124 to detect attempts by a first node to read data from, or write data to, a second node. When the interconnect monitor 127 detects that data is being written or read between two nodes, the interconnect monitor 127 updates the GCT 129 to indicate that the data is being shared by the two nodes. In accordance with one embodiment of the present invention, the GCT 129 is indexed by the virtual address of the shared data. The GCT 129 may be configured as a database in which the key is the virtual address. When data is shared for the first time, a new entry to the GCT 129 is generated. Each entry indicates which nodes share the data (i.e., which nodes have an effective address that is mapped to the same virtual address). Likewise, when a node reads or writes to a virtual address for the first time, the node that reads, or the node to which data is written, is then added to the list of nodes that share that virtual address.
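The following C sketch illustrates, under assumptions not drawn from the patent (a fixed table size, a 32-bit sharer bitmask, and the invented names gct_entry and gct_record_transfer), how an interconnect monitor of this kind might add both nodes to a GCT entry keyed by virtual address when it observes a transfer.

```c
#include <stdint.h>

/* Hypothetical sketch of the GCT behavior described above: entries are
 * keyed by virtual address, and each entry records which nodes currently
 * share that address. Sizes, types, and names are illustrative assumptions.
 */
#define GCT_MAX_ENTRIES 1024

typedef struct {
    uint64_t vaddr;   /* virtual address of the shared cache line (key) */
    uint32_t sharers; /* bit i set means node i shares this virtual address */
    int      valid;
} gct_entry;

static gct_entry gct[GCT_MAX_ENTRIES];

/* Called by the interconnect monitor when it observes node src_node sending
 * the line at vaddr to node dst_node: record that both nodes now share it,
 * creating a new entry the first time the line is shared.
 */
void gct_record_transfer(uint64_t vaddr, unsigned src_node, unsigned dst_node)
{
    gct_entry *free_slot = NULL;

    for (int i = 0; i < GCT_MAX_ENTRIES; i++) {
        if (gct[i].valid && gct[i].vaddr == vaddr) {
            gct[i].sharers |= (1u << src_node) | (1u << dst_node);
            return;
        }
        if (!gct[i].valid && free_slot == NULL)
            free_slot = &gct[i];
    }
    if (free_slot != NULL) {          /* first time this line is shared */
        free_slot->vaddr   = vaddr;
        free_slot->sharers = (1u << src_node) | (1u << dst_node);
        free_slot->valid   = 1;
    }
}

int main(void)
{
    /* Example: the monitor sees node 1 send virtual address 0x83D to node 2. */
    gct_record_transfer(0x83D, 1, 2);
    return 0;
}
```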
FIG. 2a is an illustration of the organization of an LCT 121 in accordance with one embodiment of the present invention. The LCT 121 in FIG. 2a has a single bit 201 associated with each cache line of data within the memory 105. In accordance with one embodiment, the bit is set to indicate that the current locally stored data at the address associated with that bit is not the most current copy of that data (i.e., that the data was read by another node and may have been modified in that node). Addresses associated with a bit are referred to herein as "cache line x", where x is a number from 1 to n, n being the total number of cache lines in memory 105. Accordingly, a first cache line of data (cache line 1) stored locally within the node 101 is the most current copy of that data, as indicated by the fact that the bit 201 is not asserted. However, the second cache line of data stored locally (cache line 2) may not be the most current copy of that data, since the bit associated with that cache line is asserted. In an alternative embodiment of the present invention, the bit is only asserted when a copy of the data associated with that bit has been modified in another node. In accordance with such an embodiment of the invention, the node which modifies the cache line must report that the modification has occurred to each of the nodes that share that data. While the overhead required to maintain the LCT 121 is greater in such an embodiment, the cache operates more efficiently, since the local cache in a first node can still be used after a second node reads the cache line from the first node, as long as no other node has modified the information. In the embodiment in which the LCT 121 bit 201 is asserted whenever that associated cache line has been read, the overhead is lower, since it is relatively easy for a local control processor 119 to determine when data has been read
from the node in which that control processor 119 resides. In contrast, it is more complex for the control processor 119 to determine when another node has modified locally stored data.

The LCT 121 of FIG. 2a may be either directly or associatively mapped to the memory 105. For example, if the cache is organized in accordance with a "hashing" algorithm, then the LCT 121 may be organized in accordance with the same "hashing" algorithm used to determine the manner in which cache lines are organized within the cache. Hashing algorithms for determining the order in which cache lines are stored in caches are well known. Alternatively, the order of the entries may be either arbitrary, or may be the same as the order of the cache lines within the memory 105.
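A minimal sketch of such an index function, assuming an invented cache-line size and a power-of-two table size rather than anything specified in the patent, might look like this in C:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of indexing the LCT with the same simple hash a
 * direct-mapped cache might use: the low-order bits of the cache-line
 * address select the entry. Line size and table size are assumptions.
 */
#define CACHE_LINE_BYTES 128
#define LCT_ENTRIES      4096   /* must be a power of two for this hash */

static inline unsigned lct_index(uint64_t effective_addr)
{
    return (unsigned)((effective_addr / CACHE_LINE_BYTES) % LCT_ENTRIES);
}

int main(void)
{
    printf("LCT entry for effective address 0x101880: %u\n", lct_index(0x101880));
    return 0;
}
```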
In an alternative embodiment of the present invention shown in FIG. 2b, entries to the LCT 121 indicate whether the most current copy of a cache line is presently maintained in the memory 105, the cache 107, or non-locally (i.e., in another node). For example, two bits 203 may be used to indicate where the most current copy of the cache line resides. In accordance with one embodiment, "00" indicates that the most current copy of the cache line resides within the local cache 107, "01" indicates that the most current copy of the cache line resides within the local memory 105, and "10" indicates that the most current copy of the cache line resides within another node.
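The two entry formats described for FIG. 2a and FIG. 2b can be pictured with the C sketch below; the table sizes and names are illustrative assumptions, while the two-bit encodings mirror the "00"/"01"/"10" values given above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the two LCT entry formats described above. */

/* FIG. 2a style: one bit per cache line; a set bit means the locally
 * stored copy may no longer be the most current one.
 */
#define NUM_CACHE_LINES 4096
static uint8_t lct_stale[NUM_CACHE_LINES / 8];   /* packed bitmap */

static inline void lct_mark_stale(unsigned line) { lct_stale[line / 8] |= (uint8_t)(1u << (line % 8)); }
static inline int  lct_is_stale(unsigned line)   { return (lct_stale[line / 8] >> (line % 8)) & 1; }

/* FIG. 2b style: two bits per cache line encoding where the most current
 * copy resides ("00" local cache, "01" local memory, "10" another node).
 */
typedef enum {
    MCD_IN_LOCAL_CACHE  = 0x0,  /* "00" */
    MCD_IN_LOCAL_MEMORY = 0x1,  /* "01" */
    MCD_IN_OTHER_NODE   = 0x2   /* "10" */
} lct_state;

static uint8_t lct_state_table[NUM_CACHE_LINES]; /* one entry per cache line */

static inline void lct_set_state(unsigned line, lct_state s) { lct_state_table[line] = (uint8_t)s; }
static inline lct_state lct_get_state(unsigned line)         { return (lct_state)lct_state_table[line]; }

int main(void)
{
    lct_mark_stale(2);                     /* cache line 2 was read by another node */
    lct_set_state(2, MCD_IN_OTHER_NODE);
    printf("cache line 2: stale=%d state=%d\n", lct_is_stale(2), (int)lct_get_state(2));
    return 0;
}
```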
One embodiment of the organization of the contents of the GCT 129 in accordance with the present invention is shown in FIG. 2c. The virtual address of each cache line which is currently being shared by more than one node 101 is maintained in the GCT 129. Alternatively, the table may be organized such that an entry is to be provided for each cache line which may possibly be shared, regardless of whether that virtual address is currently being shared. Preferably, associated with each entry are a plurality of bits 205 which indicate which nodes are currently sharing the associated virtual address. For example, in the table shown in FIG. 2c, the virtual address "000000" is shared by nodes 1 and n. The virtual address "010100" is shared by nodes 1 and 2, etc. In an alternative embodiment of the present invention, a bit pattern or code may be uniquely assigned to each node. This bit pattern is then stored in an entry associated with a virtual address to indicate that the node to which the code or pattern is assigned is one of the nodes that share that virtual address.
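The bit-per-node organization of FIG. 2c can be illustrated with a short C sketch; the sample entry for virtual address "010100" (shared by nodes 1 and 2) follows the example in the text, while the node count and everything else are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of reading a FIG. 2c-style GCT entry: a bit vector
 * with one bit per node, a set bit meaning that node shares the virtual
 * address. Node numbering here starts at 1 for readability.
 */
#define NUM_NODES 4

/* Example entry for virtual address "010100", shared by nodes 1 and 2. */
static const uint32_t sharers_010100 = (1u << 1) | (1u << 2);

int main(void)
{
    printf("virtual address 010100 is shared by nodes:");
    for (unsigned node = 1; node <= NUM_NODES; node++) {
        if (sharers_010100 & (1u << node))
            printf(" %u", node);
    }
    printf("\n");
    return 0;
}
```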
Read Operations in Accordance With the Present Invention

FIG. 3 is a flow chart illustrating the method for reading data in a manner which ensures coherence in accordance with the present invention. Initially, the processor 103 within a first of the nodes 101 requests data from the cache controller 113 (STEP 301). As part of the request, the processor 103 provides the cache controller 113 with the virtual address of the data to be read. In an alternative embodiment of the present invention, the processor 103 provides the cache controller 113 with the effective address of the data to be read. In either case, the cache controller 113 determines whether the data to be read is present in a first level cache (L1) (STEP 303). Preferably, concurrent with STEP 303, the cache controller 113 provides the address to the control processor 119 to determine whether the most current copy of the data to be read resides locally (i.e., within the same node 101 as the requesting processor 103) (STEP 305). If the most current copy of the data is present within the local cache 107, then the data read request is serviced from the local cache (STEP 307) and the process ends.

If the most current copy of the data to be read is not in the local cache 107, then the cache controller 113 checks
whether the data to be read resides within the memory 105 (STEP 309). If the cache controller 113 indicates that the data to be read resides within the memory 105, and the control processor 119 reads the LCT 121 and determines that the most current copy of the data to be read resides locally, then the data read request is serviced by the cache controller 113 copying the data from the memory 105 into the cache 107. The cache controller 113 then makes the data available to the processor 103 (STEP 311) and the process ends.

If the cache controller 113 determines that the data to be read is not stored within the memory 105 (STEP 309), or the control processor 119 determines that the copy that is stored in memory 105 is not the most current copy of the data (STEP 309), then the cache controller 113 requests the data over the logical interconnect 124 (STEP 313). In forming the request, the address of the data being requested must be a virtual address. The request is received by the GCU 123 (STEP 315). The GCU 123 then determines which node has the most current copy of the data to be read. It may be possible for more than one node to have the most current copy of the data to be read. In either case, the GCU 123 determines which node is to respond to the request (STEP 317). In accordance with the embodiment illustrated in FIG. 3, a request to service the read operation is then sent from the GCU 123 to the node which the GCU 123 has determined should respond (STEP 319). The request is received by the bus interface 111 over the logical interconnect 124 and coupled to the cache controller 113. The cache controller 113 and the LCU 109 determine where the most current copy of the data to be read resides. The cache controller 113 then transmits the data to be read over the logical interconnect 124 (STEP 321). The requesting node 101 receives the data in the bus interface 111. The data is coupled to the cache controller 113, which makes the data available to the requesting processor 103. In accordance with the preferred embodiment of the present invention, the data is transferred from cache to cache by the cache controllers 113 within the requesting node 101 and the node 101 from which the data is to be read. If the data to be read is not within the cache 107, then the cache controller 113 retrieves the data from local memory 105 and transmits the data directly to the cache controller 113 in the requesting node 101.
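The read path of FIG. 3 and the preceding paragraphs can be summarized in a hedged C sketch; the helper functions are stubs with invented names standing in for the cache controller 113, the control processor 119 and LCT 121, and the GCU 123, and do not reflect any actual interface disclosed in the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of STEPs 301-321 of FIG. 3. The predicates below are
 * stubs returning fixed values so the sketch runs; in the described system
 * they would be answered by the cache controller, LCT, and GCU.
 */
static bool in_local_cache(uint64_t vaddr)          { (void)vaddr; return false; }
static bool most_current_copy_is_local(uint64_t v)  { (void)v;     return true;  }
static bool resides_in_local_memory(uint64_t v)     { (void)v;     return true;  }

static void read_from_local_cache(uint64_t v)
{
    printf("STEP 307: read 0x%llx from the local cache\n", (unsigned long long)v);
}

static void copy_memory_to_cache_and_read(uint64_t v)
{
    printf("STEP 311: copy 0x%llx from local memory into the cache, update LCU\n",
           (unsigned long long)v);
}

static void request_over_interconnect(uint64_t v)
{
    /* STEPs 313-321: the GCU receives the request, picks the node holding
     * the most current copy, and that node's cache controller returns the
     * data directly to the requesting node's cache controller.
     */
    printf("STEPs 313-321: request 0x%llx over the global interconnect via the GCU\n",
           (unsigned long long)v);
}

/* Processor 103 requests data at a (virtual) address: STEP 301. */
void read_data(uint64_t vaddr)
{
    bool local_is_current = most_current_copy_is_local(vaddr);      /* STEP 305 */

    if (in_local_cache(vaddr) && local_is_current) {                /* STEP 303 */
        read_from_local_cache(vaddr);                               /* STEP 307 */
    } else if (resides_in_local_memory(vaddr) && local_is_current) {/* STEP 309 */
        copy_memory_to_cache_and_read(vaddr);                       /* STEP 311 */
    } else {
        request_over_interconnect(vaddr);                           /* STEPs 313-321 */
    }
}

int main(void)
{
    read_data(0x83D);
    return 0;
}
```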
In an alternative embodiment of the present invention shown in FIG. 4, each of the steps 301-317 are performed as described above. However, once the GCU 12
