(12) United States Patent
Gaither

(10) Patent No.: US 6,662,277 B2
(45) Date of Patent: Dec. 9, 2003

(54) CACHE SYSTEM WITH GROUPS OF LINES AND WITH COHERENCY FOR BOTH SINGLE LINES AND GROUPS OF LINES

(75) Inventor: Blaine D. Gaither, Fort Collins, CO (US)

(73) Assignee: Hewlett-Packard Development Company, L.P., Houston, TX (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/919,309

(22) Filed: Jul. 31, 2001

(65) Prior Publication Data
US 2003/0028730 A1, Feb. 6, 2003

(51) Int. Cl.7 .......................... G06F 12/00
(52) U.S. Cl. .......................... 711/145; 711/141; 711/144; 711/146; 711/147; 711/148
(58) Field of Search .......................... 711/141, 145, 146, 147, 148

(56) References Cited

U.S. PATENT DOCUMENTS

5,666,514 A *  9/1997  Cheriton .......................... 711/144
5,787,475 A *  7/1998  Pawlowski ........................ 711/137
5,832,232 A * 11/1998  Danneels ......................... 709/235
6,035,376 A *  3/2000  James ............................ 711/145
6,049,845 A *  4/2000  Bauman et al. .................... 710/113
6,189,078 B1 * 2/2001  Bauman et al. .................... 711/156
6,266,744 B1 * 7/2001  Hughes et al. .................... 711/146

OTHER PUBLICATIONS

"Cache Coherence Protocol For A Multiple Bus Multiprocessor System"; 09/704,176; Blaine D. Gaither et al.; filed Oct. 31, 2000.

"On Effectiveness Of Sectored Caches In Reducing False Sharing Misses"; 1997 IEEE; Kuang-Chih Liu & Chung-Ta King; Department of Computer Science; National Tsing Hua University; pp. 352–359.

"Two Techniques For Improving Performance On Bus-based Multiprocessors"; 1995 IEEE; Craig Anderson and Jean-Loup Baer; Department of Computer Science and Engineering; University of Washington; pp. 264–275.

* Cited by examiner

Primary Examiner—Donald Sparks
Assistant Examiner—Ngoc Dinh
(74) Attorney, Agent, or Firm—Augustus W. Winfield

(57) ABSTRACT
In a computer system with caching, memory transactions can retrieve and store groups of lines. Coherency states are maintained for groups of lines, and for individual lines. A single coherency transaction, and a single address transaction, can then result in the transfer of multiple lines of data, reducing overall latency. Even though lines may be transferred as a group, the lines can subsequently be treated separately. This avoids many of the problems caused by long lines, such as increased cache-to-cache copy activity. In one alternative, when a cache memory requests a group of lines, and when the group of lines is partially owned by another cache memory, then the requesting cache receives fewer than all the lines in the requested group.

9 Claims, 5 Drawing Sheets
[Representative drawing: the flowchart of FIG. 3, reproduced on sheet 3 of the drawings.]
U.S. Patent    Dec. 9, 2003    Sheet 1 of 5    US 6,662,277 B2

FIG. 1

[Block diagram of an example computer system: nodes 100, 102, and 104 interconnected through a switch; each node contains processors (e.g., processor 108) with first-level caches (e.g., caches 110 and 116), a shared second-level cache, and shared memory (e.g., memories 112 and 114) with a directory; a node bus 120 and an inter-node link 118 are also shown.]
Sheet 2 of 5

FIG. 2

200: GROUP UNOWNED?
  YES → 202: GET COPIES OF UP TO ALL LINES IN GROUP, MARK RETRIEVED LINES AS SHARED
  NO → 204: REQUESTED LINE OWNED?
    NO → 206: GET COPIES OF UP TO ALL UNOWNED LINES IN GROUP, MARK RETRIEVED LINES AS SHARED
    YES → 208: LINES OWNED BY SAME OWNER?
      NO → 210: GET COPY AND OWNERSHIP OF LINE FROM OWNER, MARK LINE AS EXCLUSIVE OR SHARED
      YES → 212: GET COPIES AND OWNERSHIP OF UP TO ALL LINES IN GROUP FROM OWNER, MARK RETRIEVED LINES AS EXCLUSIVE OR SHARED
Sheet 3 of 5

FIG. 3

300: GROUP UNOWNED?
  YES → 302: GET UP TO ALL LINES IN GROUP, INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF RETRIEVED LINES, MARK REQUESTED LINE AS EXCLUSIVE, MARK OTHER RETRIEVED LINES AS SHARED
  NO → 304: REQUESTED LINE OWNED?
    NO → 306: GET UP TO ALL UNOWNED LINES IN GROUP, INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF RETRIEVED LINES, MARK REQUESTED LINE AS EXCLUSIVE, MARK OTHER RETRIEVED LINES AS SHARED
    YES → 308: LINES OWNED BY SAME OWNER?
      NO → 310: INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF REQUESTED LINE, MOST CURRENT COPY TO MEMORY, MOST CURRENT COPY TO REQUESTOR, MARK REQUESTED LINE AS EXCLUSIVE
      YES → 312: INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF GROUP, MOST CURRENT COPY TO MEMORY, MOST CURRENT COPY TO REQUESTOR, MARK GROUP AS EXCLUSIVE
Sheet 4 of 5

FIG. 4 (alternative for part of FIG. 3)

308: LINES OWNED BY SAME OWNER?
  NO → INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF REQUESTED LINE, MOST CURRENT COPY TO MEMORY, MOST CURRENT COPY TO REQUESTOR, MARK REQUESTED LINE AS EXCLUSIVE
  YES → 400: ARE ALL LINES MODIFIED?
    YES → 402: INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES) OF GROUP, MOST CURRENT COPY TO MEMORY, MOST CURRENT COPY TO REQUESTOR, MARK GROUP AS EXCLUSIVE
    402' (alternative, shown dashed): COPY ALL, MARK NON-REQUESTED LINES AS SHARED
    402'' (alternative, shown dashed): COPY ALL, MARK NON-REQUESTED MODIFIED LINES AS EXCLUSIVE, MARK NON-REQUESTED EXCLUSIVE LINES AS SHARED
Sheet 5 of 5

FIG. 5

500: GROUP IN GOTL?
  NO → INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES), ALLOCATE GROUP IN GOTL, RETRIEVE UP TO ENTIRE GROUP, MARK REQUESTED LINE AS OWNED, OTHERS AS SHARED
  YES → 502: ANY LINES IN GROUP OWNED?
    YES → REQUESTED LINE OWNED?
      NO → INVALIDATE ALL EXISTING COPIES (IN OTHER CACHES), RETRIEVE UP TO ALL UNOWNED LINES, MARK REQUESTED LINE AS OWNED, OTHERS AS SHARED
      YES → 510: LINES OWNED BY SAME OWNER?
        YES → RETRIEVE GROUP FROM OWNER, TRANSFER OWNERSHIP IN GOTL
        NO → RETRIEVE REQUESTED LINE FROM OWNER, TRANSFER OWNERSHIP IN GOTL
CACHE SYSTEM WITH GROUPS OF LINES AND WITH COHERENCY FOR BOTH SINGLE LINES AND GROUPS OF LINES

FIELD OF INVENTION

This invention relates generally to computer systems and more specifically to cache memory systems.

BACKGROUND OF THE INVENTION
Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy, proceeding to relatively slower, lower-cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.

Caches are commonly organized around an amount of memory called a line, or block, or page. The present patent document uses the term "line," but the invention is equally applicable to systems employing blocks or pages.

Many computer systems employ multiple processors, each of which may have multiple levels of caches. Some caches may be shared by multiple processors. All processors and caches may share a common main memory. A particular line may simultaneously exist in memory and in the cache hierarchies of multiple processors. All copies of a line in the caches must be identical, a property called coherency. The protocols for maintaining coherence for multiple processors are called cache coherence protocols.

A cache "owns" a line if the cache has permission to modify the line without issuing any further coherency transactions. There can be only one "owner" of a line. For any cache coherence protocol, the most current copy of a cache line must be retrieved from the current owner, if any, and a copy of the data must be delivered to the requestor. If the line is to be modified, ownership must be acquired by the requestor, and any shared copies must be invalidated.

There are three common approaches to determining the location of the owner of a line, with many variations and hybrids. In one approach, called a snooping protocol, or snoop-based protocol, the owner is unknown, and all caches must be interrogated (snooped) to determine the location of the most current copy of the requested line. All requests for access to a cache line, by any device in the system, are forwarded to all caches in the system. Eventually, the most current copy of a line is located and a copy is provided to the requestor. In a single-bus system, coherence (snooping) traffic, addresses, and often data all share a common bus.

In a second approach, called a directory-based protocol, memory is provided to maintain information about the state of every line in the memory system. For example, for every line in memory, a directory may include a bit for each cache hierarchy to indicate whether that cache hierarchy has a copy of the line, and a bit to indicate whether that cache hierarchy has ownership. For every request for access to a cache line, the directory must be consulted to determine the owner, and then the most current copy of the line is retrieved and delivered to the requestor. Typically, tags and status bits for a directory are stored in main memory, so that a request for state information cycles main memory and has the latency of main memory. In a multiple-bus system, directory traffic may be on a separate bus.
A third approach is a global coherency filter, which has a tag for every valid line in the cache system. A coherency filter is a snoop system with a second set of tags, stored centrally, for all caches in the system. A request for a cache line is forwarded to the central filter rather than to all the caches. The tags for a coherency filter are typically stored in a small high-speed memory. Some coherency filters may only track owned lines, and may not be inclusive of all shared lines in the system. In a multiple-bus system, coherency filter traffic may be on a separate bus.

For relatively small systems, with one bus or with only a few buses, snoop-based protocols provide the best performance. However, snoop-based systems with one bus increase bus traffic, and for large systems with one bus or with only a few buses, snoop traffic can limit overall performance. Directory-based systems increase the time required to retrieve a line (latency) relative to snooping on a single bus, but in a multiple-bus system a directory requires less coherency traffic on the system buses than snoop-based systems. For large multiple-bus systems, where bus traffic may be more important than latency, directory-based systems typically provide the best overall performance. Many computer systems use some sort of hybrid of snoop-based and directory-based protocols. For example, for a multiple-bus system, snoop-based protocols may be used for coherency on each local bus, and directory-based protocols may be used for coherency across buses.

If a processor requests a line, the overall time required to retrieve the line (overall latency) includes (1) the time required to acquire access rights using a cache coherency protocol, (2) the time required to process an address, and (3) the time required to retrieve and transfer the data. As discussed above, bus traffic for coherency requests can limit overall performance.

One way to decrease bus traffic for coherency requests is to increase the line size. For example, if contiguous lines are requested, each line requires a separate coherency request. If line size is doubled, twice as much data is read for each coherency request. In addition, a substantial part of overall latency is the time required to route a memory request to the various memory components and to get the data from those components. Larger lines provide more data for each request. However, as lines become even larger, much of the data transferred is not needed, and much of the cache space is filled with data that is not needed. This increases the bus traffic for data transfer and increases the cache miss rate, both of which negatively impact overall performance. In addition, some fraction of a line may be needed exclusively by more than one processor or node. This can cause excessive cache-to-cache copy activity as the two processors or nodes fight for ownership, and the resulting number of coherency requests may increase.

As an alternative, it is known to permit partial line (or partial block) invalidation. It is also known to prefetch extra sub-lines. For example, see C. K. Liu and T. C. King, "A Performance Study on Bounteous Transfer in Multiprocessor Sectored Caches," The Journal of Supercomputing, 11, 405–420 (1997). Liu and King describe a coherence protocol for invalidating sub-lines, and for prefetching of multiple sub-lines.

There is an ongoing need to reduce overall latency while maintaining coherency, particularly for large multiple-bus systems.
SUMMARY OF THE INVENTION

A computer system retrieves and stores groups of lines. Coherency states are maintained for groups of lines and for individual lines. Alternatively, the coherency state of a group of lines can be deduced from the coherency state of all of its sublines. A single coherency transaction, and a single address transaction, can then result in the transfer of multiple lines of data, reducing overall latency. Even though lines may be retrieved as a group, the lines can subsequently be treated separately. This avoids many of the problems caused by long lines, such as increased cache-to-cache copy activity. There may be multiple owners of lines within a group of lines. Special instructions may be implemented that request up to a group of lines. That is, depending on ownership, the instruction may result in only one line being transferred, or up to an entire group of lines being transferred. For multiple-bus systems, latency may be further reduced by preferably retrieving unowned lines from memory rather than from caches.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example computer system suitable for use with the invention.

FIG. 2 is a flow chart of an example method for maintaining coherence for a line request without a request for ownership.

FIG. 3 is a flow chart of an example method for maintaining coherence for a line request with a request for ownership.

FIG. 4 is a flow chart of an example alternative method for part of FIG. 3.

FIG. 5 is a flow chart of an example alternative to the methods of FIGS. 2–4.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT OF THE INVENTION

FIG. 1 illustrates an example computer system suitable for use with the invention. In FIG. 1, multiple nodes (100, 102, and 104) are interconnected through a switch 106 or other interconnect. Each node includes multiple processors (P). Each processor includes a first-level cache (C1). Each node includes a shared second-level cache (C2), and shared memory (M). Each shared memory includes a directory (D). The number of nodes, number of processors per node, number of caches, location of memory, and the use of directories are all for purposes of illustration only.

For a cache miss, assume that the time (latency) for a processor to retrieve a line from a first-level cache of another processor within its own node (for example, the latency for processor 108 to retrieve a line from cache 110) is T nanoseconds. The time for the processor to retrieve a line from the memory of its own node (for example, the latency for processor 108 to retrieve a line from memory 112) may be about 3T nanoseconds. The time for the processor to retrieve a line from the memory of a remote node (for example, the latency for processor 108 to retrieve a line from memory 114) may be about 6T nanoseconds. The time for the processor to retrieve a line from a first-level cache in a remote node (for example, the latency for the processor to retrieve a line from cache 116) may be about 9T nanoseconds. Within a typical single-bus system (for example, bus 120), a cache-to-cache transfer is typically faster than a transfer from memory. In a typical multiple-bus system, as in FIG. 1, a worst-case transfer from memory (6T nanoseconds) is faster than a worst-case cache-to-cache transfer (9T nanoseconds). Therefore, for a multiple-bus system, as compared to a single-bus system, a different strategy is needed for retrieving lines in case of a cache miss in order to optimize overall memory latency.
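The relative latencies above can be tabulated in a short sketch. The concrete value of T below is an assumption chosen only for illustration; the 1T/3T/6T/9T ratios come from the example of FIG. 1.

```python
# Example latencies from the discussion of FIG. 1, in units of the base
# cache-to-cache time T. The numeric value of T is a hypothetical choice.
T = 10  # nanoseconds, assumed for illustration

latency_ns = {
    "local cache-to-cache (108 from 110)": 1 * T,
    "local memory (108 from 112)": 3 * T,
    "remote memory (108 from 114)": 6 * T,
    "remote cache-to-cache (108 from 116)": 9 * T,
}

# A multiple-bus system inverts the single-bus rule of thumb: the worst-case
# memory transfer (6T) beats the worst-case cache-to-cache transfer (9T).
assert latency_ns["remote memory (108 from 114)"] < \
       latency_ns["remote cache-to-cache (108 from 116)"]
```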
In a typical computer system, one memory address corresponds to one line of memory. When an address is presented to a memory system, the memory system may retrieve the requested data, possibly from multiple interleaved sources, and then place the resulting data into a memory register. To increase parallelism, it is known for memory systems (for example, memory 112) to actually retrieve multiple lines and place multiple lines into a memory register. For example, in one example commercially available computer system, a line is 128 bytes, and when one line is requested, the memory system retrieves 512 bytes (4 lines) and places the lines in a register. The requesting processor only receives the one requested line. The remaining lines are available for reading with reduced latency, assuming that one of the other lines may be needed soon. Knowledge of this arrangement may be used to optimize compilers, which may then take advantage of the reduced latency for the other lines.

It is also known for caches to have an associated prefetch buffer, so that in case of a cache miss, the requested line and N successive lines are retrieved from memory. Typically, the transfer of each of the N lines is the result of a separate bus transaction. The requesting cache only receives the one requested line, and the remaining lines are available for reading with reduced latency, assuming that one of the other lines may be needed soon.

In contrast to the above, in a computer system in accordance with the invention, a group of lines may be retrieved from memory, or copied from a cache, and the entire group of lines may be cached by the requestor, with a single read instruction. In addition, cache coherency is maintained for groups of lines, in addition to individual lines. In particular, individual lines within a group of lines may have different owners. Several advantages result. First, consider the relative latency times discussed above. If processor 108 requests a line that is in memory 114, much of the latency involves locating the line. If it is likely that adjacent lines will be needed (spatial locality), a system in accordance with the invention provides multiple lines for very little incremental latency. Individual ownership of lines within groups of lines reduces cache-to-cache copy activity. In addition, special memory instructions permit transfer of a variable number of lines, depending on ownership, further reducing cache-to-cache copy activity. In addition, as will be discussed in more detail below, a system in accordance with the invention will typically transfer a group of lines from memory rather than from a cache, thereby reducing the average latency, reducing local bus traffic, and reducing link (for example, 118) traffic, for a multiple-bus system. For example, if a processor in node 100 requests a line that is in memory 114 in node 104, and is also cached in a cache in node 102, the system will find the entry in the directory for node 104 and retrieve the lines directly from memory 114, rather than cause additional bus traffic on node 102. Accordingly, a system using groups of lines in accordance with the invention reduces latency and takes advantage of spatial locality even beyond that of sectored caches and prefetch buffers, while avoiding excess cache-to-cache transfers.

Typical caches have a line size L, and the unit of coherence and unit of transfer is L. Sectored caches may have sub-lines (or sub-blocks) of size S, where S<L; the unit of coherence may be S, and the unit of transfer may be S or L. In a cache in accordance with the invention, the line size is L, the unit of transfer may be L or integer multiples of L, and the unit of coherence is L and integer multiples of L. Groups of lines may complement use of sub-lines. That is, sectored caches with sub-lines, and snooping, may be used within each node, and groups of lines, with a directory or coherence filter, may be used across multiple nodes.

If a memory system retrieves N lines and places them into a register, it may be convenient to use the same N lines as a group of lines. However, it is not necessary for a group of lines to be the same size as a memory register. In some of the following discussion, a group of lines is assumed to be four lines for purposes of illustration only. Given an M-bit address, and N lines in a group of lines, a group of lines is defined by the most significant M − log2(N) bits of the address. For example, assuming that a group of lines is four lines, and assuming for simplicity a 16-bit address, there is a four-line group of lines defined by the 14 most significant bits of the address in conjunction with all combinations of the two least significant bits of the address.
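The group-addressing arithmetic above can be sketched as follows. This is an illustrative model only, assuming N is a power of two; the function names are not from the patent.

```python
# With an M-bit line address and N lines per group (N a power of two), the
# group is identified by the most significant M - log2(N) address bits.

def group_id(address: int, lines_per_group: int = 4) -> int:
    """Drop the low log2(N) address bits that select a line within a group."""
    shift = lines_per_group.bit_length() - 1  # log2(N) for a power of two
    return address >> shift

def lines_in_group(address: int, lines_per_group: int = 4) -> list[int]:
    """All line addresses that share the group of `address`."""
    shift = lines_per_group.bit_length() - 1
    base = group_id(address, lines_per_group) << shift
    return [base + i for i in range(lines_per_group)]

# With a 16-bit address and four-line groups, the group is the top 14 bits:
assert group_id(0b1010101010101011) == 0b10101010101010
assert lines_in_group(0x1234) == [0x1234, 0x1235, 0x1236, 0x1237]
```

Any of the four addresses in a group maps to the same group identifier, which is what lets one coherency transaction cover the whole group.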
It is known for a cache to be organized into sets, with some of the address bits used as an index to determine which set is to be used for the address. Within each set, other address bits, called a tag, are used to determine which line within a set corresponds to the address. With groups of lines in accordance with the invention, the index may correspond to a group of lines instead of a single line. For example, given an I-bit index, the most significant I−2 bits of the index may be used as an index for a four-line group of lines. One tag may apply to the entire group of lines. Alternatively, each line preferably has a separate addressable entry, as determined by the tag for the line. Each line preferably has its own coherency state, and the overall group may optionally have a separate coherency state for convenience. The separate coherency state for the overall group is discussed in more detail later.
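The index/tag split described above can be sketched for a four-line group. The field widths here are assumptions chosen for illustration: the low two bits select a line within the group, the next I−2 bits index the group's set, and the remaining high bits form the tag.

```python
# Illustrative address decomposition for a group-of-lines cache. The widths
# below are hypothetical; only the "top I-2 index bits select a four-line
# group" relationship is taken from the text.
INDEX_BITS = 6   # I: assumed total index width
GROUP_BITS = 2   # log2(4) lines per group

def decompose(address: int) -> tuple[int, int, int]:
    """Split an address into (tag, group_index, line_in_group)."""
    line_in_group = address & ((1 << GROUP_BITS) - 1)
    group_index = (address >> GROUP_BITS) & ((1 << (INDEX_BITS - GROUP_BITS)) - 1)
    tag = address >> INDEX_BITS
    return tag, group_index, line_in_group

# The four lines of one group share a tag and a group index, and differ
# only in the two low bits:
base = 0b1011_0101_0110_0100
assert len({decompose(base + i)[:2] for i in range(4)}) == 1
assert [decompose(base + i)[2] for i in range(4)] == [0, 1, 2, 3]
```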
From the above discussion, given an address, N lines may be retrieved from memory, or from another cache, as a group, and N lines may be placed into a cache as a group. Once the group of lines is placed into the cache, the lines may be treated as a group, or as separate lines, as discussed below.
Cache coherence protocols commonly place each cached line into one of multiple states. One common approach uses three possible states for each line in a cache. Before any lines are placed into the cache, all entries are at a default state called "Invalid". When a previously uncached physical line is placed into the cache, the state of the entry in the cache is changed from Invalid to "Shared". If a line is modified in a cache, it may also be immediately modified in memory (called write-through). Alternatively, a cache may write a modified line to memory only when the modified line in the cache is invalidated or replaced (called write-back). For a write-back cache, when a line in the cache is modified or will be modified, the state of the entry in the cache is changed to "Modified". The three-state assignment just described is sometimes called an MSI protocol, referring to the first letter of each of the three states.

A common variation adds one additional state. In the variation, when a physical line is copied into the cache, if no copy of the line exists in any other cache, the line is placed in an "Exclusive" state. The word "Exclusive" means that exactly one cache hierarchy has a copy of the line. If a line is in an "Exclusive" state in a cache hierarchy for a first processor, and if a second processor requests the same line, the line will then be copied into two cache hierarchies, and the state of the entry in each cache is set to "Shared". The four-state assignment just described is sometimes called a MESI protocol, referring to the first letter of each of the four states. There are many other variations.
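The MESI states above, kept per line within a group, can be modeled in a small sketch. This is a toy illustration, not the patent's hardware; the names are invented for the example.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def is_owned(state: State) -> bool:
    # Per the discussion below, a line is owned if Modified or Exclusive.
    return state in (State.MODIFIED, State.EXCLUSIVE)

# A four-line group in which each line carries its own coherency state, as
# in the invention's per-line coherency within a group:
group_states = [State.EXCLUSIVE, State.SHARED, State.INVALID, State.SHARED]
assert [is_owned(s) for s in group_states] == [True, False, False, False]

# When a second cache hierarchy requests a line held Exclusive, the copies
# become Shared (the Exclusive-to-Shared transition described above):
def after_remote_read_of_exclusive(state: State) -> State:
    return State.SHARED if state == State.EXCLUSIVE else state

assert after_remote_read_of_exclusive(State.EXCLUSIVE) == State.SHARED
```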
A line in the Exclusive state may be modified at any time by its owner without further coherency operations. Once modified, the line is marked as Modified in the local cache. The Exclusive state allows a line to be owned (the only copy in the system) without being modified. This scheme reduces the number of extra coherence requests needed to modify a line.
In the following discussion, the MESI protocol is used to illustrate a first example embodiment of the invention. The invention is equally applicable to other cache coherency systems and methods. In addition, some further assumptions are made to facilitate discussion of a first example embodiment, as follows. A line is owned in the MESI protocol if it is in the Modified or Exclusive state. For purposes of illustration only, if a line has an owner (state Modified or Exclusive), and the line is requested by a requestor that may modify the line, the most current copy of the line will be provided to the requestor, and in addition will be written to memory. The previous owner will invalidate its copy of the line, and the new owner will set the initial state of the line to Exclusive. If a line is not owned (not cached, or cached with state Shared), the requestor will set the initial state of the line to Shared.
Additional assumptions may be useful for groups of lines. If a requested line is in a group of lines that has mixed ownership (that is, at least two lines in the group have different owners), then the requestor preferably should get ownership of only the one requested line, to avoid excessive cache-to-cache transfers. In contrast, if a requested line is in a group of lines that is entirely owned by a single owner, then it is likely that ownership of an entire data structure is changing, and the requestor preferably should get ownership of the entire group of lines. These rules reduce latency for groups of lines, by transferring a group of lines and ownership of the group if the lines were all owned by the same previous owner, while preventing increased cache-to-cache transfers by not combining lines that have a history of being owned by different owners at the same time.

Still another optional feature of a system in accordance with the invention is to give the requestor control over whether multiple lines should be transferred when only one line is requested. Specifically, new memory system commands may be implemented that optionally limit a request to a single line, or optionally permit multiple lines to be transferred.
Given the above assumptions and features, FIG. 2 illustrates an example method for maintaining coherency for groups of lines for a line request in which ownership of the requested line is not requested. FIGS. 3 and 4 illustrate an example method for maintaining coherency for a group of lines for a line request in which ownership of the requested line is requested. In FIG. 1, and in the following discussion of FIGS. 2–4, a directory is assumed for purposes of an example system, but coherency filters or other coherency methods are equally suitable.

In FIG. 2, a line is requested, but ownership of the line is not requested. That is, the requestor needs to read the data, and does not need to modify the data. Coherency states for the corresponding group of lines are checked in the directory. At step 200, if all lines in the corresponding group of lines are unowned (for MESI, no lines are in the Modified or Exclusive state in any cache), then at step 202 the requestor gets a copy of the requested line, and copies of up to all of the other lines in the corresponding group of lines. The requestor will mark its copy of retrieved lines as Shared (both locally and within the appropriate directory), and the requestor will mark its entry locations for any non-retrieved lines as Invalid. At step 204, if some lines in the corresponding group of lines are owned, and the requested line is not owned, then at step 206 the requestor gets a copy of the requested line, and copies of up to all of the other unowned lines in the corresponding group of lines. The requestor will mark its copy of retrieved lines as Shared (both locally and within the appropriate directory), and the requestor will mark its entry locations for any non-retrieved lines as Invalid.
`Note that at steps 202 and 206, the requested line, and
`perhaps other lines in a group of lines, are not owned.
`Therefore,
`the unowned lines may be retrieved from
`memory, and as discussed above, for a multiple-bus system
`the unowned lines at a remote node are preferably retrieved
`from memory. The system may use snooping within a node,
`with a resulting priority placed on retrieving lines from
`cache if available, and may use a directory or coherency
`filter across nodes, with a priority placed on retrieving lines
`from memory. As a result,
`latency per line is reduced
`because multiple lines are transferred with a single request,
`and for multiple-bus systems, latency is further reduced, and
`local bus traffic is reduced, because the lines are retrieved
`from memory instead of from caches.
At step 208, the requested line is owned, and if ownership
within the group of lines is mixed (that is, some lines in the
corresponding group of lines are owned by different
owners), then at step 210 the requestor gets just the
requested line from the owner. As discussed above, the
requestor preferably should get ownership of only the one
requested line to avoid excessive cache-to-cache transfers.
The requestor will mark its copy of the retrieved line as
Exclusive or Shared (both locally and within the appropriate
directory), and the requestor will mark its entry locations for
the non-retrieved lines as Invalid. At step 212, all lines in the
corresponding group of lines are owned by the same owner.
The requestor can get a copy of the requested line and copies
of up to all the other lines in the group from the owner. As
discussed above, it is likely that ownership of an entire data
structure is changing, and the requestor preferably should get
a copy of the entire group of lines. The requestor will mark
its copy of retrieved lines as Exclusive or Shared (both
locally and within the appropriate directory), and the
requestor will mark its entry locations for any non-retrieved
lines as Invalid.
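The mixed-ownership decision at steps 208 through 212 can be sketched as below. Again this is a hedged illustration under assumed names, not the patent's implementation: fetch_from_owner stands in for a cache-to-cache transfer, and the sketch marks retrieved copies Shared for simplicity (the text permits Exclusive or Shared).

```python
SHARED, INVALID = "S", "I"

def read_owned_line(cache, group, requested, owners, fetch_from_owner):
    """Return the list of lines copied from the owning cache(s)."""
    owner_set = {owners[l] for l in group if owners.get(l) is not None}
    if len(owner_set) == 1 and all(owners.get(l) is not None for l in group):
        lines = list(group)   # step 212: one owner holds the whole group
    else:
        lines = [requested]   # step 210: mixed ownership, take one line only
    for line in group:
        if line in lines:
            cache[line] = (fetch_from_owner(line), SHARED)
        else:
            cache[line] = (None, INVALID)
    return lines
```

The single-line path under mixed ownership reflects the stated design choice: copying more than the requested line would trigger excessive cache-to-cache transfers, whereas a single owner of the whole group suggests an entire data structure is migrating and the full group is worth taking.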
`
In FIG. 3, a line is requested, and ownership of the line is
requested. At step 300, if all lines in the corresponding group
of lines are unowned, then at step 302 the requestor can get
a copy of the requested line, and up to all of the other lines
in the corresponding group of lines. All existing copies of
the requested lines, other than the requestor's copy, will be
marked as Invalid. The requestor will mark its copy of the
requested line as Exclusive, and its copies of other retrieved
lines as Shared (both locally and within the appropriate
directory), and the requestor will mark its entry locations for
any non-retrieved lines as Invalid. At step 304, if some lines
in the corresponding group of lines are owned, and the
requested line is not owned, then at step 306 the requestor
gets a copy of the requested line, and copies of up to all of
the other unowned lines in the corresponding group of lines.
All existing copies of the requested lines, other than the
requestor's copy, will be marked as Invalid. The requestor
will mark its copy of the requested line as Exclusive (both
locally and within the appropriate directory), and the
requestor will mark its copies of any other retrieved lines as
Shared, and the requestor will mark its entry locations for
any non-retrieved lines as Invalid. As discussed above in
conjunction with FIG. 2, for a multiple-bus system,
unowned lines are preferably retrieved from memory.
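The ownership request of FIG. 3 (steps 300 through 306, where the requested line is unowned) can be sketched as follows. All identifiers are assumptions for illustration; sharers stands in for the directory's record of which other nodes hold copies of each line.

```python
SHARED, EXCLUSIVE, INVALID = "S", "E", "I"

def request_ownership(cache, group, requested, owners, sharers, memory):
    """Take ownership of an unowned line: invalidate all other copies of
    it, mark the requestor's copy Exclusive, mark other retrieved unowned
    lines Shared, and leave entries for owned lines Invalid."""
    assert owners.get(requested) is None  # steps 300/304: line is unowned
    sharers[requested] = set()            # invalidate every other copy
    for line in group:
        if owners.get(line) is None:
            state = EXCLUSIVE if line == requested else SHARED
            cache[line] = (memory[line], state)
        else:
            cache[line] = (None, INVALID)  # owned elsewhere: not retrieved
```

Only the requested line's other copies are invalidated; the other unowned lines in the group are brought in Shared, so the requestor gains exclusive write permission on exactly the line it asked for.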
