throbber
111111
`
`1111111111111111111111111111111111111111111111111111111111111
`US007698509Bl
`
`c12) United States Patent
`Koster et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,698,509 Bl
`Apr. 13, 2010
`
`(54) SNOOPING-BASED CACHE-COHERENCE
`FILTER FORA POINT-TO-POINT
`CONNECTED MULTIPROCESSING NODE
`
`(75)
`
`Inventors: Michael J. Koster, Freemont, CA (US);
`Christopher L. Johnson, San Francisco,
`CA (US); Brian W. O'Krafka, Austin,
`TX (US)
`
`(73) Assignee: Oracle America, Inc., Redwood Shores,
`CA (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 873 days.
`
`(21) Appl. No.: 10/889,952
`
`(22) Filed:
`
`Jul. 13, 2004
`
`(51)
`
`Int. Cl.
`G06F 12100
`(2006.01)
`(52) U.S. Cl. ....................... 7111146; 711/144; 711/145;
`711/128; 711/119; 711/E12.032; 711/E12.042
`(58) Field of Classification Search .................. 711/146
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`6,018,763 A *
`6,055,610 A *
`6,671,780 B1 *
`6,721,848 B2 *
`
`112000 Hughes eta!. .............. 709/213
`4/2000 Smith eta!. ................. 7111156
`12/2003 Lu et al ...................... 7111136
`4/2004 Gaither ....................... 7111122
`
`6,810,467 B1 *
`................ 7111146
`10/2004 Khare eta!.
`6,959,364 B2 *
`10/2005 Safraneket a!. ....... ...... 7111146
`2002/0087811 A1 *
`7/2002 Khare eta!.
`................ 7111146
`2002/0126704 A1 *
`.................. 370/503
`9/2002 Camet a!.
`2003/0009626 A1 *
`112003 Gruner eta!. ..... .. ... ..... 7111130
`2004/0068616 A1 *
`4/2004 Tierney eta!. .............. 7111141
`2004/02307 52 A1 *
`1112004 Blake eta!. ................. 7111147
`2005/0044195 A1 *
`............ .. ....... 709/223
`2/2005 Westfall
`2005/0228952 A1 *
`10/2005 Mayhew eta!. ............. 7111133
`FOREIGN PATENT DOCUMENTS
`889403 A2 * 111999
`EP
`* cited by examiner
`Primary Examiner-Reginald G Bragdon
`Assistant Examiner-Aracelis Ruiz
`(74) Attorney, Agent, or Firm-Osha • Liang LLP
`
`(57)
`
`ABSTRACT
`
`A multiprocessing node has a plurality of point-to-point con(cid:173)
`nected microprocessors. Each of the microprocessors is also
`point-to-point connected to a filter. In response to a local
`cache miss, a microprocessor issues a broadcast for the
`requested data to the filter. The filter, using memory that
`stores a copy of the tags of data stored in the local cache
`memories of each of the microprocessors, relays the broad(cid:173)
`cast to those/microprocessors having copies of the requested
`data. If the snoop filter memory indicates that none of the
`microprocessors have a copy of the requested data, the snoop
`filter may either (i) cancel the broadcast and issue a message
`back to the requesting microprocessor, or (ii) relay the broad(cid:173)
`cast to a connected multiprocessing node.
`
`15 Claims, 11 Drawing Sheets
`
`"up" adjacent
`node
`
`1~0
`
`"left'' adjacent
`node
`
`Microprocessor
`
`1~2
`
`1~6
`
`Microprocessor
`
`~
`
`~ 1~2
`
`,---..
`
`Microprocessor
`
`Snoop Filter
`I Shadow Tag I
`Memory
`1M
`
`i i
`
`1~4
`
`1 ~8 -
`
`Microprocessor
`
`" right" adjacent
`node
`
`"down" adJC!cent
`node
`
`Petition for Inter Partes Review of 
`U.S. Pat. No. 7,296,121
`IPR2015‐00158
`EXHIBIT
`Sony‐
`
`1
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 1 of 11
`
`US 7,698,509 Bl
`
`I'-(X)
`..--
`
`Q)
`N.C
`-~~
`(.)
`,.
`
`7
`
`""
`
`Q)
`
`..---£
`-l<O
`(.)
`
`IDGJ
`
`..--
`
`..__.
`
`~J
`...--
`
`>-
`c 0
`·n; E
`~Q)
`~
`,.
`
`.<
`
`7
`
`"'
`
`.._
`-
`Q)
`0
`~
`c
`0
`(.)
`Q)
`.c
`(.)
`<0
`(.)
`
`~J
`N
`
`.._
`0
`fJ)
`fJ)
`Q)
`(.)
`.._
`0
`0..
`.._
`0
`(.)
`
`~
`
`. /
`
`/ 0
`
`2
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 2 of 11
`
`US 7,698,509 Bl
`
`(0
`("()
`
`'-
`...........
`())
`c
`:::l
`()) a.
`u E
`0 u
`
`co
`("()
`
`'-
`....... _.
`())
`c
`:::l
`()) a.
`u E
`0 u
`
`0
`'V
`
`'-
`())
`..............
`c
`:J
`()) a.
`0 E
`0
`u
`
`N
`'V
`
`'-
`_. .......
`())
`c
`:J
`()) a.
`0 E
`0
`u
`
`/
`
`0
`("()
`
`N
`("()
`
`'-
`
`())
`
`~ '-
`CD en
`CC>
`0 c
`<i:: ·-
`'
`en
`-o!IJ
`c
`())
`<U 0
`.... o
`cno_
`=
`:::l
`:iE
`
`3
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 3 of 11
`
`US 7,698,509 Bl
`
`Q)
`0
`~ .....
`
`Q) -
`c --:::::J
`g.
`:::::J g
`:::::J a.
`c
`-
`
`-\
`v
`
`~J
`<0
`
`~._/
`
`gJ
`
`-\
`v
`
`>-.....
`0
`E
`Q)
`~
`c
`ro
`~
`
`._/
`
`<0
`._/
`1.0
`
`._/
`
`J
`
`.....
`0
`(/)
`(/)
`Q)
`0
`0
`.....
`a.
`0 .....
`0
`~
`
`.....
`0
`(/)
`(/)
`Q)
`0
`0 .....
`a.
`0 .....
`0
`~
`
`.....
`0
`(/)
`(/)
`Q)
`0
`0
`.....
`a.
`0 .....
`0
`~
`
`.....
`0
`(/)
`(/)
`Q)
`0
`0 .....
`a.
`0 .....
`0
`~
`
`Vt
`I\
`
`Vt
`l'v
`
`Vt
`l'v
`
`If
`l'v
`
`/
`
`0
`1.0
`
`4
`
`

`
`70
`
`84
`'
`
`Microprocessor
`
`Microprocessor
`
`~
`00
`•
`~
`~
`~
`
`~ = ~
`
`-
`72
`
`Cache
`Controller
`
`t \_..Cache-Me~ory
`
`Transaction
`
`t"'--i
`
`Local
`Cache
`
`---
`I./ Cache
`Controller
`
`1ooJi
`
`I
`
`80
`
`Local
`Cache
`
`~
`
`l
`
`82
`\
`
`Memory
`
`I
`
`FIG. 4
`
`>
`'e :-: ....
`0 ....
`
`~(H
`N
`
`0
`
`rFJ =-('D
`.....
`
`('D
`
`.j;o.
`
`0 .....
`....
`....
`
`d
`rJl
`-....l
`0..,
`\C
`
`00 u. = \C = """"'
`
`5
`
`

`
`94
`
`96
`
`Microprocessor
`
`Microprocessor
`
`98
`
`Cache I
`
`Controller ~ ~ Local
`Cache
`
`Cache
`
`I
`Controller H Local
`
`'100
`~
`
`Cache
`
`Network
`104
`
`)
`
`108
`
`90
`
`/
`
`Directory
`
`Memory
`
`""'-
`
`~
`
`~
`
`I
`
`Memory
`
`92
`
`l.l Dire:tory 1
`
`FIG. 5
`
`~
`00
`•
`~
`~
`~
`
`~ = ~
`
`>
`'e :-:
`......
`(,H
`
`~
`
`N
`0 ......
`0
`
`('D
`
`rFJ =-('D
`......
`Ul
`0 ......
`......
`......
`
`d
`rJl
`-....l
`0..,
`\C
`
`00 u. = \C = """"'
`
`6
`
`

`
`110
`
`1\2 •
`~ 4- Multiprocessing
`
`Node
`
`1\4 •
`
`Multiprocessing
`Node
`
`1\6 •
`
`Multiprocessing
`Node
`
`1\8 •
`
`Multiprocessing
`Node
`
`~
`
`~
`00
`•
`~
`~
`~
`
`~ = ~
`
`1 ~0
`
`1 ~2
`
`1~4
`
`1~6
`
`_. Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`~
`
`1~8
`
`1~0
`
`1~2
`
`1~4
`
`I
`I
`
`_. Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`..__
`
`116
`
`1~8
`
`1~0
`
`1~2
`
`~ Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`Multiprocessing
`Node
`
`..__
`
`t
`
`t
`
`FIG. 6
`
`t
`
`t
`
`(.H
`
`~
`
`N
`
`>
`'e :-: ....
`0 ....
`
`0
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`0\
`0 .....
`....
`....
`
`d
`rJl
`-....l
`0..,
`\C
`
`00 u. = \C = """"'
`
`7
`
`

`
`"up" adjacent
`node
`I'
`
`1 ~0
`
`~
`00
`•
`~
`~
`~
`
`~ = ~
`
`"left" adjacent
`node
`
`...
`
`Microprocessor
`
`I'
`
`1£2
`
`1~6
`
`Microprocessor
`
`- Microprocessor
`_____.
`
`I'
`
`1£4
`
`1~8
`
`Microprocessor
`
`.
`
`I
`
`'right" adjacent
`node
`
`~
`
`~ 1 ~2
`
`Snoop Filter
`
`Shadow Tag
`Memory
`164
`
`i i
`
`"down" adjacent
`node
`
`FIG. 7
`
`>
`'e :-: ....
`0 ....
`
`(.H
`
`~
`
`N
`
`0
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`-....l
`0 .....
`....
`....
`
`d
`rJl
`-....l
`0..,
`\C
`
`00 u. = \C = """"'
`
`8
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 8 of 11
`
`US 7,698,509 Bl
`
`'-
`0
`U)
`U)
`(J)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`J • I
`
`r-v;!
`
`~J
`
`~ ~
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`<CO
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`,
`
`•
`
`I
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`~-o---
`re-m---
`1---<( - ·
`
`-
`
`..
`...
`.....
`f'.-~ ~J
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`I • J
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`Cl <(
`'
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`• I
`
`I
`
`'-
`0
`U)
`U)
`(J)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`9
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 9 of 11
`
`US 7,698,509 Bl
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`•
`
`~
`
`~../
`
`T"""
`
`~
`
`r - - - - -
`I .---+
`I I
`"'
`I I
`-
`all
`I<{
`I I
`I I
`I I
`I I
`I I
`I I
`
`,, +: r
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`..
`
`-
`--
`"~ ~../
`
`T"""
`
`T"""
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`~~. : ~
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`~
`I
`<{I
`1m
`'-
`2
`I
`I
`- ~~ ~
`·-
`I
`I
`I
`__ __ J
`lL
`s: 0 ~~
`a. oEm ..-----~
`0
`'tJQ)T""
`<O::;E
`0
`..c:
`c:
`(/)
`(/)
`
`0)
`
`-
`
`T"""
`
`"~ ~./
`..
`
`T"""
`
`-
`
`r
`
`r
`
`'-
`0
`U)
`U)
`Q)
`0
`0
`'-
`a.
`0
`'-
`0
`~
`
`10
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 10 of 11
`
`US 7,698,509 Bl
`
`~
`
`0
`(/)
`(/)
`Q)
`u
`0
`~ c..
`0
`\... u
`~
`
`J
`
`0
`0
`N
`
`..
`
`..
`
`""'"
`'-0
`N
`
`~-/
`N
`
`0)
`
`~
`Q)
`
`.. ~~
`-·-u.
`3t 0 VI
`c.. oE..-
`ua>N
`0
`0
`~:2:
`c:
`en
`en
`
`~J
`N
`
`p - - -
`I r-~
`I I
`Cll.
`I~
`I I
`I I
`I I
`,, t I ,,
`
`N
`'-O
`N
`
`..
`
`~-/
`N
`..
`
`~
`
`0
`(/)
`rJ)
`Q)
`u
`0 .....
`c..
`0
`\... u
`~
`
`-
`-
`
`~
`
`0
`(/)
`(/)
`Q)
`0
`0
`\... c..
`0
`~ 0
`:2
`
`~
`
`.....
`0
`(/)
`rJ)
`Q)
`0
`0
`~ c..
`0
`~
`(.)
`~
`
`.h
`
`,
`
`.
`~
`Ll..:
`
`11
`
`

`
`U.S. Patent
`
`Apr. 13, 2010
`
`Sheet 11 of 11
`
`US 7,698,509 Bl
`
`Microprocessor
`
`..---------8----,
`-------A----, I
`I
`I
`t: 2F
`I I~
`
`202
`
`2~6
`
`Snoop Filter
`
`Microprocessor
`
`Microprocessor
`
`i : I i I
`
`I
`I
`I
`I
`
`I
`I
`
`I
`I
`I
`I
`I
`8 A
`I
`I
`I
`I
`I
`I
`
`+ I
`
`I
`I
`
`,
`
`~+
`
`:
`
`I
`
`I
`I
`
`200 '
`
`Microprocessor
`
`204
`
`~
`
`2~8
`
`Microprocessor
`
`.. Microprocessor
`
`..........
`220
`
`2£2
`
`2~6
`
`Snoop Filter
`
`......_232
`
`2£4
`
`2~8
`
`• i i
`
`------8----'
`Microprocessor ...
`I
`I
`..,. ___ -----A---~
`-
`
`~ Microprocessor
`
`FIG. 11
`
`12
`
`

`
`US 7,698,509 Bl
`
`1
`SNOOPING-BASED CACHE-COHERENCE
`FILTER FOR A POINT-TO-POINT
`CONNECTED MULTIPROCESSING NODE
`
`BACKGROUND OF INVENTION
`
`As shown in FIG.1, a typical computer system 10 includes
`at least a microprocessor 12 and a main memory 14. The main
`memory 14 contains data for use by the microprocessor 12 to
`perform the operations of the computer system 10. However,
`because the speed of the microprocessor 12 is typically sig(cid:173)
`nificantly faster than that of the main memory 14, memory of
`smaller size and faster speed (referred to and known as
`"cache" memory) is often implemented to allow the micro(cid:173)
`processor 12 to access frequently and/or recently requested
`data faster than it would otherwise take to obtain such data
`from the main memory 14.
`Still referring to FIG. 1, the microprocessor 12 has an
`"on-chip" (i.e., on the same semiconductor die as the micro(cid:173)
`processor 12), or "L1," cache memory 16 and an "off-chip,"
`or "L2," cache memory 18. When the microprocessor 12
`requests data, a cache controller 20 causes the L1 cache
`memory 16 to be searched for the requested data, and if that
`search does not "hit" (i.e., a cache "miss" occurs), the L2
`cache memory 18 is searched for the requested data. If the 25
`requested data is not found in the cache memories 16, 18, the
`requested data is retrieved from the relatively slow main
`memory 14.
`Those skilled in the art will recognize that a microproces(cid:173)
`sor may have any number of cache memory levels, which are 30
`typically referred to by number in order of decreasing prox(cid:173)
`imity to the microprocessor. Further, those skilled in the art
`will recognize that any number of cache memories may be
`on-chip and any number of cache memories may be off-chip.
`A computer system, like the one shown in FIG. 1, may be 35
`used as a system that services requests from and provides data
`to other computers connected over a network. Such a client(cid:173)
`server network model 30 is shown in FIG. 2. In FIG. 2, a
`stand-alone server 32 is connected over a network 34 to
`several client computers 36, 38, 40, 42. The server 32 may be
`used to store data, programs, etc. for use by the client com(cid:173)
`puters 36, 38, 40, 42. Those skilled in the art will recognize
`that the server 32 may also be used to manage and control the
`client computers 36, 38, 40, 42.
`Although some computer systems, like the one shown in
`FIG. 1, have a single microprocessor 12 (such a computer
`system referred to and known as a "uniprocessor" computer
`system), other computer systems, like the server 32 shown in
`FIG. 2, may be formed of multiple microprocessors. FIG. 3
`shows such a multiprocessing computer system 50.
`The computer system 50 of FIG. 3 is shown as having
`multiple microprocessors 52, 54, 56, 58. The microproces(cid:173)
`sors 52, 54, 56, 58 communicate with one another and with a
`main memory 60 over a network (e.g., a bus) 62. The network
`62 is implemented as a set of bits that propagate data in
`parallel from one location to another. The "bandwidth" of the
`network 62 (i.e., the number of bits propagated in parallel by
`the network 62) is an important factor in the overall perfor(cid:173)
`mance of the computer system 50. FIG. 3 also shows an
`input/output interface 64 that is connected to the network 62
`and serves to input and output data to other portions of the
`computer system 50 and/or components external to the com(cid:173)
`puter system 50.
`Those skilled in the art will recognize that the multipro(cid:173)
`cessing computer system 50 of FIG. 3 may represent a par(cid:173)
`ticular type of multiprocessing computer system used in net(cid:173)
`working and known and referred to as a symmetric
`
`2
`multiprocessing (SMP) computer system. A SMP computer
`system is one in which multiple microprocessors share, for
`example, the same memory and input/output interface. Those
`skilled in the art will also recognize that a SMP computer
`system may share the same operating system. Although the
`multiple microprocessors in a SMP computer system share
`the same resources, each microprocessor may act indepen(cid:173)
`dently. For example, while one microprocessor searches for
`data in memory, other microprocessors may update the
`10 memory and perform other tasks, thereby increasing the abil(cid:173)
`ity of the SMP computer system to handle intensive network(cid:173)
`ing demands.
`Those skilled in the art will recognize that SMP computer
`systems provide good scalability in that additional micropro-
`15 cessors may be added or removed with minimal changes to
`the system. Despite the benefits of SMP computer systems,
`bottlenecks may occur when several microprocessors on a
`board share a single memory bus. Rather than put too many
`microprocessors on the same SMP board, designers of net-
`20 work elements often distribute applications across a net(cid:173)
`worked cluster of SMP boards, where each board has its own
`memory, I/0 interface, and operating system.
`
`SUMMARY OF INVENTION
`
`According to one aspect of one or more embodiments of
`the present invention, a computer system comprises: a first
`integrated circuit having a local cache memory; a second
`integrated circuit connected to the first integrated circuit; and
`a filter connected by point-to-point interconnect to the first
`integrated circuit and the second integrated circuit, where, in
`response to a miss for requested data in the local cache
`memory, a broadcast for the requested data is propagated to
`the snoop filter, and where the snoop filter is configured to
`relay the broadcast to the second microprocessor dependent
`on whether the second integrated circuit has a copy of the
`requested data.
`According to another aspect of one or more embodiments
`of the present invention, a method of performing computer
`40 system operations comprises: requesting data from a cache
`memory of a first integrated circuit; if the requested data is not
`found in the cache memory, issuing a broadcast for the
`requested data to a filter point-to-point connected to the first
`integrated circuit; and if an address of the requested data is
`45 found in the filter, relaying the broadcast to a second inte(cid:173)
`grated circuit associated with the address.
`According to another aspect of one or more embodiments
`of the present invention, a computer network comprises: a
`cluster of individual SMP computer systems that are connect-
`50 able using point-to-point interconnect, at least one of the
`individual SMP computer systems having a filter, where, in
`response to a cache miss for requested data in an integrated
`circuit of the at least one of the individual SMP computer
`systems, the integrated circuit is configured to issue a broad-
`55 cast for the requested data to the filter, and where the filter is
`configured to relay the broadcast to another integrated circuit
`in the at least one of the individual SMP computer systems if
`the another integrated circuit has a copy of the requested data
`According to another aspect of one or more embodiments
`60 of the present invention, a computer system comprises a
`plurality of integrated circuits, a filter point-to-point con(cid:173)
`nected to the plurality of integrated circuits and having a
`memory that stores addresses of data stored in cache memo(cid:173)
`ries of the plurality of integrated circuits, and memory com-
`65 prising instructions to: requesting data from a cache memory
`of one of plurality of integrated circuits; if the requested data
`is not found in the cache memory, issuing a broadcast for the
`
`13
`
`

`
`US 7,698,509 Bl
`
`3
`requested data to the filter; and if an address of the requested
`data is found in the filter, relaying the broadcast to another one
`of the plurality of integrated circuits, the another one of the
`plurality of integrated circuits being associated with the
`address.
`Other aspects and advantages of the invention will be
`apparent from the following description and the appended
`claims.
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1 shows a typical computer system.
`FIG. 2 shows a typical computer network.
`FIG. 3 shows a typical multiprocessor computer system.
`FIG. 4 shows a snooping cache-coherent multiprocessor
`computer system.
`FIG. 5 shows a directory-based cache-coherent multipro(cid:173)
`cessor computer system.
`FIG. 6 shows a computer system in accordance with an
`embodiment of the present invention.
`FIG. 7 shows a portion of a computer system in accordance
`with an embodiment of the present invention.
`FIG. 8 shows a flow of messages in a portion of a typical
`computer system.
`FIG. 9 shows an exemplary flow of messages in a portion of
`a computer system in accordance with an embodiment of the
`present invention.
`FIG.10 shows an exemplary flow of messages in a portion
`of a computer system in accordance with an embodiment of
`the present invention.
`FIG. 11 shows an exemplary flow of messages in a portion
`of a computer system in accordance with an embodiment of
`the present invention.
`
`DETAILED DESCRIPTION
`
`In a SMP computer system, such as that shown in FIG. 3,
`each microprocessor has its own cache memory (see discus(cid:173)
`sion of cache memories above with reference to FIG. 1).
`Thus, because a particular cache memory data item acted
`upon by one microprocessor may cause the copy of that data
`item to differ from other copies of that data item stored in the
`cache memories of the various microprocessors in a SMP
`computer system, "cache-coherency" techniques are imple(cid:173)
`mented to ensure that the local cache memory of each micro(cid:173)
`processor is consistent with respect to values that are stored in
`the cache memories of other microprocessors in the SMP
`computer system.
`Cache-coherence problems arise in SMP computer sys(cid:173)
`tems when more than one microprocessor cache memory
`holds a copy of a data item. One type of cache-coherency
`technique known and referred to as a "snooping" relies on all
`cache memories to monitor a common network (e.g., a bus)
`that connects microprocessors to memory. In other words, a
`snooping-based cache-coherency technique depends on the
`ability of cache memories to observe every transaction on a
`network (e.g., a bus) common to the cache memories.
`Now referring to FIG. 4, when microprocessor 70 requests
`data, a cache controller 72 local to microprocessor 70
`searches for the requested data in a cache memory 7 4 local to
`microprocessor 70. If the requested data is not found in the
`local cache memory 74, the cache controller 72 broadcasts a
`data request on a network (e.g., a bus) 76 connected to other
`cache controllers (e.g., cache controller 78) (others not
`shown). The cache controllers (e.g., cache controller 78)
`"snoop" on the network 76 to monitor all transactions on the
`network 76. If a particular cache memory (e.g., cache
`
`4
`memory 80 associated with microprocessor 84) has the data
`requested by the requesting cache controller 72, the cache
`controller (e.g., cache controller 78) associated with the cache
`memory (e.g., cache memory 80) having the requested data
`forwards (i.e., returns) the requested data to the requesting
`cache controller 72, which, in turn, updates its associated
`cache memory 74 with the returned requested data and pro(cid:173)
`vides the returned requested data to requesting microproces(cid:173)
`sor 70. Alternatively, if the requested data is not held in any of
`10 the cache memories 74, 80, a copy of the requested data in the
`main memory 82 is returned to and used by the requesting
`microprocessor 70.
`Further, a cache controller, connected to the network 76
`that observes data being written from one cache memory to
`15 another may invalidate or update its own copy of that data.
`The next time the cache controller's microprocessor requests
`that data, the most recent value of the data is provided to the
`microprocessor, either because its local cache memory has
`the most recent value of the data or through obtaining that
`20 data by generating a data request on the network 76.
`Those skilled in the art will recognize that although a
`snooping-based cache-coherency technique obtains data rela(cid:173)
`tively quickly (i.e., has relatively low latency), such a tech(cid:173)
`nique consumes relatively high bandwidth due to the parallel
`25 broadcast nature of its requests. As a result, snooping-based
`cache-coherency techniques are typically limited to small(cid:173)
`scale systems.
`Now referring to FIG. 5, in another type of cache-coher(cid:173)
`ency technique known and referred to as "directory-based
`30 cache-coherence," when a cache miss occurs in a local cache
`memory (e.g., local cache memory 98 or 100) of a micropro(cid:173)
`cessor (e.g., microprocessor 94 or 96), a cache controller
`(e.g., cache controller 102 or 106) issues a data request over a
`network 104 to a "home" directory (e.g., directory 90 or 92)
`35 of the requested data, the "home" directory typically being
`associated with the "home" memory (e.g., memory 108 or
`109) of the requested data. The "home" directory may indi(cid:173)
`cate to the cache controller a location of the requested data.
`Alternatively, if the "home" directory indicates that no other
`40 cache memories connected to the network 104 have the
`requested data, the requested data may be returned by the
`"home" memory of the requested data.
`One advantage of directory-based cache-coherency tech(cid:173)
`niques with respect to snooping-based cache-coherency tech-
`45 niques is that they keep track of which microprocessor nodes
`have copies of particular data, thereby eliminating the need
`for a high-bandwidth data request broadcast. This is valuable
`on read misses because a data request is subsequently satis(cid:173)
`fied either by the directory indicating the location of a copy of
`50 the requested data or by accessing the main memory.
`Further, because directory-based cache-coherent tech(cid:173)
`niques may rely on low-bandwidth interconnect rather than
`on high-bandwidth networks (e.g., buses) that are necessary
`for broadcasting in snooping-based cache-coherency tech-
`55 niques, directory-based cache-coherent SMP computer sys(cid:173)
`tems may be scalable to a large number of microprocessors.
`However, the indirection overheads associated with directory
`queries make directory-based cache-coherency techniques
`slower (i.e., have higher latency) than snooping-based cache-
`60 coherency techniques (e.g., a directory-based cache-coher(cid:173)
`ence technique may often require three times the number of
`"hops" otherwise taken in a snooping-based cache-coherence
`technique).
`For example, in a snooping-based cache-coherency tech-
`65 nique, upon a cache miss, one set of parallel messages is
`broadcast over a bus and one response message with the
`requested data is sent back to the requesting processing node.
`
`14
`
`

`
`US 7,698,509 Bl
`
`5
`On the other hand, in a directory-based cache-coherent tech(cid:173)
`nique, upon a cache miss, a data request message is sent to the
`home processing node, the home processing node forwards
`the data request message to the owning cache memory, and
`the owning cache memory returns the requested data to the
`requesting processing node. Thus, generally, in snooping(cid:173)
`based cache-coherency techniques, there are more messages
`in parallel (relatively low average latency), while in directory(cid:173)
`based cache-coherency techniques, there are more messages
`in series (relatively high average latency).
`Often, several small SMP servers (e.g., a near-commodity
`modular shelf server) are connected together to provide
`increased processing capabilities. Due to the limited band(cid:173)
`width of the cables connecting the servers, directory-based
`cache-coherency techniques are required to ensure cache- 15
`coherence among the servers. However, as discussed above,
`directory-based cache-coherency techniques have relatively
`high average latency compared to snooping-based cache(cid:173)
`coherency techniques.
`Embodiments of the present invention relate to a technique 20
`for implementing a snooping-based cache-coherence filter
`device (also referred to herein as "snoop filter") in a point-to(cid:173)
`point connected multiprocessing node. FIG. 6 shows an
`exemplary computer system 110 in accordance with an
`embodiment of the present invention. In FIG. 6, a plurality of 25
`multiprocessing nodes 112, 114, 116, 118, 120, 122, 124,
`126, 128, 130, 132, 134, 136, 138, 140, 142 are point-to-point
`connected using high -bandwidth interconnect (shown but not
`labeled). Particularly, each multiprocessing node 112, 114,
`116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 30
`140, 142 is connected to an adjacent multiprocessing node (in
`FIG. 6, each peripheral multiprocessing node is shown as
`being connected to the opposite peripheral multiprocessing
`node, e.g., multiprocessing node 112 is connected to multi(cid:173)
`processing node 118). In one or more other embodiments of 35
`the present invention, a multiprocessing node may be con(cid:173)
`nected to a non-adjacent multiprocessing node. Further, a
`multiprocessing node is not limited to a server and may be any
`type of computer system.
`In one or more embodiments of the present invention, 40
`high-bandwidth interconnect for point-to-point connected
`multiprocessing nodes may be implemented using intercon(cid:173)
`nect technologies such as, for example, Infiniband or PCI
`Express. In one or more other embodiments of the present
`invention, high-bandwidth interconnect used to point-to- 45
`point connect multiprocessing nodes may have a bandwidth
`greater than that of 16-bit 1 GHz interconnect.
`Further, in one or more embodiments of the present inven(cid:173)
`tion, point-to-point interconnect may be used in cabling a
`plurality of multiprocessing nodes (e.g., near-commodity 50
`shelf servers) together. Moreover, in one or more embodi(cid:173)
`ments of the present invention, point-to-point interconnect
`may be used to connect a plurality of multiprocessing nodes
`to a passive backplane.
`FIG. 7 shows an exemplary multiprocessing node 150 in 55
`accordance with an embodiment of the present invention. In
`FIG. 7, the multiprocessing node 150 has four microproces(cid:173)
`sors 152, 154, 156, 158. However, in one or more other
`embodiments of the present invention, a multiprocessing
`node may have any number of microprocessors. In FIG. 7, 60
`each of the microprocessors 152, 154, 156, 158 is connected
`to a snoop filter 162 via high-bandwidth point-to-point inter(cid:173)
`connect. The snoop filter 162 may be connected over high(cid:173)
`bandwidth point-to-point interconnect to one or more snoop
`filters (not shown) of connected multiprocessing nodes (not 65
`shown). Further, each microprocessor 152, 154, 156, 158 may
`be connected, using point-to-point interconnect, to every
`
`6
`other microprocessor 152, 154, 156, 158 in the multiprocess(cid:173)
`ing node 150 for, for example, chip-to-chip communication.
`Further, although the snoop filter 162 in FIG. 7 is shown as
`being connected to adjacent nodes, in one or more other
`embodiments of the present invention, a snoop filter may be
`connected to one or more non-adjacent nodes.
`The snoop filter 162 observes snooping-based cache-co(cid:173)
`herence broadcasts for requested data and the responses
`thereto. At least partly in order to determine whether to for-
`10 ward or cancel snooping-based cache-coherence broadcasts,
`the snoop filter 162 has local state memory (referred to and
`shown in FIG. 7 as "shadow tag memory") 164 that stores the
`tags of data stored in the local cache memories (e.g., "L2"
`cache memories) (not shown) of each of the microprocessors
`152, 154, 156, 158. In other words, the shadow tag memory
`164 has a copy of the tags of data stored in the local cache
`memories of the microprocessors 152, 154, 156, 158.
`By having the shadow tag memory 164, the snoop filter 162
`forwards a received broadcast for requested data (by one of
`the microprocessors 152, 154, 156, 158 or from another mul(cid:173)
`tiprocessing node (not shown)) to a particular one of the
`microprocessors 152, 154, 156, 158 only if its shadow tag
`memory 164 indicates that the particular microprocessor has
`a copy of the requested data. Otherwise, if the snoop filter 162
`determines that none of the microprocessors 152, 154, 156,
`158 has a copy of the requested data, the snoop filter 162 is
`configured to cancel any subsequent relays of the broadcast to
`the microprocessors 152, 154, 156, 158, and instead, sends a
`message back to the requesting microprocessor (or multipro(cid:173)
`cessing node (not shown)) indicating that none of the other
`microprocessors (or none of the microprocessors) in the mul-
`tiprocessing node 150 has a copy of the requested data.
`In one or more embodiments of the present invention, a
`shadow tag memory may be optimistically maintained as a
`set -associative cache. Further, in one or more embodiments of
`the present invention, the set-associative cache may use a
`MOESI (Modified Owner Exclusive Shared Invalid) cache(cid:173)
`coherency protocol.
`At least partly to illustrate the difference between an imple(cid:173)
`mentation of a multiprocessing node that uses a snoop filter in
`accordance with one or more embodiments of the present
`invention and one that does not, FIG. 8 shows an exemplary
`flow of messages in a multiprocessing node 170 (having
`microprocessors 172, 174, 176, 178) that does not have a
`snoop filter. In FIG. 8, microprocessor 172 ("the requesting
`microprocessor") issues a broadcast for requested data. The
`broadcast is initially forwarded as request A to microproces(cid:173)
`sors 174, 176 and is then relayed to microprocessor 178 from
`microprocessor 176. In response to request A, microproces(cid:173)
`sors 174, 176, 178 send their responses B, C, D, respectively,
`back to the requesting microprocessor 172.
`Those skilled in the art will recognize that because in the
`case shown in FIG. 8, if only microprocessor 174 has a copy
`of the requested data, the bandwidth of the multiprocessing
`node 170 used for responses A and Dis unnecessarily con(cid:173)
`sumed.
`However, as discussed above with reference to FIG. 7, by
`using a snoop filter in accordance with one or more embodi(cid:173)
`ments of the present invention, requests for data are sent only
`to those microprocessors having copies of the requested data.
`For example, FIG. 9 shows an exemplary flow of messages in
`a multiprocessing node 180 in accordance with an embodi(cid:173)
`ment of the present invention.
`The multiprocessing node 180 is shown as having a snoop
`filter 192 that is connected via high-bandwidth interconnect
`(shown, but not labeled) to microprocessors 182, 184, 186,
`188. In FIG. 9, microprocessor 182 ("the requesting micro-
`
`15
`
`

`
`US 7,698,509 Bl
`
`7
`processor") issues a broadcast A for requested data. Due to
`the presence of the snoop filter 192, broadcast A is first routed
`to the snoop filter 192. Referencing its shadow tag memory
`194, the snoop filter 192 determines whether any of micro(cid:173)
`processors 184, 186, 188 have a copy of the data requested in
`broadcast A. In the exemplary case shown in FIG. 9, the snoop
`filter 192, referencing its shadow tag memory 194, deter(cid:173)
`mines that microprocessor 188 has a copy of the data
`requested in broadcast A, and thus, the snoop filter 192 for(cid:173)
`wards broadcast A to microprocessor 188. In response to
`broadcast A being forwarded to microprocessor 188, micro(cid:173)
`processor 188 issues a response B (having a copy of the
`requested data) that is forwarded back to the requesting
`microprocessor 182 through the snoop filter 192.
`By forwarding response B through the snoop filter 192, the
`snoop filter 192 is able to update its shadow tag memory 194.

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket