throbber
111111
`
`1111111111111111111111111111111111111111111111111111111111111
`US007003633B2
`
`(12) United States Patent
`Glasco
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,003,633 B2
`*Feb.21,2006
`
`(54) METHODS AND APPARATUS FOR
`MANAGING PROBE REQUESTS
`
`(75)
`
`Inventor: David B. Glasco, Austin, TX (US)
`
`(73) Assignee: Newisys, Inc., Austin, TX (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 205 days.
`
`This patent is subject to a terminal dis(cid:173)
`claimer.
`
`6,085,295 A
`6,108,737 A
`6,122,715 A
`6,148,378 A
`6,167,492 A
`6,173,393 B1
`6,189,078 B1
`6,192,451 B1
`6,205,520 B1
`6,209,055 B1
`6,292,705 B1
`6,292,906 B1
`6,330,643 B1
`6,334,172 B1
`
`7/2000 Ekanadham et a!.
`8/2000 Sharma eta!.
`9/2000 Palanca et a!.
`11/2000 Bordaz eta!. .............. 711!147
`12/2000 Keller et a!. ................ 711!154
`1!2001 Palanca et a!.
`2/2001 Bauman et a!. ............. 711!156
`2/2001 Arimilli eta!. ............. 711!141
`3/2001 Palanca et a!.
`3/2001 Van Doren et a!.
`9/2001 Wang eta!.
`9/2001 Fu eta!.
`12/2001 Arimilli et a!. ............. 711!141
`12/2001 Arimilli et a!. ............. 711!144
`
`(21) Appl. No.: 10/288,347
`
`(22) Filed:
`
`Nov. 4, 2002
`
`(65)
`
`Prior Publication Data
`
`US 2004/0088492 Al May 6, 2004
`
`(Continued)
`
`FOREIGN PATENT DOCUMENTS
`
`wo
`
`W00239242
`
`5!2002
`
`OTHER PUBLICATIONS
`
`(51)
`
`Int. Cl.
`(2006.01)
`G06F 12/00
`(52) U.S. Cl. ....................... 711/146; 711!141; 709/216;
`709/218
`(58) Field of Classification Search ................ 711/141,
`711/146,144, 145; 709/206,213,216,217,
`709/218, 219
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`Multicast snooping: a new coherence method using a mul(cid:173)
`ticast address network Bilir, E.E.; Dickson, R.M.; Ying Hu;
`Plakal, M.; Sarin, D.J.; Hill, M.D.; Wood, D.A.; Computer
`Architecture, 1999. Proceedings of the 26th International
`Symposium on, May 2-4, 1999.*
`
`(Continued)
`
`Primary Examiner-Brian R Peugh
`(74) Attorney, Agent, or Firm-Beyer Weaver & Thomas,
`LLP
`
`U.S. PATENT DOCUMENTS
`
`(57)
`
`ABSTRACT
`
`5,195,089 A
`5,394,555 A
`5,524,212 A *
`5,692,123 A
`5,751,995 A *
`5,829,032 A
`5,893,151 A *
`6,018,791 A
`6,038,652 A
`6,052,769 A
`6,067,603 A
`6,073,210 A
`
`3/1993 Sindhu
`2/1995 Hunter et a!. ............... 711!148
`6/1996 Somani et a!. ...... ... ..... 711!121
`11/1997 Logghe
`5/1998 Sarangdhar ................. 711!145
`10/1998 Komuro eta!. .... .. ... .... 711!141
`4/1999 Merchant .................... 711!140
`1!2000 Arimilli et a!. ............. 711!141
`3/2000 Van Ruben et a!.
`.......... 712/21
`4/2000 Huff et a!.
`5!2000 Carpenter et a!.
`6/2000 Palanca et a!.
`
`According to the present invention, methods and apparatus
`are provided for increasing the efficiency of data access in a
`multiple processor, multiple cluster system. Mechanisms for
`reducing the number of transactions in a multiple cluster
`system are provided. In one example, probe filter informa(cid:173)
`tion is used to limit the number of probe requests transmitted
`to request and remote clusters.
`
`39 Claims, 17 Drawing Sheets
`
`r::.~
`~;~~~
`oro'T""t
`""'~"""~I
`1105 "-L foJ"wardROQUe£!To J
`
`~WilhA
`Mom Uno
`
`MomoryCorllroHer
`- - ·· - -
`
`Petition for Inter Partes Review of 
`U.S. Pat. No. 7,296,121
`IPR2015‐00158
`EXHIBIT
`Sony‐
`
`1
`
`

`

`US 7,003,633 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`Specifying and verifying a broadcast and a multicast snoop(cid:173)
`ing cache coherence protocol Sarin, D.J.; Plakal, M.;
`Condon, A.E.; Hill, M.D.; Martin, M.M.K.; Wood, D.A.;
`Parallel and Distributed Systems, IEEE Transactions on ,
`vol.: 13 , Issue: 6 , Jun. 2002.*
`Kim, et al., "Power-aware Partitioned Cache Architectures",
`© 2001 ACM, p. 6467. *
`Powell, et al., "Reducing Set-Associative Cache Energy via
`Way-Prediction and Selective Direct-Mapping", © 2001
`IEEE, p. 54-65.*
`HyperTransport TM I/0 Link Specification Revision 1.03,
`HyperTransport TM Consortium, Oct. 10, 2001, Copyright©
`HyperTransport Technology Consortium.
`PCT Search Report PCT/US03/34756, Int'l filing date Oct.
`30, 2003, Search report Mailed Dec. 16, 2004.
`U.S. Office Action mailed Sep. 22, 2004, from related U.S.
`Appl. No. 10/106,426.
`U.S. Office Action mailed Mar. 7, 2005, from related U.S.
`Appl. No. 10/106,426.
`U.S. Office Action mailed Jul. 21, 2005, from related U.S.
`Appl. No. 10/106,426.
`U.S. Office Action mailed Sep. 23, 2004, from related U.S.
`Appl. No. 10/106,430.
`U.S. Office Action mailed Mar. 10, 2005, from related U.S.
`Appl. No. 10/106,430.
`U.S. Office Action mailed Jul. 21, 2005, from related U.S.
`Appl. No. 10/106,430.
`U.S. Office Action mailed Sep. 22, 2004, from related U.S.
`Appl. No. 10/106,299.
`U.S. Office Action mailed Mar. 10, 2005, from related U.S.
`Appl. No. 10/106,299.
`U.S. Office Action mailed Jul. 21, 2005, from related U.S.
`Appl. No. 10/106,299.
`D. E. Culler, J. P. Singh, A Gupta, "Parallel Computer
`Architecture", 1999 Morgan Kaufmann, San Francisco, CA
`USA XP002277658.
`Andrew Tanenbaum, "Computer Networks", Computer
`Networks, London: Prentice Hall International, GB, 1996,
`pp. 345-403, XP002155220.
`U.S. Office Action mailed Jul. 20, 2005, from related Ap(cid:173)
`plication No. 10/608,846.
`U.S. Office Action mailed Sep. 9, 2005, from related Ap(cid:173)
`plication No. 10/462,015.
`U.S. Office Action mailed Sep. 9, 2005, from related Ap(cid:173)
`plication No. 10/426,084.
`U.S. Office Action mailed Nov. 2, 2005, from related Ap(cid:173)
`plication No. 10/106/430.
`U.S. Office Action mailed Oct. 5, 2005, from related Ap(cid:173)
`plication No. 10/635,703.
`
`* cited by examiner
`
`6,338,122 B1
`6,343,347 B1
`6,385,705 B1
`6,405,289 B1
`6,463,529 B1
`6,467,007 B1
`6,490,661 B1
`6,542,926 B1
`6,615,319 B1
`6,631,447 B1
`6,633,945 B1
`6,633,960 B1
`6,636,906 B1 *
`6,640,287 B1
`6,658,526 B1
`6,665,767 B1
`6,704,842 B1
`6,738,870 B1
`6,738,871 B1
`6,751,698 B1
`6,751,721 B1 *
`6,754,782 B1
`6,760,809 B1
`6,760,819 B1
`6,799,252 B1 *
`6,865,595 B1
`6,892,282 B1
`2001!0013089 A1
`2001!0037435 A1
`2002/0007463 A1
`2002/0046327 A1
`2002/0052914 A1
`2002/0083149 A1 *
`2002/0083243 A1
`2002/0087807 A1 *
`2002/0087811 A1
`2003/0009623 A1
`2003/0182508 A1
`2003/0182509 A1
`2003/0182514 A1
`2003/0195939 A1
`2003/0196047 A1
`2003/0210655 A1
`2003/0212741 A1
`2003/0233388 A1
`2004/0073755 A1
`2004/0088493 A1
`2004/0088494 A1
`2004/0255002 A1
`
`711!150
`... ... .. 711!150
`
`............. 711!119
`
`1!2002 Baumgartner et a!.
`1!2002 Arimilli et a!.
`5!2002 Keller et a!. ................ 711!154
`6/2002 Arimilli et a!.
`............. 711!145
`10/2002 Miller et a!.
`10/2002 Armstrong et a!.
`12/2002 Keller et a!. ................ 711!150
`........... 709/213
`4/2003 Zalewski et a!.
`9/2003 Khare et a!.
`10/2003 Morioka et a!.
`10/2003 Fu et a!. ..................... 710/316
`10/2003 Kessler et a!.
`10/2003 Sharma eta!. ................ 710/22
`10/2003 Gharachorloo et a!.
`12/2003 Nguyen et a!.
`12/2003 Comisky et a!.
`3/2004 Janakiraman eta!.
`5!2004 Van Ruben et a!.
`5!2004 Van Ruben et a!.
`6/2004 Deneroff et a!.
`6/2004 Webb et a!.
`.................. 712/10
`6/2004 Arimilli et a!.
`7/2004 Arimilli et a!.
`7/2004 Dhong et a!.
`9/2004 Bauman ..................... 711!149
`3/2005 Glasco
`5!2005 Rass et a!. .................. 711!146
`8/2001 Weber
`11/2001 Van Doren
`1!2002 Fung
`4/2002 Gharachorloo et a!.
`5!2002 Zalewski eta!. ........... 709/203
`........ 709/215
`6/2002 Van Ruben et a!.
`6/2002 Van Ruben ................. 710/107
`7/2002 Gharachorloo eta!. ..... 711!141
`7/2002 Khare eta!.
`1!2003 Arimilli et a!.
`9/2003 Glasco
`9/2003 Glasco
`9/2003 Glasco
`10/2003 Edirisooriya et a!. ....... 709/212
`10/2003 Kessler et a!.
`11/2003 Glasco
`11/2003 Glasco
`12/2003 Glasco et a!.
`4/2004 Webb eta!. ................ 711!144
`5!2004 Glasco ....................... 711!141
`5!2004 Glasco
`12/2004 Kota eta!.
`
`............. 711!119
`
`01HER PUBLICATIONS
`
`Bandwidth adaptive snooping Martin, M.M.K.; Sarin, D.J.;
`Hill, M.D.; Wood, D.A.; High-Performance Computer
`Architecture, 2002. Proceedings. Eighth International Sym(cid:173)
`posium on , Feb. 2-6, 2002; pp. 251-262.*
`
`2
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 1 of 17
`
`US 7,003,633 B2
`
`Figure lA
`
`Processing
`Cluster 101
`
`Processing
`Cluster 105
`
`'
`'
`'
`'
`
`'
`'
`'
`'
`'
`
`'
`'
`'
`'
`
`'
`'
`'
`'
`'
`
`Processing
`Cluster 103
`
`- ----- rl11c
`• .. ----- ~
`
`Processing
`Cluster 107
`
`Figure lB
`
`Processing
`Cluster 121
`
`Processing
`Cluster 123
`
`Switch 131
`.-----1_41---,b~J: ::>
`
`____ --~.--14_1_c ___ ---.
`
`1,_ · - --"
`
`Processing
`Cluster 125
`
`Processing
`Cluster 127
`
`3
`
`

`

`ll
`
`06a
`
`Figure 2
`
`Remote Clusters
`
`:'\/--208a
`
`· ... ·
`
`,-2oo
`
`~ I 206
`
`r - - - Processor 202a
`
`f\r232b
`:, ,1
`'·'
`
`Cache
`Coherence
`Controller 230
`
`.c;r-232a
`·,
`.'
`·-·
`
`Processor 202b -
`
`rl
`
`,- ·- ··:
`·- --[(_208e
`
`r-214f
`
`I
`
`I
`
`I
`
`1
`
`r - - - - - ' ' -_ - - - ,
`Service
`J--+1
`Processor 212
`
`214a
`
`;-214c
`
`214b_/
`
`.. - ---.
`V"""214c 232c:.P --·
`
`..-~· --.
`·- -- (_232d
`
`'-214d
`
`208b-~
`:· -- --)
`· - --
`
`I!
`
`r - Processor 202c
`
`(':
`
`(\
`
`Processor 202d f-------1
`
`208d-/
`
`~
`
`~
`
`'·\._208c
`
`W6c
`
`J; !10216
`
`V0220 Q =
`
`VO Switch 210
`
`i
`I BIOS I
`
`204
`
`I[
`206d -~
`
`d •
`\Jl
`•
`~
`~ ......
`~ = ......
`
`"!"j
`~
`?'
`N
`'"""' ~
`N c c
`
`0'1
`
`'JJ. =(cid:173)~
`~ .....
`N
`0 ......,
`'"""'
`---l
`
`e
`
`rJ'l
`
`"'""-l
`Q
`Q
`~
`
`~
`~
`~
`N
`
`4
`
`

`

`U.S. Patent
`US. Patent
`
`Feb.21,2006
`F
`60
`
`Sheet 3 of 17
`whS
`U
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`
`
`magnum.fmomShammomofiwam608onM.
`
`
`
`
`
`<n
`0
`rl)
`
`Q)
`t=:
`.......
`bJ)
`t=:
`t:r.J.
`__,
`0
`(.)
`0
`+-'
`0
`1-o
`~
`
`.......
`......
`
`(Y)
`
`11.)
`(.)
`
`
`
`:mooflbfimE23852
`
`~
`1-o
`Q)
`.......
`.8
`+-'
`s:::
`11)
`1-o
`tU
`..Q
`0
`(.)
`~
`
`0 z
`
`
`
`
`
`
`
`("tj
`
`mopsmE
`
`(]..)
`~
`~
`bl)
`• .-I
`~
`
`
`
`t--
`0
`l')
`QJ
`(.)
`
`~
`1-o
`v
`+-'
`..8
`.....
`t=:
`Q)
`1-o
`Q)
`....c:
`0
`u
`
`
`
`m.pomuumfififiEEEoUn,f.momm
`
`
`
`5
`
`
`
`
`

`

`U.S. Patent
`US. Patent
`
`9
`Feb.21,2006
`Feb. 21
`2006
`
`Sheet 4 of 17
`Sheet 4 0f 17
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`vogwfi
`
`
`
`
`
`6
`
`

`

`U.S. Patent
`US. Patent
`
`Feb.21,2006
`Feb. 21, 2006
`
`Sheet 5 of 17
`Sheet 5 0f 17
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`u~
`;:88
`
`l()
`
`fl2&5
`
`3quHmoodéoz
`
`l()
`......:10
`l()
`
`2m
`
`7
`
`

`

`U.S. Patent
`US. Patent
`
`Feb. 21, 2006
`Feb.21,2006
`
`Sheet 6 0f 17
`Sheet 6 of 17
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`0 Z
`
`H
`8m
`Otn
`*9
`C:
`
`U]
`a.)
`*0
`
`0 ZH
`
`Figure5B
`
`8
`
`

`

`U.S. Patent
`US. Patent
`
`Feb.21,2006
`Feb. 21, 2006
`
`Sheet 7 of 17
`Sheet 7 0f 17
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`nu
`
`m-m<m
`
`meu
`
`m-HVm
`
`Hymnv
`
`N-HVm
`
`om05mm
`
`
`
`m¢m
`
`hwm
`
`awn
`
`Hmm
`
`.......
`u M ~~--------------~
`
`"¢
`lf)
`
`
`
`Hlva_-HVmUfigmU
`
`
`
`mowozQuoqéoz
`
`mmm
`
`9
`
`
`

`

`U.S. Patent
`US. Patent
`
`Feb.21,2006
`Feb. 21, 2006
`
`Sheet 8 of 17
`Sheet 8 0f 17
`
`US 7,003,633 B2
`US 7,003,633 B2
`
`10
`
`M
`
`
`u ~ ~--------------------~
`'D
`If)
`
`mm8&5
`
`
`
`“804952
`
`..-
`0 ~ ~--------------~
`\0
`tr)
`
`monoz
`
`2%
`
`10
`
`

`

`Figure 6
`
`C
`
`L ~CPU
`60:-2-\ 6:7
`60!-2
`
`609
`
`I
`
`J
`
`c
`621-3
`t
`
`L
`
`C
`
`L
`
`• C
`
`l/ 645 : \
`641-1 ~ 647 !--; 641-2
`L6~91
`
`~c 603-1
`
`Request
`Cluster 600
`
`L
`
`L
`627
`
`1/ 625 r-
`c f-+ MC f-Jo
`\ \ 62~-2
`
`621-1
`
`623-1
`
`Home
`Cluster 620
`
`Cluster 640
`
`c
`603-3
`
`CPU
`~
`601-3
`
`c ~
`
`CPU
`601-4
`
`603-4
`
`I
`/
`
`I
`c /
`621-4 I
`
`c
`~
`603-5
`
`...
`I c
`MC:
`~1-5 ~ 623-2
`
`d •
`\Jl
`•
`~
`~ ......
`~ = ......
`
`"'!"j
`~
`?'
`N
`'"""' ~
`N c c
`
`0'1
`
`'JJ. =(cid:173)~
`~ .....
`'0
`0 ......,
`'"""'
`-..J
`
`e
`
`rJ'l
`
`"'""-l
`Q
`Q
`~
`
`~
`~
`~
`N
`
`11
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 10 of 17
`
`US 7,003,633 B2
`
`Figure 7
`
`Coherence Directory 701
`
`Memory Line 711
`
`State 713
`
`Address 721
`
`Address 731
`
`Address 741
`
`Address 751
`
`Address 761
`
`Address 771
`
`Invalid
`
`Invalid
`
`Shared
`
`Shared
`
`Owned
`
`Owned
`
`Dirty Data
`Owner715
`NIA
`
`NIA
`
`N/A
`
`N/A
`
`Occupancy Vector
`717
`
`N/A
`
`N/A
`
`Clusters 1 ,3
`
`Clusters 1, 2, 3, 4
`
`Cluster4
`
`Cluster 2, 3, 4
`
`Cluster 2
`
`Cluster 2, 4
`
`Address 781
`
`Modified
`
`Cluster 2
`
`Address 791
`
`Modified
`
`Cluster 3
`
`...
`
`...
`
`. ..
`
`NIA
`
`N/A
`
`...
`
`12
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 11 of 17
`
`US 7,003,633 B2
`
`Figure 8
`
`Probe Filter
`Information 821
`
`Read Block (Read) 823
`
`Read Block Modify
`(Read/Write) 825
`
`Invalid 831
`
`Can use completion bit.
`Probe home cluster. (801)
`
`Can use completion bit.
`Probe home cluster.
`(809)
`
`Shared 833
`
`Can use completion bit.
`Probe home cluster. (803)
`
`N/A (811)
`
`Owned 835
`
`Can use completion bit.
`Probe remote cluster with
`line cached in owned
`state. (805)
`
`N/A (813)
`
`Modified 83 7
`
`Can use completion bit.
`Probe remote cluster with
`line cached in modified
`state. (807)
`
`Can use completion bit.
`Probe remote cluster.
`(815)
`
`13
`
`

`

`c ~ I~ 903-l
`
`Request
`Cluster 900
`
`L
`. 925
`
`l
`
`Figure 9
`
`c
`903-4
`
`CPU
`~
`901-4
`
`'-+ c
`903-5
`
`I I
`
`Completion Bit Set
`
`L
`927
`
`c
`921-3
`
`c
`921-4
`
`*-
`c H MC
`921-5
`
`1923-2
`
`c
`MC
`921-1 ~
`
`--+
`
`923-1 \
`
`c
`921-2
`
`\
`
`Remote
`Cluster 940
`
`d •
`\Jl
`•
`~
`~ ......
`~ = ......
`
`~
`?'
`N
`!""
`N c c
`
`0'1
`
`'Jl =(cid:173)~
`~ .....
`'"""' N
`0 ......,
`'"""'
`-..J
`
`e
`
`rJ'l
`
`"'""-l
`Q
`Q
`"'~
`~
`
`~
`~
`N
`
`14
`
`

`

`Figure 10
`
`c
`1003-4
`
`CPU
`1001-4
`
`c
`1003-5
`
`Completion Bit Set
`
`c
`1021-3
`
`c
`1021-4
`
`I
`\
`c
`1021-5
`
`MC
`1023-2
`
`L
`1025
`
`L
`1027
`
`c
`
`CPU
`100 l-l
`
`c
`1003-1
`
`Request
`Cluster lOOO
`
`c
`1021-1
`
`Home Cluster
`1020
`
`Remote
`Cluster 1040
`
`d •
`\Jl
`•
`~
`~ ......
`~ = ......
`
`"'!"j
`~
`?'
`N
`'"""' ~
`N c c
`
`0'1
`
`'JJ. =(cid:173)~
`~ .....
`'"""' ~
`0 ......,
`'"""'
`-..J
`
`e
`
`rJ'l
`
`"'""-l
`Q
`Q
`~
`
`~
`~
`~
`N
`
`15
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 14 of 17
`
`US 7,003,633 B2
`
`Figure 11
`
`robe Request
`Handling At Home
`herence Contro
`
`Receive Probe Request
`Associated With A
`Memory Line
`
`Forward Request To
`Memmy Controller
`
`1101
`
`1105
`
`1109 \__
`
`Recieve Probe From
`Memory Controller
`
`1113
`
`1121
`
`1131
`
`No
`
`1141
`
`Access Coherence
`Directory And Probe
`Filter Infomntion
`
`Yes
`
`Yes
`
`No
`
`Receive Probe Responses
`And Forward Response
`To Request Cluster With
`Completion Indicator
`
`1145
`
`Proceed Using Memmy
`Controller Serialization
`With No Filtering And
`Completion Bit
`
`1135
`
`1139 -
`
`Forward Probe With
`Completion Indicator To
`Remote Cluster
`
`Receive Home Cluster
`Probe Responses And Do
`Not Forward Response To
`Request Cluster
`
`1149
`
`Receive Source Done
`From Request Cluster
`And Forward Source
`Done To Memmy
`Controller
`
`16
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 15 of 17
`
`US 7,003,633 B2
`
`Figure 12
`
`Memory Controller Filter Infom1ation 1221
`
`Read Block [Read] 1223
`
`Read Block Modify
`[Read/Write] 1225
`
`Invalid 123 1
`
`Send request to target.
`(1201)
`
`Send request to target.
`(1209)
`
`Shared 1233
`
`Send request to target.
`(1203)
`
`Send request to target.
`(1211)
`
`Owned 1235
`
`Forward Probe To
`Owning Cluster. (1205)
`
`Send request to target.
`(1213)
`
`Modified 1237
`
`Forward Probe To
`Forward Probe To
`Modified Cluster. (1207) Modified Cluster. (1215)
`
`17
`
`

`

`c HCPUH c
`1303-5 T
`
`1301-4
`
`1303-4
`
`Completion Bit Set
`
`j,_sj
`
`Figure 13
`
`ICPul
`
`~·
`
`Request
`Cluster 1300
`
`Home Cluster
`1320
`
`Remote
`Cluster 1340
`
`d •
`\Jl
`•
`~
`~ ......
`~ = ......
`
`"'!"j
`~
`?'
`N
`'"""' ~
`N c c
`
`0'1
`
`'JJ. =-~
`~ .....
`'"""' 0'1
`0 ......,
`'"""'
`-..J
`
`e
`
`rJ'l
`
`"'""-l
`Q
`Q
`~
`
`~
`~
`~
`N
`
`18
`
`

`

`U.S. Patent
`
`Feb.21,2006
`
`Sheet 17 of 17
`
`US 7,003,633 B2
`
`Figure 14
`
`robe Request
`Handling At Home
`herence Contro
`
`1401
`
`1405
`
`1405
`
`Receive Probe Request
`Associated With A
`Particular Memory Line
`
`1403
`
`Identify Characteristics
`Associated With Probe
`Request
`
`Fmward Request To
`Serialization Point
`
`1411
`~--No---<:
`
`Proceed With Or Without
`Probe Filtering And
`Completion Indicator
`(Figure 6 Or Figure 11)
`
`1415
`
`1417
`
`Yes
`
`Block Probe Requests
`And Probe Remote
`Cluster (Figure 12)
`
`Unblock Memory Line
`After Receiving Source
`Done From Request
`Cluster
`
`19
`
`

`

`US 7,003,633 B2
`
`1
`METHODS AND APPARATUS FOR
`MANAGING PROBE REQUESTS
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`The present application is related to filed U.S. application
`Ser. No. 10/106,426 titled Methods And Apparatus For
`Speculative Probing At A Request Cluster, U.S. application
`Ser. No. 10/106,430 titled Methods And Apparatus For
`Speculative Probing With Early Completion And Delayed
`Request, and U.S. application Ser. No. 10/106,299 titled
`Methods And Apparatus For Speculative Probing With Early
`Completion And Early Request, the entireties of which are
`incorporated by reference herein for all purposes. The
`present application is also related to filed U.S. application
`Ser. No. 10/157,340, now U.S. Pat. No. 6,865,595, Ser. No.
`10/145,439, Ser. No. 10/145,438, and Ser. No. 10/157,388
`titled Methods And Apparatus For Responding To A Request
`Cluster by David B. Glasco, the entireties of which are
`incorporated by reference for all purposes. The present
`application is also related to concurrently filed U.S. appli(cid:173)
`cation Ser. No. 10/288,399 with the same title and inventor,
`the entirety of which is incorporated by reference herein for
`all purposes.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention generally relates to accessing data
`in a multiple processor system. More specifically, the
`present invention provides techniques for improving data
`access efficiency while maintaining cache coherency in a
`multiple processor system having a multiple cluster archi(cid:173)
`tecture.
`2. Description of Related Art
`Data access in multiple processor systems can raise issues
`relating to cache coherency. Conventional multiple proces(cid:173)
`sor computer systems have processors coupled to a system
`memory through a shared bus. In order to optimize access to
`data in the system memory, individual processors are typi(cid:173)
`cally designed to work with cache memory. In one example,
`each processor has a cache that is loaded with data that the
`processor frequently accesses. The cache is read or written
`by a processor. However, cache coherency problems arise
`because multiple copies of the same data can co-exist in
`systems having multiple processors and multiple cache
`memories. For example, a frequently accessed data block
`corresponding to a memory line may be loaded into the
`cache of two different processors. In one example, if both
`processors attempt to write new values into the data block at
`the same time, different data values may result. One value
`may be written into the first cache while a different value is
`written into the second cache. A system might then be unable
`to determine what value to write through to system memory. 55
`A variety of cache coherency mechanisms have been
`developed to address such problems in multiprocessor sys(cid:173)
`tems. One solution is to simply force all processor writes to
`go through to memory immediately and bypass the associ(cid:173)
`ated cache. The write requests can then be serialized before 60
`overwriting a system memory line. However, bypassing the
`cache significantly decreases efficiency gained by using a
`cache. Other cache coherency mechanisms have been devel(cid:173)
`oped for specific architectures. In a shared bus architecture,
`each processor checks or snoops on the bus to determine 65
`whether it can read or write a shared cache block. In one
`example, a processor only writes an object when it owns or
`
`2
`has exclusive access to the object. Each corresponding cache
`object is then updated to allow processors access to the most
`recent version of the object.
`Bus arbitration is used when both processors attempt to
`5 write the same shared data block in the same clock cycle.
`Bus arbitration logic decides which processor gets the bus
`first. Although, cache coherency mechanisms such as bus
`arbitration are effective, using a shared bus limits the num(cid:173)
`ber of processors that can be implemented in a single system
`10 with a single memory space.
`Other multiprocessor schemes involve individual proces(cid:173)
`sor, cache, and memory systems connected to other proces(cid:173)
`sors, cache, and memory systems using a network backbone
`such as Ethernet or Token Ring. Multiprocessor schemes
`15 involving separate computer systems each with its own
`address space can avoid many cache coherency problems
`because each processor has its own associated memory and
`cache. When one processor wishes to access data on a
`remote computing system, communication is explicit. Mes-
`20 sages are sent to move data to another processor and
`messages are received to accept data from another processor
`using standard network protocols such as TCP/IP. Multipro(cid:173)
`cessor systems using explicit communication including
`transactions such as sends and receives are referred to as
`25 systems using multiple private memories. By contrast, mul(cid:173)
`tiprocessor system using implicit communication including
`transactions such as loads and stores are referred to herein as
`using a single address space.
`Multiprocessor schemes using separate computer systems
`30 allow more processors to be interconnected while minimiz(cid:173)
`ing cache coherency problems. However, it would take
`substantially more time to access data held by a remote
`processor using a network infrastructure than it would take
`to access data held by a processor coupled to a system bus.
`35 Furthermore, valuable network bandwidth would be con(cid:173)
`sumed moving data to the proper processors. This can
`negatively impact both processor and network performance.
`Performance limitations have led to the development of a
`point-to-point architecture for connecting processors in a
`40 system with a single memory space. In one example, indi(cid:173)
`vidual processors can be directly connected to each other
`through a plurality of point-to-point links to form a cluster
`of processors. Separate clusters of processors can also be
`connected. The point-to-point links significantly increase the
`45 bandwidth for coprocessing and multiprocessing functions.
`However, using a point-to-point architecture to connect
`multiple processors in a multiple cluster system sharing a
`single memory space presents its own problems.
`Consequently, it is desirable to provide techniques for
`50 improving data access and cache coherency in systems
`having multiple clusters of multiple processors connected
`using point-to-point links.
`
`SUMMARY OF THE INVENTION
`
`According to the present invention, methods and appara(cid:173)
`tus are provided for increasing the efficiency of data access
`in a multiple processor, multiple cluster system. Mecha(cid:173)
`nisms for reducing the number of transactions in a multiple
`cluster system are provided. In one example, probe filter
`information is used to limit the number of probe requests
`transmitted to request and remote clusters.
`In one embodiment, a computer system is provided. The
`computer system includes a home cluster having a first
`plurality of processors and a home cache coherence con(cid:173)
`troller. The first plurality of processors and the home cache
`coherence controller are interconnected in a point-to-point
`
`20
`
`

`

`US 7,003,633 B2
`
`3
`architecture. The home cache coherence controller is con(cid:173)
`figured to receive a probe request and probe one or more
`selected clusters. The one or more clusters are selected based
`on the characteristics associated with the probe request.
`In another embodiment, a method for managing probes is
`provided. A probe request is received at a home cache
`coherence controller in a home cluster. The home cluster
`includes a first plurality of processors and the home cache
`coherence controller. The first plurality of processors and the
`home cache coherence controller are interconnected in a 10
`point-to-point architecture. One or more clusters are selected
`for probing based on the characteristics associated with the
`probe request. The one or more clusters are probed.
`A further understanding of the nature and advantages of
`the present invention may be realized by reference to the 15
`remaining portions of the specification and the drawings.
`
`5
`
`4
`will be understood that it is not intended to limit the
`invention to the described embodiments. On the contrary, it
`is intended to cover alternatives, modifications, and equiva(cid:173)
`lents as may be included within the spirit and scope of the
`invention as defined by the appended claims. Multi-proces(cid:173)
`sor architectures having point-to-point communication
`among their processors are suitable for implementing spe(cid:173)
`cific embodiments of the present invention. In the following
`description, numerous specific details are set forth in order
`to provide a thorough understanding of the present inven(cid:173)
`tion. The present invention may be practiced without some
`or all of these specific details. Well-known process opera(cid:173)
`tions have not been described in detail in order not to
`unnecessarily obscure the present invention. Furthermore,
`the present application's reference to a particular singular
`entity includes that possibility that the methods and appa-
`ratus of the present invention can be implemented using
`more than one entity, unless the context clearly dictates
`otherwise.
`Techniques are provided for increasing data access effi-
`ciency in a multiple processor, multiple cluster system. In a
`point-to-point architecture, a cluster of processors includes
`multiple processors directly connected to each other through
`point-to-point links. By using point-to-point links instead of
`25 a conventional shared bus or external network, multiple
`processors are used efficiently in a system sharing the same
`memory space. Processing and network efficiency are also
`improved by avoiding many of the bandwidth and latency
`limitations of conventional bus and external network based
`multiprocessor architectures. According to various embodi(cid:173)
`ments, however, linearly increasing the number of proces-
`sors in a point-to-point architecture leads to an exponential
`increase in the number of links used to connect the multiple
`processors. In order to reduce the number of links used and
`to further modularize a multiprocessor system using a point(cid:173)
`to-point architecture, multiple clusters are used.
`According to various embodiments, the multiple proces(cid:173)
`sor clusters are interconnected using a point-to-point archi(cid:173)
`tecture. Each cluster of processors includes a cache coher-
`40 ence controller used to handle communications between
`clusters. In one embodiment, the point-to-point architecture
`used to connect processors are used to connect clusters as
`well.
`By using a cache coherence controller, multiple cluster
`45 systems can be built using processors that may not neces(cid:173)
`sarily support multiple clusters. Such a multiple cluster
`system can be built by using a cache coherence controller to
`represent non-local nodes in local transactions so that local
`nodes do not need to be aware of the existence of nodes
`50 outside of the local cluster. More detail on the cache
`coherence controller will be provided below.
`In a single cluster system, cache coherency can be main-
`tained by sending all data access requests through a serial(cid:173)
`ization point. Any mechanism for ordering data access
`requests is referred to herein as a serialization point. One
`example of a serialization point is a memory controller.
`Various processors in the single cluster system send data
`access requests to the memory controller. In one example,
`the memory controller is configured to serialize or lock the
`60 data access requests so that only one data access request for
`a given memory line is allowed at any particular time. If
`another processor attempts to access the same memory line,
`the data access attempt is blocked until the memory line is
`unlocked. The memory controller allows cache coherency to
`65 be maintained in a multiple processor, single cluster system.
`A serialization point can also be used in a multiple
`processor, multiple cluster system where the processors in
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention may best be understood by reference to the 20
`following description taken in conjunction with the accom(cid:173)
`panying drawings, which are illustrative of specific embodi(cid:173)
`ments of the present invention.
`FIGS. 1A and 1B are diagrammatic representation depict(cid:173)
`ing a system having multiple clusters.
`FIG. 2 is a diagrammatic representation of a cluster
`having a plurality of processors.
`FIG. 3 is a diagrammatic representation of a cache coher(cid:173)
`ence controller.
`FIG. 4 is a diagrammatic representation showing a trans- 30
`action flow for a data access request from a processor in a
`single cluster.
`FIG. 5A-5D are diagrammatic representations showing
`cache coherence controller functionality.
`FIG. 6 is a diagrammatic representation depicting a trans- 35
`action flow for a probe request with multiple probe
`responses.
`FIG. 7 is a diagrammatic representation showing a cache
`coherence directory.
`FIG. 8 is a diagrammatic representation showing probe
`filter information that can be used to reduce the number of
`probes transmitted to various clusters.
`FIG. 9 is a diagrammatic representation showing a trans(cid:173)
`action flow for probing of a home cluster without probing of
`other clusters.
`FIG. 10 is a diagrammatic representation showing a
`transaction flow for probing of a single remote cluster.
`FIG. 11 is a flow process diagram showing the handling
`of a probe request with probe filter information.
`FIG. 12 is a diagrammatic representation showing
`memory controller filter information.
`FIG. 13 is a diagrammatic representation showing a
`transaction flow for probing a single remote cluster without
`probing a home cluster.
`FIG. 14 is a flow process diagram showing the handling 55
`of a probe request at a home cluster cache coherence
`controller using memory controller filter information.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
`Reference will now be made in detail to some specific
`embodiments of the invention including the best modes
`contemplated by the inventors for carrying out the invention.
`Examples of these specific embodiments are illustrated in
`the accompanying drawings. While
`the
`invention
`is
`described in conjunction with these specific embodiments, it
`
`21
`
`

`

`US 7,003,633 B2
`
`5
`the various clusters share a single address space. By using a
`single address space, internal point-to-point links can be
`used to significantly improve intercluster communication
`over traditional external network based multiple cluster
`systems. Various processors in various clusters send data 5
`access requests to a memory controller associated with a
`particular cluster such as a home cluster. The memory
`controller can similarly serialize all data requests from the
`different clusters. However, a serialization point in a mul(cid:173)
`tiple processor, multiple cluster system may not be as 10
`efficient as a serialization point in a multiple processor,
`single cluster system. That is, delay resulting from factors
`such as latency from transmitting between clusters can
`adversely affect the response times for various data access 15
`requests. It should be noted that delay also results from the
`use of probes in a multiple processor environment.
`Although delay in intercluster transactions in an architec(cid:173)
`ture using a shared memory space is significantly less than
`the delay in conventional message passing environments 20
`using external networks such as Ethernet or Token Ring,
`even minimal delay is a significant factor. In some applica(cid:173)
`tions, there may be millions of data access requests from a
`processor in a fraction of a second. Any delay can adversely
`impact processor performance.
`According to various embodiments, probe management is
`used to increase the efficiency of accessing data in a multiple
`processor, multiple cluster system. A mechanism for elicit(cid:173)
`ing a response from a node to maintain cache coherency in
`a system is referred to herein as a probe. In one example, a
`mechanism for snooping a cache is referred to as a probe. A
`response to a probe can be directed to the source or target of
`the initiating request. Any mechanism for filtering or reduc(cid:173)
`ing the number of probes and probe requests transmitted to
`various nodes is referred to herein as managing probes. In
`one example, managing probe entails characterizing a probe
`request to determine if a probe can be transmitted to a
`reduced number of entities.
`In typ

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket