throbber
(12) United States Patent
`Glasco
`
`(10) Patent N0.:
`(45) Date of Patent:
`
`US 7,103,636 B2
`*Sep. 5,2006
`
`US007103636B2
`
`(54)
`
`(75)
`
`(73)
`(*)
`
`(21)
`(22)
`(65)
`
`(51)
`
`(52)
`
`(58)
`
`(56)
`
`METHODS AND APPARATUS FOR
`SPECULATIVE PROBING OF A REMOTE
`CLUSTER
`
`Inventor: David B. Glasco, Austin, TX (US)
`
`Assignee: NeWisys, Inc., Austin, TX (US)
`
`Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 413 days.
`
`This patent is subject to a terminal dis
`claimer.
`
`Appl. No.: 10/157,388
`Filed:
`May 28, 2002
`
`Prior Publication Data
`
`US 2003/0225979 A1 Dec. 4, 2003
`
`Int. Cl.
`G06F 13/14
`G06F 12/16
`
`(2006.01)
`(2006.01)
`
`US. Cl. ..................... .. 709/206; 709/216; 709/217;
`711/141; 711/146
`
`Field of Classi?cation Search ....... .. 709/206i2l3,
`709/2l6i2l8, 219; 711/141*146
`See application ?le for complete search history.
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5/1987 Allen et a1. ............... .. 709/234
`4,667,287 A
`5,166,674 A 11/1992 Baum et a1.
`714/752
`5,191,651 A
`3/1993 Halim et a1. .............. .. 709/250
`(Continued)
`FOREIGN PATENT DOCUMENTS
`
`OTHER PUBLICATIONS
`
`HyperTransport TM [/0 Link Specification Revision 1.03,
`HyperTransport TM Consortium, Oct. 10, 2001, Copyright ©
`2001 HyperTransport Technology Consortium.
`D. E. Culler, J. P. Singh, A. Gupta, “Parallel Computer
`Architecture”, 1999 Morgan Kaufmann, San Francisco, CA
`USA XP002277658.
`Andrew Tanenbaum, “Computer Networks”, Computer Net
`Works, London: Prentice Hall lntemational, GB, 1996, pp.
`3454403, XP002155220.
`GLASCO, David Brian, “Methods And Apparatus For
`Speculative Probing At A Request Cluster,” U.S. Appl. No.
`10/ 106,426, ?led Mar. 22, 2002, Of?ce Action mailed Sep.
`22, 2004.
`GLASCO, David Brian, “Methods And Apparatus For
`Speculative Probing With Early Completion And Delayed
`Request,” U.S. Appl. No. 10/106,430, ?led Mar. 22, 2002,
`Of?ce Action mailed Sep. 23, 2004.
`GLASCO, David Brian, “Methods And Apparatus For
`Speculative Probing With Early Completion And Early
`Request,” U.S. Appl. No. 10/106,299, ?led Mar. 22, 2002,
`Of?ce Action mailed Sep. 22, 2004.
`
`(Continued)
`Primary ExamineriKevin Verbrugge
`(74) Attorney, Agent, or F irmiBeyer Weaver & Thomas,
`LLP.
`
`(57)
`
`ABSTRACT
`
`According to the present invention, methods and apparatus
`are provided for increasing the ef?ciency of data access in a
`multiple processor, multiple cluster system. Techniques are
`provided for speculatively probing a remote cluster from
`either a request cluster or a home cluster. A speculative
`probe associated With a particular memory line is transmit
`ted to the remote cluster before the cache access request
`associated With the memory line is serialized at a home
`cluster. When a non-speculative probe is received at a
`remote cluster, the information associated With the response
`to the speculative probe is used to provide a response to the
`non-speculative probe.
`
`EP
`WO
`
`0978781
`WO0239242
`
`2/2000
`5/2002
`
`36 Claims, 15 Drawing Sheets
`
`CPU
`7014
`
`MC
`723-2
`
`Request
`Cluster 70o
`
`Rmwb:
`Cluster 740
`
`

`
`US 7,103,636 B2
`Page 2
`
`US. PATENT DOCUMENTS
`
`3/1993 Sindhu et 31
`5,195,089 A
`3/1993 Chen et a1. .................. .. 712/3
`5,197,130 A
`4/1994 Fushimi er a1~
`714/23
`5301311 A
`5,371,852 A 12/1994 Attanasio er a1- --------- -- 709/245
`5,394,555 A * 2/1995 Hunter et a1. ............. .. 7ll/l48
`5,561,768 A 10/1996 Smith ------- --
`-- 712/13
`5,623,644 A
`4/1997 Self et al. ................. .. 713/503
`5,682,512 A 10/1997 Tetrick ..................... .. 711/202
`5,692,123 A 11/1997 Logghe
`5,829,032 A 10/1998 Komufo er a1~ ----------- -- 711/141
`5931938 A
`8/1999 Drogichen er a1- --------- -- 712/15
`6,018,791 A * l/2000 Arimilliet al. .... ..
`. 711/141
`6,038,651 A * 3/2000 VanHuben et a1. ......... .. 712/21
`6,047,332 A
`4/2000 Viswanathan et a1. .... .. 709/245
`6,067,603 A * 5/2000 Carpenter et a1. ........ .. 711/141
`6,148,378 A 11/2000 BordaZ et al. ............ .. 711/147
`6,167,492 A 12/2000 Keller et al. .............. .. 711/154
`6,192,451 B1 * 2/2001 Arimilli er a1
`711/141
`6219775 B1
`4/2001 Wade er a1- ---------------- -- 712/11
`6,226,671 B1
`5/2001 Hagersten et a1. ........ .. 709/215
`6,256,671 B1
`7/2001 StrentZsch et a1.
`709/227
`6,259,701 B1
`7/2001 Shur et a1. ................ .. 370/401
`6’292’705 Bl
`9/2001 Wang e.t a1‘
`6,330,643 B1 * 12/2001 Ar1m1ll1 et a1. ........... .. 711/141
`6,331,983 B1
`12/2001 Haggerty et a1. ......... .. 370/400
`
`1/2002 Fung ........................ .. 713/320
`2002/0004915 A1
`1/2002 Fung
`2002/0007463 A1
`2002/0052914 A1 * 5/2002 Zalewski et a1. ......... .. 709/203
`2002/0083149 A1 >1<
`6/2002 Van Huben et a1‘ __
`709/215
`2002/0083243 A1 * 6/2002 Van Huben et a1. ...... .. 710/107
`2002/0087811 A1 * 7/2002 Khare et a1. .............. .. 711/146
`2002/0156ggg A1 10/2002 Lee et a1‘
`709/224
`2003/0009623 A1
`1/2003 Arimilli et a1. ........... .. 711/119
`2003/0182508 A1
`9/2003 Glasco
`2003/0182509 A1
`9/2003 Glasco
`2003/0182514 A1
`9/2003 Glasco
`2003/0195939 A1 10/2003 Edirisooriya et a1. ..... .. 709/212
`2003/0196047 A1 * 10/2003 Kessler et a1. ............ .. 711/147
`2003/0210655 A1
`11/2003 Glasco
`2003/0212741 A1 110003 Glasco
`2003/0225909 A1 12/2003 Glasco et a1‘
`2003/0225938 A1 l2/2003 Glasco et 31‘
`2003/0225978 A1 12/2003 Glasco
`2003/0233388 A1 12/2003 Glasco et 31,
`2004/0073755 A1 * 4/2004 Webb et a1. .............. .. 711/144
`2004/0098475 A1
`5/2004 Zeitleret a1. ............. .. 709/223
`
`OTHER PUBLICATIONS
`Zei?er et a1” “Methods And Apparatus For Distributing
`System Management Signals”, PCT/US03/34687, lnt’l ?l
`.
`dt 0 t 20 2003 P n. 1
`h
`n .1 dM 28
`mg ae 0'
`’
`’ a 12‘ Seam repo male
`ay
`’
`
`,
`
`,
`
`ms ong e a.
`
`6,334,172 B1 * 12/2001 Arimilli et a1. ........... .. 711/144
`6,338,122 B1 * 1/2002 Baumgartner et a1‘ ____ __ 711/141
`6,370,585 B1
`4/2002 Hagersten et a1. ........ .. 709/238
`6,385,705 B1
`5/2002 Keller et a1. .............. .. 711/154
`6,397,255 B1
`5/2002 Nurenberg et a1. ..
`709/228
`6,405,289 B1 * 6/2002 AI_imi11i et a1~ ~~~~ ~~
`- 711/145
`2’323333 51
`18/588? 2451161?‘ a1‘, 1
`6,490,661 B1
`12/2002 K611616131. .............. .. 711/150
`6,542,926 B1 * 4/2003 Zalewski et a1.
`709/213
`6,578,071 B1
`6/2003 Hagersten et a1.
`709/215
`6,615,319 B1 * 9/2003 Khare et a1. .... ..
`711/141
`6,631,447 B1 * 10/2003 Morioka et a1. ..
`711/141
`6,633,945 B1
`10/2003 Fu et a1. ......... ..
`710/316
`6,633,960 B1 * l0/2003 Kessler et a1. ..... ..
`7ll/l44
`6,704,842 B1 * 3/2004 Janakiraman et a1. ..... .. 711/141
`6,738,870 B1 : 5/2004 Van Huben et 31' ~~~~~~ " 711/150
`6,738,871 B1
`5/2004 Van Huben et a1. ...... .. 711/150
`6,751,698 B1 * 6/2004 Deneroifet a1.
`.. 710/317
`6,751,721 B1 * 6/2004 Webb, Jr. et a1. ........ .. 712/10
`6,754,782 B1 * 6/2004 Arimilli et a1.
`711/144
`6,760,809 B1
`7/2004 Arimilli et a1. ........... .. 711/119
`6,760,819 B1
`7/2004 Dhong et a1.
`6,799,252 B1
`9/2004 Bauman
`6,826,660 B1
`ll/2004 Hagersten et a1. ........ .. 7ll/l53
`6,847,993 B1
`V2005 Novaes et a1~
`-- 709/221
`2332235 51 * Z3882 2m“ {"1"
`ill/i912
`6,920,519 B1
`7/2005 Beukema et a1. .
`710/306
`2001/0014097 A1
`8/2001 Becket a1. ............... .. 370/401
`2001/0037435 A1
`11/2001 Van Doren
`
`,
`
`,
`
`ass e a.
`
`-
`
`~
`
`_
`_
`_
`2094-
`Ze1tler et al., “Methods And Apparatus For D1str1but1ng
`System Management Signals”, PCT/1180364687, lnt’l ?l
`ing date Oct. 20, 2003, lnt’l search report mailed Aug. 17,
`2004.
`U.S. Of?ce Action mailed Jul. 20, 2005, from related Appli
`Cation N°~ 1098,846
`U'tsi 056233118161 giggled NOV‘ 2’ ZOOS’frOm related Apph
`Calon 0'
`.’
`'.
`U-S- _O?_i°e Acnon malled S8P- 21’ 2005’ from related
`APPllcatlon NO- 10/157384
`US O?ice Action mailed Oct 20, 2005, from related
`Application NO. 10/156,893.
`U_S_ Of?ce Action mailed Sep, 21, 2005, from related
`Application NO_ 1()/157,4()9_
`U.S. Of?ce Action mailed Mar. 7, 2005, from related Appli
`Cation Now/106,426
`Us O?iCeACtiOnma?ed Jul 21 2005 fromrelatedA 1._
`'
`’
`’
`ppl
`
`~_
`
`Canon NO- 10009426- _
`U-S~ O?qlce ACUOH malled Mal 10> 2005, from related
`Application NO- 10/106,430
`U.S. Of?ce Action mailed Jul. 21, 2005, from related Appli
`cation No, lO/106,430,
`U.S. Of?ce Action mailed Mar. 10, 2005, from related
`APPHCa‘iOH “1010/1061”
`$318161 ggaglled Jul‘ 21’ 2005’ from related Apph
`'
`’
`'
`* cited by examiner
`
`~
`
`-
`
`-
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 1 6f 15
`
`US 7,103,636 B2
`
`Figure 1A
`
`Processing
`Cluster 101
`
`‘
`
`/—l 1 1d
`;' 1,
`'
`"
`
`Processing
`Cluster 103
`
`
`
`llla?: :____::) Processing
`
`Cluster 105
`
`{
`
`i
`
`Processing
`
`Cluster 107
`
`Figure 1B
`
`Processing
`Cluster 121
`
`Processing
`Cluster 123
`
`141a-/“‘;“"'
`
`' i‘kmid
`
`Switch 131
`
`1411)
`
`1
`
`\II‘EII‘I
`
`1410
`
`rlrf
`
`Processing
`Cluster 125
`
`‘
`
`Processing
`Cluster 127
`
`

`
`U.S. Patent
`
`Sep. 5,2006
`
`Sheet 2 of 15
`
`US 7,103,636 B2
`
`8~n\
`
`£8
`
`Nousmfl
`
`
`
`22.930Bofium
`
`3888
`
`£30
`
`uucohosoo
`
`omm.HuSO.5fiOU
`
`III.S8Smsooa
`
`uo_.>.Em
`
`
`
`
`
`UmomuommooobmumomHommmooumm
`
`N5Hommuooum
`
`_SNBfimfim
`
`Boa83
`moaISfiofim
`
`EN§_amon
`
`ENOh
`
`
`
`
`

`
`U.S. Patent
`
`Sep. 5,2006
`
`Sheet 3 of 15
`
`US 7,103,636 B2
`
`momswam
`wfiunom
`
`
`
`momufimcm#8805
`
`
`
`a._..moommuuufiEobnoouoz
`
`mo.5mE
`
`
`
`pomuum.tB£E30300
`
`omm
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 4 6f 15
`
`US 7,103,636 B2
`
`02
`
`NAB“.
`
`A
`
`w EswE
`
`DAD
`
`méow
`
`DmU
`
`02
`
`78¢
`
`DmU
`
`73v
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 5 6f 15
`
`US 7,103,636 B2
`
`U2
`
`~-mom
`
`L
`
`DmU
`
`méom
`
`DmU
`
`i 2&5
`
`
`
`5602 EuoQéoZ
`
`\/
`
`OE
`
`Tmom
`
`DmU
`
`75m
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 6 6f 15
`
`US 7,103,636 B2
`
`mm a?
`
`mémm
`
`\ k
`
`75m
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 7 6f 15
`
`US 7,103,636 B2
`
`wrmvm
`
`l
`
`DmU
`
`MAE
`
`om 65mm
`
`Néwm
`
`\“/
`
`
`
`$602 $03.52
`
`mnm
`
`Tmwm
`
`DmU
`
`72%
`
`DB6 \ / mwm
`
`Cum
`
`mqm
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 8 6f 15
`
`US 7,103,636 B2
`
`U2
`
`mlmwm
`
`mm 65mm
`
`
`
` 3% 32 Gm - 3% - u o A Q2 1
`
`,H
`
`a;
`
`7.5m
`
`V
`
`@8652
`
`3%
`
`

`
`U.S. Patent
`
`eS
`
`D..
`
`0
`
`Ll
`
`US 7,103,636 B2
`
`2,Q85:205«moswom
`
`moo4
`
`%A
`
`H 5:30MN-HSufiom9U
`%1mmN483%W30E0E83%ozuo4‘ozo
`
`o2&5
`
`28E8E8E838S88938E8280P602.6uPUEo0PB
`
`804
`
`Bofiom
`
`So~Bm3U
`
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 10 6f 15
`
`US 7,103,636 B2
`
`02 o u 0 Q 02 o
`
`2i W5 #5 W5“ R6 112? I?
`
`
`
`
`
`32 IE 6.8“ 29. 32 2? 86 Na? Z2 ZR
`
`
`
`0 P6 0 P6 0 P6 T H ‘l o o ‘I P6
`
`w 25min.
`
`1H
`
`2; \
`A /
`
`5 Q
`
`0
`
`
`
`, 82236
`
`
`
`wig 050m
`
`
`
`mow “353%
`
`822620
`
`\ 6
`
`W5 33‘! E T I: u u A 0
`
`2R \
`
`q
`
`a: 895%
`
`82236
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 11 6f 15
`
`US 7,103,636 B2
`
`Figure 8
`
`Speculative Probing
`From A Home Cluster
`
`y
`Receive Cache Access
`801 '\ Request Associated With
`A Memory Line
`
`803
`
`Yes
`
`805
`'\_
`
`Send Speculaéive Probe
`Correspon mg To
`Request To Remote
`Cluster
`
`807
`
`'
`Forvvard Request'To
`Serialization Pomt
`
`'
`
`7
`
`809 '\_
`
`Receive Probe From
`Serialization Point
`
`7
`
`311 -\ Broadcast Probes To
`Request And Remote
`Clusters
`
`Y
`
`813
`
`_ X RCCGIVC Probe Response
`
`I
`
`815 \_ Provide Probe Responses
`To Request Cluster
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 12 0f 15
`
`US 7,103,636 B2
`
`Figure 9
`
`Speculative Probing
`From A Home Cluster
`
`7
`
`Receive Speculative
`901 "\_ Probe Associated With A
`Memory Line From A
`Home Cluster
`
`903 \ Speculatively Probe Local
`Nodes
`
`Receive Non-Speculative
`905 ’\ Probe Associated With
`The Memory Line From
`The Home Cluster
`
`I
`
`Receive Speculative
`Probe Responses
`
`909
`
`r
`
`Maintain Speculative
`Probe Responses
`
`7
`
`Provide Non-Speculative
`Probe Responses T 0
`Home Cluster Or Request k
`Cluster Using The Results
`Of The Speculative Probe
`
`Remove Speculative
`913 \ Probe From The Pending
`Buffer
`
`

`
`U.S. Patent
`
`Sep. 5,2006
`
`Sheet 13 of 15
`
`US 7,103,636 B2
`
`WE:
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 14 6f 15
`
`US 7,103,636 B2
`
`Figure 11
`
`peculative Probni
`From A Request
`Cluster
`
`Receive Cache Access
`1101 '\_ Request Associated With
`A Memory Line
`
`I
`
`Forward Cache Access
`1103 —\ Request To Home Cluster
`
`1105 p\.
`
`Send Speculative Probe
`Corresponding To
`Request To All Remote
`Clusters
`
`1107 '\ Receive Probe From
`Home Cluster
`
`7
`
`1109 N Receive Probe Responses
`From Home Cluster And
`Remote Cluster
`
`

`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 15 6f 15
`
`US 7,103,636 B2
`
`Figure 12
`
`peculative Probm
`From A Request
`Cluster
`
`7
`Receive S eculative
`1201 X Probe Associated With A
`Memory Line From A
`Request Cluster
`
`1 203
`
`Yes
`
`Drop Speculative Probe
`
`'
`_
`1215 \ Receive Speculative
`Probe Responses
`
`1217
`
`\?
`
`V
`
`.
`
`.
`
`.
`
`Maintain S eculative
`Probe; Regponses
`l
`
`1207 \_ speculati?gilgobe Local
`
`7
`
`Receive Non-Speculative
`1209 \ Probe Associated With
`The Memory Line From
`The Home Cluster
`
`121 1
`
`V
`
`_
`_
`Provide Nonspeculative
`Probe Responses To
`\ Home Cluster Or Request ‘
`Cluster Using The Results
`Of The Speculative Probe
`
`7
`
`Remove Speculative
`1213 '\ Probe From The Pending
`Buffer
`
`

`
`US 7,103,636 B2
`
`1
`METHODS AND APPARATUS FOR
`SPECULATIVE PROBING OF A REMOTE
`CLUSTER
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`The present application is related to US. application Ser.
`No. 10/ 106,426 titled Methods And Apparatus For Specu
`lative Probing At A Request Cluster, US. application Ser.
`No. 10/ 106,430 titled Methods And Apparatus For Specu
`lative Probing With Early Completion And Delayed
`Request, and US. application Ser. No. 10/106,299 titled
`Methods And Apparatus For Speculative Probing With Early
`Completion And Early Request, the entireties of Which are
`incorporated by reference herein for all purposes. The
`present application is also related to US. application Ser.
`Nos. 10/145,439 and 10/145,438 both titled Methods And
`Apparatus For Responding To A Request Cluster by David
`B. Glasco ?led on May 13, 2002, the entireties of Which are
`incorporated by reference for all purposes. Furthermore, the
`present application is related to concurrently ?led US.
`application Ser. No. 10/ 157,340 also titled Methods And
`Apparatus For Speculative Probing Of A Remote Cluster by
`David B. Glasco, the entirety of Which is incorporated by
`reference for all purposes.
`The present application is also related to concurrently
`?led US. application Ser. Nos. 10/157,384, 10/156,893, and
`10/157,409 titled Transaction Management In Systems Hav
`ing Multiple Multi-Processor Clusters, Routing Mechanisms
`In Systems Having Multiple Multi-Processor Clusters, and
`Address Space Management In Systems Having Multiple
`Multi-Processor Clusters respectively, all by David B.
`Glasco, Carl Zeitler, Rajesh Kota, Guru Prasadh, and Rich
`ard R. Oehler, the entireties of Which are incorporated by
`reference for all purposes.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention generally relates to accessing data
`in a multiple processor system. More speci?cally, the
`present invention provides techniques for improving data
`access ef?ciency While maintaining cache coherency in a
`multiple processor system having a multiple cluster archi
`tecture.
`2. Description of Related Art
`Data access in multiple processor systems can raise issues
`relating to cache coherency. Conventional multiple proces
`sor computer systems have processors coupled to a system
`memory through a shared bus. In order to optimiZe access to
`data in the system memory, individual processors are typi
`cally designed to Work With cache memory. In one example,
`each processor has a cache that is loaded With data that the
`processor frequently accesses. The cache is read or Written
`by a processor. HoWever, cache coherency problems arise
`because multiple copies of the same data can co-exist in
`systems having multiple processors and multiple cache
`memories. For example, a frequently accessed data block
`corresponding to a memory line may be loaded into the
`cache of tWo different processors. In one example, if both
`processors attempt to Write neW values into the data block at
`the same time, different data values may result. One value
`may be Written into the ?rst cache While a different value is
`Written into the second cache. A system might then be unable
`to determine What value to Write through to system memory.
`A variety of cache coherency mechanisms have been
`developed to address such problems in multiprocessor sys
`
`2
`tems. One solution is to simply force all processor Writes to
`go through to memory immediately and bypass the associ
`ated cache. The Write requests can then be serialized before
`overWriting a system memory line. HoWever, bypassing the
`cache signi?cantly decreases ef?ciency gained by using a
`cache. Other cache coherency mechanisms have been devel
`oped for speci?c architectures. In a shared bus architecture,
`each processor checks or snoops on the bus to determine
`Whether it can read or Write a shared cache block. In one
`example, a processor only Writes an object When it oWns or
`has exclusive access to the object. Each corresponding cache
`object is then updated to alloW processors access to the most
`recent version of the object.
`Bus arbitration is used When both processors attempt to
`Write a shared data block in the same clock cycle. Bus
`arbitration logic decides Which processor gets the bus ?rst.
`Although, cache coherency mechanisms such as bus arbi
`tration are effective, using a shared bus limits the number of
`processors that can be implemented in a single system With
`a single memory space.
`Other multiprocessor schemes involve individual
`processor, cache, and memory systems connected to other
`processors, cache, and memory systems using a netWork
`backbone such as Ethernet or Token Ring. Multiprocessor
`schemes involving separate computer systems each With its
`oWn address space can avoid many cache coherency prob
`lems because each processor has its oWn associated memory
`and cache. When one processor Wishes to access data on a
`remote computing system, communication is explicit. Mes
`sages are sent to move data to another processor and
`messages are received to accept data from another processor
`using standard netWork protocols such as TCP/IP. Multipro
`cessor systems using explicit communication including
`transactions such as sends and receives are referred to as
`systems using multiple private memories. By contrast, mul
`tiprocessor system using implicit communication including
`transactions such as loads and stores are referred to herein as
`using a single address space.
`Multiprocessor schemes using separate computer systems
`alloW more processors to be interconnected While minimiZ
`ing cache coherency problems. HoWever, it Would take
`substantially more time to access data held by a remote
`processor using a netWork infrastructure than it Would take
`to access data held by a processor coupled to a system bus.
`Furthermore, valuable netWork bandWidth Would be con
`sumed moving data to the proper processors. This can
`negatively impact both processor and netWork performance.
`Performance limitations have led to the development of a
`point-to-point architecture for connecting processors in a
`system With a single memory space. In one example, indi
`vidual processors can be directly connected to each other
`through a plurality of point-to-point links to form a cluster
`of processors. Separate clusters of processors can also be
`connected. The point-to-point links signi?cantly increase the
`bandWidth for coprocessing and multiprocessing functions.
`HoWever, using a point-to-point architecture to connect
`multiple processors in a multiple cluster system sharing a
`single memory space presents its oWn problems.
`Consequently, it is desirable to provide techniques for
`improving data access and cache coherency in systems
`having multiple clusters of multiple processors connected
`using point-to-point links.
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`65
`
`SUMMARY OF THE INVENTION
`According to the present invention, methods and appara
`tus are provided for increasing the ef?ciency of data access
`
`

`
`US 7,103,636 B2
`
`3
`in a multiple processor, multiple cluster system. Techniques
`are provided for speculatively probing a remote cluster from
`either a request cluster or a home cluster. A speculative
`probe associated With a particular memory line is transmit
`ted to the remote cluster before the cache access request
`associated With the memory line is serialiZed at a home
`cluster. When a non-speculative probe is received at a
`remote cluster, the information associated With the response
`to the speculative probe is used to provide a response to the
`non-speculative probe.
`According to various embodiments, a computer system is
`provided. The computer system includes a request cluster, a
`home cluster, and a remote cluster. The request cluster has
`a plurality of interconnected request cluster processors and
`a request cluster cache coherence controller. The home
`cluster has a plurality of interconnected home processors, a
`serialiZation point, and a home cache coherence controller.
`The remote cluster has a plurality of interconnected remote
`processors and a remote cache coherence controller. The
`remote cluster is con?gured to receive a ?rst probe corre
`sponding to a cache access request from a request cluster
`processor in the request cluster and a second probe corre
`sponding to the cache access request from the home cluster.
`According to other embodiments, a method for a cache
`coherence controller to manage data access in a multipro
`cessor system is provided. The method includes receiving a
`cache access request from a request cluster processor asso
`ciated With a request cluster, forWarding the cache access
`request to a home cluster, and sending a probe associated
`With the cache request to a remote cluster. The home cluster
`includes a home cluster cache coherence controller and a
`serialization point.
`According to still other embodiments, a computer system
`is provided. The computer system includes a ?rst cluster and
`a second cluster. The ?rst cluster includes a ?rst plurality of
`processors and a ?rst cache coherence controller. The ?rst
`plurality of processors and the ?rst cache coherence con
`troller are interconnected in a point-to-point architecture.
`The second cluster includes a second plurality of processors
`and a second cache coherence controller. The second plu
`rality of processors and the second cache coherence con
`troller are interconnected in a point-to-point architecture, the
`?rst cache coherence controller is coupled to the second
`cache coherence controller. The ?rst cache coherence con
`troller is con?gured to receive a cache access request origi
`nating from the ?rst plurality of processors and send a probe
`to a third cluster including a third plurality of processors
`before the cache access request is received by a serialization
`point in the second cluster.
`In still other embodiments, a computer system is pro
`vided. The computer system includes a ?rst cluster and a
`second cluster. The ?rst cluster includes a ?rst plurality of
`processors and a ?rst cache coherence controller. The ?rst
`plurality of processors and the ?rst cache coherence con
`troller are interconnected in a point-to-point architecture.
`The second cluster includes a second plurality of processors
`and a second cache coherence controller. The second plu
`rality of processors and the second cache coherence con
`troller are interconnected in a point-to-point architecture.
`The ?rst cache coherence controller is coupled to the second
`cache coherence controller and constructed to receive a
`cache access request originating from the ?rst plurality of
`processors and send a probe to a third cluster including a
`third plurality of processors before a memory line associated
`With the cache access request is locked.
`In yet another embodiment, a cache coherence controller
`is provided. The cache coherence controller includes inter
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`face circuitry coupled to a request cluster processor in a
`request cluster and a remote cluster cache coherence con
`troller in a remote cluster and a protocol engine coupled to
`the interface circuitry. The protocol engine is con?gured to
`receive a cache access request from the request cluster
`processor and speculatively probe a remote node in the
`remote cluster.
`A further understanding of the nature and advantages of
`the present invention may be realiZed by reference to the
`remaining portions of the speci?cation and the draWings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention may best be understood by reference to the
`folloWing description taken in conjunction With the accom
`panying draWings, Which are illustrative of speci?c embodi
`ments of the present invention.
`FIGS. 1A and 1B are diagrammatic representation depict
`ing a system having multiple clusters.
`FIG. 2 is a diagrammatic representation of a cluster
`having a plurality of processors.
`FIG. 3 is a diagrammatic representation of a cache coher
`ence controller.
`FIG. 4 is a diagrammatic representation shoWing a trans
`action ?oW for a data access request from a processor in a
`single cluster.
`FIGS. 5Ai5D are diagrammatic representations shoWing
`cache coherence controller functionality.
`FIG. 6 is a diagrammatic representation depicting a trans
`action ?oW probing a remote cluster.
`FIG. 7 is a diagrammatic representation showing a trans
`action ?oW for a speculative probing from a home cluster.
`FIG. 8 is a How process diagram shoWing speculative
`probing from a home cluster.
`FIG. 9 is a How process diagram shoWing speculative
`probing from a home cluster at a remote cluster.
`FIG. 10 is a diagrammatic representation shoWing a
`transaction ?oW for a speculative probing from a request
`cluster.
`FIG. 11 is a How process diagram shoWing speculative
`probing from a request cluster.
`FIG. 12 is a How process diagram shoWing speculative
`probing from a request cluster at a remote cluster.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
`Reference Will noW be made in detail to some speci?c
`embodiments of the invention including the best modes
`contemplated by the inventors for carrying out the invention.
`Examples of these speci?c embodiments are illustrated in
`the accompanying draWings. While the invention is
`described in conjunction With these speci?c embodiments, it
`Will be understood that it is not intended to limit the
`invention to the described embodiments. On the contrary, it
`is intended to cover alternatives, modi?cations, and equiva
`lents as may be included Within the spirit and scope of the
`invention as de?ned by the appended claims. Multi
`processor architectures having point-to-point communica
`tion among their processors are suitable for implementing
`speci?c embodiments of the present invention. In the fol
`loWing description, numerous speci?c details are set forth in
`order to provide a thorough understanding of the present
`invention. The present invention may be practiced Without
`some or all of these speci?c details. Well knoWn process
`operations have not been described in detail in order not to
`
`

`
`US 7,103,636 B2
`
`5
`unnecessarily obscure the present invention. Furthermore,
`the present application’s reference to a particular singular
`entity includes that possibility that the methods and appa
`ratus of the present invention can be implemented using
`more than one entity, unless the context clearly dictates
`otherWise.
`Techniques are provided for increasing data access effi
`ciency in a multiple processor, multiple cluster system. In a
`point-to-point architecture, a cluster of processors includes
`multiple processors directly connected to each other through
`point-to-point links. By using point-to-point links instead of
`a conventional shared bus or external network, multiple
`processors are used e?iciently in a system sharing the same
`memory space. Processing and netWork ef?ciency are also
`improved by avoiding many of the bandWidth and latency
`limitations of conventional bus and external netWork based
`multiprocessor architectures. According to various
`embodiments, hoWever, linearly increasing the number of
`processors in a point-to-point architecture leads to an expo
`nential increase in the number of links used to connect the
`multiple processors. In order to reduce the number of links
`used and to further modularize a multiprocessor system
`using a point-to-point architecture, multiple clusters are
`used.
`According to various embodiments, the multiple proces
`sor clusters are interconnected using a point-to-point archi
`tecture. Each cluster of processors includes a cache coher
`ence controller used to handle communications betWeen
`clusters. In one embodiment, the point-to-point architecture
`used to connect processors are used to connect clusters as
`Well.
`By using a cache coherence controller, multiple cluster
`systems can be built using processors that may not neces
`sarily support multiple clusters. Such a multiple cluster
`system can be built by using a cache coherence controller to
`represent non-local nodes in local transactions so that local
`nodes do not need to be aWare of the existence of nodes
`outside of the local cluster. More detail on the cache
`coherence controller Will be provided beloW.
`In a single cluster system, cache coherency can be main
`tained by sending all data access requests through a serial
`ization point. Any mechanism for ordering data access
`requests is referred to herein as a serialization point. One
`example of a serialization point is a memory controller.
`Various processors in the single cluster system send data
`access requests to the memory controller. In one example,
`the memory controller is con?gured to serialize or lock the
`data access requests so that only one data access request for
`a given memory line is alloWed at any particular time. If
`another processor attempts to access the same memory line,
`the data access attempt is blocked until the memory line is
`unlocked. The memory controller alloWs cache coherency to
`be maintained in a multiple processor, single cluster system.
`A serialization point can also be used in a multiple
`processor, multiple cluster system Where the processors in
`the various clusters share a single address space. By using a
`single address space, internal point-to-point links can be
`used to signi?cantly improve intercluster communication
`over traditional external netWork based multiple cluster
`systems. Various processors in various clusters send data
`access requests to a memory controller associated With a
`particular cluster such as a home cluster. The memory
`controller can similarly serialize all data requests from the
`different clusters. HoWever, a serialization point in a mul
`tiple processor, multiple cluster system may not be as
`ef?cient as a serialization point in a multiple processor,
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`single cluster system. That is, delay resulting from factors
`such as latency from transmitting betWeen clusters can
`adversely affect the response times for various data access
`requests. It should be noted that delay also results from the
`use of probes in a multiple processor environment.
`Although delay in intercluster transactions in an architec
`ture using a shared memory space is signi?cantly less than
`the delay in conventional message passing environments
`using external netWorks such as Ethernet or Token Ring,
`even minimal delay is a signi?cant factor. In some
`applications, there may be millions of data access requests
`from a processor in a fraction of a second. Any delay can
`adversely impact processor performance.
`According to various embodiments, speculative probing
`is used to increase the e?iciency of accessing data in a
`multiple processor, multiple cluster system. A mechanism
`for eliciting a response from a node to maintain cache
`coherency in a system is referred to herein as a probe. In one
`example, a mechanism for snooping a cache is referred to as
`a probe. A response to a probe can be directed to the source
`or target of the initiating request. Any mechanism for
`sending probes to nodes associated With cache blocks before
`a request associated With the probes is received at a serial
`ization point is referred to herein as speculative probing.
`According to various embodiments, the reordering or
`elimination of certain data access requests do not adversely
`affect cache coherency. That is, the end value in the cache is
`the same Whether or not snooping occurs. For example, a
`local processor attempting to read the cache data block can
`be alloWed to access the data block Without sending the
`requests through a serial

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket