`
`(12)
`
`United States Patent
`Glasco et a].
`
`(10) Patent N0.:
`(45) Date of Patent:
`
`US 7,281,055 B2
`Oct. 9, 2007
`
`(54) ROUTING MECHANISMS IN SYSTEMS
`HAVING MULTIPLE MULTI-PROCESSOR
`
`2/1995 Hunter et a1.
`5,394,555 A
`5,561,768 A 10/1996 Smith ........................ .. 712/13
`
`CLUSTERS
`
`5,623,644 A
`
`4/1997 Self et a1. ................. .. 713/503
`
`(75) Inventors: David Brian Glasco’ Austin’ TX (Us);
`Carl Zeitler, Tomball, TX (US);
`Rajesh Kota’ Austin’ TX (Us); Guru
`Prasadh, Austin, TX (US); Richard R.
`Oehler, Somers, NY (US)
`
`_
`_
`(73) Ass1gnee: NeWisys, Inc., Austin, TX (US)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 797 days.
`
`(21) Appl. No.: 10/156,893
`
`(22) Filed:
`
`May 28, 2002
`
`(65)
`
`Prior Publication Data
`Us 2003/0225938 A1
`Dec. 4, 2003
`
`(51) Int- Cl‘
`
`(200601)
`G06F 15/16
`(2006-01)
`G06F 13/00
`(52) US. Cl. .................... .. 709/232; 709/217; 709/238;
`711/148
`(58) Field of Classi?cation Search ...... .. 709/2l7i2l9,
`
`7090307245’ 2127216; 3709897393;
`712/10415; 711/150,1474148
`See application ?le for complete search history.
`References Cited
`
`(56)
`
`*
`
`.
`
`5,682,512 A 10/1997 Tetrlck ..................... .. 711/202
`5’692’l23 A “H997 Logghe
`5,781,187 A
`7/1998 Gephardt et a1.
`5,796,605 A *
`8/1998 Hagersten .................... .. 700/5
`5,805,839 A *
`9/1998 Singhal .................... .. 710/112
`
`5,819,075 A 10/1998 Forsmo
`
`(Continued)
`
`FOREIGN PATENT DOCUMENTS
`
`EP
`
`0978781
`
`2/2000
`
`(Continued)
`
`OTHER PUBLICATIONS
`
`International Search Report dated Jul. 30, 2004, from corresponding
`PCT Application N0. PCT?JS2003/034687 (9 pages).
`
`(Continued)
`
`.
`
`. i
`
`23313055572”, 133131175312?
`y’ g 1
`y
`(57)
`ABSTRACT
`
`.
`
`LLP
`
`U'S' PATENT DOCUMENTS
`4,667,287 A *
`5/1987 Allen et a1. ............... .. 709/234
`4,783,657 A 11/1988 Bouchard et a1.
`4,783,687 A 11/1988 Rees
`5 J25 ’081 A
`6/1992 Chiba
`5166674 A * 11/1992 Baum et a1. .............. .. 714/752
`’
`’
`.
`5,191,651 A
`3/1993 Hallm et a1.
`709/250
`5,197,130 A *
`3/1993 Chen et a1. .................. .. 712/3
`5,301,311 A
`4/1994 Fushimi et a1. ............. .. 714/23
`5,371,852 A * 12/1994 Attanasio et a1. ......... .. 709/245
`
`A multi-processor computer system is described in Which
`address mapping, routing, and transaction identi?cation
`mechanisms are provided Which enable the interconnection
`of a plurality of multi-processor clusters, Wherein the num
`ber of rocessors interconnected exceeds limited address
`~p -
`-
`-
`-
`s
`node identi?cation, and transaction tag spaces associated
`With each of the individual Clusters
`'
`
`18 Claims, 9 Drawing Sheets
`
`[-200
`
`'
`51351251115
`
`[r1 1 m
`?
`
`V
`élusterlOQ
`
`111a_\-:;__
`
`j: [11“
`
`Processor 202;:
`
`Controller 230
`
`Proclssor 2021,
`
`Proig‘slliazzll
`
`214:
`
`I
`33:25
`
`f-lllb
`1’
`
`I
`Cluster 107
`
`Processor 202s
`
`Processor 202d
`
`I/O Switch 2l0
`
`
`
U.S. PATENT DOCUMENTS

5,822,531 A   10/1998 Gorczyca et al.
5,931,938 A    8/1999 Drogichen et al. ........ 712/15
5,940,870 A *  8/1999 Chi et al. ........ 711/206
6,003,075 A   12/1999 Arimilli et al.
6,018,791 A    1/2000 Arimilli et al.
6,038,651 A    3/2000 VanHuben et al.
6,047,332 A    4/2000 Viswanathan et al. ........ 709/245
6,065,053 A    5/2000 Nouri et al.
6,067,603 A    5/2000 Carpenter et al.
6,085,295 A *  7/2000 Ekanadham et al. ........ 711/145
6,097,707 A    8/2000 Hodzic et al. ........ 370/321
6,151,663 A   11/2000 Pawlowski et al.
6,167,492 A   12/2000 Keller et al. ........ 711/154
6,192,451 B1   2/2001 Arimilli et al.
6,209,065 B1 * 3/2001 Van Doren et al. ........ 711/150
6,219,775 B1   4/2001 Wade et al. ........ 712/11
6,226,671 B1   5/2001 Hagersten et al. ........ 709/215
6,256,671 B1 * 7/2001 Strentzsch et al. ........ 709/227
6,259,701 B1 * 7/2001 Shur et al. ........ 370/401
6,330,643 B1  12/2001 Arimilli et al.
6,331,983 B1 * 12/2001 Haggerty et al. ........ 370/400
6,334,172 B1  12/2001 Arimilli et al.
6,338,122 B1 * 1/2002 Baumgartner et al. ........ 711/141
6,349,091 B1   2/2002 Li ........ 370/238
6,370,585 B1 * 4/2002 Hagersten et al. ........ 709/238
6,377,640 B2   4/2002 Trans
6,378,029 B1 * 4/2002 Venkitakrishnan et al. ........ 710/317
6,385,174 B1   5/2002 Li ........ 370/252
6,385,705 B1   5/2002 Keller et al. ........ 711/154
6,397,255 B1 * 5/2002 Nurenberg et al. ........ 709/228
6,405,289 B1   6/2002 Arimilli et al.
6,463,529 B1  10/2002 Miller et al.
6,467,007 B1  10/2002 Armstrong et al.
6,490,661 B1 * 12/2002 Keller et al. ........ 711/150
6,542,926 B2   4/2003 Zalewski et al.
6,553,439 B1   4/2003 Greger et al.
6,578,071 B2   6/2003 Hagersten et al. ........ 709/215
6,598,130 B2 * 7/2003 Harris et al. ........ 711/147
6,601,165 B2   7/2003 Morrison et al.
6,615,319 B2   9/2003 Khare et al.
6,631,447 B1  10/2003 Morioka et al.
6,633,960 B1  10/2003 Kessler et al.
6,675,253 B1   1/2004 Brinkmann et al.
6,687,751 B1   2/2004 Wils et al. ........ 709/230
6,690,757 B1   2/2004 Bunton et al.
6,704,842 B1   3/2004 Janakiraman et al.
6,718,552 B1   4/2004 Goode ........ 725/95
6,751,698 B1   6/2004 Deneroff et al.
6,754,782 B2   6/2004 Arimilli et al. ........ 711/144
6,760,819 B2 * 7/2004 Dhong et al. ........ 711/146
6,772,226 B1   8/2004 Bommareddy et al. ........ 709/245
6,785,726 B1 * 8/2004 Freeman et al. ........ 709/227
6,820,174 B2  11/2004 Vanderwiel
6,826,660 B2  11/2004 Hagersten et al. ........ 711/153
6,842,857 B2   1/2005 Lee et al.
6,847,993 B1   1/2005 Novaes et al. ........ 709/221
6,854,069 B2   2/2005 Kampe et al.
6,856,621 B1 * 2/2005 Artes ........ 370/390
6,865,595 B2   3/2005 Glasco et al.
6,920,519 B1   7/2005 Beukema et al. ........ 710/306
6,977,908 B2  12/2005 De Azevedo et al.
7,010,617 B2   3/2006 Kampe et al.
7,043,569 B1   5/2006 Chou et al.
7,103,636 B2   9/2006 Glasco et al.
7,103,823 B2   9/2006 Nemawarkar et al.
7,117,419 B2 * 10/2006 Nemawarkar et al. ........ 714/758
7,155,525 B2  12/2006 Glasco et al.
7,222,262 B2   5/2007 Prasadh et al.
2001/0014097 A1 *  8/2001 Beck et al. ........ 370/401
2001/0037435 A1   11/2001 Van Doren
2002/0004886 A1 *  1/2002 Hagersten et al. ........ 711/141
2002/0004915 A1    1/2002 Fung ........ 713/320
2002/0007463 A1    1/2002 Fung et al.
2002/0052914 A1    5/2002 Zalewski et al.
2002/0083149 A1    6/2002 Van Huben et al.
2002/0083243 A1    6/2002 Van Huben et al.
2002/0087811 A1    7/2002 Khare et al.
2002/0156888 A1 * 10/2002 Lee et al. ........ 709/224
2002/0157035 A1   10/2002 Wong et al.
2002/0174168 A1   11/2002 Beukema et al.
2003/0149844 A1 *  8/2003 Duncan et al. ........ 711/141
2003/0196047 A1   10/2003 Kessler et al.
2003/0212873 A1   11/2003 Lee et al.
2003/0225909 A1   12/2003 Glasco et al. ........ 709/245
2003/0233388 A1   12/2003 Glasco et al. ........ 718/101
2004/0073755 A1    4/2004 Webb et al.
2004/0098475 A1    5/2004 Zeitler et al. ........ 709/223
`
FOREIGN PATENT DOCUMENTS

WO    WO 02/39242    5/2002
`
OTHER PUBLICATIONS

HyperTransport™ I/O Link Specification Revision 1.03, HyperTransport™ Consortium, Oct. 10, 2001, Copyright © 2001 HyperTransport Technology Consortium.
D. E. Culler, J. P. Singh, A. Gupta, "Parallel Computer Architecture", 1999, Morgan Kaufmann, San Francisco, CA, USA, XP002277658.
Andrew Tanenbaum, "Computer Networks", London: Prentice Hall International, GB, 1996, pp. 345-403, XP002155220.
Bossen et al., "Fault-tolerant design of the IBM pSeries 690 system using POWER4 processor technology", IBM J. Res. and Dev., vol. 46, No. 1, Jan. 2002.
Notice of Allowance mailed Nov. 30, 2004, from U.S. Appl. No. 10/157,340.
U.S. Office Action mailed Nov. 30, 2004, from U.S. Appl. No. 10/157,388.
Notice of Allowance mailed May 11, 2005, from U.S. Appl. No. 10/157,388.
Notice of Allowance mailed Aug. 27, 2005, from U.S. Appl. No. 10/157,388.
U.S. Office Action mailed Sep. 21, 2005, from U.S. Appl. No. 10/157,384.
U.S. Office Action mailed Jan. 30, 2006, from U.S. Appl. No. 10/157,384.
Notice of Allowance mailed Jun. 15, 2006, from U.S. Appl. No. 10/157,384.
Supplemental Notice of Allowance mailed Sep. 25, 2006, from U.S. Appl. No. 10/157,384.
U.S. Office Action mailed Sep. 21, 2005, from U.S. Appl. No. 10/157,409.
U.S. Office Action mailed Feb. 7, 2006, from U.S. Appl. No. 10/157,409.
U.S. Office Action mailed Jul. 6, 2006, from U.S. Appl. No. 10/157,409.
U.S. Office Action mailed Dec. 4, 2006, from U.S. Appl. No. 10/300,408.
U.S. Office Action mailed May 18, 2007, from U.S. Appl. No. 10/300,408.
U.S. Office Action mailed Apr. 18, 2006, from U.S. Appl. No. 10/356,393.
U.S. Office Action mailed Oct. 4, 2006, from U.S. Appl. No. 10/356,393.
U.S. Office Action mailed May 21, 2007, from U.S. Appl. No. 10/356,393.
U.S. Office Action mailed Aug. 14, 2006, from U.S. Appl. No. 10/635,700.
Notice of Allowance mailed Jan. 24, 2007, from U.S. Appl. No. 10/635,700.
U.S. Office Action mailed Dec. 12, 2005, from U.S. Appl. No. 10/635,884.
`
`
`
Notice of Allowance mailed Apr. 19, 2006, from U.S. Appl. No. 10/635,884.
U.S. Office Action mailed Feb. 2, 2006, from U.S. Appl. No. 10/635,705.
Notice of Allowance mailed Aug. 18, 2006, from U.S. Appl. No. 10/635,705.
U.S. Office Action mailed Nov. 15, 2005, from U.S. Appl. No. 10/635,793.
Notice of Allowance mailed May 23, 2006, from U.S. Appl. No. 10/635,793.
European Search Report mailed Sep. 6, 2005, from European Application No. 03778027.7.
European Search Report mailed Mar. 29, 2006, from European Application No. 03778027.7.
European Search Report mailed Dec. 19, 2006, from European Application No. 03778027.7.
International Search Report dated Jan. 12, 2005, from PCT Application No. PCT/US2004/022935.
Written Opinion dated Jan. 12, 2005, from PCT Application No. PCT/US2004/022935.
Office Action mailed Apr. 18, 2006 in U.S. Appl. No. 10/356,393, filed Jan. 30, 2003.
European Search Report mailed Mar. 29, 2006 in Application No. 03 778 027.7-2211.
`
`* cited by examiner
`
`
`
[Sheet 1 of 9 — FIG. 1A: processing clusters 101, 103, 105, and 107 interconnected by point-to-point links 111a-f. FIG. 1B: processing clusters 121, 123, 125, and 127 each coupled to switch 131 through point-to-point links 141a-d.]
`
`
`
[Sheet 2 of 9 — FIG. 2: multi-processor cluster 200 including processors 202a-202d, memory banks 206a-206d, point-to-point links 208a-208e, I/O switch 210, service processor 212, JTAG links 214a-214f, and interconnection controller 230.]
`
`
`
`
`
`
[Sheet 3 of 9 — FIG. 3: interconnection controller including protocol engine 305, coherent protocol interface 307, pending buffer 309, and non-coherent protocol interface 311.]
`
`
`
[Sheet 4 of 9 — FIG. 4: local processor 202 including interface 402 with ports 404a-404c, routing tables 406a-406c, and JTAG handshake registers 408.]
`
`
`
[Sheet 5 of 9 — FIG. 5: memory mapping scheme relating local memory maps (Node 0 through Node 4) to a global memory map (Quad 0 through Quad 3); reference numerals 502, 503, 504, 506, 508, 509, 510, and 511.]
`
`
`
[Sheet 6 of 9 — FIG. 6A: simplified block diagram 600 of a four-cluster system (Clusters 0-3; "L#" denotes link number, "N#" denotes node number). FIG. 6B: combined routing table comprising a local table (destination nodes N0, N1 mapped to links) and a global table (destination clusters C0-C3 mapped to links L0-L2) for source nodes 0 and 1 of each cluster.]
`
`
`
[Sheet 7 of 9 — FIG. 7, flowchart:]

start
→ Receive locally generated transaction (702)
→ Allocate space in pending buffer (704)
→ Append global transaction tag and cluster ID and transmit transaction (706)
→ Receive incoming transmissions related to transaction (708)
→ Index incoming transmission in pending buffer using global tag (710)
→ Local tag required? (712)
→ Use local tag from pending buffer entry (714)
end
`
`
`
[Sheet 8 of 9 — FIG. 8, flowchart:]

start
→ Receive remotely generated transaction
→ Assign local transaction tag
→ Allocate space in pending buffer (806)
→ Insert entry with global and local tags in pending buffer
→ Receive outgoing transmission
→ Index outgoing transmission in pending buffer using local tag
→ Use global tag for this and related subsequent transactions
end
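The flowcharts of FIGS. 7 and 8 can be read as complementary halves of one tag-translation scheme. The following sketch is not part of the patent; it is a minimal, hypothetical Python model (all names invented) of how a pending buffer might associate a global tag (cluster ID plus local tag) with a local tag:

```python
# Hypothetical sketch of the FIG. 7 / FIG. 8 tag handling (names invented).
import itertools

class PendingBuffer:
    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self._local_tags = itertools.count()   # source of fresh local tags
        self.entries = {}                      # global_tag -> entry

    def send_local(self, local_tag, payload):
        """FIG. 7: allocate an entry, append global tag (cluster ID + local
        tag), and return the global tag that travels with the transaction."""
        global_tag = (self.cluster_id, local_tag)
        self.entries[global_tag] = {"local_tag": local_tag, "payload": payload}
        return global_tag

    def receive_remote(self, global_tag, payload):
        """FIG. 8: assign a fresh local tag and record both tags in the
        pending buffer; the local tag is used inside this cluster."""
        local_tag = next(self._local_tags)
        self.entries[global_tag] = {"local_tag": local_tag, "payload": payload}
        return local_tag

    def local_tag_for(self, global_tag):
        """FIG. 7, steps 710-714: index by global tag, recover the local tag."""
        return self.entries[global_tag]["local_tag"]
```

In this model, FIG. 7's steps 704-706 correspond to send_local, and FIG. 8's local-tag assignment and dual-tag entry correspond to receive_remote; indexing by global tag on incoming transmissions recovers the local tag when one is required.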
`
`
`
[Sheet 9 of 9 — FIG. 9: communications relating to an exemplary transaction in a multi-cluster system.]
`
`
`
`ROUTING MECHANISMS IN SYSTEMS
`HAVING MULTIPLE MULTI-PROCESSOR
`CLUSTERS
`
`BACKGROUND OF THE INVENTION
`
The present invention relates generally to multi-processor computer systems. More specifically, the present invention provides techniques for building computer systems having a plurality of multi-processor clusters.

A relatively new approach to the design of multi-processor systems replaces broadcast communication among processors with a point-to-point data transfer mechanism in which the processors communicate similarly to network nodes in a tightly-coupled computing system. That is, the processors are interconnected via a plurality of communication links and requests are transferred among the processors over the links according to routing tables associated with each processor. The intent is to increase the amount of information transmitted within a multi-processor platform per unit time.

One limitation associated with such an architecture is that the node ID address space associated with the point-to-point infrastructure is fixed, therefore allowing only a limited number of nodes to be interconnected. In addition, the infrastructure is flat, therefore allowing a single level of mapping for address spaces and routing functions. It is therefore desirable to provide techniques by which computer systems employing such an infrastructure as a basic building block are not so limited.
`
`SUMMARY OF THE INVENTION
`
According to the present invention, a multi-processor system is provided in which a plurality of multi-processor clusters, each employing a point-to-point communication infrastructure with a fixed node ID space and flat request mapping functions, are interconnected using additional point-to-point links in such a manner as to enable more processors to be interconnected than would otherwise be possible with the local point-to-point architecture. The invention employs a mapping hierarchy to uniquely map various types of information from local, cluster-specific spaces to globally shared spaces.

Thus, the present invention provides an interconnection controller for use in a computer system having a plurality of processor clusters interconnected by a plurality of global links. Each cluster includes a plurality of local nodes and an instance of the interconnection controller interconnected by a plurality of local links. The interconnection controller includes circuitry which is operable to map locally generated transmissions directed to others of the clusters to the global links, and remotely generated transmissions directed to the local nodes to the local links. According to a specific embodiment, a computer system employing such an interconnection controller is also provided.

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIGS. 1A and 1B are diagrammatic representations depicting systems having multiple clusters.

FIG. 2 is a diagrammatic representation of an exemplary cluster having a plurality of processors for use with specific embodiments of the present invention.
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
FIG. 3 is a diagrammatic representation of an exemplary interconnection controller for facilitating various embodiments of the present invention.

FIG. 4 is a diagrammatic representation of a local processor for use with various embodiments of the present invention.

FIG. 5 is a diagrammatic representation of a memory mapping scheme according to a particular embodiment of the invention.

FIG. 6A is a simplified block diagram of a four cluster system for illustrating a specific embodiment of the invention.

FIG. 6B is a combined routing table including routing information for the four cluster system of FIG. 6A.

FIGS. 7 and 8 are flowcharts illustrating transaction management in a multi-cluster system according to specific embodiments of the invention.

FIG. 9 is a diagrammatic representation of communications relating to an exemplary transaction in a multi-cluster system.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Multi-processor architectures having point-to-point communication among their processors are suitable for implementing specific embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. Well known process operations have not been described in detail in order not to unnecessarily obscure the present invention. Furthermore, the present application's reference to a particular singular entity includes the possibility that the methods and apparatus of the present invention can be implemented using more than one entity, unless the context clearly dictates otherwise.
FIG. 1A is a diagrammatic representation of one example of a multiple cluster, multiple processor system which may employ the techniques of the present invention. Each processing cluster 101, 103, 105, and 107 includes a plurality of processors. The processing clusters 101, 103, 105, and 107 are connected to each other through point-to-point links 111a-f. The multiple processors in the multiple cluster architecture shown in FIG. 1A share a global memory space. In this example, the point-to-point links 111a-f are internal system connections that are used in place of a traditional front-side bus to connect the multiple processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may support any point-to-point coherence protocol.

FIG. 1B is a diagrammatic representation of another example of a multiple cluster, multiple processor system that may employ the techniques of the present invention. Each processing cluster 121, 123, 125, and 127 is coupled to a switch 131 through point-to-point links 141a-d. It should be
`
`
`
noted that using a switch and point-to-point links allows implementation with fewer point-to-point links when connecting multiple clusters in the system. A switch 131 can include a general purpose processor with a coherence protocol interface. According to various implementations, a multi-cluster system shown in FIG. 1A may be expanded using a switch 131 as shown in FIG. 1B.
FIG. 2 is a diagrammatic representation of a multiple processor cluster such as, for example, cluster 101 shown in FIG. 1A. Cluster 200 includes processors 202a-202d, one or more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-206d, point-to-point communication links 208a-208e, and a service processor 212. The point-to-point communication links are configured to allow interconnections between processors 202a-202d, I/O switch 210, and interconnection controller 230. The service processor 212 is configured to allow communications with processors 202a-202d, I/O switch 210, and interconnection controller 230 via a JTAG interface represented in FIG. 2 by links 214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects the rest of the system to I/O adapters 216 and 220, and to BIOS 204 for booting purposes.

According to specific embodiments, the service processor of the present invention has the intelligence to partition system resources according to a previously specified partitioning schema. The partitioning can be achieved through direct manipulation of routing tables associated with the system processors by the service processor which is made possible by the point-to-point communication infrastructure. The routing tables can also be changed by execution of the BIOS code in one or more processors. The routing tables are used to control and isolate various system resources, the connections between which are defined therein.

The processors 202a-d are also coupled to an interconnection controller 230 through point-to-point links 232a-d. According to various embodiments and as will be described below in greater detail, interconnection controller 230 performs a variety of functions which enable the number of interconnected processors in the system to exceed the node ID space and mapping table limitations associated with each of a plurality of processor clusters. According to some embodiments, interconnection controller 230 performs a variety of other functions including the maintaining of cache coherency across clusters. Interconnection controller 230 can be coupled to similar controllers associated with other multiprocessor clusters. It should be noted that there can be more than one such interconnection controller in one cluster. Interconnection controller 230 communicates with both processors 202a-d as well as remote clusters using a point-to-point protocol.

More generally, it should be understood that the specific architecture shown in FIG. 2 is merely exemplary and that embodiments of the present invention are contemplated having different configurations and resource interconnections, and a variety of alternatives for each of the system resources shown. However, for purposes of illustration, specific details of cluster 200 will be assumed. For example, most of the resources shown in FIG. 2 are assumed to reside on a single electronic assembly. In addition, memory banks 206a-206d may comprise double data rate (DDR) memory which is physically provided as dual in-line memory modules (DIMMs). I/O adapter 216 may be, for example, an ultra direct memory access (UDMA) controller or a small computer system interface (SCSI) controller which provides access to a permanent storage device. I/O adapter 220 may be an Ethernet card adapted to provide communications with
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
a network such as, for example, a local area network (LAN) or the Internet. BIOS 204 may be any persistent memory like flash memory.
According to one embodiment, service processor 212 is a Motorola MPC855T microprocessor which includes integrated chipset functions, and interconnection controller 230 is an Application Specific Integrated Circuit (ASIC) supporting the local point-to-point coherence protocol. Interconnection controller 230 can also be configured to handle a non-coherent protocol to allow communication with I/O devices. In one embodiment, interconnection controller 230 is a specially configured programmable chip such as a programmable logic device or a field programmable gate array. In another embodiment, the interconnect controller 230 is an Application Specific Integrated Circuit (ASIC). In yet another embodiment, the interconnect controller 230 is a general purpose processor augmented with an ability to access and process interconnect packet traffic.
FIG. 3 is a diagrammatic representation of one example of an interconnection controller 230 for facilitating various aspects of the present invention. According to various embodiments, the interconnection controller includes a protocol engine 305 configured to handle packets such as probes and requests received from processors in various clusters of a multiprocessor system. The functionality of the protocol engine 305 can be partitioned across several engines to improve performance. In one example, partitioning is done based on packet type (request, probe and response), direction (incoming and outgoing), or transaction flow (request flows, probe flows, etc.).

The protocol engine 305 has access to a pending buffer 309 that allows the interconnection controller to track transactions such as recent requests and probes and associate the transactions with specific processors. Transaction information maintained in the pending buffer 309 can include transaction destination nodes, the addresses of requests for subsequent collision detection and protocol optimizations, response information, tags, and state information. As will become clear, this functionality is leveraged to enable particular aspects of the present invention.
The interconnection controller has a coherent protocol interface 307 that allows the interconnection controller to communicate with other processors in the cluster as well as external processor clusters. The interconnection controller may also include other interfaces such as a non-coherent protocol interface 311 for communicating with I/O devices (e.g., as represented in FIG. 2 by links 208c and 208d). According to various embodiments, each interface 307 and 311 is implemented either as a full crossbar or as separate receive and transmit units using components such as multiplexers and buffers. It should be noted that the interconnection controller 230 does not necessarily need to provide both coherent and non-coherent interfaces. It should also be noted that an interconnection controller 230 in one cluster can communicate with an interconnection controller 230 in another cluster.
According to various embodiments of the invention, processors 202a-202d are substantially identical. FIG. 4 is a simplified block diagram of such a processor 202 which includes an interface 402 having a plurality of ports 404a-404c and routing tables 406a-406c associated therewith. Each port 404 allows communication with other resources, e.g., processors or I/O devices, in the computer system via associated links, e.g., links 208a-208e of FIG. 2.

The infrastructure shown in FIG. 4 can be generalized as a point-to-point, distributed routing mechanism which comprises a plurality of segments interconnecting the system's
`
`
`
processors according to any of a variety of topologies, e.g., ring, mesh, etc. Each of the endpoints of each of the segments is associated with a connected processor which has a unique node ID and a plurality of associated resources which it "owns," e.g., the memory and I/O to which it's connected.

The routing tables associated with each of the nodes in the distributed routing mechanism collectively represent the current state of interconnection among the computer system resources. Each of the resources (e.g., a specific memory range or I/O device) owned by any given node (e.g., processor) is represented in the routing table(s) associated with the node as an address. When a request arrives at a node, the requested address is compared to a two-level entry in the node's routing table identifying the appropriate node and link, i.e., given a particular address within a range of addresses, go to node x; and for node x use link y.

As shown in FIG. 4, processor 202 can conduct point-to-point communication with three other processors according to the information in the associated routing tables. According to a specific embodiment, routing tables 406a-406c comprise two-level tables, a first level associating the unique addresses of system resources (e.g., a memory bank) with a corresponding node (e.g., one of the processors), and a second level associating each node with the link (e.g., 208a-208e) to be used to reach the node from the current node.
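As an illustration only (the patent defines no code; the data layout and names below are hypothetical), the two-level lookup just described, address to node and then node to link, might be modeled as:

```python
# Hypothetical sketch of a two-level routing lookup (names invented).
# Level 1: address ranges -> owning node.  Level 2: node -> outgoing link.

ADDRESS_TO_NODE = [            # level 1: (start, end, node)
    (0x0000, 0x3FFF, 0),
    (0x4000, 0x7FFF, 1),
    (0x8000, 0xBFFF, 2),
]

NODE_TO_LINK = {0: "link_a", 1: "link_b", 2: "link_c"}   # level 2

def route(address):
    """Given an address within a range, go to node x; for node x, use link y."""
    for start, end, node in ADDRESS_TO_NODE:
        if start <= address <= end:
            return node, NODE_TO_LINK[node]
    raise ValueError("address not mapped to any node")
```

The two dictionaries mirror the two table levels in the text: the first resolves which node owns the requested address, the second resolves which link reaches that node from the current node.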
Processor 202 also has a set of JTAG handshake registers 408 which, among other things, facilitate communication between the service processor (e.g., service processor 212 of FIG. 2) and processor 202. That is, the service processor can write routing table entries to handshake registers 408 for eventual storage in routing tables 406a-406c. It should be understood that the processor architecture depicted in FIG. 4 is merely exemplary for the purpose of describing a specific embodiment of the present invention. For example, a fewer or greater number of ports and/or routing tables may be used to implement other embodiments of the invention.

As mentioned above, the basic protocol upon which the clusters in specific embodiments of the invention are based provides for a limited node ID space which, according to a particular implementation, is a 3-bit space, therefore allowing for the unique identification of only 8 nodes. That is, if this basic protocol is employed without the innovations represented by the present invention, only 8 nodes may be interconnected in a single cluster via the point-to-point infrastructure. To get around this limitation, the present invention introduces a hierarchical mechanism which preserves the single-layer identification scheme within particular clusters while enabling interconnection with and communication between other similarly situated clusters and processing nodes.
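To make the arithmetic concrete (a hypothetical sketch, not the patent's mechanism in detail): a 3-bit node ID distinguishes only 2^3 = 8 nodes, and pairing it with a cluster ID preserves the local space within each cluster while expanding the total addressable set:

```python
# Hypothetical illustration of the 3-bit node ID limit (names invented).
NODE_ID_BITS = 3                       # fixed by the basic protocol

def global_node_id(cluster_id, local_node_id):
    """Pair a cluster ID with a local node ID: the 3-bit local space is
    preserved inside each cluster, and the system as a whole can address
    cluster_count * 8 nodes."""
    if not 0 <= local_node_id < 2 ** NODE_ID_BITS:
        raise ValueError("local node ID exceeds the 3-bit space")
    return (cluster_id << NODE_ID_BITS) | local_node_id
```

Within any one cluster the low 3 bits behave exactly as the flat scheme, which is the sense in which the single-layer identification scheme is preserved.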
According to a specific embodiment, one of the nodes in each multi-processor cluster is an interconnection controller, e.g., interconnection controller 230 of FIG. 2, which manages the hierarchical mapping of information thereby enabling multiple clusters to share a single memory address space while simultaneously allowing the processors within its cluster to operate and to interact with any processor in any cluster without "knowledge" of anything outside of their own cluster. The interconnection controller appears to its associated processor to be just another one of the processors or nodes in the cluster.

In the basic protocol, when a particular processor in a cluster generates a request, a set of address mapping tables are employed to map the request to one of the other nodes in the cluster. That is, each node in a cluster has a portion of
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
a shared memory space with which it is associated. There are different types of address mapping tables for main memory, memory-mapped I/O, different types of I/O space, etc. These address mapping tables map the address identified in the request to a particular node in the cluster.

A set of routing tables are then employed to determine how to get from the requesting node to the node identified from the address mapping table. That is, as discussed above, each processor (i.e., cluster node) has associated routing tables which identify a particular link in the point-to-point infrastructure which may be used to transmit the request from the current node to the node identified from the address mapping tables. Although generally a node may correspond to one or a plurality of resources (including, for example, a