(12) United States Patent
Glasco et al.

(10) Patent No.: US 7,155,525 B2
(45) Date of Patent: Dec. 26, 2006
`
`(54) TRANSACTION MANAGEMENT IN
`SYSTEMS HAVING MULTIPLE
`MULTI-PROCESSOR CLUSTERS
`
(75) Inventors: David Brian Glasco, Austin, TX (US);
    Carl Zeitler, Tomball, TX (US);
    Rajesh Kota, Austin, TX (US);
    Guru Prasadh, Austin, TX (US);
    Richard R. Oehler, Somers, NY (US)

(73) Assignee: Newisys, Inc., Austin, TX (US)
`
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 739 days.
`
(21) Appl. No.: 10/157,384

(22) Filed: May 28, 2002
`
`
(65) Prior Publication Data

US 2003/0233388 A1    Dec. 18, 2003
`
(51) Int. Cl.
    G06F 15/16 (2006.01)
    G06F 15/167 (2006.01)
`
(52) U.S. Cl. 709/229; 709/212; 709/216; 709/228

(58) Field of Classification Search: 709/201, 709/212-216, 227, 228, 229; 714/100
    See application file for complete search history.

(74) Attorney, Agent, or Firm: Beyer Weaver & Thomas LLP
`
`
(57) ABSTRACT
`
A multi-processor computer system is described in which address mapping, routing, and transaction identification mechanisms are provided which enable the interconnection of a plurality of multi-processor clusters, wherein the number of processors interconnected exceeds limited address, node identification, and transaction tag spaces associated with each of the individual clusters.
`
`30 Claims, 9 Drawing Sheets
`
`
`
`
`
`
[FIG. 1A: processing clusters 101, 103, 105, and 107 interconnected directly by point-to-point links 111a-f]

[FIG. 1B: processing clusters 121, 123, 125, and 127 each coupled to a switch 131 through point-to-point links 141a-d]
`
`
[FIG. 2: multi-processor cluster 200 with processors 202a-202d, memory banks 206a-206d, I/O switch 210, service processor 212, and interconnection controller 230]
`
`
[FIG. 3: interconnection controller 230 with protocol engine 305, pending buffer 309, coherent protocol interface 307, and non-coherent protocol interface 311]
`
[FIG. 4: local processor 202 with interface 402, ports 404a-404c, routing tables 406a-406c, and JTAG handshake registers 408]
`
`
[FIG. 5: memory mapping scheme in which each requesting quad's local map (nodes 0-4) is mapped through a global map of quads 0-3 into a target quad's local map]
`
`
`
[FIG. 6A: simplified block diagram of a four-cluster system (clusters 0-3), where L# denotes a link number and N# denotes a node number]
`
[FIG. 6B: combined routing table for the four-cluster system of FIG. 6A, comprising per-cluster local tables mapping destination nodes N0-N1 to link numbers and a global table mapping destination clusters C0-C3 to link numbers L0-L2; X marks an entry's own node or cluster and NA marks entries that do not apply]
`
`
`
[FIG. 7 (flowchart): receive locally generated transaction; allocate space in pending buffer; append global transaction tag and cluster ID and transmit transaction; receive incoming transmissions related to transaction; index incoming transmission in pending buffer using global tag; use local tag from pending buffer entry]
`
`
[FIG. 8 (flowchart): start; receive remotely generated transaction (802); assign local transaction tag (804); allocate space in pending buffer (806); insert entry with global and local tags in pending buffer (808); receive outgoing transmission (810); index outgoing transmission in pending buffer using local tag (812); use global tag for this and related subsequent transactions (814); end]
`
`
[FIG. 9: communications relating to an exemplary transaction in a multi-cluster system]
`
`TRANSACTION MANAGEMENT IN
`SYSTEMS HAVING MULTIPLE
`MULTI-PROCESSOR CLUSTERS
`
`BACKGROUND OF THE INVENTION
`
The present invention relates generally to multi-processor computer systems. More specifically, the present invention provides techniques for building computer systems having a plurality of multi-processor clusters.

A relatively new approach to the design of multi-processor systems replaces broadcast communication among processors with a point-to-point data transfer mechanism in which the processors communicate similarly to network nodes in a tightly-coupled computing system. That is, the processors are interconnected via a plurality of communication links and requests are transferred among the processors over the links according to routing tables associated with each processor. The intent is to increase the amount of information transmitted within a multi-processor platform per unit time.

One limitation associated with such an architecture is that the node ID address space associated with the point-to-point infrastructure is fixed, therefore allowing only a limited number of nodes to be interconnected. In addition, the infrastructure is flat, therefore allowing only a single level of mapping for address spaces and routing functions. It is therefore desirable to provide techniques by which computer systems employing such an infrastructure as a basic building block are not so limited.
`
`SUMMARY OF THE INVENTION
`
According to the present invention, a multi-processor system is provided in which a plurality of multi-processor clusters, each employing a point-to-point communication infrastructure with a fixed node ID space and flat request mapping functions, are interconnected using additional point-to-point links in such a manner as to enable more processors to be interconnected than would otherwise be possible with the local point-to-point architecture. The invention employs a mapping hierarchy to uniquely map various types of information from local, cluster-specific spaces to globally shared spaces.

Thus, the present invention provides an interconnection controller for use in a computer system having a plurality of processor clusters. Each cluster includes a plurality of local nodes and an instance of the interconnection controller interconnected by a local point-to-point architecture. Each cluster has a local transaction space associated therewith for uniquely identifying locally generated transactions within the cluster. The interconnection controller includes circuitry which is operable to uniquely map selected ones of locally generated transactions directed to others of the clusters to a global transaction space, and remotely generated transactions directed to the local nodes to the local transaction space. According to a specific embodiment, a computer system employing such an interconnection controller is also provided.

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1A and 1B are diagrammatic representations
`depicting systems having multiple clusters.
`
FIG. 2 is a diagrammatic representation of an exemplary cluster having a plurality of processors for use with specific embodiments of the present invention.

FIG. 3 is a diagrammatic representation of an exemplary interconnection controller for facilitating various embodiments of the present invention.

FIG. 4 is a diagrammatic representation of a local processor for use with various embodiments of the present invention.

FIG. 5 is a diagrammatic representation of a memory mapping scheme according to a particular embodiment of the invention.

FIG. 6A is a simplified block diagram of a four cluster system for illustrating a specific embodiment of the invention.

FIG. 6B is a combined routing table including routing information for the four cluster system of FIG. 6A.

FIGS. 7 and 8 are flowcharts illustrating transaction management in a multi-cluster system according to specific embodiments of the invention.

FIG. 9 is a diagrammatic representation of communications relating to an exemplary transaction in a multi-cluster system.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Multi-processor architectures having point-to-point communication among their processors are suitable for implementing specific embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. Well known process operations have not been described in detail in order not to unnecessarily obscure the present invention. Furthermore, the present application's reference to a particular singular entity includes the possibility that the methods and apparatus of the present invention can be implemented using more than one entity, unless the context clearly dictates otherwise.
FIG. 1A is a diagrammatic representation of one example of a multiple cluster, multiple processor system which may employ the techniques of the present invention. Each processing cluster 101, 103, 105, and 107 includes a plurality of processors. The processing clusters 101, 103, 105, and 107 are connected to each other through point-to-point links 111a-f. The multiple processors in the multiple cluster architecture shown in FIG. 1A share a global memory space. In this example, the point-to-point links 111a-f are internal system connections that are used in place of a traditional front-side bus to connect the multiple processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may support any point-to-point coherence protocol.
`
`
`
FIG. 1B is a diagrammatic representation of another example of a multiple cluster, multiple processor system that may employ the techniques of the present invention. Each processing cluster 121, 123, 125, and 127 is coupled to a switch 131 through point-to-point links 141a-d. It should be noted that using a switch and point-to-point links allows implementation with fewer point-to-point links when connecting multiple clusters in the system. A switch 131 can include a general purpose processor with a coherence protocol interface. According to various implementations, a multi-cluster system shown in FIG. 1A may be expanded using a switch 131 as shown in FIG. 1B.
FIG. 2 is a diagrammatic representation of a multiple processor cluster such as, for example, cluster 101 shown in FIG. 1A. Cluster 200 includes processors 202a-202d, one or more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-206d, point-to-point communication links 208a-208e, and a service processor 212. The point-to-point communication links are configured to allow interconnections between processors 202a-202d, I/O switch 210, and interconnection controller 230. The service processor 212 is configured to allow communications with processors 202a-202d, I/O switch 210, and interconnection controller 230 via a JTAG interface represented in FIG. 2 by links 214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects the rest of the system to I/O adapters 216 and 220, and to BIOS 204 for booting purposes.
According to specific embodiments, the service processor of the present invention has the intelligence to partition system resources according to a previously specified partitioning schema. The partitioning can be achieved through direct manipulation of routing tables associated with the system processors by the service processor, which is made possible by the point-to-point communication infrastructure. The routing tables can also be changed by execution of the BIOS code in one or more processors. The routing tables are used to control and isolate various system resources, the connections between which are defined therein.
The processors 202a-d are also coupled to an interconnection controller 230 through point-to-point links 232a-d. According to various embodiments and as will be described below in greater detail, interconnection controller 230 performs a variety of functions which enable the number of interconnected processors in the system to exceed the node ID space and mapping table limitations associated with each of a plurality of processor clusters. According to some embodiments, interconnection controller 230 performs a variety of other functions including the maintaining of cache coherency across clusters. Interconnection controller 230 can be coupled to similar controllers associated with other multi-processor clusters. It should be noted that there can be more than one such interconnection controller in one cluster. Interconnection controller 230 communicates with both processors 202a-d as well as remote clusters using a point-to-point protocol.
More generally, it should be understood that the specific architecture shown in FIG. 2 is merely exemplary and that embodiments of the present invention are contemplated having different configurations and resource interconnections, and a variety of alternatives for each of the system resources shown. However, for purposes of illustration, specific details of cluster 200 will be assumed. For example, most of the resources shown in FIG. 2 are assumed to reside on a single electronic assembly. In addition, memory banks 206a-206d may comprise double data rate (DDR) memory which is physically provided as dual in-line memory modules (DIMMs). I/O adapter 216 may be, for example, an ultra direct memory access (UDMA) controller or a small computer system interface (SCSI) controller which provides access to a permanent storage device. I/O adapter 220 may be an Ethernet card adapted to provide communications with a network such as, for example, a local area network (LAN) or the Internet. BIOS 204 may be any persistent memory like flash memory.

According to one embodiment, service processor 212 is a Motorola MPC855T microprocessor which includes integrated chipset functions, and interconnection controller 230 is an Application Specific Integrated Circuit (ASIC) supporting the local point-to-point coherence protocol. Interconnection controller 230 can also be configured to handle a non-coherent protocol to allow communication with I/O devices. In one embodiment, interconnection controller 230 is a specially configured programmable chip such as a programmable logic device or a field programmable gate array. In another embodiment, the interconnect controller 230 is an Application Specific Integrated Circuit (ASIC). In yet another embodiment, the interconnect controller 230 is a general purpose processor augmented with an ability to access and process interconnect packet traffic.
FIG. 3 is a diagrammatic representation of one example of an interconnection controller 230 for facilitating various aspects of the present invention. According to various embodiments, the interconnection controller includes a protocol engine 305 configured to handle packets such as probes and requests received from processors in various clusters of a multi-processor system. The functionality of the protocol engine 305 can be partitioned across several engines to improve performance. In one example, partitioning is done based on packet type (request, probe and response), direction (incoming and outgoing), or transaction flow (request flows, probe flows, etc.).

The protocol engine 305 has access to a pending buffer 309 that allows the interconnection controller to track transactions such as recent requests and probes and associate the transactions with specific processors. Transaction information maintained in the pending buffer 309 can include transaction destination nodes, the addresses of requests for subsequent collision detection and protocol optimizations, response information, tags, and state information. As will become clear, this functionality is leveraged to enable particular aspects of the present invention.
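Since the pending buffer is central to the tag-mapping scheme developed below, a minimal sketch in C of what one of its entries might hold is given here. The structure name, field widths, and buffer depth are illustrative assumptions; the patent describes the buffer's contents functionally rather than prescribing a layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pending buffer entry: one slot per in-flight
 * transaction tracked by the interconnection controller. */
struct pending_entry {
    bool     valid;        /* slot in use */
    uint64_t address;      /* request address, kept for collision
                            * detection and protocol optimizations */
    uint8_t  dest_node;    /* transaction destination node */
    uint16_t local_tag;    /* tag in the cluster-local tag space */
    uint16_t global_tag;   /* tag in the globally shared tag space */
    uint8_t  state;        /* protocol state for this transaction */
};

#define PENDING_SLOTS 64   /* assumed buffer depth */
static struct pending_entry pending_buffer[PENDING_SLOTS];
```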
The interconnection controller has a coherent protocol interface 307 that allows the interconnection controller to communicate with other processors in the cluster as well as external processor clusters. The interconnection controller may also include other interfaces such as a non-coherent protocol interface 311 for communicating with I/O devices (e.g., as represented in FIG. 2 by links 208c and 208d). According to various embodiments, each interface 307 and 311 is implemented either as a full crossbar or as separate receive and transmit units using components such as multiplexers and buffers. It should be noted that the interconnection controller 230 does not necessarily need to provide both coherent and non-coherent interfaces. It should also be noted that an interconnection controller 230 in one cluster can communicate with an interconnection controller 230 in another cluster.
According to various embodiments of the invention, processors 202a-202d are substantially identical. FIG. 4 is a simplified block diagram of such a processor 202 which includes an interface 402 having a plurality of ports 404a-404c and routing tables 406a-406c associated therewith. Each port 404 allows communication with other resources, e.g., processors or I/O devices, in the computer system via associated links, e.g., links 208a-208e of FIG. 2.
The infrastructure shown in FIG. 4 can be generalized as a point-to-point, distributed routing mechanism which comprises a plurality of segments interconnecting the system's processors according to any of a variety of topologies, e.g., ring, mesh, etc. Each of the endpoints of each of the segments is associated with a connected processor which has a unique node ID and a plurality of associated resources which it "owns," e.g., the memory and I/O to which it is connected.
The routing tables associated with each of the nodes in the distributed routing mechanism collectively represent the current state of interconnection among the computer system resources. Each of the resources (e.g., a specific memory range or I/O device) owned by any given node (e.g., processor) is represented in the routing table(s) associated with the node as an address. When a request arrives at a node, the requested address is compared to a two-level entry in the node's routing table identifying the appropriate node and link, i.e., given a particular address within a range of addresses, go to node x; and for node x use link y.
As shown in FIG. 4, processor 202 can conduct point-to-point communication with three other processors according to the information in the associated routing tables. According to a specific embodiment, routing tables 406a-406c comprise two-level tables, a first level associating the unique addresses of system resources (e.g., a memory bank) with a corresponding node (e.g., one of the processors), and a second level associating each node with the link (e.g., 208a-208e) to be used to reach the node from the current node.
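A minimal sketch of such a two-level lookup, assuming inclusive address ranges and a flat 3-bit node ID space; the type names and table representation are hypothetical, chosen only to illustrate the "go to node x, use link y" sequence.

```c
#include <stdint.h>

/* Level 1: address map entry associating an address range with the
 * node (e.g., processor) that owns that range. */
struct addr_map_entry {
    uint64_t base, limit;   /* inclusive address range */
    uint8_t  node;          /* owning node ID (3 bits in this protocol) */
};

/* Level 2: for each destination node, the link used to reach it
 * from the current node. */
static uint8_t link_for_node[8];    /* 3-bit node ID -> link number */

/* Two-level lookup: given an address, find node x, then link y. */
int route_request(const struct addr_map_entry *map, int entries,
                  uint64_t addr, uint8_t *node, uint8_t *link)
{
    for (int i = 0; i < entries; i++) {
        if (addr >= map[i].base && addr <= map[i].limit) {
            *node = map[i].node;            /* first level: node x  */
            *link = link_for_node[*node];   /* second level: link y */
            return 0;
        }
    }
    return -1;   /* address not mapped at this node */
}
```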
Processor 202 also has a set of JTAG handshake registers 408 which, among other things, facilitate communication between the service processor (e.g., service processor 212 of FIG. 2) and processor 202. That is, the service processor can write routing table entries to handshake registers 408 for eventual storage in routing tables 406a-406c. It should be understood that the processor architecture depicted in FIG. 4 is merely exemplary for the purpose of describing a specific embodiment of the present invention. For example, a fewer or greater number of ports and/or routing tables may be used to implement other embodiments of the invention.
As mentioned above, the basic protocol upon which the clusters in specific embodiments of the invention are based provides for a limited node ID space which, according to a particular implementation, is a 3-bit space, therefore allowing for the unique identification of only 8 nodes. That is, if this basic protocol is employed without the innovations represented by the present invention, only 8 nodes may be interconnected in a single cluster via the point-to-point infrastructure. To get around this limitation, the present invention introduces a hierarchical mechanism which preserves the single-layer identification scheme within particular clusters while enabling interconnection with and communication between other similarly situated clusters and processing nodes.
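One way to picture the hierarchy is a node handle that pairs a cluster identifier with the unchanged cluster-local 3-bit node ID. The sketch below assumes a 2-bit cluster field (enough for the four-cluster examples herein); the encoding is a hypothetical illustration of the idea, not a wire format taken from the patent.

```c
#include <stdint.h>

/* The 3-bit node ID keeps its meaning inside a cluster; a separate
 * cluster field extends identification across clusters. */
struct global_node_id {
    uint8_t cluster;      /* assumed 2-bit cluster field */
    uint8_t local_node;   /* 3-bit node ID, unique within its cluster */
};

/* Hypothetical flat encoding of the pair, e.g., for table indexing. */
static inline uint8_t encode_global_node(struct global_node_id n)
{
    return (uint8_t)(((n.cluster & 0x3) << 3) | (n.local_node & 0x7));
}
```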
According to a specific embodiment, one of the nodes in each multi-processor cluster is an interconnection controller, e.g., interconnection controller 230 of FIG. 2, which manages the hierarchical mapping of information thereby enabling multiple clusters to share a single memory address space while simultaneously allowing the processors within its cluster to operate and to interact with any processor in any cluster without "knowledge" of anything outside of their own cluster. The interconnection controller appears to its associated processor to be just another one of the processors or nodes in the cluster.
In the basic protocol, when a particular processor in a cluster generates a request, a set of address mapping tables are employed to map the request to one of the other nodes in the cluster. That is, each node in a cluster has a portion of a shared memory space with which it is associated. There are different types of address mapping tables for main memory, memory-mapped I/O, different types of I/O space, etc. These address mapping tables map the address identified in the request to a particular node in the cluster.
A set of routing tables are then employed to determine how to get from the requesting node to the node identified from the address mapping table. That is, as discussed above, each processor (i.e., cluster node) has associated routing tables which identify a particular link in the point-to-point infrastructure which may be used to transmit the request from the current node to the node identified from the address mapping tables. Although generally a node may correspond to one or a plurality of resources (including, for example, a processor), it should be noted that the terms node and processor are often used interchangeably herein. According to a particular implementation, a node comprises multiple sub-units, e.g., CPUs, memory controllers, I/O bridges, etc., each of which has a unit ID.
In addition, because individual transactions may be segmented in non-consecutive packets, each packet includes a unique transaction tag to identify the transaction with which the packet is associated with reference to the node which initiated the transaction. According to a specific implementation, a transaction tag identifies the source node (3-bit field), the source node unit (2-bit field), and a transaction ID (5-bit field).
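With the field widths just given (3-bit source node, 2-bit unit, 5-bit transaction ID), a tag fits in 10 bits. The following sketch packs and unpacks such a tag; the ordering of the fields within the word is an assumption for illustration.

```c
#include <stdint.h>

/* 10-bit transaction tag: 3-bit source node, 2-bit source unit,
 * 5-bit transaction ID (field order assumed). */
static inline uint16_t make_tag(uint8_t node, uint8_t unit, uint8_t id)
{
    return (uint16_t)(((node & 0x7)  << 7) |
                      ((unit & 0x3)  << 5) |
                       (id   & 0x1F));
}

static inline uint8_t tag_node(uint16_t tag) { return (tag >> 7) & 0x7;  }
static inline uint8_t tag_unit(uint16_t tag) { return (tag >> 5) & 0x3;  }
static inline uint8_t tag_id(uint16_t tag)   { return  tag       & 0x1F; }
```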
Thus, when a transaction is initiated at a particular node, the address mapping tables are employed to identify the destination node (and unit) which are then appended to the packet and used by the routing tables to identify the appropriate link(s) on which to route the packet. The source information is used by the destination node and any other nodes which are probed with the request to respond to the request appropriately.
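Putting the steps together, the per-request sequence might be sketched as follows, reusing the hypothetical route_request and make_tag helpers from the earlier examples; the packet layout, the table symbols, and the transmit_on_link hook are likewise assumed.

```c
#include <stdint.h>

/* As in the earlier sketches (hypothetical). */
struct addr_map_entry { uint64_t base, limit; uint8_t node; };
int route_request(const struct addr_map_entry *map, int entries,
                  uint64_t addr, uint8_t *node, uint8_t *link);
uint16_t make_tag(uint8_t node, uint8_t unit, uint8_t id);

extern const struct addr_map_entry addr_map[];   /* assumed tables */
extern const int addr_map_entries;

struct packet {                 /* assumed packet layout */
    uint64_t addr;
    uint8_t  dest_node;         /* appended destination */
    uint16_t tag;               /* carries source node/unit/ID */
};

int transmit_on_link(uint8_t link, const struct packet *p); /* assumed */

int send_request(uint64_t addr, uint8_t src_node, uint8_t src_unit,
                 uint8_t txn_id)
{
    uint8_t node, link;

    /* Address map: which node owns this address?  Routing table:
     * which link reaches that node from here? */
    if (route_request(addr_map, addr_map_entries, addr,
                      &node, &link) != 0)
        return -1;              /* address not mapped */

    struct packet p = {
        .addr      = addr,
        .dest_node = node,
        .tag       = make_tag(src_node, src_unit, txn_id),
    };
    return transmit_on_link(link, &p);
}
```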
According to a specific embodiment and as mentioned above, the interconnection controller in each cluster appears to the other processors in its cluster as just another processor in the cluster. However, the portion of the shared memory space associated with the interconnection controller actually encompasses the remainder of the globally shared memory space, i.e., the memory associated with all other clusters in the system. That is, from the perspective of the local processors in a particular cluster, the memory space associated with all of the other multi-processor clusters in the system is represented by the interconnection controller(s) in their own cluster.
According to an even more specific embodiment which will be described with reference to FIG. 5, each cluster has five nodes (e.g., as shown in FIG. 2) which include four processors 202a-d and an interconnection controller 230, each of which is represented by a 3-bit node ID which is unique within the cluster. As mentioned above, each processor (i.e., cluster node) may represent a number of sub-units including, for example, CPUs, memory controllers, etc.
An illustration of an exemplary address mapping scheme designed according to the invention and assuming such a cluster configuration is shown in FIG. 5. In the illustrated example, it is also assumed that the global memory space is shared by 4 such clusters, also referred to herein as quads (in that each contains four local processors). As will be understood, the number of clusters and nodes within each cluster may vary according to different embodiments.
To extend the address mapping function beyond a single cluster, each cluster maps its local memory space, i.e., the portion of the global memory space associated with the processors in that cluster, into a contiguous region while the remaining portion of the global memory space above and below this region is mapped to the local interconnection controller(s). The interconnection controller in each cluster maintains two mapping tables: a global map and a local map. The global map maps outgoing requests to remote clusters. The local map maps incoming requests from remote clusters to a particular node within the local cluster.
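A small sketch of this arrangement, under the four-quad assumption of FIG. 5: addresses inside the cluster's own contiguous region resolve through the local map, while everything above and below it falls to the interconnection controller, whose global map selects the owning quad. All names, the region representation, and the controller's node ID are illustrative assumptions.

```c
#include <stdint.h>

/* Assumed node ID of the interconnection controller within the
 * cluster (the fifth node alongside processors 0-3). */
#define IC_NODE 4

/* This cluster's contiguous slice of the global memory space. */
static uint64_t local_base, local_limit;

/* Global map held by the interconnection controller: the slice of
 * the global space owned by each of the four quads. */
static uint64_t quad_base[4], quad_limit[4];

uint8_t lookup_local_node(uint64_t addr);   /* assumed per-node map */

/* Local view: addresses inside the cluster's own region resolve to
 * a local node; everything above and below resolves to the
 * interconnection controller. */
uint8_t map_address(uint64_t addr)
{
    if (addr >= local_base && addr <= local_limit)
        return lookup_local_node(addr);
    return IC_NODE;
}

/* Controller's global map: route an outgoing request to the quad
 * that owns the address; the target quad's local map then picks a
 * node within that cluster. */
int target_quad(uint64_t addr)
{
    for (int q = 0; q < 4; q++)
        if (addr >= quad_base[q] && addr <= quad_limit[q])
            return q;
    return -1;
}
```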
Referring now to FIG. 5, e