(12) United States Patent
Glasco et al.

(10) Patent No.: US 7,251,698 B2
(45) Date of Patent: Jul. 31, 2007
(54) ADDRESS SPACE MANAGEMENT IN SYSTEMS HAVING MULTIPLE MULTI-PROCESSOR CLUSTERS

(75) Inventors: David Brian Glasco, Austin, TX (US); Carl Zeitler, Tomball, TX (US); Rajesh Kota, Austin, TX (US); Guru Prasadh, Austin, TX (US); Richard R. Oehler, Somers, NY (US)

(73) Assignee: Newisys, Inc., Austin, TX (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 834 days.

(21) Appl. No.: 10/157,409

(22) Filed: May 28, 2002

(65) Prior Publication Data
     US 2003/0225909 A1    Dec. 4, 2003
(51) Int. Cl.
     G06F 15/16 (2006.01)
(52) U.S. Cl. ................... 709/237; 709/220; 711/202
(58) Field of Classification Search ........... 709/237, 709/220; 711/202
     See application file for complete search history.
(56)                References Cited

           U.S. PATENT DOCUMENTS

5,796,605 A     8/1998  Hagersten
5,805,839 A     9/1998  Singhal
5,931,938 A     8/1999  Drogichen et al. ........... 712/15
6,047,332 A     4/2000  Viswanathan et al. ........ 709/245
6,085,295 A     7/2000  Ekanadham et al.
6,167,492 A    12/2000  Keller et al. ............. 711/154
6,209,065 B1    3/2001  Van Doren et al.
6,219,775 B1    4/2001  Wade et al. ................ 712/11
                  (Continued)

        FOREIGN PATENT DOCUMENTS

EP          0978781          2/2000

                  (Continued)

            OTHER PUBLICATIONS

Copy of International Search Report dated Jul. 30, 2004, from corresponding PCT Application No. PCT/US2003/034687 (9 pages).

                  (Continued)

Primary Examiner - William C. Vaughn, Jr.
Assistant Examiner - Thanh T. Nguyen
(74) Attorney, Agent, or Firm - Beyer Weaver & Thomas, LLP
           U.S. PATENT DOCUMENTS

4,667,287 A     5/1987  Allen et al. .............. 709/234
4,783,687 A    11/1988  Rees
5,166,674 A    11/1992  Baum et al. ............... 714/752
5,191,651 A     3/1993  Halim et al. .............. 709/250
5,197,130 A     3/1993  Chen et al. .................. 712/3
5,301,311 A     4/1994  Fushimi et al. ............. 714/23
_,___,___ A             Attanasio et al.
5,623,644 A     4/1997  Self ...................... 713/503
5,682,512 A * 10/1997   Tetrick ................... 711/202
5,692,123 A    11/1997  Logghe
5,781,187 A     7/1998  Gephardt et al.

(57)               ABSTRACT

A multi-processor computer system is described in which address mapping, routing, and transaction identification mechanisms are provided which enable the interconnection of a plurality of multi-processor clusters, wherein the number of processors interconnected exceeds limited address, node identification, and transaction tag spaces associated with each of the individual clusters.
23 Claims, 9 Drawing Sheets
[Representative drawing (Fig. 1A): processing clusters 101, 103, 105, and 107 interconnected by point-to-point links 111a-111f.]
US 7,251,698 B2
Page 2
           U.S. PATENT DOCUMENTS

6,226,671        5/2001  Hagersten et al. .......... 709/215
6,256,671        7/2001  Strentzsch et al. ......... 709/227
6,259,701        7/2001  Shur et al. ............... 370/401
6,331,983       12/2001  Haggerty et al. ........... 370/400
6,338,122        1/2002  Baumgartner et al. ........ 711/141
6,370,585        4/2002  Hagersten et al. .......... 709/238
6,385,705        5/2002  Keller et al. ............. 711/154
6,397,255        5/2002  Nurenberg et al. .......... 709/228
6,463,529       10/2002  Miller et al.
6,467,007       10/2002  Armstrong et al.
6,490,661       12/2002  Keller et al. ............. 711/150
6,578,071        6/2003  Hagersten et al. .......... 709/215
6,598,130        7/2003  Harris et al.
6,760,819        7/2004  Dhong et al. .............. 711/146
6,785,726        8/2004  Freeman et al.
6,820,174       11/2004  Vanderwiel
6,826,660       11/2004  Hagersten et al. .......... 711/153
6,847,993        1/2005  Novaes et al. ............. 709/221
6,856,621        2/2005  Artes ..................... 370/390
6,920,519        7/2005  Beukema et al. ............ 710/306
6,977,908       12/2005  De Azevedo et al.
2001/0014097 A1  8/2001  Beck et al. ............... 370/401
2001/0037435 A1 11/2001  Van Doren
2002/0004915 A1  1/2002  Fung ...................... 713/320
2002/0007463 A1  1/2002  Fung et al.
2002/0156888 A1 10/2002  Lee et al. ................ 709/224
2003/0225938 A1 12/2003  Glasco et al. ............. 713/375
2003/0233388 A1 12/2003  Glasco et al. ............. 718/101
2004/0098475 A1  5/2004  Zeitler et al. ............ 709/223

        FOREIGN PATENT DOCUMENTS

WO         WO 02/39242          5/2002
`OTHER PUBLICATIONS
`
`D. E. Culler, J. P. Singh, A. Gupta, “Parallel Computer Architec
`ture”, 1999 Morgan Kaufmann, San Francisco, CA USA
`XP002277658.
`Andrew Tanenbaum, “Computer Networks”, Computer Networks,
`London: Prentice Hall International, GB, 1996, pp. 345-403,
`XP002155220.
`1.03,
`HyperTransportTM I/O Link Speci?cation Revision
`HyperTransportTM Consortium, Oct. 10, 2001; Copyright © 2001
`HyperTransport Technology Consortium.
`Mailed Apr. 18, 2006 in US. Appl. No. 10/356,393, Filed Jan. 30,
`2003.
`European Search Report mailed Mar. 29, 2006 in US. Appl. No. 03
`778 0277-2211.
`
* cited by examiner
U.S. Patent    Jul. 31, 2007    Sheet 1 of 9    US 7,251,698 B2

Fig. 1A
[Diagram: processing clusters 101, 103, 105, and 107 interconnected by point-to-point links 111a-111f.]

Fig. 1B
[Diagram: processing clusters 121, 123, 125, and 127 each coupled to switch 131 through point-to-point links 141a-141d.]
Sheet 2 of 9

Fig. 2
[Diagram: multiple processor cluster 200 with processors 202a-202d, memory banks 206a-206d, BIOS 204, service processor 212, I/O switch 210, and interconnection controller 230 connected by point-to-point links.]
Sheet 3 of 9

Fig. 3
[Diagram: interconnection controller 230 with protocol engine 305, pending buffer 309, coherent interface 307, and non-coherent interface 311.]
Sheet 4 of 9

Fig. 4
[Diagram: local processor 202 with interface 402, ports 404a-404c, routing tables 406a-406c, JTAG handshake registers 408, and CPU and clock signal connections.]
Sheet 5 of 9

Fig. 5
[Diagram: memory mapping scheme with three map columns - Requesting Quad Local Map, Global Map, and Target Quad Local Map. Requesting-quad local maps (501-504) map Nodes 0-3 locally and Node 4 (the interconnection controller) to remote quads; global maps (505-508) relate the Node 4 entries to Quads 0-3; target-quad local maps (509-511) map incoming requests back to Nodes 0-3.]
Sheet 6 of 9

Fig. 6A
[Diagram: four-cluster system 600 with Clusters 0-3 (reference numerals 602-610), each containing nodes N0 and N1 and an interconnection controller, joined by numbered links. Legend: L# = link number; N# = node number.]

Fig. 6B
[Table: combined routing table for the four-cluster system of Fig. 6A. For each source node in Clusters 0-3, a Local Table gives the link (L0, X) to destination nodes N0 and N1, and a Global Table gives the link (L1, L2, X, NA) to destination clusters C0-C3.]
Sheet 7 of 9

Fig. 7 (flowchart):
  start
  Receive locally generated transaction (702)
  Allocate space in pending buffer (704)
  Append global transaction tag and cluster ID and transmit transaction
  Receive incoming transmissions related to transaction
  Index incoming transmission in pending buffer using global tag
  Local tag required? (712) - if so, use local tag from pending buffer entry (714)
  end
Sheet 8 of 9

Fig. 8 (flowchart):
  start
  Receive remotely generated transaction
  Assign local transaction tag
  Allocate space in pending buffer (806)
  Insert entry with global and local tags in pending buffer
  Receive outgoing transmission
  Index outgoing transmission in pending buffer using local tag
  Use global tag for this and related subsequent transactions
  end
Sheet 9 of 9

Fig. 9
[Diagram: communications relating to an exemplary transaction in a multi-cluster system.]
ADDRESS SPACE MANAGEMENT IN SYSTEMS HAVING MULTIPLE MULTI-PROCESSOR CLUSTERS

BACKGROUND OF THE INVENTION

The present invention relates generally to multi-processor computer systems. More specifically, the present invention provides techniques for building computer systems having a plurality of multi-processor clusters.

A relatively new approach to the design of multi-processor systems replaces broadcast communication among processors with a point-to-point data transfer mechanism in which the processors communicate similarly to network nodes in a tightly-coupled computing system. That is, the processors are interconnected via a plurality of communication links, and requests are transferred among the processors over the links according to routing tables associated with each processor. The intent is to increase the amount of information transmitted within a multi-processor platform per unit time.

One limitation associated with such an architecture is that the node ID address space associated with the point-to-point infrastructure is fixed, therefore allowing only a limited number of nodes to be interconnected. In addition, the infrastructure is flat, therefore allowing a single level of mapping for address spaces and routing functions. It is therefore desirable to provide techniques by which computer systems employing such an infrastructure as a basic building block are not so limited.
SUMMARY OF THE INVENTION

According to the present invention, a multi-processor system is provided in which a plurality of multi-processor clusters, each employing a point-to-point communication infrastructure with a fixed node ID space and flat request mapping functions, are interconnected using additional point-to-point links in such a manner as to enable more processors to be interconnected than would otherwise be possible with the local point-to-point architecture. The invention employs a mapping hierarchy to uniquely map various types of information from local, cluster-specific spaces to globally shared spaces.

Thus, the present invention provides an interconnection controller for use in a computer system having a plurality of processor clusters and a global address space associated therewith. Each cluster includes a plurality of local nodes and an instance of the interconnection controller interconnected by a local point-to-point architecture. Each cluster has a local address space associated therewith corresponding to a first portion of the global address space. The interconnection controller includes circuitry which is operable to map locally generated address information to others of the clusters in the global address space, and remotely generated address information to the local nodes in the local address space. According to a specific embodiment, the present invention also provides a computer system employing such an interconnection controller.

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B are diagrammatic representations depicting systems having multiple clusters.
FIG. 2 is a diagrammatic representation of an exemplary cluster having a plurality of processors for use with specific embodiments of the present invention.

FIG. 3 is a diagrammatic representation of an exemplary interconnection controller for facilitating various embodiments of the present invention.

FIG. 4 is a diagrammatic representation of a local processor for use with various embodiments of the present invention.

FIG. 5 is a diagrammatic representation of a memory mapping scheme according to a particular embodiment of the invention.

FIG. 6A is a simplified block diagram of a four-cluster system for illustrating a specific embodiment of the invention.

FIG. 6B is a combined routing table including routing information for the four-cluster system of FIG. 6A.

FIGS. 7 and 8 are flowcharts illustrating transaction management in a multi-cluster system according to specific embodiments of the invention.

FIG. 9 is a diagrammatic representation of communications relating to an exemplary transaction in a multi-cluster system.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Reference will now be made in detail to some specific embodiments of the invention, including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Multi-processor architectures having point-to-point communication among their processors are suitable for implementing specific embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. Well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention. Furthermore, the present application's reference to a particular singular entity includes the possibility that the methods and apparatus of the present invention can be implemented using more than one entity, unless the context clearly dictates otherwise.

FIG. 1A is a diagrammatic representation of one example of a multiple cluster, multiple processor system which may employ the techniques of the present invention. Each processing cluster 101, 103, 105, and 107 includes a plurality of processors. The processing clusters 101, 103, 105, and 107 are connected to each other through point-to-point links 111a-f. The multiple processors in the multiple cluster architecture shown in FIG. 1A share a global memory space. In this example, the point-to-point links 111a-f are internal system connections that are used in place of a traditional front-side bus to connect the multiple processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may support any point-to-point coherence protocol.

FIG. 1B is a diagrammatic representation of another example of a multiple cluster, multiple processor system that
`
`
may employ the techniques of the present invention. Each processing cluster 121, 123, 125, and 127 is coupled to a switch 131 through point-to-point links 141a-d. It should be noted that using a switch and point-to-point links allows implementation with fewer point-to-point links when connecting multiple clusters in the system. A switch 131 can include a general purpose processor with a coherence protocol interface. According to various implementations, a multi-cluster system shown in FIG. 1A may be expanded using a switch 131 as shown in FIG. 1B.

FIG. 2 is a diagrammatic representation of a multiple processor cluster such as, for example, cluster 101 shown in FIG. 1A. Cluster 200 includes processors 202a-202d, one or more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-206d, point-to-point communication links 208a-208e, and a service processor 212. The point-to-point communication links are configured to allow interconnections between processors 202a-202d, I/O switch 210, and interconnection controller 230. The service processor 212 is configured to allow communications with processors 202a-202d, I/O switch 210, and interconnection controller 230 via a JTAG interface represented in FIG. 2 by links 214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects the rest of the system to I/O adapters 216 and 220, and to BIOS 204 for booting purposes.

According to specific embodiments, the service processor of the present invention has the intelligence to partition system resources according to a previously specified partitioning schema. The partitioning can be achieved through direct manipulation of routing tables associated with the system processors by the service processor, which is made possible by the point-to-point communication infrastructure. The routing tables can also be changed by execution of the BIOS code in one or more processors. The routing tables are used to control and isolate various system resources, the connections between which are defined therein.

The processors 202a-d are also coupled to an interconnection controller 230 through point-to-point links 232a-d. According to various embodiments and as will be described below in greater detail, interconnection controller 230 performs a variety of functions which enable the number of interconnected processors in the system to exceed the node ID space and mapping table limitations associated with each of a plurality of processor clusters. According to some embodiments, interconnection controller 230 performs a variety of other functions including the maintaining of cache coherency across clusters. Interconnection controller 230 can be coupled to similar controllers associated with other multi-processor clusters. It should be noted that there can be more than one such interconnection controller in one cluster. Interconnection controller 230 communicates with both processors 202a-d as well as remote clusters using a point-to-point protocol.

More generally, it should be understood that the specific architecture shown in FIG. 2 is merely exemplary and that embodiments of the present invention are contemplated having different configurations and resource interconnections, and a variety of alternatives for each of the system resources shown. However, for purposes of illustration, specific details of cluster 200 will be assumed. For example, most of the resources shown in FIG. 2 are assumed to reside on a single electronic assembly. In addition, memory banks 206a-206d may comprise double data rate (DDR) memory which is physically provided as dual in-line memory modules (DIMMs). I/O adapter 216 may be, for example, an ultra direct memory access (UDMA) controller or a small
computer system interface (SCSI) controller which provides access to a permanent storage device. I/O adapter 220 may be an Ethernet card adapted to provide communications with a network such as, for example, a local area network (LAN) or the Internet. BIOS 204 may be any persistent memory like flash memory.

According to one embodiment, service processor 212 is a Motorola MPC855T microprocessor which includes integrated chipset functions, and interconnection controller 230 is an Application Specific Integrated Circuit (ASIC) supporting the local point-to-point coherence protocol. Interconnection controller 230 can also be configured to handle a non-coherent protocol to allow communication with I/O devices. In one embodiment, interconnection controller 230 is a specially configured programmable chip such as a programmable logic device or a field programmable gate array. In another embodiment, the interconnect controller 230 is an Application Specific Integrated Circuit (ASIC). In yet another embodiment, the interconnect controller 230 is a general purpose processor augmented with an ability to access and process interconnect packet traffic.

FIG. 3 is a diagrammatic representation of one example of an interconnection controller 230 for facilitating various aspects of the present invention. According to various embodiments, the interconnection controller includes a protocol engine 305 configured to handle packets such as probes and requests received from processors in various clusters of a multi-processor system. The functionality of the protocol engine 305 can be partitioned across several engines to improve performance. In one example, partitioning is done based on packet type (request, probe, and response), direction (incoming and outgoing), or transaction flow (request flows, probe flows, etc.).
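The partitioning described above can be sketched as a simple dispatch rule. This is an illustrative sketch only: the engine naming scheme and the `select_engine` function are hypothetical, not part of the patent.

```python
def select_engine(packet_type: str, direction: str) -> str:
    """Pick a protocol engine for a packet based on its type and direction.

    One engine per (type, direction) pair, so independent packet flows
    (request, probe, response; incoming, outgoing) can proceed in parallel.
    """
    if packet_type not in ("request", "probe", "response"):
        raise ValueError("unknown packet type: " + packet_type)
    if direction not in ("incoming", "outgoing"):
        raise ValueError("unknown direction: " + direction)
    return f"{packet_type}-{direction}-engine"
```

Partitioning by transaction flow instead of (type, direction) would simply key the dispatch on a flow identifier rather than on the packet header fields.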
The protocol engine 305 has access to a pending buffer 309 that allows the interconnection controller to track transactions such as recent requests and probes and associate the transactions with specific processors. Transaction information maintained in the pending buffer 309 can include transaction destination nodes, the addresses of requests for subsequent collision detection and protocol optimizations, response information, tags, and state information. As will become clear, this functionality is leveraged to enable particular aspects of the present invention.

The interconnection controller has a coherent protocol interface 307 that allows the interconnection controller to communicate with other processors in the cluster as well as external processor clusters. The interconnection controller may also include other interfaces such as a non-coherent protocol interface 311 for communicating with I/O devices (e.g., as represented in FIG. 2 by links 208c and 208d). According to various embodiments, each interface 307 and 311 is implemented either as a full crossbar or as separate receive and transmit units using components such as multiplexers and buffers. It should be noted that the interconnection controller 230 does not necessarily need to provide both coherent and non-coherent interfaces. It should also be noted that an interconnection controller 230 in one cluster can communicate with an interconnection controller 230 in another cluster.

According to various embodiments of the invention, processors 202a-202d are substantially identical. FIG. 4 is a simplified block diagram of such a processor 202 which includes an interface 402 having a plurality of ports 404a-404c and routing tables 406a-406c associated therewith. Each port 404 allows communication with other resources, e.g., processors or I/O devices, in the computer system via associated links, e.g., links 208a-208e of FIG. 2.
`
`
The infrastructure shown in FIG. 4 can be generalized as a point-to-point, distributed routing mechanism which comprises a plurality of segments interconnecting the system's processors according to any of a variety of topologies, e.g., ring, mesh, etc. Each of the endpoints of each of the segments is associated with a connected processor which has a unique node ID and a plurality of associated resources which it "owns," e.g., the memory and I/O to which it's connected.

The routing tables associated with each of the nodes in the distributed routing mechanism collectively represent the current state of interconnection among the computer system resources. Each of the resources (e.g., a specific memory range or I/O device) owned by any given node (e.g., processor) is represented in the routing table(s) associated with the node as an address. When a request arrives at a node, the requested address is compared to a two-level entry in the node's routing table identifying the appropriate node and link, i.e., given a particular address within a range of addresses, go to node x; and for node x use link y.

As shown in FIG. 4, processor 202 can conduct point-to-point communication with three other processors according to the information in the associated routing tables. According to a specific embodiment, routing tables 406a-406c comprise two-level tables, a first level associating the unique addresses of system resources (e.g., a memory bank) with a corresponding node (e.g., one of the processors), and a second level associating each node with the link (e.g., 208a-208e) to be used to reach the node from the current node.
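The two-level lookup described above ("for this address range, go to node x; for node x, use link y") can be sketched as follows. The address ranges, node numbers, and link names are illustrative assumptions; the patent does not specify concrete table formats.

```python
# Level 1: address ranges -> owning node (illustrative ranges).
ADDRESS_MAP = [
    (0x0000_0000, 0x3FFF_FFFF, 0),  # node 0 owns this memory range
    (0x4000_0000, 0x7FFF_FFFF, 1),  # node 1
    (0x8000_0000, 0xFFFF_FFFF, 4),  # node 4 (e.g., an interconnection controller)
]

# Level 2: destination node -> outgoing link from the current node.
LINK_MAP = {0: "link_a", 1: "link_b", 4: "link_c"}

def route(address: int) -> tuple[int, str]:
    """Return (destination node, link to use) for a requested address."""
    for lo, hi, node in ADDRESS_MAP:
        if lo <= address <= hi:
            return node, LINK_MAP[node]
    raise ValueError(f"address {address:#x} not mapped")
```

Each node would hold its own copy of the second-level table, since the link used to reach a given node differs per hop.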
Processor 202 also has a set of JTAG handshake registers 408 which, among other things, facilitate communication between the service processor (e.g., service processor 212 of FIG. 2) and processor 202. That is, the service processor can write routing table entries to handshake registers 408 for eventual storage in routing tables 406a-406c. It should be understood that the processor architecture depicted in FIG. 4 is merely exemplary for the purpose of describing a specific embodiment of the present invention. For example, a fewer or greater number of ports and/or routing tables may be used to implement other embodiments of the invention.

As mentioned above, the basic protocol upon which the clusters in specific embodiments of the invention are based provides for a limited node ID space which, according to a particular implementation, is a 3-bit space, therefore allowing for the unique identification of only 8 nodes. That is, if this basic protocol is employed without the innovations represented by the present invention, only 8 nodes may be interconnected in a single cluster via the point-to-point infrastructure. To get around this limitation, the present invention introduces a hierarchical mechanism which preserves the single-layer identification scheme within particular clusters while enabling interconnection with and communication between other similarly situated clusters and processing nodes.
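A minimal sketch of the arithmetic behind this limitation and the hierarchical workaround, assuming a four-cluster system (hence 2 cluster bits). The `global_id` helper and its field layout are hypothetical illustrations of qualifying a 3-bit local ID with a cluster ID; they are not the patent's actual mechanism.

```python
NODE_ID_BITS = 3                            # basic protocol: 3-bit node ID
MAX_NODES_PER_CLUSTER = 1 << NODE_ID_BITS   # only 8 uniquely identifiable nodes

def global_id(cluster_id: int, node_id: int, cluster_bits: int = 2) -> int:
    """Combine a cluster ID with a cluster-local 3-bit node ID into a
    globally unique identifier, leaving the local ID scheme untouched."""
    if not 0 <= node_id < MAX_NODES_PER_CLUSTER:
        raise ValueError("node ID exceeds the 3-bit local space")
    if not 0 <= cluster_id < (1 << cluster_bits):
        raise ValueError("cluster ID out of range")
    return (cluster_id << NODE_ID_BITS) | node_id
```

The key property is that within a cluster, nodes continue to use only the low 3 bits; the cluster qualifier is applied at the cluster boundary.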
According to a specific embodiment, one of the nodes in each multi-processor cluster is an interconnection controller, e.g., interconnection controller 230 of FIG. 2, which manages the hierarchical mapping of information, thereby enabling multiple clusters to share a single memory address space while simultaneously allowing the processors within its cluster to operate and to interact with any processor in any cluster without "knowledge" of anything outside of their own cluster. The interconnection controller appears to its associated processor to be just another one of the processors or nodes in the cluster.
In the basic protocol, when a particular processor in a cluster generates a request, a set of address mapping tables are employed to map the request to one of the other nodes in the cluster. That is, each node in a cluster has a portion of a shared memory space with which it is associated. There are different types of address mapping tables for main memory, memory-mapped I/O, different types of I/O space, etc. These address mapping tables map the address identified in the request to a particular node in the cluster.

A set of routing tables are then employed to determine how to get from the requesting node to the node identified from the address mapping table. That is, as discussed above, each processor (i.e., cluster node) has associated routing tables which identify a particular link in the point-to-point infrastructure which may be used to transmit the request from the current node to the node identified from the address mapping tables. Although generally a node may correspond to one or a plurality of resources (including, for example, a processor), it should be noted that the terms node and processor are often used interchangeably herein. According to a particular implementation, a node comprises multiple sub-units, e.g., CPUs, memory controllers, I/O bridges, etc., each of which has a unit ID.

In addition, because individual transactions may be segmented in non-consecutive packets, each packet includes a unique transaction tag to identify the transaction with which the packet is associated with reference to the node which initiated the transaction. According to a specific implementation, a transaction tag identifies the source node (3-bit field), the source node unit (2-bit field), and a transaction ID (5-bit field).
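The 3 + 2 + 5 = 10-bit tag described above can be sketched as a pair of bit-packing helpers. Only the field widths come from the text; the ordering of the fields within the tag is an assumption for illustration.

```python
def pack_tag(src_node: int, src_unit: int, txn_id: int) -> int:
    """Pack source node (3 bits), source unit (2 bits), and transaction
    ID (5 bits) into a single 10-bit transaction tag."""
    if not (0 <= src_node < 8 and 0 <= src_unit < 4 and 0 <= txn_id < 32):
        raise ValueError("field out of range")
    return (src_node << 7) | (src_unit << 5) | txn_id

def unpack_tag(tag: int) -> tuple[int, int, int]:
    """Recover (source node, source unit, transaction ID) from a tag."""
    return (tag >> 7) & 0x7, (tag >> 5) & 0x3, tag & 0x1F
```

With these widths, at most 32 transactions per source unit can be outstanding at once, which is why the pending-buffer tag translation of FIGS. 7 and 8 is needed once transactions cross cluster boundaries.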
Thus, when a transaction is initiated at a particular node, the address mapping tables are employed to identify the destination node (and unit) which are then appended to the packet and used by the routing tables to identify the appropriate link(s) on which to route the packet. The source information is used by the destination node and any other nodes which are probed with the request to respond to the request appropriately.

According to a specific embodiment and as mentioned above, the interconnection controller in each cluster appears to the other processors in its cluster as just another processor in the cluster. However, the portion of the shared memory space associated with the interconnection controller actually encompasses the remainder of the globally shared memory space, i.e., the memory associated with all other clusters in the system. That is, from the perspective of the local processors in a particular cluster, the memory space associated with all of the other multi-processor clusters in the system is represented by the interconnection controller(s) in their own cluster.

According to an even more specific embodiment which will be described with reference to FIG. 5, each cluster has five nodes (e.g., as shown in FIG. 2) which include four processors 202a-d and an interconnection controller 230, each of which is represented by a 3-bit node ID which is unique within the cluster. As mentioned above, each processor (i.e., cluster node) may represent a number of sub-units including, for example, CPUs, memory controllers, etc.

An illustration of an exemplary address mapping scheme designed according to the invention and assuming such a cluster configuration is shown in FIG. 5. In the illustrated example, it is also assumed that the global memory space is shared by 4 such clusters, also referred to herein as quads (in that each contains four local processors). As will be understood, the number of clusters and nodes within each cluster may vary according to different embodiments.
To extend the address mapping function beyond a single cluster, each cluster maps its local memory space, i.e., the portion of the global memory space associated with the processors in that cluster, into a contiguous region, while the remaining portion of the global memory space above and below this region is mapped to the local interconnection controller(s). The interconnection controller in each cluster maintains two mapping tables: a global map and a local map. The global map maps outgoing requests to remote clusters. The local map maps incoming requests from remote clusters to a particular node within the local cluster.
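A minimal sketch of the two maps, assuming illustrative address ranges: outgoing requests that fall outside the local contiguous region consult the global map, while incoming requests from remote clusters consult the local map. The range values, cluster names, and node names here are hypothetical.

```python
def lookup(ranges, address):
    """Find the target whose [lo, hi] range contains the address."""
    for lo, hi, target in ranges:
        if lo <= address <= hi:
            return target
    raise ValueError(f"unmapped address {address:#x}")

# Assume the local cluster owns the contiguous region
# [0x4000_0000, 0x7FFF_FFFF]; everything above and below is remote.
GLOBAL_MAP = [                  # outgoing: remote address -> remote cluster
    (0x0000_0000, 0x3FFF_FFFF, "cluster_0"),
    (0x8000_0000, 0xBFFF_FFFF, "cluster_2"),
    (0xC000_0000, 0xFFFF_FFFF, "cluster_3"),
]
LOCAL_MAP = [                   # incoming: local address -> local node
    (0x4000_0000, 0x4FFF_FFFF, "node_0"),
    (0x5000_0000, 0x5FFF_FFFF, "node_1"),
    (0x6000_0000, 0x6FFF_FFFF, "node_2"),
    (0x7000_0000, 0x7FFF_FFFF, "node_3"),
]
```

Because the remote portion of the global space lies entirely above and below the local region, the local cluster's processors need only one or two map entries pointing at the interconnection controller to cover all remote memory.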
Referring now to FIG. 5, each local cluster has a local memory map (501-504), which maps the local memory space (i.e., the contiguous portion of the global memory space associated with the local processors) into the respective nodes and maps all remote memory spaces (i.e., the remainder of the global memory space) into one or two map entries associated with the local interconnection controller(s), e.g., Node 4 of Quad 3. Each node in the local cluster has a copy of the local map. The interconnection controller in each cluster also maintains a global map (505-508) relating these remote memory spaces with each of the other clusters in the system. Each interconnection controller uses its copy of the local map (509-511) to map requests received from remote cluster