`
`Attorney Docket No. NWISPO13
`
`PATENT APPLICATION
`
`ROUTING MECHANISMS IN SYSTEMS
`
`HAVING MULTIPLE MULTI-PROCESSOR CLUSTERS
`
`Inventors:
`
`David Brian Glasco
`10337 Ember Glen Drive
`
`Austin, Texas 78726
`United States citizen
`
`Carl Zeitler
`
`11835 Brush Canyon Drive
`Tomball, TX 77375
`United States citizen
`
Rajesh Kota
12320 Alameda Trace Circle #1107
`
`Austin, Texas 78727
`India citizen
`
`Guru Prasadh
11111 Callanish Park Drive
`
`Austin, Texas 78750
`United States citizen
`
`Richard R. Oehler
`
`8 Bonny Drive
`Somers, New York 10589
`United States citizen
`
`Assignee:
`
`Newisys, Inc.
`A Delaware corporation
`
`BEYER WEAVER & THOMAS, LLP
`P.O. Box 778
`
`Berkeley, California 94704-0778
`(510) 843-6200
`
`
`PATENT
`
Attorney Docket No. NWISPO13
`
`ROUTING MECHANISMS IN SYSTEMS
`HAVING MULTIPLE MULTI-PROCESSOR CLUSTERS
`
`BACKGROUND OF THE INVENTION
`
The present invention relates generally to multi-processor computer systems. More
`
`specifically, the present invention provides techniques for building computer systems having
`
a plurality of multi-processor clusters.
`
A relatively new approach to the design of multi-processor systems replaces
`
broadcast communication among processors with a point-to-point data transfer mechanism
`
`in which the processors communicate similarly to network nodes in a tightly-coupled
`
`computing system. That is, the processors are interconnected via a plurality of
`
`communication links and requests are transferred among the processors over the links
`
`according to routing tables associated with each processor. The intent is to increase the
`
amount of information transmitted within a multi-processor platform per unit time.
`
`
`
`
`One limitation associated with such an architecture is that the node ID address space
`
associated with the point-to-point infrastructure is fixed, therefore allowing only a limited
`
`number of nodes to be interconnected. In addition, the infrastructure is flat, therefore
`
`allowing a single level of mapping for address spaces and routing functions. It is therefore
`
`
`desirable to provide techniques by which computer systems employing such an infrastructure
`
`as a basic building block are not so limited.
`
`
`
SUMMARY OF THE INVENTION
`
`According to the present invention, a multi-processor system is provided in which a
`
`plurality of multi-processor clusters, each employing a point-to-point communication
`
infrastructure with a fixed node ID space and flat request mapping functions, are
`
interconnected using additional point-to-point links in such a manner as to enable more
`
`processors to be interconnected than would otherwise be possible with the local point-to-
`
`point architecture. The invention employs a mapping hierarchy to uniquely map various
`
`types of information from local, cluster-specific spaces to globally shared spaces.
`
`Thus, the present invention provides an interconnection controller for use in a
`
computer system having a plurality of processor clusters interconnected by a plurality of
`
`global links. Each cluster includes a plurality of local nodes and an instance of the
`
`interconnection controller interconnected by a plurality of local links. The interconnection
`
`controller includes circuitry which is operable to map locally generated transmissions
`
`directed to others of the clusters to the global links, and remotely generated transmissions
`
`directed to the local nodes to the local links. According to a specific embodiment, a
`
`computer system employing such an interconnection controller is also provided.
`
A further understanding of the nature and advantages of the present invention may be
`
`realized by reference to the remaining portions of the specification and the drawings.
`
`
`
`
`
`
BRIEF DESCRIPTION OF THE DRAWINGS
`
`Figs. 1A and 1B are diagrammatic representations depicting systems having multiple
`
`clusters.
`
`Fig. 2 is a diagrammatic representation of an exemplary cluster having a plurality of
`
`processors for use with specific embodiments of the present invention.
`
`Fig. 3 is a diagrammatic representation of an exemplary interconnection controller
`
`for facilitating various embodiments of the present invention.
`
`Fig. 4 is a diagrammatic representation of a local processor for use with various
`
`embodiments of the present invention.
`
`Fig. 5 is a diagrammatic representation of a memory mapping scheme according to a
`
`particular embodiment of the invention.
`
`Fig. 6A is a simplified block diagram of a four cluster system for illustrating a
`
`specific embodiment of the invention.
`
`Fig. 6B is a combined routing table including routing information for the four cluster
`
`
`
`
`system of Fig. 6A.
`
Figs. 7 and 8 are flowcharts illustrating transaction management in a multi-cluster
`
`system according to specific embodiments of the invention.
`
`Fig. 9 is a diagrammatic representation of communications relating to an exemplary
`
`transaction in a multi-cluster system.
`
`
`
`
`DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
`
`Reference will now be made in detail to some specific embodiments of the invention
`
`including the best modes contemplated by the inventors for carrying out the invention.
`
`Examples of these specific embodiments are illustrated in the accompanying drawings.
`
`While the invention is described in conjunction with these specific embodiments, it will be
`
`understood that it is not intended to limit the invention to the described embodiments. On
`
`the contrary, it is intended to cover alternatives, modifications, and equivalents as may be
`
`included within the spirit and scope of the invention as defined by the appended claims.
`
Multi-processor architectures having point-to-point communication among their processors
`
`are suitable for implementing specific embodiments of the present invention. In the
`
`following description, numerous specific details are set forth in order to provide a thorough
`
`understanding of the present invention. The present invention may be practiced without
`
`some or all of these specific details. Well known process operations have not been described
`
`in detail in order not to unnecessarily obscure the present invention. Furthermore, the
`
present application's reference to a particular singular entity includes the possibility that the
`
`methods and apparatus of the present invention can be implemented using more than one
`
`entity, unless the context clearly dictates otherwise.
`
`Fig. 1A is a diagrammatic representation of one example of a multiple cluster,
`
`multiple processor system which may employ the techniques of the present invention. Each
`
`processing cluster 101, 103, 105, and 107 includes a plurality of processors. The processing
`
clusters 101, 103, 105, and 107 are connected to each other through point-to-point links
`
111a-f. The multiple processors in the multiple cluster architecture shown in Fig. 1A share a
`
global memory space. In this example, the point-to-point links 111a-f are internal system
`
`connections that are used in place of a traditional front-side bus to connect the multiple
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may
`
support any point-to-point coherence protocol.
`
`Fig. 1B is a diagrammatic representation of another example of a multiple cluster,
`
`multiple processor system that may employ the techniques of the present invention. Each
`
processing cluster 121, 123, 125, and 127 is coupled to a switch 131 through point-to-point
`
links 141a-d. It should be noted that using a switch and point-to-point links allows
`
`implementation with fewer point-to-point links when connecting multiple clusters in the
`
`system. A switch 131 can include a general purpose processor with a coherence protocol
`
`interface. According to various implementations, a multi-cluster system shown in Fig. 1A
`
`may be expanded using a switch 131 as shown in Fig. 1B.
`
`Fig. 2 is a diagrammatic representation of a multiple processor cluster such as, for
`
example, cluster 101 shown in Fig. 1A. Cluster 200 includes processors 202a-202d, one or
`
`more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-
`
`206d, point-to-point communication links 208a-208e, and a service processor 212. The
`
`point-to-point communication links are configured to allow interconnections between
`
processors 202a-202d, I/O switch 210, and interconnection controller 230. The service
`
processor 212 is configured to allow communications with processors 202a-202d, I/O switch
`
`210, and interconnection controller 230 via a JTAG interface represented in Fig. 2 by links
`
`214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects
`
`the rest of the system to I/O adapters 216 and 220, and to BIOS 204 for booting purposes.
`
`According to specific embodiments, the service processor of the present invention
`
`has the intelligence to partition system resources according to a previously specified
`
`partitioning schema. The partitioning can be achieved through direct manipulation of
`
`routing tables associated with the system processors by the service processor which is made
`
`possible by the point-to-point communication infrastructure. The routing tables can also be
`
`
`
`
`changed by execution of the BIOS code in one or more processors. The routing tables are
`
used to control and isolate various system resources, the connections between which are
`
`defined therein.
`
The processors 202a-d are also coupled to an interconnection controller 230 through
`
`point-to-point links 232a-d. According to various embodiments and as will be described
`
`below in greater detail, interconnection controller 230 performs a variety of functions which
`
`enable the number of interconnected processors in the system to exceed the node ID space
`
`and mapping table limitations associated with each of a plurality of processor clusters.
`
`According to some embodiments, interconnection controller 230 performs a variety of other
`
`functions including the maintaining of cache coherency across clusters. Interconnection
`
`controller 230 can be coupled to similar controllers associated with other multiprocessor
`
`clusters. It should be noted that there can be more than one such interconnection controller
`
`in one cluster. Interconnection controller 230 communicates with both processors 202a-d as
`
well as remote clusters using a point-to-point protocol.
`
`More generally, it should be understood that the specific architecture shown in Fig. 2
`
`is merely exemplary and that embodiments of the present invention are contemplated having
`
different configurations and resource interconnections, and a variety of alternatives for each
`
`of the system resources shown. However, for purpose of illustration, specific details of
`
`
`
`
cluster 200 will be assumed. For example, most of the resources shown in Fig. 2 are
`
`
assumed to reside on a single electronic assembly. In addition, memory banks 206a-206d
`
may comprise double data rate (DDR) memory which is physically provided as dual in-line
`
`memory modules (DIMMS). I/O adapter 216 may be, for example, an ultra direct memory
`
`access (UDMA) controller or a small computer system interface (SCSI) controller which
`
`provides access to a permanent storage device. I/O adapter 220 may be an Ethernet card
`
`
`
`adapted to provide communications with a network such as, for example, a local area
`
`network (LAN) or the Internet. BIOS 204 may be any persistent memory like flash memory.
`
`According to one embodiment, service processor 212 is a Motorola MPC855T
`
`microprocessor which includes integrated chipset functions, and interconnection controller
`
`230 is an Application Specific Integrated Circuit (ASIC) supporting the local point-to-point
`
`coherence protocol. Interconnection controller 230 can also be configured to handle a non-
`
`coherent protocol to allow communication with I/O devices. In one embodiment,
`
`interconnection controller 230 is a specially configured programmable chip such as a
`
`programmable logic device or a field programmable gate array. In another embodiment, the
`
`interconnect controller 230 is an Application Specific Integrated Circuit (ASIC). In yet
`
`another embodiment, the interconnect controller 230 is a general purpose processor
`
`augmented with an ability to access and process interconnect packet traffic.
`
`Fig. 3 is a diagrammatic representation of one example of an interconnection
`
controller 230 for facilitating various aspects of the present invention. According to various
`
`embodiments, the interconnection controller includes a protocol engine 305 configured to
`
`handle packets such as probes and requests received from processors in various clusters of a
`
`multiprocessor system. The functionality of the protocol engine 305 can be partitioned
`
`across several engines to improve performance. In one example, partitioning is done based
`
`on packet type (request, probe and response), direction (incoming and outgoing), or
`
transaction flow (request flows, probe flows, etc.).
`
`The protocol engine 305 has access to a pending buffer 309 that allows the
`
`interconnection controller to track transactions such as recent requests and probes and
`
`associate the transactions with specific processors. Transaction information maintained in
`
`the pending buffer 309 can include transaction destination nodes, the addresses of requests
`
`for subsequent collision detection and protocol optimizations, response information, tags,
`
`
`
`
`
`
`and state information. As will become clear, this functionality is leveraged to enable
`
`particular aspects of the present invention.
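
Purely for purposes of illustration, a per-transaction record of this kind may be modeled as in the following simplified Python sketch; the field names and the dictionary-based buffer are hypothetical and do not describe any particular hardware implementation.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PendingEntry:
        """One outstanding transaction tracked by the interconnection controller."""
        source_node: int            # node that initiated the transaction
        destination_node: int       # node targeted by the request
        address: int                # request address, kept for collision detection
        tag: int                    # transaction tag carried by the packets
        state: str = "outstanding"  # protocol state for the transaction
        response: Optional[int] = None  # response information, filled in on completion

    class PendingBuffer:
        """Minimal pending buffer keyed by transaction tag."""
        def __init__(self):
            self.entries = {}

        def allocate(self, entry: PendingEntry):
            self.entries[entry.tag] = entry

        def collides(self, address: int) -> bool:
            # A new request to an address that is already outstanding is a collision.
            return any(e.address == address for e in self.entries.values())

        def retire(self, tag: int) -> PendingEntry:
            return self.entries.pop(tag)

    buf = PendingBuffer()
    buf.allocate(PendingEntry(source_node=2, destination_node=4, address=0x1000, tag=0x2B1))
    assert buf.collides(0x1000) and not buf.collides(0x2000)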
`
`The interconnection controller has a coherent protocol interface 307 that allows the
`
`interconnection controller to communicate with other processors in the cluster as well as
`
`external processor clusters. The interconnection controller may also include other interfaces
`
`such as a non-coherent protocol interface 311 for communicating with I/O devices (e.g., as
`
represented in Fig. 2 by links 208c and 208d). According to various embodiments, each
`
`interface 307 and 311 is implemented either as a full crossbar or as separate receive and
`
`transmit units using components such as multiplexers and buffers. It should be noted that
`
`the interconnection controller 230 does not necessarily need to provide both coherent and
`
`non-coherent interfaces. It should also be noted that an interconnection controller 230 in one
`
`cluster can communicate with an interconnection controller 230 in another cluster.
`
According to various embodiments of the invention, processors 202a-202d are
`
`substantially identical. Fig. 4 is a simplified block diagram of such a processor 202 which
`
`includes an interface 402 having a plurality of ports 404a-404c and routing tables 406a-406c
`
associated therewith. Each port 404 allows communication with other resources, e.g.,
`
processors or I/O devices, in the computer system via associated links, e.g., links 208a-208e
of Fig. 2.
`
`The infrastructure shown in Fig. 4 can be generalized as a point-to-point, distributed
`
routing mechanism which comprises a plurality of segments interconnecting the system's
`
`processors according to any of a variety of topologies, e.g., ring, mesh, etc. Each of the
`
`endpoints of each of the segments is associated with a connected processor which has a
`
unique node ID and a plurality of associated resources which it "owns," e.g., the memory
`
and I/O to which it's connected.
`
`
`
`
`
`
`The routing tables associated with each of the nodes in the distributed routing
`
`mechanism collectively represent the current state of interconnection among the computer
`
system resources. Each of the resources (e.g., a specific memory range or I/O device) owned
`
`by any given node (e.g., processor) is represented in the routing table(s) associated with the
`
`node as an address. When a request arrives at a node, the requested address is compared to a
`
two-level entry in the node's routing table identifying the appropriate node and link, i.e.,

given a particular address within a range of addresses, go to node x; and for node x use link
`
`y.
`
`As shown in Fig. 4, processor 202 can conduct point—to-point communication with
`
`three other processors according to the information in the associated routing tables.
`
`According to a specific embodiment, routing tables 406a-406c comprise two-level tables, a
`
`first level associating the unique addresses of system resources (e.g., a memory bank) with a
`
`corresponding node (e.g., one of the processors), and a second level associating each node
`
with the link (e.g., 208a-208e) to be used to reach the node from the current node.
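
Purely by way of illustration, such a two-level lookup may be sketched as follows in Python; the address ranges, node numbers, and link names are hypothetical placeholders rather than actual table contents.

    # First level: address ranges owned by each node, as (start, end, node).
    ADDRESS_MAP = [
        (0x0000_0000, 0x3FFF_FFFF, 0),  # node 0 owns this memory range
        (0x4000_0000, 0x7FFF_FFFF, 1),  # node 1
        (0x8000_0000, 0xFFFF_FFFF, 3),  # node 3 (e.g., an interconnection controller)
    ]

    # Second level: link to use from the current node to reach each node.
    LINK_MAP = {0: "link_a", 1: "link_b", 3: "link_c"}

    def route(address: int) -> tuple[int, str]:
        """Return (destination node, outgoing link) for a request address."""
        for start, end, node in ADDRESS_MAP:
            if start <= address <= end:
                return node, LINK_MAP[node]
        raise ValueError("address not mapped")

    # Example: an address in node 1's range routes to node 1 over link_b.
    assert route(0x4000_1000) == (1, "link_b")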
`
`Processor 202 also has a set of JTAG handshake registers 408 which, among other
`
`things, facilitate communication between the service processor (e.g., service processor 212
`
`of Fig. 2) and processor 202. That is, the service processor can write routing table entries to
`
`handshake registers 408 for eventual storage in routing tables 406a-406c. It should be
`
`understood that the processor architecture depicted in Fig. 4 is merely exemplary for the
`
`purpose of describing a specific embodiment of the present invention. For example, a fewer
`
or greater number of ports and/or routing tables may be used to implement other
`
`embodiments of the invention.
`
`As mentioned above, the basic protocol upon which the clusters in specific
`
`embodiments of the invention are based provides for a limited node ID space which,
`
`according to a particular implementation, is a 3-bit space, therefore allowing for the unique
`
`
`
`
`
`
`
`identification of only 8 nodes. That is, if this basic protocol is employed without the
`
`innovations represented by the present invention, only 8 nodes may be interconnected in a
`
single cluster via the point-to-point infrastructure. To get around this limitation, the present
`
`invention introduces a hierarchical mechanism which preserves the single-layer
`
`identification scheme within particular clusters while enabling interconnection with and
`
`communication between other similarly situated clusters and processing nodes.
`
`According to a specific embodiment, one of the nodes in each multi-processor cluster
`
`is an interconnection controller, e.g., interconnection controller 230 of Fig. 2, which
`
`manages the hierarchical mapping of information thereby enabling multiple clusters to share
`
`a single memory address space while simultaneously allowing the processors within its
`
`cluster to operate and to interact with any processor in any cluster without “knowledge” of
`
`anything outside of their own cluster. The interconnection controller appears to its
`
`associated processor to be just another one of the processors or nodes in the cluster.
`
`In the basic protocol, when a particular processor in a cluster generates a request, a
`
`set of address mapping tables are employed to map the request to one of the other nodes in
`
`the cluster. That is, each node in a cluster has a portion of a shared memory space with
`
`which it is associated. There are different types of address mapping tables for main memory,
`
`memory-mapped I/O, different types of I/O space, etc. These address mapping tables map
`
`the address identified in the request to a particular node in the cluster.
`
`
`
`
`A set of routing tables are then employed to determine how to get from the
`
`requesting node to the node identified from the address mapping table. That is, as discussed
`
`above, each processor (i.e., cluster node) has associated routing tables which identify a
`
`particular link in the point-to-point infrastructure which may be used to transmit the request
`
`from the current node to the node identified from the address mapping tables. Although
`
`generally a node may correspond to one or a plurality of resources (including, for example, a
`
`
`
`
`processor), it should be noted that the terms node and processor are often used
`
`interchangeably herein. According to a particular implementation, a node comprises
`
`multiple sub-units, e.g., CPUs, memory controllers, I/O bridges, etc., each of which has a
`
`unit ID.
`
`In addition, because individual transactions may be segmented in non-consecutive
`
`packets, each packet includes a unique transaction tag to identify the transaction with which
`
`the packet is associated with reference to the node which initiated the transaction.
`
`According to a specific implementation, a transaction tag identifies the source node (3-bit
`
field), the source node unit (2-bit field), and a transaction ID (5-bit field).
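
For illustration only, these three fields may be viewed as a single packed 10-bit value, as in the following sketch; the bit ordering shown is one hypothetical encoding and is not mandated by the protocol.

    def pack_tag(node_id: int, unit_id: int, txn_id: int) -> int:
        """Pack a 3-bit node ID, 2-bit unit ID, and 5-bit transaction ID into one tag."""
        assert 0 <= node_id < 8 and 0 <= unit_id < 4 and 0 <= txn_id < 32
        return (node_id << 7) | (unit_id << 5) | txn_id  # 10 bits total

    def unpack_tag(tag: int) -> tuple[int, int, int]:
        """Recover (node ID, unit ID, transaction ID) from a packed tag."""
        return (tag >> 7) & 0x7, (tag >> 5) & 0x3, tag & 0x1F

    # Example: node 2, unit 1, transaction 17.
    assert unpack_tag(pack_tag(2, 1, 17)) == (2, 1, 17)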
`
`Thus, when a transaction is initiated at a particular node, the address mapping tables
`
`are employed to identify the destination node (and unit) which are then appended to the
`
`packet and used by the routing tables to identify the appropriate link(s) on which to route the
`
`packet. The source information is used by the destination node and any other nodes which
`
`are probed with the request to respond to the request appropriately.
`
`According to a specific embodiment and as mentioned above, the interconnection
`
`controller in each cluster appears to the other processors in its cluster as just another
`
`processor in the cluster. However, the portion of the shared memory space associated with
`
`the interconnection controller actually encompasses the remainder of the globally shared
`
`memory space, i.e., the memory associated with all other clusters in the system. That is,
`
`from the perspective of the local processors in a particular cluster, the memory space
`
associated with all of the other multi-processor clusters in the system is represented by the
`
`interconnection controller(s) in their own cluster.
`
`According to an even more specific embodiment which will be described with
`
`reference to Fig. 5, each cluster has five nodes (e.g., as shown in Fig. 2) which include four
`
`processors 202a-d and an interconnection controller 230, each of which is represented by a
`
`
`
`
`
`
`
`3-bit node ID which is unique within the cluster. As mentioned above, each processor (i.e.,
`
`cluster node) may represent a number of sub-units including, for example, CPUs, memory
`
`controllers, etc.
`
`An illustration of an exemplary address mapping scheme designed according to the
`
`invention and assuming such a cluster configuration is shown in Fig. 5. In the illustrated
`
`example, it is also assumed that the global memory space is shared by 4 such clusters also
`
`referred to herein as quads (in that each contains four local processors). As will be
`
`understood, the number of clusters and nodes within each cluster may vary according to
`
`different embodiments.
`
`To extend the address mapping function beyond a single cluster, each cluster maps
`
`its local memory space, i.e., the portion of the global memory space associated with the
`
processors in that cluster, into a contiguous region while the remaining portion of the global
`
`memory space above and below this region is mapped to the local interconnection
`
`controller(s). The interconnection controller in each cluster maintains two mapping tables: a
`
global map and a local map. The global map maps outgoing requests to remote clusters. The
`
`local map maps incoming requests from remote clusters to a particular node within the local
`
`cluster.
`
`
`
`
`Referring now to Fig. 5, each local cluster has a local memory map (501-504), which
`
`maps the local memory space (i.e., the contiguous portion of the global memory space
`
`associated with the local processors) into the respective nodes and maps all remote memory
`
`spaces (i.e., the remainder of the global memory space) into one or two map entries
`
associated with the local interconnection controller(s), e.g., Node 4 of Quad 3. Each node in
`
`the local cluster has a copy of the local map. The interconnection controller in each cluster
`
`also maintains a global map (505-508) relating these remote memory spaces with each of the
`
`other clusters in the system. Each interconnection controller uses its copy of the local map
`
`
`
`
`
(509-511) to map requests received from remote clusters to the individual nodes in its
`
`cluster.
`
`An exemplary transaction described with reference to Fig. 5 may be illustrative. In
`
`this example, Node 2 in Quad 3 generates a request that maps (via map 501) to the local
`
`interconnection controller (i.e., Node 4). When the interconnection controller receives this
`
`request, its global map 505 maps the address to Quad 2. The interconnection controller then
`
`forwards the request to Quad 2. The interconnection controller at Quad 2 uses its local
`
memory map to determine the proper node to target for the request: Node 1 in this example.
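
The flow of this exemplary transaction may be sketched, for purposes of illustration only, as follows; the address layout and map contents below are hypothetical stand-ins for maps 501-508 rather than their actual contents.

    # Hypothetical address layout: each quad owns one contiguous 1 GB region.
    QUAD_BASE = {0: 0x0000_0000, 1: 0x4000_0000, 2: 0x8000_0000, 3: 0xC000_0000}
    QUAD_SIZE = 0x4000_0000
    IC_NODE = 4  # node ID of the interconnection controller in every quad

    def local_map(quad: int, address: int) -> int:
        """Map an address to a node within 'quad' (cf. maps 501-504)."""
        base = QUAD_BASE[quad]
        if base <= address < base + QUAD_SIZE:
            # Local memory: the four processors each own a quarter of the region.
            return (address - base) // (QUAD_SIZE // 4)
        return IC_NODE  # all remote memory maps to the interconnection controller

    def global_map(address: int) -> int:
        """Map an address to its home quad (cf. maps 505-508)."""
        return next(q for q, base in QUAD_BASE.items() if base <= address < base + QUAD_SIZE)

    # Node 2 in Quad 3 issues a request to an address homed at Node 1 of Quad 2.
    addr = QUAD_BASE[2] + (QUAD_SIZE // 4) * 1
    assert local_map(3, addr) == IC_NODE   # local map: request goes to the controller
    assert global_map(addr) == 2           # global map: controller forwards it to Quad 2
    assert local_map(2, addr) == 1         # Quad 2's local map targets Node 1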
`
`In a particular implementation, each processor or cluster node is limited to eight
`
`memory map registers. The scheme described above with reference to Fig. 5 requires four
`
`entries for the local memory space and at most two registers for remote space. Therefore,
`
`according to more specific embodiments, the two remaining entries can be used to subdivide
`
`regions. The eight mapping register limit requires that all memory local to a quad be
`
`allocated within a contiguous block. The interconnection controller’s local memory map in
`
`such embodiments is also eight entries. However, the size of the interconnection controller’s
`
`
`
`
global map is determined by the number of clusters in the system. According to various
`
`embodiments, the memory mapped I/O space is mapped by an identical set of mapping
`
`registers.
`
`As described above, on the local cluster level, information from address mapping
`
`
`tables is used to identify the appropriate link on which to transmit information to a
`
`destination node within the cluster. To effect transmissions between clusters using the
`
`global mapping described above, a similar mechanism is needed. Therefore, according to
`
various embodiments, in addition to the local routing tables associated with each node in a
`
`cluster, the interconnection controller maintains global routing information which maps the
`
`
`
`
other clusters in the system to the various point-to-point transmission links interconnecting
`
`the clusters (e.g., links 111 of Fig. 1A).
`
`According to a specific embodiment of the invention, two types of local routing
`
`tables are employed: one for directed packets and one for broadcast packets. Each table
`
`(e.g., tables 406 of Fig. 4) maintains a mapping between target nodes and links. For directed
`
packets, a separate table is used for requests and for responses. This allows responses to be
`
`routed back to the requester along the same path as the request. Maintaining the same route
`
`simplifies debugging and is not required for correctness. For broadcast packets, the
`
`corresponding table indicates on which links the broadcast packet is forwarded. A broadcast
`
`packet may thus be routed to multiple links.
`
`
`
`
`In a particular implementation of the interconnection controller of the present
`
`invention, its local tables map a local destination node to one of four links for directed
`
`packets and any number of links for broadcast packets. The interconnection controller also
`
`maintains a global routing table which maps remote destination clusters to a particular
`
`remote link. According to a particular embodiment, the interconnection controller also
`
`supports multicast of packets at the global routing level.
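
For illustration only, these tables may be modeled as simple mappings from a destination to a link or set of links, as in the following sketch; the table contents and link names are hypothetical.

    # Local directed routing: separate request and response tables so that a
    # response can retrace the path of the corresponding request.
    LOCAL_REQUEST_ROUTES  = {0: "L0", 1: "L1", 2: "L2", 4: "L3"}  # node -> one of four links
    LOCAL_RESPONSE_ROUTES = {0: "L0", 1: "L1", 2: "L2", 4: "L3"}

    # Local broadcast routing: a broadcast may be forwarded on several links at once.
    LOCAL_BROADCAST_ROUTES = {0: {"L0"}, 1: {"L1", "L2"}, 4: {"L3"}}

    # Global routing: remote destination cluster -> remote link.
    GLOBAL_ROUTES = {1: "R0", 2: "R1", 3: "R1"}

    def links_for(packet_kind: str, destination: int, remote: bool) -> set[str]:
        """Pick the link(s) on which to forward a packet."""
        if remote:
            return {GLOBAL_ROUTES[destination]}
        if packet_kind == "broadcast":
            return LOCAL_BROADCAST_ROUTES[destination]
        table = LOCAL_REQUEST_ROUTES if packet_kind == "request" else LOCAL_RESPONSE_ROUTES
        return {table[destination]}

    # A directed request to local node 2 uses exactly one link.
    assert links_for("request", 2, remote=False) == {"L2"}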
`
`A specific embodiment of a routing mechanism designed according to the present
`
`invention will now be described with reference to Figs. 6A and 6B. System 600 of Fig. 6A
`
`includes four clusters each having a plurality of local nodes including nodes N0 and N1. The
`
`table of Fig. 6B combines all of the local and global routing tables of the system for
`
`illustrative purposes.
`
`As part of an exemplary transaction, a CPU 602 at node N0 in Cluster 0 generates a
`
packet directed to a CPU 604 at node N0 in Cluster 3. This packet could be, for example,
`
`a memory request that maps to a memory controller at that node. Because CPU 602 has no
`
`knowledge of anything outside of its cluster, it generates the packet targeting node N1 in
`
`
`
`
`
`Cluster 0 (i.e., the local interconnection controller 606) as the destination. As discussed
`
above, this is due to the fact that the local memory map owned by node N0 (see the relevant
`
`portion of the table of Fig. 6B) identifies node N1 as corresponding to all memory owned by
`
`remote clusters. Interconnection controller 606 receives the packet, uses its global address
`
`map (e.g., as described above) to determine that the final destination of the packet is Cluster
`
`3, and generates a remote packet targeting Cluster 3. Then, using its global routing table
`
`(i.e., relevant portion of Fig. 6B), interconnection controller 606 determines that this packet
`
`must be sent out on link L1. Similar to the local routing mechanism described above,
`
`information identifying the source and destination cluster is appended to the packet.
`
`When interconnection controller 608 at Cluster 1 receives the packet, it also
`
`determines that the packet is destined for Cluster 3 and determines from its global routing
`
`table (Fig. 6B) that link L2 must be used to send the packet. Interconnection controller 610
`
`at Cluster 3 receives the packet, determines that the packet is targeting the local cluster, and
`
uses its local routing table (Fig. 6B) to determine that local link L0 must be used to send the
`
packet to its destination. CPU 604 at node N0 then receives the packet via link L0.
`
`According to specific embodiments in which the node ID space is a 3-bit ID space, this
`
`multi-level routing mechanism can be extended to eight local nodes with no specific limit on
`
`the number of clusters.
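
The hop-by-hop decisions just described may be sketched, for purposes of illustration only, as follows; the table entries are hypothetical stand-ins for the combined table of Fig. 6B.

    # Per-cluster global routing tables: destination cluster -> outgoing global link,
    # chosen so that a packet from Cluster 0 to Cluster 3 travels via Cluster 1.
    GLOBAL_TABLE = {
        0: {1: "L1", 2: "L2", 3: "L1"},   # Cluster 0 sends Cluster-3 traffic out on L1
        1: {0: "L1", 2: "L2", 3: "L2"},   # Cluster 1 forwards it on L2
    }
    NEXT_CLUSTER = {(0, "L1"): 1, (1, "L2"): 3}  # physical wiring of the global links
    LOCAL_TABLE = {3: {"N0": "L0"}}              # Cluster 3 delivers to node N0 over L0

    def route_remote_packet(src_cluster: int, dst_cluster: int, dst_node: str) -> list[str]:
        """Trace the links a packet traverses from source cluster to destination node."""
        hops, cluster = [], src_cluster
        while cluster != dst_cluster:
            link = GLOBAL_TABLE[cluster][dst_cluster]   # global routing table lookup
            hops.append(f"cluster {cluster} -> {link}")
            cluster = NEXT_CLUSTER[(cluster, link)]
        hops.append(f"cluster {cluster} -> {LOCAL_TABLE[cluster][dst_node]}")  # local delivery
        return hops

    # Cluster 0 to node N0 of Cluster 3: out on L1, through Cluster 1 on L2, then local L0.
    assert route_remote_packet(0, 3, "N0") == \
        ["cluster 0 -> L1", "cluster 1 -> L2", "cluster 3 -> L0"]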
`
`
`
`
`Embodiments of the invention also address the issue of transaction identification in a
`
`
`system having a plurality of multi-processor clusters. In general, the importance of the
`
`unique identification of transactions in a multi-processor environment is understood. And
`
`where the transaction identification or tag space is limited, mechanisms to extend it are
`
needed to enable the interconnection of more than the maximum number of processors
`
`supported by the limited tag space. That is, in an environment with a plurality of clusters
`
`operating with identical local transaction tag spaces, there is a potential for more than one
`
`
`
`
`transaction to be generated in different clusters simultaneously with the identical tag. Where
`
`those transactions occur between nodes in different clusters, the potential for conflict is
`
`obvious. Therefore, embodiments of the present invention provide mechanisms which
`
`extend the local tag spaces such that each transaction in the multi-cluster system is uniquely
`
`identified.
`
`More specifically, these embodiments map transactions from the local transaction tag
`
`space to a larger global transaction tag space. As described above, the local tag space is
`
`specified using the node ID, the unit ID, and a transaction ID. On top of that, the global tag
`
space is specified using a global cluster ID and a global transaction ID. According to one
`
`embodiment, the interconnection controllers in the system use their pending buffers to
`
`simplify the allocation and management of the mapping and remapping actions. According
`
`to an even more specific embodiment and as will be described, additional protocol
`
`management is used to maintain the uniqueness of the global transaction tags.
`
According to a specific embodiment, all transactions within a cluster are tagged with
`
a unique ID generated by the requesting node. The processors in each cluster which are not
`
`the interconnection controller support a 3-bit node ID, a 2-bit unit ID and a 5-bit transaction
`
ID. The combination of these fields creates a 10-bit tag which is unique within the cluster.
`
`The unit ID represents sub-units within a node. It should be noted that a particular node may
`
`or may not include a processor as one of its sub-units, e.g., the node might contain only
`
`
`
`
`memory.
`
According to one embodiment, to extend the transaction tag space beyond the
`
local cluster, each cluster's interconnection controller maps its cluster's local tag space
`
into the global tag space using a Q-bit Cluster ID and a T-bit Transaction ID. In the
`
`exemplary system in which each cluster has a 5-bit transaction ID and there are four clusters,
`
`
`T might be 7 and Q might be 2.
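
For illustration only, the following sketch shows how a 2-bit Cluster ID and a 7-bit global Transaction ID may be combined so that identical transaction identifiers generated in different clusters remain globally unique; the concatenation shown is one hypothetical encoding.

    Q, T = 2, 7  # cluster ID bits and global transaction ID bits in this example

    def to_global_tag(cluster_id: int, global_txn_id: int) -> int:
        """Form a (Q + T)-bit global tag from a cluster ID and a global transaction ID."""
        assert 0 <= cluster_id < (1 << Q) and 0 <= global_txn_id < (1 << T)
        return (cluster_id << T) | global_txn_id

    def from_global_tag(tag: int) -> tuple[int, int]:
        return tag >> T, tag & ((1 << T) - 1)

    # The same transaction ID allocated in two different clusters still yields
    # two distinct global tags.
    assert to_global_tag(1, 42) != to_global_tag(3, 42)
    assert from_global_tag(to_global_tag(3, 42)) == (3, 42)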
`
`
`
`
`According to one embodiment illustrated in Fig. 7, the local to global mapping
`
`process is accomplished as follows. New outgoing