`
`base latency for the transmission of a message of L bits along D channels. Make the analogy
`between packets without headers and flits in wormhole switching, and determine the optimal
`flit size.
`
`Consider a network using Wormhole switching, and a node architecture that is able to send
`and receive up to four packets simultaneously without interference. The start-up latency to
`initiate a new packet transmission after initiating the previous one is ts. Assuming that there
`are four minimal paths between two nodes that are D links apart, compute the base latency
`to send a message of L bits when the message is split into four sequences of packets, each
`sequence following a difierent path. Assume that routers are input-buffered only.
`
`A hierarchical network topology has channels of two different widths, W and —V2Y—, all of them
`being clocked at the same frequency B. The network uses wormhole switching, and routers
`have input buffers and output buffers deep enough to avoid filling the buffers in the absence of
`contention. Internal data paths are W bits wide. Compute the base latency to send a message
`of L bits across two channels in two cases: (a) the width of the first and second channels are W
`and %, respectively; (b) the Width of the first and second channels are % and VV, respectively.
`Assume that messages are not split into packets.
`
`71
`
`
`
`Chapter 7
`
`Network Architectures
`
`In many instances, particularly in fine-grained parallel machines, the performance of the applications
`are communication-limited rather than computation-limited. In this case, the design of the network
`architecture is crucial. By network architecture we refer to the topology of interconnection between
`the processing nodes, and the data and control path of the routers within the individual nodes.
`The design of the network architecture is fundamentally a process of making trade-offs between
`performance parameters such cost, latency, and throughput, and physical constraints such as area,
`wireability, and I/O limitations. Since physical constraints are continually evolving with advances
`in fabrication and packaging technology, so too are the appropriate choices of network topology and
`design alternatives for the router architecture. This chapter discusses basic issues facing the choice
`of network topologies and presents the design of router architectures that have been successfully
`employed within these topologies.
`Chapter 1 describes many distinct topologies that have been proposed for use in multiprocessor
`architectures. The majority of these topologies are either configurations of, or isomorphic to, It-ary
`n-cube networks. These include the binary hypercube, tori, rings, and meshes. Indirect networks
`such as the indirect binary n-cube and Omega networks are also topologically closely related to these
`networks. Low dimensional k-ary n-cube networks possess desirable features with respect to their
`physical implementations, such as constant degree and ease of embedding in physical space. They
`also possess desirable properties with respect to message performance such as simple distributed
`routing algorithms. As a result, these networks have been used in the majority of commercial
`systems developed in the past decade. These include the iPSC series of machines from Intel, the
`Cray T3D and T3E, the Ametek 2010, MasPar MP-1 and MP-2, and Thinking Machines CM-1 and
`CM-2. These networks have formed the communications substrate of many commercial machines
`and are therefore the focus of this chapter. We observed that the network topology is primarily
`influenced by the packaging technology which places constraints on the wiring density, chip pin-
`out, and wire lengths. In the presence of these constraints, topological design issues must reconcile
`conflicting demands in choices of channel width, network dimensionality, and wire length. Thus,
`this chapter initially focuses on the optimization of the network topology.
`While the architecture of the routers is certainly related to the topology of the interconnect,
`their design is primarily influenced by the switching technique that they are designed to support.
`Generic issues can be broadly classified into those dealing with internal data and control paths
`and those dealing with the design of the interrouter physical links. Specific design choices include
`internal switch design, buifer management, use of virtual channels, and physical channel flow control
`strategies. The remainder of the chapter presents descriptions of several modern router architectures
`
`305
`
`sed by
`xercise
`I Cl that
`
`: en by
`. -based
`
`oki [74]
`
`itched
`
`rsed by
`irection
`-_ across
`. What
`ashion?
`
`72
`
`
`
`306
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`as examples of specific design choices. Rather than present a comprehensive coverage of available
`architectures, we have tried to select illustrative examples from the range of currently available
`architectures.
`
`7.1 Network Topology and Physical Constraints
`For a fixed number of nodes, the choice of the number of dimensions of a k-ary n-cube represents 3
`fundamental trade-ofl between network diameter and node degree. This choice of network dimension
`also places different demands on physical resources such as wiring area and number of chip I/Os. For
`a given implementation technology, practical constraints on these physical resources will determine
`architectural features such as channel widths, and as a result determine the no-load message latency;
`message latency in the absence of traffic. Similarly, these constraints will also determine the degree
`of congestion in the network for a given communication pattern, although accurate modeling of such
`dynamic behavior is more difficult. It thus becomes important to model the relationships between
`physical constraints and topology, and the resulting impact on performance. Network optimization is
`the process of utilizing these models in selecting topologies that best match the physical constraints of
`the implementation. The following subsections discuss the dominant constraints and the construction
`of analytic models that incorporate their effects on the no-load message latency.
`
`7.1.1 Bisection Bandwidth Constraints
`
`One of the physical constraints facing the implementation of interconnection networks is the available
`wiring area. The available wiring area is determined by the packaging technology, i.e., does the
`network reside on a chip, multichip module or printed circuit board.
`In particular, VLSI systems
`are generally wire-limited: the silicon area required by these systems is determined by interconnect
`area and the performance is limited by the delay of these interconnections. Since these networks
`must be implemented in three dimensions, for an N node network the available wiring space grows as
`N% while the network traffic can grow as N. Clearly machines cannot be scaled to arbitrarily large
`sizes without eventually encountering wiring limits. The choice of network dimension is influenced
`by how well the resulting topology makes use of available wiring area. Such an analysis requires
`performance measures that can relate network topology to the wiring constraints.
`On such performance measure is the bisection width of a network: the minimum number of wires
`that must be cut when the network is divided into two equal sets of nodes. Since the primary goal
`is one of assessing bandwidth and resulting wiring demands, only Wires carrying data are generally
`counted, excluding control, power, and ground signals. However, these effects are relatively easy to
`incorporate in the following analysis. The collective bandwidth over these wires is referred to as the
`bisection bandwidth. For example, consider a 2-D torus with N = k2 nodes. Assume that each pair
`of adjacent nodes are connected with one bidirectional data channel of Width W bits. If this network
`is divided into two halves, there will be 2W\/N wires crossing this bisection. The factor of 2 is due
`to the wraparound channels. Imagine what happens when we bisect a 3-D torus with N = k3 nodes.
`Each half comprises of 72-3 nodes. A 2-D plane of k2 nodes has links crossing the bisection, leading
`to a bisection width of 2Wk2 2 2Wk”“1. These cases are illustrated in Figure 7.1. For the sake of
`clarity the wraparound links in the 3-D topology are not shown. In general, a bisection of a k-ary
`n-cube will cut channels from a k-ary (n — 1)-cube leading to a bisection width of 2Wk“‘1.
`Using the above approach we can compute the bisection width of a number of popular topologies.
`These are summarized in Table 7.1. The implicit assumption here is that all of these topologies
`are regular,
`i.e., each dimension has the same radix. Consider the case of networks with N =
`
`73
`
`
`
`NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`-M’
`.2.»
`
`(3)
`
`(b)
`
`Figure 7.1. Examples of the computation of bisection width:
`(wraparound links not shown).
`
`(a) 2-D torus.
`
`(b) 3-D torus
`
`kl X kg >< .. . >< kn nodes. In this case the minimum bisection width is orthogonal to the dimension
`with the largest radix. For example, this is the case shown in Figure 7.1.
`If the largest radix is
`N
`km, then we have F; nodes with channels across this bisection. Thus, the bisection width can be
`expressed as 22/N bits.
`The above cgmputation assumes that each node devotes W pins to communication with a neigh-
`bor along a dimension. No assumptions are made about how these pins are used. In general, given a
`set of W pins between adjacent nodes, these may be utilized as two (opposite) % bit unidirectional
`channels, one W-bit bidirectional channel, or one W-bit unidirectional channel. In the computing
`bisection width, the manner is which these channels are used is not particularly important. However,
`it is central to the evaluation of performance since channel widths directly impact message latency.
`We will see in later sections that assumptions about how these pins are used must be unambiguous
`and clearly stated.
`
`7.1.2 Node Size Constraints
`
`In addition to wiring space, a second physical constraint is the number of I/ Os available per router.
`This is referred to as the node size. The construction of networks under a constant channel width
`for an arbitrary number of dimensions is impractical since the node size is linear in the number of
`dimensions. For example, consider a 2-D network and assume a baseline channel width of 16 data
`
`Table 7.1. Examples of bisection Width and node size of some common networks (t X t switches are
`assumed in the Omega network).
`
`Network
`Bisection Widtlfl Node Size_|
`k—ary n-cube
`2Wk"‘1
`2Wn
`Binary n-cube 4 E?
`nW
`n-dimensional mesh
`Wk"”1
`2Wn
`
`Omega network
`
`NW
`
`2tW
`
`74
`
`
`
`308
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`bits, 8 control bits, and full-duplex channels, i.e., two unidirectional channels between adjacent nodes_
`This would correspond to 192 signal pins not including power, ground, and some miscellaneous I/05,_
`This is certainly feasible in current technology. For an n-dimensional network, the router will have
`48 pins in each direction in each dimension for a total pin-out of 9671.. If we consider a 10—D network
`with the same channel width, we will require 960 pins, or close to 500 pins just for the channe1s
`in a more modest 5-D network. As we progress to 32-bit channels, the feasible network dimension
`becomes smaller. Thus, practical considerations place limits on channel width as a function of
`dimensionality [1, 3].
`Even if we consider the bisection width constraints and networks being wiring area limited, it is
`likely that before networks encounter a bisection width limit, they may encounter the pin-out limit_
`This is evident from the following observation. Consider the implementation of a k-ary n—cube with
`the bisection width constrained to be N bits. The maadmum channel width determined by this
`bisection width is given by
`
`k
`(7.1)
`Bisection width = 2Wk“‘1 = N => W = 5
`Note that this is the maximum channel width permitted under this bisection width constraint.
`A router does not have to provide this number of pins across each channel. However, if it does,
`the router will have a channel width of W bits between adjacent routers in each direction in each
`dimension for a total pin-out of 2nW = nk. A network of 220 nodes will have a pin-out of 2,048 in
`a 2-D configuration and a pin-out of 128 in a 4-D configuration. The former is certainly infeasible
`in 1997 technology for commercial single-chip packaging.
`In general, both the maximum number
`of available pins and the wiring area constraint as represented by the bisection width place upper
`bounds on the channel width. If the maximum number of pins available for data channels is given
`by P, and the bisection width is given by B, then the channel width is constrained by the following
`two relations. The feasible channel width is the smaller of the two.
`P
`W < --
`‘ 211
`
`B
`(7.2)
`W 3 WH
`To understand the effect of channel width on message latency consider the following. For a
`fixed pin-out, higher-dimensional networks will have smaller channel widths: a fixed number of pins
`partitioned across a higher number of dimensions. As we reduce the number of dimensions, the
`channel width increases. For example, if the channel width of an n-dimensional binary hypercube is
`Wh, then the total number of pins is nW;, (note the absence of the factor of 2 since It = 2 is special
`case with no wraparound channels).
`If we keep this pin-out constant, then for an m-dimensional
`torus with kg" nodes, the pin-out is 2mW,,,. The channel width of the torus is given by
`
`W,,,=W,,><§><i
`717.
`
`(7.3)
`
`However, if we keep the number of nodes constant, we have
`
`(7.4)
`N=2"=k;"=>k1=N:»"
`Therefore we see that as the dimension decreases, the linear increase in channel width is offset
`by an exponential increase in the distance traveled by a message in each dimension. Expressions for
`message latency must capture these eflects in enabling the analysis of trade-offs between bisection
`Width, pin-out, and wiring length.
`
`75
`
`
`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`7.1.3 Wire Delay Constraints
`
`As pointed out in [71], in VLSI systems the cycle time of the network is determined by the maximum
`wire delay, while the majority of the power is expended in driving these wires. Thus, topologies that
`can be embedded in two and three dimensions with short wires will be preferred. While even low-
`dimensional networks may logically appear to have long wires due to the wraparound connections,
`these can be avoided by interleaving nodes as shown in Figure 7.2 [258, 312]. However, when higher-
`dimensional networks are embedded in two and three dimensions, longer long wires do result. For
`the purposes of analyzing the effect of wire length we can determine this increase in wire length as
`follows
`The following analysis is based on an embedding in two dimensions, although it can be
`‘extended to three dimensions in a straightforward manner.
`A k—a.ry n—cube is embedded in two dimensions by embedding 125 dimensions of the network into
`each of the two physical dimensions (assuming an even number of dimensions). After embedding
`the first two dimensions each additional dimension increases the number of nodes in the network
`by a factor of k. Embedding in two dimensions provides an increase in the number nodes in each
`physical dimension by a factor of
`If we ignore wire width and wiring density effects, the length
`of the longest wire in each physical dimension is also increased by a factor of
`Note that if we
`had to account for the thickness of the wires, the length of the longest wire would actually grow
`faster than M; as we increase the number of dimensions (see [3] for a good discussion).
`Ignoring
`wire width effects, the length of the longest wire in a 2-D embedding of an n-dimensional network
`increases by a factor of kn’? over the maximum wire length of a 2-D network. For a generalization
`to embedding in three dimensions and higher, refer to Exercise 7.3.
`
`00606999
`
`76
`
`
`
`310
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`the
`The delay along the wire may follow one of several models depending upon the length,
`implementation medium, e.g., aluminum on silicon, copper on polyimide, etc. The three Common
`wire delay models include the linear, logarithmic, and constant delay models.
`
`Delay oc logel Logarithmic delay model
`
`Delay oc l
`
`Linear delay model
`
`Delay oc C’
`
`Constant delay model
`
`where l is the wire length. These delay models are incorporated into a model of average message
`latency developed in the following section.
`
`7.1.4 A Latency Model
`
`Given the impact of the preceding physical constraints, we can now develop a model of the no-load
`latency of a message in a k-ary n-cube in the presence of these constraints. Such relationships
`between performance and implementation constraints help us make informed trade-offs in the choice
`of network topology. In the following We assume a Wormhole-switched network, and the development
`follows that provided in
`Consider the base latency of a wormhole-switched message from Equation 2.4.
`
`twormhole : D(t1- ‘l’ ts ‘l’ tux) ‘l’ Ina'x(ts7tw)
`
`L
`
`where t,. is the time to make a routing decision, is is the delay through the switch, and tw is the wire
`delay across the physical channel. The first term in the right hand side of the expression represents
`the delay due to the distance between the source and destination. The major components are the
`routing, switch, and link delays. The routing protocols discussed in Chapter 4 implement decisions
`that can be quite complex relative to the no-load delay of a data flit through the switch. While the
`impact of this routing delay is reduced for long messages, it may be nonnegligible for small messages,
`and is represented by t,. It is sometimes the case that the switch delays (ts) are greater than the
`link delays, and as a result have a nonnegligible impact on the performance of the network. The
`development of the latency model in this section assumes switch delays dominate wire delays and
`follows the approach in
`Later in this chapter we consider the case where wire delays dominate
`switch delays. Finally, tu, represents the wire delay across the physical link. The above expression
`is based on a router architecture where the inputs and outputs are buffered. Thus, once a message
`pipeline has been set up, the period of this pipeline is determined by the greater of the switch delay
`and wire delay. If we had used an input only buffered switch, then max(t,, tw) would have simply
`been replaced by (t, + tw).
`Under random traflic, the value of the distance D above can be replaced by the average distance
`between any pair of nodes. With bidirectional links, in an n-dimensional network a message will
`travel an average of § links in a dimension when k is even and Q2?) links when k is odd. To
`capture the effect of wire delays the routing, switching, and wire delays can be normalized to the
`wire delay between adjacent nodes in a 2-D network. This will permit analysis that would remain
`relevant with improvements in technology. As new packaging techniques, materials, and interconnect
`media evolve, wire delays between adjacent nodes in a 2-D embedding will improve. The latency
`expression using normalized delays will represent performance relative to this improved base 2-D
`case. Let the switch delay and routing delay be represented as integer multiples of the wire delay in
`a 2-D network:
`.9 and 7', respectively. From the earlier discussion, we know the value of tw relative
`
`77
`
`
`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`311
`
`to a 2-D network. Thus, the latency expression can be written as follows (for networks where k is
`even):
`
`k
`
`..
`
`twormhole = nZ(7‘ + 5 + 163-1) + max(s, k?‘1)
`
`1.
`
`L
`
`(7.6)
`
`The above expression is based on a linear delay model for wires. For short wires, delay is
`logarithmic [71] and for comparison purposes, it is useful (though perhaps unrealistic) to have model
`based on constant wire delay. The above expression is normalized to the wire delay in a 2-D mesh.
`Therefore, in the constant delay model, wire delay simply remains at 1. In the logarithmic delay
`- model, delay is a logarithmic function of wire length, i.e., 1 + logel [71]. The latency expression
`under the logarithmic wire delay model can be written as
`
`twormhole = 71% [T + 5 + 1 +
`
`— 1) loge k] + max [s,1 +
`
`— 1) loge k)
`
`For the constant wire delay model the latency expression is
`
`k
`
`twormhuze = n:1—(7‘ —|— s -1- 1) —|— s
`
`(7.8)
`
`The latency expressions can be viewed as the sum of two components: a distance component which
`is the delay experienced by the header flit, and a message component, which is the delay due to the
`length of the message. The expressions also include terms representing physical constraints (wire
`length, channel widths), network topology (dimensions, average distance), applications (message
`lengths) and router architecture (routing and switching delays). Although in the above expression
`routing and switching delays are constants, we could model them as an increasing function of network
`dimension. This appeals to intuition since a larger number of dimensions may make routing decisions
`more complex, and increase the delay through internal networks that must switch between a larger
`number of router inputs and outputs. We revisit this issue in the exercises at the end of this chapter.
`The above expression provides us with insight into selecting the appropriate number of dimensions
`for a fixed number of nodes. The minimum latency is realized when the component due to distance
`is equal to the component due message length. The choice of parameters such as the switching and
`routing delays, channel widths, and message lengths determine the dimension at which this minimum
`is realized. The parameters themselves are not independent and are related by implementation
`constraints. Examples of such analysis are presented in the following sections.
`
`7.1.5 Analysis
`
`In this section we focus on the selection of the optimal dimension given a fixed number of nodes.
`The optimal dimension realizes the minimal value of the no-load latency. The choice of the optimal
`dimension is dependent on the technological parameters. Wiring density, pin-out, and wire delays
`each individually define a dimension that minimizes the no-load latency performance of the network.
`The following analysis is intended to demonstrate how the topological features of the network are
`dependent on constraints on each these three physical features, and thus enable us to select the
`topology that optimizes the performance of an implementation.
`
`78
`
`
`
`312
`
`Constant Bisection Width
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`g resource is
`let us assume this fixed
`ary n-cube with k : 2, and single-bit channels
`= 1 in the expression in Table 7.1, we obtain this
`e differs from the expression for the
`
`width in the above expression
`necessity of being careful about
`would have been replaced by 7749 rather than
`what exactly a W-bit channel refers to in order to maintain fair comparisons between networks.
`Comparing W-bit bidirectional channels and W-bit, dual, unidirectional channels is clearly not a fair
`comparison. Although both instances are sometimes described as W-bit channels between adjacent
`nodes, the latter has twice as many signals as the former. From the perspective of packaging and
`‘n the number of pins devoted to communication with
`
`As we increase the number of dimensions, when the network is laid out in the plane (or even in
`three dimensions for that matter), the number of interprocessor channels that cross the bisection
`increases. If this bisection width is
`
`For a constant number of
`eases the wire length by a factor of N "#115". The cumulative
`efl'ect of all of these trade-offs is illustrated in Figure 7.3. It is apparent that for larger-size systems,
`the distance component of message latency dominates at low dimensions, thus favoring networks
`of dimension 3 or less. Smaller message sizes also effectively increase the impact of switching and
`
`m
`
`D:a,.giv<._r=ql-(fxa-.
`
`79
`
`
`
`NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`Constant Bisection Width
`
`1000000
`
`100000
`
`10000
`
`1000 -
`
`100
`
`+ 1M + 64K +256
`
`5
`
`6
`
`7
`
`8 9101214161820
`Dimension
`
`Figure 7.3. Effect of constant bisection width on the no-load message latency.
`
`size systems are less sensitive to these delays. Rather, the relative impact of narrower channels has
`a greater effect on message latency than the switching and routing delays. The optimal number of
`dimensions for smaller systems tends to be about two to three. Finally, the use of linear wire delay
`models penalizes higher-dimension networks. However, use of the logarithmic wire delay models does
`not change the results appreciably. The optimal number of dimensions for large systems (> 1K)
`ranges from three to six, and for small systems (< 512) remains at two for realistic wire delay
`models. For small systems, the (unrealistic) constant delay model and small message sizes increases
`the optimal number of dimensions. However for the case of 16K and 1M nodes, the optimal number
`of dimensions rises to 9'. These results are summarized in Tables 7.2a and 7.2b.
`Finally we note that the behavior is dependent on two additional factors. The latency expressions
`are based on the no—load latency. The presence of link congestion and blocking within the network
`can produce markedly diflerent behavior, particularly if the message destinations and lengths are
`nonuniform. Reference
`presents one approach to incorporating models of network congestion
`into the latency expressions. A second issue is the design of the routers. If routers employ input
`
`Table 7.2. Optimal number of dimensions for constant bisection bandwidth and: (a) Message size
`of 32 bits. (b) Message size of 256 hits.
`
`(a)
`
`(b)
`
`[Wire Delay Mada 1M | 16K ‘
`Linear
`3
`3
`
`2
`
`| Wire Delay Modfl 1M T_16KJ 256
`3
`2
`l—Linear
`3
`
`Logarithmic
`Constant
`
`7
`9
`
`L 2
`
`6
`7
`
`Logarithmic
`Constant
`
`4
`6
`
`3
`4
`
`2
`2
`
`80
`
`
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`100000
`
`Constant Pin Out
`
`-I" 1M +64K +256
`
`6
`
`7
`
`8 9101214161820
`Dimension
`
`Figure 7.4. Effect of constant pin-out on the no-load message latency.
`
`buffering, the cycle time of the message pipeline is the sum of the switch (t_,) and wire (tw) delays.
`When both input and output buffering is employed, the cycle time is reduced to max (t,, tw). Thus,
`the latter model would favor higher dimensions since it is less dependent on wire delays particularly
`when switching delays are relatively large.
`
`Constant Node Size
`
`The preceding analysis can lead to designs where the individual nodes must have extremely large
`pin-out (often impractical‘ in 1996-1997 technology) to make full use of the bisection bandwidth.
`In these instances network performance is limited not by the wiring resource but chip I/O or pin
`resources
`As we did for bisection width, we can re-write the latency expression to capture
`the dependency of latency on the network dimension assuming the number of chip I/Os are fixed.
`The following expression is written assuming linear wire delay and P pins per chip devoted to
`communication channels. Each router is assumed to communicate over W signals with an adjacent
`router in each direction in each dimension. Thus P = 2nW and substituting W = {-1 the no—load
`latency expression using a linear wire delay model becomes
`
`2nL
`P
`
`twormhale = n§(r + 5 + k%’1)+ max(s, kJ2l‘1) l
`The factor of 2 comes from the availability of channels in each direction within a dimension, and
`the use of W signals between adjacent nodes. For binary hypercubes We would have W = 5- since
`there is only one direction in each dimension. A plot illustrating the optimal number of dimensions
`is shown in Figure 7.4 for the linear wire delay model provided in the above equation, parameterized
`as shown in the figure. The message size is fixed to that of a nominal-sized cache line in a shared-
`memory system, or medium size message (32 bytes) and we assume a nominal number of pins devoted
`to communication (256 pins). The optimal number of dimensions for large numbers of nodes with
`message size of 256 bits is 3, whereas for the smaller-sized system it is 2. If the wire delay model is
`
`(7.10)
`
`81
`
`
`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`315
`
`Table 7.3. Optimal number of dimensions for constant pin-out and: (a) Message size of 32 bits. (b)
`Message size of 256 bits.
`
`(a)
`
`(b)
`
`Linear
`
`Logarithmic
`Constant
`
`3
`
`3
`
`2
`
`Logarithmic
`' Constant
`
`changed to the logarithmic delay model the optimal number of dimensions increases to 6, 4, and 2 for
`the 1M, 16K, and 256—node systems respectively. Reducing the message size has a substantial effect
`further increasing the optimal number of dimensions. These results are summarized in Tables 7.3a
`and 7.3b.
`
`The exponential increase in the distance in each dimension overshadows the linear increase in
`channel width as the network dimension is reduced. Under the constant node size constraint model,
`higher dimensionality is more important than wider channels
`The use of input and output
`buffering further encourages the use of higher dimensions. This model is also particularly sensitive
`to Wire delays as shown in Figure 7.4.
`This analysis is very dependent upon the structure of the physical channel. The model assumes
`that the number of pins devoted to communication with a neighboring node in a dimension is orga-
`nized as a single bidirectional channel of width W bits. The channel width is a major determinant
`of message latency. The pins could have been organized as two unidirectional channels of width 1;’-
`bits each, significantly altering the impact of message size. Our expectation is that with the use of
`bidirectional channels, the network would be relatively more sensitive to traffic patterns and load
`since messages can only cross the physical channel in one direction at a time. Such conflicts may
`be reduced by using full-duplex channels, or making the channels unidirectional. In the latter case
`the average distance a message travels within a dimensions is doubled (to ’“—;—1), and overall average
`message distance would be increased. The preceding analysis could be easily repeated for any of
`these relevant cases with appropriate adjustments to the expression that relates channel widths to
`the number of available data pins. The important point to note is that when pin resources are
`limited, they can be used in different ways with consequently differing impact on performance.
`It is also important to note that while the analysis has been under a single constraint, the bisection
`width and node size are not independent. A fixed node size and dimensionality will produce a fixed
`bisection width. Using the model of the physical channel between adjacent routers as described in
`the preceding paragraphs, we observe that the channel width W = %. Based on this channel width,
`the relationship between node size and bisection width can be written from Table 7.1 as
`
`Bisection width = E k"_1
`n
`
`(7.11)
`
`Bisection width is linearly related to node size and inversely related to the network dimensionality.
`As networks are scaled to larger sizes, the technology-dependent question is one of whether bisection
`Width or node size limits are encountered first. The answer is based on a moving target, and may
`be different at any given point in the technology curve.
`
`82
`
`
`
`316
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`Constant Wire Throughput
`
`The analysis described in the preceding sections were based on the assumption that switching delays
`were dominant. Therefore the latency model represented these delays (and routing delays) as Values
`normalized to the wire delay of between adjacent routers in a 2-D implementation. As the speed
`of operation continues to rise and systems employ larger networks, wire delays begin to dominate.
`This section develops the latency model in the case where it is wire delays that are dominant rather
`than switching delays.
`When the delay along the wire is small relative to the speed of transmission through the wire
`media and propagation delay through the drivers, the lines can be modeled as simple capacitive
`loads. This is the case for lower—dimensional networks. When higher-dimensional networks are laid
`out in 2 or 3 dimensions, the propagation delay along the longest wire can be substantial. In reality,
`a signal connection actually operates as a transmission line. As the speeds of network operation
`increase, and high-dimensional networks are mapped to two and three dimensions producing longer
`wires, a more accurate analysis utilizes models of physical channel behavior based on transmission
`lines. Consider the following analysis [31]. A signal propagates along a wire at a speed given by the
`following expression.
`
`(7.12)
`
`The value of C0 is the speed of light in vacuum and 6-,. is the relative permittivity