IPR2015-00159, No. 1007-2 Exhibit - Duato Interconnection Networks An Engineering Approach, Part 2 of 2 (P.T.A.B. Oct. 28, 2014)

CHAPTER 2. MESSAGE SVVITCHING LAYER
`
`base latency for the transmission of a message of L bits along D channels. Make the analogy
`between packets without headers and flits in wormhole switching, and determine the optimal
`flit size.
`
`Consider a network using Wormhole switching, and a node architecture that is able to send
`and receive up to four packets simultaneously without interference. The start-up latency to
`initiate a new packet transmission after initiating the previous one is ts. Assuming that there
`are four minimal paths between two nodes that are D links apart, compute the base latency
`to send a message of L bits when the message is split into four sequences of packets, each
`sequence following a difierent path. Assume that routers are input-buffered only.
`
`A hierarchical network topology has channels of two different widths, W and —V2Y—, all of them
`being clocked at the same frequency B. The network uses wormhole switching, and routers
`have input buffers and output buffers deep enough to avoid filling the buffers in the absence of
`contention. Internal data paths are W bits wide. Compute the base latency to send a message
`of L bits across two channels in two cases: (a) the width of the first and second channels are W
`and %, respectively; (b) the Width of the first and second channels are % and VV, respectively.
`Assume that messages are not split into packets.
`
`71
`
`

`
`Chapter 7
`
`Network Architectures
`
`In many instances, particularly in fine-grained parallel machines, the performance of the applications
`are communication-limited rather than computation-limited. In this case, the design of the network
`architecture is crucial. By network architecture we refer to the topology of interconnection between
`the processing nodes, and the data and control path of the routers within the individual nodes.
`The design of the network architecture is fundamentally a process of making trade-offs between
`performance parameters such cost, latency, and throughput, and physical constraints such as area,
`wireability, and I/O limitations. Since physical constraints are continually evolving with advances
`in fabrication and packaging technology, so too are the appropriate choices of network topology and
`design alternatives for the router architecture. This chapter discusses basic issues facing the choice
`of network topologies and presents the design of router architectures that have been successfully
`employed within these topologies.
`Chapter 1 describes many distinct topologies that have been proposed for use in multiprocessor
`architectures. The majority of these topologies are either configurations of, or isomorphic to, It-ary
`n-cube networks. These include the binary hypercube, tori, rings, and meshes. Indirect networks
`such as the indirect binary n-cube and Omega networks are also topologically closely related to these
`networks. Low dimensional k-ary n-cube networks possess desirable features with respect to their
`physical implementations, such as constant degree and ease of embedding in physical space. They
`also possess desirable properties with respect to message performance such as simple distributed
`routing algorithms. As a result, these networks have been used in the majority of commercial
`systems developed in the past decade. These include the iPSC series of machines from Intel, the
`Cray T3D and T3E, the Ametek 2010, MasPar MP-1 and MP-2, and Thinking Machines CM-1 and
`CM-2. These networks have formed the communications substrate of many commercial machines
`and are therefore the focus of this chapter. We observed that the network topology is primarily
`influenced by the packaging technology which places constraints on the wiring density, chip pin-
`out, and wire lengths. In the presence of these constraints, topological design issues must reconcile
`conflicting demands in choices of channel width, network dimensionality, and wire length. Thus,
`this chapter initially focuses on the optimization of the network topology.
`While the architecture of the routers is certainly related to the topology of the interconnect,
`their design is primarily influenced by the switching technique that they are designed to support.
`Generic issues can be broadly classified into those dealing with internal data and control paths
`and those dealing with the design of the interrouter physical links. Specific design choices include
`internal switch design, buifer management, use of virtual channels, and physical channel flow control
`strategies. The remainder of the chapter presents descriptions of several modern router architectures
`
`305
`
`sed by
`xercise
`I Cl that
`
`: en by
`. -based
`
`oki [74]
`
`itched
`
`rsed by
`irection
`-_ across
`. What
`ashion?
`
`72
`
`

`
`306
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`as examples of specific design choices. Rather than present a comprehensive coverage of available
`architectures, we have tried to select illustrative examples from the range of currently available
`architectures.
`
`7.1 Network Topology and Physical Constraints
`For a fixed number of nodes, the choice of the number of dimensions of a k-ary n-cube represents 3
`fundamental trade-ofl between network diameter and node degree. This choice of network dimension
`also places different demands on physical resources such as wiring area and number of chip I/Os. For
`a given implementation technology, practical constraints on these physical resources will determine
`architectural features such as channel widths, and as a result determine the no-load message latency;
`message latency in the absence of traffic. Similarly, these constraints will also determine the degree
`of congestion in the network for a given communication pattern, although accurate modeling of such
`dynamic behavior is more difficult. It thus becomes important to model the relationships between
`physical constraints and topology, and the resulting impact on performance. Network optimization is
`the process of utilizing these models in selecting topologies that best match the physical constraints of
`the implementation. The following subsections discuss the dominant constraints and the construction
`of analytic models that incorporate their effects on the no-load message latency.
`
`7.1.1 Bisection Bandwidth Constraints
`
`One of the physical constraints facing the implementation of interconnection networks is the available
`wiring area. The available wiring area is determined by the packaging technology, i.e., does the
`network reside on a chip, multichip module or printed circuit board.
`In particular, VLSI systems
`are generally wire-limited: the silicon area required by these systems is determined by interconnect
`area and the performance is limited by the delay of these interconnections. Since these networks
`must be implemented in three dimensions, for an N node network the available wiring space grows as
`N% while the network traffic can grow as N. Clearly machines cannot be scaled to arbitrarily large
`sizes without eventually encountering wiring limits. The choice of network dimension is influenced
`by how well the resulting topology makes use of available wiring area. Such an analysis requires
`performance measures that can relate network topology to the wiring constraints.
`On such performance measure is the bisection width of a network: the minimum number of wires
`that must be cut when the network is divided into two equal sets of nodes. Since the primary goal
`is one of assessing bandwidth and resulting wiring demands, only Wires carrying data are generally
`counted, excluding control, power, and ground signals. However, these effects are relatively easy to
`incorporate in the following analysis. The collective bandwidth over these wires is referred to as the
`bisection bandwidth. For example, consider a 2-D torus with N = k2 nodes. Assume that each pair
`of adjacent nodes are connected with one bidirectional data channel of Width W bits. If this network
`is divided into two halves, there will be 2W\/N wires crossing this bisection. The factor of 2 is due
`to the wraparound channels. Imagine what happens when we bisect a 3-D torus with N = k3 nodes.
`Each half comprises of 72-3 nodes. A 2-D plane of k2 nodes has links crossing the bisection, leading
`to a bisection width of 2Wk2 2 2Wk”“1. These cases are illustrated in Figure 7.1. For the sake of
`clarity the wraparound links in the 3-D topology are not shown. In general, a bisection of a k-ary
`n-cube will cut channels from a k-ary (n — 1)-cube leading to a bisection width of 2Wk“‘1.
`Using the above approach we can compute the bisection width of a number of popular topologies.
`These are summarized in Table 7.1. The implicit assumption here is that all of these topologies
`are regular,
`i.e., each dimension has the same radix. Consider the case of networks with N =
`
`73
`
`

`
`NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`-M’
`.2.»
`
`(3)
`
`(b)
`
`Figure 7.1. Examples of the computation of bisection width:
`(wraparound links not shown).
`
`(a) 2-D torus.
`
`(b) 3-D torus
`
`kl X kg >< .. . >< kn nodes. In this case the minimum bisection width is orthogonal to the dimension
`with the largest radix. For example, this is the case shown in Figure 7.1.
`If the largest radix is
`N
`km, then we have F; nodes with channels across this bisection. Thus, the bisection width can be
`expressed as 22/N bits.
`The above cgmputation assumes that each node devotes W pins to communication with a neigh-
`bor along a dimension. No assumptions are made about how these pins are used. In general, given a
`set of W pins between adjacent nodes, these may be utilized as two (opposite) % bit unidirectional
`channels, one W-bit bidirectional channel, or one W-bit unidirectional channel. In the computing
`bisection width, the manner is which these channels are used is not particularly important. However,
`it is central to the evaluation of performance since channel widths directly impact message latency.
`We will see in later sections that assumptions about how these pins are used must be unambiguous
`and clearly stated.
`
`7.1.2 Node Size Constraints
`
`In addition to wiring space, a second physical constraint is the number of I/ Os available per router.
`This is referred to as the node size. The construction of networks under a constant channel width
`for an arbitrary number of dimensions is impractical since the node size is linear in the number of
`dimensions. For example, consider a 2-D network and assume a baseline channel width of 16 data
`
`Table 7.1. Examples of bisection Width and node size of some common networks (t X t switches are
`assumed in the Omega network).
`
`Network
`Bisection Widtlfl Node Size_|
`k—ary n-cube
`2Wk"‘1
`2Wn
`Binary n-cube 4 E?
`nW
`n-dimensional mesh
`Wk"”1
`2Wn
`
`Omega network
`
`NW
`
`2tW
`
`74
`
`

`
`308
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`bits, 8 control bits, and full-duplex channels, i.e., two unidirectional channels between adjacent nodes_
`This would correspond to 192 signal pins not including power, ground, and some miscellaneous I/05,_
`This is certainly feasible in current technology. For an n-dimensional network, the router will have
`48 pins in each direction in each dimension for a total pin-out of 9671.. If we consider a 10—D network
`with the same channel width, we will require 960 pins, or close to 500 pins just for the channe1s
`in a more modest 5-D network. As we progress to 32-bit channels, the feasible network dimension
`becomes smaller. Thus, practical considerations place limits on channel width as a function of
`dimensionality [1, 3].
`Even if we consider the bisection width constraints and networks being wiring area limited, it is
`likely that before networks encounter a bisection width limit, they may encounter the pin-out limit_
`This is evident from the following observation. Consider the implementation of a k-ary n—cube with
`the bisection width constrained to be N bits. The maadmum channel width determined by this
`bisection width is given by
`
`k
`(7.1)
`Bisection width = 2Wk“‘1 = N => W = 5
`Note that this is the maximum channel width permitted under this bisection width constraint.
`A router does not have to provide this number of pins across each channel. However, if it does,
`the router will have a channel width of W bits between adjacent routers in each direction in each
`dimension for a total pin-out of 2nW = nk. A network of 220 nodes will have a pin-out of 2,048 in
`a 2-D configuration and a pin-out of 128 in a 4-D configuration. The former is certainly infeasible
`in 1997 technology for commercial single-chip packaging.
`In general, both the maximum number
`of available pins and the wiring area constraint as represented by the bisection width place upper
`bounds on the channel width. If the maximum number of pins available for data channels is given
`by P, and the bisection width is given by B, then the channel width is constrained by the following
`two relations. The feasible channel width is the smaller of the two.
`P
`W < --
`‘ 211
`
`B
`(7.2)
`W 3 WH
`To understand the effect of channel width on message latency consider the following. For a
`fixed pin-out, higher-dimensional networks will have smaller channel widths: a fixed number of pins
`partitioned across a higher number of dimensions. As we reduce the number of dimensions, the
`channel width increases. For example, if the channel width of an n-dimensional binary hypercube is
`Wh, then the total number of pins is nW;, (note the absence of the factor of 2 since It = 2 is special
`case with no wraparound channels).
`If we keep this pin-out constant, then for an m-dimensional
`torus with kg" nodes, the pin-out is 2mW,,,. The channel width of the torus is given by
`
`W,,,=W,,><§><i
`717.
`
`(7.3)
`
`However, if we keep the number of nodes constant, we have
`
`(7.4)
`N=2"=k;"=>k1=N:»"
`Therefore we see that as the dimension decreases, the linear increase in channel width is offset
`by an exponential increase in the distance traveled by a message in each dimension. Expressions for
`message latency must capture these eflects in enabling the analysis of trade-offs between bisection
`Width, pin-out, and wiring length.
`
`75
`
`

`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`7.1.3 Wire Delay Constraints
`
`As pointed out in [71], in VLSI systems the cycle time of the network is determined by the maximum
`wire delay, while the majority of the power is expended in driving these wires. Thus, topologies that
`can be embedded in two and three dimensions with short wires will be preferred. While even low-
`dimensional networks may logically appear to have long wires due to the wraparound connections,
`these can be avoided by interleaving nodes as shown in Figure 7.2 [258, 312]. However, when higher-
`dimensional networks are embedded in two and three dimensions, longer long wires do result. For
`the purposes of analyzing the effect of wire length we can determine this increase in wire length as
`follows
`The following analysis is based on an embedding in two dimensions, although it can be
`‘extended to three dimensions in a straightforward manner.
`A k—a.ry n—cube is embedded in two dimensions by embedding 125 dimensions of the network into
`each of the two physical dimensions (assuming an even number of dimensions). After embedding
`the first two dimensions each additional dimension increases the number of nodes in the network
`by a factor of k. Embedding in two dimensions provides an increase in the number nodes in each
`physical dimension by a factor of
`If we ignore wire width and wiring density effects, the length
`of the longest wire in each physical dimension is also increased by a factor of
`Note that if we
`had to account for the thickness of the wires, the length of the longest wire would actually grow
`faster than M; as we increase the number of dimensions (see [3] for a good discussion).
`Ignoring
`wire width effects, the length of the longest wire in a 2-D embedding of an n-dimensional network
`increases by a factor of kn’? over the maximum wire length of a 2-D network. For a generalization
`to embedding in three dimensions and higher, refer to Exercise 7.3.
`
`00606999
`
`76
`
`

`
`310
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`the
`The delay along the wire may follow one of several models depending upon the length,
`implementation medium, e.g., aluminum on silicon, copper on polyimide, etc. The three Common
`wire delay models include the linear, logarithmic, and constant delay models.
`
`Delay oc logel Logarithmic delay model
`
`Delay oc l
`
`Linear delay model
`
`Delay oc C’
`
`Constant delay model
`
`where l is the wire length. These delay models are incorporated into a model of average message
`latency developed in the following section.
`
`7.1.4 A Latency Model
`
`Given the impact of the preceding physical constraints, we can now develop a model of the no-load
`latency of a message in a k-ary n-cube in the presence of these constraints. Such relationships
`between performance and implementation constraints help us make informed trade-offs in the choice
`of network topology. In the following We assume a Wormhole-switched network, and the development
`follows that provided in
`Consider the base latency of a wormhole-switched message from Equation 2.4.
`
`twormhole : D(t1- ‘l’ ts ‘l’ tux) ‘l’ Ina'x(ts7tw)
`
`L
`
`where t,. is the time to make a routing decision, is is the delay through the switch, and tw is the wire
`delay across the physical channel. The first term in the right hand side of the expression represents
`the delay due to the distance between the source and destination. The major components are the
`routing, switch, and link delays. The routing protocols discussed in Chapter 4 implement decisions
`that can be quite complex relative to the no-load delay of a data flit through the switch. While the
`impact of this routing delay is reduced for long messages, it may be nonnegligible for small messages,
`and is represented by t,. It is sometimes the case that the switch delays (ts) are greater than the
`link delays, and as a result have a nonnegligible impact on the performance of the network. The
`development of the latency model in this section assumes switch delays dominate wire delays and
`follows the approach in
`Later in this chapter we consider the case where wire delays dominate
`switch delays. Finally, tu, represents the wire delay across the physical link. The above expression
`is based on a router architecture where the inputs and outputs are buffered. Thus, once a message
`pipeline has been set up, the period of this pipeline is determined by the greater of the switch delay
`and wire delay. If we had used an input only buffered switch, then max(t,, tw) would have simply
`been replaced by (t, + tw).
`Under random traflic, the value of the distance D above can be replaced by the average distance
`between any pair of nodes. With bidirectional links, in an n-dimensional network a message will
`travel an average of § links in a dimension when k is even and Q2?) links when k is odd. To
`capture the effect of wire delays the routing, switching, and wire delays can be normalized to the
`wire delay between adjacent nodes in a 2-D network. This will permit analysis that would remain
`relevant with improvements in technology. As new packaging techniques, materials, and interconnect
`media evolve, wire delays between adjacent nodes in a 2-D embedding will improve. The latency
`expression using normalized delays will represent performance relative to this improved base 2-D
`case. Let the switch delay and routing delay be represented as integer multiples of the wire delay in
`a 2-D network:
`.9 and 7', respectively. From the earlier discussion, we know the value of tw relative
`
`77
`
`

`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`311
`
`to a 2-D network. Thus, the latency expression can be written as follows (for networks where k is
`even):
`
`k
`
`..
`
`twormhole = nZ(7‘ + 5 + 163-1) + max(s, k?‘1)
`
`1.
`
`L
`
`(7.6)
`
`The above expression is based on a linear delay model for wires. For short wires, delay is
`logarithmic [71] and for comparison purposes, it is useful (though perhaps unrealistic) to have model
`based on constant wire delay. The above expression is normalized to the wire delay in a 2-D mesh.
`Therefore, in the constant delay model, wire delay simply remains at 1. In the logarithmic delay
`- model, delay is a logarithmic function of wire length, i.e., 1 + logel [71]. The latency expression
`under the logarithmic wire delay model can be written as
`
`twormhole = 71% [T + 5 + 1 +
`
`— 1) loge k] + max [s,1 +
`
`— 1) loge k)
`
`For the constant wire delay model the latency expression is
`
`k
`
`twormhuze = n:1—(7‘ —|— s -1- 1) —|— s
`
`(7.8)
`
`The latency expressions can be viewed as the sum of two components: a distance component which
`is the delay experienced by the header flit, and a message component, which is the delay due to the
`length of the message. The expressions also include terms representing physical constraints (wire
`length, channel widths), network topology (dimensions, average distance), applications (message
`lengths) and router architecture (routing and switching delays). Although in the above expression
`routing and switching delays are constants, we could model them as an increasing function of network
`dimension. This appeals to intuition since a larger number of dimensions may make routing decisions
`more complex, and increase the delay through internal networks that must switch between a larger
`number of router inputs and outputs. We revisit this issue in the exercises at the end of this chapter.
`The above expression provides us with insight into selecting the appropriate number of dimensions
`for a fixed number of nodes. The minimum latency is realized when the component due to distance
`is equal to the component due message length. The choice of parameters such as the switching and
`routing delays, channel widths, and message lengths determine the dimension at which this minimum
`is realized. The parameters themselves are not independent and are related by implementation
`constraints. Examples of such analysis are presented in the following sections.
`
`7.1.5 Analysis
`
`In this section we focus on the selection of the optimal dimension given a fixed number of nodes.
`The optimal dimension realizes the minimal value of the no-load latency. The choice of the optimal
`dimension is dependent on the technological parameters. Wiring density, pin-out, and wire delays
`each individually define a dimension that minimizes the no-load latency performance of the network.
`The following analysis is intended to demonstrate how the topological features of the network are
`dependent on constraints on each these three physical features, and thus enable us to select the
`topology that optimizes the performance of an implementation.
`
`78
`
`

`
`312
`
`Constant Bisection Width
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`g resource is
`let us assume this fixed
`ary n-cube with k : 2, and single-bit channels
`= 1 in the expression in Table 7.1, we obtain this
`e differs from the expression for the
`
`width in the above expression
`necessity of being careful about
`would have been replaced by 7749 rather than
`what exactly a W-bit channel refers to in order to maintain fair comparisons between networks.
`Comparing W-bit bidirectional channels and W-bit, dual, unidirectional channels is clearly not a fair
`comparison. Although both instances are sometimes described as W-bit channels between adjacent
`nodes, the latter has twice as many signals as the former. From the perspective of packaging and
`‘n the number of pins devoted to communication with
`
`As we increase the number of dimensions, when the network is laid out in the plane (or even in
`three dimensions for that matter), the number of interprocessor channels that cross the bisection
`increases. If this bisection width is
`
`For a constant number of
`eases the wire length by a factor of N "#115". The cumulative
`efl'ect of all of these trade-offs is illustrated in Figure 7.3. It is apparent that for larger-size systems,
`the distance component of message latency dominates at low dimensions, thus favoring networks
`of dimension 3 or less. Smaller message sizes also effectively increase the impact of switching and
`
`m
`
`D:a,.giv<._r=ql-(fxa-.
`
`79
`
`

`
`NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`Constant Bisection Width
`
`1000000
`
`100000
`
`10000
`
`1000 -
`
`100
`
`+ 1M + 64K +256
`
`5
`
`6
`
`7
`
`8 9101214161820
`Dimension
`
`Figure 7.3. Effect of constant bisection width on the no-load message latency.
`
`size systems are less sensitive to these delays. Rather, the relative impact of narrower channels has
`a greater effect on message latency than the switching and routing delays. The optimal number of
`dimensions for smaller systems tends to be about two to three. Finally, the use of linear wire delay
`models penalizes higher-dimension networks. However, use of the logarithmic wire delay models does
`not change the results appreciably. The optimal number of dimensions for large systems (> 1K)
`ranges from three to six, and for small systems (< 512) remains at two for realistic wire delay
`models. For small systems, the (unrealistic) constant delay model and small message sizes increases
`the optimal number of dimensions. However for the case of 16K and 1M nodes, the optimal number
`of dimensions rises to 9'. These results are summarized in Tables 7.2a and 7.2b.
`Finally we note that the behavior is dependent on two additional factors. The latency expressions
`are based on the no—load latency. The presence of link congestion and blocking within the network
`can produce markedly diflerent behavior, particularly if the message destinations and lengths are
`nonuniform. Reference
`presents one approach to incorporating models of network congestion
`into the latency expressions. A second issue is the design of the routers. If routers employ input
`
`Table 7.2. Optimal number of dimensions for constant bisection bandwidth and: (a) Message size
`of 32 bits. (b) Message size of 256 hits.
`
`(a)
`
`(b)
`
`[Wire Delay Mada 1M | 16K ‘
`Linear
`3
`3
`
`2
`
`| Wire Delay Modfl 1M T_16KJ 256
`3
`2
`l—Linear
`3
`
`Logarithmic
`Constant
`
`7
`9
`
`L 2
`
`6
`7
`
`Logarithmic
`Constant
`
`4
`6
`
`3
`4
`
`2
`2
`
`80
`
`

`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`100000
`
`Constant Pin Out
`
`-I" 1M +64K +256
`
`6
`
`7
`
`8 9101214161820
`Dimension
`
`Figure 7.4. Effect of constant pin-out on the no-load message latency.
`
`buffering, the cycle time of the message pipeline is the sum of the switch (t_,) and wire (tw) delays.
`When both input and output buffering is employed, the cycle time is reduced to max (t,, tw). Thus,
`the latter model would favor higher dimensions since it is less dependent on wire delays particularly
`when switching delays are relatively large.
`
`Constant Node Size
`
`The preceding analysis can lead to designs where the individual nodes must have extremely large
`pin-out (often impractical‘ in 1996-1997 technology) to make full use of the bisection bandwidth.
`In these instances network performance is limited not by the wiring resource but chip I/O or pin
`resources
`As we did for bisection width, we can re-write the latency expression to capture
`the dependency of latency on the network dimension assuming the number of chip I/Os are fixed.
`The following expression is written assuming linear wire delay and P pins per chip devoted to
`communication channels. Each router is assumed to communicate over W signals with an adjacent
`router in each direction in each dimension. Thus P = 2nW and substituting W = {-1 the no—load
`latency expression using a linear wire delay model becomes
`
`2nL
`P
`
`twormhale = n§(r + 5 + k%’1)+ max(s, kJ2l‘1) l
`The factor of 2 comes from the availability of channels in each direction within a dimension, and
`the use of W signals between adjacent nodes. For binary hypercubes We would have W = 5- since
`there is only one direction in each dimension. A plot illustrating the optimal number of dimensions
`is shown in Figure 7.4 for the linear wire delay model provided in the above equation, parameterized
`as shown in the figure. The message size is fixed to that of a nominal-sized cache line in a shared-
`memory system, or medium size message (32 bytes) and we assume a nominal number of pins devoted
`to communication (256 pins). The optimal number of dimensions for large numbers of nodes with
`message size of 256 bits is 3, whereas for the smaller-sized system it is 2. If the wire delay model is
`
`(7.10)
`
`81
`
`

`
`7.1. NETWORK TOPOLOGY AND PHYSICAL CONSTRAINTS
`
`315
`
`Table 7.3. Optimal number of dimensions for constant pin-out and: (a) Message size of 32 bits. (b)
`Message size of 256 bits.
`
`(a)
`
`(b)
`
`Linear
`
`Logarithmic
`Constant
`
`3
`
`3
`
`2
`
`Logarithmic
`' Constant
`
`changed to the logarithmic delay model the optimal number of dimensions increases to 6, 4, and 2 for
`the 1M, 16K, and 256—node systems respectively. Reducing the message size has a substantial effect
`further increasing the optimal number of dimensions. These results are summarized in Tables 7.3a
`and 7.3b.
`
`The exponential increase in the distance in each dimension overshadows the linear increase in
`channel width as the network dimension is reduced. Under the constant node size constraint model,
`higher dimensionality is more important than wider channels
`The use of input and output
`buffering further encourages the use of higher dimensions. This model is also particularly sensitive
`to Wire delays as shown in Figure 7.4.
`This analysis is very dependent upon the structure of the physical channel. The model assumes
`that the number of pins devoted to communication with a neighboring node in a dimension is orga-
`nized as a single bidirectional channel of width W bits. The channel width is a major determinant
`of message latency. The pins could have been organized as two unidirectional channels of width 1;’-
`bits each, significantly altering the impact of message size. Our expectation is that with the use of
`bidirectional channels, the network would be relatively more sensitive to traffic patterns and load
`since messages can only cross the physical channel in one direction at a time. Such conflicts may
`be reduced by using full-duplex channels, or making the channels unidirectional. In the latter case
`the average distance a message travels within a dimensions is doubled (to ’“—;—1), and overall average
`message distance would be increased. The preceding analysis could be easily repeated for any of
`these relevant cases with appropriate adjustments to the expression that relates channel widths to
`the number of available data pins. The important point to note is that when pin resources are
`limited, they can be used in different ways with consequently differing impact on performance.
`It is also important to note that while the analysis has been under a single constraint, the bisection
`width and node size are not independent. A fixed node size and dimensionality will produce a fixed
`bisection width. Using the model of the physical channel between adjacent routers as described in
`the preceding paragraphs, we observe that the channel width W = %. Based on this channel width,
`the relationship between node size and bisection width can be written from Table 7.1 as
`
`Bisection width = E k"_1
`n
`
`(7.11)
`
`Bisection width is linearly related to node size and inversely related to the network dimensionality.
`As networks are scaled to larger sizes, the technology-dependent question is one of whether bisection
`Width or node size limits are encountered first. The answer is based on a moving target, and may
`be different at any given point in the technology curve.
`
`82
`
`

`
`316
`
`CHAPTER 7. NETWORK ARCHITECTURES
`
`Constant Wire Throughput
`
`The analysis described in the preceding sections were based on the assumption that switching delays
`were dominant. Therefore the latency model represented these delays (and routing delays) as Values
`normalized to the wire delay of between adjacent routers in a 2-D implementation. As the speed
`of operation continues to rise and systems employ larger networks, wire delays begin to dominate.
`This section develops the latency model in the case where it is wire delays that are dominant rather
`than switching delays.
`When the delay along the wire is small relative to the speed of transmission through the wire
`media and propagation delay through the drivers, the lines can be modeled as simple capacitive
`loads. This is the case for lower—dimensional networks. When higher-dimensional networks are laid
`out in 2 or 3 dimensions, the propagation delay along the longest wire can be substantial. In reality,
`a signal connection actually operates as a transmission line. As the speeds of network operation
`increase, and high-dimensional networks are mapped to two and three dimensions producing longer
`wires, a more accurate analysis utilizes models of physical channel behavior based on transmission
`lines. Consider the following analysis [31]. A signal propagates along a wire at a speed given by the
`following expression.
`
`(7.12)
`
`The value of C0 is the speed of light in vacuum and 6-,. is the relative permittivity

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases