throbber
[CHS93]
`
`Architectural Tradeoffs in
`Field-Programmable-Device-Based Computing Systems
`Pak K. Chau'and Martiue D.F. Schlagt
`Computer Engineering
`University of California, Santa Cruz
`Santa Cruz, California 95064 USA
`
`Reprogrammable Field-Programmable Gate Ar­
`rays (FPGAs) have enabled the realization of high-
`performance and affordable reconfigurable comput­
`ing engines. We examine the architectural tradeoffs
`involved in designing general purpose FPGA-based
`computing systems with field-programmable gate ar­
`rays and field-programmable interconnects. The
`fact
`that FPGAs provide both programmable logic and
`programmable interconnects raises numerous design
`issues that need to be considered with care. Factors
`that influence the tradeoffs are routability, rearrange-
`ability, and speed.
`
`I FPGA-based computing systems
`Reprogrammable FPGAs form the basis of high-
`performance and affordable reconfigurable comput­
`ing, prototyping and emulation systems [1, 2, 3, 4, 5,
`6, 7]. Interconnections schemes in these systems vary
`quite a bit. In SPLASH, PAM, QUICKTURN GANGLION
`and ANYBOARD
`[1,3, 4, 5, 7],
`the FPGAs are directly
`connected to ea
`ch other in a
`fixed pattern. In BORG
`and REALIZER [6, 2], logic and interconnection are
`separate. The REALIZER uses the partial crossbar
`interconnection architecture to connect the FPGAs,
`while the borg has complete interconnection. In
`this paper, we shall investigate the design of inter­
`connection architectures using multiple crossbars and
`discuss the architectural tradeoffs in the design of
`such FPGA-based computing systems.
`We view FPGA-based computing systems as con­
`sisting of reprogrammable FPGAs, and reprogram­
`mable interconnects. A number of commercial de­
`vices can serve as field-programmable interconnects:
`Aptix FPlCs have a place and route architecture
`which is not a crossbar 18]. Routing delay
`is not
`entirely controllable and predictable as a
`result
`of its place-and-route architecture. Each FPIC
`chip has a maximum of 940 user programmable
`I/Os.
`IQ160 Each FP1D chip has a maximum of 160 user
`programmable I/Os. The heart of the IQ160 is a
`176 x 176 crossbar. Every port can be configured
`to connect to any port. Routing delay is entirely
`’Supported in p*rt by NSF Grant MIP-9196276 Si MICRO
`MICRC?POrted iU P,k^, b> NSF PY' Gr<UU MIp-8896276 4
`0-818S-3890-7/93 *03.00 © 1993 IEEE
`
`132
`
`controllable and predictable as a result of this
`crossbar architecture (9].
`TI crossbar The Texas Instrument SN74ACT8841
`is a 16 x 16 crossbar with 4-bit word length.
`The crossbar is organized as 16 ports of 4-
`bit each. Routing delay is entirely controllable
`and predictable as a result of this crossbar
`architecture. Each crossbar has a maximum of
`128 user programmable I/Os.
`FPGAs have a place and route architecture. Rout­
`ing delay is not entirely controllable and pre­
`dictable as a result of this place-and-route ar­
`chitecture [10]. Each FPGA typically has a
`maximum of 200 user programmable I/Os.
`Function-wise, the FPGAs and Aptix chips can be
`conceived of as devices capable of performing as
`crossbars under many circumstances (ignoring their
`unpredictable
`delays),
`This will be the view of
`this paper and
`we shs
`all use switches and crossbars
`interchangeably in our discussion.
`While all of the devices mentioned above can serve
`as programmable interconnects to provide flexible
`interconnection between computing elements in an
`FPGA-based computing system, the number of inter­
`connections required may easily exceed the number of
`user programmable I/Os on a single device. We need
`to consider building a larger interconnection structure
`from these basic device packages which will be viewed
`as small crossbars. In this regard, the problem of
`building large crossbars from smaller crossbars is
`well studied in the context of telephone switching
`networks. Clos and Bcnes networks are representative
`of rearrangeable networks made up of small crossbars.
`We shall review the basic terminology, concepts and
`known results related to Clos networks in Section 11.
`The notion of foldcd-Clos networks is introduced
`to facilitate the application of known results and
`a control algorithm to configure the crossbars in
`an FPGA-based system consisting of multiple small
`crossbars. This is one of the contributions of this
`paper. Then, we show that the fact that FPGAs have
`dual personality (both as programmable logic and
`programmable interconnects) which can be exploited
`to enable system tradeoffs in the design of an
`FPGA-based system using both FPGAs and field-
`programmable interconnect. The issues of speed,
`rearrangeability and routability are considered in
`Section IV.
`
`IPR2018-01600
`
`EXHIBIT
`2064
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 1
`
`

`

`middle switches output switches
`r = 4
`---------
`
`X =
`XE
`X
`X
`
`i
`
`* 4
`
`X
`X
`
`input switches
`m = 3
`
`3
`
`- {EiX:
`X
`EX:
`EX
`
`Figure 1: A three-stage Clos network C(3,3,4).
`
`II Rearrangeable CLOS switching
`networks
`We review some basic terminology and concepts
`related to Clos networks in this section.
`A connecting network is an arrangement of switches
`and interconnects allowing a set of input terminals to
`be connected to a set of output terminals in various
`combinations [11, 12]. All connections are point-
`to-point connections. An assignment for a given
`network is a list of input/output pairs which are
`to be connected; each terminal appears in at most
`one pair. An assignment is realizable if there exist
`disjoint paths in the network connecting all pairs
`of input^/output terminals in the assignment. A
`network is rearrangeable if any assignment is realiz­
`able. Nonblocking networks have the property that in
`addition to being rearrangeable, any connection can
`be established between any idle pair of input/output
`terminals without rearrangements.
`Three-stage symmetrical Clos network consists of
`two symmetrical outer stages of r n x m switches and
`an middle stage of m r x r switches. The network
`is completely characterized by three parameters n, m
`and r, or C(n,m,r) collectively. A C(3,3,4) Clos
`network is shown in Figure 1. The Slepian-Duguid
`theorem states that a 3-stage Clos network C(n, m, r)
`is rearrangeable if and only if m > n [11, 13]. For our
`purpose, we assume n = m, hence all switches arc
`square and our Clos networks are rearrangeable.
`1
`Clos shows that a 3-stage Clos network C(», m, r
`is nonblocking in the strict sense if m > 2n —
`[14]. Clos also illustrates that it is possible to
`apply the principle recursively to form multi-stage
`rearrangeable or nonblocking networks. Clos assumes
`that all connections are point-to-point connections.
`In Section V, we shall consider an extension to multi­
`pin connections.
`Although rearrangeable networks generally use
`
`153
`
`fewer switches, they require careful and sometimes
`hard routing ^ontrol algorithms. On the other hand,
`non-blocking, networks require less demanding or no
`control algorithm [15]. The next section presents
`a known control algorithm for rearrangeable Clos
`networks based on bipartite matching [12j.
`II-A A control algorithm for rearrangeable
`Clos networks
`Given an assignment A, a bipartite graph G(A) =
`can be constructed, where the vertex set
`<,VUV2,E)
`V\ consists of the input switches, and the vertex set
`Vi consists of the output switches. There is an edge
`between switch S 6 Vj and T 6 Vi for each pair of
`input/output terminals in the assignment where the
`input terminal belongs to S and the output terminal
`belongs to T. The edges of this bipartite graph can
`always be partitioned into m disjoint matchings, and
`then each matching provides the at most r pairs
`of input/output terminals which can be realized by
`a single r x r switch in the middle stage of the
`three-stage Clos network. Thus, the problem of
`setting up the switches in a three-stage rearrangeable
`Clos network C(n, m, r) can be formulated as edge­
`coloring with m colors (or equivalently partitioning
`the edges into m matchings) for the bipartite graph
`G(-4) where edges incident to the same vertex must
`be of distinct colors [12]. This problem can be solved
`in polynomial time [16].
`Ill Architecture of FPGA-based
`systems
`With the basics defined in the previous sections, we
`are now in a position to discuss FPGA-based systems
`consisting of the following components:
`FPGAs: which can serve as computing elements,
`programmable interconnect or both. Figure 2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 2
`
`

`

`I/O pads
`
`ni = 3
`
`LOGIC
`
`ar®
`i®M SI
`
`Figure 2: An FPGA modeled as consisting of programmable logic and switches connecting the nets to I/O pads.
`
`Rl
`
`Routing chip
`
`"1
`
`------- 1
`i
`
`I
`
`h
`
`I-------
`i
`
`r ~
`I
`
`*“1
`i
`
`i--------
`
`L
`
`u
`
`L
`
`R2
`
`____ i
`
`—i
`
`J
`
`J
`
`LOGIC
`
`FPGA
`
`LOGIC
`
`FPGA
`
`Routing chip
`Figure 3: The BORG prototyping board viewed as as a three-stage C(2,2,4) Clos rearrangeable network.
`
`illustrates the last case in which an FPGA is
`viewed as a device consisting of programmable
`logic and a number of m x nj “switches”
`connecting the logic to the programmable I/O
`pads. The «i x nj “switches” are realized by
`igning the nets among the ti\
`rerouting or reassi
`pads. Th
`e size o
`f these switches (the amount
`
`of routing resources used to implement them)
`affects the routability of the FPGAs and hence
`the amount of logic that can be implemented on
`the device.
`The software implication of this model is that
`we do not lock the pins to conform with board-
`level constraints. This is advantageous since
`
`154
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 3
`
`

`

`outer stage
`
`middle stage
`
`X
`
`computing
`element
`
`computing
`element
`
`X
`
`<x
`
`computing
`element
`
`computing
`
`x
`X
`element X
`Figure 4: A folded-Clos network with n = m = 3, and r = 4.
`
`reassigning or locking pins generally affects the
`routability of the FPGAs [10]. Instead, we allow
`the automatic placement and routing tool to
`choose freely the pin assignment. Afterward,
`one more routing step is required to use the
`“switches” to reroute the nets to conform with
`the board-level constraints.
`Memory elements serve as local memories to the
`computing elements.
`Host interface serves to communicate the FPGA-
`based computing system with the host machine
`to which the FPGA-based system is attached.
`Programmable routine whose sole purpose is to
`provide programmable interconnect amongst the
`computing elements and/or memory elements.
`We assume that the number of interconnec­
`tions required exceeds the number of user pro­
`grammable I/Os on a single programmable rout­
`ing device. The major issue is then the prob­
`lem of building a large switch from the smaller
`switches.
`We shall defer until Section VI the discussion of the
`impact of introducing other devices such as memory
`elements and the host interface to the system. So
`until then, we assume that there are only FPGAs
`and programmable interconnect in the system. As
`an example, consider the small programmable proto­
`typing board BORG [6] which has two routing chips;
`it can be viewed as a three-stage rcarrangeablc Clos
`network. The FPGAs serve the dual purpose of
`programmable logic and programmable interconnec­
`tion. The switches inside the FPGAs, provided by
`
`155
`
`interchanging the nets among the programmable I/O
`pads, form the outer stages of the Clos network.
`The middle stage is realized by the FPGAs R1 and
`R2 which serve as routing chips, as illustrated in
`Figure 3. The fact that the I/O pads of the FPGA
`are connected to the routing chips in an alternating
`fashion makes the system depicted in Figure 3 a three-
`stage rearrangeable Clos network with n = m = 2
`and r = 4. (The actual
`borg uses Xilinx XC3030s
`as routing chips resulting in a C(2,2,27) Clos rear­
`rangeable network.)
`The scheme shown in Figure 3 only works for
`m = 2. In next section, we shall discuss how to use
`Clos networks when there are more than two FPGAs,
`for m > 2.
`XII-A Folded-Clos networks
`The Clos network and control algorithm outlined
`in Section II has a definite notion of input and
`output terminals. Therefore, in order to apply the
`control algorithm discussed in Section II one must
`identify the two terminals of a net either as input
`or output (irrespective of signal flow). This notion
`does not necessarily exist in an FPGA-based system,
`in which the computing elements have pads which
`can be reconfigured to be either input, output or
`bidirectional pads. The numbers and distribution of
`inputs/outputs change from application to applica­
`tion.
`In this case it is most convenient to describe
`the programmable-routing component of an FPGA-
`based com;
`puling
`; system as a folded-Clos network,
`as depicted
`I in Fi
`igure 4. A three-stage folded-Clos
`network resembles a three-stage Clos network, but
`there is no distinction between input and output
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 4
`
`

`

`outer stage
`nr-r.X
`
`r-z^zX
`
`FPGAi
`
`LOGIC
`
`FPGAj
`
`Xl'
`X
`zn
`LOGIC X
`X
`
`middle stage
`
`X
`
`X
`
`X
`
`Figure 5: A folded-Clos network using the FPGAs as both computing element and programmable interconnect.
`
`stages. We simply call the stage which connects
`directly to the computing elements the “outer” stage;
`it consists of r pairs of m x m switches. The “middle"
`stage consists of m r x r crossbars.
`
`III-B A control algorithm for folded-Clos
`networks
`The control algorithm for the folded-Clos network
`depicted in Figure 4 can be adapted from the control
`algorithm for the three-stage Clos network depicted
`in Figure 1 as follows. First, directions are assigned
`to each of the nets so that each pair of m x m switches
`has m in-terminals and m out-terminals. To do this,
`we form a net-graph from the required connections,
`where each node corresponds to a group of 2m pins,
`and an edge is added for each net between the two
`nodes(groups) to which its two pins belong. We then
`traverse this net-graph visiting each edge once to
`partition the edges into edge-disjoint circuits (a node
`may appear more than once). We assign a direction
`to the edges as we traverse the net-graph. After the
`traversal, each node will have m in-edges and m out-
`edges.
`Once the 2m edges
`have been divided, we can
`unfolded the foled-Clos
`network by separating the
`2m terminals into two m
`a x m switches and obtain
`a network identical to the three-stage Clos network.
`The Clos network control algorithm can be used
`to configure the m x m switches to realize any
`assignment. Notice that the direction assigned to
`an edge is purely for the purpose of using the edge-
`colorability control algorithm described in Section II-
`A, the direction is unrelated to the actual signal flow.
`
`IV Architectural tradeoffs: FPGAs as
`both computing element and
`programmable interconnect
`In this section, we examine the architectural tradeoffs
`involved in the organization of an FPGA-based
`computing systems. The issues
`are;
`1. rearrangeability
`2. routability, and
`3. speed.
`Any point-to-point connection realized by the
`architecture presented in Figure 4 has the delay
`of 3 switches. It is possible to trade routability
`with speed by exploiting the dual personality of
`FPGAs. Figure 5 shows a system resembling a
`folded-Clos network with the FPGAs serving as both
`programmable interconnect and computing .1......... .
`elements.
`It is not difficult to show the network depicted in
`Figure 5 satisfies the Slepian-Duguid rearrangeability
`criterion (see Section II) providing the algorithm
`outlined in Section Ill-B has been used to identify
`the input and output stages. We eliminate two stages
`of switch delay in Figure 5 but risk the danger of
`having a non-rearrangeable network. This possibility
`is illustrated as follows. Assume that we control
`the utilization of the FPGAs such that they can be
`modeled as shown in Figure 2 with m = 3, r = 4.
`suppose that the utilizations of the FPGAs
`Now
`increase such that n = m decreases from 3 to 2 while
`the external connections remain unchanged, as shown
`in Figure 6. Keep in mind that the interconnections
`between the outer stage and the middle stage are
`fixed. It is not hard to see that the network now is
`
`156
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 5
`
`

`

`outer stage
`
`middle stage
`
`•m
`
`d,
`B,
`
`A? —
`b2 —S3
`
`;x
`
`)x
`
`'X
`
`Figure 6: An example to illustrate the effect of increase in FPGA utilization on rearrangeability.
`
`a non-Clos non-rearrangeable network. For example,
`the connections A\ —< Ai and B\ — Bj cannot be
`realized simultaneously.
`From an architectural standpoint, keeping the size
`of the outer switches, m, small (say 3) enhances the
`routability (and hence utilization) of the FPGAs. On
`the other hand rearrangeability requires that m be
`the number of switches in the middle stage of the
`folded-Clos network. For a given FPGA and crossbar
`device packages, this would determine the proper
`configuration of the FPGA-based system. If there
`are a total of y user I/Os per FPGA, a total of z
`user I/Os per crossbar, and F FPGAs as computing
`elements in the system, then since the total number
`of I/Os is roughly conserved (in the absence of “other
`devices”), we have
`xm
`(1)
`Fy
`m[xly\.
`F
`(2)
`For example, IQ160 FP1D has 160 user-l/Os per
`chip, i.e., z = 160. Given that m = 3, if we
`are using 80-user-I/O FPGAs, then F can be at
`most 6 in order to maintain rearrangeability. As
`another example, Aptix has 940 user-I/Os per chip.
`Given that m = 3, P can be at most 33 in order
`to maintain rearrangeability. Incorporating more
`FPGAs while maintaining rearrangeability requires
`increasing m beyond 3. As more FPGAs are added,
`the additional resources consumed by the “outer”
`switches will result in a diminishing logic capacity per
`FPGA. Conceptually, m should not reach the point
`where FPGA utilization is the same independent of
`whether the I/O pins were locked or not.
`
`V Considerations for multi-pin nets.
`The development so far had assumed that all connec­
`tions are point-to-point connections. However in an
`FPGA-based system, if a signal is the source of mul­
`tiple loads at different FPGAs, it is only reasonable
`to assume (with pin-limited packages) that this signal
`has only one source pin originating from one FPGA,
`and it fanouts in a miidh crossbar to reach different
`destinations. Figure 7 illustrates a net N\ with 3 pins
`fanout inside a crossbar.
`Given this assumption, the problem of configuring
`the crossbars to realize a given set of connections has
`no known solution. The closest related work in this
`area is for configuring non-blocking networks with
`definite signal direction (17). Our folded-Clos network
`doesn't have any definite signal flow direction and we
`only require rearrangeability.
`We can formulate the problem of configuring
`the crossbars to realize a given set of multi-pin
`connections in the following manner. Let
`• P be the set of all pins (terminals)
`• Oj be the set of pins (terminals) on outer
`switch j of the folded-Clos network
`« Al, be a net, Ni = (P,,i, P,,j,. •., P,.,.) C P
`where each Pjj is a pin, we have ((_!,_, A, C P).
`Since no two nets can share a pin,
`Vij (A/,nWj) = d
`(3)
`and only one pin per outer-stage switch for each
`net Ni is needed, we have
`v.-,/ |A,no,| < i.
`
`(4)
`
`167
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 6
`
`

`

`outer stage
`
`middle stage
`
`FPGA
`
`I
`
`LOGIC
`
`W,
`
`FPGA2
`
`LOGIC
`
`fpga3
`
`.LOGIC
`
`Figure 7: A multipin-net N\ with 3 terminals.
`We assume that the m middle switches are capable
`of realizing any interconnection pattern among their
`R = 2r pins. The problem is solved by assigning
`the nets to the m switches (coloring the nets with m
`colors)
`/ : Nets —• (1,2,..., m)
`while avoiding contention among nets
`V,,i f(Ni) = f(N,) => Vy|Oj-n(A'iUA'i)| < I. (5)
`Constraint (5) ensures that each outer switch has at
`most one net of each color, that is, at most one net
`per middle switch. This constraint also ensures that
`each middle switch will not be assigned too many
`connections. That is,
`E/<*.)=)
`(6)
`The multi-pin control problem is more difficult
`than its point-to-point counterpart. To illustrate,
`for example consider the following seven nets to be
`connected in the network as shown in Figure 8.
`A, = (P,.P,2,
`= (Pr, Pr), N, = (Ps, P9, Pis}.
`N, = (P., P», Pi.), Ns = {Ps.Pio. Pis). Ns = (P., P„),
`A'7 = (P,,,P„)
`A contention graph Q can be constructed according
`to incompatibility among the nets. The vertex set V
`in 0 arc the nets, there is an edge joining Uj and
`Vj if net Ni and Nj cannot be in the same middle
`switch. The contention graph for our example is
`shown on the left in Figure 9. Since there are m = 3
`
`middle switches, a feasible 3-coloring of the vertices
`of the contention graph should provide a valid routing
`connection. This is, however, not possible in this
`example. Hence the folded-Clos network is no longer
`rearrangcable when multi-pin nets are allowed.
`Note that in this particular example allowing
`slightly more flexibility in reassigning of the nets
`would admit a solution. If we swap the nets on
`pins P9 and Pio (which is the same as setting
`Ws = {P3.Px0.Pn} and Nr, = {Ps, P9, f’is}) the
`new contention graph shown on the right in Figui
`re 9
`is 3-colorabIe. Since rearrangeability is no longer
`guaranteed, it may be necessary to consider larger
`“outer" switches while limiting the amount of internal
`rerouting of nets. Another possibility is to
`consider
`the m different alignments of the “outer”
`switches
`with respect to the FPGA pi--
`pins.
`On the other hand non-blocking n
`blocking networks offer a
`guarantee, but require a considerable
`: increase in the
`number of middle switches m. For instance, the best
`widc-sense non-blocking network with broadcasting
`capability requires
`mi > (n - l)logP + 2(n - 1)
`(7)
`for a system with F FPGAs where n is the width
`of each switch in the outer stage [17]. Whereas a
`rearrangeable network only requires
`mj = ri
`(8)
`for point-to-point connections. Given n = 3, and
`F = 16, mi is 12 but mj is 3.
`VI How should other devices be
`connected in the system ?
`Finally we consider the impact of the placement of
`other elements (memory, host interface) in an FPGA-
`
`V,
`
`153
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 7
`
`

`

`outer stage
`Pi
`Pi
`
`middle stage
`
`X
`
`X
`
`X
`
`FPGAj
`
`LOGIC
`
`p6
`
`X Pa
`p9
`
`FPGAj
`
`LOGIC
`
`Pio
`
`P,2
`
`Xp5
`Pi X
`XPi,
`P,3 X
`XP,7
`
`\ Pm
`Pi s
`P,6
`
`LOGIC
`
`P,8
`
`Figure 8: A folded-Clos network.
`
`N,
`
`Nj
`
`W3
`
`We
`
`W5
`
`Ae
`
`W7
`
`Ns
`
`We
`
`w.
`
`FPGA3
`
`N,
`
`/Vj
`
`Ns
`
`W4
`
`if p9 e w5
`if P9 6 w3
`if P,o e w3
`if Pio G Ws
`Figure 9: Contention graphs for the given connections.
`
`169
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 8
`
`

`

`RAM
`
`LOOIC
`
`RAM
`
`>oaic
`
`I
`
`i
`
`'X
`
`X
`
`X
`
`Figure 10: Direct connection between the RAMs and the FPGAs in a folded-Clos network.
`
`LOGIC
`
`LOCK
`
`X
`
`X lalerUc
`
`X
`
`Figure 11: Connection between the Host Interface and the FPGAs in a folded-Clos network.
`
`X
`X
`
`X
`
`Xi
`
`XX
`
`FPGA
`
`LOGIC
`
`FPGA
`
`LOGIC X
`X!
`RAM NX
`
`Figure 12: Connection between the RAMs and the FPGAs in a folded-Clos network.
`
`160
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 9
`
`

`

`[5] C. E. Cox and W. E. Blanz, “GANGLION-
`A Fast Field-Programmable Gate Array Imple­
`mentation of a Connectionist Classifier," IEEE
`Journal on Solid-Slate Circuits, vol. 27, pp. 288-
`299, March 1992.
`[6] P. K. Chan, M. Schlag, and M. Martin, “BORG:
`A reconfigurable prototyping board using Field-
`Programmable Gate Arrays,” in Proceedings
`of 1“ International ACM/SIGDA Workshop
`on Field-Programmable Gate Arrays, (Berkeley,
`California, USA), pp. 47-51, Feb. 1992.
`[7] D. A. Thomae, T. A. Petersen, and D. E. Van
`den Bout, “The Anyboard rapid prototyping
`environment,” in Advanced Research in VLSI,
`Proceedings of the 1991 UC Santa Cm Confer­
`ence, (Santa Cruz, CA), pp. 356-370, 1991.
`[8] Aptix Corp. 225 Charcot Avenue, San Jose, CA
`95131.
`[9] I-Cube Design Systems. 2328-C Walsh Avenue,
`Santa Clara, CA 95051.
`[10] XILINX: The Programmable Gate Array Data
`Book. 2100 Logic Drive, San Jose, CA 95124,
`1992.
`[11] V. E. Benes, Mathematical Theory of Connecting
`Networks and Telephone Traffic. Academic
`Press, 1965.
`[12] F. K. Hwang, “Control Algorithms for Rear-
`rangeable Clos Networks," IEEE Transactions
`on Communications, vol. COM-31, pp. 952-954,
`Aug. 1983.
`[13] D. Slepian, “Two Theorems on a Particu­
`lar Crossbar Switching Network.” Unpublished
`Memo, 1952.
`[14] C. Clos, “A study of non-blocking switching net­
`works,” Bell System Technical Journal, vol. 32,
`pp. 406-424, Mar. 1953.
`[15] A. Varma and C. S. Raghavendra, Interconnec­
`tion Networks for Multiprocessors and Multicom­
`puters: Theory and Practice. IEEE Press, 1993.
`[16] H. N. Gabow and O. Kariv, “Algorithms for
`edge coloring bipartite graphs and multigraphs,"
`SIAM Journal of Computing, vol, II, no. 1,
`pp. 117-129, 1982.
`[17] Y. Yang and G. M. Masson, “Nonblocking
`Broadcast Switching Networks," IEEE Transac­
`tions on Computers, vol. C-40, pp. 1005-1015,
`Sept. 1991.
`
`based computing system connected as a folded-Clos
`network. We assume that each memory element
`serves as local memory to the corresponding FPGA.
`There are several possibilities:
`1. Attach the other element (such as memory
`element) directly to the FPGA, as illustrated in
`Figure 10. This involves locking some of the pads
`of the FPGA, and may impair the routability of
`the FPGA. This scheme has the advantage of
`incurring no additional memory access time.
`2. Attach the other element to the middle stage of
`the folded-Clos network, as illustrated in Figure
`11. This scheme incurs higher (one switch delay)
`access delay to the elements, and creates forcea-
`(6]. That is, some nets must go through
`nets
`inc switches. In this case, the control
`speci
`problem is NP-complete, even for m = 2. Yet
`in our experience with our small prototyping
`we have not encountered an
`system BORG [6]
`assignment which could
`3. Treat the other elements as computing elements,
`as illustrated in Figure 12. This scheme incurs
`the largest additional access delay while enabling
`the use of a known polynomial time control
`algorithm. It will also increase the size of
`outer switches to maintain rearrangeability of
`the network.
`VII Conclusions
`We have addressed architectural issues involved in the
`design of an FPGA-based computing system. Some
`of the principles discussed in this paper are being
`applied in the design of a FPGA-based computing
`system at University of California at Santa Cruz. We
`are using the Xilinx XC4010S as the FPG As and IQ160
`as programmable interconnects.
`Acknowledgment
`The authors would like to acknowledge Professor
`Anujan Varma for his discussions on Clos networks.
`The authors are grateful for the generous support of
`Xilinx Inc. and 1-Cube Design Systems.
`
`References
`[1] M. Gokhale, W. Holmes, A. Kopser, S. Lucas,
`R. Minnich, D. Sweely, and D. Lopresti, “Build­
`ing and using a highly parallel programmable
`logic array,” Computer, vol. 24, pp. 81-89, Jan.
`1991.
`[2] M. Butts, J. Batcheller, and J. Varghese, An
`Efficient Logic Emulation System, in Proceed­
`ings ICCD 92, pg. 138-141, Cambridge MA, Oct,
`1992.
`[3] P. Berlin, D. Roncin, and J. Vuillemin, “Pro-
`ble Active Memories: a Performance
`gramma
`Assessment,” in li! International ACM/SIGDA
`Workshop on Field Programmable Gate Arrays,
`(Berkeley, California), pp. 57-59, February 1992.
`[4] Quickturn Systems Inc. 325 East Middlefield
`Road, Mountain View, CA 94043, 1991.
`
`161
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2078, p. 10
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket