`Field-Programmable-Device-Based Computing Systems
`
`[Ch893]
`
`Pak K. Chau'and Martino DF. Schlagl
`
`Computer Engineering
`University of California, Santa Cruz
`Santa. Cruz, California 95064 USA
`
`Reprogrammable Field-Programmable Gate Ar-
`rays (FPGAs) have enabled the realization of high-
`performance and affordable reconfigurable comput-
`ing engines. We examine the architectural tradeofis
`involved in dsigning eneral purpose FPGA-based
`computin systems wit
`field-programmable ate ar-
`rays and old-programmable interconnects.
`be fact
`that FPGAs provide both programmable logic and
`programmable interconnects raises numerous design
`issues that need to be considered with care. Factors
`that influence the tradeofis are mutability, rearrange-
`ability, and speed
`
`I FPGA-hased computing systems
`Reprogrammahle FPGAs form the basis of high-
`performauce and afl'ordablc reconfigurable comput—
`ing‘ prototyping and emulation systems [1, 2, 3. It, 5.
`6, 1']. Interconnections schemes in these systems vary
`quite a, hit. In SPLASH, PAM, QUICKTURN GANGLIDN
`and ANYBOARD 1, 3, 4, 5, 7], the FPGAs are directly
`connected to es
`other in a fixed pattern. In Bonn
`and REALIZER [6,
`'2},
`logic and interconnection are
`separate. The REALIZER uses the partial crossbar
`interconnection architecture to connect the FPGAs,
`while the sons has complete interconnection.
`in
`this paper, We shall investigate the design of inter
`connection architectures using multiple crossbar-s and
`discuss the architectural
`tradeofi's in the design of
`such FPGA-based computing systems.
`We view FPGA—based computing systems as con-
`sisting of reprogrammeble FPGAs, and reprogram-
`mable interconnects. A number of commercial de-
`vices can serve as field-programmable interconnects:
`Aptix FPle have a place and route architecture
`which is not a crossbar As]. Routin delay is not
`entirely controllable an
`predictable as a result
`of its place-and-route architecture. Each FPlC
`chi has a maximum of 940 user programmable
`if s.
`
`IQlfifl Each FPID chip has a maximum of160 user
`programmable I/Os. The heart of the [(2160 is a
`”'6 x 176 crossbar. Every port can be configured
`to connect to any port, Routing delay is entirely
`'Supported in part by NSF Grant MIP-SIQSITG {c MICRO
`Isu ported in put by NSF FYI Grant MIP—sssszrs 5:
`Mining
`
`O‘SIEG-SEQO-ZBB $08.00 6 1993 [BEE
`
`152
`
`controllable and predictable as a result of this
`crossbar architecture [9].
`TI crossbar The Texas Instrument SNNACTBB“
`is a 16 x 16 crwebar with 4-bit word lcn th.
`The crossbar
`is or anized as
`16 ports I:
`4-
`bit each. Routing clay is entirely controllable
`and predictable as a result of this crossbar
`architecture. Each crossbar has a maximum of
`128 user programmable I/Os.
`
`FP GAS have a place and route architecture. Rout-
`ing delay is not entirely controllable and pro:
`dictabie as a result of thisGplawaud-route ar-
`chitecture [10].
`Each FP A typically has a.
`maximum of 200 user programmable l/Oa.
`Function-wise, the FPGAs and Aptix chips can be
`conceived of as devices capable of
`eri'orrnin
`as
`crossbars under many circumstances
`ignoring fiieir
`unpredictable deiaysg.
`This will be the view of
`this paper and we a all use switches and crossbar:
`interchangeably in our discussion.
`While all of the devices mentioned above can serve
`as programmable interconnects to provide flexible
`interconnection between computing elements in an
`FPGA-based computing system, the number of inter-
`connections required may easily exceed the number of
`user pro rammable 1/05 on asingle device. We need
`to considger building a larger interconnection structure
`from these basic device packages which wiIl be viewed
`3 smail crossbars.
`In this regard,
`the problem of
`building large crosabars from smaller crossbars is
`well studied in the context of telephone switching
`networks. Gina and Benes networks are representative
`of rearrangeable networks made up of small crossbars.
`We shall review the basic terminolo y, concepts and
`known results related to Cios networ s in Section II.
`The notion of folded-Clo: networks is introduced
`to facilitate the application of known rwults and
`a control algorithm to configure the crowbars in
`an FPGA-based system consisting of multiple small
`crocshars. This is one of the contributions of this
`paper. Then, we show that the fact that FPGAs have
`dual personality (both as programmable logic and
`programmable interconnects) which can be exploited
`to enable system tradeolfs
`in the design of an
`FPGA-based system using both FPGAs and field-
`programmable interconnect. The issues of speed,
`rearrangeability and mutability are considered in
`Section IV.
`
`|PFl2018-01594
`
`EXHIBIT
`
`2054
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 1
`
`IPR2018-01594
`
`EXHIBIT
`2054
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 1
`
`
`
`input switches
`
`middle switches
`
`output switches
`
`
`
`Figure l: A three-stage (1106 network C(3.3,4).
`
`II Rearrangeabie CLOS switching
`network:
`
`We review some basic terminology and concepts
`related to Clos networks in this section.
`A connecting network is an arrangement of switches
`and interconnects allowing a set of input terminals to
`be connected to a set of output terminals in various
`combinations [[1, 12}. All connections are point-
`to-poinl. connections. An assignment for a given
`network is a list of input/output pairs which are
`to be connected; eanh terminal appears in at most
`one pair. An assignment is renlr'zcfile if there exist
`disjomt paths in the network connecting all pairs
`of input/output
`terminals in the assignment.
`A
`network is rearranged“: if any assignment is realis-
`ahle. Nontlach‘ng networks have the property that in
`addition to being rearrangeable, any connection can
`he established between any idle pair of input/output
`terminals without rearrangements.
`Three-stage symmetrical Clot network consists of
`two symmetrical outer stagfi of r n x m switches and
`an middle sta e oi m r x r switches. The network
`is completely ciaracterized by three parameters in, m
`and r, or C(n,m,r] collectively. A 00.3.4) Clos
`network is shown in Figure l. The Slepian-Duguid
`theorem states that a 3-stage (3105 network C(n, m, r)
`is recrrcngeetle if and only if n: 2 n [11, 13]. For our
`purpose, we assume n = m, hence all smtches are
`square and our Clos networks are rearrangeahle.
`Cine shows that a 3-stage Clos network C(u, m, r?
`is nonblacting in the sir-tel sense if m 2 2n -
`[1-1].
`Clos also illustrates that
`it
`is possible to
`apply the principle recursively to form multi-stage
`reerrangeable or nonblocking networks. Clea assumes
`that all connections are point-to-point connections.
`In Section V, we shall consider an extension to multi-
`pin connections.
`Although rearrangeable networks generally use
`
`153
`
`require careful and sometimes
`fewer switches, the
`hard routing
`ntro algorithms. On the other hand,
`non-blocking; networks require less demanding or no
`control algorithm [15]. The next section presents
`a known control al orithrn for rearran able Clos
`networks based on bipartite matching [125i2
`III-A A control algorithm for rearrangeable
`Clos networks
`Given an assignment .4, a bipartite graph C(A) =
`V], V2. E) can be constructed, where the Vertex net
`| consists of the input switches, and the vertex set
`V; consists of the output switches. There is an edge
`between switch 3 E V. and T E V; for each pair of
`inputfoutput terminals in the assignment where the
`input terminal belon s to S and the output terminal
`belongs to ‘1". The
`ges of this bipartite graph can
`always be partitioned into In disjoint matchings. and
`then each matching provides the at most
`:- pairs
`of input/output terminals which can he realised by
`a single 1- x I" switch in the middle stage of the
`three-stage Clos network.
`Thus,
`the problem of
`setting up the switches in a three—stage rearrangeable
`Clos network C(n,m.r) can be formulated as edge-
`coloring with m colors (or equivalently partitioning
`the edges into m matchings] for the bipartite graph
`C(11) where edges incident to the same vertex must
`be of distinct colors [12]. This problem can be solved
`in polynomial time {16].
`III Architecture of FPGA-based
`systems
`With the basics defined in the previous sections, we
`are now in a position to discuss FPGAAbased systems
`consisting of the following components:
`FPGAa: which can serve as computing elements,
`programmable interconnect or both. Figure ‘2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 2
`
`
`
`
`
`Routing chip
`
`
`
`
`Routing chip
`
`Figure 3: The Bone prototyping board viewed as as a three-stage C(2,2.4) Cios rearrangeahie network,
`
`illustrates the last case in which an FPGA is
`viewed as a device consisting of programmable
`logic and a number of 111 x n,
`“switches"
`connecting the logic to the programmable 1/0
`pads. The In x m “switches” are realized by
`reroutin or reassigning the nets among the n;
`pads.
`he size of these switches {the amount
`
`of routing resources used to im iement them)
`afiecm the mutability of the PP As and hence
`the amount of logic that can be implemented on
`the device.
`
`is that
`The software implication of this model
`we do not lock the pins to conform with board-
`Ievel constraints. This is advantageous since
`
`154
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 3
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 3
`
`
`
`outer stage
`
`middle stage
`
`cam pucing
`element
`
`element
`
`computing
`clement
`
`computing
`element
`
`computing
`
`Figure 4: A folded-Clos network with n = m = 3. and r = 4.
`
`reassigning or lacking} pins encrally affects the
`mutability of the FP As [10 . Instead. we allow
`the automatic placement and routing tool
`to
`choose freely the pin assignment. Afterward,
`one more routing step is required to use the
`“switches” to reroute the nets to conform with
`the board-level constraints.
`
`Memory elements serve as local memories to the
`computing elements.
`
`Host interface serves to communicate the FPGA-
`based computing system with the host machine
`to which the FPGAvbased system is attached.
`
`Programmable routin whose sole purpose is to
`provide programmab e interconnect amongst the
`computing elements and/or memoryr eiements.
`We assume that
`the number of interconnec-
`tions required exceeds the number of user pro
`grammable l/Os on a. single programmable routv
`1ng device. The major Issue is then the prob-
`lem of building a large switch from the smaller
`switches.
`
`We shall defer until Section VI the discussion of the
`impact of introducing other devices such as memoryr
`elements and the host interface to the system. So
`until
`then. we assume that there are only FFGAs
`and programmable interconnect in the system. As
`an example, consider the small programmable proto-
`typing board none [6] which has two routing chi
`s;
`it can be. vieWed as a threestage rearranges le Cos
`network. The FPGAs serve the dual purpose of
`programmable logic and programmable interconnccA
`tion. The switches inside the FPGAs, provided by
`
`155
`
`interchanging the nets among the programmable ifO
`pads,
`form the outer stages of the 0105 network.
`The middle stage is realized by the FPGAs R1 and
`R2 which serve as routing chi s, as illustrated in
`Figure 3. The fact that the l/
`pads of the FPGA
`are connected to the routing chips in an alternating
`fashion makes the system depicted in Figure 3 a. three-
`sta e rearrangeabie Clos network with n = m = 2
`r = 4.
`[ he actual aono uses Xilinx XC30305
`as routing chips resulting in a 612,127) Clo: rear-
`rangeahle network.)
`The scheme shown in Fi ure 3 only works for
`m = 2.
`in next section. we a all discuss how to use
`Clos networks When there are more than two FPGAs,
`for m > 2.
`
`III-A Folded-(2105 networks
`The Clo: network and control algorithm outlined
`in Section ll has a definite notion of input and
`output terminals. Therefore,
`in order to apply the
`control algorithm discussed in Section [I one must
`identify the two terminals of a. net either as input
`or output (irrespective of signal flow). This notion
`does not necessarily exist in an FPGA-baeed system.
`in which the computing elements have pads which
`can be reconfigured to be either input, output or
`bidirectional pads. The numbers and distribution of
`
`inputs/outputs change from application to applica-tron.
`in this case it
`is most convenient
`to describe
`the pragmmmnlrlc-rosting component of an FFGA-
`based computin
`system as a folded-Clea network.
`as depicted in
`igure 4. A three—stage folded-Clea
`network resembles a three-stage 6105 network. but
`there is no distinction between input and output
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 4
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 4
`
`
`
`outer stage
`
`middle stage
`
`FPGA:
`
`FFGA:
`
`Figure 5; A folded-Clue network using the FPGAs as both computing element and programmable interconnect.
`
`the stage which connects
`stages. We simply call
`directly to the computing elements the “outer“ sta e;
`it consists of :- palrs ofrn x m switches. The “midd e"
`stage consists of m r x r crowhars.
`
`III-B A control algorithm for folded-0105
`networks
`
`The control algorithm for the folded-Clos network
`depicted in Figure 4 can be adapted from the control
`algorithm for the three-stage Clos network depicted
`in Figure l as folltms. First, directions are assigned
`to each of the nets so that each pair ofrnx m switches
`has run iii-terminals and m out-terminals. To do this,
`we form a net-graph from the required connections,
`where each node come ends to a. group of 2m pins.
`and an edge is added or each net between the two
`nodes{groups) to which its two pins belong, We then
`traverse this net-graph visiting each edge once to
`partition the edges into edge-disjoint circuits (a node
`may appear more than once). We assign a direction
`to the edges as we traverse the net-graph. After the
`traversal, each node will have m in-edges and :71 out-
`edges.
`Once the 2911 ed es have been divided, we can
`unfolded the faded Jos network by separatin
`the
`2m terminals into two in: x m switches and o tain
`a network identical to the three-stage Clos network.
`The C105 network control algorithm can be used
`to configure the m x m switches to realize any
`assignment. Notice that
`the direction assigned to
`an ed e is purely for the urpose of using the edge-
`colora ility control algorithm described in Section Ii—
`A. the direction is unrelated to the actual signal flow.
`
`156
`
`IV Architectural tradeofi's: FPGAS as
`both computing element and
`programmable interconnect
`In this section, we examine the architectural tradeofis
`involved in the or anisation of an FPGA-hased
`computing systems.
`he issues are:
`l. rearrangenbility
`2. mutability, and
`3. speed.
`Any point—to—point connection realized by the
`architecture presented in Figure 4 has the delay
`of 3 switches
`It
`is possible to trade mutability
`with speed by exploiting the dual personality of
`FPGAs.
`Figure 5 shows a system resembling a
`folded-Clos network with the FPGAs serving as both
`programmable interconnect and computin elements.
`it is not difficult to show the network
`epicted in
`Figure 5 satisfies the SlepianADuguid rearrangenbility
`criterion (see Section II} providing the algorithm
`outlined in Section Ill—B has been used to identify
`the input and output stagts. We eliminate two stages
`of switch delay in Figure 5 but risk the danger of
`having a non-rearrangeahle network. This posstbility
`is illustrated as follows. Assume that we control
`the utilisation of the FPGAs such that they can be
`modeled as shown in Figure 2 with m = 3. r = 4_
`Now, sup one that the utilization: of the FPGAs
`increase an
`that n = m decreases from 3 to 2 while
`the external connections remain unchanged, as shown
`in Figure 5. Keep in mind that the interconnections
`between the outer stage and the middle stage are
`fixed.
`it is not hard to see that the network now is
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 5
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 5
`
`
`
`outer stage
`
`middle stage
`
`
`
`Figure 6: An example to illustrate the effect of increase in FPGA utilization on rearrangeability.
`
`a non-Clue oon-rearrangeable network. For example,
`the connections A:
`-—o A; and B; —' B; cannot be
`realized simultaneously.
`From an architectural stand oiat, keeping the size
`of the outer switches, m, small,(say 3) enhances the
`routahility (and hence utilization] ofthe FPGAs. 0n
`the number of switches in the mi die ate. e of the
`the other hand rearrangeability reauires that m be
`folded-0105 network. For a iven FPGA an crossbar
`device packages,
`this woucl determine the
`roper
`configuration .of the FPGA-based system
`there
`are a total of y user [/05 per FPGA. a total of .1:
`user I/Os per crossbar, and F FPGAs as computing
`elements in the system, then since the total number
`of 1/05 'm roughly conserved (in the absence of “other
`devices"), we have
`(1}
`Fy = Im
`(2}
`F = Mlzfvl‘
`lQlfiO FPID has 160 user-l/Os per
`For example,
`chip.
`i.e., e = 160. Given that m = 3‘
`if we
`are using SO-user-I/O FPGAS,
`then P can be at
`most 5 In order
`to maintain rearrangeability. As
`another example. Aptix has 940 userel/Os per chip.
`Given that m = 3. F can be at most 33 in order
`to maintain rearrangeability.
`Incorporating more
`FPGAs while maintaining rearran eahility requires
`increasing m beyond 3. As more P GAB are added.
`the additional resources consumed by the "outer"
`switches will result in adiminiahlngloglc capacity per
`FPGA. Conceptually, m should not reach the point
`where FPGA utilization is the same independent of
`whether the [/0 pins were locked or not.
`
`V Considerations for multi—pin nets.
`The development so far had assumed that all connec-
`tions are point-to-point connections. However in an
`FPGA-based system‘ ifa signal is the source of mul-
`tiple loads at different FPGAs, it is only reasonable
`to assume (with pin-limited packages that this si nal
`has onl one source pin originating tom one F? A.
`and it snout: in a middle crossbar to reach different
`destinations. Figure T illustrates a net N1 with 3 pins
`fanout inside a crossbar.
`Given this assumption. the problem of configuring
`the crossbar: to realize a given set of connections has
`no known solution. The clomt related work in this
`area is {or configuring non-blockin
`networks with
`definite signal direction [17]. Our fol ed-Clos network
`doesn’t have any definite signal flow direction and we
`only require rearrangeability.
`We can formulate the problem of confi urng
`the crossbars to realize a. given set of mu tI-pin
`connections in the following; manner, Let
`I P be the set of all pins (terminals)
`c 0,‘ be the set of
`ins (terminals) on outer
`switch 1‘ of the folde ~Clos network
`0 Ni be a BEL. Ni = {Pi.llFl-.=l ”apt-3.} Q P
`where each PM is a pinI we have (Lt-=1 N.- E P).
`Since no two nets can share a pin‘
`{3)
`Vu {NrnN§)=¢
`and only one pin per outer-stage switch for each
`net Ni is needed, we have
`Vt: We ”Oil 5 1-
`
`(4)
`
`151
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 6
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 6
`
`
`
`outer stage
`
`middle stage
`
`FPGA:
`
` FPGA.
`
`FFGA:
`
`Figure 1’: A multipin-net Ni with 3 terminals.
`
`We assume that the in middle switChe: are capable
`of realizing any interconnection pattern among their
`R = 2:- pins. The problem is solved by assigning
`the nets to the 1:11 switches (coloring the nets with :1:
`colors)
`
`I : Nets—v {I,2,...,m]
`while avoiding contention among nets
`
`f(M-J=I(Nr) =‘ VrlojnlN-UNrJlSI- (5)
`Vi}!
`Constraint (5) ensures that each outer switch has at
`most one net of each color, that isI at most one net
`per middle switch. This constraint also ensures that
`each middie switch will not be assigned too many
`connections. That is,
`
`v,- Z IMIsR.
`IENr)=.i
`
`(5}
`
`The multi- in control problem is more difficult
`than its poin to-‘poml. counterpart. To Illustrate,
`for exam l_e consi er the following seven nets to be
`connectedam the network as shown In Figure 8.
`N: = {Ph Pna,PII}.N3 = {PL Pr}. ”3 = {21,R,Pu}.
`N. = {P..P.,P..}.N. = {PhPmPu}. N. = {Paar}.
`Nr= {PHI-Pl?)
`
`A contention graph G can be constructed according
`to incompatibility among the nets. The vertex set V
`in G are the nets,
`there is an edge joining u. and
`u;
`if net N.- and NJ: cannot be in the same middle
`switch. The contention graph for our example is
`shown on the left in Figure 9. Since there are m = 3
`
`middle switches. a. feasible 3-coloring of the verticcs
`of the contention graph should provide a valid routing
`connection. This is, however, not possible in this
`example. Hence the foldechlas network is no longer
`rearrangeahle when multi-pin nets are allowed.
`Note that
`in this particular example allowing
`slightiy more flexibility in reassigning of the nets
`would admit a solution.
`If we swap the nets on
`pins P9 and Pm (which is
`the same as setting
`N3 = {P'3.‘nl:’]JJ:P|5%I and N5 = {35,139,}:13” the
`new contention grep shown on the right in Fi ure 9
`is 3-colorabie. Since rearrangeability is no ionger
`guaranteed,
`it mag!J be necessary to consider larger
`‘outer" switches w ile iimi ting the amount ofinternal
`reroutin of nets. Another possibility is to consider
`the m ifl'erent aii nrnents of the “outer” switches
`with respect to the PGA ins.
`On the other hand nomglockio networks ofi‘er a
`guarantee, but require a considerafile increase in the
`number of middle switches rn. For instance, the bust
`wide-sense nan-blachng network with broadcasting
`capability requires
`
`(T)
`m;>(n«-i)logF+‘2{n-l)
`for a system with F FPGAs where n is the width
`of each switch in the outer stage {1?}. Whereas a
`rearrangeable network only requires
`m; = r:
`
`(3)
`for point-topoint connections. Given n = :5, and
`F :16, 1'11] i512 but 1112 is 3
`
`VI How should other devices be
`connected in the system '3
`Finally We consider the impact of the placement of
`other elements (memory. host interface) in an FPGA-
`
`:58
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 7
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 7
`
`
`
`
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 8
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 8
`
`
`
`
`
`
`.“\;.4\\Q.2.».‘\.~.~\V”
`
`x4!3:5"fill
`fl.3?) fr
`\\ >11 “'9‘
`.-
`
`rll9 2‘5»'
`
`
`
`
`
`
`
`
`
`
`p14
`
`9";1\§
`
`Lt!-
`
`
`
`
`
`
`
`
`
`
`
`
`=»;'<Ei(
`x. //;
`Wm»;
`.93“:
`2:2::
`
`.A
`
`
`
`
`
`
`Figure l2: Connection between the RAMS and the FPGAS in a folded—Clog netwc-zrk.
`1M
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 9
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 9
`
`
`
`[5]
`
`“GANGLION—
`[5] C. E. Cox and W. E. Elam.
`A Fast Field-Programmable Gate Array Imple-
`mentation of a Connectionist Classifier." IEEE
`Journal on Solid-Sluts Circuits, vol. 27, pp. 9.88-
`299. March 1992,
`P. K. Chan, M. Schlag, and M. Martin, “BORG:
`A reconfigurable prototyping board using Field-
`Programmable Gate Arrays.”
`in Proceedings
`of 1"
`International ACM/SlGDA Workshop
`on Field-Programmable Gate Army's, (Berkeley.
`California. USA), pp. Ill-51. Feb. 1992.
`D. A. Thomas, T. A. Petersen. and D. E. Van
`den Bout. “The Anyboard rapid prototyping
`environment," in Advanced Research in VLSI.
`Proceedings 9
`Inc 1'99] UC Santa Cruz Confer-
`ence, (Santa
`rur. CA), pp. 356—370. 1991.
`
`[8} Aptix Corp. 225 Charcot Avenue, San Jose. CA
`95131.
`
`[9} I-Cube Design Systems. 2323-0 Walsh Avenue.
`Santa Clara. CA 951351.
`{10] XILINX: The Programmable Gate Array Dole
`Book. 2100 Logic Drive, San Jose, CA 95124.
`1992.
`
`[11] V. E. Beneé. Mathematical Theory of Connecting
`Networks and Telephone Tl'nfic.
`Academic
`Press. 1955.
`
`l-lwang. “Control Al orithrns for Bear-
`{12] F, K.
`rsngeable Cloe Networks," EEE “liposuction:
`on Communications. vol. COM-31. pp. 952—954.
`Aug. 1983.
`
`"Two Theorems on a Particu-
`[13] D. Slepian,
`lar Crossbar Switching Network." Unpublished
`Memo, 1952.
`
`[14] C. Clos, “A study of nonvbloclrin switching net-
`works." Bell System Technical
`our-rial. vol. 32.
`pp. 406—424. Mar. 1955.
`
`[15] A. Vanna and C. S. Raghavendra. Interconnec-
`tion Networks for Multiprocessor: and Multr'com-
`purer-s: Theory and Practice. IEEE Press. 1993.
`[16] H. N. Gebow and 0. Kariv, "Algorithms for
`edge coloring bipartite graphs and multigraphs."
`SIAM Journal of Corriputing. vol. 11. no. 1.
`pp. 117-129,1982.
`“Nonlilocking
`[17] Y. Yang and GA M. Masson,
`Broadcast Switching Networks." IEEE Transac—
`l'iarrs on Computers, vol. C-‘IU, pp. 1005—1015,
`Sept. 1991.
`
`based computing system connected as a [aided-Clo:
`network. We assume that each memory element
`serves as local memory to the corresponding FPGA.
`There are several possibilities:
`(such as memory
`1. Attach the other element
`element) directly to the FPGA. as illustrated in
`Figure 10. This involves locking some of the pads
`of the FPGA. and may im air the mutability of
`the FPGA. This scheme
`as the advantage of
`incurring no additional memory accesa time.
`
`‘2. Attach the other element to the middle stage of
`the folded~Clos network, as illustrated in F1 ure
`11. This scheme incurs higher (one switch dc ay
`accefi delay to the elements. and creates force -
`nets *6]. That is, some nets must
`a through
`speci c switches.
`in this case,
`I; e control
`problem is NP-complete. even for m = 2. Yet
`in our experience with our small prototyping
`system norm [6 We have not encountered an
`assignment whl
`could
`3. Treat the other elements as computing elements,
`as illustrated in Figure 12. This scheme incurs
`the largest additional access delay while enablio
`the use of a known polynomial
`time contra
`algorithm.
`It will also increase the size of
`outer switches to maintain rearrangeability of
`the network.
`
`VII Conclusions
`We have addressed architectural issues involved in the
`dsi n of an FPGA-based computing system. Some
`01' the principles discussed in this paper are being
`applied in the design of a FPGA-based computing
`system at University of California at Santa Cruz. We
`are using the Xilinx XC4D10s as the FPGAs and IDISO
`as programmable interconnects.
`
`Acknowledgment
`The authors would like to acknowledge Professor
`Anujan Varma for his discussions on Class networks.
`The authors are grateful for the generous support of
`Xilinx Inc. and IACube Design Systems.
`
`References
`[I] M. Gokhale. W. Holmes, A. Kopser. S. Lucas,
`R. Minnich. D. Sweely, and D. Lopresti. “Build-
`ing and using a highly parallel programmable
`logic array," Computer, vol. 24. DP. 31—89, Jan.
`1991.
`
`.l. Batcheller, and J. Varghese. An
`[2! M. Butts.
`Efficient Logic Emulation System!
`in Proceeds
`199 .
`ingsgICCD 92. pg.l$8-l4l. Cambridge MA. Oct.
`[3] P. Bertin, D. Rancin. and J. Vuillemin, “Pro-
`grammable Active Memories: 3 Performance
`Assessment,” in I" Inlemolionol’ACM/SIGDA
`Workshop on Field Programmable Gate Arrays,
`(Berkeley. California), pp. 57—59. February £992.
`[4] Quickturn Systems Inc.
`325 East Middlefield
`Road. Mountain View, CA 94043. 1991.
`
`161
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 10
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2071, p. 10
`
`