US006940308B2

(12) United States Patent
Wong

(10) Patent No.: US 6,940,308 B2
(45) Date of Patent: *Sep. 6, 2005
`
(54) INTERCONNECTION NETWORK FOR A FIELD PROGRAMMABLE GATE ARRAY

(75) Inventor: Dale Wong, San Francisco, CA (US)

(73) Assignee: Leopard Logic Inc., Cupertino, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b).

This patent is subject to a terminal disclaimer.
`
(21) Appl. No.: 10/764,216

(22) Filed: Jan. 23, 2004

(65) Prior Publication Data

US 2004/0150422 A1  Aug. 5, 2004
`
Related U.S. Application Data

(63) Continuation of application No. 09/923,294, filed on Aug. 3, 2001, now Pat. No. 6,693,456.

(60) Provisional application No. 60/223,047, filed on Aug. 4, 2000.

(51) Int. Cl.7 .......................... H03K 19/173; G06F 7/38

(52) U.S. Cl. .............................. 326/41; 326/38; 710/317

(58) Field of Search .............. 326/41, 47, 38, 39, 40; 710/317
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,349.248 A 9/1994 Parlour et al. .............. 307/465
`5,519,629 A 5/1996 Snider ..............
`... 364/490
`5.530,813 A 6/1996 Paulsen et al. ............. 395/312
`5.987,028 A 11/1999 Yang et al. ................. 370/380
`6,693.456 B2
`2/2004 Wong .......................... 326/41
`
FOREIGN PATENT DOCUMENTS

EP  0919938   6/1999 ........... G06F/17/50
JP  03098353  4/1991 ........... H04L/12/48
`
OTHER PUBLICATIONS

Chan et al., "Architectural Tradeoffs in Field-Programmable-Device-Based Computing Systems," FPGAs for Custom Computing Machines, Proceedings, IEEE Workshop on Napa, CA, USA, Apr. 5-7, 1993, pp. 152-161.
`
Primary Examiner: Daniel Chang
(74) Attorney, Agent, or Firm: Aka Chan LLP

(57) ABSTRACT
`
An interconnection network architecture which provides an interconnection network especially useful for FPGAs is described. Based upon Benes networks, the resulting network interconnect is rearrangeable so that routing between logic cell terminals is guaranteed. Upper limits on time delays for the network interconnect are defined, and pipelining for high speed operation is easily implemented. The described network interconnect offers flexibility so that many design options are presented to best suit the desired application.
`
`5,299,317 A 3/1994 Chen et al. ................. 395/325
`
`5 Claims, 13 Drawing Sheets
`
`
`
`Page 1 of 22
`
`FLEX LOGIX EXHIBIT 1008
`
`
`
[Sheet 1 of 13: FIG. 1 (Prior Art), the interconnection architecture of a current FPGA, with logic cell pins labeled input A, input B, Output C, and Output D; FIG. 2B, the "cross" operation of a 2x2 switch.]
`
`
`
[Sheet 2 of 13]
`
`
`
[Sheet 3 of 13: FIG. 3A, an 8x8 Benes network of 2x2 switches 20 arranged in Levels 1 through 5 between the input and output terminals.]
`
`
`
[Sheet 4 of 13]
`
`
`
[Sheet 5 of 13: a figure showing connections "To and From Logic Cells."]
`
`
`
[Sheet 6 of 13]
`
`
`
[Sheet 7 of 13: FIG. 6B and FIG. 7.]
`
`
`
[Sheet 8 of 13: a figure labeled "Interconnect Network."]
`
`
`
[Sheet 9 of 13: FIG. 13A, with elements labeled 82.]
`
`
`
[Sheet 10 of 13: FIG. 11A and FIG. 11B.]
`
`
`
[Sheet 11 of 13: FIG. 12, a flow chart with steps including: User Specifies Number of Logic Cells; Round Up to Power of Two; Calculate Dimensions of Alternate Layouts; a User Size Constraint? decision leading to Find Best-Matching Layout or User Selects Layout; User Selects Corner Turning Option; User Selects Crossbar Option; User Selects Latch Option; and Generate Array. Example option sets shown include Corner Turning=No/Crossbars=4x4/Latches=No and Corner Turning=Yes/Crossbars=32x32/Latches=No, with a Minimum Area? decision.]
`
`
`
[Sheet 12 of 13: FIG. 13B; FIG. 14A, a column-based layout with cells labeled A1 through P4.]
`
`
`
[Sheet 13 of 13: FIG. 14B, the same topological network in a tree-based layout; FIG. 14C, the tree-based layout with wirelengths between switch levels minimized.]
`
`
`
`1
`INTERCONNECTION NETWORK FOR A
`FIELD PROGRAMMABLE GATE ARRAY
`
`US 6,940,308 B2
`
`CROSS-REFERENCES TO RELATED
`APPLICATIONS
`This patent application claims priority from Provisional
`Patent Application No. 60/223,047, filed Aug. 4, 2000, and
`is a continuation of U.S. patent application Ser. No. 09/923,
`294, filed Aug. 3, 2001 is now a U.S. Pat. No. 6,693.456, all
`of which are hereby incorporated by reference.
`
`15
`
`25
`
BACKGROUND OF THE INVENTION

The present invention relates to integrated circuit interconnections and, in particular, to the interconnection architecture of FPGA (Field Programmable Gate Array) integrated circuits.

FPGAs are integrated circuits whose functionalities are designated by the users of the FPGA. The user programs the FPGA (hence the term "field programmable") to perform the functions desired by the user.

A very significant portion of an FPGA's design is the integrated circuit's interconnection network between the logic cells or blocks, which perform the functions of the FPGA. Heretofore, the practice for designing an FPGA interconnection architecture has been empirical and on an ad hoc basis. The goal of the FPGA designer has been to create an interconnect structure which is sufficiently flexible to implement the required wiring for any circuit design intended for the FPGA, and yet occupies a minimal amount of area of the integrated circuit with a minimal amount of transmission delay. In today's FPGA products, the interconnect network typically occupies about 90% of the chip area and the actual logic cells occupy only about 5% of the chip. In other words, most of the area of the integrated circuit is not dedicated to the circuits performing the desired functions of the FPGA, but rather to the interconnections between those circuits.
Furthermore, the current practice for designing FPGA interconnects is empirical and on an ad hoc basis. The users of these FPGA products spend most of their design time trying to make their circuits route to obtain the desired functions and to meet the timing constraints. The rule of thumb is to utilize only 50% of the available logic cells in order to guarantee they can all be routed through the interconnect network. If the timing constraints are relatively high speed, then the rule of thumb is to utilize only 33% of the logic cells in order to avoid the need for detours and longer delays in the routing.

Hence, there is a need for an FPGA interconnection network architecture by which routing through the resulting interconnect network is guaranteed and the timing constraints of the interconnect network are predictable. The present invention provides for such an interconnection network.
`
`45
`
`50
`
`55
`
SUMMARY OF THE INVENTION

The present invention provides for an integrated circuit having a plurality of logic cells and a programmable network interconnecting the logic cells. The programmable interconnection network has a plurality of interconnection network input terminals and a plurality of programmable switches, each programmable switch having a plurality of input terminals and output terminals, with the programmable switch arranged so that signals on any input terminal are passed to any output terminal. The plurality of programmable switches interconnecting the plurality of interconnection network input terminals to the interconnection network output terminals are arranged in a Benes network so that connections between the interconnection network input terminals and interconnection network output terminals are rearrangeable.

The plurality of programmable switches are arranged in hierarchical levels, with a first level of the programmable switches having input terminals connected to the interconnection network input terminals and a last level of the programmable switches having output terminals connected to the interconnection network output terminals. The levels of the programmable switches intermediate the first and last levels are arranged in a plurality of first rank sub-interconnection networks equal in number to the number of switch output terminals. Each first rank sub-interconnection network is connected to an output terminal of each programmable switch in the first level and connected to an input terminal of each programmable switch in the last level. In a similar arrangement, the first rank sub-interconnection networks themselves are formed from second rank sub-interconnection networks, and so forth.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of the interconnection architecture of a current SRAM-based FPGA;

FIG. 2A is a diagram of one operation of a 2x2 switch for a Benes network; FIG. 2B is a diagram of another operation of a 2x2 switch for a Benes network; FIG. 2C is a block diagram of the elements of a 2x2 switch;

FIG. 3A illustrates the organization of an 8x8 Benes network with 2x2 switches in accordance with one embodiment of the present invention; FIG. 3B illustrates the interconnection between the switches at the first rank of hierarchy; FIG. 3C illustrates the interconnection between the switches at the next lower rank of hierarchy; FIG. 3D shows the complete interconnection of the 8x8 Benes network; FIG. 3E is an example of a permutation of connections in the 8x8 Benes network to reverse the order of input signals at the output terminals of the network;

FIG. 4A shows how the 8x8 Benes network is folded for an FPGA interconnection network in accordance with an embodiment of the present invention; FIG. 4B shows the resulting folded Benes network; FIG. 4C illustrates the FIG. 4B folded network in which the interconnections have been inverted by level;

FIG. 5 shows two exemplary logic cells connected to a combined switch of the FIG. 4C network in which the combined switch provides for corner turn routing in accordance with an embodiment of the present invention;

FIG. 6A illustrates the four elementary states of the combined switch; FIG. 6B illustrates 10 additional states of an enhanced combined switch for corner turn routing in accordance with the present invention;

FIG. 7 is a block diagram of the enhanced combined switch described with respect to FIGS. 6A and 6B;

FIG. 8 illustrates the four states with the input stage of the combined switch having fanout capability;

FIG. 9 illustrates an exemplary arrangement to create fanout functions with an FPGA interconnect network in accordance with an embodiment of the present invention;
FIG. 10 shows a pair of enhanced switches with timing latches in accordance with an embodiment of the present invention;

FIG. 11A is a block diagram of an exemplary circuit pipeline with mismatched delay paths; FIG. 11B is a block diagram of a modified FIG. 11A circuit pipeline with the delay paths corrected in accordance with an embodiment of the present invention;

FIG. 12 is a flow chart of a software generator for an FPGA, in accordance with an embodiment of the present invention;

FIG. 13A illustrates an exemplary column-based floorplan layout of an FPGA according to an embodiment of the present invention; FIG. 13B illustrates how two of the FIG. 13A columns are interconnected;

FIG. 14A shows the column-based layout of FIG. 13B with all the cells labeled; FIG. 14B illustrates the same topological network arranged in a tree-based layout in accordance with the present invention; and FIG. 14C shows a modification of the FIG. 14B tree-based layout with wirelengths between switch levels minimized.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Current SRAM (Static Random Access Memory)-based FPGA products conform to the interconnect architecture illustrated in FIG. 1. The basic structure of FIG. 1 has logic cells 10, which implement the desired circuit logic of the user; connection cells 11, which connect logic cells 10 to the interconnect network; and switch cells 12, which implement the interconnect network. Additional connections are made between a switch cell 12 and its four neighboring switch cells 12 in the north, east, west, and south directions. The switch cells 12, connection cells 11, and all their wires and connections constitute the interconnect network of the FPGA. This basic unit is arrayed to build FPGAs of varying sizes.

The flexibility of this architecture lies within the connection cell 11 and the switch cell 12. In common terminology, a fully "populated" connection cell 11 will connect each pin of the logic cell 10 to every wire connecting to the switch cell 12. A "depopulated" connection cell 11 will connect each pin of the logic cell to a subset of the wires connecting to the switch cell 12, with each pin connecting to a different, possibly overlapping, subset of wires. Similarly, a fully "populated" switch cell will provide full crossbar connections between all the wires on all four of its sides, and a "depopulated" switch cell will only provide a subset of these connections. Lastly, the set of wires between any two cells is called a "channel," and the number of wires in a channel can be varied.
Each possible connection in the FPGA interconnect network requires its own pass gate and controlling configuration bit. A fully populated interconnect network is prohibitively expensive to implement, and the current practice has been to build a parameterized software model that can represent varying depopulated interconnect networks. Various representative logic designs are then tried on the modeled networks. Based on this empirical data, a judgment must be made about what constitutes an "acceptable" interconnect network in terms of routability versus implementation cost. This is an ad hoc process since there are no theoretical guarantees of routability, i.e., that the desired interconnections can actually be made.
A further complication in the above empirical process has been that the demands on the interconnect network do not scale linearly with the number of logic cells in the array. In other words, an interconnect network that seems to route most designs on an array with 1K logic cells cannot simply be replicated for a 64K logic cell array. As seen empirically, the routing demands grow exponentially, but these demands are highly dependent on the exact algorithms used to implement the design. Specifically, it depends on the algorithms used to map the original circuit design onto the logic cells, to place the logic cells on the array, and to route (connect) the logic cells to each other. There is currently no precise theoretical model of this growth in wiring demand, although current practice has been to approximate the wiring demand with stochastic models. The use of these models entails some assumptions for certain coefficients, which are based on empirical data, and so current practice is still an ad hoc process.
In contrast, the present invention provides for an FPGA interconnection network architecture which creates interconnection networks that are "rearrangeable," i.e., any permutation of interconnections from the network's input terminals to the output terminals can be implemented. The resulting FPGA network interconnect has guaranteed routing with defined maximum timing delays and is scalable.

The present invention uses the so-called Benes network, which has been the subject of research in the telecommunications field, specifically for switching networks. Generally described, a Benes network interconnects a number of network input terminals to a number of network output terminals. Between the input and output terminals are switches, each switch itself having a number of input terminals and a number of output terminals and the ability to pass signals on any input terminal to any output terminal. The switches are connected in hierarchical levels, with a first level of switches having input terminals connected to the network input terminals and a last level of switches having output terminals connected to the network output terminals. The levels of the switches intermediate the first and last levels are arranged in a plurality of first rank sub-interconnection networks equal in number to the number of switch input (and output) terminals, each first rank sub-interconnection network connected to an output terminal of each switch in the first level and connected to an input terminal of each switch in the last level. The first rank sub-interconnection networks are formed by second level switches having input terminals connected to the output terminals of the first level switches, and second-to-last level switches having output terminals connected to the input terminals of the last level of switches. The levels of switches intermediate the second and second-to-last levels are arranged in a plurality of second rank sub-interconnection networks equal in number to the number of switch output terminals, with each second rank sub-interconnection network connected to an output terminal of each second level switch and connected to an input terminal of each second-to-last level switch.
A switch level hierarchy is formed because each rank of sub-interconnection network is formed like the rank of sub-interconnection network above it. That is, each rank of sub-interconnection network is formed by a plurality of switches in one level, the switches having input terminals connected to output terminals of switches of the sub-interconnection network rank immediately higher; a corresponding level of switches having output terminals connected to input terminals of switches of the sub-interconnection network rank immediately higher; and the levels of the switches intermediate the one and corresponding levels arranged in a plurality of lower rank sub-interconnection networks equal in number to the number of switch output terminals, each lower rank sub-interconnection network connected to an output terminal of each switch in the one level and connected to an input terminal of each switch in the corresponding level. This defines the hierarchical level arrangement of the switches.

The particular Benes network described immediately below explains the switch hierarchy with specificity. This
`
`network is also useful to implement an FPGA according to
`the present invention.
Benes Network With 2x2 Switches

The building block of the described Benes network is the 2x2 (2 input, 2 output) switch 20, having operations illustrated in FIGS. 2A and 2B. The 2x2 Benes switch 20 has two possible configuration modes: pass and cross. In pass mode, illustrated by FIG. 2A, a signal on input A is passed straight to output C, and a signal on input B is passed straight to output D. In cross mode, illustrated by FIG. 2B, a signal on input A crosses over to output D and a signal on input B crosses over to output C. A single configuration or control bit can control these two modes.

The switching itself can be implemented with two 2:1 multiplexers, or MUXs, as shown by FIG. 2C. The switch 20 has two MUXs 21 and 22, each having two input nodes connected to the input terminals, input A and input B, of the switch 20. The output node of the MUX 21 forms the output terminal, output C, and the output node of the MUX 22 forms the output terminal, output D, of the switch 20. Both MUXs 21 and 22 are connected to a control line 23 which carries the configuration or control bit. The entire switch cell requires only 18 transistors in a CMOS (Complementary Metal-Oxide-Semiconductor) implementation of an integrated circuit.
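The pass/cross behavior of the single-configuration-bit switch can be sketched as a behavioral model in Python (the names `mux2` and `switch_2x2` are illustrative, not from the patent):

```python
def mux2(a, b, sel):
    # 2:1 multiplexer: output follows input a when sel is 0, input b when sel is 1
    return b if sel else a

def switch_2x2(in_a, in_b, cross):
    # One shared configuration bit drives both MUXs:
    #   cross = 0 -> pass  (A -> C, B -> D)
    #   cross = 1 -> cross (A -> D, B -> C)
    out_c = mux2(in_a, in_b, cross)
    out_d = mux2(in_b, in_a, cross)
    return out_c, out_d
```

Note how a single control value selects between the only two states of the switch, matching the one-configuration-bit design.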
These 2x2 switches are connected in a specific topology to build a Benes network. For the purpose of illustration, the arrangement of the 2x2 switches 20 in an 8x8 Benes network is shown in FIG. 3A. For a network with N inputs and N outputs, N being a power of 2, there are (2*log2(N) - 1) levels of switches, each level consisting of N/2 switches. In this example of an 8x8 network, each level has 4 switch cells and there are 5 levels. The interconnection between the switches 20 can best be understood by viewing the network in a hierarchical arrangement, starting from the outside and proceeding inwards. We can view the two outermost levels, levels 1 and 5, in detail and view the inner levels as hierarchical blocks, as illustrated by FIG. 3B. The inner levels can be viewed as two hierarchical blocks, an Upper Network 25 and a Lower Network 26. In level 1, each switch cell 20 has one output going to the Upper Network 25 and one output going to the Lower Network 26. Similarly, in level 5, each switch cell has one input coming from the Upper Network 25 and one input coming from the Lower Network 26.
At the next level of the hierarchy of the Benes network, the details of the Upper Network 25 and the Lower Network 26 are expanded in FIG. 3C. The Upper Network 25 is formed by switches 20 in levels 2 and 4, and its Upper and Lower Networks 27 and 28, respectively. The Lower Network 26 is formed by switches 20 in levels 2 and 4, and its own Upper and Lower Networks 29 and 30, respectively. Each of the networks 27-30 is half the size of the higher level networks 25 and 26 and is similarly decomposed into its own Upper and Lower Networks. In this example of an 8x8 network, the bottom of the hierarchy has been reached, since the lower level networks 27-30 are switches 20 in level 3. For larger networks, a similar decomposition into Upper and Lower Networks may be performed until the bottom of the hierarchy is reached. The complete interconnection of the constituent switches 20 in the 8x8 Benes network is illustrated by FIG. 3D.
The Benes network of FIG. 3D is not yet configured; only the hard-wired connections between the switch cells 20 are illustrated. This network can potentially implement any permutation of signals on input terminals to output terminals. In order to configure the network to implement a specific routing, each switch cell 20 must be individually configured in either the "pass" or "cross" mode described previously. The example of FIG. 3E shows the configuration of the network to implement an order reversal from the inputs to the outputs.
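Rearrangeability can be checked exhaustively on a small instance. The sketch below wires up a 4x4 Benes network (3 levels of two 2x2 switches, with the single-switch Upper and Lower sub-networks in the middle level) following the recursive construction described above, and verifies that its 64 switch configurations cover all 24 input-to-output permutations. All names are illustrative:

```python
from itertools import product, permutations

def sw(a, b, cross):
    # 2x2 Benes switch: pass (cross=0) or cross (cross=1)
    return (b, a) if cross else (a, b)

def benes4(inp, bits):
    # 4x4 Benes network: each level-1 switch sends one output to the
    # Upper Network and one to the Lower Network; each level-3 switch
    # takes one input from each.
    s11, s12, up, lo, s31, s32 = bits
    a, b = sw(inp[0], inp[1], s11)   # level 1, row 1
    c, d = sw(inp[2], inp[3], s12)   # level 1, row 2
    u0, u1 = sw(a, c, up)            # level 2, Upper Network
    l0, l1 = sw(b, d, lo)            # level 2, Lower Network
    o0, o1 = sw(u0, l0, s31)         # level 3, row 1
    o2, o3 = sw(u1, l1, s32)         # level 3, row 2
    return (o0, o1, o2, o3)

realized = {benes4((0, 1, 2, 3), bits) for bits in product((0, 1), repeat=6)}
# every permutation of the four inputs is realizable: rearrangeable
assert all(p in realized for p in permutations((0, 1, 2, 3)))
```

The order-reversal configuration of FIG. 3E is one of the realizable permutations found here in miniature.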
Note that there are many variations of the Benes network. The hierarchical sub-division into Upper and Lower networks can be generalized to more than 2 sub-networks, so networks of size p^n, p > 2, can be constructed. Also, the sub-division does not require that the sub-networks be of equal size. This generalized construction leads to overall Benes networks with arbitrary numbers of inputs and a proportional number of switch cells. All variants are simply referred to as Benes networks.
The Benes network is a very powerful and efficient interconnection network with guaranteed routability. Its use has not been more widespread because of the complexity of the algorithm required to determine the appropriate configuration of the switches for a specific routing. The Benes network is "rearrangeable," but not "non-blocking." Non-blocking means that any input-to-output connection can always be made, even if there are already existing connections on the network. Rearrangeable is less powerful and means that any input-to-output connection can be made, but some existing connections may need to be rerouted. In the dynamic worlds of telephone switching and data communication networks, a Benes network would require that a routing algorithm be performed every time a new connection is requested. A Benes routing algorithm requires time O(N*log N), but the network itself transmits data in time O(log N). It takes longer to reconfigure the network than to actually transmit the data through the network. Hence, current practice in data communications has been to use more expensive non-blocking switches.
However, the present invention recognizes that in the FPGA world, routing is not so dynamic. There is no real-time set up and tear down of fleeting connections. Instead, in an offline process, a circuit design is mapped onto the FPGA integrated circuit once and the resulting interconnect configuration is used without change. Even in the application of FPGA technology to the burgeoning field of "reconfigurable logic," multiple configurations may be rapidly swapped in and out of the FPGA, but each configuration itself is never changed. Presently, the offline routing process in an FPGA requires on the order of minutes or even hours of execution time. In contrast, the execution of a Benes routing algorithm in accordance with the present invention requires on the order of 10 seconds (which would be completely unacceptable in a data communications network). This time is spectacularly fast and routability is guaranteed.
Specific Implementation of Benes Network in FPGAs

There are a number of ways that the Benes network may be adapted to make it more efficient as an interconnection network for an FPGA or MPGA (Mask Programmable Gate Array). In an FPGA, the logic is composed from building blocks called "logic cells," and the logic cells contain both input and output pins. An example of a typical logic cell is a 2-input NAND gate. So an FPGA interconnection network should have neighboring "leaf cells" which correspond to the logic cells and which contain both inputs and outputs to make the connections to the logic cells. This can be accommodated by "folding in half" the original Benes network, and combining the switch cells 20 from the first level and last level, the second level and second-to-last level, and so on. This is illustrated in FIG. 4A with the 8x8 network example. The "folding" is made along the dotted line 31 which runs through the switches 20 in level 3. The switches 20 are labeled by location in the network to maintain identification through the folding operation. The first number in the label identifies the level of the switch and the second number its row location. Hence the switch with label "4.3" is in level 4 and row 3.
The resulting folded network is illustrated by FIG. 4B. The switches 20 are combined in pairs, with the formerly level 3 switches duplicated for uniformity. While the combined switches 32 represent a topological change of the 8x8 Benes network, it should be noted that the connections between the cells 20 remain the same. The combined input and output switch cells 32 on the left of the folded network, e.g., combined switches 1.2 and 5.2, form the leaf cells for the connection to the pins of the FPGA logic cells.

From the connections between the combined switches 32, the network of FIG. 4B can be turned "inside out," that is, the innermost levels of the combined switches 32 become the outermost and vice versa, as illustrated in FIG. 4C, without affecting the routability of the interconnection network. The levels with shorter connections are moved to be closer to the logic cells. This is more suitable for an FPGA.
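Which levels share a combined switch cell after the fold can be computed directly. In this sketch (`folded_levels` is an illustrative name), level i pairs with level (2k - i), and the middle level k is duplicated:

```python
import math

def folded_levels(n):
    # For an n-input Benes network (n a power of two) with 2k-1 levels,
    # folding pairs level i with level (2k - i); the middle level k
    # is duplicated for uniformity.
    k = int(math.log2(n))
    levels = 2 * k - 1
    pairs = [(i, levels + 1 - i) for i in range(1, k)]
    return pairs, k
```

For the 8x8 example this pairs levels 1 and 5 and levels 2 and 4 (e.g., combined switches 1.2 and 5.2), with level 3 duplicated, as in FIGS. 4A and 4B.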
Corner Turning for Interconnection Network

With inputs and outputs combined into a single switch cell 32, shorter routes between logic cells which don't travel through all 2*log2(N) levels of switches can be configured. In the original Benes network, every route must travel through all the levels to go from input to output. In the adapted interconnection network, signals from the logic can "turn the corner" before reaching the opposite side of the network. For example, in FIG. 5, logic cell 41 has an output pin that must be routed to an input pin on logic cell 42.
Of course, the particular advantage of corner turning in the interconnection network depends on the quality of the logic cell placement algorithm for the FPGA. (Note that "placement" for an FPGA logic cell is not the physical placement of selected logic gates to form a desired function, but rather the programming of a selected logic cell to perform the desired function.) The algorithm is designed to minimize the distance between connected logic cells, where distance is not defined as it usually is for an FPGA or MPGA. The usual definition of distance in a placement algorithm is either Euclidean or Manhattan. In the present interconnection network, distance is defined as the depth of the first common ancestor in the network, because a corner can be turned at this point. The most appropriate placement algorithms build cluster trees with capacity constraints, either top-down or bottom-up. Nonetheless, regardless of the quality of logic cell placement, the present invention still provides the original worst case bound of 2*log2(N) switches, no matter how highly the network is utilized. In contrast, current FPGA products cannot guarantee a worst case bound on signal delay when the integrated circuit is highly utilized.
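For leaf cells numbered left to right, the depth of the first common ancestor in a binary hierarchy falls out of the binary representation of the leaf indices; a sketch (the names and the XOR trick are illustrative, not from the patent):

```python
def ancestor_depth(a, b, k):
    # Depth of the first common ancestor of leaves a and b in a binary
    # tree with 2**k leaves (root at depth 0, leaves at depth k).
    diff = a ^ b
    return k if diff == 0 else k - diff.bit_length()

def route_levels(a, b, k):
    # Switch levels traversed when turning the corner at the first
    # common ancestor: up to the ancestor and back down.
    return 2 * (k - ancestor_depth(a, b, k))
```

Two leaves on opposite halves of the network meet only at the root, recovering the worst case of 2*log2(N) switch levels quoted above; neighboring leaves turn the corner after a single level.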
Enhanced Switch for FPGAs

Corner turning requires that the original Benes switch be enhanced. It should be noted that the original switch had 2 states responsive to 1 configuration bit. See the description above with respect to FIGS. 2A and 2B. Just by combining the input and output switch cells, the combined switch 32 has 4 states and requires 2 configuration bits. FIG. 6A illustrates the four permutations of passing signals from the input terminals to the output terminals for the combined switch 32.
The corner turning feature adds 5 more states for the "output" lower half of the combined switch 32. When multiplied by the 2 states for the "input" upper half, there are a total of 10 new states for the combined switch 32. These additional 10 states are illustrated by FIG. 6B. It should be noted that for the corner turning states shown, there are only two possible paths to turn a corner: each of the two possible corner turning outputs can only be connected to one of the inputs, not both inputs. The unconnected input comes from the same switch as the one the output is going to. While there may be some possible use for this connection in terms of selectively adding variable delays to certain routes, the cost of implementing additional configuration bits in all combined switch cells to support these paths is unjustified.
Of course, the increased number of states for the combined switch cannot be satisfied by the two-MUX structure of FIG. 2C. FIG. 7 is a block diagram of the combined switch cell 32, which is formed by MUXs 61-66 which operate by the setting of configuration bits on five control line nodes 71-75. MUXs 61 and 62 each have terminal nodes connected to inputs A and B; control line node 71 is connected to MUX 61 and control line node 72 is connected to MUX 62. The output nodes of MUXs 61 and 62 form the outputs of the combined switch 32. MUXs 63 and 65 are connected so that the output