`Gate Arrays
`
JONATHAN ROSE, MEMBER, IEEE, ABBAS EL GAMAL, SENIOR MEMBER, IEEE, AND
ALBERTO SANGIOVANNI-VINCENTELLI, FELLOW, IEEE
`
`Invited Paper
`
A survey of Field-Programmable Gate Array (FPGA) architectures
and the programming technologies used to customize them is
presented. Programming technologies are compared on the basis of
their volatility, size, parasitic capacitance, resistance, and process
technology complexity. FPGA architectures are divided into two
constituents: logic block architectures and routing architectures.
A classification of logic blocks based on their granularity is
proposed and several logic blocks used in commercially available
FPGA’s are described. A brief review of recent results on the effect
of logic block granularity on logic density and performance of an
FPGA is then presented. Several commercial routing architectures
are described in the context of a general routing architecture
model. Finally, recent results on the tradeoff between the flexibility
of an FPGA routing architecture, its routability, and its density are
reviewed.
`
I. INTRODUCTION
The architecture of a field-programmable gate array
(FPGA), as illustrated in Fig. 1, is similar to that of
a mask-programmable gate array (MPGA), consisting of
an array of logic blocks that can be programmably
interconnected to realize different designs. The major
difference between FPGA’s and MPGA’s is that an MPGA
is programmed using integrated circuit fabrication to form
metal interconnections, while an FPGA is programmed
via electrically programmable switches, much the same as
traditional programmable logic devices (PLD’s). FPGA’s
can achieve much higher levels of integration than PLD’s,
however, due to their more complex routing architectures
and logic implementations. PLD routing architectures are
very simple but highly inefficient crossbar-like structures
in which every output is directly connectable to every
`
Manuscript received October 1, 1992. The work of the second author
was partially supported under contract J-FBI-89-101.
J. Rose is with the Department of Electrical Engineering, University of
Toronto, 10 King’s College Road, Toronto, Ontario M5S 1A4, Canada.
A. El Gamal is with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305.
A. Sangiovanni-Vincentelli is with the Department of Electrical Engineering
and Computer Science, University of California, Berkeley, CA
94720.
IEEE Log Number 9210745.
`
`Fig. 1. FPGA architecture
`
input through one switch. FPGA routing architectures
provide a more efficient MPGA-like routing where each
connection typically passes through several switches. In a
PLD, logic is implemented using predominantly two-level
AND-OR logic with wide-input AND gates. In an FPGA,
logic is implemented using multiple levels of lower fanin
gates, which is often much more compact than two-level
implementations.
An FPGA logic block can be as simple as a transistor or
as complex as a microprocessor. It is typically capable of
implementing many different combinational and sequential
logic functions. Current commercial FPGA’s employ logic
blocks that are based on one or more of the following:
`Transistor pairs.
Basic small gates such as two-input NAND’s or exclusive-OR’s.
`Multiplexers.
`
PROCEEDINGS OF THE IEEE, VOL. 81, NO. 7, JULY 1993

0018-9219/93$03.00 © 1993 IEEE
`
`Authorized licensed use limited to: IEEE Publications Operations Staff. Downloaded on February 28,2023 at 16:40:45 UTC from IEEE Xplore. Restrictions apply.
`
`Intel Exhibit 1017
`Intel v. Iida
`
`
`
`Look-up tables (LUT’s).
`Wide-fanin AND-OR structures.
`The routing architecture of an FPGA could be as simple
`as a nearest neighbor mesh [9] or as complex as the perfect
`shuffle used in multiprocessors [42]. More typically, an
`FPGA routing architecture incorporates wire segments of
`varying lengths which can be interconnected via electrically
`programmable switches. The choice of the number of wire
`segments incorporated affects the density achieved by an
FPGA. If an inadequate number of segments is used, only a
small fraction of the logic blocks can be utilized, resulting
in poor FPGA density; conversely, the use of an excess
number of segments that go unused also wastes area.
The distribution of the lengths of the wire segments
also greatly affects the density and performance achieved
by an FPGA. For example, if all segments are chosen to
be long, implementing local interconnections becomes too
costly in area and delay. On the other hand, if all segments
are short, long interconnections are implemented using too
many switches in series, resulting in unacceptably large
delays.
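
As a rough illustration of this tradeoff, the sketch below counts the programmable switches that a connection passes through when a routing channel is built from wire segments of a single uniform length. The model is an assumption made only for illustration (uniform segments, one switch at each segment-to-segment junction, lengths measured in logic-block pitches), and the function names are hypothetical; it does not describe any particular commercial architecture.

```python
import math


def segments_needed(distance: int, segment_length: int) -> int:
    """Number of uniform segments needed to span `distance` block pitches."""
    if distance <= 0 or segment_length <= 0:
        raise ValueError("distance and segment_length must be positive")
    return math.ceil(distance / segment_length)


def switches_in_series(distance: int, segment_length: int) -> int:
    """Switches joining consecutive segments (assumed: one per junction)."""
    return segments_needed(distance, segment_length) - 1


def wasted_length(distance: int, segment_length: int) -> int:
    """Wire length allocated beyond what the connection actually needs."""
    return segments_needed(distance, segment_length) * segment_length - distance


if __name__ == "__main__":
    # Compare short, medium, and long segments for a local and a long connection.
    for seg in (1, 4, 16):
        for dist in (1, 16):
            print(f"segment={seg:2d} distance={dist:2d} "
                  f"switches={switches_in_series(dist, seg):2d} "
                  f"wasted={wasted_length(dist, seg):2d}")
```

Under these assumptions, short segments make a long connection pass through many switches, while long segments leave most of their length unused on a local connection, which is the tension the text describes.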
Several different programming technologies are used to
implement the programmable switches. Three types of
programmable switch technology are currently in use:
SRAM, where the switch is a pass transistor controlled
by the state of an SRAM bit,
Antifuse, which, when electrically programmed, forms
a low resistance path, and
EPROM, where the switch is a floating-gate transistor
that can be turned off by injecting charge onto its
floating gate.
`In all cases, a programmable switch occupies larger
`area and exhibits much higher parasitic resistance and
`capacitance than a typical contact or via used in the
`customization of an MPGA. Additional area is also required
`for programming circuitry. As a result the density and
`performance achievable by today’s FPGA’s are an order
`of magnitude lower than that for MPGA’s manufactured in
`the same technology.
`The adverse effects of the large size and relatively high
`parasitics of programmable switches can be reduced by
`careful architectural choices. By choosing the appropriate
`granularity and functionality of the logic block, and by
`designing the routing architecture to achieve a high degree
`of routability while minimizing the number of switches,
`both density and performance can be optimized. The best
`architectural choices, however, are highly dependent on the
`programming technology used as well as on the type of
`designs implemented, so that no one architecture is likely
`to be best suited for all programming technologies and for
`all designs.
`The complexity of FPGA’s has surpassed the point where
`manual design is either desirable or feasible. Consequently,
`the utility of an FPGA architecture is highly dependent
`on effective automated logic and layout synthesis tools to
`support it. A complex logic block may be underutilized
`
`without an effective logic synthesis tool, and the overall
`utilization of an FPGA may be low without an effective
`placement and routing tool.
Commercial FPGA’s differ in the type of programming
technology used, in the architecture of the logic block,
and in the structure of their routing architecture. In this
paper we survey the architectures of commercially available
FPGA’s and discuss the dependence of FPGA density
and performance on these factors. The paper is organized
as follows: Section II describes the most widely used
programming technologies. Section III presents a survey
of commercial FPGA logic block architectures, classified
by their granularity. This includes a summary of recent
research results concerning the effect of granularity on overall
FPGA density and performance. Section IV describes
several commercial routing architectures in the context of a
general routing architecture model, and summarizes recent
research results in this area. Section V concludes with a
discussion of potential future architectural directions for
FPGA’s.
`
II. PROGRAMMING TECHNOLOGIES
An FPGA is programmed using electrically programmable
switches. The properties of these programmable
switches, such as size, on-resistance, and capacitance,
dictate many of the tradeoffs in FPGA architecture. In this
section we describe the most commonly used programmable
switch technologies, and at the end we compare the
technologies with respect to volatility, re-programmability,
size, series on-resistance, parasitic capacitance, and process
technology complexity.
`
`A. SRAM Programming Technology
The SRAM programming technology uses static RAM
cells to control pass gates or multiplexers, as illustrated in
Fig. 2. It is used in the devices from Xilinx [23], Plessey
[33], Algotronix [2], Concurrent Logic [13], and Toshiba
[32].
`When a one is stored in the SRAM cell in Fig. 2(a),
`the pass gate acts as a closed switch, and can be used to
`make a connection between two wire segments. When a
`zero is stored, the switch is open and the transistor presents
`a high resistance between the two wire segments. For the
multiplexer, the state of the SRAM cells connected to the
select lines controls which one of the multiplexer inputs is
connected to the output, as shown in Fig. 2(b).
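
The behavior of the two SRAM-controlled structures can be sketched in a few lines. This is a purely logical model written for illustration: the function names are hypothetical, an open switch is represented by `None` (high impedance), and the convention that the first select bit is the most significant is an assumption, not something stated in the text.

```python
from typing import Optional, Sequence


def pass_gate(sram_bit: int, signal: int) -> Optional[int]:
    """Pass the signal when the SRAM cell stores 1; otherwise the switch is open."""
    return signal if sram_bit == 1 else None


def sram_mux(select_bits: Sequence[int], inputs: Sequence[int]) -> int:
    """Connect one input to the output, chosen by the SRAM cells on the select lines."""
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("a mux with n select bits needs 2**n inputs")
    index = 0
    for bit in select_bits:  # assumed ordering: most significant bit first
        index = (index << 1) | bit
    return inputs[index]


if __name__ == "__main__":
    print(pass_gate(1, 0), pass_gate(0, 1))
    print(sram_mux([1, 0], [0, 0, 1, 0]))
```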
Since SRAM is volatile, the FPGA must be loaded and
configured at the time of chip power-up. This requires
external permanent memory, such as PROM, EPROM, EEPROM,
or magnetic disk, to provide the programming bits.
A major disadvantage of SRAM programming technology
is its large area. It takes at least five transistors to
implement an SRAM cell, plus at least one transistor
to serve as a programmable switch. However, SRAM
programming technology has two major advantages: fast
re-programmability, and the fact that it requires only
standard integrated circuit process technology.
`
Fig. 2. Static RAM programming technology.
`
`B. Antifuse Programming Technology
An antifuse is a two-terminal device that, in its
unprogrammed state, presents a very high resistance between
its terminals. When a high voltage (from 11 to 20 volts,
depending on the type of antifuse) is applied across its
terminals, the antifuse will “blow” and create a
low-resistance link. This link is permanent. Antifuses in use
today are built either using an oxide-nitride-oxide
(ONO) dielectric between N+ diffusion and polysilicon
[19], [15], [1] or amorphous silicon between metal layers
[6] or between polysilicon and the first layer of metal [31].
Programming an antifuse requires extra circuitry to deliver
the high programming voltage and a relatively high
current of 5 mA or more. This is done in [15] through
fairly sizable pass transistors that provide addressing to each
antifuse. An associated paper in this issue discusses the
programming of antifuse structures in more detail [18].
Antifuse technology is used in the FPGA’s from Actel [15],
[1], QuickLogic [6], and Crosspoint [31].
A major advantage of the antifuse is its small size,
little more than the cross-section of two metal wires. This
advantage is somewhat reduced by the large size of the
necessary programming transistors, which must be able
to handle large currents, and the inclusion of isolation
transistors that are sometimes needed to protect low voltage
transistors from high programming voltages. A second
major advantage of an antifuse is its relatively low series
resistance. The on-resistance of the ONO antifuse is 300 to
500 ohms [19], while that of the amorphous silicon antifuse
is 50 to 100 ohms [6], [31]. Additionally, the parasitic capacitance
of an unprogrammed amorphous antifuse is significantly
lower than for other programming technologies.
`
`C. Floating Gate Programming Technology
`The floating gate programming technology uses technol-
`ogy found in ultraviolet erasable EPROM and electrically
`erasable EEPROM devices. The EPROM-based approach
`is used in devices from Altera [43] and Plus Logic [34].
The programmable switch, illustrated in Fig. 3, is a
transistor that can be permanently “disabled.” This is
accomplished by injecting a charge on the floating gate (gate
2 in the figure) using a high voltage between the control
gate (gate 1) and the drain of the transistor. This charge increases
the threshold voltage of the transistor so that it turns off.
The charge is removed by exposing the floating gate to UV
light. This lowers the threshold voltage of the transistor and
makes the transistor function normally.

Fig. 3. Floating gate programming technology.

ROSE et al.: ARCHITECTURE OF GATE ARRAYS

`Rather than using an EPROM transistor directly as a
`programmable switch, the unprogrammed transistor is used
`to pull down a “bit line” when the “word line” is set high,
`as illustrated in Fig. 3. While this approach can be simply
`used to make a connection between the word and bit lines,
`it can also be used to implement a wired-AND style of
`logic, thereby providing both logic and routing.
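
The wired-AND behavior described above can be sketched with a simple logical model. The sketch assumes a pulled-up bit line that any conducting transistor can pull low, and it assumes that the word lines carry the complements of the signals to be ANDed; both the function names and that polarity convention are illustrative assumptions rather than details given in the text.

```python
from typing import Sequence


def bit_line(word_lines: Sequence[int], enabled: Sequence[bool]) -> int:
    """Value of a pulled-up bit line crossed by several EPROM transistors.

    enabled[i] is True when transistor i is left unprogrammed (it can still
    conduct) and False when it has been disabled by charging its floating gate.
    """
    if len(word_lines) != len(enabled):
        raise ValueError("one enable flag is needed per word line")
    for word, on in zip(word_lines, enabled):
        if on and word == 1:
            return 0  # any conducting transistor pulls the bit line low
    return 1  # otherwise the pull-up keeps the bit line high


def wired_and(signals: Sequence[int], enabled: Sequence[bool]) -> int:
    """AND of the selected signals, assuming word lines carry their complements."""
    return bit_line([1 - s for s in signals], enabled)


if __name__ == "__main__":
    # The third signal is disconnected, so the result is the AND of the first two.
    print(wired_and([1, 1, 0], [True, True, False]))
```

Disabling a transistor removes its signal from the product, which is how the same array can serve as both logic and routing.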
As with the SRAM programming technology, a
major advantage of the EPROM technology is its
re-programmability. An advantage over SRAM, though, is
that no external permanent memory is needed to program
the chip on power-up. EPROM technology, however,
requires three additional processing steps over an ordinary
CMOS process. Two other disadvantages are the high
on-resistance of an EPROM transistor (about twice that of a
similarly sized NMOS pass transistor) and the high static
power consumption due to the pull-up resistor used (see
Fig. 3).
`The EEPROM-based programming technology is used in
`the devices from AMD [3] and Lattice [4]. It is similar
`to the EPROM approach, except that removal of the gate
`charge can be done electrically, in-circuit, without UV light.
`This gives the added advantage of easy reprogrammability,
`which can be very helpful in some applications such as
`hardware updates to equipment in remote locations. An
`EEPROM cell, however, is roughly twice the size of an
`EPROM cell.
`
D. Summary of Programming Technologies
Table 1 lists the properties of each programming technology.
All data assumes a 1.2 µm CMOS process technology.
The first column gives the name of the technology. Note
that there is separate information for the two different
types of antifuse. The second column indicates if the
configuration is lost when power is removed from the
device. The third column indicates if the technology permits
reprogramming. The fourth column provides the relative
size of the programmable switch. The fifth column gives
the series resistance of an “on” switch, and the sixth
column gives the parasitic capacitance of an “off” switch,
not including any capacitance due to associated wiring
or programming transistors. For reference, the capacitance
of a 10 µm length of minimum-width wire in a 1.2 µm
CMOS process is about 0.6 fF. The seventh column gives
the number of additional processing steps required beyond
standard CMOS.

Table 1 Comparison of Programming Technologies (all in 1.2 µm CMOS)

Technology          Volatile?  Re-Prog?             Area                                 R (ohms)     C (fF)       Extra process steps
SRAM                Yes        Yes, in circuit      Large                                [illegible]  [illegible]  [illegible]
ONO antifuse        No         No                   Fuse small (via), prog. tran. large  300-500      5            3
Amorphous antifuse  No         No                   Fuse small (via), prog. tran. large  50-100       1.1-1.3      3
EPROM               No         Yes, out of circuit  Small in array                       2-4 k        10-20        3
EEPROM              No         Yes, in circuit      2x EPROM                             [illegible]  10-20        >5
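
To give a feel for how these resistance and capacitance figures interact, the following sketch estimates the delay of a connection that passes through several switches in series using a first-order Elmore model. The model itself is an assumption introduced here for illustration: identical stages, wire resistance and driver resistance ignored, and a node capacitance of 6 fF per stage (roughly 100 µm of wire at the 0.6 fF per 10 µm quoted above). The on-resistances are midpoints of the ranges quoted in the text; the function and constant names are hypothetical.

```python
def elmore_delay_ps(n_switches: int, r_on_ohms: float, c_node_ff: float) -> float:
    """Elmore delay, in picoseconds, of n identical series RC stages."""
    total_fs = 0.0
    for i in range(1, n_switches + 1):
        # Node i is charged through the i switches upstream of it.
        total_fs += (i * r_on_ohms) * c_node_ff  # ohm * fF = femtoseconds
    return total_fs * 1e-3


# Midpoints of the on-resistance ranges quoted in the text (ohms).
R_ON_MIDPOINT = {
    "ONO antifuse": 400.0,
    "amorphous antifuse": 75.0,
    "EPROM": 3000.0,
}

ASSUMED_NODE_CAP_FF = 6.0  # assumption: about 100 um of wire per stage

if __name__ == "__main__":
    for name, r_on in R_ON_MIDPOINT.items():
        for n in (1, 4, 8):
            delay = elmore_delay_ps(n, r_on, ASSUMED_NODE_CAP_FF)
            print(f"{name:20s} n={n} delay={delay:8.1f} ps")
```

In this model the delay grows roughly with the square of the number of series switches, which is one way to see why long connections built from many short segments become slow.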
`
III. LOGIC BLOCK ARCHITECTURE
In this section we survey the commercial FPGA logic
block architectures in use today. We first discuss the
combinational logic portion of the logic block; a discussion
of the sequential logic portion is deferred to
Section III-D. In Section III-E, we present several recent
research results on the effect of the choice of the logic
block on the density and performance of an FPGA.
`
`A. Survey of Commercial Logic Block Architectures
FPGA logic blocks differ greatly in their size and
implementation capability. The two-transistor logic block used
in the Crosspoint FPGA can only implement an inverter
but is very small in size, while the look-up table logic
block used in the Xilinx 3000 series FPGA can implement
any five-input logic function but is significantly larger. To
capture these differences we classify logic blocks by their
granularity. Granularity can be defined in various ways,
for example, as the number of Boolean functions that the
logic block can implement, the number of equivalent
two-input NAND gates, the total number of transistors, total
normalized area, or the number of inputs and outputs. The
matter is further confused because in some architectures,
such as the Altera FPGA [43] or the AMD FPGA [3], the
logic and routing are tightly intertwined and it is difficult
to separate their contributions to the architecture. For these
reasons, we choose to classify the commercial logic blocks
into just two categories: fine-grain and coarse-grain.

Fig. 4. Example logic function and two-input NAND gate implementation.

For all the logic blocks described below, we show how
to implement the logic function $f = ab + \overline{c}$, as illustrated
in Fig. 4(a). Note that this is equivalent to the two-input
NAND gate implementation given in Fig. 4(b).
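
The equivalence can be checked exhaustively. The figure is not reproduced in this text, so the sketch below assumes the natural two-gate structure, a NAND of a and b feeding a second NAND together with c; that structure and the function names are assumptions made for illustration.

```python
from itertools import product


def nand(x: int, y: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (x & y)


def f_direct(a: int, b: int, c: int) -> int:
    """f = ab + NOT c, evaluated directly."""
    return (a & b) | (1 - c)


def f_nand(a: int, b: int, c: int) -> int:
    """The same function built from two two-input NAND gates (assumed structure)."""
    return nand(nand(a, b), c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, f_direct(a, b, c), f_nand(a, b, c))
```

The identity follows from De Morgan's law: NAND(NAND(a, b), c) = NOT(NOT(ab) AND c) = ab + NOT c.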
`
`B. Fine-Grain Logic Blocks
Fine-grain logic blocks closely resemble MPGA basic
cells. The most fine-grain logic block would be identical to a
basic cell of an MPGA and would consist of a few transistors
that can be programmably interconnected.
`
Fig. 5. Transistor pair tiles in cross-point FPGA.

Fig. 6. Programmed cross-point FPGA for logic function $f = ab + \overline{c}$.

Fig. 7. The Plessey logic block.
`
it is easier to use small logic gates efficiently and the logic
synthesis techniques for such blocks are very similar to
those for conventional mask-programmed gate arrays and
standard cells.
The main disadvantage of fine-grain blocks is that they
require a relatively large number of wire segments and
programmable switches. Such routing resources are costly
in delay and area. As a result, FPGA's employing fine-grain
blocks are in general slower and achieve lower densities
than those employing coarse-grain blocks. See Section III-E
for results supporting this claim.
`
`C. Coarse-Grain Logic Blocks
1) The Actel Logic Block: The Actel logic block [15], [1]
is based on the ability of a multiplexer to implement
different logic functions by connecting each of its inputs
to a constant or to a signal [46]. For example, consider
a two-to-one multiplexer with selector input s, inputs a
and b, and output $f = sa + \overline{s}b$. By setting signal b to
logic 0, the multiplexer can implement the AND function
f = sa. Setting signal a to logic 1 provides the OR function
f = s + b. By connecting together a number of multiplexers
and basic logic gates, a logic block can be constructed
which can implement a large number of functions in this
manner.
The Actel Act-1 logic block [15] is illustrated in Fig.
8(a). It consists of three multiplexers and one logic gate,
has a total of eight inputs and one output, and implements the
function
$f = \overline{(s_3 + s_4)}(\overline{s_1} w + s_1 x) + (s_3 + s_4)(\overline{s_2} y + s_2 z).$

By setting each variable to an input signal, or to a
constant, 702 logic functions can be realized. For example,
the logic function $f = ab + \overline{c}$ is realized by setting
the variables as shown in Fig. 8(b): $w = 1$, $x = 1$, $s_1 = 0$,
$y = 0$, $z = a$, $s_2 = b$, $s_3 = c$, and $s_4 = 0$.
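
The multiplexer-based construction can be checked with a short model. The sketch below builds the Act-1 function from three two-to-one multiplexers and an OR gate, following the expression $f = \overline{(s_3 + s_4)}(\overline{s_1} w + s_1 x) + (s_3 + s_4)(\overline{s_2} y + s_2 z)$ as reconstructed from the example settings in the text. The function names are hypothetical, and the model captures only the logic, not the circuit.

```python
from itertools import product


def mux2(s: int, a: int, b: int) -> int:
    """Two-to-one multiplexer: output is a when s = 1 and b when s = 0."""
    return a if s else b


def act1(w: int, x: int, y: int, z: int, s1: int, s2: int, s3: int, s4: int) -> int:
    """Logical model of the Act-1 block: three multiplexers and one OR gate."""
    first = mux2(s1, x, w)    # selects w when s1 = 0, x when s1 = 1
    second = mux2(s2, z, y)   # selects y when s2 = 0, z when s2 = 1
    return mux2(s3 | s4, second, first)


def example_function(a: int, b: int, c: int) -> int:
    """f = ab + NOT c, using the settings given in the text."""
    return act1(w=1, x=1, y=0, z=a, s1=0, s2=b, s3=c, s4=0)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, example_function(a, b, c))
```

The same `mux2` model reproduces the AND and OR cases mentioned earlier: tying the second data input to 0 gives `s & a`, and tying the first to 1 gives `s | b`.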
The Act-2 logic block [1] is similar to Act-1, except that
the separate multiplexers on the first row are joined and
connected to a two-input AND gate, as shown in Fig. 9.
The Act-2 combinational logic module can implement 766
functions.
2) QuickLogic Logic Block: The logic block in the FPGA
from QuickLogic [6] is similar to the Actel logic blocks in
that it employs a four-to-one multiplexer. Each input of the
1) The Crosspoint FPGA: The FPGA from Crosspoint
Solutions [31] uses a single transistor pair in the logic block,
as illustrated in Fig. 5.
Figure 6 illustrates how the function of Fig. 4(b) is
implemented with the transistor pair tiles of the cross-point
FPGA. Since the transistors are connected together in rows,
the two two-input NAND gates are isolated by turning off
the pair of transistors between the gates.
In addition to the transistor pair tiles, the cross-point
FPGA has a second type of logic block, called a RAM
logic tile, that is tuned for the implementation of random
access memory, but can also be used to build random
logic functions in a manner similar to the Actel and
QuickLogic logic blocks described below.
2) The Plessey FPGA: A second example of a fine-grain
FPGA architecture is the FPGA from Plessey [33]. Here
the basic block is a two-input NAND gate as illustrated in
Fig. 7. Logic is formed in the usual way by connecting the
NAND gates to implement the desired function. The logic
function $f = ab + \overline{c}$ illustrated in Fig. 4(a) is implemented
exactly as shown in Fig. 4(b). If the latch in Fig. 7 is not
needed, then the configuration store is set to make the latch
permanently transparent.
Several other commercial FPGA's employ fine-grain
blocks. Algotronix [2] uses a two-input function block
which can perform any function of two inputs. This is
implemented using a configurable set of multiplexers. The
logic block of Concurrent Logic's FPGA [13] contains a
two-input AND gate and a two-input EXCLUSIVE-OR
gate. The FPGA recently discussed by Toshiba in [32] also
uses a two-input NAND gate.
The main advantage of using fine-grain logic blocks is
that the usable blocks¹ are fully utilized. This is because
¹ In all FPGA's, as well as in all MPGA's, only a fraction of the logic
blocks available can be utilized in any design.
`
Fig. 8. The Actel Act-1 logic block.

Fig. 9. The Actel Act-2 logic block.

Fig. 10. The QuickLogic logic block.
`
multiplexer (not just the select inputs) is fed by an AND
gate, as illustrated in Fig. 10. Note that alternating inputs
to the AND gates are inverted. This allows input signals to
be passed in true or complement form, thus eliminating the
need to use extra logic blocks to perform simple inversions.
Multiplexer-based logic blocks have the advantage of
providing a large degree of functionality for a relatively
small number of transistors. This is, however, achieved
at the expense of a large number of inputs (eight in the
case of Actel and 14 in the case of QuickLogic), which
when utilized place high demands on the routing resources.
Such blocks are, therefore, more suited to FPGA’s that use
programmable switches of small size such as antifuses.

Fig. 11. Lookup table-based logic.

3) The Xilinx Logic Block: The basis for the Xilinx logic
block is an SRAM functioning as a look-up table (LUT).
The truth table for a K-input logic function is stored in a
$2^K \times 1$ SRAM. The address lines of the SRAM function
as inputs and the output of the SRAM provides the value
of the logic function. For example, consider the truth table
of the logic function $f = ab + \overline{c}$ given in Fig. 11(a). If
this logic function is implemented using a three-input LUT,
then the SRAM would have a 1 stored at address 000, a 0
at 001, and so on, as specified by the truth table.
The advantage of look-up tables is that they exhibit
high functionality: a K-input LUT can implement any
function of K inputs, and there are $2^{2^K}$ such functions. The
disadvantage is that they are unacceptably large for more
than about five inputs, since the number of memory cells
needed for a K-input lookup table is $2^K$. While the number
of functions that can be implemented increases very fast,
these additional functions are not commonly used in logic
designs and are also difficult to exploit for a logic synthesis
tool. Hence it is often the case that a large LUT will be
largely underutilized.
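
The look-up table mechanism can be sketched as a small array indexed by the input bits. The helper names below are hypothetical, and the address convention (first input is the most significant address bit, so that address 000 means a = b = c = 0) is an assumption chosen to match the example in the text.

```python
from itertools import product
from typing import Callable, List


def build_lut(func: Callable[..., int], k: int) -> List[int]:
    """Return the 2**k memory bits that store the truth table of func."""
    return [func(*bits) for bits in product((0, 1), repeat=k)]


def read_lut(table: List[int], *inputs: int) -> int:
    """Use the inputs as address lines and return the stored bit."""
    address = 0
    for bit in inputs:  # assumed ordering: first input is the most significant bit
        address = (address << 1) | bit
    return table[address]


def num_functions(k: int) -> int:
    """Number of distinct k-input functions, 2**(2**k)."""
    return 2 ** (2 ** k)


if __name__ == "__main__":
    table = build_lut(lambda a, b, c: (a & b) | (1 - c), 3)
    print(table)              # contents for f = ab + NOT c, addresses 000 to 111
    print(num_functions(5))   # functions available to a five-input LUT
```

Each added input doubles the number of memory cells while squaring the number of realizable functions, which is the growth the text refers to.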
The Xilinx 3000 series logic block [21] [2a] contains a
five-input one-output LUT, as illustrated in Fig. 12. This
block can be reconfigured into two four-input LUTs, with
the constraint that together they use a total of no more
than five distinct inputs. This reconfigurability provides
flexibility that translates into better logic block utilization
because many common logic functions do not require as
many as five inputs. The block also contains sequential
logic and several multiplexers that connect the combinational
inputs and outputs to the flip-flops or outputs. These
multiplexers are controlled by the SRAM cells that are
loaded at programming time.

Fig. 12. The Xilinx 3000 logic block.

Fig. 13. The Xilinx 4000 logic block.

The Xilinx 4000 series logic block [23] contains two
four-input LUT’s feeding into a three-input LUT, as
illustrated in Fig. 13. In this block, all of the inputs are
distinct and available external to the logic block. This block
introduces two significant architectural changes from the
3000 series block. First, two differently sized LUT’s are
used: a four-input LUT and a three-input LUT, giving the
complete block a heterogeneous flavor. In general,
heterogeneity allows for a better tradeoff between performance
and logic density.
The second architectural change in the Xilinx 4000 logic
block is the use of two nonprogrammable connections from
the two four-input LUT’s to the three-input LUT. These
connections are significantly faster than any programmable
interconnection since no programmable switches are used
in series, and little is present in parallel. If proper use can
be made of these fast connections, FPGA performance can
be greatly improved. There is a penalty for this type of
connection, however: since the connection is permanent,
the inflexibility means that the three-input LUT may often
go unused, reducing the overall logic density.
`
Fig. 14. The Altera 5000 Series logic block.
`
`The Xilinx 4000 block incorporates several additional
`features. Each LUT can be used directly as an SRAM block.
`This allows small amounts of memory to be more efficiently
`implemented. Another feature is the inclusion of circuitry
`that can be used to implement fast carry addition circuits.
4) The Altera Logic Block: The architecture of the Altera
FPGA [43] has evolved from the PLA-based architecture
of traditional PLDs [28], with its logic block consisting of
wide fanin (20 to over 100 inputs) AND gates feeding into
an OR gate with three to eight inputs. Figure 14(a) illustrates
the Altera MAX 5000 series logic block. Using the floating
gate transistor-based programmable switch presented in
Section II-C, any vertical wire passing by an AND gate
can be connected as an input to the gate. The three product
terms are then ORed together and can be programmably
inverted by an exclusive-OR gate, which can also be used
to implement other arithmetic functions. Notice that each
input signal is provided in both true and complement form,
with two separate wires. This programmable inversion
significantly increases the functional capability of the block.
Figure 14(b) illustrates the implementation of the logic
function $f = ab + \overline{c}$. The x’s in the figure indicate the
wired-AND connections described in Section II-C.
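
The AND-OR structure with programmable inversion can be sketched as follows. This is a logical model only: each product term is a list of literals chosen from the true and complement wires, the block ORs the terms that are in use and optionally inverts the result. The data representation and the names `product_term`, `and_or_block`, and `F_TERMS` are hypothetical and are not taken from the Altera documentation.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Literal = Tuple[str, bool]  # (input name, True for the true wire, False for the complement)


def product_term(inputs: Dict[str, int], literals: Sequence[Literal]) -> int:
    """Wired-AND of the connected literals."""
    return int(all(inputs[name] == (1 if positive else 0) for name, positive in literals))


def and_or_block(inputs: Dict[str, int], terms: List[Sequence[Literal]], invert: int = 0) -> int:
    """OR the product terms in use, then apply the programmable inversion (XOR)."""
    # Unused product terms are simply left out of `terms`, so they contribute 0.
    summed = int(any(product_term(inputs, term) for term in terms))
    return summed ^ invert


# f = ab + NOT c needs two of the block's product terms.
F_TERMS: List[Sequence[Literal]] = [
    [("a", True), ("b", True)],  # product term ab
    [("c", False)],              # product term NOT c
]

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, and_or_block({"a": a, "b": b, "c": c}, F_TERMS))
```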
`The advantage of this type of block is that the wide
`AND gate can be used to form logic functions with few
`levels of logic blocks, reducing the need for programmable
`interconnect. It is difficult, however, to make efficient use
`of all of the inputs to all of the gates, resulting in loss of
`density. This loss is not as severe as it first appears because
`of the high packing density of the wired-AND gates, as well
`as the fact that logic connections also serve as the routing
`function. In other architectures where logic and routing are
`separate such unused inputs would incur a high penalty.
A disadvantage of the wired-AND configuration is the
use of pull-up devices that consume static power. An array
full of these pull-ups will consume a significant amount of
power. To mitigate this, each gate in the MAX 7000 series
block can be programmed to consume about 60% less
power, but at the expense of about a 40% increase in delay
[44]. This feature can be used in noncritical paths to reduce
power consumption.
In addition to the wide AND-OR logic block, the MAX
5000 employs one other type of logic block, called a logic
expander. This is a wide-input NAND gate whose output
can be connected to the AND-OR logic block. While a
logic expander incurs the same delay as the logic block, it
takes up less area and can be used to increase its effective
number of product terms.
The Altera MAX 7000 logic block [44] is similar to
the MAX 5000 except that it provides two more product
terms and has more flexibility because neighboring blocks
can “borrow” product terms from each other. This is
accomplished using a small routing structure between the
AND and OR gates called the product term select matrix.
Several other FPGA’s use the wide AND-OR style of
logic block, including those produced by Plus Logic [34],
AMD [3], and Lattice [4]. The device in [34] employs other
logic types in combination with the wide AND-OR gate.
`
`D. Sequential Logic
Most of the logic blocks described above include some
form of sequential logic. The Xilinx devices [22], [23] have
two D flip-flops that can be programmably connected to
the outputs of the two lookup tables. The Altera device
[43] has one flip-flop per logic block. In the Act-1 device
from Actel [15], the sequential logic is not explicitly present
and so must be formed using programmable routing and
the purely combinational logic blocks. In the Act-2 device
[1], there are two alternating types of logic block: the
C-module, which is the purely combinational block described
in Section III-C1), and the S-module, which has similar
combinational functionality to the C-module but includes
a D flip-flop.
The Plessey logic block [33] also incorporates one D
latch. It thus requires two blocks to make a master-slave
flip-flop. The Algotronix logic block [2] forms sequential
logic using feedback around the basic combinational logic
module.
`
E. Effect of Logic Block Granularity on FPGA Density
and Performance
In recent years research efforts have been directed at
determining choices for FPGA logic blocks that optimize
density and performance [35], [36], [37], [39], [24], [40],
[20], [26], [41]. In this section we briefly survey this
research. For a more complete survey see [8]. The section
is divided into two parts: the first deals with the effect of
logic block granularity on FPGA density, while the second
covers the effect of granularity on performance.

Fig. 15. Three implementations of f = abd + bcd + ab?.

1) Effect of Granularity on Density: As the granularity of
a logic block increases, the number of blocks needed to
implement a design should decrease. On the other hand,
a more functional (larger granularity) logic block requires
more circuitry to implement it, and therefore occupies more
area. This tradeoff suggests the existence of an “optimal”
logic block granularity for which the FPGA area devoted
to logic implementation is minimized. While this argument
for logic area