Architecture of Field-Programmable
`Gate Arrays
`
`JONATHAN ROSE, MEMBER, IEEE, ABBAS EL GAMAL, SENIOR MEMBER, IEEE, AND
`ALBERTO SANGIOVANNI-VINCENTELLI, FELLOW, IEEE
`
`Invited Paper
`
`A survey of Field-Programmable Gate Array (FPGA) architec-
`tures and the programming technologies used to customize them is
`presented. Programming technologies are compared on the basis of
`their volatility, size, parasitic capacitance, resistance, and process
`technology complexity. FPGA architectures are divided into two
`constituents: logic block architectures and routing architectures.
`A classification of logic blocks based on their granularity is
`proposed and several logic blocks used in commercially available
`FPGA’s are described. A brief review of recent results on the effect
`of logic block granularity on logic density and performance of an
`FPGA is then presented. Several commercial routing architectures
`are described in the context of a general routing architecture
`model. Finally, recent results on the tradeoff between the flexibility
`of an FPGA routing architecture and its routability and density are
`reviewed.
`
`I. INTRODUCTION
`The architecture of a field-programmable gate array
`(FPGA), as illustrated in Fig. 1, is similar to that of
`a mask-programmable gate array (MPGA), consisting of
`an array of logic blocks
`that can be programmably
`interconnected to realize different designs. The major
`difference between FPGA’s and MPGA’s is that an MPGA
`is programmed using integrated circuit fabrication to form
`metal interconnections, while an FPGA is programmed
`via electrically programmable switches much the same as
`traditional programmable logic devices (PLD’s). FPGA’s
`can achieve much higher levels of integration than PLD’s,
`however, due to their more complex routing architectures
`and logic implementations. PLD routing architectures are
`very simple but highly inefficient crossbar-like structures
`in which every output is directly connectable to every
`
`Manuscript received October 1, 1992. The work of the second author
`was partially supported under contract J-FBI-89-101.
`J. Rose is with the Department of Electrical Engineering, University of
`Toronto, 10 King’s College Road, Toronto, Ontario M5S 1A4, Canada.
`A. El Gamal is with the Department of Electrical Engineering, Stanford
`University, Stanford, CA 94305.
`A. Sangiovanni-Vincentelli is with the Department of Electrical Engineering
`and Computer Science, University of California, Berkeley, CA
`94720.
`IEEE Log Number 9210745.
`
`Fig. 1. FPGA architecture
`
`input through one switch. FPGA routing architectures
`provide a more efficient MPGA-like routing where each
`connection typically passes through several switches. In a
`PLD, logic is implemented using predominantly two-level
`AND-OR logic with wide input AND gates. In an FPGA,
`logic is implemented using multiple levels of lower fanin
`gates, which is often much more compact than two-level
`implementations.
`An FPGA logic block can be as simple as a transistor or
`as complex as a microprocessor. It is typically capable of
`implementing many different combinational and sequential
`logic functions. Current commercial FPGA’s employ logic
`blocks that are based on one or more of the following:
`Transistor pairs.
`Basic small gates such as two-input NAND’s or
`exclusive-OR’ s.
`Multiplexers.
`
`PROCEEDINGS OF THE IEEE, VOL. 81, NO. 7, JULY 1993
`
`
`0018-9219/93$03.00 © 1993 IEEE
`
`
`Authorized licensed use limited to: IEEE Publications Operations Staff. Downloaded on February 28,2023 at 16:40:45 UTC from IEEE Xplore. Restrictions apply.
`
`Intel Exhibit 1017
`Intel v. Iida
`
`

`

`Look-up tables (LUT’s).
`Wide-fanin AND-OR structures.
`The routing architecture of an FPGA could be as simple
`as a nearest neighbor mesh [9] or as complex as the perfect
`shuffle used in multiprocessors [42]. More typically, an
`FPGA routing architecture incorporates wire segments of
`varying lengths which can be interconnected via electrically
`programmable switches. The choice of the number of wire
`segments incorporated affects the density achieved by an
`FPGA. If an inadequate number of segments is used, only a
`small fraction of the logic blocks can be utilized, resulting
`in poor FPGA density; conversely the use of an excess
`number of segments that go unused also wastes area.
`The distribution of the lengths of the wire segments
`also greatly affects the density and performance achieved
`by an FPGA. For example, if all segments are chosen to
`be long, implementing local interconnections becomes too
`costly in area and delay. On the other hand if all segments
`are short, long interconnections are implemented using too
`many switches in series, resulting in unacceptably large
`delays.
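The cost of chaining many short segments can be illustrated with a first-order RC model. The sketch below is an assumption rather than anything taken from the survey: it treats a connection as a chain of identical switches, each with series resistance `r_switch` and each driving a segment of capacitance `c_segment`, and applies the Elmore approximation. The numeric values are placeholders, not measured data.

```python
def elmore_chain_delay(n_switches, r_switch, c_segment):
    """Elmore delay of n identical RC stages in series (first-order model)."""
    # The capacitance after stage i is charged through i switch resistances.
    return sum(i * r_switch * c_segment for i in range(1, n_switches + 1))

# Hypothetical values: 1 kOhm per switch, 50 fF per segment.
R, C = 1e3, 50e-15
one_hop = elmore_chain_delay(1, R, C)
eight_hops = elmore_chain_delay(8, R, C)
ratio = eight_hops / one_hop   # grows as n(n+1)/2, which is 36 for n = 8
```

Under this simplified model the delay grows roughly quadratically with the number of switches in series, which is one way to see why long connections built only from short segments become slow.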
`Several different programming technologies are used to
`implement the programmable switches. There are three
`types of such programmable switch technologies currently
`in use. These are:
`SRAM, where the switch is a pass transistor controlled
`by the state of an SRAM bit,
`Antifuse, which, when electrically programmed, forms
`a low resistance path, and
`EPROM, where the switch is a floating-gate transistor
`that can be turned off by injecting charge onto its
`floating gate.
`In all cases, a programmable switch occupies a larger
`area and exhibits much higher parasitic resistance and
`capacitance than a typical contact or via used in the
`customization of an MPGA. Additional area is also required
`for programming circuitry. As a result, the density and
`performance achievable by today’s FPGA’s are an order
`of magnitude lower than those of MPGA’s manufactured in
`the same technology.
`The adverse effects of the large size and relatively high
`parasitics of programmable switches can be reduced by
`careful architectural choices. By choosing the appropriate
`granularity and functionality of the logic block, and by
`designing the routing architecture to achieve a high degree
`of routability while minimizing the number of switches,
`both density and performance can be optimized. The best
`architectural choices, however, are highly dependent on the
`programming technology used as well as on the type of
`designs implemented, so that no one architecture is likely
`to be best suited for all programming technologies and for
`all designs.
`The complexity of FPGA’s has surpassed the point where
`manual design is either desirable or feasible. Consequently,
`the utility of an FPGA architecture is highly dependent
`on effective automated logic and layout synthesis tools to
`support it. A complex logic block may be underutilized
`
`without an effective logic synthesis tool, and the overall
`utilization of an FPGA may be low without an effective
`placement and routing tool.
`Commercial FPGA’s differ in the type of programming
`technology used, in the architecture of the logic block
`and in the structure of their routing architecture. In this
`paper we survey the architectures of commercially available
`FPGA’s and discuss the dependence of FPGA density
`and performance on these factors. The paper is organized
`as follows: Section II describes the most widely used
`programming technologies. Section III presents a survey
`of commercial FPGA logic block architectures, classified
`by their granularity. This includes a summary of recent
`research results concerning the effect of granularity on over-
`all FPGA density and performance. Section IV describes
`several commercial routing architectures in the context of a
`general routing architecture model, and summarizes recent
`research results in this area. Section V concludes with a
`discussion of potential future architectural directions for
`FPGA’s.
`
`II. PROGRAMMING TECHNOLOGIES
`An FPGA is programmed using electrically programmable
`switches. The properties of these programmable
`switches, such as size, on-resistance, and capacitance,
`dictate many of the tradeoffs in FPGA architecture. In this
`section we describe the most commonly used programmable
`switch technologies and at the end will contrast each
`technology with respect to volatility, re-programmability,
`size, series on-resistance, parasitic capacitance, and process
`technology complexity.
`
`A. SRAM Programming Technology
`The SRAM programming technology uses Static RAM
`cells to control pass gates or multiplexers as illustrated in
`Fig. 2. It is used in the devices from Xilinx [23], Plessey
`[33], Algotronix [2], Concurrent Logic [13], and Toshiba
`[32].
`When a one is stored in the SRAM cell in Fig. 2(a),
`the pass gate acts as a closed switch, and can be used to
`make a connection between two wire segments. When a
`zero is stored, the switch is open and the transistor presents
`a high resistance between the two wire segments. For the
`multiplexer, the state of the SRAM cells connected to the
`select lines controls which one of the multiplexer inputs is
`connected to the output, as shown in Fig. 2(b).
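The two uses of an SRAM cell described above can be mimicked in a few lines. This is only a behavioral sketch: the function names `pass_gate` and `mux` are hypothetical, an open switch is modeled as `None`, and the bit ordering of the select lines (most significant first) is an assumption.

```python
def pass_gate(sram_bit, wire_a):
    """Return the value driven onto the far segment, or None if the switch is open."""
    return wire_a if sram_bit == 1 else None

def mux(select_bits, inputs):
    """Select one input; select_bits are SRAM cell values, most significant first."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]

connected = pass_gate(1, wire_a=1)      # a stored one closes the switch
isolated = pass_gate(0, wire_a=1)       # a stored zero leaves the segments unconnected
routed = mux([1, 0], ["w0", "w1", "w2", "w3"])   # select lines 10 -> input 2
```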
`Since SRAM is volatile, the FPGA must be loaded and
`configured at the time of chip power-up. This requires
`external permanent memory to provide the programming
`bits such as PROM, EPROM, EEPROM or magnetic disk.
`A major disadvantage of SRAM programming technol-
`ogy is its large area. It takes at least five transistors to
`implement an SRAM cell, plus at least one transistor
`to serve as a programmable switch. However, SRAM
`programming technology has two major advantages: fast
`re-programmability and the need for only standard
`integrated circuit process technology.
`
`
`

`

`Fig. 2. Static RAM programming technology. (a) Pass gate. (b) Multiplexer.
`
`B. Antifuse Programming Technology
`An antifuse is a two terminal device with an un-
`programmed state presenting a very high resistance between
`its terminals. When a high voltage (from 11 to 20 volts,
`depending on the type of antifuse) is applied across its
`terminals the antifuse will “blow” and create a low-
`resistance link. This link is permanent. Antifuses in use
`today are built either using an oxide-nitride-oxide
`(ONO) dielectric between N+ diffusion and poly-silicon
`[19], [15], [1] or amorphous silicon between metal layers
`[6] or between polysilicon and the first layer of metal [31].
`Programming an antifuse requires extra circuitry to de-
`liver the high programming voltage and a relatively high
`current of 5 mA or more. This is done in [15] through
`fairly sizable pass transistors to provide addressing to each
`antifuse. An associated paper in this issue discusses the
`programming of antifuse structures in more detail [18].
`Antifuse technology is used in the FPGA’s from Actel [15],
`[1], QuickLogic [6], and Crosspoint [31].
`A major advantage of the antifuse is its small size,
`little more than the cross-section of two metal wires. This
`advantage is somewhat reduced by the large size of the
`necessary programming transistors, which must be able
`to handle large currents, and the inclusion of isolation
`transistors that are sometimes needed to protect low voltage
`transistors from high programming voltages. A second
`major advantage of an antifuse is its relatively low series
`resistance. The on-resistance of the ONO antifuse is 300 to
`500 ohms [19], while that of the amorphous silicon antifuse is 50 to
`100 ohms [6], [31]. Additionally, the parasitic capacitance
`of an unprogrammed amorphous antifuse is significantly
`lower than for other programming technologies.
`
`C. Floating Gate Programming Technology
`The floating gate programming technology uses technol-
`ogy found in ultraviolet erasable EPROM and electrically
`erasable EEPROM devices. The EPROM-based approach
`is used in devices from Altera [43] and Plus Logic [34].
`The programmable switch, illustrated in Fig. 3, is a
`transistor that can be permanently “disabled.” This is ac-
`complished by injecting a charge on the floating gate (gate
`2 in the figure) using a high voltage between the control
`gate 1 and the drain of the transistor. This charge increases
`
`ROSE et al.: ARCHITECTURE OF GATE ARRAYS
`
`Fig. 3. Floating gate programming technology.
`
`the threshold voltage of the transistor so that it turns off.
`The charge is removed by exposing the floating gate to UV
`light. This lowers the threshold voltage of the transistor and
`makes the transistor function normally.
`Rather than using an EPROM transistor directly as a
`programmable switch, the unprogrammed transistor is used
`to pull down a “bit line” when the “word line” is set high,
`as illustrated in Fig. 3. While this approach can be simply
`used to make a connection between the word and bit lines,
`it can also be used to implement a wired-AND style of
`logic, thereby providing both logic and routing.
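The wired-AND behavior can be sketched behaviorally. The model below is an assumption based on the description above: a bit line is held high by a pull-up, and any transistor that is still enabled pulls it low when its word line is high. The names `bit_line` and `and_ab` are hypothetical.

```python
def bit_line(word_lines, programmed):
    """Value of a pulled-up bit line.

    word_lines[i] is the logic level on word line i; programmed[i] is True
    when the floating-gate transistor at that crossing has been disabled.
    """
    for level, disabled in zip(word_lines, programmed):
        if level == 1 and not disabled:
            return 0          # an active transistor pulls the line low
    return 1                  # nothing pulls down, so the pull-up wins

# Word lines carry a, not-a, b, not-b. Leaving only the not-a and not-b
# transistors enabled makes the bit line equal to a AND b.
def and_ab(a, b):
    words = [a, 1 - a, b, 1 - b]
    return bit_line(words, programmed=[True, False, True, False])
```

Because the line is high only when every enabled crossing sees a low word line, supplying each signal in both true and complement form lets the same structure form an AND of arbitrary literals.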
`As with the SRAM programming technology, a
`major advantage of the EPROM technology is its re-
`programmability. An advantage over SRAM, though, is
`that no external permanent memory is needed to program
`the chip on power-up. EPROM technology, however,
`requires three additional processing steps over an ordinary
`CMOS process. Two other disadvantages are the high ON-
`resistance of an EPROM transistor (about twice that of a
`similarly sized NMOS pass transistor) and the high static
`power consumption due to the pull-up resistor used (see
`Fig. 3).
`The EEPROM-based programming technology is used in
`the devices from AMD [3] and Lattice [4]. It is similar
`to the EPROM approach, except that removal of the gate
`charge can be done electrically, in-circuit, without UV light.
`This gives the added advantage of easy reprogrammability,
`which can be very helpful in some applications such as
`hardware updates to equipment in remote locations. An
`EEPROM cell, however, is roughly twice the size of an
`EPROM cell.
`
`D. Summary of Programming Technologies
`Table 1 lists the properties of each programming technology.
`All data assume a 1.2 µm CMOS process technology.
`The first column gives the name of the technology. Note
`that there is separate information for the two different
`types of antifuse. The second column indicates if the
`configuration is lost when power is removed from the
`
`
`Table 1 Comparison of Programming Technologies
`
`Technology and Process | Volatile? | Re-Prog? | Area | R (ohm) | C (fF) | Extra Process Steps
`SRAM, 1.2 µm CMOS | Yes | Yes, in circuit | Large | (illegible) | (illegible) | 0
`ONO antifuse, 1.2 µm CMOS | No | No | Fuse small (via); prog. transistor large | 300-500 | 5 | 3
`Amorphous antifuse, 1.2 µm CMOS | No | No | Fuse small (via); prog. transistor large | 50-100 | 1.1-1.3 | 3
`EPROM, 1.2 µm CMOS | No | Yes, out of circuit | Small in array | 2-4 k | 10-20 | 3
`EEPROM, 1.2 µm CMOS | No | Yes, in circuit | 2x EPROM | (illegible) | 10-20 | >5
`
`
`device. The third column indicates if the technology permits
`reprogramming. The fourth column provides the relative
`size of the programmable switch. The fifth column gives
`the series resistance of an “on” switch, and the sixth
`column gives the parasitic capacitance of an “off” switch,
`not including any capacitance due to associated wiring
`or programming transistors. For reference, the capacitance
`of a 10 µm length of minimum-width wire in a 1.2 µm
`CMOS process is about 0.6 fF. The seventh column gives
`the number of additional processing steps required beyond
`standard CMOS.
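One way to read the capacitance column is to convert each off-switch capacitance into an equivalent length of minimum-width wire, using the 0.6 fF per 10 µm reference figure quoted above. The sketch below does this for the capacitance values as they could be read from Table 1; the helper name `equivalent_wire_um` is hypothetical, and the conversion assumes capacitance scales linearly with wire length.

```python
WIRE_FF_PER_10UM = 0.6   # from the text: 10 um of minimum-width wire is about 0.6 fF

# Off-switch capacitance ranges in fF, as read from Table 1.
OFF_CAPACITANCE_FF = {
    "ONO antifuse": (5.0, 5.0),
    "Amorphous antifuse": (1.1, 1.3),
    "EPROM": (10.0, 20.0),
}

def equivalent_wire_um(c_ff):
    """Length of minimum-width wire with the same capacitance as c_ff."""
    return c_ff / WIRE_FF_PER_10UM * 10.0

for name, (low, high) in OFF_CAPACITANCE_FF.items():
    print(f"{name}: {equivalent_wire_um(low):.0f} to {equivalent_wire_um(high):.0f} um")
```

Seen this way, every unprogrammed switch attached to a wire loads it like an extra stretch of wire, which is why the number of switches per segment matters for delay as well as area.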
`
`III. LOGIC BLOCK ARCHITECTURE
`In this section we survey the commercial FPGA logic
`block architectures in use today. In the first subsections we
`discuss the combinational logic portion of the logic block.
`A discussion of the sequential logic portion is deferred to
`Section III-D. In Section III-E, we present several recent
`research results on the effect of the choice of the logic
`block on the density and performance of an FPGA.
`
`A. Survey of Commercial Logic Block Architectures
`FPGA logic blocks differ greatly in their size and imple-
`mentation capability. The two transistor logic block used
`in the Crosspoint FPGA can only implement an inverter
`but is very small in size, while the look-up table logic
`block used in the Xilinx 3000 series FPGA can implement
`any five-input logic function but is significantly larger. To
`capture these differences we classify logic blocks by their
`granularity. Granularity can be defined in various ways,
`for example, as the number of Boolean functions that the
`logic block can implement, the number of equivalent two-
`
`Fig. 4. Example logic function and two-input NAND gate implementation.
`
`input NAND gates, the total number of transistors, total
`normalized area, or the number of inputs and outputs. The
`matter is further confused because in some architectures,
`such as the Altera FPGA [43] or the AMD FPGA [3], the
`logic and routing are tightly intertwined and it is difficult
`to separate their contributions to the architecture. For these
`reasons, we choose to classify the commercial logic blocks
`into just two categories: fine-grain and coarse-grain.
`For all the logic blocks described below, we show how
`to implement the logic function $f = ab + \bar{c}$, as illustrated
`in Fig. 4(a). Note that this is equivalent to the two-input
`NAND gate implementation given in Fig. 4(b).
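The equivalence can be checked exhaustively. The sketch below assumes the NAND network is the natural two-gate structure, in which the output of a NAND of a and b feeds a second NAND together with c; the figure itself is not reproduced here, so that wiring is an assumption.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def f_direct(a, b, c):
    """f = ab + (not c), evaluated directly."""
    return (a & b) | (1 - c)

def f_nand(a, b, c):
    """Same function built from two two-input NAND gates."""
    return nand(nand(a, b), c)

equivalent = all(f_direct(a, b, c) == f_nand(a, b, c)
                 for a, b, c in product((0, 1), repeat=3))
```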
`
`B. Fine-Grain Logic Blocks
`Fine-grain logic blocks closely resemble MPGA basic
`cells. The most fine grain logic block would be identical to a
`basic cell of an MPGA and would consist of a few transistors
`that can be programmably interconnected.
`
`
`

`

`Fig. 5. Transistor pair tiles in cross-point FPGA.
`
`Fig. 6. Programmed cross-point FPGA for logic function $f = ab + \bar{c}$.
`
`Fig. 7. The Plessey logic block.
`
`1) The Crosspoint FPGA: The FPGA from Crosspoint
`Solutions [31] uses a single transistor pair in the logic block,
`as illustrated in Fig. 5.
`Figure 6 illustrates how the function of Fig. 4(b) is
`implemented with the transistor pair tiles of the cross-point
`FPGA. Since the transistors are connected together in rows,
`the two two-input NAND gates are isolated by turning off
`the pair of transistors between the gates.
`In addition to the transistor pair tiles, the cross-point
`FPGA has a second type of logic block, called a RAM
`logic tile, that is tuned for the implementation of random
`access memory, but can also be used to build random
`logic functions in a manner similar to the Actel and
`QuickLogic logic blocks described below.
`2) The Plessey FPGA: A second example of a fine-grain
`FPGA architecture is the FPGA from Plessey [33]. Here
`the basic block is a two-input NAND gate as illustrated in
`Fig. 7. Logic is formed in the usual way by connecting the
`NAND gates to implement the desired function. The logic
`function $f = ab + \bar{c}$ illustrated in Fig. 4(a) is implemented
`exactly as shown in Fig. 4(b). If the latch in Fig. 7 is not
`needed, then the configuration store is set to make the latch
`permanently transparent.
`Several other commercial FPGA's employ fine grain
`blocks. Algotronix [2] uses a two-input function block
`which can perform any function of two inputs. This is
`implemented using a configurable set of multiplexers. The
`logic block of Concurrent Logic's FPGA [13] contains a
`two-input AND gate and a two-input EXCLUSIVE-OR
`gate. The FPGA recently discussed by Toshiba in [32] also
`uses a two-input NAND gate.
`The main advantage of using fine grain logic blocks is
`that the usable blocks¹ are fully utilized. This is because
`it is easier to use small logic gates efficiently and the logic
`synthesis techniques for such blocks are very similar to
`those for conventional mask-programmed gate arrays and
`standard cells.
`The main disadvantage of fine-grain blocks is that they
`require a relatively large number of wire segments and
`programmable switches. Such routing resources are costly
`in delay and area. As a result, FPGA's employing fine-grain
`blocks are in general slower and achieve lower densities
`than those employing coarse grain blocks. See Section III-E
`for results supporting this claim.
`
`¹ In all FPGA's, as well as in all MPGA's, only a fraction of the logic
`blocks available can be utilized in any design.
`
`C. Coarse-Grain Logic Blocks
`1) The Actel Logic Block: The Actel logic block [15], [1]
`is based on the ability of a multiplexer to implement
`different logic functions by connecting each of its inputs
`to a constant or to a signal [46]. For example, consider
`a two-to-one multiplexer with selector input s, inputs a
`and b and output $f = \bar{s}a + sb$. By setting signal b to
`logic 0, the multiplexer can implement the AND function
`$f = \bar{s}a$. Setting signal a to logic 1 provides the OR function
`$f = \bar{s} + b$. By connecting together a number of multiplexers
`and basic logic gates, a logic block can be constructed
`which can implement a large number of functions in this
`manner.
`The Actel Act-1 logic block [15] is illustrated in Fig.
`8(a). It consists of three multiplexers and one logic gate,
`has a total of 8 inputs and one output, and implements the
`function
`$$f = \overline{(s_3 + s_4)}\,(\bar{s}_1 w + s_1 x) + (s_3 + s_4)(\bar{s}_2 y + s_2 z).$$
`
`By setting each variable to an input signal, or to a
`constant, 702 logic functions can be realized. For example,
`the logic function $f = ab + \bar{c}$ is realized by setting
`the variables as shown in Fig. 8(b): $w = 1$, $x = 1$, $s_1 = 0$,
`$y = 0$, $z = a$, $s_2 = b$, $s_3 = c$, and $s_4 = 0$.
`The Act-2 logic block [1] is similar to Act-1, except that
`the separate multiplexers on the first row are joined and
`connected to a two-input AND gate, as shown in Fig. 9.
`The Act-2 combinational logic module can implement 766
`functions.
`2) The QuickLogic Logic Block: The logic block in the FPGA
`from QuickLogic [6] is similar to the Actel logic blocks in
`that it employs a four-to-one multiplexer. Each input of the
`
`
`

`

`Fig. 8. The Actel Act-1 logic block.
`
`Fig. 9. The Actel Act-2 logic block.
`
`Fig. 10. The QuickLogic logic block.
`
`multiplexer (not just the select inputs) is fed by an AND
`gate, as illustrated in Fig. 10. Note that alternating inputs
`to the AND gates are inverted. This allows input signals to
`be passed in true or complement form, thus eliminating the
`need to use extra logic blocks to perform simple inversions.
`Multiplexer-based logic blocks have the advantage of
`providing a large degree of functionality for a relatively
`small number of transistors. This is, however, achieved
`at the expense of a large number of inputs (eight in the
`case of Actel and 14 in the case of QuickLogic), which
`when utilized place high demands on the routing resources.
`Such blocks are, therefore, more suited to FPGA’s that use
`
`Fig. 11. Lookup table-based logic.
`
`programmable switches of small size such as antifuses.
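The multiplexer-based approach can be made concrete with a behavioral model of the Act-1 block described earlier. The sketch below is an interpretation, not Actel's circuit: it assumes the usual convention that a select value of 0 passes the first data input, and it takes the single logic gate to be an OR of s3 and s4 driving the final select. The names `mux2` and `act1` are hypothetical.

```python
from itertools import product

def mux2(s, d0, d1):
    """Two-to-one multiplexer: returns d0 when s = 0 and d1 when s = 1."""
    return d1 if s else d0

def act1(w, x, y, z, s1, s2, s3, s4):
    """Three multiplexers and one OR gate, following the Act-1 description."""
    top = mux2(s1, w, x)
    bottom = mux2(s2, y, z)
    return mux2(s3 | s4, top, bottom)

def f_target(a, b, c):
    """f = ab + (not c)."""
    return (a & b) | (1 - c)

# Input assignment quoted in the text for realizing f on an Act-1 block.
matches = all(
    act1(w=1, x=1, y=0, z=a, s1=0, s2=b, s3=c, s4=0) == f_target(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

Tying data inputs to constants and select inputs to signals is what lets one small block cover many functions, at the price of using up most of its eight input pins.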
`3) The Xilinx Logic Block: The basis for the Xilinx logic
`block is an SRAM functioning as a look-up table (LUT).
`The truth table for a K-input logic function is stored in a
`$2^K \times 1$ SRAM. The address lines of the SRAM function
`as inputs and the output of the SRAM provides the value
`of the logic function. For example, consider the truth table
`of the logic function $f = ab + \bar{c}$ given in Fig. 11(a). If
`this logic function is implemented using a three-input LUT,
`then the SRAM would have a 1 stored at address 000, a 0
`at 001 and so on, as specified by the truth table.
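A look-up table is simple to mimic in software. The sketch below fills a list standing in for the SRAM and reads it back using the inputs as address bits; the helper names `make_lut` and `read_lut` are hypothetical, and the address ordering (a as the most significant bit, c as the least) is an assumption chosen to agree with the addresses quoted above.

```python
def make_lut(k, func):
    """Fill a 2**k x 1 memory with the truth table of func."""
    table = []
    for address in range(2 ** k):
        bits = [(address >> (k - 1 - i)) & 1 for i in range(k)]  # MSB first
        table.append(func(*bits))
    return table

def read_lut(table, *inputs):
    """Use the inputs as address lines, most significant bit first."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]

lut = make_lut(3, lambda a, b, c: (a & b) | (1 - c))
# lut[0b000] is 1 and lut[0b001] is 0, matching the addresses quoted above
```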
`The advantage of look-up tables is that they exhibit
`high functionality-a K-input LUT can implement any
`function of K inputs and there are $2^{2^K}$ such functions. The
`disadvantage is that they are unacceptably large for more
`than about five inputs, since the number of memory cells
`needed for a K-input lookup table is $2^K$. While the number
`of functions that can be implemented increases very fast,
`these additional functions are not commonly used in logic
`designs and are also difficult to exploit for a logic synthesis
`tool. Hence it is often the case that a large LUT will be
`largely underutilized.
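The growth rates involved are easy to tabulate. The short sketch below (the function name `lut_cost` is hypothetical) computes, for a K-input look-up table, the number of memory cells and the number of distinct functions those cells can encode.

```python
def lut_cost(k):
    """Memory cells and distinct functions for a k-input lookup table."""
    cells = 2 ** k
    functions = 2 ** cells
    return cells, functions

for k in range(2, 7):
    cells, functions = lut_cost(k)
    print(k, cells, functions)
```

The cell count doubles with each added input while the function count squares, so most of the extra capacity of a large table goes to functions that rarely occur in practice.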
`The Xilinx 3000 series logic block [21], [22] contains a
`five-input one-output LUT, as illustrated in Fig. 12. This
`block can be reconfigured into two four-input LUTs, with
`
`
`

`

`Fig. 12. The Xilinx 3000 logic block.
`
`Fig. 13. The Xilinx 4000 logic block.
`
`the constraint that together they use a total of no more
`than five distinct inputs. This reconfigurability provides
`flexibility that translates into better logic block utilization
`because many common logic functions do not require as
`many as five inputs. The block also contains sequential
`logic and several multiplexers that connect the combina-
`tional inputs and outputs to the flip-flops or outputs. These
`multiplexers are controlled by the SRAM cells that are
`loaded at programming time.
`The Xilinx 4000 series logic block [23] contains two
`four-input LUT’s feeding into a three-input LUT as il-
`lustrated in Fig. 13. In this block, all of the inputs are
`distinct and available external to the logic block. This block
`introduces two significant architectural changes from the
`3000 series block. First, two differently sized LUT’s are
`used: a four input LUT and a three input LUT, giving the
`complete block a heterogeneous flavor. In general, heterogeneity
`allows for a better tradeoff between performance
`and logic density.
`The second architectural change in the Xilinx 4000 logic
`block is the use of two nonprogrammable connections from
`the two four-input LUT’s to the three-input LUT. These
`connections are significantly faster than any programmable
`interconnection since no programmable switches are used
`in series, and little is present in parallel. If proper use can
`be made of these fast connections, FPGA performance can
`be greatly improved. There is a penalty for this type of
`connection, however; since the connection is permanent,
`the inflexibility means that the three-input LUT may often
`go unused, reducing the overall logic density.
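The hard-wired cascade can be sketched as three table reads. This is a simplified model under stated assumptions: the third input of the three-input LUT is taken to be an external signal, the names `lut`, `block`, `AND4`, and `AND3` are hypothetical, and the sequential logic, carry circuitry, and output multiplexers of the real block are ignored.

```python
def lut(table, inputs):
    """Read a lookup table; inputs form the address, most significant bit first."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]

# Truth tables for AND: only the all-ones address stores a 1.
AND4 = [0] * 15 + [1]
AND3 = [0] * 7 + [1]

def block(f_inputs, g_inputs, h_input):
    """Two four-input LUTs hard-wired into a three-input LUT."""
    f = lut(AND4, f_inputs)
    g = lut(AND4, g_inputs)
    return lut(AND3, [f, g, h_input])

nine_input_and = block([1, 1, 1, 1], [1, 1, 1, 1], 1)
```

In this configuration one block computes a nine-input AND with no programmable routing between the stages. A design that has no use for such a cascade leaves the three-input table idle, which is the density penalty noted above.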
`
`Fig. 14. The Altera 5000 Series logic block.
`
`The Xilinx 4000 block incorporates several additional
`features. Each LUT can be used directly as an SRAM block.
`This allows small amounts of memory to be more efficiently
`implemented. Another feature is the inclusion of circuitry
`that can be used to implement fast carry addition circuits.
`4) The Altera Logic Block: The architecture of the Altera
`FPGA [43] has evolved from the PLA-based architecture
`of traditional PLDs [28] with its logic block consisting of
`wide fanin (20 to over 100 inputs) AND gates feeding into
`an OR gate with three to eight inputs. Figure 14(a) illustrates
`the Altera MAX 5000 series logic block. Using the floating
`gate transistor-based programmable switch presented in
`Section II-C, any vertical wire passing by an AND gate
`can be connected as an input to the gate. The three product
`terms are then OR’ed together and can be programmably
`inverted by an exclusive OR gate, which can also be used
`to implement other arithmetic functions. Notice that each
`input signal is provided in both true and complement form,
`with two separate wires. This programmable inversion
`significantly increases the functional capability of the block.
`Figure 14(b) illustrates the implementation of the logic
`function $f = ab + \bar{c}$. The x’s in the figure indicate the
`wired-AND connections described in Section II-C.
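The AND-OR structure with programmable inversion can be modeled in a few lines. The sketch below is a behavioral assumption, not Altera's circuit: a product term is a list of `(name, inverted)` literals, the OR gate combines the terms, and an XOR with a configuration bit optionally inverts the result. The third product term is simply omitted, which assumes an unused term contributes 0.

```python
def product_term(literals, signals):
    """AND of the selected literals; each literal is (name, inverted)."""
    return int(all(signals[name] ^ inverted for name, inverted in literals))

def and_or_block(terms, invert, signals):
    """OR the product terms, then optionally invert with the XOR gate."""
    total = int(any(product_term(t, signals) for t in terms))
    return total ^ int(invert)

# f = ab + (not c): one term for ab, one for the complement of c.
TERMS = [[("a", 0), ("b", 0)], [("c", 1)]]

def f(a, b, c):
    return and_or_block(TERMS, invert=False, signals={"a": a, "b": b, "c": c})
```

Setting `invert=True` on the same terms yields the complement function without consuming any additional product terms, which is one reason the programmable inversion adds functional capability.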
`The advantage of this type of block is that the wide
`AND gate can be used to form logic functions with few
`levels of logic blocks, reducing the need for programmable
`interconnect. It is difficult, however, to make efficient use
`of all of the inputs to all of the gates, resulting in loss of
`density. This loss is not as severe as it first appears because
`of the high packing density of the wired-AND gates, as well
`as the fact that logic connections also serve as the routing
`function. In other architectures where logic and routing are
`separate such unused inputs would incur a high penalty.
`A disadvantage of the wired-AND configuration is the
`use of pull-up devices that consume static power. An array
`full of these pull-ups will consume a significant amount of
`power. To mitigate this, each gate in the MAX 7000 series
`
`
`

`

`
`block can be programmed to consume about 60% less
`power but at the expense of about 40% increase in delay
`[44]. This feature can be used in noncritical paths to reduce
`power consumption.
`In addition to the wide AND-OR logic block, the MAX
`5000 employs one other type of logic block, called a logic
`expander. This is a wide-input NAND gate whose output
`can be connected to the AND-OR logic block. While a
`logic expander incurs the same delay as the logic block, it
`takes up less area and can be used to increase its effective
`number of product terms.
`The Altera MAX 7000 logic block [44] is similar to
`the MAX 5000 except that it provides two more product
`terms and has more flexibility because neighboring blocks
`can “borrow” product terms from each other. This is
`accomplished using a small routing structure between the
`AND and OR gates called the product term select matrix.
`Several other FPGA’s use the wide AND-OR style of
`logic block, including those produced by Plus Logic [34],
`AMD [3], and Lattice [4]. The device in [34] employs other
`logic types in combination with the wide AND-OR gate.
`
`D. Sequential Logic
`Most of the logic blocks described above include some
`form of sequential logic. The Xilinx devices [22], [23] have
`two D flip-flops that can be programmably connected to
`the outputs of the two lookup tables. The Altera device
`[43] has one flip-flop per logic block. In the Act-1 device
`from Actel [15], the sequential logic is not explicitly present
`and so must be formed using programmable routing and
`the purely combinational logic blocks. In the Act-2 device
`[1], there are two alternating types of logic block: the
`C-module which is the purely combinational block described
`in Section III-C1), and the S-module which has similar
`combinational functionality to the C-module but includes
`a D flip-flop.
`The Plessey logic block [33] also incorporates one D
`latch. It thus requires two blocks to make a master-slave
`flip-flop. The Algotronix logic block [2] forms sequential
`logic using feedback around the basic combinational logic
`module.
`
`E. Effect of Logic Block Granularity on FPGA Density
`and Performance
`In recent years research efforts have been directed at
`determining choices for FPGA logic blocks that optimize
`density and performance [35], [36], [37], [39], [24], [40],
`[20], [26], [41]. In this section we briefly survey this
`research. For a more complete survey see [8]. The section
`is divided into two parts: the first deals with the effect of
`logic block granularity on FPGA density, while the second
`covers the effect of granularity on performance.
`1) Effect of Granularity on Density: As the granularity of
`a logic block increases, the number of blocks needed to
`implement a design should decrease. On the other hand
`a more functional (larger granularity) logic block requires
`more circuitry to implement it, and therefore occupies more
`
`Fig. 15. Three implementations of $f = abd + bcd + ab\bar{c}$.
`
`area. This tradeoff suggests the existence of an “optimal”
`logic block granularity for which the FPGA area devoted
`to logic implementation is minimized. While this argument
`for logic area
