The Roles of FPGA’s in Reprogrammable
`Systems
`
`SCOTT HAUCK, MEMBER, IEEE
`
`Reprogrammable systems based on field programmable gate
`arrays are revolutionizing some forms of computation and
`digital logic. As a logic emulation system, they provide orders
`of magnitude faster computation than software simulation. As a
`custom-computing machine, they achieve the highest performance
`implementation for many types of applications. As a multimode
`system, they yield significant hardware savings and provide truly
`generic hardware.
`In this paper, we discuss the promise and problems of
`reprogrammable systems. This includes an overview of the chip
`and system architectures of reprogrammable systems as well as
`the applications of these systems. We also discuss the challenges
`and opportunities of future reprogrammable systems.
`Keywords— Adaptive computing, custom computing, FPGA,
`logic emulation, multi-FPGA systems, reconfigurable computing.
`
`I. INTRODUCTION
`In the mid-1980’s, a new technology for implementing
`digital logic was introduced: the field programmable gate
`array (FPGA). These devices could be viewed as either
`small, slow gate arrays (MPGA’s) or large, expensive
`programmable logic devices (PLD’s). FPGA’s were capable
`of implementing significantly more logic than PLD’s, espe-
`cially because they could implement multilevel logic, while
`most PLD’s were optimized for two-level logic. Although
`they did not have the capacity of MPGA’s, they also did
`not have to be custom fabricated, greatly lowering the costs
`for low-volume parts and avoiding long fabrication delays.
`While many of the FPGA’s were configured by static
`random access memory (SRAM) cells in the array, this was
`generally viewed as a liability by potential customers who
`worried over the chip’s volatility. Antifuse-based FPGA’s
`also were developed and for many applications were much
`more attractive, both because they tended to be smaller and
`faster due to less programming overhead and because there
`was no volatility to the configuration.
`
`Manuscript received May 5, 1997; revised January 20, 1998. This work
`was supported in part by the Defense Advanced Research Project Agency
`under Contract DABT63-97-C-0035 and in part by the National Science
`Foundation under Grants CDA-9703228 and MIP-9616572.
`The author is with the Department of Electrical and Computer Engi-
`neering, Northwestern University, Evanston, IL 60208-3118 USA.
`Publisher Item Identifier S 0018-9219(98)02669-3.
`
`In the late 1980’s and early 1990’s, there was a growing
`realization that the volatility of SRAM-based FPGA’s was
`not a liability but was in fact the key to many new types
`of applications. Since the programming of such an FPGA
`could be changed by a completely electrical process, much
`as a standard processor can be configured to run many
`programs, SRAM-based FPGA’s have become the work-
`horse of many new reprogrammable applications. Some
`uses of reprogrammability are simple extensions of the
`standard logic implementation tasks for which the FPGA’s
`were originally designed. An FPGA plus several different
`configurations stored in read-only memory (ROM) could
`be used for multimode hardware, with the functions on
`the chip changed in reaction to the current demands. Also,
`boards constructed purely from FPGA’s, microcontrollers,
`and other reprogrammable parts could be truly generic
`hardware, allowing a single board to be reprogrammed to
`serve many different applications.
`Some of the most exciting new uses of FPGA’s move
`beyond the implementation of digital logic and instead
`harness large numbers of FPGA’s as a general-purpose
`computation medium. The circuit mapped onto the FPGA’s
`need not be standard hardware equations but can even
`be operations from algorithms and general computations.
`While these FPGA-based custom-computing machines may
`not challenge the performance of microprocessors for all
`applications, for computations of the right form, an FPGA-
`based machine can offer extremely high performance, sur-
`passing any other programmable solution. Although a cus-
`tom hardware implementation will be able to beat the power
`of any generic programmable system, and thus there must
`always be a faster solution than a multi-FPGA system, the
`fact is that few applications will ever merit the expense
`of creating application-specific solutions. An FPGA-based
`computing machine, which can be reprogrammed like a
`standard workstation, offers the highest realizable perfor-
`mance for many different applications. In a sense, it is
`a hardware supercomputer, surpassing traditional machine
`architectures for certain applications. This potential has
`been realized by many different research machines. The
`Splash system [50] has provided performance on genetic
`string matching that is almost 200 times greater than all
`
`PROCEEDINGS OF THE IEEE, VOL. 86, NO. 4, APRIL 1998
`
`0018–9219/98$10.00 © 1998 IEEE
`
`
`

`

`
`Fig. 1. Actel’s programmable low-impedance circuit element (PLICE). As shown in (a), an
`unblown antifuse has an oxide–nitride–oxide (ONO) dielectric preventing current from flowing
`between diffusion and polysilicon. The antifuse can be blown by applying a 16-V pulse across the
`dielectric. This melts the dielectric, allowing a conducting channel to be formed (b). Current is then
`free to flow between the diffusion and the polysilicon [1], [54].
`
`other supercomputer implementations. The DECPeRLe-1
`system [133] has demonstrated world-record performance
`for many other applications, including RSA cryptography.
`One of the applications of multi-FPGA systems with
`the greatest potential is logic emulation. The designers of
`a custom chip need to verify that the circuit they have
`designed actually behaves as desired. Software simulation
`and prototyping have been the traditional solution to this
`problem. As chip designs become more complex, however,
`software simulation is only able to test an ever decreasing
`portion of the chips’ computations, and it is quite expensive
`in time and money to debug by repeated prototype fabrica-
`tions. The solution is logic emulation, the mapping of the
`circuit under test onto a multi-FPGA system. Since the logic
`is implemented in the FPGA’s in the system, the emulation
`can run at near real time, yielding test cycles several
`orders of magnitude faster than software simulation, yet
`with none of the time delays and inflexibility of prototype
`fabrications. These benefits have led many of the advanced
`microprocessor manufacturers to include logic emulation
`in their validation process.
`In this paper, we discuss the different applications
`and types of reprogrammable systems. In Section II, we
`present an overview of FPGA architectures as well as field
`programmable interconnect components (FPIC’s). Then,
`Section III details what kinds of opportunities these devices
`provide for new types of systems. We then categorize the
`types of reprogrammable systems in Section IV, including
`coprocessors and multi-FPGA systems. Section V describes
`in depth the different multi-FPGA systems, highlighting
`their important features. Last, Sections VI and VII conclude
`with an overview of the status of reprogrammable systems
`and how they are likely to evolve. Note that this paper is
`not meant to be a catalog of every existing reprogrammable
`architecture and application. We instead focus on some of
`the more important aspects of these systems in order to
`give an overview of the field.
`
`II. FPGA TECHNOLOGY
`One of the most common field programmable elements
`is the PLD. PLD’s concentrate primarily on two-level, sum-
`of-products implementations of logic functions. They have
`simple routing structures with predictable delays. Since
`they are completely prefabricated, they are ready to use in
`seconds, avoiding long delays for chip fabrication. FPGA’s
`
`Fig. 2. Floating gate structure for EPROM/EEPROM. The float-
`ing gate is completely isolated. An unprogrammed transistor, with
`no charge on the floating gate, operates the same as a normal
`n-transistor, with the access gate as the transistor’s gate. To
`program the transistor, a high voltage on the access gate plus a
`lower voltage on the drain accelerates electrons from the source fast
`enough to travel across the gate oxide insulator to the floating gate.
`This negative charge then prevents the access gate from closing
`the source–drain connection during normal operation. To erase,
`EPROM uses ultraviolet light to accelerate electrons off the floating
`gate, while EEPROM removes electrons by a technique similar to
`programming but with the opposite polarity on the access gate
`[134], [148].
`
`are also completely prefabricated, but instead of two-level
`logic, they are optimized for multilevel circuits. This allows
`them to handle much more complex circuits on a single
`chip, but it often sacrifices the predictable delays of PLD’s.
`Note that FPGA’s are sometimes considered another form
`of PLD, often under the heading of complex PLD.
`Just as in PLD’s, FPGA’s are completely prefabricated
`and contain special features for customization. These con-
`figuration points are normally SRAM cells, EPROM, EEP-
`ROM, or antifuses. Antifuses are one-time programmable
`devices (Fig. 1), which when “blown” create a connection.
`When they are “unblown,” no current can flow between
`their terminals (thus, it is an “anti” fuse, since its behavior
`is opposite to a standard fuse). Because the configuration
`of an antifuse is permanent, antifuse-based FPGA’s are
`one-time programmable, while SRAM-based FPGA’s are
`reprogrammable, even in the target system. Since SRAM’s
`are volatile, an SRAM-based FPGA must be reprogrammed
`every time the system is powered up, usually from a ROM
`included in the circuit to hold configuration files. Note
`that FPGA’s often have on-chip control circuitry to load
`this configuration data automatically. EEPROM/EPROM
`(Fig. 2) are devices somewhere between SRAM and an-
`tifuse in their features. The programming of an EEP-
`ROM/EPROM is retained even when the power is turned
`off, avoiding the need to reprogram the chip at power-
`
`
`

`

`Fig. 3. Programming bit for (a) SRAM-based FPGA’s [145] and (b) a three-input lookup table.
`
`
`up, while their configuration can be changed electrically.
`However, the high voltages required to program the device
`often mean that they are not reprogrammed in the target
`system.
`SRAM cells are larger than antifuses and EEPROM/EPROM,
`meaning that SRAM-based FPGA’s will have fewer
`configuration points than FPGA’s using other programming
`technologies. However, SRAM-based FPGA’s have numerous
`advantages. Since they are easily reprogrammable, their
`configurations can be changed for bug fixes or upgrades.
`Thus, they provide an ideal prototyping medium. Also,
`these devices can be used in situations where they can
`expect to have numerous different configurations, such as
`multimode systems and reconfigurable computing machines.
`More details on such applications are included later in
`this paper. Because antifuse-based FPGA’s are only
`one-time programmable, they are generally not used in
`reprogrammable systems. EEPROM/EPROM devices could
`potentially be reprogrammed in-system, although in general
`this feature is not widely used. Thus, this paper will
`concentrate solely on SRAM-based FPGA’s.
`There are many different types of FPGA’s, with many
`different structures. Instead of discussing all of them here,
`which would be quite involved, this section will present
`two representative FPGA’s. Details on many others can
`be found elsewhere [21], [26], [75], [103], [112], [127].
`Note that reconfigurable systems can often employ non-
`FPGA reconfigurable elements; these will be described in
`Section V.
`In SRAM-based FPGA’s, memory cells are scattered
`throughout the FPGA. As shown in Fig. 3(a), a pair of
`cross-coupled inverters will sustain whatever value is
`programmed onto them. A single n-transistor gate is provided
`for either writing a value or reading a value back out.
`The ratio of sizes between the transistor and the upper
`inverter is set to allow values sent through the n-transistor
`to overpower the inverter. The read-back feature is used
`during debugging to determine the current state of the
`system. The actual control of the FPGA is handled by the
`Q and Q̄ outputs. One simple application of an SRAM
`bit is to have the Q terminal connected to the gate of
`an n-transistor. If a “1” is assigned to the programming
`bit, the transistor is closed, and values can pass between
`the source and drain. If a “0” is assigned, the transistor
`is opened, and values cannot pass. Thus, this construct
`operates similarly to an antifuse, though it requires much
`Fig. 4. The Xilinx 4000 series FPGA structure [145]. Logic
`blocks are surrounded by horizontal and vertical routing channels.
`
`more area. One of the most useful SRAM-based structures
`is the lookup table (LUT). By connecting 2^N programming
`bits to a multiplexer [Fig. 3(b)], any N-input combinational
`Boolean function can be implemented. Although it can
`require a large number of programming bits for large N,
`LUT’s of up to five inputs can provide a flexible, powerful
`function implementation medium.
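The lookup-table idea can be made concrete with a small software model. The following is a minimal sketch, not a description of any vendor's circuit: the class name `LookupTable` and its methods are hypothetical, and the choice of treating the first input as the most significant select line is an assumption made only for illustration. It shows that storing one programming bit per input combination, and letting the inputs act as multiplexer select lines, is enough to realize any Boolean function of those inputs.

```python
from itertools import product


class LookupTable:
    """Software model of an N-input LUT (hypothetical helper for illustration).

    The LUT holds 2**N programming bits. The inputs act as the select lines
    of a multiplexer that picks exactly one of those bits as the output.
    """

    def __init__(self, n_inputs, bits):
        if len(bits) != 2 ** n_inputs:
            raise ValueError("an N-input LUT needs exactly 2**N programming bits")
        self.n_inputs = n_inputs
        self.bits = list(bits)

    @classmethod
    def from_function(cls, n_inputs, func):
        # Enumerate every input combination (first input most significant)
        # and record the function's value as the corresponding programming bit.
        bits = [int(bool(func(*values)))
                for values in product((0, 1), repeat=n_inputs)]
        return cls(n_inputs, bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # Build the multiplexer select index from the input values.
        index = 0
        for value in inputs:
            index = (index << 1) | (value & 1)
        return self.bits[index]


# Example: a three-input majority function needs 2**3 = 8 programming bits.
majority = LookupTable.from_function(3, lambda a, b, c: (a + b + c) >= 2)
```

The same model makes the cost argument visible: a five-input table already needs 32 programming bits, and each additional input doubles that number.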
`One of the best known FPGA’s is the Xilinx logic
`cell array [126], [145]. In this section, we will describe
`their third-generation FPGA, the Xilinx 4000 series. The
`Xilinx array is an “Island-style” FPGA [127] with logic
`cells embedded in a general routing structure that permits
`arbitrary point-to-point communication (Fig. 4). The only
`requirement for good routing in this structure is that the
`source and destinations be relatively close together. Details
`of the routing structure are shown in Fig. 5. Each of the
`inputs of the cell (F1-F4, G1-G4, C1-C4, K) comes from
`one of a set of tracks adjacent to that cell. The outputs
`are similar (X, XQ, Y, YQ), except that they have the
`choice of both horizontal and vertical tracks. The routing
`structure is made up of three lengths of lines. Single-length
`lines travel the height of a single cell, where they then
`enter a switch matrix [Fig. 6(b)]. The switch matrix allows
`this signal to travel out vertically and/or horizontally from
`the switch matrix. Thus, multiple single-length lines can be
`cascaded together to travel longer distances. Double-length
`
`HAUCK: FPGA’S IN REPROGRAMMABLE SYSTEMS
`
`

`

`Fig. 5. Details of the Xilinx 4000 series routing structure [145]. The configurable logic blocks
`(CLB’s) are surrounded by vertical and horizontal routing channels containing single-length lines,
`double-length lines, and long lines. Empty diamonds represent programmable connections between
`perpendicular signal lines (signal lines on opposite sides of the diamonds are always connected).
`
`lines are similar, except that they travel the height of two
`cells before entering a switch matrix (notice that only half
`the double-length lines enter a switch matrix, and there is
`a twist in the middle of the line). Thus, double-length lines
`are useful for longer distance routing, traversing two cell
`heights without the extra delay and the wasted configuration
`sites of an intermediate switch matrix. Last, long lines are
`lines that go half the chip height and do not enter the
`switch matrix. In this way, routes of very long distance can
`be accommodated efficiently. With this rich sea of routing
`resources, the Xilinx 4000 series is able to handle fairly
`arbitrary routing demands, though mappings emphasizing
`local communication will still be handled more efficiently.
`As shown in Fig. 6(a), the Xilinx 4000 series logic cell
`is made up of three LUT’s, two programmable flip-flops,
`and multiple programmable multiplexers. The LUT’s allow
`arbitrary combinational functions of their inputs to be created.
`Thus, the structure shown can perform any function of five
`inputs (using all three LUT’s, with the F and G inputs
`identical), any two functions of four inputs (the two four-
`input LUT’s used independently), or some functions of up
`to nine inputs (using all three LUT’s, with the F and G
`inputs different). SRAM-controlled multiplexers then can
`route these signals out the X and Y outputs, as well as to
`the two flip-flops. The inputs at the top (C1-C4) provide
`enable and set or reset signals to the flip-flops, a direct
`connection to the flip-flop inputs, and the third input to
`the three-input LUT. This structure yields a very powerful
`method of implementing arbitrary, complex digital logic.
`
`Note that there are several additional features of the Xilinx
`FPGA not shown in these figures, including support for
`embedded memories and carry chains.
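One way to see why three LUT's suffice for any five-input function is a Shannon expansion on the fifth input. The sketch below is an illustration under stated assumptions rather than the Xilinx mapping procedure: the helper names (`make_lut`, `read_lut`, `map_five_input`, `evaluate_clb`) are hypothetical, and using the three-input LUT purely as a selector between the two four-input LUT outputs is one possible programming among many. The two four-input tables share the same four inputs, matching the "F and G inputs identical" case in the text.

```python
from itertools import product


def make_lut(n_inputs, func):
    """Return the 2**n programming bits of func (first input most significant)."""
    return [int(bool(func(*values)))
            for values in product((0, 1), repeat=n_inputs)]


def read_lut(bits, *inputs):
    """Use the inputs as multiplexer select lines to pick one programming bit."""
    index = 0
    for value in inputs:
        index = (index << 1) | value
    return bits[index]


def map_five_input(func):
    """Split a five-input function across two 4-input LUT's and one 3-input LUT."""
    # F holds the function with the fifth input fixed at 0, G with it fixed at 1.
    f_bits = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 0))
    g_bits = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 1))
    # The third LUT sees F, G, and the fifth input, and selects F or G.
    h_bits = make_lut(3, lambda f, g, e: g if e else f)
    return f_bits, g_bits, h_bits


def evaluate_clb(luts, a, b, c, d, e):
    f_bits, g_bits, h_bits = luts
    f = read_lut(f_bits, a, b, c, d)
    g = read_lut(g_bits, a, b, c, d)
    return read_lut(h_bits, f, g, e)


def parity5(a, b, c, d, e):
    return (a + b + c + d + e) % 2


parity_luts = map_five_input(parity5)
```

Because the third LUT can be programmed with any function of its three inputs, not just a selector, the same structure also reaches some functions of up to nine inputs when F and G are given different inputs.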
`While many SRAM-based FPGA’s are designed like
`the Xilinx architecture, with a routing structure optimized
`for arbitrary, long-distance communications, several other
`FPGA’s concentrate instead on local communication. The
`“cellular”-style FPGA’s [127] feature fast, local commu-
`nication resources at the expense of more global, long-
`distance signals. As shown in Fig. 7, the CLi FPGA [75]
`has an array of cells, with a limited number of routing
`resources running horizontally and vertically between the
`cells. There is one local communication bus on each side
`of the cell. It runs the height of eight cells, at which point
`it enters a repeater. Express buses are similar to local
`buses, except that there are no connections between the
`express buses and the cells. The repeaters allow access to
`the express buses. These repeaters can be programmed to
`connect together any of the two local buses and two express
`buses connected to it. Thus, limited global communication
`can be accomplished on the local and express buses, with
`the local buses allowing shorter distance communications
`and connections to the cells while express buses allow
`longer distance connections between local buses.
`While the local and express buses allow some of the
`flexibility of the Xilinx FPGA’s arbitrary routing structure,
`there are significantly fewer buses in the CLi FPGA than
`are present in the Xilinx FPGA. The CLi FPGA instead
`features a large number of local communication resources.
`
`
`

`

`Fig. 6. Details of the Xilinx (a) CLB and [(b), top] switchbox [145]. The multiplexers, LUT’s,
`and latches in the CLB are configured by SRAM bits. Diamonds in the switchbox represent six
`individual connections [(b), bottom], allowing any permutation of connections among the four
`signals incident to the diamond.
`
`
`As shown in Fig. 8, each cell receives two signals from each
`of its four neighbors. It then sends the same two outputs
`(A and B) to all of its neighbors. That is, the cell one to
`the north will send signals AN and BN, and the cell one
`to the south will send AS and BS, while both will receive
`the same signals A and B. The input signals become the
`inputs to the logic cell (Fig. 9).
`Instead of Xilinx’s LUT’s, which require many program-
`ming bits per cell, the CLi logic block is much simpler.
`It has multiplexers controlled by SRAM bits, which select
`one each of the A and B outputs of the neighboring cells.
`These are then fed into AND and XOR gates within the cell,
`as well as into a flip-flop. Although the possible functions
`are complex, notice that there is a path leading to the B
`output that produces the NAND of the selected A and B
`inputs, sending it out the B output. This path is enabled
`by setting the two 2 : 1 multiplexers to their constant input
`and setting B’s output multiplexer to the third input from
`the top. Thus, the cell is functionally complete. Also, with
`the XOR path leading to output A, the cell can efficiently
`implement a half-adder. The cell can perform a pure routing
`function by connecting one of the A inputs to the A output
`and one of the B inputs to the B output, or vice-versa.
`This routing function is created by setting the two 2 : 1
`multiplexers to their constant inputs and setting A’s and B’s
`output multiplexer to either of their top two inputs. There
`are also provisions for bringing in or sending out a signal
`on one or more of the neighboring local buses (NS1, NS2,
`EW1, EW2). Note that since there is only a single wire
`connecting the bus terminals, there can only be a single
`signal sent to or received from the local buses. If more
`than one of the buses is connected to the cell, they will be
`coupled together. Thus, the cell can take a signal running
`
`horizontally on an EW local bus and send it vertically on
`an NS local bus without using up the cell’s logic unit. By
`bringing a signal in from the local buses, however, the cell
`can implement two three-input functions.
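The two logic claims in this paragraph, that a NAND path makes the cell functionally complete and that XOR together with AND gives a half-adder, can be checked with a few lines of code. This is a minimal gate-level sketch: the function names are hypothetical, and it deliberately does not model the CLi cell's multiplexer settings, bus connections, or output polarities, only the Boolean functions involved.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


# Functional completeness: NOT, AND, and OR built from NAND alone.
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    """Two-input XOR on 0/1 values."""
    return a ^ b


def half_adder(a, b):
    """Return (sum, carry): the sum is the XOR, the carry is the AND."""
    return xor(a, b), and_(a, b)
```

For every input pair, `sum + 2 * carry` equals `a + b`, which is the defining property of a half-adder.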
`The major differences between the Island-style architecture
`of the Xilinx 4000 series and the cellular style
`of the CLi FPGA are in their routing structure and cell
`granularity. The Xilinx 4000 series is optimized for complex,
`irregular random logic. It features a powerful routing
`structure optimized for arbitrary global routing and large
`logic cells capable of providing arbitrary four- and five-
`input functions. This provides a very flexible architecture,
`though one that requires many programming bits per cell
`(and thus cells that take up a large portion of the chip area).
`In contrast, the CLi architecture is optimized for highly
`local, pipelined circuits such as systolic arrays and bit-serial
`arithmetic. Thus, it emphasizes local communication at the
`expense of global routing and has simple cells. Because
`of the very simple logic cells, there will be many more
`CLi cells on a chip than will be found in the Xilinx FPGA,
`yielding a greater logic capacity for those circuits that match
`the FPGA’s structure. Because of the restricted routing,
`the CLi FPGA is much harder to map to automatically
`than the Xilinx 4000 series, though the simplicity of the
`CLi architecture makes it easier for a human designer to
`hand-map to the CLi’s structure. Thus, in general, cellular
`architectures tend to appeal to designers with appropriate
`circuit structures who are willing to spend the effort to
`hand-map their circuits to the FPGA, while the Xilinx 4000
`series is more appropriate for handling random-logic tasks
`and automatically mapped circuits.
`Compared with technologies such as full-custom, standard
`cells, and MPGA’s, FPGA’s will in general be slower and
`
`
`

`

`Fig. 7. The CLi6000 routing architecture [75]. One 8 × 8 tile, plus a few surrounding rows and
`columns, is shown. The full array has many of these tiles abutted horizontally and vertically.
`
`less dense due to the configuration points, which take
`up significant chip area, and add extra capacitance and
`resistance (and thus delay) to the signal lines. Thus, the
`programming bits add an unavoidable overhead to the
`circuit, which can be reduced by limiting the configurability
`of the FPGA but never totally eliminated. Also, since the
`metal layers in an FPGA are prefabricated, while the other
`technologies custom fabricate the metal layers for a given
`circuit, the FPGA will have less optimized routing. This
`again results in slower and larger circuits. Even given
`these downsides, FPGA’s have the advantage that they are
`completely prefabricated. This means that they are ready
`to use instantly, while mask programmed technologies can
`require weeks to be customized. Also, since there is no
`custom fabrication involved in an FPGA, the fabrication
`costs can be amortized over all the users of the architecture,
`removing the significant nonrecurring engineering costs of
`other technologies. However, per-chip costs will in general
`be higher, making the technology better suited for low-
`volume applications. Also, since SRAM-based FPGA’s
`are reprogrammable, they are ideal for prototyping, since
`the chips are reusable after bug fixes or upgrades, where
`mask programmed and antifuse versions would have to be
`discarded.
`A technology similar to SRAM-based FPGA’s is FPIC’s
`[8] and field programmable interconnect devices (FPID’s)
`
`[72] (we will use FPIC from now on to refer to both FPIC’s
`and FPID’s). Like an SRAM-based FPGA, an FPIC is a
`completely prefabricated device with an SRAM-configured
`routing structure (Fig. 10). Unlike an FPGA, an FPIC has
`no logic capacity. Thus, the only use for an FPIC is as
`a device to interconnect
`its I/O pins arbitrarily. While
`this is not generally useful for production systems, since
`a fixed interconnection pattern can be achieved by the
`printed circuit board that holds a circuit, it can be quite
`useful in prototyping and reconfigurable computing (these
`applications are discussed later in this paper). In each of
`these cases, the connections between the chips in the system
`may need to be reprogrammable, or this connection pattern
`may change over time. In a reconfigurable computer, many
`different mappings will be loaded onto the system, and each
`of them may desire a different interconnection pattern. In
`prototyping, the connections between chips may need to
`be changed over time for bug fixes and logic upgrades.
`In either case, by routing all of the I/O pins of the logic-
`bearing chips to FPIC’s, the interconnection pattern can
`easily be changed over time. Thus, fixed routing patterns
`can be avoided, potentially increasing the performance and
`capacity of the prototyping or reconfigurable computing
`machine.
`There is some question about the economic viability
`of FPIC’s. The problem is that they must provide some
`
`
`

`

`Fig. 8. Details of the CLi routing architecture [75].
`
`Fig. 9. The CLi logic cell [75].
`
`advantage over an FPGA with the same I/O capacity, since
`in general an FPGA can perform the same role as the FPIC.
`One possibility is providing significantly more I/O pins in
`an FPIC than are available in an FPGA. This can be a
`major advantage, since it takes many smaller I/O chips to
`match the communication resources of a single high-I/O
`chip (i.e., a chip with N I/O’s requires three chips with
`two-thirds the I/O’s to match the flexibility). Because the
`packaging technology necessary for such high I/O chips is
`somewhat exotic, however, high-I/O FPIC’s can be expen-
`sive. Another possibility is to provide higher performance
`or smaller chip size with the same I/O capacity. Since there
`is no logic on the chip, the space and capacitance due to the
`logic can be removed. Even with these possible advantages,
`however, FPIC’s face the significant disadvantage that they
`are restricted to a limited application domain. Specifically,
`while FPGA’s can be used for prototyping, reconfigurable
`computing, low-volume products, fast time-to-market sys-
`
`tems, and multimode systems, FPIC’s are restricted to the
`interconnection portion of prototyping and reconfigurable
`computing solutions. Thus, FPIC’s may never become
`commodity parts, greatly increasing their unit cost.
`
`III. REPROGRAMMABLE LOGIC APPLICATIONS
`With the development of FPGA’s, there are now oppor-
`tunities for implementing quite different systems than were
`possible with other technologies. In this section, we will
`discuss many of these new opportunities, especially those
`of multi-FPGA systems.
`When FPGA’s were first introduced they were primarily
`considered to be just another form of gate array. While they
`had lower speed and capacity and a higher unit cost, they
`did not have the large startup costs and lead times necessary
`for MPGA’s. Thus, they could be used for implementing
`random logic and glue logic in low-volume systems with
`nonaggressive speed and capacity demands. If the capacity
`of a single FPGA was not enough to handle the desired
`computation, multiple FPGA’s could be included on the
`board, distributing the computation among these chips.
`FPGA’s are more than just slow, small gate arrays.
`The critical feature of (SRAM-based) FPGA’s is their in-
`circuit reprogrammability. Since their programming can be
`changed quickly, without any rewiring or refabrication,
`they can be used in a much more flexible manner than
`standard gate arrays. One example of this is multimode
`hardware. Consider a digital tape recorder with
`error-correcting codes. One way to implement
`such a system is to have separate code-generation and
`code-checking hardware built into the tape machine. There
`is no reason to have both of these functions available
`simultaneously, however, since when reading from the
`tape there is no need to generate new codes, and when
`writing to the tape the code-checking hardware will be idle.
`
`
`

`

`Fig. 10. The Aptix FPIC architecture [8]. The boxed “P” indicates an I/O pin.
`
`Thus, we can have an FPGA in the system and have two
`different configurations stored in ROM, one for reading and
`one for writing. In this way, a single piece of hardware
`handles multiple computations. There have been several
`multiconfiguration systems built from FPGA’s, including
`the just-mentioned tape machine, generic printer, and CCD
`camera interfaces, pivoting monitors with landscape and
`portrait configurations, and others [43], [96], [120], [144].
`While the previous uses of FPGA’s still treat these chips
`purely as methods for implementing digital logic, there are
`other applications where this is not the case. A system
`of FPGA’s can be seen as a computing substrate with
`somewhat different properties than standard microproces-
`sors. The reprogrammability of the FPGA’s allows one
`to download algorithms onto the FPGA’s and to change
`these algorithms just as general-purpose computers can
`change programs. This computing substrate is different
`from standard processors in that it provides a huge amount
`of fine-grain parallelism, since there are many logic blocks
`on the chips, and the instructions are quite simple, on
`the order of a single five-bit-input, one-bit-output function.
`Also, while the instruction stream of a microprocessor can
`be arbitrarily complex, with the function computed by the
`
`logic changing on a cycle-by-cycle basis, the program-
`ming of an FPGA is in general held constant throughout
`the execution of the mapping (exceptions to this include
`techniques of run-time reconfigurability described below).
`Thus, a microprocessor achieves a variety of different
`functions in a mapping temporally, with different
`functions executed during different cycles, while an
`FPGA-based computing machine achieves variety spatially,
`having different logic elements compute different functions.
`This means that microprocessors are superior for complex
`control flow and irregular computations, while an FPGA-
`based computing machine can be superior for data-parallel
`applications, where a huge quantity of data must be acted on
`in a very similar manner. Note that there is work being done
`on trying to bridge this gap and develop FPGA-processor
`hybrids that can achieve both spatial and limited temporal
`function variation [18], [34], [35], [88], [95], [97].
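The temporal versus spatial distinction can be illustrated with a toy cycle-count model. This is a rough sketch under simplifying assumptions, not a performance model of any real machine: the function names `run_temporally` and `run_spatially` are hypothetical, every operation is assumed to take one cycle, and the spatial version is modeled as a simple pipeline in which each stage holds one fixed function. The temporal version reuses a single function unit for every operation, while the spatial version dedicates one logic element per operation and streams data through them.

```python
EMPTY = object()  # marks a pipeline register that holds no data yet


def run_temporally(data, operations):
    """One function unit; its function changes on every cycle."""
    results, cycles = [], 0
    for value in data:
        for operation in operations:
            value = operation(value)
            cycles += 1
        results.append(value)
    return results, cycles


def run_spatially(data, stages):
    """One fixed-function element per operation; data streams through them."""
    if not stages:
        raise ValueError("at least one stage is required")
    items = list(data)
    registers = [EMPTY] * len(stages)  # output register of each stage
    results, cycles, position = [], 0, 0
    while len(results) < len(items):
        # Update back to front so every stage reads last cycle's value.
        for i in range(len(stages) - 1, -1, -1):
            if i == 0:
                source = items[position] if position < len(items) else EMPTY
            else:
                source = registers[i - 1]
            registers[i] = stages[i](source) if source is not EMPTY else EMPTY
        if position < len(items):
            position += 1
        cycles += 1
        if registers[-1] is not EMPTY:
            results.append(registers[-1])
    return results, cycles


# Three simple operations applied to a stream of values.
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
temporal_results, temporal_cycles = run_temporally([1, 2, 3, 4], steps)
spatial_results, spatial_cycles = run_spatially([1, 2, 3, 4], steps)
```

In this model the temporal cycle count grows with the product of data items and operations, while the spatial count grows with their sum, which is the intuition behind the advantage on large, regular data streams.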
`There have been several computing applications where
`a multi-FPGA system has delivered the highest perfor-
`mance implementation. An early example is genetic string
`matching on the Splash machine [50]. Here, a linear array
`of Xilinx 3000 series FPGA’s was used to implement a
`systolic algorithm to determine the edit distance between
`
`
`

`

`two strings. The edit distance is the minimum number of in-
`sertions and deletions necessary to transform one string into
`another, so the strings “flea” and “fleet” would have an edit
`distance of three (delete “a” and insert “et” to go from “flea”
`to “
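The edit distance defined here, which counts only insertions and deletions, can be computed with a standard dynamic program. The following is a minimal sequential sketch and is not the systolic implementation used on Splash; the function name `indel_distance` is hypothetical. Each table entry depends only on its upper, left, and upper-left neighbors, so entries on the same anti-diagonal are independent of each other, which is the property a systolic array can exploit.

```python
def indel_distance(source, target):
    """Minimum number of insertions and deletions turning source into target."""
    # previous[j] is the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # deleting all i characters reaches the empty prefix
        for j, target_char in enumerate(target, start=1):
            if source_char == target_char:
                # Matching characters cost nothing.
                current.append(previous[j - 1])
            else:
                # Either delete source_char or insert target_char.
                current.append(1 + min(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(indel_distance("flea", "fleet"))  # prints 3, matching the example in the text
```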
