PAPER
`
`Three Ages of FPGAs: A
`Retrospective on the First Thirty
`Years of FPGA Technology
`
`This paper reflects on how Moore’s Law has driven the design of FPGAs through
`three epochs: the age of invention, the age of expansion, and the age of accumulation.
`
`By Stephen M. (Steve) Trimberger, Fellow IEEE
`
ABSTRACT | Since their introduction, field programmable gate arrays (FPGAs) have grown in capacity by more than a factor of 10 000 and in performance by a factor of 100. Cost and energy per operation have both decreased by more than a factor of 1000. These advances have been fueled by process technology scaling, but the FPGA story is much more complex than simple technology scaling. Quantitative effects of Moore’s Law have driven qualitative changes in FPGA architecture, applications and tools. As a consequence, FPGAs have passed through several distinct phases of development. These phases, termed “Ages” in this paper, are The Age of Invention, The Age of Expansion and The Age of Accumulation. This paper summarizes each and discusses their driving pressures and fundamental characteristics. The paper concludes with a vision of the upcoming Age of FPGAs.

KEYWORDS | Application-specific integrated circuit (ASIC); commercialization; economies of scale; field-programmable gate array (FPGA); industrial economics; Moore’s Law; programmable logic
`
`Fig. 1. Xilinx FPGA attributes relative to 1988. Capacity is logic cell
`count. Speed is same-function performance in programmable fabric.
`Price is per logic cell. Power is per logic cell. Price and power are scaled
`up by 10 000. Data: Xilinx published data.
`
I. INTRODUCTION
`
`Xilinx introduced the first field programmable gate arrays
`(FPGAs) in 1984, though they were not called FPGAs until
`Actel popularized the term around 1988. Over the ensuing
`30 years, the device we call an FPGA increased in capacity
`by more than a factor of 10 000 and increased in speed by a
`factor of 100. Cost and energy consumption per unit func-
`tion decreased by more than a factor of 1000 (see Fig. 1).
`
`These advancements have been driven largely by process
`technology, and it is tempting to perceive the evolution of
`FPGAs as a simple progression of capacity, following semi-
`conductor scaling. This perception is too simple. The real
`story of FPGA progress is much more interesting.
Since their introduction, FPGA devices have progressed through several distinct phases of development. Each phase was driven by both process technology opportunity and application demand. These driving pressures caused observable changes in the device characteristics and tools. In this paper, I review three phases I call the “Ages” of FPGAs. Each age is eight years long and each became apparent only in retrospect. The three ages are:
1) Age of Invention 1984–1991;
2) Age of Expansion 1992–1999;
3) Age of Accumulation 2000–2007.

Digital Object Identifier: 10.1109/JPROC.2015.2392104
0018-9219 © 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
318 Proceedings of the IEEE | Vol. 103, No. 3, March 2015

Manuscript received September 18, 2014; revised November 21, 2014 and December 11, 2014; accepted December 23, 2014. Date of current version April 14, 2015.
The author is with Xilinx, San Jose, CA 95124 USA (e-mail: steve.trimberger@xilinx.com).

Trimberger: Three Ages of FPGAs

II. PREAMBLE: WHAT WAS THE BIG DEAL ABOUT FPGAs?

A. FPGA Versus ASIC
In the 1980s, Application-Specific Integrated Circuit (ASIC) companies brought an amazing product to the electronics market: the built-to-order custom integrated circuit. By the mid-1980s, dozens of companies were selling ASICs, and in the fierce competition, the winning attributes were low cost, high capacity and high speed. When FPGAs appeared, they compared poorly on all of these measures, yet they thrived. Why?
The ASIC functionality was determined by custom mask tooling. ASIC customers paid for those masks with an up-front non-recurring engineering (NRE) charge. Because they had no custom tooling, FPGAs reduced the up-front cost and risk of building custom digital logic. By making one custom silicon device that could be used by hundreds or thousands of customers, the FPGA vendor effectively amortized the NRE costs over all customers, resulting in no NRE charge for any one customer, while increasing the per-unit chip cost for all.
The up-front NRE cost ensured that FPGAs were more cost effective than ASICs at some volume [38]. FPGA vendors touted this in their “crossover point,” the number of units that justified the higher NRE expense of an ASIC. In Fig. 2, the graphed lines show the total cost for a number of units purchased. An ASIC has an initial cost for the NRE, and each subsequent unit adds its unit cost to the total. An FPGA has no NRE charge, but each unit costs more than the functionally equivalent ASIC, hence the steeper line. The two lines meet at the crossover point. If fewer than that number of units are required, the FPGA solution is cheaper; beyond that number of units, the ASIC has lower overall cost.

Fig. 2. FPGA versus ASIC Crossover Point. Graph shows total cost versus number of units. FPGA lines are darker and start at the lower left corner. With the adoption of the next process node (arrows from the earlier node in dashed lines to later node in solid lines), the crossover point, indicated by the vertical dotted line, grew larger.

The disadvantage of the FPGA per-unit cost premium over ASIC diminished over time as NRE costs became a larger fraction of the total cost of ownership of ASIC. The dashed lines in Fig. 2 indicate the total cost at some process node. The solid lines depict the situation at the next process node, with increased NRE cost, but lower cost per chip. Both FPGA and ASIC took advantage of lower cost manufacturing, while ASIC NRE charges continued to climb, pushing the crossover point higher. Eventually, the crossover point grew so high that for the majority of customers, the number of units no longer justified an ASIC. Custom silicon was warranted only for very high performance or very high volume; all others could use a programmable solution.
This insight, that Moore’s Law [33] would eventually propel FPGA capability to cover ASIC requirements, was a fundamental early insight in the programmable logic business. Today, device cost is less of a driver in the FPGA versus ASIC decision than performance, time-to-market, power consumption, I/O capacity and other capabilities. Many ASIC customers use older process technology, lowering their NRE cost, but reducing the per-chip cost advantage.
Not only did FPGAs eliminate the up-front masking charges and reduce inventory costs, but they also reduced design costs by eliminating whole classes of design problems. These design problems included transistor-level design, testing, signal integrity, crosstalk, I/O design and clock distribution.
As important as low up-front cost and simpler design were, the major FPGA advantages were instant availability and reduced visibility of a failure. Despite extensive simulation, ASICs rarely seemed to be correct the first time. With wafer-fabrication turnaround times in the weeks or months, silicon re-spins impacted schedules significantly, and as masking costs rose, silicon re-spins were noticeable to ever-rising levels in the company. The high cost of error demanded extensive chip verification. Since an FPGA can be reworked in minutes, FPGA designs incurred no weeks-long delay for an error. As a result, verification need not be as thorough. “Self-emulation,” known colloquially as “download-it-and-try-it,” could replace extensive simulation.
Finally, there was the ASIC production risk: an ASIC company made money only when their customer’s design went into production. In the 1980s, because of changing requirements during the development process, product failures or outright design errors, only about one-third of all designs actually went to production. Two-thirds of designs lost money. The losses were incurred not only by the ASIC customers, but also by the ASIC suppliers, whose NRE charges rarely covered their actual costs and never covered the cost of lost opportunity in their rapidly depreciating manufacturing facilities. On the other hand, programmable-logic companies and customers could still make money on small volume, and a small error could be corrected quickly, without costly mask-making.
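The crossover argument reduces to comparing two straight lines. The following Python sketch assumes the idealized model behind Fig. 2: ASIC total cost is the NRE plus units times the ASIC unit price, and FPGA total cost is units times a higher FPGA unit price with no NRE. The function names and all dollar figures are hypothetical, chosen only to show how a rising NRE moves the crossover point; they are not data from Fig. 2.

```python
import math


def crossover_units(asic_nre: float, asic_unit: float, fpga_unit: float) -> float:
    """Volume at which the idealized ASIC and FPGA total-cost lines meet.

    Assumed model (mirrors Fig. 2):
        asic_total(n) = asic_nre + n * asic_unit
        fpga_total(n) = n * fpga_unit            # no NRE
    Setting them equal gives n = asic_nre / (fpga_unit - asic_unit).
    """
    premium = fpga_unit - asic_unit
    if premium <= 0:
        # The FPGA is never dearer per unit, so the ASIC never pays off.
        return math.inf
    return asic_nre / premium


def cheaper_option(units: int, asic_nre: float, asic_unit: float, fpga_unit: float) -> str:
    """Return which option has the lower total cost at the given volume."""
    asic_total = asic_nre + units * asic_unit
    fpga_total = units * fpga_unit
    return "FPGA" if fpga_total < asic_total else "ASIC"


if __name__ == "__main__":
    # Hypothetical figures for two successive process nodes:
    # the NRE climbs while both unit prices fall.
    older = crossover_units(asic_nre=250_000, asic_unit=10.0, fpga_unit=35.0)
    newer = crossover_units(asic_nre=1_000_000, asic_unit=5.0, fpga_unit=20.0)
    print(f"older node crossover: {older:,.0f} units")
    print(f"newer node crossover: {newer:,.0f} units")
    print(cheaper_option(5_000, asic_nre=1_000_000, asic_unit=5.0, fpga_unit=20.0))
```

With these made-up figures the crossover moves from 10 000 units to roughly 66 700 units when the NRE quadruples, which is the qualitative shift Fig. 2 depicts.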
`
`Fig. 3. Generic PAL architecture.
`
`B. FPGA Versus PAL
`Programmable logic was well established before the
`FPGA. EPROM-programmed Programmable Array Logic
`(PAL) had carved out a market niche in the early 1980s.
`However, FPGAs had an architectural advantage. To un-
`derstand the FPGA advantage, we first look at the simple
programmable logic structures of these early 1980s devices. A PAL device, as depicted in Fig. 3, consists of a two-level logic structure [6], [38]. Inputs are shown entering at the bottom. On the left side, a programmable AND array generates product terms, ANDs of any combination of the inputs and their inverses. A fixed OR gate in the block at the right completes the combinational logic function of the macrocell’s product terms. Every macrocell output is an output of the chip. An optional register in the macrocell and feedback to the input of the AND array enable a very flexible state machine implementation.
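The two-level structure can be modeled in a few lines of Python. This minimal sketch assumes each product term is stored as a list of (input index, polarity) literals, standing in for the programmed AND-array connections, and that the fixed OR gate combines a macrocell's product terms. The type and function names are hypothetical, and the optional register and feedback path are left out.

```python
from typing import List, Sequence, Tuple

# A literal is (input_index, polarity): True selects the input itself,
# False selects its complement. A product term ANDs its literals together.
Literal = Tuple[int, bool]
ProductTerm = List[Literal]


def eval_product_term(term: ProductTerm, inputs: Sequence[bool]) -> bool:
    """Programmable AND array: one product term over chosen inputs or inverses."""
    return all(inputs[idx] == polarity for idx, polarity in term)


def eval_macrocell(terms: List[ProductTerm], inputs: Sequence[bool]) -> bool:
    """Fixed OR gate: the macrocell output is the OR of its product terms."""
    return any(eval_product_term(t, inputs) for t in terms)


if __name__ == "__main__":
    # XOR of inputs 0 and 1 as a sum of products: (a AND NOT b) OR (NOT a AND b).
    xor_terms = [[(0, True), (1, False)], [(0, False), (1, True)]]
    for a in (False, True):
        for b in (False, True):
            print(a, b, eval_macrocell(xor_terms, [a, b]))
```

Because each macrocell's OR gate has a fixed number of product-term inputs, a function needing more terms than that (wide XORs are the classic case) is the kind that would need a second pass through the array.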
`Not every function could be implemented in one pass
`through the PAL’s macrocell array, but nearly all common
`functions could be, and those that could not were realized
`in two passes through the array. The delay through the PAL
`array is the same regardless of the function performed or
`where it is located in the array. PALs had simple fitting
`software that mapped logic quickly to arbitrary locations in
`the array with no performance concerns. PAL fitting soft-
`ware was available from independent EDA vendors,
`allowing IC manufacturers to easily add PALs to their
`product line.
`PALs were very efficient from a manufacturing point of
`view. The PAL structure is very similar to an EPROM
`memory array, in which transistors are packed densely to
`yield an efficient implementation. PALs were sufficiently
`similar to memories that many memory manufacturers
`were able to expand their product line with PALs. When
`the cyclical memory business faltered, memory manufac-
`turers entered the programmable logic business.
`The architectural issue with PALs is evident when one
`considers scaling. The number of programmable points in
the AND array grows with the square of the number of inputs (more precisely, inputs times product terms). Process scaling delivers more transistors with the square of the shrink factor. However, the quadratic increase in the AND array limits PALs to grow logic only linearly with the
`shrink factor. PAL input and product-term lines are also
`heavily loaded, so delay grows rapidly as size increases. A
`PAL, like any memory of this type, has word lines and bit
`lines that span the entire die. With every generation, the
`ratio of the drive of the programmed transistor to the
`loading decreased. More inputs or product terms increased
`loading on those lines. Increasing transistor size to lower
`resistance also raised total capacitance. To maintain speed,
`power consumption rose dramatically. Large PALs were
`impractical in both area and performance. In response, in
`the 1980s, Altera pioneered the Complex Programmable
`Logic Device (CPLD), composed of several PAL-type blocks
`with smaller crossbar connections among them. But FPGAs
`had a more scalable solution.
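The scaling argument can be made concrete with a first-order model. This Python sketch assumes, as the text does, that the transistor budget grows with the square of the linear shrink factor and that a PAL's AND-array size is inputs times product terms, with both growing in proportion to the logic the PAL holds. It further assumes (our simplification, not the paper's) that a tiled FPGA's logic grows in direct proportion to the transistor budget. The function name is hypothetical and the outputs are ratios, not predictions for any real device.

```python
import math


def relative_logic_capacity(shrink: float, generations: int) -> dict:
    """Compare logic growth of a PAL-style AND array with a tiled FPGA.

    Assumptions (first-order sketch, not measured data):
      * each generation multiplies the transistor budget by shrink ** 2;
      * PAL: programmable points = inputs * product_terms, both proportional
        to the logic size L, so points ~ L ** 2 and L ~ sqrt(budget);
      * FPGA: configuration cells are distributed with the logic, so L ~ budget.
    """
    budget = (shrink ** 2) ** generations
    return {
        "transistor_budget": budget,
        "pal_logic": math.sqrt(budget),  # grows linearly with the shrink factor
        "fpga_logic": budget,            # grows with the square of the shrink factor
    }


if __name__ == "__main__":
    # A linear shrink of sqrt(2) per node roughly doubles the transistor count.
    for gen in range(5):
        r = relative_logic_capacity(math.sqrt(2), gen)
        print(f"gen {gen}: budget x{r['transistor_budget']:.1f}, "
              f"PAL x{r['pal_logic']:.2f}, FPGA x{r['fpga_logic']:.1f}")
```

After four such generations the transistor budget is about 16 times larger, but under this model the PAL's logic is only about 4 times larger.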
The FPGA innovation was the elimination of the AND array that provided the programmability. Instead, configuration memory cells were distributed around the array to control functionality and wiring. This change gave up the memory-array-like efficiency of the PAL structure in favor of architectural scalability. The architecture of the FPGA, shown in Fig. 4, consists of an array of programmable logic blocks and interconnect with field-programmable switches. The capacity and performance of the FPGA were no longer limited by the quadratic growth and wiring layout of the AND array. Not every function was an output of the chip, so capacity could grow with Moore’s Law. The consequences were great.
• FPGA architecture could look nothing like a memory. Design and manufacturing were very different from memory.
• The logic blocks were smaller. There was no guarantee that a single function would fit into one. Therefore, it was difficult to determine ahead of time how much logic would fit into the FPGA.
• The performance of the FPGA depended on where the logic was placed in the FPGA. FPGAs required placement and routing, so the performance of the finished design was not easy to predict in advance.
• Complex EDA software was required to fit a design into an FPGA.

Fig. 4. Generic array FPGA architecture. 4 × 4 array with three wiring tracks per row and column. Switches are at the circles at intersections. Device inputs and outputs are distributed around the array.

With the elimination of the AND array, FPGA architects had the freedom to build any logic block and any interconnect pattern. FPGA architects could define whole new logic implementation models, not based on transistors or gates, but on custom function units. Delay models need not be based on metal wires, but on nodes and switches. This architectural freedom ushered in the first Age of FPGAs, the Age of Invention.
`
III. AGE OF INVENTION 1984–1991
`
`The first FPGA, the Xilinx XC2064, contained only 64 logic
`blocks, each of which held two three-input Look-Up Tables
`(LUTs) and one register [8]. By today’s counting, this would
`be about 64 logic cells, less than 1000 gates. Despite its
small capacity, it was a very large die, larger than the
`commercial microprocessors of the day. The 2.5-micron
`process technology used for the XC2064 was barely able to
`yield it. In those early years, cost containment was critical
`to the success of FPGAs.
“Cost containment was critical to the success of FPGAs.” A modern reader may dismiss that as a statement of the obvious, but this interpretation seriously underemphasizes the issue. Die size and cost per
`function were crushingly vital. The XC2064, with only
`64 user-accessible flip-flops, cost hundreds of dollars because
`it was such a large die. Since yield (and hence, cost) is super-
`linear for large die, a 5% increase in die size could have
`doubled the cost or, worse, yield could have dropped to zero
`leaving the startup company with no product whatsoever.
`Cost containment was not a question of mere optimization; it
`was a question of whether or not the product would exist. It
`was a question of corporate life or death. In those early years,
`cost containment was critical to the success of FPGAs.
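Why cost grows faster than die area can be illustrated with the Poisson yield model, a textbook first-order model that we substitute here; the paper does not say which yield model applies. This Python sketch assumes yield = exp(-area × defect density), dies per wafer = wafer area / die area with edge loss ignored, and a fixed cost per processed wafer. The defect density and areas are hypothetical values for an immature process, not XC2064 data.

```python
import math


def cost_per_good_die(die_area_cm2: float, defect_density_per_cm2: float,
                      wafer_cost: float = 1.0, wafer_area_cm2: float = 100.0) -> float:
    """Relative cost of one working die under a simple Poisson yield model.

    Assumptions (illustrative only): yield = exp(-area * defect_density),
    dies per wafer = wafer_area / die_area (edge loss ignored), and a fixed
    cost per processed wafer.
    """
    dies_per_wafer = wafer_area_cm2 / die_area_cm2
    yield_fraction = math.exp(-die_area_cm2 * defect_density_per_cm2)
    return wafer_cost / (dies_per_wafer * yield_fraction)


if __name__ == "__main__":
    d0 = 3.0  # hypothetical defect density (defects per cm^2), immature process
    base = cost_per_good_die(1.0, d0)
    bigger = cost_per_good_die(1.05, d0)  # the same design on a 5% larger die
    print(f"yield at 1.00 cm^2: {math.exp(-1.0 * d0):.3f}")
    print(f"cost ratio for a 5% larger die: {bigger / base:.3f}")
```

Under these assumptions a 5% larger die costs about 22% more per good die, because the area penalty is compounded by lower yield; higher defect densities make the penalty steeper. The sketch does not attempt to reproduce the paper's worst case.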
`As a result of cost pressure, FPGA architects used their
`newfound freedom to maximize the efficiency of the
`FPGA, turning to any advantage in process technology and
`architecture. Although static memory-based FPGAs were
`re-programmable, they required an external PROM to
`store the programming when power was off. Reprogramm-
`ability was not considered to be an asset, and Xilinx
`downplayed it to avoid customer concerns about what
`happened to their logic when power was removed. And
`memory dominated the die area.
`Antifuse devices promised the elimination of the second
`die and elimination of the area penalty of memory-cell
`storage, but at the expense of one-time programmability. The
`early antifuse was a single transistor structure; the memory
`cell switch was six transistors. The area savings of antifuses
over memory cells was inescapable. Actel invented the antifuse and brought it to market [17], and in 1990 the largest
`capacity FPGA was the Actel 1280. Quicklogic and Cross-
`point followed Actel and also developed devices based on the
`advantages of the antifuse process technology.
In the 1980s, Xilinx’s four-input LUT-based architectures were considered “coarse-grained.” Four-input functions were observed as a “sweet spot” in logic designs, but analysis of netlists showed that many LUT configurations were unused. Further, many LUTs had unused inputs, wasting precious area. Seeking to improve efficiency, FPGA architects looked to eliminate waste in the logic block. Several companies implemented finer-grained architectures containing fixed functions to eliminate the logic cell waste. The Algotronix CAL used a fixed-MUX function implementation for a two-input LUT [24]. Concurrent (later Atmel) and their licensee, IBM, used a small-cell variant that included two-input NAND and XOR gates and a register in the CL devices. Pilkington based their architecture on a single NAND gate as the logic block [23], [34]. They licensed Plessey (ERA family), Toshiba (TC family) and Motorola (MPA family) to use their NAND-cell-based, SRAM-programmed device. The extreme of fine-grained architecture was the Crosspoint CLi FPGA, in which individual transistors were connected to one another with antifuse-programmable connections [31].
`Early FPGA architects noted that an efficient inter-
`connect architecture should observe the two-dimensionality
`of the integrated circuit. The long, slow wires of PALs were
`replaced by short connections between adjacent blocks that
`could be strung together as needed by programming to form
`longer routing paths. Initially, simple pass transistors steered
`signals through the interconnect segments to adjacent
`blocks. Wiring was efficient because there were no unused
`fractions of wires. These optimizations greatly shrank the
interconnect area and made FPGAs possible. At the same time, though, they increased signal delay and delay
`uncertainty through FPGA wiring due to large capacitances
`and distributed series resistances through the pass transistor
`switch network. Since interconnect wires and switches
`added size, but not (billable) logic, FPGA architects were
`reluctant to add much. Early FPGAs were notoriously
`difficult to use because they were starved for interconnect.
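Why chains of short, pass-transistor-connected segments were slow can be estimated with the Elmore delay, a standard first-order RC delay estimate that we adopt here as an assumption; the paper gives no delay model. In this Python sketch each hop is one switch resistance followed by one lumped segment capacitance; the function name and the unit values are hypothetical.

```python
def elmore_delay_pass_chain(n_switches: int, r_switch: float, c_segment: float,
                            r_driver: float = 0.0) -> float:
    """Elmore delay at the far end of n pass-transistor-connected wire segments.

    RC-ladder model (an assumption): driver resistance r_driver, then n hops,
    each a switch resistance r_switch followed by a lumped capacitance c_segment.
    Elmore delay = sum over capacitors of C times the resistance between that
    capacitor and the driver.
    """
    delay = 0.0
    upstream_r = r_driver
    for _ in range(n_switches):
        upstream_r += r_switch           # resistance from this node back to the driver
        delay += upstream_r * c_segment  # this node's contribution
    return delay


if __name__ == "__main__":
    # Unit values (hypothetical): delay follows n(n + 1)/2, not n.
    for n in (1, 2, 4, 8, 16):
        print(n, elmore_delay_pass_chain(n, r_switch=1.0, c_segment=1.0))
```

With unit values the delay grows as n(n + 1)/2, so doubling the number of hops roughly quadruples the delay, in line with the text's point about distributed series resistance through the switch network.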
`
IV. AGE OF INVENTION IN RETROSPECT
`
`In the Age of Invention, FPGAs were small, so the design
`problem was small. Though they were desirable, synthesis
`and even automated placement and routing were not
`considered essential. Many deemed it impractical even to
`attempt design automation on the personal computers of
`the time, since ASIC placement and routing was being
`done on mainframe computers. Manual design, both
`logical and physical, was acceptable because of the small
`problem size. Manual design was often necessary because
`of the limited routing resources on the chips [41].
Radically different architectures precluded universal FPGA design tools of the kind available in the ASIC business.
`FPGA vendors took on the added burden of EDA devel-
`opment for their devices. This was eventually recognized
`as an advantage, as FPGA vendors experimented and im-
`proved their architectures. The PAL vendors of the pre-
`vious decade had relied upon external tool vendors to
`provide software for mapping designs into their PALs. As a
`result, PAL vendors were restricted to those architectures
`the tool vendors supported, leading to commoditization,
`low margins and lack of innovation. PLD architecture was
`stifled while FPGA architecture flourished.
`A further advantage of captive software development
`was that FPGA customers were not required to purchase
`tools from a third-party EDA company, which would have
`increased their NRE costs. As they did with NRE charges,
`FPGA vendors amortized their tool development costs into
`their silicon pricing, keeping the up-front cost of using
`their devices very low. EDA companies were not much
`interested in FPGA tools anyway with their fragmented
`market, low volume, low selling price, and requirement to
`run on underpowered computers.
`In the Age of Invention, FPGAs were much smaller than
`the applications that users wanted to put into them. As a
`result, multiple-FPGA systems [1], [42] became popular, and
`automated multi-chip partitioning software was identified as
`an important component of an FPGA design suite [36], even
`as automatic placement and routing were not.
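To give a sense of what multi-chip partitioning software had to do, the Python sketch below assigns logic blocks to fixed-capacity chips and counts the nets that must cross chip boundaries. It is a deliberately naive breadth-first fill heuristic, not the algorithm of [36] or of any product; the netlist representation (each net as a set of block names), the capacity measured in blocks, and all function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def greedy_partition(nets: List[Set[str]], capacity: int) -> Dict[str, int]:
    """Assign blocks to chips 0, 1, 2, ... by breadth-first filling.

    Blocks are visited in breadth-first order over the netlist, so connected
    blocks tend to share a chip; a new chip is opened once the current chip
    holds `capacity` blocks. No further cut minimization is attempted.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least one block")

    # Build block -> neighboring blocks (blocks sharing at least one net).
    neighbors: Dict[str, Set[str]] = {}
    for net in nets:
        for blk in net:
            neighbors.setdefault(blk, set()).update(net - {blk})

    assignment: Dict[str, int] = {}
    chip, used = 0, 0
    for seed in sorted(neighbors):  # sorted for deterministic results
        if seed in assignment:
            continue
        queue = deque([seed])
        while queue:
            blk = queue.popleft()
            if blk in assignment:
                continue
            if used == capacity:  # current chip is full: open the next one
                chip, used = chip + 1, 0
            assignment[blk] = chip
            used += 1
            for nb in sorted(neighbors[blk]):
                if nb not in assignment:
                    queue.append(nb)
    return assignment


def cut_nets(nets: List[Set[str]], assignment: Dict[str, int]) -> int:
    """Count nets whose blocks land on more than one chip (chip-to-chip wiring)."""
    return sum(1 for net in nets if len({assignment[b] for b in net}) > 1)


if __name__ == "__main__":
    # Two tightly connected clusters joined by a single net {c, d}.
    nets = [{"a", "b"}, {"b", "c"}, {"a", "c"},
            {"d", "e"}, {"e", "f"}, {"d", "f"}, {"c", "d"}]
    assignment = greedy_partition(nets, capacity=3)
    print(assignment, "cut nets:", cut_nets(nets, assignment))
```

Iterative-improvement methods such as Kernighan–Lin or Fiduccia–Mattheyses can then move blocks between chips to reduce the cut; this sketch stops at the greedy assignment.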
`
V. INTERLUDE: SHAKEOUT IN FPGA BUSINESS
`
`The Age of Invention ended with brutal attrition in the FPGA
`business. A modern reader may not recognize most of the
`companies or product names in Section III and in the FPGA
`genealogical tree in Fig. 5 [6], [38]. Many of the companies
`simply vanished. Others quietly sold their assets as they
`exited the FPGA business. The cause of this attrition was
`more than the normal market dynamics. There were impor-
`tant changes in the technology, and those companies that did
`not take advantage of the changes could not compete. Quan-
`titative changes due to Moore’s Law resulted in qualitative
`changes in the FPGAs built with semiconductor technology.
`These changes characterized the Age of Expansion.
`
Fig. 5. FPGA architecture genealogical tree, ca. 2000. All trademarks are the property of their respective owners.

Fig. 6. Growth of FPGA LUTs and interconnect wires. Wire length is measured in millions of transistor pitches.

VI. AGE OF EXPANSION 1992–1999

Through the 1990s, Moore’s Law continued its rapid pace of improvement, doubling transistor count every two years. Pioneering the fabless business model, FPGA startup companies typically could not obtain leading-edge silicon technology in the early 1990s. As a result, FPGAs began the Age of Expansion lagging the process introduction curve. In the 1990s, they became process leaders as the foundries realized the value of using the FPGA as a process-driver application. Foundries were able to build SRAM FPGAs as soon as they were able to yield transistors and wires in a new technology. FPGA vendors sold their huge devices while foundries refined their processes. Each new generation of silicon doubled the number of transistors available, which doubled the size of the largest possible FPGA and halved the cost per function. More important than simple transistor scaling, the introduction of chemical-mechanical polishing (CMP) permitted foundries to stack more metal layers. Valuable for ASICs, this capability was explosive for FPGAs because the cost of valuable (nonbillable) interconnect dropped even faster than the cost of transistors, and FPGA vendors aggressively increased the interconnect on their devices to accommodate the larger capacity (see Fig. 6).
`This rapid process improvement had several effects
`which we now examine.
`
`A. Area Became Less Precious
`No one who joined the FPGA industry in the mid-1990s
`would agree that cost was unimportant or area was not
`precious. However, those who had experienced the ago-
`nies of product development in the 1980s certainly saw the
`difference. In the 1980s, transistor efficiency was neces-
`sary in order to deliver any product whatsoever. In the
`1990s, it was merely a matter of product definition. Area
`was still important, but now it could be traded off for
`performance, features and ease-of-use. The resulting de-
`vices were less silicon efficient. This was unthinkable in
`the Age of Invention just a few years earlier.
`
`B. Design Automation Became Essential
`In the Age of Expansion, FPGA device capacity in-
`creased rapidly as costs fell. FPGA applications became too
`large for manual design. In 1992, the flagship Xilinx
`XC4010 delivered a (claimed) maximum of 10 000 gates.
`By 1999, the Virtex XCV1000 was rated at a million. In the
`early 1990s, at the start of the Age of Expansion, automatic
`placement and routing was preferred, but not entirely
`trusted. By the end of the 1990s, automated synthesis [9],
`placement and routing [3], [4], [19], [32], [37] were required
`steps in the design process. Without the automation, the
`design effort would be simply too great. The life of an FPGA
`company was now dependent upon the ability of design
`automation tools to target the device. Those FPGA compa-
`nies that controlled their software controlled their future.
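The gate counts quoted above imply a growth rate that can be compared with the two-year transistor doubling cited for the 1990s. The short Python sketch below computes the implied doubling period; the function name is hypothetical, and because the gate counts are vendor ratings (the text itself flags the XC4010 figure as claimed), the result is indicative only.

```python
import math


def doubling_period_years(start_value: float, end_value: float, years: float) -> float:
    """Years per doubling implied by growth from start_value to end_value over `years`."""
    doublings = math.log2(end_value / start_value)
    return years / doublings


if __name__ == "__main__":
    # Vendor-rated gate counts from the text: XC4010 (1992) and XCV1000 (1999).
    fpga = doubling_period_years(10_000, 1_000_000, 1999 - 1992)
    print(f"rated gate count doubled about every {fpga:.2f} years")
    # Compare with the two-year transistor doubling cited for the 1990s.
    print(f"pace relative to a 2-year doubling: {2.0 / fpga:.1f}x")
```

The 100-fold increase over seven years corresponds to a doubling roughly every 1.05 years, nearly twice the pace of the transistor doubling alone.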
`Cheaper metal from process scaling led to more prog-
`rammable interconnect wire, so that automated placement
`tools could succeed with a less-precise placement.
`Automated design tools required automation-friendly
`architectures, architectures with regular and plentiful
`interconnect resources to simplify algorithmic decision-
`making. Cheaper wire also admitted longer wire segmen-
`tation,
`interconnect wires that spanned multiple logic
`blocks [2], [28], [44]. Wires spanning many blocks effec-
`tively make physically distant logic logically closer, im-
`proving performance. The graph in Fig. 7 shows large
`performance gains from a combination of process technol-
`ogy and interconnect reach. Process scaling alone would
`have brought down the curve, but retained the shape;
`longer segmentation flattened the curve. The longer seg-
`mented interconnect simplified placement because with
`longer interconnect, it was not as essential to place two
`blocks in exactly the right alignment needed to connect
`them with a high performance path. The placer can do a
`sloppier job and still achieve good results.
`
`Fig. 7. Performance scaling with longer wire length segmentation.
`
`On the down side, when the entire length of the wire
`segment is not used, parts of the metal trace are effectively
`wasted. Many silicon-efficient Age of Invention architec-
`tures were predicated on wiring efficiency, featuring short
wires that eliminated waste. Often, they rigidly followed the two-dimensional limitation of the physical silicon, giving those FPGAs the label “cellular.” In the Age of Expansion, longer wire segmentation was possible because the
`cost of wasted metal was now acceptable. Architectures
`dominated by nearest-neighbor-only connections could
`not match the performance or ease-of-automation of archi-
`tectures that took advantage of longer wire segmentation.
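The trade-off between fewer switches and partly unused metal can be sketched in one dimension. This Python sketch assumes a straight connection spanning a whole number of logic blocks, built only from segments of one fixed length, with one programmable switch traversed per segment. Real FPGA routing mixes segment lengths and turns corners, so the function and its model are hypothetical simplifications.

```python
import math


def route_cost(distance_blocks: int, segment_length: int) -> tuple:
    """(switches crossed, blocks of unused wire) for a straight 1-D connection.

    Hypothetical model: the connection spans distance_blocks logic blocks and is
    built only from segments of segment_length blocks, traversing one
    programmable switch per segment used.
    """
    if distance_blocks < 0 or segment_length < 1:
        raise ValueError("need distance_blocks >= 0 and segment_length >= 1")
    segments = math.ceil(distance_blocks / segment_length)
    wasted = segments * segment_length - distance_blocks
    return segments, wasted


if __name__ == "__main__":
    for seg_len in (1, 2, 4, 6):
        print(f"length {seg_len}: 12-block net {route_cost(12, seg_len)}, "
              f"5-block net {route_cost(5, seg_len)}")
```

For a 12-block connection, segments of length 4 need only three switches instead of twelve, but a 5-block connection on the same length-4 wires leaves three blocks' worth of metal unused.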
`A similar efficiency shift applied to logic blocks. In the
`Age of Invention, small, simple logic blocks were attractive
`because their logic delay was short and because they
`wasted little when unused or partially used. Half of the
`configuration memory cells in a four-input LUT are wasted
`when a three-input function is instantiated in it. Clever
`designers could manually map complex logic structures
`efficiently into a minimum number of fine-grained logic
`blocks, but automated tools were not as successful. For
`larger functions, the need to connect several small blocks
`put greater demand on the interconnect. In the Age of
`Expansion, not only were there more logic blocks, but the
`blocks themselves became more complex.
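The “half wasted” figure follows directly from truth-table sizes: an n-input LUT holds $2^n$ configuration bits, while a function of only k of those inputs has just $2^k$ distinct entries. A minimal Python sketch; the function name is hypothetical.

```python
def unused_lut_bits(lut_inputs: int, function_inputs: int) -> tuple:
    """(unused bits, unused fraction) when a k-input function sits in an n-input LUT.

    An n-input LUT stores 2**n truth-table bits; a function that ignores n - k of
    the inputs has only 2**k distinct entries, and the remaining bits repeat them.
    """
    if not 0 <= function_inputs <= lut_inputs:
        raise ValueError("function_inputs must be between 0 and lut_inputs")
    total = 2 ** lut_inputs
    needed = 2 ** function_inputs
    return total - needed, (total - needed) / total


if __name__ == "__main__":
    print(unused_lut_bits(4, 3))
    print(unused_lut_bits(4, 2))
```

For a three-input function in a four-input LUT this gives 8 of 16 bits, or one half, matching the text; a two-input function would leave three quarters unused.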
`Many Age of Invention architectures, built for effi-
`ciency with irregular logic blocks and sparse interconnect,
`were difficult to place and route automatically. During the
`Age of Invention, this was not a serious issue because the
`devices were small enough that manual design was practi-
`cal. But excessive area efficiency was fatal to many devices
`and companies in the Age of Expansion. Fine-grained
`architectures based on minimizing logic waste (such as the
Pilkington NAND-gate block, the Algotronix/Xilinx 6200
`Mux-based 2LUT block, the Crosspoint transistor-block)
`simply died. Architectures that achieved their efficiency by
`starving the interconnect also died. These included all
`nearest-neighbor grid-based architectures. The Age of Ex-
`pansion also doomed time-multiplexed devices [14], [39],
`since equivalent capacity expansion could be had without the
`attendant complexity and performance loss by merely
`waiting for the next process generation. The survivors in
`the FPGA business were those that
`leveraged process
`technology advancement to enable automation. Altera was
`first, bringing the long-distance connections of their CPLDs
`to the Altera FLEX architecture. FLEX was more automatable
`than other FPGAs of the period that were dominated by short
`wires. It achieved quick success. In the mid-1990s, AT&T/
`Lucent released ORCA [26] and Xilinx scaled up its XC4000
`interconnect in number and length as it built larger devices.
`The Age of Expansion was firmly established.
`
`C. Emergence of SRAM as Technology of Choice
`One aspect of the rapid progress of Moore’s Law was the
`need to be on the forefront of process technology. The easiest
`way to double the capacity and halve the cost for logic was to
`target the next process technology node. This pressured
`FPGA vendors to adopt leading-edge process technology.
`FPGA companies using technologies that could not be easily
`implemented on a new process were at a structural
`disadvantage. This was the case with nonvolatile program-
`mable technologies such as EPROM, Flash and antifuse.
`When a new process technology becomes available, the first
`components that are available are transistors and wires, the
`essential components of electronics. A static-memory-based
`device could use a new, denser process immediately.
`Antifuse devices were accurately promoted as being more
`efficient on a particular technology node, but it took months
`or years to qualify the antifuse on the new node. By the time
`the antifuse was proven, SRAM FPGAs were already starting
`to deliver on the next node. Antifuse technologies could not
`keep pace with technology, so they needed to be twice as
`efficient as SRAM just to maintain product parity.
`Antifuse devices suffered a second disadvantage: lack
`of reprogrammability. As customers grew accustomed to
“volatile” SRAM FPGAs, they began to appreciate the advantages of in-system programmability and field-updating
`of hardware. In contrast, a one-time-programmable device
`needed to be physically handled to be updated or to re-
`medy design errors. The alternative for antifuse devices
`was an extensive ASIC-like verification phase, which un-
`dermined the value of the FPGA.
`The rapid pace of Moore’s Law in the Age of Expansion
`relegated antifuse and flash FPGAs to niche products.
`
`D. Emergence of LUT as Logic Cell of Choice
`LUTs survived and dominated despite their docu-
`mented inefficiency in the Age of Expansion for several
`reasons. First, LUT-based architectures were easy targets
for synthesis tools. This statement would have been disputed in the mid-1990s, when synthesis vendors complained that FPGAs were not “synthesis friendly.” This
`perception arose because synthesis tools were initially de-
`veloped to target ASICs. Their technology mappers ex-
`pected a small library in which each cell was described as a
network of NANDs with inverters. Since a LUT implements any of the $2^{2^n}$ Boolean functions of its $n$ inputs, a complete library
`would have been enormous. ASIC technology mappers did
`a poor job on LUT-based FPGAs. But by the mid-1990s,
`targeted LUT mappers exploited the simplicity of mapping
`arbitrary functions into LUTs [9].
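Why LUT mapping is simple, and why a cell library would be hopeless, can be shown directly: an $n$-input LUT is a $2^n$-entry memory, so mapping any function into it amounts to writing its truth table. This Python sketch assumes input 0 drives the least-significant address bit; the function names are hypothetical, and it ignores how real mappers decompose functions wider than one LUT.

```python
from itertools import product
from typing import Callable, List


def num_functions(n: int) -> int:
    """Number of distinct Boolean functions of n inputs: 2 ** (2 ** n)."""
    return 2 ** (2 ** n)


def make_lut(func: Callable[..., bool], n: int) -> List[bool]:
    """Map an n-input function into a LUT by storing its truth table.

    Entry i holds func evaluated on the bits of i, with input 0 taken from the
    least-significant bit (an assumed convention).
    """
    table = []
    for address in range(2 ** n):
        bits = [bool((address >> k) & 1) for k in range(n)]
        table.append(bool(func(*bits)))
    return table


def lut_eval(table: List[bool], *inputs: bool) -> bool:
    """Evaluate the LUT: the input values form the memory address."""
    address = sum(int(bit) << k for k, bit in enumerate(inputs))
    return table[address]


if __name__ == "__main__":
    majority = make_lut(lambda a, b, c: (a and b) or (a and c) or (b and c), 3)
    for a, b, c in product([False, True], repeat=3):
        print(a, b, c, lut_eval(majority, a, b, c))
    print("functions of 4 inputs:", num_functions(4))
```

For $n = 4$ there are 65 536 distinct functions, far too many for an ASIC-style cell library, yet each one is just a different 16-bit table.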
`The LUT has hidden efficiencies. A LUT is a memory,
`and memories lay out efficiently in silicon. The LUT also
`saves interconnect. FPGA programmable interconnect is
`expensive in area and delay. Rather than a simple metal
`wire as in an ASIC, FPGA interconnect contains buffers,
`routing multiplexers and the memory cells to control them.
`Therefore, much more of the cost of the logic is actually in
`the interconnect [15]. Since a LUT implements any func-
`tion of its inputs, automation tools need only route the
`desired signals together at a LUT in order to retire the
`function of those inputs. There was no need to make mul-
`tiple levels of LUTs just to create the desired function of a
`small set of inputs. LUT input pins are arbitrarily swappa-
`ble, so the router need not target a specific pin. As a