INVITED PAPER

Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology

This paper reflects on how Moore's Law has driven the design of FPGAs through three epochs: the age of invention, the age of expansion, and the age of accumulation.

By Stephen M. (Steve) Trimberger, Fellow IEEE
ABSTRACT | Since their introduction, field programmable gate arrays (FPGAs) have grown in capacity by more than a factor of 10 000 and in performance by a factor of 100. Cost and energy per operation have both decreased by more than a factor of 1000. These advances have been fueled by process technology scaling, but the FPGA story is much more complex than simple technology scaling. Quantitative effects of Moore's Law have driven qualitative changes in FPGA architecture, applications and tools. As a consequence, FPGAs have passed through several distinct phases of development. These phases, termed "Ages" in this paper, are the Age of Invention, the Age of Expansion and the Age of Accumulation. This paper summarizes each and discusses their driving pressures and fundamental characteristics. The paper concludes with a vision of the upcoming Age of FPGAs.

KEYWORDS | Application-specific integrated circuit (ASIC); commercialization; economies of scale; field-programmable gate array (FPGA); industrial economics; Moore's Law; programmable logic
Fig. 1. Xilinx FPGA attributes relative to 1988. Capacity is logic cell count. Speed is same-function performance in programmable fabric. Price is per logic cell. Power is per logic cell. Price and power are scaled up by 10 000. Data: Xilinx published data.
I. INTRODUCTION

Xilinx introduced the first field programmable gate arrays (FPGAs) in 1984, though they were not called FPGAs until Actel popularized the term around 1988. Over the ensuing 30 years, the device we call an FPGA increased in capacity by more than a factor of 10 000 and increased in speed by a factor of 100. Cost and energy consumption per unit function decreased by more than a factor of 1000 (see Fig. 1).

These advancements have been driven largely by process technology, and it is tempting to perceive the evolution of FPGAs as a simple progression of capacity, following semiconductor scaling. This perception is too simple. The real story of FPGA progress is much more interesting.

Since their introduction, FPGA devices have progressed through several distinct phases of development. Each phase was driven by both process technology opportunity and application demand. These driving pressures caused observable changes in the device characteristics and tools. In this paper, I review three phases I call the "Ages" of FPGAs. Each age is eight years long and each became apparent only in retrospect. The three ages are:
1) Age of Invention 1984–1991;
2) Age of Expansion 1992–1999;
3) Age of Accumulation 2000–2007.
II. PREAMBLE: WHAT WAS THE BIG DEAL ABOUT FPGAs?
A. FPGA Versus ASIC
In the 1980s, Application-Specific Integrated Circuit (ASIC) companies brought an amazing product to the electronics market: the built-to-order custom integrated circuit. By the mid-1980s, dozens of companies were selling ASICs, and in the fierce competition, the winning attributes were low cost, high capacity and high speed. When FPGAs appeared, they compared poorly on all of these measures, yet they thrived. Why?

The ASIC functionality was determined by custom mask tooling. ASIC customers paid for those masks with an up-front non-recurring engineering (NRE) charge. Because they had no custom tooling, FPGAs reduced the up-front cost and risk of building custom digital logic. By making one custom silicon device that could be used by hundreds or thousands of customers, the FPGA vendor effectively amortized the NRE costs over all customers, resulting in no NRE charge for any one customer, while increasing the per-unit chip cost for all.

The up-front NRE cost ensured that FPGAs were more cost effective than ASICs at some volume [38]. FPGA vendors touted this in their "crossover point," the number of units that justified the higher NRE expense of an ASIC. In Fig. 2, the graphed lines show the total cost for a number of units purchased. An ASIC has an initial cost for the NRE, and each subsequent unit adds its unit cost to the total. An FPGA has no NRE charge, but each unit costs more than the functionally equivalent ASIC, hence the steeper line. The two lines meet at the crossover point. If fewer than that number of units is required, the FPGA solution is cheaper; more than that number of units indicates the ASIC has lower overall cost.

Fig. 2. FPGA versus ASIC Crossover Point. Graph shows total cost versus number of units. FPGA lines are darker and start at the lower left corner. With the adoption of the next process node (arrows from the earlier node in dashed lines to later node in solid lines), the crossover point, indicated by the vertical dotted line, grew larger.

The disadvantage of the FPGA per-unit cost premium over ASIC diminished over time as NRE costs became a larger fraction of the total cost of ownership of ASIC. The dashed lines in Fig. 2 indicate the total cost at some process node. The solid lines depict the situation at the next process node, with increased NRE cost, but lower cost per chip. Both FPGA and ASIC took advantage of lower cost manufacturing, while ASIC NRE charges continued to climb, pushing the crossover point higher. Eventually, the crossover point grew so high that for the majority of customers, the number of units no longer justified an ASIC. Custom silicon was warranted only for very high performance or very high volume; all others could use a programmable solution.

This insight, that Moore's Law [33] would eventually propel FPGA capability to cover ASIC requirements, was a fundamental early insight in the programmable logic business. Today, device cost is less of a driver in the FPGA versus ASIC decision than performance, time-to-market, power consumption, I/O capacity and other capabilities. Many ASIC customers use older process technology, lowering their NRE cost, but reducing the per-chip cost advantage.

Not only did FPGAs eliminate the up-front masking charges and reduce inventory costs, but they also reduced design costs by eliminating whole classes of design problems. These design problems included transistor-level design, testing, signal integrity, crosstalk, I/O design and clock distribution.

As important as low up-front cost and simpler design were, the major FPGA advantages were instant availability and reduced visibility of a failure. Despite extensive simulation, ASICs rarely seemed to be correct the first time. With wafer-fabrication turnaround times in the weeks or months, silicon re-spins impacted schedules significantly, and as masking costs rose, silicon re-spins were noticeable to ever-rising levels in the company. The high cost of error demanded extensive chip verification. Since an FPGA can be reworked in minutes, FPGA designs incurred no weeks-long delay for an error. As a result, verification need not be as thorough. "Self-emulation," known colloquially as "download-it-and-try-it," could replace extensive simulation.

Finally, there was the ASIC production risk: an ASIC company made money only when their customer's design went into production. In the 1980s, because of changing requirements during the development process, product failures or outright design errors, only about one-third of all designs actually went to production. Two-thirds of designs lost money. The losses were incurred not only by the ASIC customers, but also by the ASIC suppliers, whose NRE charges rarely covered their actual costs and never covered the cost of lost opportunity in their rapidly depreciating manufacturing facilities. On the other hand, programmable-logic companies and customers could still make money on small volume, and a small error could be corrected quickly, without costly mask-making.
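As an illustrative sketch of the crossover arithmetic in Fig. 2 (the dollar figures below are hypothetical, chosen only to mirror the shape of the figure), the crossover volume is simply the ASIC NRE divided by the per-unit price gap, and it rises as new nodes inflate NRE while lowering both unit costs:

    # Illustrative FPGA-versus-ASIC crossover model. All dollar figures are
    # hypothetical; only the shape of the comparison mirrors Fig. 2.

    def total_cost(nre: float, unit_cost: float, units: int) -> float:
        """Total cost of ownership: up-front NRE plus cost per unit shipped."""
        return nre + unit_cost * units

    def crossover_units(asic_nre: float, asic_unit: float, fpga_unit: float) -> float:
        """Volume at which the ASIC's total cost drops below the FPGA's."""
        return asic_nre / (fpga_unit - asic_unit)

    # Earlier process node: modest mask NRE, higher cost per chip.
    print(crossover_units(asic_nre=100_000, asic_unit=5.00, fpga_unit=25.00))  # 5000.0
    # Next node: NRE climbs 4x while both unit costs halve.
    print(crossover_units(asic_nre=400_000, asic_unit=2.50, fpga_unit=12.50))  # 40000.0

Quadrupling NRE while halving both unit costs moves the crossover eightfold, which is the qualitative story the figure tells.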
Fig. 3. Generic PAL architecture.

B. FPGA Versus PAL
Programmable logic was well established before the FPGA. EPROM-programmed Programmable Array Logic (PAL) had carved out a market niche in the early 1980s. However, FPGAs had an architectural advantage. To understand the FPGA advantage, we first look at the simple programmable logic structures of these early 1980s devices. A PAL device, as depicted in Fig. 3, consists of a two-level logic structure [6], [38]. Inputs are shown entering at the bottom. On the left side, a programmable AND array generates product terms, ANDs of any combination of the inputs and their inverses. A fixed OR gate in the block at the right completes the combinational logic function of the macrocell's product terms. Every macrocell output is an output of the chip. An optional register in the macrocell and feedback to the input of the AND array enable a very flexible state machine implementation.

Not every function could be implemented in one pass through the PAL's macrocell array, but nearly all common functions could be, and those that could not were realized in two passes through the array. The delay through the PAL array is the same regardless of the function performed or where it is located in the array. PALs had simple fitting software that mapped logic quickly to arbitrary locations in the array with no performance concerns. PAL fitting software was available from independent EDA vendors, allowing IC manufacturers to easily add PALs to their product line.

PALs were very efficient from a manufacturing point of view. The PAL structure is very similar to an EPROM memory array, in which transistors are packed densely to yield an efficient implementation. PALs were sufficiently similar to memories that many memory manufacturers were able to expand their product line with PALs. When the cyclical memory business faltered, memory manufacturers entered the programmable logic business.
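The two-level structure can be captured in a few lines. The sketch below is illustrative only (the fuse-pattern encoding is invented for this example, not taken from any PAL datasheet): each product term ANDs a chosen subset of inputs or their complements, and the macrocell ORs the product terms.

    # Sketch of PAL logic: a programmable AND array feeding a fixed OR per
    # macrocell. Each product term is encoded as {input_index: required_value}.

    def eval_macrocell(product_terms, inputs):
        """OR of product terms; each term is an AND over its chosen literals."""
        return int(any(all(inputs[i] == v for i, v in pt.items())
                       for pt in product_terms))

    # XOR of inputs 0 and 1 as two product terms: a AND NOT b, plus NOT a AND b.
    xor_pts = [{0: 1, 1: 0}, {0: 0, 1: 1}]
    print(eval_macrocell(xor_pts, [1, 0]))  # 1
    print(eval_macrocell(xor_pts, [1, 1]))  # 0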
The architectural issue with PALs is evident when one considers scaling. The number of programmable points in the AND array grows with the square of the number of inputs (more precisely, inputs times product terms). Process scaling delivers more transistors with the square of the shrink factor. However, the quadratic increase in the AND array limits PALs to grow logic only linearly with the shrink factor. PAL input and product-term lines are also heavily loaded, so delay grows rapidly as size increases. A PAL, like any memory of this type, has word lines and bit lines that span the entire die. With every generation, the ratio of the drive of the programmed transistor to the loading decreased. More inputs or product terms increased loading on those lines. Increasing transistor size to lower resistance also raised total capacitance. To maintain speed, power consumption rose dramatically. Large PALs were impractical in both area and performance. In response, in the 1980s, Altera pioneered the Complex Programmable Logic Device (CPLD), composed of several PAL-type blocks with smaller crossbar connections among them. But FPGAs had a more scalable solution.
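The scaling argument reduces to simple arithmetic, sketched below with illustrative device sizes: because programmable points grow as inputs times product terms, a process shrink that doubles the transistor budget enlarges a PAL's logic by only about a factor of the shrink, not of the transistor count.

    # Illustrative PAL scaling arithmetic. The AND array holds one programmable
    # point per (input or complement) x product-term crossing.

    def pal_array_points(inputs: int, product_terms: int) -> int:
        return 2 * inputs * product_terms  # true and complement line per input

    small = pal_array_points(16, 32)   # 1024 programmable points
    # Doubling the logic (inputs and product terms together) quadruples the array:
    large = pal_array_points(32, 64)   # 4096 points
    print(large / small)               # 4.0

    # So a full node shrink (2x transistor budget) grows a PAL's logic by only
    # about sqrt(2): linear in the shrink factor, not in the transistor count.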
The FPGA innovation was the elimination of the AND array that provided the programmability. Instead, configuration memory cells were distributed around the array to control functionality and wiring. This change gave up the memory-array-like efficiency of the PAL structure in favor of architectural scalability. The architecture of the FPGA, shown in Fig. 4, consists of an array of programmable logic blocks and interconnect with field-programmable switches. The capacity and performance of the FPGA were no longer limited by the quadratic growth and wiring layout of the AND array.

Fig. 4. Generic array FPGA architecture. 4 × 4 array with three wiring tracks per row and column. Switches are at the circles at intersections. Device inputs and outputs are distributed around the array.
Not every function was an output of the chip, so capacity could grow with Moore's Law. The consequences were great.
• FPGA architecture could look nothing like a memory. Design and manufacturing were very different than memory.
• The logic blocks were smaller. There was no guarantee that a single function would fit into one. Therefore, it was difficult to determine ahead of time how much logic would fit into the FPGA.
• The performance of the FPGA depended on where the logic was placed in the FPGA. FPGAs required placement and routing, so the performance of the finished design was not easy to predict in advance.
• Complex EDA software was required to fit a design into an FPGA.

With the elimination of the AND array, FPGA architects had the freedom to build any logic block and any interconnect pattern. FPGA architects could define whole new logic implementation models, not based on transistors or gates, but on custom function units. Delay models need not be based on metal wires, but on nodes and switches. This architectural freedom ushered in the first Age of FPGAs, the Age of Invention.
III. AGE OF INVENTION 1984–1991

The first FPGA, the Xilinx XC2064, contained only 64 logic blocks, each of which held two three-input Look-Up Tables (LUTs) and one register [8]. By today's counting, this would be about 64 logic cells, less than 1000 gates. Despite its small capacity, it was a very large die, larger than the commercial microprocessors of the day. The 2.5-micron process technology used for the XC2064 was barely able to yield it. In those early years, cost containment was critical to the success of FPGAs.

"Cost containment was critical to the success of FPGAs." A modern reader will accept that statement as some kind of simplistic statement of the obvious, but this interpretation seriously underemphasizes the issue. Die size and cost per function were crushingly vital. The XC2064, with only 64 user-accessible flip-flops, cost hundreds of dollars because it was such a large die. Since yield (and hence, cost) is superlinear for large die, a 5% increase in die size could have doubled the cost or, worse, yield could have dropped to zero, leaving the startup company with no product whatsoever. Cost containment was not a question of mere optimization; it was a question of whether or not the product would exist. It was a question of corporate life or death. In those early years, cost containment was critical to the success of FPGAs.

As a result of cost pressure, FPGA architects used their newfound freedom to maximize the efficiency of the FPGA, turning to any advantage in process technology and architecture. Although static memory-based FPGAs were re-programmable, they required an external PROM to store the programming when power was off. Reprogrammability was not considered to be an asset, and Xilinx downplayed it to avoid customer concerns about what happened to their logic when power was removed. And memory dominated the die area.

Antifuse devices promised the elimination of the second die and elimination of the area penalty of memory-cell storage, but at the expense of one-time programmability. The early antifuse was a single transistor structure; the memory cell switch was six transistors. The area savings of antifuses over memory cells was inescapable. Actel invented the antifuse and brought it to market [17], and in 1990 the largest capacity FPGA was the Actel 1280. Quicklogic and Crosspoint followed Actel and also developed devices based on the advantages of the antifuse process technology.
In the 1980s, Xilinx's four-input LUT-based architectures were considered "coarse-grained". Four-input functions were observed as a "sweet spot" in logic designs, but analysis of netlists showed that many LUT configurations were unused. Further, many LUTs had unused inputs, wasting precious area. Seeking to improve efficiency, FPGA architects looked to eliminate waste in the logic block. Several companies implemented finer-grained architectures containing fixed functions to eliminate the logic cell waste. The Algotronix CAL used a fixed-MUX function implementation for a two-input LUT [24]. Concurrent (later Atmel) and their licensee, IBM, used a small-cell variant that included two-input NAND and XOR gates and a register in the CL devices. Pilkington based their architecture on a single NAND gate as the logic block [23], [34]. They licensed Plessey (ERA family), Toshiba (TC family) and Motorola (MPA family) to use their NAND-cell-based, SRAM-programmed device. The extreme of fine-grained architecture was the Crosspoint CLi FPGA, in which individual transistors were connected to one another with antifuse-programmable connections [31].
Early FPGA architects noted that an efficient interconnect architecture should observe the two-dimensionality of the integrated circuit. The long, slow wires of PALs were replaced by short connections between adjacent blocks that could be strung together as needed by programming to form longer routing paths. Initially, simple pass transistors steered signals through the interconnect segments to adjacent blocks. Wiring was efficient because there were no unused fractions of wires. These optimizations greatly shrank the interconnect area and made FPGAs possible. At the same time, though, they increased signal delay and delay uncertainty through FPGA wiring due to large capacitances and distributed series resistances through the pass-transistor switch network. Since interconnect wires and switches added size, but not (billable) logic, FPGA architects were reluctant to add much. Early FPGAs were notoriously difficult to use because they were starved for interconnect.
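The delay behavior of such pass-transistor chains can be illustrated with a first-order Elmore (RC-ladder) estimate; the resistance and capacitance values below are invented for illustration and are not from the paper. The point is that delay grows roughly quadratically with the number of switches on a route, which is why interconnect-starved architectures were slow as well as difficult to route.

    # First-order Elmore delay of a route threaded through n pass-transistor
    # switches. R and C values are illustrative only.

    def elmore_delay(n_switches: int,
                     r_switch: float = 1e3,      # ohms per pass transistor
                     c_segment: float = 20e-15)  -> float:  # farads per segment
        # In an RC ladder, each switch resistance charges all downstream capacitance.
        return sum(r_switch * c_segment * (n_switches - i) for i in range(n_switches))

    for n in (2, 4, 8, 16):
        print(n, elmore_delay(n))
    # Delay grows ~n^2: roughly 60 ps at 2 switches to 2.72 ns at 16 switches.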
IV. AGE OF INVENTION IN RETROSPECT

In the Age of Invention, FPGAs were small, so the design problem was small. Though they were desirable, synthesis and even automated placement and routing were not considered essential. Many deemed it impractical even to attempt design automation on the personal computers of the time, since ASIC placement and routing was being done on mainframe computers. Manual design, both logical and physical, was acceptable because of the small problem size. Manual design was often necessary because of the limited routing resources on the chips [41].
Radically different architectures precluded universal FPGA design tools, as were available in the ASIC business. FPGA vendors took on the added burden of EDA development for their devices. This was eventually recognized as an advantage, as FPGA vendors experimented and improved their architectures. The PAL vendors of the previous decade had relied upon external tool vendors to provide software for mapping designs into their PALs. As a result, PAL vendors were restricted to those architectures the tool vendors supported, leading to commoditization, low margins and lack of innovation. PLD architecture was stifled while FPGA architecture flourished.

A further advantage of captive software development was that FPGA customers were not required to purchase tools from a third-party EDA company, which would have increased their NRE costs. As they did with NRE charges, FPGA vendors amortized their tool development costs into their silicon pricing, keeping the up-front cost of using their devices very low. EDA companies were not much interested in FPGA tools anyway, with their fragmented market, low volume, low selling price, and requirement to run on underpowered computers.

In the Age of Invention, FPGAs were much smaller than the applications that users wanted to put into them. As a result, multiple-FPGA systems [1], [42] became popular, and automated multi-chip partitioning software was identified as an important component of an FPGA design suite [36], even as automatic placement and routing were not.
V. INTERLUDE: SHAKEOUT IN FPGA BUSINESS

The Age of Invention ended with brutal attrition in the FPGA business. A modern reader may not recognize most of the companies or product names in Section III and in the FPGA genealogical tree in Fig. 5 [6], [38]. Many of the companies simply vanished. Others quietly sold their assets as they exited the FPGA business. The cause of this attrition was more than the normal market dynamics. There were important changes in the technology, and those companies that did not take advantage of the changes could not compete. Quantitative changes due to Moore's Law resulted in qualitative changes in the FPGAs built with semiconductor technology. These changes characterized the Age of Expansion.

Fig. 5. FPGA architecture genealogical tree, ca. 2000. All trademarks are the property of their respective owners.
VI. AGE OF EXPANSION 1992–1999

Through the 1990s, Moore's Law continued its rapid pace of improvement, doubling transistor count every two years. Pioneering the fabless business model, FPGA startup companies typically could not obtain leading-edge silicon technology in the early 1990s. As a result, FPGAs began the Age of Expansion lagging the process introduction curve. In the 1990s, they became process leaders as the foundries realized the value of using the FPGA as a process-driver application. Foundries were able to build SRAM FPGAs as soon as they were able to yield transistors and wires in a new technology. FPGA vendors sold their huge devices while foundries refined their processes. Each new generation of silicon doubled the number of transistors available, which doubled the size of the largest possible FPGA and halved the cost per function. More important than simple transistor scaling, the introduction of chemical-mechanical polishing (CMP) permitted foundries to stack more metal layers. Valuable for ASICs, this capability was explosive for FPGAs because the cost of valuable (nonbillable) interconnect dropped even faster than the cost of transistors, and FPGA vendors aggressively increased the interconnect on their devices to accommodate the larger capacity (see Fig. 6).

Fig. 6. Growth of FPGA LUTs and interconnect wires. Wire length is measured in millions of transistor pitches.

This rapid process improvement had several effects, which we now examine.
A. Area Became Less Precious
No one who joined the FPGA industry in the mid-1990s would agree that cost was unimportant or area was not precious. However, those who had experienced the agonies of product development in the 1980s certainly saw the difference. In the 1980s, transistor efficiency was necessary in order to deliver any product whatsoever. In the 1990s, it was merely a matter of product definition. Area was still important, but now it could be traded off for performance, features and ease-of-use. The resulting devices were less silicon efficient. This was unthinkable in the Age of Invention just a few years earlier.
B. Design Automation Became Essential
In the Age of Expansion, FPGA device capacity increased rapidly as costs fell. FPGA applications became too large for manual design. In 1992, the flagship Xilinx XC4010 delivered a (claimed) maximum of 10 000 gates. By 1999, the Virtex XCV1000 was rated at a million. In the early 1990s, at the start of the Age of Expansion, automatic placement and routing was preferred, but not entirely trusted. By the end of the 1990s, automated synthesis [9], placement and routing [3], [4], [19], [32], [37] were required steps in the design process. Without the automation, the design effort would be simply too great. The life of an FPGA company was now dependent upon the ability of design automation tools to target the device. Those FPGA companies that controlled their software controlled their future.

Cheaper metal from process scaling led to more programmable interconnect wire, so that automated placement tools could succeed with a less-precise placement. Automated design tools required automation-friendly architectures, architectures with regular and plentiful interconnect resources to simplify algorithmic decision-making. Cheaper wire also admitted longer wire segmentation: interconnect wires that spanned multiple logic blocks [2], [28], [44]. Wires spanning many blocks effectively make physically distant logic logically closer, improving performance. The graph in Fig. 7 shows large performance gains from a combination of process technology and interconnect reach. Process scaling alone would have brought down the curve, but retained the shape; longer segmentation flattened the curve. The longer segmented interconnect simplified placement because with longer interconnect, it was not as essential to place two blocks in exactly the right alignment needed to connect them with a high-performance path. The placer can do a sloppier job and still achieve good results.

Fig. 7. Performance scaling with longer wire length segmentation.
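A toy model (mine, not the paper's data) reproduces the flattening that Fig. 7 shows: if per-hop switch delay dominates, a route crossing a given number of blocks needs ceil(distance / segment length) hops, so longer segments cut the hop count for distant connections.

    # Toy model of Fig. 7's trend. The per-hop delay figure is hypothetical.
    import math

    def route_delay_ns(distance_blocks: int, segment_len: int,
                       t_hop_ns: float = 1.5) -> float:
        """Delay ~ number of segments crossed times per-hop switch delay."""
        return math.ceil(distance_blocks / segment_len) * t_hop_ns

    for dist in (1, 4, 8, 16):
        print(dist,
              route_delay_ns(dist, segment_len=1),   # single-block segments: delay ~ distance
              route_delay_ns(dist, segment_len=4))   # length-4 segments: curve flattens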
On the down side, when the entire length of the wire segment is not used, parts of the metal trace are effectively wasted. Many silicon-efficient Age of Invention architectures were predicated on wiring efficiency, featuring short wires that eliminated waste. Often, they rigidly followed the two-dimensional limitation of the physical silicon, giving those FPGAs the label "cellular." In the Age of Expansion, longer wire segmentation was possible because the cost of wasted metal was now acceptable. Architectures dominated by nearest-neighbor-only connections could not match the performance or ease-of-automation of architectures that took advantage of longer wire segmentation.

A similar efficiency shift applied to logic blocks. In the Age of Invention, small, simple logic blocks were attractive because their logic delay was short and because they wasted little when unused or partially used. Half of the configuration memory cells in a four-input LUT are wasted when a three-input function is instantiated in it. Clever designers could manually map complex logic structures efficiently into a minimum number of fine-grained logic blocks, but automated tools were not as successful. For larger functions, the need to connect several small blocks put greater demand on the interconnect. In the Age of Expansion, not only were there more logic blocks, but the blocks themselves became more complex.

Many Age of Invention architectures, built for efficiency with irregular logic blocks and sparse interconnect, were difficult to place and route automatically. During the Age of Invention, this was not a serious issue because the devices were small enough that manual design was practical. But excessive area efficiency was fatal to many devices and companies in the Age of Expansion. Fine-grained architectures based on minimizing logic waste (such as the Pilkington NAND-gate block, the Algotronix/Xilinx 6200 MUX-based 2LUT block, the Crosspoint transistor-block) simply died. Architectures that achieved their efficiency by starving the interconnect also died. These included all nearest-neighbor grid-based architectures. The Age of Expansion also doomed time-multiplexed devices [14], [39], since equivalent capacity expansion could be had without the attendant complexity and performance loss by merely waiting for the next process generation. The survivors in the FPGA business were those that leveraged process technology advancement to enable automation. Altera was first, bringing the long-distance connections of their CPLDs to the Altera FLEX architecture. FLEX was more automatable than other FPGAs of the period that were dominated by short wires. It achieved quick success. In the mid-1990s, AT&T/Lucent released ORCA [26] and Xilinx scaled up its XC4000 interconnect in number and length as it built larger devices. The Age of Expansion was firmly established.
C. Emergence of SRAM as Technology of Choice
One aspect of the rapid progress of Moore's Law was the need to be on the forefront of process technology. The easiest way to double the capacity and halve the cost for logic was to target the next process technology node. This pressured FPGA vendors to adopt leading-edge process technology. FPGA companies using technologies that could not be easily implemented on a new process were at a structural disadvantage. This was the case with nonvolatile programmable technologies such as EPROM, Flash and antifuse. When a new process technology becomes available, the first components that are available are transistors and wires, the essential components of electronics. A static-memory-based device could use a new, denser process immediately. Antifuse devices were accurately promoted as being more efficient on a particular technology node, but it took months or years to qualify the antifuse on the new node. By the time the antifuse was proven, SRAM FPGAs were already starting to deliver on the next node. Antifuse technologies could not keep pace with technology, so they needed to be twice as efficient as SRAM just to maintain product parity.
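The "twice as efficient" figure is one-node arithmetic: if each node roughly doubles density and antifuse qualification trails SRAM by a full node, the denser antifuse cell must buy back a factor of two just to break even. A minimal sketch, with the per-node gain as an assumed parameter:

    # One-node-lag arithmetic behind the "twice as efficient" claim.
    DENSITY_GAIN_PER_NODE = 2.0   # assume each process node ~doubles density

    def breakeven_advantage(nodes_behind: float) -> float:
        """Cell-area advantage needed to match SRAM density at the newer node."""
        return DENSITY_GAIN_PER_NODE ** nodes_behind

    print(breakeven_advantage(1.0))  # 2.0: one node behind demands a 2x denser cell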
Antifuse devices suffered a second disadvantage: lack of reprogrammability. As customers grew accustomed to "volatile" SRAM FPGAs, they began to appreciate the advantages of in-system programmability and field-updating of hardware. In contrast, a one-time-programmable device needed to be physically handled to be updated or to remedy design errors. The alternative for antifuse devices was an extensive ASIC-like verification phase, which undermined the value of the FPGA.

The rapid pace of Moore's Law in the Age of Expansion relegated antifuse and flash FPGAs to niche products.
D. Emergence of LUT as Logic Cell of Choice
LUTs survived and dominated despite their documented inefficiency in the Age of Expansion for several reasons. First, LUT-based architectures were easy targets for synthesis tools. This statement would have been disputed in the mid-1990s, when synthesis vendors complained that FPGAs were not "synthesis friendly." This perception arose because synthesis tools were initially developed to target ASICs. Their technology mappers expected a small library in which each cell was described as a network of NANDs with inverters. Since an n-input LUT implements any of the 2^(2^n) combinations of its inputs, a complete library would have been enormous. ASIC technology mappers did a poor job on LUT-based FPGAs. But by the mid-1990s, targeted LUT mappers exploited the simplicity of mapping arbitrary functions into LUTs [9].
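The counting argument is easy to verify, and it previews the "LUT is a memory" point of the next paragraph: the n inputs form an address into 2^n configuration bits, so there are 2^(2^n) programmable functions. A minimal sketch (helper names are mine, purely illustrative):

    # A k-input LUT is a 2**k-bit memory addressed by its inputs, so it can
    # realize any of the 2**(2**k) Boolean functions of those inputs.

    def lut_eval(table: int, inputs: list[int]) -> int:
        """Look up the output bit: the input vector is the address into the table."""
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return (table >> addr) & 1

    print(2 ** (2 ** 4))  # 65536 distinct functions for a 4-input LUT

    # Program a 4-LUT as 3-input XOR; the unused fourth input means half of the
    # 16 table bits duplicate the other half, the waste the paper notes for
    # partially used LUTs.
    xor3 = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                for d in (0, 1):
                    xor3 |= (a ^ b ^ c) << (a | (b << 1) | (c << 2) | (d << 3))
    print(lut_eval(xor3, [1, 1, 0, 0]))  # 1 ^ 1 ^ 0 = 0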
The LUT has hidden efficiencies. A LUT is a memory, and memories lay out efficiently in silicon. The LUT also saves interconnect. FPGA programmable interconnect is expensive in area and delay. Rather than a simple metal wire as in an ASIC, FPGA interconnect contains buffers, routing multiplexers and the memory cells to control them. Therefore, much more of the cost of the logic is actually in the interconnect [15]. Since a LUT implements any function of its inputs, automation tools need only route the desired signals together at a LUT in order to retire the function of those inputs. There was no need to make multiple levels of LUTs just to create the desired function of a small set of inputs. LUT input pins are
