throbber
SCIENCE
`TK
`789.S
`G36
`B48 '
`1999
`
`Intel Exhibit 1006 Part 1
`Intel v. Iida
`
`

`

`ARCHITECTURE AND CAD
`FOR
`DEEP-SUBMICRON FPGAs
`
`

`

`THE KLUWER INTERNATIONAL SERIES
`THE KLUWER INTERNATIONAL SERIES
`IN ENGINEERING AND COMPUTER SCIENCE
`IN ENGINEERING AND COMPUTER SCIENCE
`
`

`

`

`

`Distributors for North, Central and South America:
`Kluwer Academic Publishers
`101 Philip Drive
`Assinippi Park
`Norwell, Massachusetts 02061 USA
`Telephone (781) 871-6600
`Fax (781) 871-6528
`E-Mail <kluwer@wkap.com>
`
`Distributors for all other countries:
`Kluwer Academic Publishers Group
`Distribution Centre
`Post Office Box 322
`3300 AH Dordrecht, THE NETHERLANDS
`Telephone 31 78 6392 392
`Fax 31 78 6546 474
`E-Mail <orderdept@wkap.nl>
`
`, . Electronic Services <http://www.wkap.nl>
`
`Library of Congress Cataloging-in-Publication Data
`
`A C.I.P. Catalogue record for this book is available
`from the Library of Congress.
`
`Copyright © 1999 by Kluwer Academic Publishers
`
`All rights reserved. No part of this publication may be reproduced, stored in a
`retrieval system or transmitted in any form or by any means, mechanical, photo(cid:173)
`copying, recording, or otherwise, without the prior written permission of the
`publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell,
`Massachusetts 02061
`
`Printed on acid-free paper.
`
`Printed in the United States of America
`
`

`

`To Corinne, William, and Isabel,
`
`Betty and Frank Rose,
`
`Marie and Ron
`
`

`

`
`
`

`

`Table of Contents
`
`CHAPTER 1 Introduction ....... . .......... ... ....... . . . ... .. ..... . . l
`1.1 Overview of FPGAs ....... . ...................................... 2
`1.2 FPGA Architectural Issues . .. .... . . . ..... .. .. . .... ... ... .. . . ... . . . . 3
`1.3 Approach and CAD Tools ............... . .......................... 7
`1.4 Book Organization ........ . ................................... . ... 8
`1.5 Acknowledgments .. ..... .. .. . ......... . .. . ....................... 9
`
`CHAPTER 2 Background and Previous Work
`. ........................ . 11
`2.1 FPGA Architecture ...... .... ..... . .. . .. . . .. .... . . .. .. ........ . .. 11
`2.1.1
`FPGA Programming Technologies . . . .. ..... ...... .. . . . .. . . .. 11
`2.1.2 FPGA Logic Block Architecture ............ .. .............. . 13
`2.1.3 FPGA Routing Architecture . .. .. . . .. . ... ....... ... .... .. ... 14
`. . ...... . .. ... . . . . .. .... .. .... ... ....... 1 ••.•. •• 18
`2.2 CAD for FPGAs
`2.2.1 Synthesis and Logic Block Packing .......................... 19
`2.2.2 Placement ...... . . . . . .. .. . . . .... . . . ............. . .. . . . .. 22
`2.2.3 Routing .. . ...... .. .... . ...... .. ........... .. ...... . . ... 25
`2.2.4 Delay Modelling . . ... .. .. ........ . .. ... ........ .. .... .. .. 31
`2.2.5 Timing Analysis ... . .. ... .. . .. . .. . . .. . .. ... .. ..... . . ... . . 32
`2.3 Summary . .............. .. .... ... ......... . . . .. .... ... . . .... . . . 34
`
`CHAPTER 3 CAD Tools: Packing and Placement ....................... 37
`3.1 Logic Block Packing .. ... .. ...................... . ............... 37
`3.1.1 Cluster-Based Logic Blocks . ............ . .. . .............. . 38
`3.1 .2 Basic Logic Block Packing Algorithm: VPack .. . . ............. 39
`3.1.3 Timing-Driven Logic Block Packing: T-VPack .. .. .. . .... .. .. . 43
`Cluster Seed and Attraction Function . .. ... . . .... . . ........... 43
`Computational Complexity vs. Frequency of Timing Analysis . . ... 47
`
`

`

`

`

`Table of Contents
`
`ix
`
`5.2.1 CAD Flow .................... . . . ....... .. ... .. . ... .... 107
`5.2.2 Area-Efficiency Metric .......... . . . ... . ... . .... . .. . . . ... . 109
`5.2.3 Significant FPGA Architectural Details ...... .- ... ... ......... 109
`5.3 Experimental Results: Directionally-Biased Routing ...... . .... . . . .... 110
`5.3.1 Results for Square Logic Block Arrays ..... . . . .. ... ...... . .. 111
`5.3.2 Results for Rectangular Logic Block Arrays
`. . .. . . .. . .. . . ..... 113
`5.4 Experimental Results: Non-Uniform Routing . . . .... . .. . . . .. . .... . . .. 115
`5.4.1 Center/Edge Capacity Ratio . ... .. .. ......... . ............. 116
`5.4.2 Single Center Channel .. . .. .... .. . .. . .. . . . .. .. . . .. ... . . . . 120
`5.4.3 1/0 Channel .. . . ...... .. ... .. . . . .. . ... . . . .. ......... .. .. 122
`5.5 Summary ..... ........... . .... . ..... . . . ...... .. .. . . . . .... . ... . 126
`
`CHAPTER 6 Cluster-Based Logic Blocks . .... . . . .... . . .... . .... .. .. . . 127
`6.1 Motivation ... .. ... . . . . ... . . . ... .. . .. . .. . ...... . .. . ..... .. . ... . 127
`6.2 Experimental Methodology .............. · . . .... . . . .. .... ... .. . .. . . 130
`6.2.1 CAD Flow . . .. ..... . ...... .... .. ........... ... ... . . .. .. 130
`6.2.2 Area Model . .... .. ... . . ... .... .. ............. . ......... 132
`6.2.3 Delay Model .. . .. . .... . .. . .. .. . ...... . ................. 134
`6.2.4 Architecture Evaluation Metric: Area-Delay Product . . .... . .. . . 136
`. . ......... . . . .. . . .. ...... 136
`6.2.5 FPGA Architectural Assumptions
`Basic Architecture .............. .. ....... . ..... . .. ~ . . . . .. 137
`Routing Architecture ............................... . . . ... 137
`Effect of Cluster Size on the Physical Length of FPGA
`Routing Segments ............ . ..... . ..... . ....... .. . . 138
`Sizing Routing Transistors to Compensate for Different
`Physical Segment Lengths .... .. ........ . ... . .. .. . . . .. . 138
`6.3 Cluster Inputs Required vs. Cluster Size ........... . .... ... . .. ... . ... 139
`6.4 Flexibility of Logic Block to Routing Interconnect vs. Cluster Size ..... . . 141
`6.5 Speed and Area-Efficiency vs. Cluster Size . . . ... . ... . . . . . ........ . .. 142
`6.5.1 Discussion of Delay vs. Cluster Size Results . .. ... .. .. . .. . . ... 145
`6.6 Effect of Cluster Size on Compile Time ..... . ... .... ... . . . . . ... ... . . 147
`6.7 Summary .... ... .. . .. ........... .. . .. . . ...... . ....... . ... . . . .. 148
`
`

`

`

`

`Table of Contents
`
`xi
`
`B.1.1 FPGA Routing Structures ....... . .... .. . .. ..... .. .. ... . . . . 208
`Gate Boosting .... ... .. .................... . ....... .. . ... 208
`Buffers ...... ........................... · ............ . .. 210
`Connection Block to Logic Block Input Pins .. .... . .. ... ...... 212
`B.1.2 Logic Block Structures . ............ . ..... . ............... 214
`B.2 Delay and RC-Equivalent Circuit Extraction ..... .. .. . .. . ..... ...... . 217
`
`APPENDIX C Sizing of Routing Transistors and Metal . .. . ...... . ... . .. 221
`C.1 Sizing Pass Transistor Routing Switches . .... ..... .. ......... . .. .... 221
`C.2 Sizing Tri-State Buffer Routing Switches .. . .... . .................... 225
`C.3 Tri-State Buffers in Output Pin Connection Blocks .................... 228
`C.4 Metal Width and Spacing ............... .. .. . ............. .. . .. .. 228
`
`References .. . .. . . .. ...... .. .................................. . . .. 233
`
`Index .... . .... . . . .. ...... ... . . ......... ... .... . . ... . . . .... . . .. . . . 243
`
`

`

`
`
`

`

`CHAPTER 1
`
`Introduction
`
`Since their introduction in 1984, Field-Programmable Gate Arrays (FPGAs) have
`become one of the most popular implementation media for digital circuits and have
`grown into a $2 billion per year industry. As process geometries have shrunk into the
`deep-submicron region, the logic capacity of FPGAs has greatly increased, making
`FPGAs a viable implementation alternative for larger and larger designs. To make the
`best use of these new deep-submicron processes, one must re-architect one's FPGAs
`and Computer-Aided Design (CAD) tools. This book addresses several key issues in
`the design of high-performance FPGA architectures and CAD tools, wi\h particular
`emphasis on issues that are important for FPGAs implemented in deep-submicron
`processes.
`
`Three factors combine to determine the performance of an FPGA: the quality of the
`CAD tools used to map circuits into the FPGA, the quality of the FPGA architecture,
`and the electrical (i.e. transistor-level) design of the FPGA. In this book, we examine
`all three of these issues in concert, and we believe this is the first systematic study of
`FPGA architecture and CAD to do so.
`
`In order to investigate the quality of different FPGA architectures, one needs CAD
`tools capable of automatically implementing circuits in each FPGA architecture of
`interest. Once a circuit has been implemented in an FPGA architecture, one next
`needs accurate area and delay models to evaluate the quality (speed achieved, area
`required) of the circuit implementation in the FPGA architecture under test. This
`t~e develo ment of a high- ualit and highly
`book therefore has three major foci:
`flexible CAD infrastructure, the creation of accurate area and delay models for
`FPGAs, and the study of several important FPGA architectural issues.
`
`

`

`

`

`1.2 FPGA Architectural Issues
`
`3
`
`Time-to-market is the other key advantage of FPGAs. Full fabrication typically takes
`6 - 8 weeks. If problems are found in the finished chip it must be thrown away, and
`one must wait another 6 - 8 weeks to fabricate a corrected design. FPGAs, on the
`other hand, can be programmed in seconds, and any bugs found once the chip is tested
`in system can be corrected in minutes by reprogramming the FPGA. With today's
`short product cycles, the time-to-market advantage this provides is often compelling.
`
`FPGA programmability carries a price, however. In MPGAs and Standard Cells cir(cid:173)
`cuitry is interconnected with metal wires. FPGAs, in contrast, must connect circuitry
`via programmable switches. These switches have higher resistance t an me al wire-s
`· and add s 1gnificant c-apac1 ance to connections, reducing circuit speed. The switches
`also take up more area than metal wires would, so an FPGA must be considerably
`larger than an MPGA to implement the same circuit. A circuit implemented in an
`FPGA is typically 10 times larger and roughly 3 times slower than the same circuit
`implemented via an MPGA in an equivalent process [1]. The larger size of FPGA cir(cid:173)
`cuitry makes FPGA implementations more expensive than MPGAs for high volume
`designs, and the limited speed of FPGAs precludes their use in very high-speed
`designs. These differences mandate research into new FPGA architectures in order to
`reduce these speed and density penalties. In addition, because the FPGA marketplace
`is highly competitive each FPGA manufacturer is constant! searching for better
`FPGA architectures in 01:_d~r !9_gain a speed and density advantage.
`
`1.2 FPGA Architectural Issues
`
`All FPGAs consist of a large number of programmable logic blocks, each of which
`implement a small amount of digital logic, and programmable routing which allows
`the logic block inputs and outputs to be connected to form larger circuits. In this book
`we investigateJ hree different issues in FPGA architecture:
`concern FPGA rout(cid:173)
`ing design, and one concerns FPGA logic block design.
`~ -=----=---'----------._.::;._ ___ ~~
`
`/ The first issue we investigate is global routing architecture [2, 3]. ,The global routing
`~ architecture of an FPGA s ecifies the relative width of the various _wiring channels
`ll
`.
`,
`- .
`-
`-
`• -
`-
`- - ··
`-
`-···· --
`/ .: within the chip. Figure 1.1 depicts an example global routing architecture in which
`)(
`the channels -near the center of the FPGA are wider than those near the edges. In
`w/
`MPGA and Standard Cell implementations, a custom chip is created for each design,
`so routing channels can easily be made wider in areas of a chip where the demand for
`routing is greater. In FPGAs, however, all routing resources are prefabricated, so the
`width of all the routing channels is set by the FPGA manufacturer. Our goal, then, is
`to find the distribution of routing resources, or tracks, to the various channels that per-
`\ mits their efficient utilization by the largest class of circuits. If there are too few
`
`

`

`

`

`1.2 FPGA Architectural Issues
`
`5
`
`blocks, cluster-based logic blocks can improve FPGA speeds. A~ well, an FPGA in
`which every logic block contains several LUTs will need fewer logic blocks to imple-
`-
`-
`ment a circuit than an FPGA in wpich each logic block is a single LUT. This reduces
`\p,"\
`f the placement and routing problem considerably. Since- pla'cement andJ JI
`the si~
`~ ting is usually tne mo st t~ s u~p -in mapping a d~sign to an FPGA,
`cluster-based logic' blo~ks can· significantly reduce design compile time. As FPGAs
`grow larger, it is important to keep this compile time from growing too large or one of
`the key advantages of FPGAs, rapid prototyping and quick design turns, will be lost.
`
`We will investigate how the number of LUTs per cluster affects FPGA speed, area,
`compile time and other important FPGA architecture parameters, such as the proper
`number of inputs to a logic block and the required flexibility of the general-purpose
`interconnect. The trade-offs among these architectural issues and metrics are quite
`complex. For example, grouping related LUTs together into a single logic block
`reduces the number of connections to be routed between logic blocks, saving routing
`area. Since the eneral-purpose interconnect consume_s most __ of__!h~d_ie ~r~~ in
`this is a significant area savings. On the other hand, in the
`SRAM-based FPGAs 1
`logic c usters we study, the area required by the local routing grows quadratically with
`the number of LUTs in a cluster. For sufficiently large clusters, the area used by this
`local interconnect will exceed the area saved in the general interconnect. Similarly,
`since the local connections within clusters can be made very fast, we expect clusters
`containing a larger number of LUTs to attain higher circuit speeds. At some point,
`however, the speed gained by increasing the cluster size will diminish or be lost
`because increasing the cluster size slows the local cluster routing.
`
`While recent FPGAs from Xilinx [11), Altera [7], Lucent Technologi~s [5], Actel
`[12) and Yantis [13) have all grouped several LUTs together into larger logic blocks,
`there has been little published work investigating the impact of the number of LUTs
`per logic block on FPGA speed or area.
`
`The final FPGA issue we examine is that of detailed routing architecture [14). The
`detailed routing architecture of an FPGA defines how logic block inputs and outputs
`can be interconnected. The style of detailed routing architecture we will investigate is
`the island-style architecture [1] used by Xilinx, Lucent, and Yantis FPGAs [4, 5, 13].
`A simplified detailed-routing architecture is depicted in Figure 1.3. In many ways,
`detailed routing architecture is the key element of an FPGA because:
`
`1. Most of an FPGA's area is devoted to routing;
`2. Most of a circuit's delay is due to routing delays rather than logic block delays
`[15, 1]; and
`
`

`

`

`

`1.3 Approach and CAD Tools
`
`7
`
`completely unroutable. Achieving an appropriate balance between pass transistor
`based and tri-state buffer based switches is also crucial - pass transistors require less
`area and are faster for connections that go through only a few switches, but buffers are
`faster for connections that go through many switches. Similarly, if we have too few
`long routing wires, long connections will have to pass through many programmable
`switches and will be slow. On the other hand, too many long wires will result in short
`connections being made via long wires, wasting area and degrading speed.
`
`( It is difficult to look at one detailed routing architecture parameter in isolation
`· because the architectural parameters interact in complex ways to determine an
`FPGA's speed and area-efficiency.
`In the research presented in this book, we have
`tried to take a holistic approach that looks at all the parameters determining a detailed
`routing together, and assesses both the speed and area of each FPGA we investigate.
`We believe that this is the first study to examine such a broad spectrum of detailed
`routing architectures.
`
`1.3 Approach and CAD Tools
`
`Our approach to exploring FPGA architecture is an experimental one. Benchmark
`circuits are technology mapped, placed and routed into each FPGA architecture of
`interest and the area and/or delay of each circuit in that architecture is measured.
`These measurements allow us to determine the best architectural parameters.
`
`In order to evaluate different architectures, the CAD tools used in the empirical
`approach must be flexible enough to target all the architectures of interest. This pre(cid:173)
`cludes the use of commercial FPGA CAD tools, which target only the FPGAs of a
`specific manufacturer. Accordingly, we have created new CAD tools to pack multiple
`LUTs into a logic cluster, t,9 place logic blocl<:s_witbi!:! an FPGA, and to route the con-
`. nections between logic blocks [ 17, 10]. In addition to making these CAD tools flexi(cid:173)
`ble, we also took cons1dera5Iecaretoensure they produced high quality results, as
`poor CAD tools can lead to inaccurate architecture conclusions. The CAD tools
`understand the special features of each architecture, and attempt to fully optimize for
`each architecture they target. They incorporate several new features and enhance(cid:173)
`ments to existing algorithms, so the CAD algorithms are of interest in their own right.
`As well, these CAD tools are in use in numerous FPGA and CAD studies at the Uni(cid:173)
`versity of Toronto and around the world, and obtain the highest quality results on the
`set of standard benchmarks used to compare academic place and route tools [17].
`
`Ideally, one would lay out each FPGA architecture of interest to obtain exact area
`measurements and highly accurate delay values. In this book we study hundreds of
`
`

`

`8
`
`CHAPTER 1 Introduction
`
`different FPGA architectures, so this is clearly impossible. Accordingly, we have
`developed more abstract area models and delay estimates that do not require layout--of(cid:173)
`the FPGAs, but which are still sufficiently accurate to allow fair comparison of differ(cid:173)
`ent architectures.
`
`1.4 Book Organization
`
`Most of the material in this book is adapted from the Ph.D. thesis of Vaughn Betz
`(18), with important additional information (the T-VPack algorithm described in
`Chapter 3 and most of the experimental results and analysis concerning logic clusters
`presented in Chapter 6) adapted from the M.A.Sc thesis of Alexander Marquardt (19).
`
`The next chapter provides background information and details some of the previous
`work in the relevant areas of both CAD and FPGA architecture. C~apter 3 describes
`the CAD algorithms we use to pack several LUTs together into a logic cluster, and the
`algorithms we use to perform placement of logic blocks in an FPGA. In Chapter 4 we
`-=---=--- ·
`detail the algorithms used in our FPGA router.
`
`Chapters 5 through 7 then employ these CAD tools to investigate the three FPGA
`architectural issues outlined earlier. Each of these three chapters begins by describing
`the CAD flow and area and/or delay models used in the experimental study, and then
`presents the architectural results obtained. Chapter 5 evaluates different global rout(cid:173)
`ing architectures for FPGAs, and determines the most area-efficient architectures. In
`Chapter 6 we investigate cluster-based logic blocks and determine the logic cluster
`In Chapter 7 we ascertain which
`types that maximize area-efficiency and speed.
`detailed routing architectures lead to FPGAs with the best combination of speed and
`area-efficiency. The final chapter summarizes the architectural conclusions and pro(cid:173)
`vides suggestions for future work.
`
`The three appendices provide additional useful information. Appendix A outlines th,~
`graphic visualization features incorporated in our placementaiio rout ing CAD tools,
`and highlights their usefulness in algorithm and architecture research. Appendix B
`describes the transistor-level design of an FPGA and our characterization of a deep(cid:173)
`submicron IC process. Finally, in Appendix C we investigate how routing transistors
`and metal routing wires should be sized in order to create an FPGA with the best
`combination of area-efficiency and speed. The detailed information concerning the
`electrical design of an FPGA provided by Appendices B and C is essential to create
`accurate FPGA area and delay models, and hence to properly compare FPGA archi(cid:173)
`tectures.
`
`

`

`1.5 Acknowledgments
`
`9
`
`1.5 Acknowledgments
`
`The authors are indebted to Carl Harris of Kluwer Academic Publishers for his help
`and advice as we prepared this book. We would like to thank the other members of
`our research group at the University of Toronto -
`Steve Wilton, Mike Hutton,
`Mohammed Khalid, Yaska Sankar, Jordan Swartz, Rob McCready, and Paul Leventis
`-
`for the many valuable suggestions and insightful discussions they have provided
`over the years. Special thanks are due to Jordan Swartz for enhancing the VPR FPGA
`architecture generator to allow targeting of an architecture very similar to that of the
`Xilinx 4000X series FPGAs. Many other students and faculty in the University of
`Toronto Electrical Engineering department lent us their expertise on vexing problems;
`we would particularly like to thank Dave Lewis, Mark Stoodley, Guy Lemieux, Jason
`Anderson, and Jason Podaima. We also appreciate several interesting discussions on
`FPGA CAD with Russ Tessier of MIT.
`
`The authors are grateful for many enlightening discussions with people in the FPGA
`industry. Particular thanks are due to Steve Trimberger, Steve Young, and Bill Carter
`of Xilinx, Frank Heile and David Mendel of Altera, Sinan Kaptanoglu of Actel, and
`Tim Lacey of Cypress. We appreciate the helpful suggestions of Jason Cong, Steve
`Trimberger, Sinan Kaptanoglu, Corinna Lee, Stephen Brown, Dave Lewis, and Derek
`Corneil concerning the Ph.D. thesis on which the majority of this book is based.
`
`We gratefully acknowledge the financial support of our work provided by the Natural
`Sciences and Engineering Research Council of Canada (via an NSERC 1967 Science
`and Engineering Scholarship), the Information Technology Research ~Centre of
`Ontario, the Walter C. Sumner Foundation, a V. L. Henderson Research Fellowship,
`and Xilinx. Finally, we appreciate the 0.35 µm process data provided to us by the
`Canadian Microelectronics Corporation and the Taiwan Semiconductor Manufactur(cid:173)
`ing Company (TSMC).
`
`

`

`- -___,,_---~-!-
`
`.
`
`r
`r
`
`--
`
`- -
`
`\
`
`-
`
`-- - -
`
`-
`
`- -
`
`-
`
`

`

`CHAPTER 2
`
`Background and
`Previous Work
`
`The first half of this chapter provides background information about FPGA architec(cid:173)
`tures, and briefly describes the prior work relevant to this book. The second half of
`the chapter describes the CAD flow used to automatically map circuits into FPGAs
`and determine their speed, a_nd summarizes some of the prior work in the relevant
`areas of CAD.
`
`2.1 FPGA Architecture
`
`All FPGAs are composed of three fundamental components: logic blocks, 1/0 blocks
`and programmable routing, as Figure 2.1 shows [20). A circuit is implemented in an
`FPGA by programming each of the logic blocks to implement a small portion of the
`logic required by the circuit, and each of the 1/0 blocks to act as either an input pad or
`an output pad, as required by the circuit. The programmable routing is configured to
`make all the necessary connections between logic blocks and from logic blocks to I/O
`blocks. The following section briefly describes the basic technologies used to make
`FPGAs programmable. Sections 2.1.2 and 2.1.3 then describe some of the different
`architectures and prior research for FPGA logic blocks and routing, respectively.
`
`2.1.1 FPGA Programming Technologies
`
`There are three different approaches to making FPGAs programmable [1, 20). The
`most popular technology today uses SRAM cells to control pass transistors, multi-
`
`

`

`

`

`

`

`

`

`

`

`

`

`2.1 FPGA Architecture 17
`
`Brown et al investigated different segment length distributions in order to optimize
`the speed and area of an island-style FPGA [36, 37, 38]. They placed circuits, then
`routed them with a two-step (global then detailed) routing procedure, and evaluated
`the resulting speed and area for each FPGA architecture of interest. They found that
`segments of length greater than 3 or 4 were unnecessary, and that segments of length
`1 were also not very important. The best segmentation distributions, in terms of
`speed/area trade-off, had 40 - 70% length 3 wires, 10 - 50% length 2 wires, and 10 -
`20% length 1 wires. Their conclusions may have been affected by several factors,
`however. First, in the two-step routing scheme they use, the global router is unaware
`of the segmentation distribution, and hence is not attempting to optimize for it. This
`may make it difficult to use long wires effectively. Second, their area metric was
`number of tracks per channel required to route, W, while FPGA area is usually deter(cid:173)
`mined by transistor area. Third, their speed metric was average net delay, rather than
`the circuit critical path. Finally, the tools did not use timing analysis to determine
`which connections needed high-speed routing, and which could use area-efficient
`routes.
`
`Chow et al [39] performed a similar study of segmentation distributions in an island(cid:173)
`style FPGA, using similar tools to Brown et al [36]. They did not evaluate circuit
`speed, but assumed that longer wires would lead to faster circuits. Hence, they looked
`for segmentation distributions that incorporated length 2 and 3 wires but required only
`slightly higher W values to route circuits than a segmentation distribution where all
`wires were length 1. They also investigated a new architectural parameter, internal
`population, that is relevant to wires which span more than one logic block. All wires
`are assumed to be able to connect to other wires and to logic blocks at their endpoints.
`A wire segment is internally switch-block populated if it can make connections from
`its interior to other wire segments (via switch blocks); otherwise it is switch-block
`depopulated. Similarly, a wire that can connect to from its middle to logic blocks (via
`connection blocks) is internally connection-block populated. Figure 2.6 illustrates
`the four different internal population possibilities for segments. Notice that a length 3
`wire that is connection-block depopulated can connect to only 2 of the 3 logic blocks
`it passes. A length 3 wire that is switch-block depopulated can connect to wires in
`only two of the four vertical channels that pass by it. Internally depopulated wire seg(cid:173)
`ments are faster than populated wire segments because there are fewer switches, and
`hence less parasitic capacitance, connected to them. Chow et al found that the use of
`switch-block depopulated wires resulted in a significant increase in the required value
`of W to route circuits, while connection-block depopulation caused only a small ( ~0 -
`10%) increase in W. They also found that many architectures with 0 - 60% length 2,
`10 - 40% length 3, and the remainder length 1 wires required only about 10% more
`tracks per channel than an all length 1 architecture.
`
`

`

`

`

`2.2 CAD for FPGAs 19
`
`Circuit description (VHDL, schematic, ... )
`
`Synthesize to logic blocks
`
`Place logic blocks in FPGA
`
`Route connections between logic blocks
`
`FPGA programming file
`
`FIGURE 2. 7 FPGA CAD flow.
`
`Figure 2.7. In the following three sections we will describe the synthesis, placement
`and routing problems and briefly outline some of the prior work in each area. Once a
`circuit is placed and routed, delay models and timing analysis are used to assess its
`speed; we describe these problems in Sections 2.2.4 and 2.2.5.
`
`2.2.1 Synthesis and Logic Block Packing
`
`The first stage of synthesis converts the circuit description, which is usually in a hard(cid:173)
`ware description language or schematic form, into a netlist of basic gates. Then, the
`logic synthesis process converts this netlist of basic gates into a netlist of FPGA logic
`blocks such that the number of logic blocks needed is minimized and/or circuit speed
`is maximized. Logic synthesis is sufficiently complex that it is usually broken into
`two or more subproblems [41]; in this book we will use a three-stage synthesis flow as
`shown in Figure 2.8.
`
`Technology-independent logic optimization removes redundant logic and simplifies
`logic wherever possible [ 41, 42]. The optimized netlist of basic gates is then mapped
`to look-up tables [ 43, 44, 45, 46, 47]. Both of these problems have been extensively
`studied and good algorithms and tools capable of targeting the FPGAs we are inter(cid:173)
`ested in studying are publicly available, so this book does not study these phases of
`the synthesis process.
`
`The third synthesis step in Figure 2.8 is necessary whenever an FPGA logic block
`contains more than a single LUT. Logic block packing groups several LUTs and reg-
`
`

`

`

`

`2.2 CAD for FPGAs 21
`
`the "natural" clustering solution enough to significantly impact the result quality. It is
`difficult for labelling methods to simultaneously satisfy both a constraint on the num(cid:173)
`ber of inputs to a cluster (pin constraint) and the maximum number of items (e.g.
`LUTs) in a cluster (size constraint). Yang and Wong [60] present a labelling algo(cid:173)
`rithm which can simultaneously satisfy both size and pin constraints, but it tends to
`produce large amounts of logic replication and to only partially fill clusters. Hence it
`requires a large number of clusters to cover a circuit.
`
`In [61], Cong et al describe a logic block packing tool capable of targeting several dif(cid:173)
`ferent types of logic blocks. It uses a closeness metric to determine the desirability of
`putting two LUTs into the same logic block, and it respects any constraints restricting
`which LUTs can be packed together. In [61], the packing step is performed via maxi(cid:173)
`mum weighted matching on a graph representing the circuit. This is an O(nm) algo(cid:173)
`rithm, where n is the number of LUTs, and mis the number of edges in the graph [62,
`63]. For the logic clusters we study, the number of edges in this "closeness" graph, m,
`is O(n2), which would lead to an O(n3) algorithm. 1
`
`In [64], Shin and Kim describe a greedy algorithm to transform a netlist of circuit
`blocks into a netlist of clusters, where each cluster contains approximately the same
`number of circuit blocks, and the circuit blocks in each cluster tend to be highly con(cid:173)
`nected. Initially each circuit block is a cluster. The "closeness" of two clusters is a
`function of how many nets the two clusters share and the size of the cluster that would
`result from merging them. More formally:
`
`(C D) _
`. ClusterSize(C, D)
`NumCommonNet(C, D)
`l
`- min(NumPin(C), NumPin(D)) - a AverageClusterSize
`'
`c oseness
`
`(2.1)
`
`where a is a weighting factor controlling how balanced the cluster sizes should be.
`The two clusters with the largest closeness value are merged, and the closeness values
`of all the other clusters are updated to reflect the change. The merging process termi(cid:173)
`nates when the number of clusters in the netlist falls below some user-specified value.
`While this algorithm was not developed for FPGAs, and does not understand the hard
`constraints on cluster size and number of inputs intrinsic to logic clusters, the general
`approach of greedily optimizing a closeness function is highly adaptable, and hence
`of interest.
`
`1. If the edge weights are all integers within a limited magnitude range, algorithms with com(cid:173)
`plexity somewhat worse than O (n.5m), but less than O(nm) exist [63]. With m of O(n2) this
`would still lead to an algorithm with complexity worse than O(n2·5).
`
`

`

`

`

`

`

`

`

`2.2 CAD for FPGAs 25
`
`in a local area that are being used. Since the Triptych FPGA is usually unroutable in
`regions where the logic blocks are completely used, maintaining a porosity of 50% or
`so across the FPGA is essential. In [86, 87], Nag and Rutenbar describe a simulated
`annealing based tool that performs placement and routing simultaneously in one com(cid:173)
`bined step. After any swap of blocks, all the affected nets are re-routed via a maze
`router. To keep the CPU time reasonable, this maze router is constrained to look at
`only a small number of potential routes when the temperature is high; at lower tem(cid:173)
`peratures, the router is allowed to spend more time looking for routes.

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket