Splash 2
FPGAs in a Custom Computing Machine

Jeffrey M. Arnold
Walter J. Kleinfelder

IEEE
50 Years of Service · 1946-1996

Xilinx, the Xilinx logo, XC3090, XC4010, XBLOX, XACT, LCA, and Configurable Logic Cell are trademarks of Xilinx, Inc.
CM-2 and Paris are trademarks of Thinking Machines Corporation.
VMEbus is a trademark of Motorola Corporation.
SPARC and SPARCstation are trademarks of SPARC International, Inc. Products bearing a SPARC trademark are based on an architecture developed by Sun Microsystems, Inc. SPARCstation is licensed exclusively to Sun Microsystems, Inc.
UNIX is a trademark of UNIX System Laboratories.
Sun, Sun Workstation, SunOS, and SBus are trademarks of Sun Microsystems, Inc.
Design Compiler and FPGA Compiler are trademarks of Synopsys, Inc.
DEC is a trademark of Digital Equipment Corporation.
Verilog is a trademark of Cadence Design Systems, Inc.

Splash 2
FPGAs in a Custom Computing Machine

Duncan A. Buell
Jeffrey M. Arnold
Walter J. Kleinfelder
Editors

Center for Computing Sciences
Bowie, Maryland

IEEE Computer Society Press
Los Alamitos, California

Washington • Brussels • Tokyo

Library of Congress Cataloging-in-Publication Data

Buell, Duncan A.
    Splash 2: FPGAs in a custom computing machine / Duncan A. Buell,
  Jeffrey M. Arnold, Walter J. Kleinfelder.
      p. cm.
    Includes bibliographical references and index.
    ISBN 0-8186-7413-X
    1. Splash 2 (Computer)   2. Electronic digital computers—Design
  and construction.   I. Arnold, Jeffrey M.   II. Kleinfelder, Walter J.
  III. Title.
  QA76.8.S65B84 1996
  004.2'2—dc20                                            95-47397
                                                               CIP

IEEE Computer Society Press
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1264

Copyright © 1996 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy isolated pages beyond the limits of US copyright law, for private use of their patrons. Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

IEEE Computer Society Press Order Number BP07413
Library of Congress Number 95-47397
ISBN 0-8186-7413-X

Additional copies may be ordered from:

IEEE Computer Society Press, Customer Service Center, 10662 Los Vaqueros Circle, P.O. Box 3014, Los Alamitos, CA 90720-1264. Tel: +1-714-821-8380, Fax: +1-714-821-4641, cs.books@computer.org

IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. Tel: +1-908-981-1393, Fax: +1-908-981-9667, mis.custserv@computer.org

IEEE Computer Society, 13, Avenue de l'Aquilon, B-1200 Brussels, BELGIUM. Tel: +32-2-770-2198, Fax: +32-2-770-8505, euro.ofc@computer.org

IEEE Computer Society, Ooshima Building, 2-19-1 Minami-Aoyama, Minato-ku, Tokyo 107, JAPAN. Tel: +81-3-3408-3118, Fax: +81-3-3408-3553, tokyo.ofc@computer.org

Assistant Publisher: Matt Loeb
Technical Editor: Dharma P. Agrawal
Acquisitions Assistant: Cheryl Smith
Advertising/Promotions: Tom Fink
Production Editor: Lisa O'Conner
Cover Image: Dan Kopetzky, Center for Computing Sciences

Printed in the United States of America

The Institute of Electrical and Electronics Engineers, Inc.
Contents

PREFACE  xi

1  CUSTOM COMPUTING MACHINES: AN INTRODUCTION  1
   1.1  Introduction  1
   1.2  The Context for Splash 2  4
        1.2.1  FPGAs, 4
        1.2.2  Architecture, 5
        1.2.3  Programming, 6

2  THE ARCHITECTURE OF SPLASH 2  10
   2.1  Introduction  10
   2.2  The Building Blocks  11
   2.3  The System Architecture  12
   2.4  Data Paths  13
   2.5  The Splash 2 Array Board  16
        2.5.1  The Linear Array, 16
        2.5.2  The Splash 2 Crossbar, 16
        2.5.3  Xilinx Chip X0 and Broadcast Mode, 17
   2.6  The Interface Board and Control Features  17

3  HARDWARE IMPLEMENTATION  19
   3.1  Introduction  19
   3.2  Development Board Design  21
   3.3  Interface Board Design  21
        3.3.1  DMA Channel, 23
        3.3.2  XL and XR, 23
        3.3.3  Interrupts, 24
        3.3.4  Clock, 24
        3.3.5  Programming and Readback, 24
        3.3.6  Miscellaneous Registers, 25
   3.4  Array Board Design  25
        3.4.1  Processing Element, 26
        3.4.2  Control Element, 28
        3.4.3  External Memory Access, 28
        3.4.4  Crossbar, 28
        3.4.5  Programming and Readback, 29
        3.4.6  Miscellaneous Registers, 29

4  SPLASH 2: THE EVOLUTION OF A NEW ARCHITECTURE  31
   4.1  Splash 1  31
   4.2  Splash 2: Thoughts on a Redesign  34
   4.3  Programming Language  36
   4.4  Choice of FPGAs  37
   4.5  Choice of Host and Bus  38
   4.6  Chip-to-Chip Interconnections  39
   4.7  Multitasking  42
   4.8  Chip X0 and Broadcast  43
   4.9  Other Design Decisions  43

5  SOFTWARE ARCHITECTURE  46
   5.1  Introduction  46
   5.2  Background  47
   5.3  VHDL as a Programming Language  49
        5.3.1  History and Purpose of VHDL, 50
        5.3.2  VHDL Language Features, 50
        5.3.3  Problems with VHDL, 51
   5.4  Software Environment  51
   5.5  Programmer's View of Splash 2  55
        5.5.1  Programming Process, 55
        5.5.2  Processing Element View, 56
        5.5.3  Interface Board View, 57
        5.5.4  Host View, 57

6  SOFTWARE IMPLEMENTATION  60
   6.1  Introduction  60
   6.2  VHDL Environment  60
        6.2.1  Splash 2 VHDL Library, 61
        6.2.2  Standard Entity Declarations, 61
        6.2.3  Programming Style, 64
   6.3  Splash 2 Simulator  66
        6.3.1  Structure, 66
        6.3.2  Configuring the Simulator, 67
        6.3.3  Input and Output, 68
        6.3.4  Crossbar and Memory Models, 68
        6.3.5  Hardware Constraints, 70
   6.4  Compilation  70
        6.4.1  Logic Synthesis, 70
        6.4.2  Physical Mapping, 71
        6.4.3  Debugging Support, 71
   6.5  Runtime System  72
        6.5.1  T2: A Symbolic Debugger, 72
        6.5.2  Runtime Library, 73
        6.5.3  Device Driver, 74
   6.6  Diagnostics  75

7  A DATA PARALLEL PROGRAMMING MODEL  77
   7.1  Introduction  78
   7.2  Data-parallel Bit C  80
        7.2.1  dbC Overview, 80
        7.2.2  dbC Example, 81
   7.3  Compiling from dbC to Splash 2  82
        7.3.1  Creating a Specialized SIMD Engine, 83
        7.3.2  Generic SIMD Code, 84
        7.3.3  Generating VHDL, 84
   7.4  Global Operations  88
        7.4.1  Nearest-Neighbor Communication, 88
        7.4.2  Reduction Operations, 89
        7.4.3  Host/Processor Communication, 91
   7.5  Optimization: Macro Instructions  92
        7.5.1  Creating a Macro Instruction, 93
        7.5.2  Discussion, 94
   7.6  Evaluation: Genetic Database Search  94
   7.7  Conclusions and Future Work  95

8  SEARCHING GENETIC DATABASES ON SPLASH 2  97
   8.1  Introduction  97
        8.1.1  Edit Distance, 98
        8.1.2  Dynamic Programming Algorithm, 98
   8.2  Systolic Sequence Comparison  100
        8.2.1  Bidirectional Array, 100
        8.2.2  Unidirectional Array, 103
   8.3  Implementation  104
        8.3.1  Modular Encoding, 105
        8.3.2  Configurable Parameters, 106
        8.3.3  Bidirectional Array, 107
        8.3.4  Unidirectional Array, 107
   8.4  Benchmarks  107
   8.5  Discussion  108
   8.6  Conclusions  108

9  TEXT SEARCHING ON SPLASH 2  110
   9.1  Introduction  110
   9.2  The Text Searching Algorithm  111
   9.3  Description of the Single-Byte Splash Program  113
   9.4  Timings, Discussion  114
   9.5  Outline of the 16-bit Approach  115
   9.6  Conclusions  116

10 FINGERPRINT MATCHING ON SPLASH 2  117
   10.1  Introduction  117
   10.2  Background  120
         10.2.1  Pattern Recognition Systems, 121
         10.2.2  Terminology, 122
         10.2.3  Stages in AFIS, 123
   10.3  Splash 2 Architecture and Programming Models  125
   10.4  Fingerprint Matching Algorithm  125
         10.4.1  Minutia Matching, 126
         10.4.2  Matching Algorithm, 127
   10.5  Parallel Matching Algorithm  128
         10.5.1  Preprocessing on the Host, 131
         10.5.2  Computations on Splash, 132
         10.5.3  VHDL Specification for X0, 133
   10.6  Simulation and Synthesis Results  134
   10.7  Execution on Splash 2  137
         10.7.1  User Interface, 137
         10.7.2  Performance Analysis, 137
   10.8  Conclusions  139

11 HIGH-SPEED IMAGE PROCESSING WITH SPLASH 2  141
   11.1  Introduction  141
   11.2  The VTSplash System  142
   11.3  Image Processing Terminology and Architectural Issues  143
   11.4  Case Study: Median Filtering  150
   11.5  Case Study: Image Pyramid Generation  153
         11.5.1  Gaussian Pyramid, 154
         11.5.2  Two Implementations for Gaussian Pyramid on Splash 2, 155
         11.5.3  The Hybrid Pipeline Gaussian Pyramid Structure, 157
         11.5.4  The Laplacian Pyramid, 157
         11.5.5  Implementation of the Laplacian Pyramid on Splash 2, 159
   11.6  Performance  159
   11.7  Summary  163

12 THE PROMISE AND THE PROBLEMS  166
   12.1  Some Bottom-Line Conclusions  166
         12.1.1  High Bandwidth I/O Is a Must, 166
         12.1.2  Memory Is a Must, 167
         12.1.3  Programming Is Possible, and Becoming More So, 168
         12.1.4  The Programming Environment Is Crucial, 168
   12.2  To Where from Here?  169
   12.3  If Not Splash 3, Then What?  171
         12.3.1  Architectures, 172
         12.3.2  Custom Processors, 173
         12.3.3  Languages, 174
   12.4  The "Killer" Applications  177
   12.5  Final Words  178

SPLASH 2 DEVELOPMENT—THE PROJECT MANAGER'S SUMMARY  179

AN EXAMPLE APPLICATION  186

REFERENCES  190
Preface

The Splash 2 project began at the Supercomputing Research Center¹ in September of 1991 and ended, with success, in the spring of 1994. Splash 2 is an attached processor system using Xilinx XC4010 FPGAs as its processing elements. As such, it is a custom computing machine. That is to say that much of what would be the instruction set architecture of the processing elements is not specified except in the details of the program developed by the application programmer. Although a higher-level block diagram of processing elements, memories, interconnect, and dataflow exists in the hardware structure of Splash 2, the details of the instruction set architecture level of the machine will vary from one application to the next.

The Splash 2 project is significant for two reasons. First, Splash 2 is part of a complete computer system that has achieved supercomputer-like performance on a number of different applications. By "complete computer system" we mean that SRC created or caused to be created an extant hardware system (replicated a dozen times), a complete programming and runtime environment, and a collection of application programs that exercised the unique hardware.

The second significant aspect of Splash 2 is that we were fortunate enough to be able to build a large system, capable of performing real computations on real problems. One common complaint about performance results on novel computing machines or environments is that results on small problems cannot be accurately extrapolated to large problems. The Splash 2 system that was designed and built is a full-sized machine and does not suffer from this defect.

To get to the point: why a book?

¹Renamed the Center for Computing Sciences in May 1995, but referred to throughout this book as SRC.
This is a novel computing machine. In order to understand what happens when the application programmer is permitted, indeed required, to design the processor architecture of the machine that will execute his program, it is necessary to see the system as a whole. Programmability and problems to be run on this machine both had major influences on its architecture, just as its architecture and its unique nature influence the kinds of problems one would expect to program for this machine and the nature of that programming. And standing between the user and the machine, as the old joke goes, is a new kind of programming environment and an evolving understanding of how this environment must allow the use of the hardware, without forcing every programmer to be a hardware-design engineer.

At the first IEEE Workshop on FPGAs for Custom Computing Machines, one of the industrial attendees remarked that, although nearly everyone would agree, as part of a coffee-room discussion or the like, that it would be interesting to think about building a "computer" using FPGAs, no one in management (except perhaps at SRC and DEC PRL) had put up the commitment necessary in time and money to do a real test of the idea. It was then remarked that, given the nature of the marketplace and of engineering management, these first attempts had to be successful in order to open the door for future work. We feel we have been successful, and we offer in this book an in-depth look at one of the handful of data points in the design space of this new kind of machine.

We would hope that this book has a broad appeal and is readable with understanding by nearly all computer scientists and computer engineers. To the hardware designer, perhaps we can offer a new look at programming applications on a moderately general FPGA-based computing machine instead of designing circuits for a specific board incorporating FPGAs. The engineering world has viewed FPGAs, to a great extent, as the next logical step in a continuum of electronic devices; we offer, we feel, the option of viewing them much more broadly than that. To the computer architect we offer a variant hardware platform and an indication of how that general platform can be used. Much of computer architecture is a compromise between the functionality desired and the limits of what can be built given existing technology; we offer the use of a new technology that can offer, to a limited extent now, and could offer much more generally later, greatly increased functionality. For some of those who have hard problems in computation, we offer much of the power of special-purpose hardware without the inflexibility and uncertain delivery times of hardware. The long-term task is not to map a high-level language to a particular architecture or range of architectures, but in some sense to create for each application program a suitable architecture to which the high-level language will be mapped. And to the language designers and compiler writers we offer a world to conquer. We have presented one imperfect but usable approach to programming such a computing machine, and we trust that others interested in the critical problem of making these machines programmable can learn both from what we did right and what we did wrong.

Chapter 1 discusses the general concept of Custom Computing Machines, of which Splash 2 is one example. Chapters 2 and 3 describe at a high level and then in some detail the hardware architecture of Splash 2. Chapter 4 covers the design considerations and decisions in arriving at the second-generation Splash 2 architecture. We present this chapter at the end of the section on hardware, on the basis that it is easier to understand variations in a design when those variations are compared against something concrete.
Chapters 5 and 6 describe, also first at a higher and then at a lower level, the software architecture of Splash 2. All the application programs in the latter chapters were done using VHDL as an applications programming language and these tools in support. The main goal of the Splash 2 project was to show that software, as described in Chapters 5 and 6, could make a computer using FPGAs as its processing elements into something that reasonable people would call "programmable," and, in that sense, the heart of the Splash 2 project is in Chapters 5 and 6. Throughout the life of Splash 2, however, there has been an alternative view of programming. This view is reflected in Chapter 7 on the Splash 2 version of the programming language dbC. The approach taken in dbC is to permit the programmer to use a version of C as the programming language. It is the compiler which then becomes responsible for, in essence, recognizing the instruction set architecture necessary to execute the program and then creating in the FPGA the requisite registers, logic units, and control.

Chapters 8 through 11 then describe four different applications programmed to conclusion on Splash 2. The first of these—the sequence comparison problem—was the driving application, in the sense that funding for Splash 2 was based on its perceived ability to perform this computation. This and the text processing application were done at SRC.

The Splash 2 project team was fortunate in that SRC's parent company, the Institute for Defense Analyses, issued two contracts, to Virginia Polytechnic Institute and to Michigan State University, for applications work on Splash 2 in image processing and fingerprint identification. Both applications seemed good matches with the Splash 2 architecture but lay outside the normal realm of SRC's research program. The faculty members involved have each prepared a chapter on these applications.

We close in Chapter 12 with some opinions and speculations about the future. In an appendix, the project manager presents a chronology of the entire Splash 2 project.

It is incumbent on us, and a genuine pleasure, to thank the Center for Computing Sciences of the Institute for Defense Analyses and the CCS Director, Francis Sullivan, for supporting us in our writing and editing of this book. All royalties will be donated to the Center for Excellence in Education, formerly known as the Rickover Foundation, in McLean, Virginia. The Center for Excellence in Education supports science and engineering education through its sponsorship of the Research Science Institute each summer for high school seniors, its Role Models and Leaders Project in Washington, D.C., Los Angeles, and Chicago for promising women and minority high school students intending to study science and engineering, and its support and preparation of the United States Informatics Olympiad team each year.

The Splash 2 project was a success in large part due to the ability of those who were involved nearly full-time, but it might not have taken the course it did had the hard-core Splash 2 players not had the chance to get advice and occasional help from a much larger group of experts both at SRC and elsewhere.

We acknowledge, therefore, the help and advice of Nate Bronson, Dan Burns, Bill Carlson, Neil Coletti, Maripat Corr, Steve Cuccaro, Hillory Dean, Chuck Fiduccia, Brad Fross, Charles Goedeke, Maya Gokhale, Frank Hady, Dzung Hoang, Bill Holmes, Amy Johnston, Elaine (Davis) Keith, Dan Kopetzky, Andy Kopser, Steve Kratzer, Jim Kuehn, Sara Lucas, Michael Mascagni, Marge McGarry, John McHenry, Ron Minnich, Lindy Moran, Fred More, Mark Norder, Lou Podrazik, Dan Pryor, Craig Reese, Paul Schneck, Brian Schott, Nabeel Shirazi, Doug Sweely, Dennis Sysko, Mark Thistle, Bob Thompson, Ken Wallgren, Alice Yuen, Neal Ziring, and Jennifer Zito.

Duncan A. Buell
Jeffrey M. Arnold
Walter J. Kleinfelder
Bowie, Maryland
March 1996
CHAPTER 1

Custom Computing Machines: An Introduction

Duncan A. Buell¹

¹A version of this chapter appeared as Buell and Pocek [11] and is used with permission.
1.1 INTRODUCTION

It is a basic observation about computing that generality and efficiency are in some sense inversely related to one another; the more general-purpose an object is and thus the greater the number of tasks it can perform, the less efficient it will be in performing any of those specific tasks. Design decisions are therefore almost always compromises; designers identify key features or applications for which competitive efficiency is a must and then expand the range as far as is practicable without unduly damaging performance on the main targets.

This thesis has certainly been true in processor architecture of computers aimed at computationally intense problems. Vector processors and vector supercomputers have targeted computationally intense, array-oriented floating point problems, usually in the hard sciences and engineering, but have not sacrificed the necessary speed on their core applications in order to be all things to all people. Thus, on computationally intense problems that do not fit well on traditional supercomputers, perhaps due to such things as integer arithmetic or scalar code, fast workstations can often outperform supercomputers.

To counter the problem of computationally intense problems for which general-purpose machines cannot achieve the necessary performance, special-purpose processors, attached processors, and coprocessors have been built for many years, especially in such areas as image or signal processing (for which many of the computational tasks can be very well defined). The problem with such machines is that they are special-purpose; as problems change or new ideas and techniques develop, their lack of flexibility makes them problematic as long-term solutions.

Enter the FPGA. Field Programmable Gate Arrays, first introduced in 1986 by Xilinx [34], were seen rather immediately by a few people to offer a totally new avenue to explore in the world of processor engineering. The great strength of the computer as a tool is in its ability to be adapted, via programming, to a multitude of computational tasks. The possibility now existed for an FPGA-based computing device not only to be configured to act like special-purpose hardware for a particular problem, but to be reconfigured for different problems and for this reconfiguration to be a programming process. By being more than single-purpose, such a machine would have the advantage of being flexible with at least a limited range of different applications. By being programmable, such a machine would open up "design of high-performance hardware" to individuals who can "design hardware" in an abstract sense but not a concrete sense. Finally, by being designed to operate as if they were hardware, the applications for these machines can achieve the hardware-like performance one gets from having explicitly parallel computations, from not having instructions and data fetched and decoded, and from having the ability to design processing units that reflect precisely the processing being done.

It is no exaggeration to say that machines using FPGAs as their processing elements have demonstrated that very high performance on an absolute scale, and extraordinary performance when measured against price, is possible with this technology. The PeRLe machines built at DEC's Paris Research Laboratory have been programmed on a number of applications with impressive results [7, 8, 9, 31]: An implemented multiplier can compute a 50-coefficient, 16-bit polynomial convolution FIR filter at 16 times audio real time. An implementation of RSA decryption executes at 10 times the bit rate of an AT&T ASIC. A Hough transform implementation for an application in high-energy physics achieves a compute rate that a standard 64-bit computer could not equal without a 1.2 GHz clock rate.

As can be seen from the later chapters of this book, some of the applications programmed on Splash 2 have achieved similarly promising results. It was a general observation made by those involved in the Splash 2 project that, on applications that fit the machine, one Splash 2 Array Board could deliver approximately the compute power of one Cray YMP processor. One of the commercial licensees of the Splash 2 technology sells its system for about $40,000. Of course, not all applications fit well, and most that do not fit well actually fit very badly indeed, but this is nonetheless a performance-to-price ratio substantial enough to warrant continued investigation and experimentation.

The idea of reconfiguring a computer is certainly not new. The Burroughs B1700 had multiple instruction sets with different targets (Fortran, COBOL, and such) implemented with different microcode. In another way, the Paris functions of the Thinking Machines Corp. CM-2 differed from one version to the next. In the former case, standard views of hardware instructions were implemented. In the latter case, with a novel machine and a new architecture, we presume that an effort was made to implement function calls that users were seen to need and to delete unneeded functions when the instruction store ran out.
A certain amount of disagreement exists over what label to give to these machines and how to refer to them. The group at DEC's Paris research lab refers to their machine as a Programmable Active Memory (PAM) [7, 9, 29, 31]. Another commercial entity uses the term "virtual computer" [12]. From Brown University we have Processor Reconfiguration through Instruction Set Metamorphosis (PRISM) [4, 5, 6, 32]. We have already used the term "FPGA-based computing device" here. That none of these is truly satisfactory was evidenced in the spring of 1994 when the comp.arch.fpga newsgroup was discussed and established; the most contentious point was over the name. Both the new newsgroup and the IEEE workshops we have organized use the term FPGA, thus being in some sense bound in terminology to a particular technology (unless, of course, one can convince the developers of the next technology that its name should allow FPGA as its acronym). We have chosen to use the term Custom Computing Machine (CCM). None of these terms is perfect, but we believe that this one is no worse than any of the others.

The work on CCMs also differs from what is considered reconfigurable computer architecture, in that the term "reconfigurable" usually refers to a much higher level of the system. In the case of CCMs, that which is reconfigurable and significant by virtue of its reconfigurability is the "processor architecture" itself. A reconfigurable computer, by contrast, is likely to be either a multiprocessor in which the interconnections among the processors can change, or a heterogeneous machine with processors of different kinds that a user can choose to include or exclude in the view he/she has of "the computer."

It is to be emphasized that this is not a mature computing technology and that CCMs are not a panacea for all problems in high-performance computing. Among the many issues and problems are the following:

1. Are FPGAs large enough, or will they become large enough, so that a significant unit of computation can be put on a single chip so as to avoid both the loss of efficiency in going off-chip and the problems in partitioning a computation across multiple chips?

2. With current technology, even in the best of circumstances, the user must be exposed to the hardware itself. What is the level of understanding about chip architecture, signal routing delays, and so forth, that a "programmer" must know in order to use a CCM? How much more must be known in order to obtain the performance that would warrant using a CCM instead of a general-purpose computer?

3. If these machines are to be viewed as "computers," then they must be capable of being programmed. How will this be done? What sort of programming language is appropriate? How do we "compile" when we have eliminated the underlying machine model?

4. Granting the point that these are limited-purpose, but not special-purpose, machines, what is the range of architectures needed to cover the spectrum of applications for which these machines make sense?

5. What are the cost/performance figures necessary to make this a viable approach for getting a computing task done? General-purpose machines are cheaper and easier to use but can be slow. ASICs and special-purpose devices are faster but more expensive in small quantities, take longer to develop, and are hard to modify. Where might CCMs fit between these two?
1.2 THE CONTEXT FOR SPLASH 2

One problem faced by those involved in Splash 1 and Splash 2 at the Supercomputing Research Center² has been a stubborn refusal from some quarters to believe that achieving high performance on a CCM is possible without a design or programming agony so great as to be offputting to all but the most dedicated of "application designers/programmers." Even in the face of our evidence to the contrary, the case has been difficult to make to some critics.

The case is especially hard because what is needed is to build a complete hardware system, to create or cause to be created a programming environment worthy of being called a programming environment, and then to develop a variety of different applications so that the proof of concept is complete. Further, since the goal is to demonstrate a competitive performance with more expensive and sophisticated machines, the CCM must be big enough to do real work and to be part of complete computational processing environments; it cannot be just a toy machine suitable only for doing kernels of problems. To our knowledge, only two such machines have been built that meet these criteria—Splash 2 and the larger of the DEC PAM systems.

The goal in this book is to present the Splash 2 experience as a whole. Splash 2 was designed and developed in an iterative process from top to bottom to top and back again.

²The Supercomputing Research Center was renamed the Center for Computing Sciences in May 1995, but will be referred to throughout this book as SRC.

1.2.1 FPGAs
FPGAs in general have a wide spectrum of characteristics, but the FPGAs used for CCMs have been of two distinct types. The Xilinx XC4010, a typical example of one type, is a chip containing a 20 × 20 square array of Configurable Logic Blocks (CLBs) [34]. Each CLB can serve one of three functions, either as two flipflops, or as Boolean functions of nine inputs, or as 32 bits of RAM. The function use has two 4-input functions, each producing an output; these two bits combine with a ninth input in a 3-input Boolean function. The RAM usage simply takes advantage of the fact that the 4-input functions are done with lookup tables to allow the input bits to be viewed as addresses.
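To make this structure concrete, the following is a minimal VHDL sketch, of the kind used for Splash 2 applications later in this book, of a nine-input Boolean function shaped the way a single CLB computes it. The entity name, port names, and the particular logic expressions are our own invented illustration, not code from the Splash 2 libraries or the Xilinx documentation.

  -- Illustrative sketch only (hypothetical names and logic): a 9-input
  -- Boolean function of the shape one XC4010 CLB implements.  The two
  -- 4-input terms t1 and t2 correspond to the CLB's two 4-input lookup
  -- tables; the final assignment is the 3-input function of t1, t2, and
  -- the ninth input.
  library IEEE;
  use IEEE.std_logic_1164.all;

  entity clb_shaped_function is
    port (a, b, c, d, e, f, g, h, i : in  std_logic;
          y                         : out std_logic);
  end clb_shaped_function;

  architecture sketch of clb_shaped_function is
    signal t1, t2 : std_logic;
  begin
    t1 <= (a and b) xor (c and d);   -- first 4-input lookup table
    t2 <= (e or f) and (g xor h);    -- second 4-input lookup table
    y  <= t1 when i = '1' else t2;   -- 3-input function of t1, t2, and i
  end sketch;

In the device itself each 4-input function is simply a 16-entry lookup table addressed by its four inputs, which is why the same CLB can instead serve as 32 bits of RAM.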
Connecting the CLBs to one another and to special Input Output Blocks (IOBs) on the periphery of the chip are routing resources running from CLB to CLB, skipping one CLB, or running the full length of the chip. Configuration of the FPGA is done by loading a bit file onto on-chip RAM.

In contrast to the relatively coarse granularity of the Xilinx chips, the second type of FPGA, by Algotronix, Ltd., and by Concurrent Logic, Inc. [1, 13], is fine-grained. The Algotronix chip is a 32 × 32 array of 2-input, 1-output Boolean function logic cells, with the signal lines running only point to point from one cell to its neighbors in each rectangular direction.³
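The granularity contrast can be seen in the size of the unit a programmer describes. The sketch below, again our own hypothetical illustration rather than vendor code, is the entire logic content of one fine-grained cell: a single 2-input function whose inputs and output travel only between rectangular neighbors, with none of the longer-distance routing the XC4010 provides.

  -- Illustrative sketch only (hypothetical names): one fine-grained cell.
  library IEEE;
  use IEEE.std_logic_1164.all;

  entity fine_grained_cell is
    port (from_north, from_west : in  std_logic;
          to_neighbors          : out std_logic);
  end fine_grained_cell;

  architecture sketch of fine_grained_cell is
  begin
    to_neighbors <= from_north xor from_west;  -- the cell's one 2-input function
  end sketch;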
³Algotronix is now a part of Xilinx.
To a first-order approximation, the chips first marketed by Concurrent Logic and now by Atmel resemble the Algotronix chip. Interestingly enough, the unscientific best-guess estimates at the SRC in the fall of 1991, when S

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket