[Scanned cover page. Legible elements: Library of Congress call number QA 76.8 .S68 1996; the title "Splash 2"; the editors' names (Duncan A. Buell, Jeffrey M. Arnold, Walter J. Kleinfelder); the IEEE logo; and the IEEE Computer Society Press 50th-anniversary line, "50 Years of Service, 1946-1996".]

Petitioner Microsoft Corporation - Ex. 1007, Cover 1
Splash 2
FPGAs in a Custom
Computing Machine
Xilinx, the Xilinx logo, XC3090, XC4010, XBLOX, XACT, LCA, and Configurable Logic Cell are trademarks of Xilinx, Inc.
CM-2 and Paris are trademarks of Thinking Machines Corporation.
VMEbus is a trademark of Motorola Corporation.
SPARC and SPARCstation are trademarks of SPARC International, Inc. Products bearing a SPARC trademark are based on an architecture developed by Sun Microsystems, Inc. SPARCstation is licensed exclusively to Sun Microsystems, Inc.
UNIX is a trademark of UNIX System Laboratories.
Sun, Sun Workstation, SunOS, and SBus are trademarks of Sun Microsystems, Inc.
Design Compiler and FPGA Compiler are trademarks of Synopsys, Inc.
DEC is a trademark of Digital Equipment Corporation.
Verilog is a trademark of Cadence Design Systems, Inc.
Splash 2
FPGAs in a Custom
Computing Machine

Duncan A. Buell
Jeffrey M. Arnold
Walter J. Kleinfelder
Editors

Center for Computing Sciences
Bowie, Maryland

IEEE Computer Society Press
Los Alamitos, California • Washington • Brussels • Tokyo
Library of Congress Cataloging-in-Publication Data

Buell, Duncan A.
  Splash 2: FPGAs in a custom computing machine / Duncan A. Buell,
Jeffrey M. Arnold, Walter J. Kleinfelder.
    p.  cm.
  Includes bibliographical references and index.
  ISBN 0-8186-7413-X
  1. Splash 2 (Computer)  2. Electronic digital computers—Design
and construction.  I. Arnold, Jeffrey M.  II. Kleinfelder, Walter J.
III. Title.
QA76.8.S68B84  1996
004.2'2—dc20                                        95-47397
                                                         CIP

IEEE Computer Society Press
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1264

Copyright © 1996 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy isolated pages beyond the limits of U.S. copyright law, for private use of their patrons. Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

IEEE Computer Society Press Order Number BP07413
Library of Congress Number 95-47397
ISBN 0-8186-7413-X
IEEE Computer Society Press
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1264
Tel: +1-714-821-8380
Fax: +1-714-821-4641
Email: cs.books@computer.org

Additional copies may be ordered from:

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1-908-981-1393
Fax: +1-908-981-9667
mis.custserv@computer.org

IEEE Computer Society
13, Avenue de l'Aquilon
B-1200 Brussels
BELGIUM
Tel: +32-2-770-2198
Fax: +32-2-770-8505
euro.ofc@computer.org

IEEE Computer Society
Ooshima Building
2-19-1 Minami-Aoyama
Minato-ku, Tokyo 107
JAPAN
Tel: +81-3-3408-3118
Fax: +81-3-3408-3553
tokyo.ofc@computer.org

Assistant Publisher: Matt Loeb
Technical Editor: Dharma P. Agrawal
Acquisitions Assistant: Cheryl Smith
Advertising/Promotions: Tom Fink
Production Editor: Lisa O'Conner

Cover Image: Dan Kopetzky, Center for Computing Sciences

Printed in the United States of America

© The Institute of Electrical and Electronics Engineers, Inc.
Contents

PREFACE  xi

1  CUSTOM COMPUTING MACHINES: AN INTRODUCTION  1
   1.1  Introduction  1
   1.2  The Context for Splash 2  4
        1.2.1  FPGAs, 4
        1.2.2  Architecture, 5
        1.2.3  Programming, 6

2  THE ARCHITECTURE OF SPLASH 2  10
   2.1  Introduction  10
   2.2  The Building Blocks  11
   2.3  The System Architecture  12
   2.4  Data Paths  13
   2.5  The Splash 2 Array Board  16
        2.5.1  The Linear Array, 16
        2.5.2  The Splash 2 Crossbar, 16
        2.5.3  Xilinx Chip X0 and Broadcast Mode, 17
   2.6  The Interface Board and Control Features  17
3  HARDWARE IMPLEMENTATION  19
   3.1  Introduction  19
   3.2  Development Board Design  21
   3.3  Interface Board Design  21
        3.3.1  DMA Channel, 23
        3.3.2  XL and XR, 23
        3.3.3  Interrupts, 24
        3.3.4  Clock, 24
        3.3.5  Programming and Readback, 24
        3.3.6  Miscellaneous Registers, 25
   3.4  Array Board Design  25
        3.4.1  Processing Element, 26
        3.4.2  Control Element, 28
        3.4.3  External Memory Access, 28
        3.4.4  Crossbar, 28
        3.4.5  Programming and Readback, 29
        3.4.6  Miscellaneous Registers, 29

4  SPLASH 2: THE EVOLUTION OF A NEW ARCHITECTURE  31
   4.1  Splash 1  31
   4.2  Splash 2: Thoughts on a Redesign  34
   4.3  Programming Language  36
   4.4  Choice of FPGAs  37
   4.5  Choice of Host and Bus  38
   4.6  Chip-to-Chip Interconnections  39
   4.7  Multitasking  42
   4.8  Chip X0 and Broadcast  43
   4.9  Other Design Decisions  43

5  SOFTWARE ARCHITECTURE  46
   5.1  Introduction  46
   5.2  Background  47
   5.3  VHDL as a Programming Language  49
        5.3.1  History and Purpose of VHDL, 50
        5.3.2  VHDL Language Features, 50
        5.3.3  Problems with VHDL, 51
   5.4  Software Environment  51
   5.5  Programmer's View of Splash 2  55
        5.5.1  Programming Process, 55
        5.5.2  Processing Element View, 56
        5.5.3  Interface Board View, 57
        5.5.4  Host View, 57

6  SOFTWARE IMPLEMENTATION  60
   6.1  Introduction  60
   6.2  VHDL Environment  60
        6.2.1  Splash 2 VHDL Library, 61
        6.2.2  Standard Entity Declarations, 61
        6.2.3  Programming Style, 64
   6.3  Splash 2 Simulator  66
        6.3.1  Structure, 66
        6.3.2  Configuring the Simulator, 67
        6.3.3  Input and Output, 68
        6.3.4  Crossbar and Memory Models, 68
        6.3.5  Hardware Constraints, 70
   6.4  Compilation  70
        6.4.1  Logic Synthesis, 70
        6.4.2  Physical Mapping, 71
        6.4.3  Debugging Support, 71
   6.5  Runtime System  72
        6.5.1  T2: A Symbolic Debugger, 72
        6.5.2  Runtime Library, 73
        6.5.3  Device Driver, 74
   6.6  Diagnostics  75

7  A DATA PARALLEL PROGRAMMING MODEL  77
   7.1  Introduction  78
   7.2  Data-parallel Bit C  80
        7.2.1  dbC Overview, 80
        7.2.2  dbC Example, 81
   7.3  Compiling from dbC to Splash 2  82
        7.3.1  Creating a Specialized SIMD Engine, 83
        7.3.2  Generic SIMD Code, 84
        7.3.3  Generating VHDL, 84
   7.4  Global Operations  88
        7.4.1  Nearest-Neighbor Communication, 88
        7.4.2  Reduction Operations, 89
        7.4.3  Host/Processor Communication, 91
   7.5  Optimization: Macro Instructions  92
        7.5.1  Creating a Macro Instruction, 93
        7.5.2  Discussion, 94
   7.6  Evaluation: Genetic Database Search  94
   7.7  Conclusions and Future Work  95

8  SEARCHING GENETIC DATABASES ON SPLASH 2  97
   8.1  Introduction  97
        8.1.1  Edit Distance, 98
        8.1.2  Dynamic Programming Algorithm, 98
   8.2  Systolic Sequence Comparison  100
        8.2.1  Bidirectional Array, 100
        8.2.2  Unidirectional Array, 103
   8.3  Implementation  104
        8.3.1  Modular Encoding, 105
        8.3.2  Configurable Parameters, 106
        8.3.3  Bidirectional Array, 107
        8.3.4  Unidirectional Array, 107
   8.4  Benchmarks  107
   8.5  Discussion  108
   8.6  Conclusions  108

9  TEXT SEARCHING ON SPLASH 2  110
   9.1  Introduction  110
   9.2  The Text Searching Algorithm  111
   9.3  Description of the Single-Byte Splash Program  113
   9.4  Timings, Discussion  114
   9.5  Outline of the 16-bit Approach  115
   9.6  Conclusions  116

10  FINGERPRINT MATCHING ON SPLASH 2  117
   10.1  Introduction  117
   10.2  Background  120
        10.2.1  Pattern Recognition Systems, 121
        10.2.2  Terminology, 122
        10.2.3  Stages in AFIS, 123
   10.3  Splash 2 Architecture and Programming Models  125
   10.4  Fingerprint Matching Algorithm  125
        10.4.1  Minutia Matching, 126
        10.4.2  Matching Algorithm, 127
   10.5  Parallel Matching Algorithm  128
        10.5.1  Preprocessing on the Host, 131
        10.5.2  Computations on Splash, 132
        10.5.3  VHDL Specification for X0, 133
   10.6  Simulation and Synthesis Results  134
   10.7  Execution on Splash 2  137
        10.7.1  User Interface, 137
        10.7.2  Performance Analysis, 137
   10.8  Conclusions  139

11  HIGH-SPEED IMAGE PROCESSING WITH SPLASH 2  141
   11.1  Introduction  141
   11.2  The VTSplash System  142
   11.3  Image Processing Terminology and Architectural Issues  143
   11.4  Case Study: Median Filtering  150
   11.5  Case Study: Image Pyramid Generation  153
        11.5.1  Gaussian Pyramid, 154
        11.5.2  Two Implementations for Gaussian Pyramid on Splash 2, 155
        11.5.3  The Hybrid Pipeline Gaussian Pyramid Structure, 157
        11.5.4  The Laplacian Pyramid, 157
        11.5.5  Implementation of the Laplacian Pyramid on Splash 2, 159
   11.6  Performance  159
   11.7  Summary  163

12  THE PROMISE AND THE PROBLEMS  166
   12.1  Some Bottom-Line Conclusions  166
        12.1.1  High Bandwidth I/O Is a Must, 166
        12.1.2  Memory Is a Must, 167
        12.1.3  Programming Is Possible, and Becoming More So, 168
        12.1.4  The Programming Environment Is Crucial, 168
   12.2  To Where from Here?  169
   12.3  If Not Splash 3, Then What?  171
        12.3.1  Architectures, 172
        12.3.2  Custom Processors, 173
        12.3.3  Languages, 174
   12.4  The "Killer" Applications  177
   12.5  Final Words  178

A  SPLASH 2 DEVELOPMENT—THE PROJECT MANAGER'S SUMMARY  179

B  AN EXAMPLE APPLICATION  186

REFERENCES  190
Preface

The Splash 2 project began at the Supercomputing Research Center¹ in September of 1991 and ended, with success, in the spring of 1994. Splash 2 is an attached processor system using Xilinx XC4010 FPGAs as its processing elements. As such, it is a custom computing machine. That is to say that much of what would be the instruction set architecture of the processing elements is not specified except in the details of the program developed by the application programmer. Although a higher-level block diagram of processing elements, memories, interconnect, and dataflow exists in the hardware structure of Splash 2, the details of the instruction set architecture level of the machine will vary from one application to the next.

The Splash 2 project is significant for two reasons. First, Splash 2 is part of a complete computer system that has achieved supercomputer-like performance on a number of different applications. By "complete computer system" we mean that SRC created or caused to be created an extant hardware system (replicated a dozen times), a complete programming and runtime environment, and a collection of application programs that exercised the unique hardware.

The second significant aspect of Splash 2 is that we were fortunate enough to be able to build a large system, capable of performing real computations on real problems. One common complaint about performance results on novel computing machines or environments is that results on small problems cannot be accurately extrapolated to large problems. The Splash 2 system that was designed and built is a full-sized machine and does not suffer from this defect.

To get to the point: why a book?

¹Renamed the Center for Computing Sciences in May 1995, but referred to throughout this book as SRC.
This is a novel computing machine. In order to understand what happens when the application programmer is permitted, indeed required, to design the processor architecture of the machine that will execute his program, it is necessary to see the system as a whole. Programmability and problems to be run on this machine both had major influences on its architecture, just as its architecture and its unique nature influence the kinds of problems one would expect to program for this machine and the nature of that programming. And standing between the user and the machine, as the old joke goes, is a new kind of programming environment and an evolving understanding of how this environment must allow the use of the hardware, without forcing every programmer to be a hardware-design engineer.

At the first IEEE Workshop on FPGAs for Custom Computing Machines, one of the industrial attendees remarked that, although nearly everyone would agree, as part of a coffee-room discussion or the like, that it would be interesting to think about building a "computer" using FPGAs, no one in management (except perhaps at SRC and DEC PRL) had put up the commitment necessary in time and money to do a real test of the idea. It was then remarked that, given the nature of the marketplace and of engineering management, these first attempts had to be successful in order to open the door for future work. We feel we have been successful, and we offer in this book an in-depth look at one of the handful of data points in the design space of this new kind of machine.

We would hope that this book has a broad appeal and is readable with understanding by nearly all computer scientists and computer engineers. To the hardware designer, perhaps we can offer a new look at programming applications on a moderately general FPGA-based computing machine instead of designing circuits for a specific board incorporating FPGAs. The engineering world has viewed FPGAs, to a great extent, as the next logical step in a continuum of electronic devices; we offer, we feel, the option of viewing them much more broadly than that. To the computer architect we offer a variant hardware platform and an indication of how that general platform can be used. Much of computer architecture is a compromise between the functionality desired and the limits of what can be built given existing technology; we offer the use of a new technology that can offer, to a limited extent now, and could offer much more generally later, greatly increased functionality. For some of those who have hard problems in computation, we offer much of the power of special-purpose hardware without the inflexibility and uncertain delivery times of hardware. The long-term task is not to map a high-level language to a particular architecture or range of architectures, but in some sense to create for each application program a suitable architecture to which the high-level language will be mapped. And to the language designers and compiler writers we offer a world to conquer. We have presented one imperfect but usable approach to programming such a computing machine, and we trust that others interested in the critical problem of making these machines programmable can learn both from what we did right and what we did wrong.

Chapter 1 discusses the general concept of Custom Computing Machines, of which Splash 2 is one example. Chapters 2 and 3 describe at a high level and then in some detail the hardware architecture of Splash 2. Chapter 4 covers the design considerations and decisions in arriving at the second-generation Splash 2 architecture. We present this chapter at the end of the section on hardware, on the basis that it is easier to understand variations in a design when those variations are compared against something concrete.
Chapters 5 and 6 describe, also first at a higher and then at a lower level, the software architecture of Splash 2. All the application programs in the latter chapters were done using VHDL as an applications programming language and these tools in support. The main goal of the Splash 2 project was to show that software, as described in Chapters 5 and 6, could make a computer using FPGAs as its processing elements into something that reasonable people would call "programmable," and, in that sense, the heart of the Splash 2 project is in Chapters 5 and 6. Throughout the life of Splash 2, however, there has been an alternative view of programming. This view is reflected in Chapter 7 on the Splash 2 version of the programming language dbC. The approach taken in dbC is to permit the programmer to use a version of C as the programming language. It is the compiler which then becomes responsible for, in essence, recognizing the instruction set architecture necessary to execute the program and then creating in the FPGA the requisite registers, logic units, and control.

Chapters 8 through 11 then describe four different applications programmed to conclusion on Splash 2. The first of these, the sequence comparison problem, was the driving application, in the sense that funding for Splash 2 was based on its perceived ability to perform this computation. This and the text processing application were done at SRC.

The Splash 2 project team was fortunate in that SRC's parent company, the Institute for Defense Analyses, issued two contracts, to Virginia Polytechnic Institute and to Michigan State University, for applications work on Splash 2 in image processing and fingerprint identification. Both applications seemed good matches with the Splash 2 architecture but lay outside the normal realm of SRC's research program. The faculty members involved have each prepared a chapter on these applications.

We close in Chapter 12 with some opinions and speculations about the future. In an appendix, the project manager presents a chronology of the entire Splash 2 project.

It is incumbent on us, and a genuine pleasure, to thank the Center for Computing Sciences of the Institute for Defense Analyses and the CCS Director, Francis Sullivan, for supporting us in our writing and editing of this book. All royalties will be donated to the Center for Excellence in Education, formerly known as the Rickover Foundation, in McLean, Virginia. The Center for Excellence in Education supports science and engineering education through its sponsorship of the Research Science Institute each summer for high school seniors, its Role Models and Leaders Project in Washington, DC, Los Angeles, and Chicago for promising women and minority high school students intending to study science and engineering, and its support and preparation of the United States Informatics Olympiad team each year.

The Splash 2 project was a success in large part due to the ability of those who were involved nearly full-time, but it might not have taken the course it did had the hard-core Splash 2 players not had the chance to get advice and occasional help from a much larger group of experts both at SRC and elsewhere.

We acknowledge, therefore, the help and advice of Nate Bronson, Dan Burns, Bill Carlson, Neil Coletti, Maripat Corr, Steve Cuccaro, Hillory Dean, Chuck Fiduccia, Brad Fross, Charles Goedeke, Maya Gokhale, Frank Hady, Dzung Hoang, Bill Holmes, Amy Johnston, Elaine (Davis) Keith, Dan Kopetzky, Andy Kopser, Steve Kratzer, Jim Kuehn, Sara Lucas, Michael Mascagni, Marge McGarry, John McHenry,
Ron Minnich, Lindy Moran, Fred More, Mark Norder, Lou Podrazik, Dan Pryor, Craig Reese, Paul Schneck, Brian Schott, Nabeel Shirazi, Doug Sweely, Dennis Sysko, Mark Thistle, Bob Thompson, Ken Wallgren, Alice Yuen, Neal Ziring, and Jennifer Zito.

Duncan A. Buell
Jeffrey M. Arnold
Walter J. Kleinfelder
Bowie, Maryland
March 1996
CHAPTER 1

Custom Computing
Machines: An Introduction

Duncan A. Buell¹

1.1 INTRODUCTION

It is a basic observation about computing that generality and efficiency are in some sense inversely related to one another; the more general-purpose an object is, and thus the greater the number of tasks it can perform, the less efficient it will be in performing any of those specific tasks. Design decisions are therefore almost always compromises; designers identify key features or applications for which competitive efficiency is a must and then expand the range as far as is practicable without unduly damaging performance on the main targets.

This thesis has certainly been true in the processor architecture of computers aimed at computationally intense problems. Vector processors and vector supercomputers have targeted computationally intense, array-oriented floating point problems, usually in the hard sciences and engineering, but have not sacrificed the necessary speed on their core applications in order to be all things to all people. Thus, on computationally intense problems that do not fit well on traditional supercomputers, perhaps due to such things as integer arithmetic or scalar code, fast workstations can often outperform supercomputers.

To counter the problem of computationally intense problems for which general-purpose machines cannot achieve the necessary performance, special-purpose processors, attached processors, and coprocessors have been built for many years, especially

¹A version of this chapter appeared as Buell and Pocek [11] and is used with permission.
in such areas as image or signal processing (for which many of the computational tasks can be very well defined). The problem with such machines is that they are special-purpose; as problems change or new ideas and techniques develop, their lack of flexibility makes them problematic as long-term solutions.

Enter the FPGA. Field Programmable Gate Arrays, first introduced in 1986 by Xilinx [34], were seen rather immediately by a few people to offer a totally new avenue to explore in the world of processor engineering. The great strength of the computer as a tool is in its ability to be adapted, via programming, to a multitude of computational tasks. The possibility now existed for an FPGA-based computing device not only to be configured to act like special-purpose hardware for a particular problem, but to be reconfigured for different problems and for this reconfiguration to be a programming process. By being more than single-purpose, such a machine would have the advantage of being flexible with at least a limited range of different applications. By being programmable, such a machine would open up "design of high-performance hardware" to individuals who can "design hardware" in an abstract sense but not a concrete sense. Finally, by being designed to operate as if they were hardware, the applications for these machines can achieve the hardware-like performance one gets from having explicitly parallel computations, from not having instructions and data fetched and decoded, and from having the ability to design processing units that reflect precisely the processing being done.

It is no exaggeration to say that machines using FPGAs as their processing elements have demonstrated that very high performance on an absolute scale, and extraordinary performance when measured against price, is possible with this technology. The PeRLe machines built at DEC's Paris Research Laboratory have been programmed on a number of applications with impressive results [7, 8, 9, 31]: An implemented multiplier can compute a 50-coefficient, 16-bit polynomial convolution FIR filter at 16 times audio real time. An implementation of RSA decryption executes at 10 times the bit rate of an AT&T ASIC. A Hough Transform implementation for an application in high-energy physics achieves a compute rate that a standard 64-bit computer could not equal without a 1.2 GHz clock rate.

As can be seen from the later chapters of this book, some of the applications programmed on Splash 2 have achieved similarly promising results. It was a general observation made by those involved in the Splash 2 project that, on applications that fit the machine, one Splash 2 Array Board could deliver approximately the compute power of one Cray YMP processor. One of the commercial licensees of the Splash 2 technology sells its system for about $40,000. Of course, not all applications fit well, and most that do not fit well actually fit very badly indeed, but this is nonetheless a performance-to-price ratio substantial enough to warrant continued investigation and experimentation.

The idea of reconfiguring a computer is certainly not new. The Burroughs B1700 had multiple instruction sets with different targets (Fortran, COBOL, and such) implemented with different microcode. In another way, the Paris functions of the Thinking Machines Corp. CM-2 differed from one version to the next. In the former case, standard views of hardware instructions were implemented. In the latter case, with a novel machine and a new architecture, we presume that an effort was made to implement function calls that users were seen to need and to delete unneeded functions when the instruction store ran out.
A certain amount of disagreement exists over what label to give to these machines and how to refer to them. The group at DEC's Paris research lab refers to their machine as a Programmable Active Memory (PAM) [7, 9, 29, 31]. Another commercial entity uses the term "virtual computer" [12]. From Brown University we have Processor Reconfiguration through Instruction Set Metamorphosis (PRISM) [4, 5, 6, 32]. We have already used the term "FPGA-based computing device" here. That none of these is truly satisfactory was evidenced in the spring of 1994 when the comp.arch.fpga newsgroup was discussed and established; the most contentious point was over the name. Both the new newsgroup and the IEEE workshops we have organized use the term FPGA, thus being in some sense bound in terminology to a particular technology (unless, of course, one can convince the developers of the next technology that its name should allow FPGA as its acronym). We have chosen to use the term Custom Computing Machine (CCM). None of these terms is perfect, but we believe that this one is no worse than any of the others.

The work on CCMs also differs from what is considered reconfigurable computer architecture, in that the term "reconfigurable" usually refers to a much higher level of the system. In the case of CCMs, that which is reconfigurable and significant by virtue of its reconfigurability is the "processor architecture" itself. A reconfigurable computer, by contrast, is likely to be either a multiprocessor in which the interconnections among the processors can change, or a heterogeneous machine with processors of different kinds that a user can choose to include or exclude in the view he/she has of "the computer."

It is to be emphasized that this is not a mature computing technology and that CCMs are not a panacea for all problems in high-performance computing. Among the many issues and problems are the following:

1. Are FPGAs large enough, or will they become large enough, so that a significant unit of computation can be put on a single chip so as to avoid both the loss of efficiency in going off-chip and the problems in partitioning a computation across multiple chips?

2. With current technology, even in the best of circumstances, the user must be exposed to the hardware itself. What is the level of understanding about chip architecture, signal routing delays, and so forth, that a "programmer" must know in order to use a CCM? How much more must be known in order to obtain the performance that would warrant using a CCM instead of a general-purpose computer?

3. If these machines are to be viewed as "computers," then they must be capable of being programmed. How will this be done? What sort of programming language is appropriate? How do we "compile" when we have eliminated the underlying machine model?

4. Granting the point that these are limited-purpose, but not special-purpose, machines, what is the range of architectures needed to cover the spectrum of applications for which these machines make sense?

5. What are the cost/performance figures necessary to make this a viable approach for getting a computing task done? General-purpose machines are cheaper and easier to use but can be slow. ASICs and special-purpose devices are faster but more expensive in small quantities, take longer to develop, and are hard to modify. Where might CCMs fit between these two?

`4
`
`Custom Computing Machines: An Introduction
`
`Chapter 1
`
One problem faced by those involved in Splash 1 and Splash 2 at the Supercomputing
Research Center2 has been a stubborn refusal from some quarters to
believe that achieving high performance on a CCM is possible without a design
or programming agony so great as to be off-putting to all but the most dedicated
of “application designers/programmers.” Even in the face of our evidence to the
contrary, the case has been difficult to make to some critics.

The case is especially hard because what is needed is to build a complete
hardware system, to create or cause to be created a programming environment worthy
of the name, and then to develop a variety of different applications so that the
proof of concept is complete. Further, since the goal is to demonstrate performance
competitive with more expensive and sophisticated machines, the CCM must be big
enough to do real work and to be part of complete computational processing
environments; it cannot be just a toy machine suitable only for doing kernels of
problems. To our knowledge, only two such machines have been built that meet
these criteria: Splash 2 and the larger of the DEC PAM systems.

The goal in this book is to present the Splash 2 experience as a whole. Splash 2
was designed and developed in an iterative process from top to bottom to top and
back again.

1.2 THE CONTEXT FOR SPLASH 2

1.2.1 FPGAs

FPGAs in general have a wide spectrum of characteristics, but the FPGAs used for
CCMs have been of two distinct types. The Xilinx XC4010, a typical example of
one type, is a chip containing a 20 × 20 square array of Configurable Logic Blocks
(CLBs) [34]. Each CLB can serve one of three functions: as two flip-flops, as
Boolean functions of nine inputs, or as 32 bits of RAM. In the Boolean-function
usage, the CLB computes two 4-input functions, each producing an output; these two
output bits combine with a ninth input in a 3-input Boolean function. The RAM
usage simply takes advantage of the fact that the 4-input functions are done with
lookup tables to allow the input bits to be viewed as addresses.
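The LUT mechanics just described can be sketched in a few lines of Python. This is an illustrative model only, not vendor code: the particular functions chosen for the two 4-input LUTs, the combining function, and names such as `lut_from_fn` and `clb` are all invented for the example.

```python
def lut_from_fn(fn, n):
    """Tabulate an n-input Boolean function into a 2**n-entry lookup table,
    the way a CLB stores logic as LUT contents rather than gates."""
    table = []
    for idx in range(2 ** n):
        bits = [(idx >> (n - 1 - i)) & 1 for i in range(n)]
        table.append(fn(*bits) & 1)
    return table

def eval_lut(table, bits):
    """Evaluate a LUT by packing the input bits into a table address."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | (b & 1)
    return table[idx]

# Two 4-input LUTs (call them F and G) whose outputs combine with a ninth
# input through a 3-input LUT -- the nine-input structure described above.
# The specific functions are arbitrary examples.
F = lut_from_fn(lambda a, b, c, d: (a & b) | (c & d), 4)
G = lut_from_fn(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
COMBINE = lut_from_fn(lambda f, g, h: (f & g) | h, 3)

def clb(inputs9):
    """One CLB in logic mode: a 9-input Boolean function built from
    two 4-input LUTs plus a 3-input combining LUT."""
    f_out = eval_lut(F, inputs9[0:4])
    g_out = eval_lut(G, inputs9[4:8])
    return eval_lut(COMBINE, [f_out, g_out, inputs9[8]])

# RAM mode: the same 16-entry table, but now the four input bits are
# treated as a write/read address into 16 bits of storage.
ram = [0] * 16

def ram_write(addr_bits, value):
    idx = 0
    for b in addr_bits:
        idx = (idx << 1) | (b & 1)
    ram[idx] = value & 1
```

The point of the sketch is that “logic” and “memory” modes are the same hardware: a 16-entry table is a 4-input Boolean function when its contents are fixed at configuration time, and 16 bits of RAM when they are writable.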
Connecting the CLBs to one another and to special Input/Output Blocks (IOBs)
on the periphery of the chip are routing resources running from CLB to CLB, skipping
one CLB, or running the full length of the chip. Configuration of the FPGA is done
by loading a bit file into on-chip RAM.

In contrast to the relatively coarse granularity of the Xilinx chips, the second
type of FPGA, made by Algotronix, Ltd., and by Concurrent Logic, Inc. [1, 13], is
fine-grained. The Algotronix chip is a 32 × 32 array of 2-input, 1-output Boolean
function logic cells, with the signal lines running only point to point from one cell
to its neighbors in each rectangular direction.3
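The fine-grained style can be modeled the same way. The sketch below is hypothetical, not a description of the actual Algotronix configuration format: it shows a small grid of 2-input, 1-output cells, each reading only from its rectangular neighbors, with a `config` map (an invented name) choosing each cell's function and input directions.

```python
import operator

N = 4  # small grid for illustration; the real Algotronix chip is 32 x 32

# A few 2-input Boolean functions a cell might be configured to compute.
FUNCS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

def step(grid, config):
    """One combinational evaluation pass over the cell array.

    grid:   N x N list of current cell output bits.
    config: maps (row, col) -> (function name, direction 1, direction 2),
            where directions are the rectangular neighbors "n"/"s"/"e"/"w".
    Unconfigured cells output 0 in this sketch.
    """
    offsets = {"n": (-1, 0), "s": (1, 0), "e": (0, 1), "w": (0, -1)}
    new = [[0] * N for _ in range(N)]
    for (r, c), (fname, d1, d2) in config.items():
        def read(d):
            dr, dc = offsets[d]
            rr, cc = r + dr, c + dc
            if 0 <= rr < N and 0 <= cc < N:
                return grid[rr][cc]
            return 0  # chip-edge inputs default to 0 in this model
        new[r][c] = FUNCS[fname](read(d1), read(d2))
    return new
```

For example, configuring cell (1, 1) as `("xor", "n", "w")` makes it the exclusive-or of its north and west neighbors; larger functions are built by chaining many such cells, which is exactly the sense in which the architecture is fine-grained.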

2The Supercomputing Research Center was renamed the Center for Computing Sciences in May
1995, but will be referred to throughout this book as SRC.
3Algotronix is now a part of Xilinx.

Petitioner Microsoft Corporation - Ex. 1007, p. 4


`

`Section 1.2
`
`The Context for Splash 2
`
`5
`
`To a first-order approxima
