[Front cover: PROCEEDINGS OF THE IEEE, published by The Institute of Electrical and Electronics Engineers, September 1987 (ISSN 0018-9219). Special Issue on Hardware and Software for Digital Signal Processing.]
PROCEEDINGS OF THE IEEE
Published monthly by The Institute of Electrical and Electronics Engineers, Inc.
September 1987

SPECIAL ISSUE ON HARDWARE AND SOFTWARE FOR DIGITAL SIGNAL PROCESSING
Edited by Sanjit K. Mitra and Kalyan Mondal

1139  Scanning the Issue

PAPERS

VLSI

1143  The TMS320 Family of Digital Signal Processors, K-S. Lin, G. A. Frantz, and R. Simar, Jr.
1160  VLSI Processor for Image Processing, M. Sugai, A. Kanuma, K. Suzuki, and M. Kubo
1167  Digital Signal Processor for Test and Measurement Environment, A. Kareem, C. L. Saxe, F. Etheridge, and D. McKinney
1172  The Graph Search Machine (GSM): A VLSI Architecture for Connected Speech Recognition and Other Applications, S. C. Glinski, T. M. Lalumia, D. R. Cassiday, T. Koh, C. Gerveshi, G. A. Wilson, and J. Kumar
1185  DSP56200: An Algorithm-Specific Digital Signal Processor Peripheral, G. D. Hillman
1192  Parallel Bit-Level Pipelined VLSI Designs for High-Speed Signal Processing, M. Hatamian and G. L. Cash

Systems

1203  A 6 × 320-MHz 1024-Channel FFT Cross-Spectrum Analyzer for Radio Astronomy, Y. Chikada, M. Ishiguro, H. Hirabayashi, M. Morimoto, K-I. Morita, T. Kanzawa, H. Iwashita, K. Nakazima, S-I. Ishikawa, T. Takahashi, K. Handa, T. Kasuga, S. Okumura, T. Miyazawa, T. Nakazuru, K. Miura, and S. Nagasawa
1211  MUPSI: A Multiprocessor for Signal Processing, G. Bolch, F. Hofmann, B. Hoppe, C-U. Linster, R. Polzer, H. W. Schüssler, G. Wackersreuther, and F-X. Wurm
1220  Data-Driven Multicomputers in Digital Signal Processing, J-L. Gaudiot
1235  Synchronous Data Flow, E. A. Lee and D. G. Messerschmitt
1246  Multiple Digital Signal Processor Environment for Intelligent Signal Processing, W. S. Gass, R. T. Tarrant, B. I. Pawate, M. Gammel, P. K. Rajasekaran, R. H. Wiggins, and C. D. Covington

CAD

1260  Computer-Aided Design of VLSI FIR Filters, P. R. Cappello and C-W. Wu
1272  A Silicon Compiler for Digital Signal Processing: Methodology, Implementation, and Applications, F. F. Yassa, J. R. Jasica, R. I. Hartley, and S. E. Noujaim

Algorithms

1283  Vectorized Mixed Radix Discrete Fourier Transform Algorithms, R. C. Agarwal and J. W. Cooley
1293  Implementation of Digital Filtering Algorithms Using Pipelined Vector Processors, W. Sung and S. K. Mitra
1304  Array Architectures for Iterative Algorithms, H. V. Jagadish, S. K. Rao, and T. Kailath

Software

1322  SIG—A General-Purpose Signal Processing Program, D. L. Lager and S. G. Azevedo

PROCEEDINGS LETTERS

1333  Cumulants: A Powerful Tool in Signal Processing, G. B. Giannakis
1335  A Novel Design of a Combinational Network to Facilitate Fault Detection, P. R. Bhattacharjee, S. K. Basu, and J. C. Paul
1336  Asynchronous Fiber Optic Local Area Network Using CDMA and Optical Correlation, M. A. Santoro and P. R. Prucnal
1338  On a Constrained LMS Dither Algorithm, K. Cho and N. Ahmed
`
BOOK REVIEWS

1341  Spread Spectrum Systems, 2nd ed., by R. C. Dixon, reviewed by L. H. Sibul
1342  RC Active Filter Design Handbook, by F. W. Stephenson, reviewed by S. Natarajan
1342  Book Alert
1344  FUTURE SPECIAL ISSUES/SPECIAL SECTIONS OF THE PROCEEDINGS

COVER  A family of successive digital signal processors (overlaying a programmable bit-level pipelined MAC chip) representing an important development within the topic area of this special issue.
1987 PROCEEDINGS EDITORIAL BOARD

M. I. Skolnik, Editor

R. L. Abrams, C. N. Berglund, G. M. Borsuk, K. R. Carver, R. E. Crochiere, J. O. Dimmock, R. C. Dixon, Tse-yun Feng, A. L. Frisiani, G. T. Hawley, W. K. Kahn, Sherman Karp, S. S. Lam, Ruey-wen Liu, R. D. Masiello, J. L. Melsa, R. M. Mersereau, J. D. Musa, T. C. Pilkington, M. B. Pursley, R. J. Ringlee, A. C. Schell, Gene Strull, Yasuharu Suematsu, Kiyo Tomiyasu, Sven Treitel, Harry Urkowitz, A. S. Willsky, J. C. Wiltse, M. J. Wozny
1987 IEEE PUBLICATIONS BOARD

C. H. House, Chairman
N. R. Dixon, Vice Chairman
D. L. Staiger, Staff Secretary

J. J. Baldini, Donald Christiansen, Robert Cotellessa, S. H. Durrani, Bruce Eisenstein, Irving Engelson, W. C. Farrell, Sheldon Gruber, W. K. Kahn, Philip Lopresti, G. F. McClure, H. P. Meisel, T. Pavlidis, H. B. Rigas, T. E. Sharp, D. H. Sheingold, Marwan Simaan, M. I. Skolnik, Jerome Suran, R. C. Williamson
`
HEADQUARTERS STAFF

Eric Herz, Executive Director and General Manager
Elwood K. Gannett, Deputy General Manager

PUBLISHING SERVICES

David L. Staiger, Staff Director
Elizabeth Braham, Manager, Database/Information Services
W. R. Crone, Manager, IEEE PRESS/PROCEEDINGS OF THE IEEE
Patricia H. Penick, Manager, Publication Administrative Services
Otto W. Vathke, Publication Business Manager
Ann H. Burgmeyer, Gail S. Ferenc, Carolyne Tamney, Production Managers

ADVERTISING

William R. Saunders, Advertising Director
Ralph Obert, Advertising Production Manager

PROCEEDINGS STAFF

Hans P. Leander, Technical Editor
Nela Rybowicz, Associate Editor

PROCEEDINGS OF THE IEEE is published monthly by The Institute of Electrical and Electronics Engineers, Inc. IEEE Headquarters: 345 East 47th Street, New York, NY 10017-2394. IEEE Service Center (for orders, subscriptions, and address changes): 445 Hoes Lane, Piscataway, NJ 08854-4150. Telephones: Technical Editor 212-705-7906; Publishing Services 212-705-7560; IEEE Service Center 201-981-0060; Advertising 212-705-7579.

Manuscripts should be submitted in triplicate to the Editor at IEEE Headquarters. A summary of instructions for preparation is found in the most recent January issue of this journal. Detailed instructions are contained in "Information for IEEE Authors," available on request. See the note at the beginning of "Proceedings Letters" for special instructions for this section. After a manuscript has been accepted for publication, the author's organization will be requested to pay a voluntary charge of $110 per printed page to cover part of the publication cost. Responsibility for contents of papers rests upon the authors and not the IEEE or its members.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of the U.S. Copyright Law for private use of patrons: (1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 29 Congress Street, Salem, MA 01970; (2) pre-1978 articles without fee. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publishing Services, 345 East 47th Street, New York, NY 10017-2394. Copyright © 1987 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Printed in U.S.A. Second-class postage paid at New York, NY and at additional mailing offices. Postmaster: Send address changes to PROCEEDINGS OF THE IEEE, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854-4150.

Copyright: It is the policy of the IEEE to own the copyright to the technical contributions it publishes on behalf of the interests of the IEEE, its authors, and employers, and to facilitate the appropriate reuse of this material by others. To comply with the U.S. Copyright Law, authors are required to sign an IEEE copyright transfer form before publication. This form, a copy of which is found in the most recent January issue of this journal, returns to authors and their employers full rights to reuse their material for their own purposes. Authors must submit a signed copy of this form with their manuscripts.

Annual Subscription: Member and nonmember subscription prices available on request. Single Copies: IEEE members $10.00 (first copy only), nonmembers $20.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Other: Available in microfiche and microfilm. Change of address must be received by the 1st of a month to be effective for the following month's issue. Send new address, plus mailing label showing old address, to the IEEE Service Center. Advertising correspondence should be addressed to the Advertising Department at IEEE Headquarters.
CONTRIBUTED PAPERS

The PROCEEDINGS OF THE IEEE welcomes for consideration technical papers on topics of broad significance and long-range interest in all areas of electrical, electronics, and computer engineering. In-depth reviews and tutorials are particularly appropriate, although papers describing individual research will be considered if they meet the general criteria above. Papers that do not meet these criteria should be submitted to the appropriate IEEE Transactions and Journals.

It is suggested that a prospective author, before preparing a full-length manuscript, submit a proposal containing a description of the topic and how it meets the above criteria, an outline of the proposed paper, and a brief biography showing the author's qualifications to write the paper (including reference to previously published material as well as the author's relation to the topic). If the proposal receives a favorable review, the author will be encouraged to prepare the paper, which after submittal will go through the normal review process.

Please send papers and proposals to the Technical Editor, PROCEEDINGS OF THE IEEE, 345 East 47th Street, New York, NY 10017-2394, USA (Telephone: 212-705-7906).
`
`
Data-Driven Multicomputers in Digital Signal Processing

JEAN-LUC GAUDIOT, MEMBER, IEEE
`
New technologies of integration allow the design of powerful systems which may include several thousands of elementary processors. These multiprocessors may be used for a range of applications in signal and data processing. However, assuring the proper interaction of a large number of processors and the ultimate safe execution of the user programs presents a crucial scheduling problem. The scheduling of operations upon the availability of their operands has been termed the data-driven mode of execution and offers an elegant solution to the issue. This approach is described in this paper and several architectures which have been proposed or implemented (systolic arrays, data-flow machines, etc.) are examined in detail. The problems associated with data-driven execution are also studied. A multi-level approach to high-speed digital signal processing is then evaluated.
`
I. INTRODUCTION
`
If we are to approach the computational throughputs equivalent to billions of instructions per second which will be required from the processing systems of the future, improvements on all levels of computer design must be made. Faster technology and better packaging methods can be applied to raise clock rates. However, a one billion instructions per second machine would require a clock period as low as a nanosecond. This approach is inevitably bounded by physical limits such as the speed of light. Therefore, instead of considering the technological approach to performance improvement, we emphasize here the architectural method. Indeed, instead of merely increasing the clock frequency for a corresponding increase in overall throughput, performance can also be improved by allowing multiple processing elements to collaborate on the same program. This inevitably introduces synchronization problems, and issues of resource allocation and sharing must be solved. Programmability is indeed the central problem. In one solution, a conventional language such as Fortran is used to program the application. A sophisticated compiler is relied upon to partition a sequential program for execution on a multiprocessor. This approach has the advantage of imposing no "software retooling." However, complex numerical applications will not be easily partitioned and much potential parallelism may remain undetected by the compiler.
`
Manuscript received September 4, 1986; revised January 23, 1987. This work was supported in part by the Department of Energy under Grant DE-FG03-87ER 25043. The views expressed in this paper are not necessarily endorsed by the U.S. Department of Energy.
The author is with the Computer Research Institute, Department of Electrical Engineering—Systems, University of Southern California, Los Angeles, CA 90089, USA.
IEEE Log Number 8716208.
`
Ada, CSP [26], and extended Fortran (e.g., HEP, Sequent), on the other hand, allow the programmer to deal with parallel processes by the use of primitives for parallel task spawning, synchronization, and message passing. However, while the programmer can express some of the parallelism characteristic of the application, much potential concurrency may never be uncovered because of the inherent sequential concepts of the language, which must be countered through the use of special "parallelism spawning" instructions. Also, development time becomes important since the programmer must "juggle" with many parallel tasks to synchronize. In addition, debugging becomes correspondingly more difficult due to the sometimes nondeterministic appearance of errors.
For these reasons, an implicit approach must be devised. In the above two methods, instruction scheduling is based upon a central program counter. We propose to demonstrate here the data-driven approach to programming multiprocessors: instructions can be scheduled by the availability of their operands. This model of execution is a subset of the functional model of execution [9]. It provides a significant improvement to the programmability of multiprocessors by excluding the notion of global state and introducing the notion of values applied to functions instead of instructions fetching the contents of memory cells as they are in the conventional "control-flow" model.
The overall objective of this paper is to demonstrate the applicability of data-driven principles of execution to the design of high-performance signal and data processing architectures. Several approaches will be demonstrated and their particular domains of application will be contrasted. The description of low-level processing systems is beyond the scope of this paper and the interested reader is referred to an excellent survey by Allen [3]. Instead, we will concentrate here on the issues related to building high-performance multiprocessors for signal processing applications. In Section II, we show the type of problems considered in signal processing.
The data-flow principles of execution as they relate to digital signal processing problems are described in detail in Section III, while several existing data-driven architectures are described in Section IV. In Section V, we analyze a multi-level data-driven architecture and examine its programming environment. Conclusions are drawn in Section VI.
`
II. THE REQUIREMENTS OF SIGNAL PROCESSING
`
Digital signal processing techniques are applied to many different technical problems. These include radar and sonar systems, image processing, speech recognition, etc. The elementary building blocks of these were originally concentrated on such tasks as convolution, correlation, and Fourier transform. More complex algorithms (matrix operations, linear systems solvers, etc.) are now considered. Higher order operations include not only simple problems such as elementary filtering (IIR, FIR, etc.), but also more complex functions such as adaptive and Kalman filtering [45]. Also, such complex problems as Computer-Aided Tomography or Synthetic Aperture Radar can be considered [39], [16]. Signal processing algorithms are very appropriate for description by functional languages. Indeed, a signal processing algorithm is often represented in a graph form [36] and can be decomposed in two levels:

* a regular level which can be implemented by a vector operation (i.e., a loop in which all iterations present no dependencies among themselves);
* a level which contains conditional operations and heuristic decision making.
`
This description shows that the lower operational levels can easily deliver parallelism (by compiler analysis or programmer inspection). This layer usually consists of simple constructs (arithmetic instructions, FFT butterfly networks, simple filters, etc.). However, the higher levels will require more complex problem insight and even runtime dependency detection in order to allow maximum parallelism. We will now describe principles of execution which will allow us to deliver this concurrency.
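The two levels can be made concrete with a small example. The sketch below (in Python; it is an illustration added here, not taken from the paper, and the toy detector is invented) pairs a vectorizable FIR loop, whose iterations are independent and correspond to the regular level, with a data-dependent detection loop that stands for the higher, heuristic level.

```python
# Illustrative only: a toy detector showing the two levels described above.
def fir(x, h):
    """Regular level: every output sample is independent, so all iterations of the
    outer comprehension could run in parallel (a pure vector operation)."""
    n, m = len(x), len(h)
    return [sum(h[k] * x[i - k] for k in range(m) if 0 <= i - k < n)
            for i in range(n)]

def detect(y, threshold):
    """Higher level: conditional, heuristic decision making; the threshold adaptation
    creates a dependency between iterations, so this loop is not vectorizable."""
    hits = []
    for i, v in enumerate(y):
        if abs(v) > threshold:
            hits.append(i)
            threshold *= 1.1   # data-dependent update
    return hits

print(detect(fir([0, 1, 0, 0, 5, 0], [0.5, 0.5]), 1.0))   # [4, 5]
```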
`
III. DATA-FLOW PRINCIPLES
`
The data-flow solution to the programmability problems of large-scale multiprocessors [5] has been pioneered by Adams [2], Chamberlin [11], and Rodriguez [43]. It is now described in detail in this section.
`
A. Basic Principles of Execution

In the conventional von Neumann model of execution, an instruction is declared executable when a Program Counter of the machine points to it. This event is usually under direct programmer control. While a control-flow program is a sequential listing of instructions, a data-flow program can be represented as a graph where the nodes are the instructions (actors) which communicate with other nodes over arcs (Fig. 1).

Fig. 1. A simple data-flow graph.

An instruction is declared executable when it has all its operands. In the graph representation chosen above, this means that all the input arcs to an actor must carry data values (referred to as tokens) before this actor can be executed. Execution proceeds by first absorbing the input tokens, processing the input values according to the op. code of the actor, and accordingly producing result tokens on the output arcs. In summary, it can be said that the data-flow model of execution obeys two fundamental principles:

* Asynchrony of operations: The executability of an instruction is decided by a local criterion only. The presence of the operands can be sensed "locally" by each instruction. This is an attractive property for an implementation in a distributed environment where no central controller should be used for global scheduling.

* Functionality of the operations: The effect of each operation is limited to the production of results to be consumed by a specific number of other actors. This precludes the existence of "side-effects." These side-effects may be long-ranging in that the execution of an instruction may affect the state of a cell of memory which will be used only much later by another unrelated operation.
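As an illustration of these two principles, the following sketch (in Python; the graph, actor names, and data are invented for this example and are not from the paper) simulates a small data-flow graph: an actor fires as soon as every one of its input arcs holds a token, and its only effect is to place result tokens on its output arcs.

```python
from collections import deque

class Actor:
    def __init__(self, name, op, inputs, outputs):
        self.name, self.op = name, op
        self.inputs = inputs      # names of the input arcs
        self.outputs = outputs    # names of the output arcs

def run(actors, arcs):
    """arcs maps an arc name to a FIFO queue of tokens."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            # Asynchrony: the firing decision uses only the actor's own input arcs.
            if all(arcs[name] for name in actor.inputs):
                operands = [arcs[name].popleft() for name in actor.inputs]
                result = actor.op(*operands)
                # Functionality: the only effect is a result token on each output arc.
                for name in actor.outputs:
                    arcs[name].append(result)
                fired = True

# A small graph (not the one of Fig. 1) computing (x + y) * (x - y).
arcs = {name: deque() for name in ("x1", "x2", "y1", "y2", "s", "d", "out")}
actors = [
    Actor("add", lambda a, b: a + b, ("x1", "y1"), ("s",)),
    Actor("sub", lambda a, b: a - b, ("x2", "y2"), ("d",)),
    Actor("mul", lambda a, b: a * b, ("s", "d"), ("out",)),
]
for name, value in (("x1", 5), ("x2", 5), ("y1", 3), ("y2", 3)):
    arcs[name].append(value)
run(actors, arcs)
print(arcs["out"].popleft())   # 16
```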
`
B. Data-Flow Interpreters

When iterations are executed, the underlying principle of data-flow (single assignment of variables) must invariably be violated. Indeed, for an actor to be repeatedly evaluated as in an iteration, its input arcs must carry several tokens (from different iterations). Several solutions have been proposed to allow the controlled violation of these rules without compromising the safe execution of the program. Among these, the Acknowledgment scheme and the U-interpreter have been given the most consideration.

1) Acknowledgment Scheme [14]: Proper matching of the tokens can be observed by ordering the token production. This would be done by a careful design of the program graph so as to insure that tokens of two different iterations can never overtake each other. In addition, it must be guaranteed that no token pileup is encountered on any one arc. This condition can be verified by allowing the firing of an actor when tokens are on all input arcs and there are no tokens on any output arcs. In order to enforce this last condition, an acknowledgment must be sent by the successor(s) to the predecessor when the token has been consumed (Fig. 2). Note that an actor is executable when it has received its input arguments as well as all acknowledgments. The parallelism which can be exploited from this scheme is mostly pipelining between the actors of different iterations. Thus when the number of instructions in the body of an iteration is the same as the number of available processors, the speedup observed by this mechanism of execution is maximal. However, for small iterations (compared to the size of the machine), the exploited parallelism
`
Fig. 2. The acknowledgment scheme.

[...] designed: the MIT cell block architecture [13], the Hughes Data-Flow Machine [21], the DSFP [24], the USC TX16 [19].
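The static firing rule can be sketched as follows (Python; a minimal illustration under the assumptions stated in the comments, not a model of any of the machines listed above): an actor fires only when all of its input arcs hold a token and all of its output arcs are free, and consuming a token returns an acknowledgment credit to the arc so that its producer may fire again.

```python
from collections import deque

class Arc:
    """One arc of the static graph: at most one token in flight, plus an
    acknowledgment credit telling the producer the arc is free again."""
    def __init__(self):
        self.tokens = deque()
        self.free = 1            # 1 = producer may place a token, 0 = must wait

class StaticActor:
    def __init__(self, op, inputs, outputs):
        self.op, self.inputs, self.outputs = op, inputs, outputs

    def can_fire(self):
        # Tokens on every input arc AND no unconsumed token on any output arc.
        return all(a.tokens for a in self.inputs) and all(a.free for a in self.outputs)

    def fire(self):
        args = []
        for a in self.inputs:
            args.append(a.tokens.popleft())
            a.free = 1                    # acknowledgment back to the predecessor
        result = self.op(*args)
        for a in self.outputs:
            a.free = 0                    # occupy the output arc ...
            a.tokens.append(result)       # ... with exactly one token

x, y, out = Arc(), Arc(), Arc()
adder = StaticActor(lambda a, b: a + b, (x, y), (out,))
x.tokens.append(2); y.tokens.append(3)
if adder.can_fire():
    adder.fire()
print(out.tokens[0])   # 5
```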
2) Unraveling Interpreter (U-Interpreter): The U-interpreter [4] provides the most asynchronous possible operation. In order to allow safe execution of actors in an iterative construct, tokens are tagged with information per- [...] be found. This tag includes the iteration number. Indeed, the U-interpreter closely follows these principles: to each data token is attached a tag of the form u.P.s.i, where P identifies the procedure name of the destination actor, while s [...] graph. The function of this actor is to create a new context for the execution of the iteration: the input tokens are tagged by u.P.s.i while the output tokens are identical but are tagged with u'.P.t.1, where u' is itself u.i. Note that this mechanism is sufficient to create an entirely different set of tokens for two nested iterations. Indeed, assume that two tokens both belong to the same inner iteration i but belong to the j1 and j2 outer iterations, respectively. The first token would be tagged u1.P.s.i (u1 = u.j1) while the second is tagged with u2.P.s.i (u2 = u.j2). This shows that an appropriate differentiation has been made between the two instances. The original context u is retrieved by the L^-1 actor before exiting.

Fig. 3. A typical iterative construct in the U-interpreter.

Contrarily to the acknowledgment scheme, this dynamic data-flow scheme allows full asynchronous execution of the [...] without compiler-induced replication of the graph. Likewise, multiple function calls and more particularly recursions are allowed since each new actor instantiation receives a different tag. This means that the U-interpreter would be preferred to the static model when the ability for fast recursive calls is required. However, this flexibility comes at the expense of added hardware complexity. Indeed, it will be shown in Section IV-D that implementation of the U-interpreter requires an associative memory for fast tag matching. Several machines based on these principles have been studied: the MIT tagged token data-flow machine [6], the ESL DDSP [28], the University of Manchester machine [23], the ETL Sigma-1 [25], etc.
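The tag-matching step that calls for an associative memory can be sketched as follows (Python; the tuple fields mirror the u.P.s.i tag described above, but the code itself is an illustration added here, with a hash table standing in for the associative matching store).

```python
# Tagged-token matching sketch (illustrative). A tag follows the u.P.s.i form:
# context u, procedure P, destination actor s, iteration i.
waiting = {}   # tag -> operand waiting for its partner

def receive(tag, value, fire):
    """Deliver one token; fire(tag, a, b) runs when both operands carry the same tag."""
    if tag in waiting:
        fire(tag, waiting.pop(tag), value)   # partner already arrived: execute the actor
    else:
        waiting[tag] = value                 # otherwise wait in the matching store

# Tokens of the same iteration match; tokens of different iterations never interfere.
add = lambda tag, a, b: print(tag, a + b)
receive(("u0", "P", "add", 3), 1.5, add)
receive(("u0", "P", "add", 7), 10.0, add)    # different iteration: stays in the store
receive(("u0", "P", "add", 3), 2.5, add)     # prints ('u0', 'P', 'add', 3) 4.0
```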
`
C. Structure Handling

This is a crucial issue in signal processing, for this kind of application requires that many data elements which belong to the same structure be processed in a parallel or pipelined fashion. One of the basic premises of data-flow principles states that an output is a function of its inputs only, regardless of the state of the machine at the time of execution. When a structure of elementary elements must be processed, the absence of side-effects means that it may not be updated, for this would imply its transition through several states. Instead, if any updates are needed, a new array which contains the new elements must be created. Copying of all elements must be undertaken for the modification of a single one. This solution imposes an inordinate overhead. This is why the implementation schemes we will now describe can shortcut this complete copying while preserving the meaning of the program during array update operations.
`
1) Heaps: Dennis [14] has proposed to represent arrays by directed acyclic graphs (also called heaps). Each value is represented as the leaf of a graph tree. The modification of a single element in a heap is represented in Fig. 4. Note that the complexity of the modification of a single element of the array is O(n) for a copy operation, while it is O(log n) for the heap. Several instructions are exclusively devoted to the access of heaps [15]: SELECT receives a pointer to the root node, an index value, and returns a pointer to the substructure (which may be a leaf node) associated with the index; APPEND also needs the same two operands in addition to the value of the element to append to the structure.

Fig. 4. A heap update.
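The path-copying behavior behind the O(log n) bound can be sketched as follows (Python; SELECT and APPEND follow the instructions named above, while the binary-tree layout and helper names are assumptions made for this illustration). Updating one element rebuilds only the nodes on the root-to-leaf path; the old version of the array remains intact.

```python
# Heap (tree) representation of an array, sketched for illustration only.
def build(values, lo=None, hi=None):
    lo, hi = (0, len(values)) if lo is None else (lo, hi)
    if hi - lo == 1:
        return values[lo]                       # a leaf node holds one element
    mid = (lo + hi) // 2
    return (build(values, lo, mid), build(values, mid, hi), lo, mid, hi)

def select(node, index):
    """SELECT: follow pointers from the root down to the leaf for `index`."""
    while isinstance(node, tuple):
        left, right, lo, mid, hi = node
        node = left if index < mid else right
    return node

def append(node, index, value):
    """APPEND: return a new heap with element `index` replaced; only the
    root-to-leaf path is copied, so the cost is O(log n), not O(n)."""
    if not isinstance(node, tuple):
        return value
    left, right, lo, mid, hi = node
    if index < mid:
        return (append(left, index, value), right, lo, mid, hi)
    return (left, append(right, index, value), lo, mid, hi)

a = build([13, 21, 22, 2, 23, 6, 4, 2])
b = append(a, 4, 99)                 # new version; the original heap is untouched
print(select(a, 4), select(b, 4))    # 23 99
```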
2) I-Structures: A heap must be entirely ready before it can be consumed, because no consumption (SELECT actors) can take place until the pointer token appears (i.e., the creation of the array is completed). In the I-structure scheme [7], constraints on the creation of arrays allow the selection of individual elements (or substructures) from the array before its complete production. One possible implementation of I-structures makes use of a "presence" bit which indicates when an element of an array has been calculated and is ready for consumption. An attempt to read an empty cell would cause the read to be deferred until such time that the cell presence bit is marked. Conversely, a write into a cell, the presence bit of which indicates valid stored data, could be cause for the generation of an error signal. The advantages of this scheme are:

* better performance, because pipelining is allowed between I-structure consumers and producers;
* less "serialization" of operations such as APPENDs, because they are allowed to occur independently on the same structure.
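A presence-bit cell with deferred reads can be sketched as follows (Python; an illustration of the mechanism just described, with invented class and method names rather than any published I-structure interface).

```python
class IStructure:
    EMPTY = object()

    def __init__(self, size):
        self.cells = [self.EMPTY] * size
        self.deferred = {i: [] for i in range(size)}   # queued read continuations

    def read(self, i, consumer):
        if self.cells[i] is self.EMPTY:
            self.deferred[i].append(consumer)          # defer until the element exists
        else:
            consumer(self.cells[i])

    def write(self, i, value):
        if self.cells[i] is not self.EMPTY:
            raise RuntimeError("single-assignment violation on cell %d" % i)
        self.cells[i] = value
        for consumer in self.deferred.pop(i, []):      # wake up the deferred readers
            consumer(value)
        self.deferred[i] = []

xs = IStructure(4)
xs.read(2, print)     # deferred: nothing printed yet
xs.write(2, 7.0)      # the producer arrives later; the deferred read now prints 7.0
```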
3) HDFM Arrays: A special scheme for handling arrays in a VAL high-level environment has been designed for the Hughes Data-Flow Machine (HDFM) [21]. It uses the fact that data-flow arrays as described above are overly "asynchronous," i.e., they do not take advantage of the data dependency information carried by the program graph. Safety of accesses is respected by not allowing the updating of an array before all the reads from the current version of the array have been performed. Only then can the array be directly modified. Safety and correct execution of WRITE operations are a compile-time task. This has the advantage of reducing the number of memory accesses (no complex graph of pointers must be traversed as in heaps) as well as of offering a better possibility of distribution of an array (no root node). However, spurious data dependencies may be introduced because the compiler is not necessarily aware of the possibility of parallelism that can be detected only at runtime. For instance, dependencies on A and B related by A(F(i)) = B(i) may be artificially imposed. However, the applications targeted by the HDFM include some amount of regularity which can be easily detected by the compiler and implemented as conventional arrays.

4) Token Relabeling [18]: In the U-Interpreter, the notion of array can be entirely ignored at the lowest level of execution. Instead, the tag associated with each token under the rules of the U-interpretation is used as identification of the index of the array element of the high-level language. In other words, it can be simply said that, when an array A is created, its A(i) element will be tagged with i (hereafter denoted A(i)_i), if the elements are produced in the logical order. In the "production" of a unidimensional array, the iteration number can usually be directly interpreted as the index of the array element just produced by the iteration.

Fig. 5. (a) A gather operation (DO I = 1, 100: C(I) = B(I) + A(F(I))). (b) Token Relabeling gather.

Special token relabeling program graphs can be created to handle scatter and gather program constructs [27] (Fig. 5(a)). This figure shows that an inversion function F^-1, unknown at compile time, would be needed to perform the relabeling of data-flow tokens. Such a calculation is not truly necessary. Instead, we introduce (Fig. 5(b)) a sequence generator which produces the F(j)'s, tagged by j. An exchanger actor (called χ) swaps the tag and the data value and produces j tagged by F(j). Both streams (the A's and the j's tagged by F(j)) are input to a special relabeling actor β which only modifies the iteration portion of the tag. By the principles of the U-interpreter, only tokens which bear the same tag will be matched as proper inputs to the β actor. In other words, the mate of [...]
[...] an actor which takes the A input and relabels it with the data carried by the token on the other input arc. In other words, it outputs A(F(i))_i. Since i is a dummy variable, and since F is bijective, it can be said that, from a global point of view, A(F(i))_i = A(k)_{F^-1(k)}. This demonstrates that, without recourse to the calculation of F^-1, the proper relabeling of the A elements has been effectively produced.

This algorithm requires no intermediary storage, does not need array operations, and imposes smaller hardware and execution overhead. This relabeling approach eliminates a large portion of the overhead associated with the production and consumption of array A. Pipelining between the source and the sink of a data structure is the goal of this scheme, just as it was the idea behind the design of the I-structures. However, the token relabeling approach brings a better runtime memory management: since tokens corresponding to the various elements of the array still exist and must still be temporarily stored, they need not go through an additional storage as a data structure. Also, there is no need for "requests" for data as would be the case in an I-structure environment: when an I-structure is created, actors which need data from it must demand the element in question until it has arrived. This may introduce heavy overheads as unsatisfied requests must be queued in the structure itself. Garbage collection is automatically handled since, when the "array token" is matched, it is automatically removed from the arc. In other words, when it has been used, it is swallowed by applying data-flow principles.
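The effect of the χ and β actors on a gather can be sketched as follows (Python; the dictionaries stand in for tag-matching hardware, F and the array sizes are invented for the example, and the function name is not from the paper). F is never inverted: the exchanger swaps tag and value so that each A(k) meets the index token whose value is k.

```python
def gather_by_relabeling(A, B, F):
    """Compute C(i) = B(i) + A(F(i)) without ever evaluating F^-1 (illustrative)."""
    # Source streams: A(k) tagged k, and F(j) tagged j from the sequence generator.
    a_tokens = {k: A[k] for k in range(len(A))}          # tag k -> value A(k)
    f_tokens = [(j, F(j)) for j in range(len(B))]        # tag j, value F(j)

    # chi: swap tag and value, producing tokens with tag F(j) and value j.
    swapped = [(fj, j) for (j, fj) in f_tokens]

    # beta: match on the tag (F(j) == k), then re-tag A(k) with the iteration j.
    relabeled = {j: a_tokens[k] for (k, j) in swapped}   # tag j -> A(F(j))

    return [B[j] + relabeled[j] for j in range(len(B))]

A = [10, 20, 30, 40]
B = [1, 2, 3, 4]
print(gather_by_relabeling(A, B, lambda j: (j * 3) % 4))   # [11, 42, 33, 24]
```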
`
D. High-Level Data-Flow Languages

In addition to the low-level mechanisms of execution which were described earlier, special high-level data-flow languages have been designed for easier translation into data-flow graphs. To be sure, these high-level languages are not a necessity: the Texas Instruments data-flow project [31] relied upon Fortran programming through the use of a modified version of a vectorizing compiler originally destined to the TI ASC. However, many high-level languages have been designed for data-flow prototypes. Most notable are VAL (Value [...]
[...] their domain of application to design or compile time and refer to them as data-driven systems. We thus initially examine multiprocessor systems where data dependencies have been frozen at design time (systolic arrays). We then consider programmable systolic arrays (the Wavefront Array Processor) and multiprocessors scheduled at compile time by the use of data-flow program graphs (the ESL polycyclic processor). Finally, we study systems where the data dependencies provide scheduling information at runtime (the Hughes Data-Flow Machine) and examine the influence of the level of resolution upon the performance (the USC TX16).

A. Systolic Arrays [32]

[...]cessing problems: matrix operations, image processing, etc. In summary, a systolic array is simply a collection of interconnected Processing Elements (PEs). In order to incorporate as many processors as possible, the structure of the PEs themselves is kept to a maximum simplicity and usually includes only a few operation units. For design simplification, there are few types of PEs in the same system. By [...] neighbor topology in order to minimize communication delays as well as power distribution issues. Note that topol- [...]
