THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS
September 1987
ISSN 0018-9219

SPECIAL ISSUE ON
HARDWARE AND SOFTWARE FOR DIGITAL SIGNAL PROCESSING

[Cover artwork]
Petitioner Microsoft Corporation - Ex. 1010, Cover 1
`
`
`
PROCEEDINGS OF THE IEEE

published monthly by The Institute of Electrical and Electronics Engineers, Inc.

September 1987

SPECIAL ISSUE ON

HARDWARE AND SOFTWARE FOR DIGITAL SIGNAL PROCESSING

Edited by Sanjit K. Mitra and Kalyan Mondal

1139 SCANNING THE ISSUE
`
`PAPERS
`
VLSI

1143 The TMS320 Family of Digital Signal Processors, K.-S. Lin, G. A. Frantz, and R. Simar, Jr.
1160 VLSI Processor for Image Processing, M. Sugai, A. Kanuma, K. Suzuki, and M. Kubo
1167 Digital Signal Processor for Test and Measurement Environment, A. Kareem, C. L. Saxe, J. Etheridge, and D. McKinney
1172 The Graph Search Machine (GSM): A VLSI Architecture for Connected Speech Recognition and Other Applications, S. C. Glinski, T. M. Lalumia, D. R. Cassiday, T. Koh, C. Gerveshi, G. A. Wilson, and J. Kumar
1185 DSP56200: An Algorithm-Specific Digital Signal Processor Peripheral, G. D. Hillman
1192 Parallel Bit-Level Pipelined VLSI Designs for High-Speed Signal Processing, M. Hatamian and G. L. Cash

Systems

1203 A 6 x 320-MHz 1024-Channel FFT Cross-Spectrum Analyzer for Radio Astronomy, Y. Chikada, M. Ishiguro, H. Hirabayashi, M. Morimoto, K.-I. Morita, T. Kanzawa, H. Iwashita, K. Nakazima, M. Ishikawa, T. Takahashi, K. Handa, T. Kasuga, S. Okumura, T. Miyazawa, T. Nakazuru, K. Miura, and S. Nagasawa
1211 MUPSI: A Multiprocessor for Signal Processing, G. Bolch, F. Hofmann, B. Hoppe, C.-U. Linster, R. Polzer, H. W. Schuessler, G. Wackersreuther, and F.-X. Wurm
1220 Data-Driven Multicomputers in Digital Signal Processing, J.-L. Gaudiot
1235 Synchronous Data Flow, E. A. Lee and D. G. Messerschmitt
1246 Multiple Digital Signal Processor Environment for Intelligent Signal Processing, W. S. Gass, R. T. Tarrant, B. I. Pawate, M. Gammel, P. K. Rajasekaran, R. H. Wiggins, and G. D. Covington

CAD

1260 Computer-Aided Design of VLSI FIR Filters, P. R. Cappello and C.-W. Wu
1272 A Silicon Compiler for Digital Signal Processing: Methodology, Implementation, and Applications, F. F. Yassa, J. R. Jasica, R. I. Hartley, and S. E. Noujaim

Algorithms

1283 Vectorized Mixed Radix Discrete Fourier Transform Algorithms, R. C. Agarwal and J. W. Cooley
1293 Implementation of Digital Filtering Algorithms Using Pipelined Vector Processors, W. Song and S. K. Mitra
1304 Array Architectures for Iterative Algorithms, H. V. Jagadish, S. K. Rao, and T. Kailath

Software

1322 SIG: A General-Purpose Signal Processing Program, D. L. Lager and S. G. Azevedo

PROCEEDINGS LETTERS

1333 Cumulants: A Powerful Tool in Signal Processing, G. B. Giannakis
1335 A Novel Design of a Combinational Network to Facilitate Fault Detection, P. R. Bhattacharjee, S. K. Basu, and J. C. Paul
1336 Asynchronous Fiber Optic Local Area Network Using CDMA and Optical Correlation, M. A. Santoro and P. R. Prucnal
1338 On a Constrained LMS Dither Algorithm, K. Cho and N. Ahmed
`
`
Petitioner Microsoft Corporation - Ex. 1010, Cover 2
`
`
`
BOOK REVIEWS

1341 Spread Spectrum Systems, 2nd ed., by R. C. Dixon, reviewed by L. H. Sibul
1342 RC Active Filter Design Handbook, by F. W. Stephenson, reviewed by S. Natarajan
1342 Book Alert
1344 FUTURE SPECIAL ISSUES/SPECIAL SECTIONS OF THE PROCEEDINGS

COVER  A family of successive digital signal processors (overlaying a programmable bit-level pipelined MAC chip) representing an important development within the topic area of this special issue.
`
1987 PROCEEDINGS EDITORIAL BOARD
M. I. Skolnik, Editor

R. L. Abrams, G. N. Berglund, G. M. Borsuk, K. R. Carver, R. E. Crochiere, J. O. Dimmock, R. C. Dixon, Tse-yun Feng, A. L. Frisiani, G. T. Hawley, W. K. Kahn, Sherman Karp, S. S. Lam, R. D. Masiello, Ruey-wen Liu, J. L. Melsa, R. M. Mersereau, J. D. Musa, T. C. Pilkington, M. B. Pursley, R. J. Ringlee, A. C. Schell, Gene Strull, Yasuharu Suematsu, Kiyo Tomiyasu, Sven Treitel, Harry Urkowitz, A. S. Willsky, M. J. Wozny, J. C. Wiltse

PROCEEDINGS STAFF
Hans P. Leander, Technical Editor
Nela Rybowicz, Associate Editor
`
1987 IEEE PUBLICATIONS BOARD

C. H. House, Chairman
N. R. Dixon, Vice Chairman
D. L. Staiger, Staff Secretary

J. J. Baldini, G. F. McClure, Donald Christiansen, H. P. Meisel, Robert Cotellessa, T. Pavlidis, S. H. Durrani, H. B. Rigas, Bruce Eisenstein, T. E. Sharp, Irving Engelson, D. H. Sheingold, W. C. Farrell, Marwan Simaan, Sheldon Gruber, M. I. Skolnik, W. K. Kahn, Jerome Suran, Philip Lopresti, R. C. Williamson
`
HEADQUARTERS STAFF

Eric Herz, Executive Director and General Manager
Elwood K. Gannett, Deputy General Manager

PUBLISHING SERVICES
David L. Staiger, Staff Director
Elizabeth Braham, Manager, Database/Information Services
W. R. Crone, Manager, IEEE PRESS/PROCEEDINGS OF THE IEEE
Patricia H. Penick, Manager, Publication Administrative Services
Otto W. Vathke, Publication Business Manager
Ann H. Burgmeyer, Gail S. Ferenc, Carolyne Tamney, Production Managers

ADVERTISING
William R. Saunders, Advertising Director
Ralph Obert, Advertising Production Manager
`
PROCEEDINGS OF THE IEEE is published monthly by The Institute of Electrical and Electronics Engineers, Inc. IEEE Headquarters: 345 East 47th Street, New York, NY 10017-2394. IEEE Service Center (for orders, subscriptions, and address changes): 445 Hoes Lane, Piscataway, NJ 08854-4150. Telephones: Technical Editor 212-705-7906; Publishing Services 212-705-75...; IEEE Service Center 201-981-0060; Advertising 212-705-7579. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of the U.S. Copyright Law for private use of patrons: (1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 29 Congress Street, Salem, MA 01970; (2) pre-1978 articles without fee. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publishing Services, 345 East 47th Street, New York, NY 10017-2394. Copyright (c) 1987 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Printed in U.S.A. Second-class postage paid at New York, NY, and at additional mailing offices. Postmaster: Send address changes to PROCEEDINGS OF THE IEEE, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854-4150.

Manuscripts should be submitted in triplicate to the Editor at IEEE Headquarters. A summary of instructions for preparation is found in the most recent January issue of this journal. Detailed instructions are contained in "Information for IEEE Authors," available on request. See the note at the beginning of "Proceedings Letters" for special instructions for that section. After a manuscript has been accepted for publication, the author's organization will be requested to pay a voluntary charge of $110 per printed page to cover part of the publication cost. Responsibility for contents of papers rests upon the authors and not the IEEE or its members.

Copyright: It is the policy of the IEEE to own the copyright to the technical contributions it publishes on behalf of the interests of the IEEE, its authors, and their employers, and to facilitate the appropriate reuse of this material by others. To comply with the U.S. Copyright Law, authors are required to sign an IEEE copyright transfer form before publication. This form, a copy of which is found in the most recent January issue of this journal, returns to authors and their employers full rights to reuse their material for their own purposes. Authors must submit a signed copy of this form with their manuscripts.

Annual Subscription: Member and nonmember subscription prices available on request. Single Copies: IEEE members $10.00 (first copy only), nonmembers $20.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Other: Available in microfiche and microfilm. Change of address must be received by the 1st of a month to be effective for the following month's issue. Send new address, plus mailing label showing old address, to the IEEE Service Center.

Advertising correspondence should be addressed to the Advertising Department at IEEE Headquarters.
`
CONTRIBUTED PAPERS

The PROCEEDINGS OF THE IEEE welcomes for consideration technical papers on topics of broad significance and long-range interest in all areas of electrical, electronics, and computer engineering. In-depth reviews and tutorials are particularly appropriate, although prospective authors should first submit a proposal describing the topic and scope of the proposed paper, how it meets the above criteria, an outline of the proposed paper, and a brief biography showing the author's qualifications to write the paper (including reference to previously published material as well as the author's relation to the topic). If the proposal receives a favorable review, the author will be encouraged to prepare the paper, which after submittal will go through the normal review process.

Please send papers and proposals to the Technical Editor, PROCEEDINGS OF THE IEEE, 345 East 47th Street, New York, NY 10017-2394, USA (Telephone: 212-705-7906).
`
`
Petitioner Microsoft Corporation - Ex. 1010, Cover 3
`
`
`
Data-Driven Multicomputers in Digital Signal Processing

JEAN-LUC GAUDIOT, MEMBER, IEEE
`
New technologies of integration allow the design of powerful systems which may include several thousands of elementary processors. These multiprocessors may be used for a range of applications in signal and data processing. However, assuring the proper interaction of a large number of processors and the ultimate safe execution of the user programs presents a crucial scheduling problem. The scheduling of operations upon the availability of their operands has been termed the data-driven mode of execution and offers an elegant solution to the issue. This approach is described in this paper, and several architectures which have been proposed or implemented (systolic arrays, data-flow machines, etc.) are examined in detail. The problems associated with data-driven execution are also studied. A multi-level approach to high-speed digital signal processing is then evaluated.
`
I. INTRODUCTION
`
If we are to approach the computational throughputs equivalent to billions of instructions per second which will be required from the processing systems of the future, improvements on all levels of computer design must be made. Faster technology and better packaging methods can be applied to raise clock rates. However, a one billion instructions per second machine would require a clock period as low as a nanosecond. This approach is inevitably bounded by physical limits such as the speed of light. Therefore, instead of considering the technological approach to performance improvement, we emphasize here the architectural method. Indeed, instead of merely increasing the clock frequency for a corresponding increase in overall throughput, performance can also be improved by allowing multiple processing elements to collaborate on the same program. This inevitably introduces synchronization problems, and issues of resource allocation and sharing must be solved. Programmability is indeed the central problem. In one solution, a conventional language such as Fortran is used to program the application. A sophisticated compiler is relied upon to partition a sequential program for execution on a multiprocessor. This approach has the
`
Manuscript received September 11, 1986; revised January 23, 1987. This work was supported in part by the Department of Energy under Grant DE-FG03-87ER25043. The views expressed in this paper are not necessarily endorsed by the U.S. Department of Energy.
The author is with the Computer Research Institute, Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089, USA.
IEEE Log Number 8716208.
`
advantage of imposing no "software retooling." However, complex numerical applications will not be easily partitioned, and much potential parallelism may remain undetected by the compiler.

Ada, CSP [26], and extended Fortran (e.g., HEP, Sequent), on the other hand, allow the programmer to deal with parallel processes by the use of primitives for parallel task spawning, synchronization, and message passing. However, while the programmer can express some of the parallelism characteristic of the application, much potential concurrency may never be uncovered because of the inherently sequential concepts of the language, which must be countered through the use of special "parallelism spawning" instructions. Also, development time becomes important, since the programmer must "juggle" many parallel tasks to synchronize. In addition, debugging becomes correspondingly more difficult due to the sometimes nondeterministic appearance of errors.
For these reasons, an implicit approach must be devised. In the above two methods, instruction scheduling is based upon a central program counter. We propose to demonstrate here the data-driven approach to programming multiprocessors: instructions are scheduled by the availability of their operands. This model of execution is a subset of the functional model of execution [9]. It provides a significant improvement to the programmability of multiprocessors by excluding the notion of global state and introducing the notion of values applied to functions, instead of instructions fetching the contents of memory cells as they do in the conventional "control-flow" model.
The overall objective of this paper is to demonstrate the applicability of data-driven principles of execution to the design of high-performance signal and data processing architectures. Several approaches will be demonstrated and their particular domains of application will be contrasted. The description of low-level processing systems is beyond the scope of this paper, and the interested reader is referred to an excellent survey by Allen [3]. Instead, we will concentrate here on the issues related to building high-performance multiprocessors for signal processing applications. In Section II, we show the type of problems considered in signal processing. The data-flow principles of execution as they relate to digital signal processing problems are described in detail in Section III, while several existing data-driven architectures are described in Section IV. In Section V, we analyze a multi-level data-driven architecture and examine its programming environment. Conclusions are drawn in Section VI.

0018-9219/87/0900-1220$01.00 (c) 1987 IEEE

1220    PROCEEDINGS OF THE IEEE, VOL. 75, NO. 9, SEPTEMBER 1987

Petitioner Microsoft Corporation - Ex. 1010, p. 1220
`
II. THE REQUIREMENTS OF SIGNAL PROCESSING
`
Digital signal processing techniques are applied to many different technical problems. These include radar and sonar systems, image processing, speech recognition, etc. The elementary building blocks of these were originally concentrated on such tasks as convolution, correlation, and the Fourier transform. More complex algorithms (matrix operations, linear systems solvers, etc.) are now considered. Higher order operations include not only simple problems such as elementary filtering (IIR, FIR, etc.), but also more complex functions such as adaptive and Kalman filtering [45]. Also, such complex problems as Computer-Aided Tomography or Synthetic Aperture Radar can be considered [39], [16]. Signal processing algorithms are very appropriate for description by functional languages. Indeed, a signal processing algorithm is often represented in a graph form [36] and can be decomposed into two levels:
`
- a regular level which can be implemented by a vector operation (i.e., a loop in which all iterations present no dependencies among themselves);
- a level which contains conditional operations and heuristic decision making.
`
This description shows that the lower operational levels can easily deliver parallelism (by compiler analysis or programmer inspection). This layer usually consists of simple constructs (arithmetic instructions, FFT butterfly networks, simple filters, etc.). However, the higher levels will require more complex problem insight, and even runtime dependency detection, in order to allow maximum parallelism. We will now describe principles of execution which will allow us to deliver this concurrency.
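To make the two-level decomposition concrete, the following sketch (in present-day Python; all names are illustrative and not from the paper) pairs a "regular" level, an FIR loop whose iterations carry no mutual dependencies, with a "decision" level of data-dependent conditionals:

```python
def fir(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k].

    The 'regular' level: every iteration n is independent of the
    others, so all outputs may be computed in parallel as a single
    vector operation."""
    N, K = len(x), len(h)
    y = [0.0] * N
    for n in range(N):                      # iterations carry no dependencies
        y[n] = sum(h[k] * x[n - k] for k in range(K) if n - k >= 0)
    return y

def agc(y, threshold=1.0, gain=0.5):
    """The 'decision' level: data-dependent conditional processing
    (here, a crude automatic gain control)."""
    return [v * gain if abs(v) > threshold else v for v in y]

out = agc(fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

The FIR loop could be dispatched wholesale as a vector operation, while the conditional stage is the kind of construct that requires the runtime dependency detection discussed above.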
`
`III. DATA-FLOW PRINCIPLES
`
The data-flow solution to the programmability problems of large-scale multiprocessors [5] has been pioneered by Adams [2], Chamberlin [11], and Rodriguez [43]. It is now described in detail in this section.
`
A. Basic Principles of Execution

In the conventional von Neumann model of execution, an instruction is declared executable when a Program Counter of the machine points to it. This event is usually under direct programmer control. While a control-flow program is a sequential listing of instructions, a data-flow program can be represented as a graph where the nodes are the instructions (actors) which communicate with other nodes over arcs (Fig. 1). An instruction is declared executable when it has all its operands. In the graph representation chosen above, this means that all the input arcs to an actor must carry data values (referred to as tokens) before this actor can be executed. Execution proceeds by first absorbing the input tokens, processing the input values according to the op-code of the actor, and accordingly producing result tokens on the output arcs. In summary, it can
`
GAUDIOT: DATA-DRIVEN MULTICOMPUTERS IN DSP
`
`
`
Fig. 1. A simple data-flow graph.
`
be said that the data-flow model of execution obeys two fundamental principles:

- Asynchrony of operations: The executability of an instruction is decided by a local criterion only. The presence of the operands can be sensed "locally" by each instruction. This is an attractive property for an implementation in a distributed environment where no central controller should be used for global scheduling.
- Functionality of the operations: The effect of each operation is limited to the production of results to be consumed by a specific number of other actors. This precludes the existence of "side effects." Such side effects may be long-ranging, in that the execution of an instruction may affect the state of a cell of memory which will be used only much later by another unrelated operation.
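As a minimal sketch of these two principles (a toy interpreter in present-day Python, not the mechanism of any particular machine), the graph for (a + b) * c can be evaluated with no program counter at all:

```python
# Toy data-flow interpreter: an actor fires as soon as all of its input
# arcs carry tokens (asynchrony), and its only effect is to place result
# tokens on its output arcs (functionality). Names are illustrative.
import operator

class Actor:
    def __init__(self, op, n_inputs, outputs):
        self.op = op                  # function applied to the operand values
        self.inputs = [None] * n_inputs
        self.outputs = outputs        # list of (actor, port) destinations

    def offer(self, port, token, ready):
        self.inputs[port] = token
        if all(t is not None for t in self.inputs):
            ready.append(self)        # executability decided locally

    def fire(self, ready):
        result = self.op(*self.inputs)
        self.inputs = [None] * len(self.inputs)   # absorb the input tokens
        for actor, port in self.outputs:          # produce result tokens
            actor.offer(port, result, ready)
        return result

# Graph for (a + b) * c, with a = 2, b = 3, c = 4:
results = []
mul = Actor(operator.mul, 2, [])
add = Actor(operator.add, 2, [(mul, 0)])
ready = []
add.offer(0, 2, ready)
add.offer(1, 3, ready)     # 'add' now has both operands and becomes ready
mul.offer(1, 4, ready)     # 'mul' still waits for the token from 'add'
while ready:
    results.append(ready.pop().fire(ready))
```

No global scheduler decides the order: each actor fires only when the tokens it needs have arrived on its own arcs.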
`
B. Data-Flow Interpreters

When iterations are executed, the underlying principle of data-flow (single assignment of variables) must invariably be violated. Indeed, for an actor to be repeatedly evaluated as in an iteration, its input arcs must carry several tokens (from different iterations). Several solutions have been proposed to allow the controlled violation of these rules without compromising the safe execution of the program. Among these, the acknowledgment scheme and the U-interpreter have been given the most consideration.

1) Acknowledgment Scheme [14]: Proper matching of the tokens can be observed by ordering the token production. This would be done by a careful design of the program graph so as to insure that tokens of two different iterations can never overtake each other. In addition, it must be guaranteed that no token pileup is encountered on any one arc. This condition can be verified by allowing the firing of an actor when tokens are on all input arcs and there are no tokens on any output arcs. In order to enforce this last condition, an acknowledgment must be sent by the successor(s) to the predecessor when the token has been consumed (Fig. 2). Note that an actor is executable when it has received its input arguments as well as all acknowledgments. The parallelism which can be exploited from this scheme is mostly pipelining between the actors of different iterations. Thus, when the number of instructions in the body of an iteration is the same as the number of available processors, the speedup observed by this mechanism of execution is maximal. However, for small iterations (compared to the size of the machine), the exploited parallelism remains limited.
`
Petitioner Microsoft Corporation - Ex. 1010, p. 1221
`
`
`
Fig. 2. The acknowledgment scheme.
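The acknowledgment firing rule can be sketched as follows (an illustrative two-actor pipeline; the class and variable names are invented for the example):

```python
# Sketch of the acknowledgment scheme: an actor fires only when its
# input arc carries a token AND its output arc is empty, i.e., the
# successor has acknowledged the previous result.
from collections import deque

class AckActor:
    def __init__(self, op):
        self.op = op
        self.inbox = deque()      # tokens queued on the input arc
        self.successor = None     # downstream actor (None = graph output)
        self.predecessor = None   # upstream actor to acknowledge
        self.unacked = 0          # results still sitting on the output arc

    def can_fire(self):
        return bool(self.inbox) and self.unacked == 0

    def fire(self, sink):
        token = self.op(self.inbox.popleft())
        if self.predecessor is not None:
            self.predecessor.unacked -= 1     # send the acknowledgment
        if self.successor is not None:
            self.successor.inbox.append(token)
            self.unacked += 1                 # wait for the next ack
        else:
            sink.append(token)

# Two-stage pipeline: x -> (+1) -> (*2). Iterations pipeline between
# the actors, but tokens can neither pile up nor overtake one another.
inc, dbl = AckActor(lambda v: v + 1), AckActor(lambda v: 2 * v)
inc.successor, dbl.predecessor = dbl, inc
inc.inbox.extend([1, 2, 3])
out = []
actors = [inc, dbl]
while any(a.can_fire() for a in actors):
    for a in actors:
        if a.can_fire():
            a.fire(out)
```

The `unacked` counter models the empty-output-arc condition: the producer stalls until the consumer has absorbed its previous token.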
`
2) The Unraveling Interpreter (U-interpreter): The U-interpreter [4] provides the most asynchronous operation possible. In order to allow safe execution of actors in an iterative construct, tokens are tagged with information pertaining to the context in which they are to be found. This tag includes the iteration number. Indeed, the U-interpreter closely follows these principles: to each data token is attached a tag of the form u.P.s.i, where P identifies the procedure name of the destination actor, while s identifies the destination actor within the procedure graph, i is the iteration number, and u is the context. An iteration is entered through a special L actor (Fig. 3). The function of this actor is to create a new context for the execution of the iteration: the input tokens are tagged by u.P.s.i, while the output tokens are identical but are tagged with u'.P.s.i', where u' is itself u.i. Note that this mechanism is sufficient to create an entirely different set of tokens for two nested iterations. Indeed, assume that two tokens both belong to the same inner iteration i but belong to the i1 and i2 outer iterations, respectively. The first token

Fig. 3. A typical iterative construct in the U-interpreter.

would be tagged u1.P.s.i (u1 = u.i1), while the second is tagged with u2.P.s.i (u2 = u.i2). This shows that an appropriate differentiation has been made between the two instances. The original context u is retrieved by the L⁻¹ actor before exiting.

Contrary to the acknowledgment scheme, this dynamic data-flow scheme allows fully asynchronous execution of the iterations of a loop without compiler-induced replication of the graph. Likewise, multiple function calls, and more particularly recursions, are allowed since each new actor instantiation receives a different tag. This means that the U-interpreter can deliver more parallelism, at the expense of added hardware complexity. Indeed, it will be shown in Section IV-D that implementations of the U-interpreter include the ESL DDSP [28], the University of Manchester machine [23], the ETL Sigma-1 [25], etc.
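A sketch of the tag-matching machinery, assuming tags of the form u.P.s.i as above (the tuple encoding is an illustrative assumption, not the token format of any of the machines cited):

```python
# Sketch of U-interpreter tagging: a token is (tag, value) with
# tag = (u, P, s, i); a two-operand actor fires when two tokens bearing
# the SAME tag have arrived at its input arcs.
from collections import defaultdict

waiting = defaultdict(list)          # tag -> operand values seen so far
fired = []                           # (tag, result) pairs, for inspection

def arrive(tag, value):
    waiting[tag].append(value)
    if len(waiting[tag]) == 2:       # partner present: the actor fires
        fired.append((tag, sum(waiting.pop(tag))))

def L(tag):
    """The L actor at a loop entry: builds a new context u' = u.i so
    that tokens of two nested iterations can never be confused."""
    u, P, s, i = tag
    return (u + (i,), P, s, 0)

# Two tokens of iteration i = 3 match; a token of iteration 4 waits:
arrive((("u0",), "P", 7, 3), 10)
arrive((("u0",), "P", 7, 3), 32)
arrive((("u0",), "P", 7, 4), 99)

# Entering the same inner loop from outer iterations 1 and 2 yields
# distinct contexts, as described above:
t1, t2 = L((("u0",), "P", 0, 1)), L((("u0",), "P", 0, 2))
```

Only same-tagged tokens meet, so iterations may unravel in any order without a compiler having to replicate the graph.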
`
C. Structure Handling

This is a crucial issue in signal processing, for this kind of application requires that many data elements which belong to the same structure be processed in a parallel or pipelined fashion. One of the basic premises of data-flow principles states that an output is a function of its inputs only, regardless of the state of the machine at the time of evaluation. An array therefore cannot be carried through several states. Instead, if any updates are needed, a new array which contains the new elements must be created. Copying of all elements must be undertaken for the modification of a single one. This solution imposes an inordinate overhead, and several schemes have been proposed for preserving the meaning of the program during array update operations.

Petitioner Microsoft Corporation - Ex. 1010, p. 1222

1) Heaps: Dennis [14] has proposed to represent arrays by directed acyclic graphs (also called heaps). Each value is represented as the leaf of a graph tree. The modification of a single element in a heap is represented in Fig. 4.

Fig. 4. A heap update.

Note
that the complexity of the modification of a single element of the array is O(n) for a copy operation, while it is O(log n) for the heap. Several instructions are exclusively devoted to the access of heaps [15]: SELECT receives a pointer to the root node and an index value, and returns a pointer to the substructure (which may be a leaf node) associated with the index; APPEND needs the same two operands in addition to the value of the element to append to the structure.
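The O(log n) figure follows from path copying: an APPEND rebuilds only the nodes on the root-to-leaf path and shares every untouched subtree with the previous version. A sketch, assuming a complete binary tree of values (illustrative; not the actual SELECT/APPEND instructions of [15]):

```python
# Heap-style (DAG) array as a persistent binary tree: interior nodes are
# (left, right) pairs, leaves are the values themselves.

def build(values):
    if len(values) == 1:
        return values[0]                          # leaf
    mid = len(values) // 2
    return (build(values[:mid]), build(values[mid:]))

def select(node, index, size):
    """Follow pointers from the root down to the indexed leaf."""
    if size == 1:
        return node
    half = size // 2
    if index < half:
        return select(node[0], index, half)
    return select(node[1], index - half, size - half)

def append(node, index, value, size):
    """Functional update: returns a NEW root. Only the O(log n) nodes on
    the path to the leaf are rebuilt; all other subtrees are shared."""
    if size == 1:
        return value
    half = size // 2
    left, right = node
    if index < half:
        return (append(left, index, value, half), right)   # right shared
    return (left, append(right, index - half, value, size - half))

a0 = build([10, 20, 30, 40])
a1 = append(a0, 2, 99, 4)       # a0 is left intact (single assignment)
```

The old root remains valid after the update, which is exactly the single-assignment behavior the data-flow model requires.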
2) I-Structures: A heap must be entirely ready before it can be consumed, because no consumption (SELECT actors) can take place until the pointer token appears (i.e., the creation of the array is completed). In the I-structure scheme [7], constraints on the creation of arrays allow the selection of individual elements (or substructures) from the array before its complete production. One possible implementation of I-structures makes use of a "presence" bit which indicates when an element of an array has been calculated and is ready for consumption. An attempt to read an empty cell would cause the read to be deferred until such time as the cell's presence bit is marked. Conversely, a write into a cell whose presence bit indicates valid stored data could cause the generation of an error signal. The advantages of this scheme are:

- better performance, because pipelining is allowed between I-structure consumers and producers;
- less "serialization" of operations such as APPENDs, because they are allowed to occur independently on the same structure.
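One presence-bit implementation can be sketched as follows (illustrative; deferred reads are modeled here as queued callbacks):

```python
# Sketch of an I-structure cell array: reads of an empty cell are
# deferred until the producer writes it; a second write to a full cell
# raises an error (single-assignment violation).

class IStructure:
    def __init__(self, n):
        self.value = [None] * n
        self.present = [False] * n               # the presence bits
        self.deferred = [[] for _ in range(n)]   # waiting consumers

    def read(self, i, consumer):
        if self.present[i]:
            consumer(self.value[i])              # value already produced
        else:
            self.deferred[i].append(consumer)    # defer until written

    def write(self, i, v):
        if self.present[i]:
            raise RuntimeError("single-assignment violation")
        self.value[i], self.present[i] = v, True
        for consumer in self.deferred[i]:        # release deferred reads
            consumer(v)
        self.deferred[i].clear()

seen = []
xs = IStructure(4)
xs.read(0, seen.append)      # consumer arrives before the producer...
xs.write(0, 7)               # ...and is released by the write
xs.read(0, seen.append)      # later reads see the value immediately
```

Because each cell fills independently, consumers can begin pipelining off the structure before its production is complete.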
`
3) HDFM Arrays: A special scheme for handling arrays in a VAL high-level environment has been designed for the Hughes Data-Flow Machine (HDFM) [21]. It uses the fact that data-flow arrays as described above are overly "asynchronous," i.e., they do not take advantage of the data dependency information carried by the program graph. Safety of accesses is respected by not allowing the updating of an array before all the reads from the current version of the array have been performed. Only then can the array be directly modified. Safety and correct execution of WRITE operations are a compile-time task. This has the advantage of reducing the number of memory accesses (no complex graph of pointers must be traversed as in heaps) as well as of offering a better possibility of distribution of an array (no root node). However, spurious data dependencies may be introduced, because the compiler is not necessarily aware of the possibility of parallelism that can be detected only at runtime. For instance, dependencies between A and B related by A(F(i)) = B(i) may be artificially imposed. However, the applications targeted by the HDFM include some amount of regularity which can be easily detected by the compiler and implemented as conventional arrays.
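The A(F(i)) = B(i) example can be made concrete. When F is a permutation known only at runtime, every iteration of the scatter loop writes a distinct cell and the writes could all proceed in parallel; a compile-time safety analysis, unable to prove that F never maps two iterations to the same cell, must conservatively serialize them (sketch with invented data):

```python
# Illustrative scatter A[F(i)] = B[i]. At runtime the permutation F
# makes every iteration touch a distinct cell, so the loop body
# instances are mutually independent; a compiler that sees only the
# subscript expression F[i] cannot prove this and must serialize.

def scatter(B, F, n):
    A = [None] * n
    for i in range(n):        # independent iterations when F is a permutation
        A[F[i]] = B[i]
    return A

F = [2, 0, 3, 1]              # runtime permutation, opaque at compile time
B = [10, 20, 30, 40]
A = scatter(B, F, 4)
```

This is precisely the runtime-only parallelism that the token relabeling scheme of the next subsection is designed to recover.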
4) Token Relabeling [18]: In the U-interpreter, the notion of array can be entirely ignored at the lowest level of execution. Instead, the tag associated with each token under the rules of the U-interpretation is used as identification of the index of the array element of the high-level language. In other words, it can be simply said that, when an array A is created, its A(i) element will be tagged with i (thereafter denoted A(i)_i), if the elements are produced in the logical order. In the "production" of a unidimensional array, the iteration number can usually be directly interpreted as the index of the array element just produced by the iteration.

Special token relabeling program graphs can be created to handle scatter and gather program constructs [27] (Fig. 5(a)). This figure shows that an inversion function F⁻¹, unknown at compile time, would be needed to perform the relabeling of data-flow tokens. Such a calculation is not truly necessary. Instead, we introduce (Fig. 5(b)) a sequence generator which produces the F(i)'s, tagged by i. An exchanger actor (called X) swaps the tag and the data value and produces i_F(i). Both streams (the A's and the i_F(i)'s) are input to a special relabeling actor β which only modifies the iteration portion of the tag. By the principles of the U-interpreter, only tokens which bear the same tag will be matched as proper inputs to the β actor. In other words, the mate of A(i)_i is i_F(i). The β actor takes the A input and relabels it with the data value carried by the token on its other input arc; in other words, it outputs A(F(i))_i. Since i is a dummy variable, and since F is bijective, it can be said that, from a global point of view,

    A(F(i))_i ≡ A(k)_(F⁻¹(k)).

Fig. 5. (a) Program graph gather operation. (b) Token relabeling gather.

This demonstrates that, without recourse to the calculation of F⁻¹, the proper relabeling of the A elements has been effectively produced. This algorithm requires no intermediary storage, does not need array operations, and imposes smaller hardware and execution overhead. The relabeling approach thus eliminates a large portion of the overhead associated with the production and consumption of array A. Pipelining between the source and the sink of a data structure is the goal of this scheme, just as it was the idea behind the design of the I-structures. However, the token relabeling approach brings better runtime memory management: while the tokens which correspond to the contents of the array still exist and must still be temporarily stored, they need not go through an additional storage as a data structure. Also, there is no need for "requests" for data as would be the case in an I-structure environment: when an I-structure is created, actors which need data from it must demand the element in question until it has arrived. This may introduce heavy overheads, as unsatisfied requests must be queued in the structure itself. Garbage collection is automatically handled, since the "array token" is removed from the arc as soon as it is matched. In other words, when it has been used, it is swallowed by applying data-flow principles.

D. High-Level Data-Flow Languages

In addition to the low-level mechanisms of execution which were described earlier, special high-level data-flow languages have been designed for easier translation into data-flow graphs. To be sure, these high-level languages are not a necessity: the Texas Instruments data-flow project [31] relied upon Fortran programming through the use of a modified version of a vectorizing compiler. Among the languages designed for data-flow machines are VAL (Value Algorithmic Language) for the MIT static data-flow project [37], [1], Id (Irvine Dataflow) for the MIT tagged token data-flow architecture [4], LUCID [8], [30], etc. SISAL (Streams and Iterations in Single Assignment Language) has been designed by McGraw and Skedzielewski [38] and is intended as the definition of a "universal" language for the programming of future multiprocessors.

Data-flow languages have also been defined for the specific purpose of programming signal processing applications. These include the SIGNAL language designed by Le Guernic et al. [36]. The intent of the language is to provide a formal specification of signal processing problems, as opposed to asynchronous languages such as CSP and Occam [10]. SDF (Synchronous Data Flow) is another formal description of signal processing algorithms based on data-driven principles of execution, proposed by Lee and Messerschmitt [35].

Petitioner Microsoft Corporation - Ex. 1010, p. 1223

IV. DATA-DRIVEN ARCHITECTURES

Data-driven systems may be classified according to the binding of their data dependencies to design time, compile time, or runtime. We thus initially examine multiprocessor systems where data dependencies have been frozen at design time (systolic arrays). We then consider programmable systolic arrays (the Wavefront Array Processor) and multiprocessors scheduled at compile time by the use of data-flow program graphs (the ESL polycyclic processor). Finally, we study systems where the data dependencies provide scheduling information at runtime (the Hughes Data-Flow Machine), and examine the issues raised by each approach.

Systolic arrays have been applied to a variety of signal processing problems, matrix operations, and image processing. In summary, a systolic array is simply a collection of interconnected Processing Elements (PEs); in order to incorporate as many processors as possible, each PE is kept as simple as possible.

Fig. 6. A linear systolic array.