throbber
EEE
`@ THE INSTITUTE OF ELECTRICALAND ELECTRONlCS ENGINEERS september 1987
`
`“SS" ONE-92$)
`
`0
`
`{PECIAL ISSUE ON
`
`‘33} - Ifi-i‘I 31331.1 -
`71“."1'1:
`E
`.-
`I
`-1'
`I!‘I
`n:-
`,.
`
`B 4
`
`'9II
`
`ag6ga
`
`s.,.
`'3
`In.
`4!Q
`
`
`
`g 5
`
`..
`
`lat-"Int" {IJM'KII
`
`I 33-3 2" 1-..
`NEW.
`1'-m.1-L._ (:11?-
`;a.=.-';'-:[+».L:s omit
`T
`'
`I
`“-
`‘.
`L'Vi‘i1i‘l’
`'-
`fihaéaificth J
`
`{MN l 35
`
`fi 02
`
`_
`
`'
`I
`fifl
`
`.04110
`
`
`
`Petitioner Microsoft Corporation - Ex. 1010, Cover 1
`Petitioner Microsoft Corporation - EX. 1010, Cover 1
`
`

`

`proceedings MEEE . '
`
`
`
`published monthly by the institute of Electrical and Electronics Engineers, inc.
`september 1987
`
`SPECIAL ISSUE ON
`
`HARDWARE AND SOFTWARE FOR DIGITAL SIGNAL PROCESSING
`
`Edited by Saniit K. Mitre and Kalyan Mondal
`
`1139
`
`SCANNING mr lSSUE
`
`PAPERS
`
`V15!
`
`The TMSBZB Family oi Digital Signal Processors, K's. lit), 6. A. Frantz, and R. Simar, Jr.
`1143
`1160 VLSI Processor for Image Processing, M. Strgai, A. Kanmna, K. Suzuki, and M. Kuhn
`116? Digital Signal Processor for Test and Measurement Environment, A. Kareem, C. L Saxe,
`f. Etheridge, and l). McKinney.
`i172 The Graph Search Machine (GEM): A VLSI Architecture For Connected Spec-ch Recognition and
`Other Applications, 5. C. Clinski, T. M. Laltrmia, D. R. Cassidatr, T. Koh, C. Cerveshi, G. A. Wilson,
`and ,-'. Kumar
`
`1185 DSPfirflUtt: An Algorithm-Specific Digital Signal Processor Peripheral, C. D. Hillman
`1192
`Parallel Bit-Level Pipelined VLSI Designs for High-Speed Signal Processing, M. Hatamian and
`G. l. Cash
`
`Systems
`
`1203
`
`A h >3 Ell-MHZ 'ln2al-Cl1annel FFF Cross-Spectrum Analyzer for Radio Astronomy, 1’. Citikada,
`M. lshiguro, H. Hiraliayashi, M. Morimoto, K-l. Morita, T. Kanzawa, H. twashita. K. Nakazima,
`fi-t. lsnikatm, T. Takahashi, K. Handa, T. Kasuga. S. Dkumura. T. t'tzliyazawa,
`'l'. Nakazurtr, K. Miura,
`and 5. Nagasawa
`1211 MUPSI: A Multiprocessor I'or Signal Processing, C. Bolch, F. Hol'inann. 8. Hoppe, C—U. Unster,
`R. Polzer, H. W. Schiissler, G. l-Vackersreutlter, and F-X. Worm
`1220 Data‘Driven Multicomputers in Digital Signal Processing, ,l-l.. Gatidi‘ot
`1235
`Synchronous Data Flow, E. A. Lee and D. G. Messerschmitr
`1246 Multiple Digital Signal Processor Environment for Intelligent Signal Processing, W. S. Cass,
`R. T.
`'l'arrant. B.
`l. Paware. M. Cammel, P. K. Raiasekarao, R. H. tt’iggms, and C. D. Cont-rngton
`CAD
`
`126i] Computer-Aided Design of VLSI FIR Filters. P. R. Cappello and CW. Wu
`1272
`A Silicon Fotnpiler for Digital Signal Processing: lt-lelhodology, lmpiementation, and Applications,
`F. F. Yassa, ,t. R. lasica, R.
`l'. Hartley, and S. E. Noufar'm
`Algorithms
`
`1283 Vectorized Mixed Radix Discrete Fourier Transform Algorithms, R. C. Agarwal and l. W. Cooley
`1293
`Implementation of Digital Filtering Algorithms Using Pipelinecl Vector Processors, W. Song and
`5. K. Mitre
`
`1304 Array Architectures for Iterative Algorithms, H. v. tagadr‘slt, S. K. Rao, and T. Kailath
`Software
`
`1322
`
`SlL‘.-A General-Purpose Signal Processing Program, 0. t. lager and S. C}. Azevedo
`
`PROCEEDINGS LETTERS
`
`1333 Cumulants: A Powerful Tool in Signal Pmcessing, C. B. Ciannakis
`1335
`A Novel Design of a Combinational Network to Facilitate FaLIIt Detection, P. R. Bhattacharjee,
`S. K. Basu, and l. C. Paul
`
`1336 Asynchronous Fiber Optic Local Area Network Using CDMA and Optical Correlation,
`M. A. Santoro and P. R. Prticnal
`1338 On a Constrained LMS Dither Algorithm, K. Cho and N. Ahmed
`
`llllllllllllllllllllllllllllllllllllllllllllSEPTEMBERllllllllllllllllllllllllllllllllll
`
`Petitioner Microsoft Corporation - Ex. 1010, Cover 2
`Petitioner Microsoft Corporation - Ex. 1010, Cover 2
`———.____________________________
`
`

`

`BOOK REVIEWS
`
`Spread Spectririii Systems. 2nd ed. by R. C. Dixon. reviewed by L H. Sibiii
`134i
`I342 RC Active Filter Design Handbook. by F. W. Stephenson. reviewed by 5. Nata
`1342 Book Alert
`1344
`ruruke srectat. IssuEs-‘srsctat. sections or THE PROCEEDINGS
`
`rain ii
`
`COVER A family of successive digital signal processors (overlaying aprogrammabie bit—level pipelined MAC
`chip) representing an ll‘l'l
`al issue.
`portant development within the topic area of this speci
`
`T987 PROCEEDINGS EDITORlAL BOARD
`M. l. Skolnik. Editor
`
`R L' Abrams
`C N. Bergltind
`G. M. Borsuk
`K. R. Carver
`R. E. Ci'ochiere
`J. C). Dirnmock
`R C. Dixon
`Tse—yun Feng
`A. L. Frisiani
`G. T. Hawle
`W. K. Kahn
`Sherman Karp
`5. 5. Lam
`R. D. Masieilo
`
`Ruey-wen Liu
`
`1' L' Melsa
`R. M. Mersereati
`l. D. Musa
`T. C. Pilkington
`M. B. Pursley
`R. l. Ringlee
`A. C. Schell
`Gene Stroll
`Yasuharu Sueinatsu
`Kiyo Ton‘it'yasti
`Sven Treitel
`Harry Urkowitz
`A. S. Willsky
`M. ]. Wozny
`
`I. C. Wiltse
`
`PROCEEDINGS STAFF
`Hans P. Leander. Technical Editor
`Nela Rybowicz. Associate Editor
`
`1987 IEEE PUBLICATIONS BOARD
`
`C. H. House, Chairman
`N. R. Dixon, Vice Chairman
`
`D. L. St'aiger, StaffSecretary
`J. i. Baldini
`G. F. McClure
`Donald Christiansen
`H. P. Meisel
`Robert Coteilessa
`T. Pavlidis
`S. H. Durrani
`H. B. Rigas
`Bruce Eisenstein
`T. E. Sharp
`living Engelson
`D. H. Sheingold
`W. C. Farrell
`Marwan Simaan
`Sheldon Gruber
`M. |. Skolnilr
`W. K. Kahn
`Jerome Suran
`Philip Lopresti
`R. C. Williamson
`
`HEADQUARTERS STAFF
`
`Eric Herz, Executive Director and General
`Manager
`
`Elwood K. Gannett, Deputy General
`Manager
`PUBLISHINEI SERVICES
`David L. Staiger. Staff Director
`Elizabeth Braharn, Manager
`Database/information Services
`W. R. Crone Manager,
`tEEE PRESSRPit-ocrtomosor m! iEEE
`Patricia H. Penick. Manager, Publication
`Administrative Services
`Otto W. Vathke. Publication Business
`Manager
`Ann H. Burgrneyer. Gail S. Peri-inc,
`
`Carolvne Tamney. Production Managers
`onrsnsirrc
`William R. Saunders, Advertising Director
`Ralph Obert. Advertising Production
`Manager
`
`bythe'tstofamo
`mailing label slit:
`
`nth to be effective tor the following month‘s issue. Send new atltiress. plus
`tiring old address. to the IEEE Service Center.
`
`eaaiaaeaasmlHillllllllllllllllllllllllll
`llIlIlIlIllllllllllllllllllllllllllllllllllll
`
`PROCEEDINGS OF THE IEEE is published monthly by The Institute of £Ieclrica| and Elec-
`tronics Engineers. Inc. IEEE Headquarters: 345 East Nth Street. New York, NY lat] lit-23%. iEEE
`Service Center [For orders. subscriptions. and address changesi: 445 Hoes Lane. I’iscatawriy.
`Ni 03054-4 I50. Telephones: Technical Editor 2 I2J05-79t16; Publishing Services 2 rues-75w;
`IEEE Service Center 20I-‘Jt1'I-IJIJ6I}: Advertising 2i2-705-i‘579. Copyright and Reprint Permis-
`sioits: Abstracting is permitted with credit to the source. Librariesare permitted to photocopy
`beyond the limits of the U.S, Copyright Law for private use at patrons: ti] those post-19??
`articles that carry a code at the bottom of the first page. provider] the per-copy tee indicator!
`in the code is paid lhrousit the Copyright Clearanre Center. 2‘) Congress Street. Salem. MA
`I] will); IZI pred‘ifll articles without fee. Instructors are permitted to photocopy isolated arti—
`cles ior noncominerciai classroom use without t‘ee. For all other copying. reprint or repubv
`iicatinn permission. u'rlle to Copyrights and Permissions Department. IEEIE Publishing Ser-
`vices. 343 East 47th Street. New York. NY TDD Ill—2394. Copyright E 195? by The Institute oi
`Electrical and Electronics Engineers. Inc. All rights reserved. Printed in U-S.r\. Second-class
`postage paid at New York. NY and at additional mailing offices. Postmaster: Send address
`outfit-415:1.
`changes to PROCEEDINGS OF THE lEEE. IEEL’ Setvicl’ Center. 445 Hoes Lane. Piscataway, NJ
`
`Headquarters.
`Advertising correspondence should he addressed to the Advertising Department at
`
`iEEE
`
`ofinstrtrctionslm
`Manuscripts should be submitted in triplicate lo the Editor at iEEI.’ Headquarters. A summer-,-
`preparation is found in the most recent tan uaryissue oilhis in urnal. Detailed
`instructions are contained in "Information for IEEEAttthors," available on i‘L‘thr-SI. See note
`at beginning at “l'rDCc-crlinss Letters
`' for special instructions for this section. Alter a mark
`inscripl has been accepted for publication, the author's organization will be requester! to pay
`a voluntary charge oi Stilt per printed page to rover pat! oi IlII' publication cost. Respon-
`sibility tor contents of papers rests upon the authors and not the IEtE or its members.
`Copyright: It
`is th
`publishes on beh
`e. policy at the IEEE to own IllE copyright to the technical t‘LInIfilJUlIfli’lS it
`the appropriate r
`aliol' the interests of the IEEE. its authors and employers. and to facilitate
`AnnualSuhscriplion: Member and nonmemher subscription pricesavailable on request. SinA
`authors are rcqui
`euse at this material by others. To comply with the US. Copyright Law.
`gle Copies: IEEE members SIOJJD tfirst copy only]. nonmembers 520.00 per copy. tNute: Add
`red to sign an IEEE copyright transfer torn: heiore publication. This term.
`EWIJJIJ for postage and handling charge to any order from $1.00 to $50.00. including prepaid
`a copy oiwhirli is tound in the most recent January issue of this journal. returns lo authors
`orders.) Other: Available in microfiche and microfilm. Change of address must be received
`and their employers full rights to reuse their material for their own purposes. Authors must
`submit a signed copy of this form with their manuscripts.
`
`CONTRIBUTED Papers
`The PROCEEDJNGSDFTHE tEEEwelcornes iorconsideration technical
`the topic and
`papers on topics of broad significance and long-range interest in
`posed paper, how it meets the above criteria. an outline oi the pro-
`all areas of electrical, electronics, and computer engineering. In-
`and a brief biography showing the author‘s quali—
`depth reviews and lutorials are particularly appropriate, although
`fications to write the paper {including reference to previously pub-
`lished material as well as the author’s relation to the to
`pic). II' the
`proposal receives a favorable review.
`the author will be encour~
`aged to prepare the paper. which after submittal will go through
`the normal review proeess.
`Please send papers and propOsals to the Technical Editor.
`PRDCEEDINGS or THE IEEE, 345 East 47th Street, New York, NY 10017-
`2394. USA (Telephone: 212-705-7906}.
`
`
`Petitioner Microsoft Corporation - Ex. 1010, Cover 3
`etitioner Microsoft Corporation - Ex. 1010, Cover 3
`P
`_
`________________________________
`
`

`

`Data-Driven Multicomputers in Digital
`Signal Processing
`
`
`
`JEAN-LUC GAUDIOT, MEMBER,
`
`lEEE
`
`New technologies of integration allow the design of powerful
`systems which may include several thousands of elementary pro—
`cessors. These multiprocesSors may be used (or a range of appli-
`cations in signal and data processing. However, assuring the proper
`interaction of a large number of processors and the ultimate safe
`execution of the user programs presents a crucial scheduling prob-
`lem. The scheduling of operations upon the availability of their
`operands has been termed the data-driven mode of execution and
`offers an elegant solution to the issue. This approach is described
`in this paper and several architectures which have been proposed
`or implemented (systolic arrays, data-flow machines, etc.) are
`examined in detail. The problems associated with data-driven exev
`cation are also studied. A multi-level approach to high-speed dig-
`ital signal processing is then evaluated.
`
`l.
`
`lNTRODUCTION
`
`if we are to approach the computational throughputs
`equivalent to billions of instructions per second which will
`be required from the processing systems of the future,
`improvements on all levels of computer design must be
`made. Fastertechnologv and better packaging methods can
`be applied to raise clock rates. However, a one billion
`instructions per second machine would require a clock
`period as low as a nanosecond. This approach is inevitably
`bounded by physical limits such as the speed of light.
`Therefore,
`instead of considering the technological
`approach to performance improvement, we emphasize
`here the architectural method. indeed, instead of merely
`increasing the clock frequency for a corresponding increase
`in overall throughput, performance can also be improved
`by allowing multiple processing elements to collaborate on
`the same program. This inevitably introduces synchroniz—
`ation problems, and issues of resource allocation and shar-
`ing must be solved. Programmability is indeed the central
`problem. In one solution, a conventional language such as
`Fortran is used to program the application. A' sophisticated
`compiler is relied upon to partition a sequential program
`for execution on a multiprocessor. This approach has the
`
`Manuscript received September rt, 1986,- revised JanuaryI 23, 1987.
`This work was supported in part bvthe Department olEnergy under
`Grant DE-FGO3-HFER 25043. The views expressed in this paper are
`not necessarily endorsed by the U.S. Department of Energy.
`The author is with the Computer Research Institute, Department
`of Electrical Engineering—Systems, University of Southern Cali-
`iornia, Los Angeles, CA 90059, USA.
`iEEE Log Number 8716208.
`
`advantage of imposing no “software retooling.” However,
`complex numerical applications will not be easily parti-
`tioned and much potential parallelism may remain unde~
`tected by the compiler.
`Ada, CSP [26}, extended Fortran le.g., HEP, Sequent], on
`the other hand, allow the programmer to deal with parallel
`processes by the use of primitives for parallel task spawn—
`ing, synchronization, and message passing. However, while
`the'programmer can express some of the parallelism char-
`acteristic of the application, much potential concurrency
`may never be uncovered because of the inherent sequen-
`tial concepts of the language which must be countered
`through the use of Special ”parallelism spawning" instruc-
`tions.Also,developmenttimebecomesimportantsincethe
`programmer must ”juggle“ with many parallel tasks to svn~
`chronize.
`in addition, debugging becomes correspond—
`ingly more difficult clue to the sometimes undeterministic
`appearance of errors.
`For these reasons, an implicitapproach must be devised.
`In the above two methods, instruction scheduling is based
`upon a central program counter. We propose to demon—
`strate here the data-driven approach to programming mul-
`tiprocessors: instructions can be scheduled by the avail:
`ability oftheiroperands. This model of execution is a subset
`of the functional model of execution [91. it provides a sig—
`nificant improvement to the programmabilitv of multipro-
`cessors by excluding the notion of global state and intro-
`ducing the notion of values applied to functions instead of
`instructions fetching the contents of memory cells as they
`are in the conventional ”control-flow” model.
`The overall objective of this paper is to demonstrate the
`applicability of data-driven principles of execution to the
`design of high-performance signal and data processing
`architectures. Several approaches will be demonstrated and
`their particular domain of application will be contrasted.
`The description of low—level processing systems is beyond
`the scope ofthis paper and the interested reader is referred
`to an excellent survey by Allen {3]. instead, we will con-
`centrate here on the i55ues related to building l1igl1~per—
`formance multiprocessors for signal processing applica—
`tions.
`in Section ll, we show the type of problems
`considered in signal processing. The data-flow principles
`of execution as they relate to digital signal processing prob-
`lems are described in detail in Section ill while several exist-
`
`lill|B-9219l3?l09Dfl-122l15lli.DD '1' 19B? lEEE
`
`'l22fl
`
`PROCEEDlNCS OF THE IEEE. VOL. 75, NO. 9, SEPTEMBER 1983’
`
`Petitioner Microsoft Corporation - Ex. 1010, p. 1220
`Petitioner Microsoft C0 oration - Ex. 1010, p. 1220
`
`

`

`int: data-driven architectures are described in Section N.
`In Section V, we analyze a multi~level data~driven archi-
`nit-lure and examine its programming environment. Con~
`t'lusions are drawn in Section VI.
`
`ll. Tut. Risoumrxttms or SIGNAL Pnocsssrno
`
`Digital signal processing techniques are applied to many
`different technical problems. These include radar and sonar
`systems, image processing, speech recognition. etc. The
`elementary building blocks of these were originally con—
`wrnrated on such tasks as convolution, correlation, and
`Fourier transform. More complex algorithms (matrix open
`ations, linear systems solvers, etc.) are now considered.
`Higher order operations include not only simple problems
`such as elementary filtering (llR, FlR, etc.], but also more
`complex functions such as adaptive and Kalman filtering
`[45]. Also, such complex problems as Computer—Aided
`Tomography or Synthetic Aperture Radar can be consid-
`ered (391, [16]. Signal processing algorithms are very approv
`priale for description by functional languages. Indeed, a
`signal processing algorithm is often represented in a graph
`form [36] and can be decomposed in two levels:
`
`1
`
`-
`
`a regular level which can be implemented by a vector
`operation tie, a loop in which all iterations present no
`dependencies among themselves);
`a level which contains conditional operations and heu-
`ristic decision making.
`
`this description shows that the lower operational levels
`can easily deliver parallelism (by compiler analysis or pro-
`grammer inspection}. This layer usually consists of simple
`constructs [arithmetic instructions, FFT butterfly networks,
`simple filters, etc}. However, the higher levels will require
`more complex problem insight and even runtime depen—
`dency detection in orderto allow maximum parallelism. We
`will now describe principles of execution which will allow
`us to deliver this concurrency.
`
`III. DATA-FLOW PRINCIPLES
`
`The data-flow solution to the programmability problems
`of large-scale multipmcessors [5] has been pioneered by
`Adams [2], Chamberlin [l1], and Rodriguez [43}. {t is now
`described in detail in this section.
`
`A, Basic Principles of Execution
`
`in the conventional von Neumann model of execution,
`an instruction is declared executable when a Program
`Counter of the machine points to it. This event is usually
`underdirectprogrammercontrol.Whileacontrol—flowpro-
`gram is a sequential listing of instructions, a data-flow pro—
`gram can be represented as a graph where the nodes are
`the instructions (actors) which communicate with other
`nodes over arcs (Fig. 1). An instruction is declared execut~
`able when it has all its operands. In the graph represen—
`tation chosen above, this means that all the input arcs to
`an actor must carry data values (referred to as tokens) before
`this actor can be executed. Execution proceeds by first
`absorbing the input tokens, processing the input values
`according to the 0p. code ofthe actor, and accordingly pro-
`ducing result tokens on the output arcs. in summary, it can
`
`C-AUUlfJT: oATA-DRNEN .x-tutricompurtas'in ose,
`
`
`
`Fig. l. A simple data—flow graph.
`
`be said that the data-flow model of execution obeys two
`fundamental principles:
`
`' Asynchrony of operations: The executability of an
`instruction is decided by a local criterion only. The pres-
`ence of the operands can be sensed ”locally” by each
`instruction. This is an attractive property for an implemen»
`tation in a distributed environment where no central con—
`
`troller should be used for global scheduling.
`- Functionality of the operations: The effect of each
`operation is limited to the production of results to be con-
`sumed byaspecific number ofother actors. This precludes
`the existence of ”side-effects." These side-effects may be
`long—ranging in that the execution of an instruction may
`effect the state of a cell of memory which will be used only
`much later by another unrelated operation.
`
`B. Data-Flow interpreters
`
`When iterations are executed, the underlying principle
`of data-flow (single assignmen l ofvariables) must invariably
`be violated. Indeed, for an actor to be repeatedly evaluated
`as in an iteration, its input arcs must carry several tokens
`(from different iterations). Several solutions have been pro—
`posed to allow the controlled violation of these rules with—
`out compromising the safe execution of
`the program.
`Among these,
`the Acknowledgment scheme and the
`U—interpreter have been given the most consideration.
`1‘) Acknowledgment Scheme I'M].- Proper matching of the
`tokens can be observed by ordering the token production.
`Thiswould be done byacareful design ofthe program graph
`so as to insure that tokens of two different iterations can
`
`never overtake each other. In addition, it must be guar-
`anteed that no token pileup is encountered on any one arc.
`This condition can be verified by allowing the firing of an
`actor when tokens are on all input arcs and there are no
`tokens on any output arcs. In order to enforce this last conv
`dition, an acknowledgment must be sent by the succes-
`sortsl to the predecessor when the token has been conv
`sumed (Fig. 2). Note that an actor is executable when it has
`received its input arguments as well as all acknOwIedg—
`ments. The parallelism which can be exploited from this
`scheme is mostly pipeli ning between the actors of different
`iterations. Thus when the number of instructions in the
`
`body of an iteration is the same as the number of available
`processors, the speedup observed by this mechanism of
`execution is maximal. However, for small iterations (com-
`
`pared to the size of the machine), the exploited parallelism
`
`Petitioner Microsoft Corporation - Ex. 1010, p. 1221
`Petitioner Microsoft Corporation - Ex. 1010, p. 1221
`_.__________________________________
`
`

`

`‘ACK‘
`
`Fig. 2. The acknowledgment scheme.
`
`
`W}
`
`
`
`Ir':
`
`(UJnterpreter): The
`lnterpreter
`U—interpreter [4] provides the most asynchronous possible
`operation. In order to allow safe execution of actors in an
`iterative construct, tokens are tagged with information per-
`
`be found. This tag includes the iteration number. indeed,
`the U-interpreter closely follows these principles: to each
`data token is attached a tag of the form u.P.s.l, where Piden-
`tifies the procedure name of the destination actor, while 3
`
`graph. The function of this actor is to create a new context
`for the execution of the iteration: the input tokens are
`tagged by u.P.s.l while the output tokens are identical but
`are tagged with u'.P.t’.t where u’ is itselfu.l. Note that this
`mechanism is sufficient to create an entirely different set
`of tokens fortwo nested iterations. lndeed, assume that two
`tokens both belong to the same inner iteration l‘but belong
`to the f, and,"2 outer iterations respectively. The first token
`
`'[Tl‘z
`
`Fig. 3. A typical iterative construct in the U-interpreter.
`
`would be tagged u1.P.s.itu1 = ml.) while the second is
`tagged with u2.P.s.l(u2 = uJZ}. This shows that an appro—
`priate differentiation has been made between the two
`instances. The originalcontext u is retrieved bythe l ' ' actor
`before exiting.
`Contrarily to the acknowledgment scheme, this dynamic
`data-flow scheme allows full asynchronous execution of the
`
`without compiler-inducer! replication of the graph. Like-
`wise, multiple function calls and more particularly recur—
`sions are allowed since each newactor instantiation receives
`a different tag. This means that the U-interpreter would be
`
`expense of added hardware complexity. lndeed, it will be
`shown in Section lV-D that implementation of the U-inter—
`
`ESL DDSP [28], the University of Manchester machine [23],
`the ETL Sigma-1 [25], etc.
`
`C. Structure Handling
`This is a crucial issue in signal processing for this kind
`of application requires that many data elements which
`belong to the same structure be processed in a parallel or
`pipelined fashion. One of the basic premises of data—flow
`principles states that an output is a function of its inputs
`only, regardless of the state of the machine at the time of
`
`several states. Instead, it any updates are needed, a new
`array which contains the new elements must be created.
`Copying of all elements must be undertaken for the mod-
`ification of a single one. This solution imposes an inordi—
`
`F‘ROCEEDINCS OF THE lEEE. VOL, 75. NO. 9, SEPTEMBER 1937
`
`Petitioner Microsoft Corporation - Ex. 1010, p. 1222
`P
`etitioner Microsoft Corporation - Ex. 1010, p. 1222
`__________________________—__—__——
`
`

`

`preserving lhemeaning ofthe program during array update
`operations»
`y)
`l-ieaps.‘ Dennis [14] has proposed 10 represent arrays
`by directed acyclic graphs (also called heaps). Each value
`is represented as the leaf of a graph tree. The modification
`of a single element in a heap is represented in Fig. 4. Note
`
`A
`
`a"
`
`L:
`Li
`Fig. 4. A heap update.
`
`1.3
`
`2.!
`
`2.2
`
`23
`
`'3?
`
`that the complexity of the modification of asingle element
`of the array '5 0th for 3 COPY operation, While 't
`*5
`Otlog ”l tor the heap. Several instructions are EXClUS'VElY
`devoted to the access of heaps [15]: SELECT receives a
`pointer to the root node, an index value, and returns a
`pointerto the substructure (which may bea leaf node) asso—
`ciated with the index; APPEND also needs the same two
`operands in addition to the value of the element to append
`to the structure.
`2} i-Structures: A heap must be entirely ready before it
`can beconsumed because noconsumption (SELECT actors)
`can take place until the pointer token appears ii.e., the cre-
`ation of the array is completed). In the I-structure scheme
`[7] constraints on the creation of arrays allow the selection
`of individual elements (or substructures) from the array
`before its complete production. One possible implemen—
`tation of l-structures makes use of a “presence” bit which
`indicates when an element of an array has been calculated
`and is ready for consumption. An attempt to read an empty
`cell would cause the read to be deferred until such time that
`the cell presence bit is marked. Conversely, a write into a
`cell, the presence bit of which indicates valid stored data,
`could be cause for the generation of an error signal. The
`advantages of this scheme are:
`' better performance because pipelining is allowed
`between i—structu re consumers and producers;
`'
`less "serialization” of operations such as APPENDs,
`because they are allowed to occur independently on
`the same structure.
`
`3) HDFMArrays:A special schemefor handling arrays in
`a VAL high-level environment has been designed for the
`Hughes Data—Flow MachinetHDFM)[2‘l]. ltusesthefactthat
`data-flow arrays as described above are overly ”asynchro-
`nous," i.e., they do not take advantage of the data depen~
`clency information carried by the program graph. Safety of
`accesses is respected by not allowing the updating of an
`array before all the reads from the current version of the
`array have been performed. Only then can the array be
`directly modified. Safety and correct execution of WRITE
`operations are a compile~time task. This has the advantage
`of reducing the number of memory accesses too complex
`graph of pointers must be travered as in heaps) as well as
`of offeringa better possibility ofdistribution ofan arraytno
`root node). However, spurious data dependencies may be
`introduced because the compiler is not necessarily aware
`
`of the possibility of parallelism that can be detected only
`at runlinie. For instance, dependencies: on A and 8 related
`by rltFtiii = b‘iil may be artificially imposed. However, the
`applications largettecl by the HDFM include some amount
`of regularity which can be easily detected by the compiler
`and implemented as conventional arrays.
`4} Token Reiabeiing {18}: in the U-lnterpreter, the notion
`of array can be entirely ignored at the lowest level of exe-
`cution. instead, the tag associated with each token under
`the rules of the U-interpretation is used as identification of
`the index of the arrayeiement of the high-level language.
`In other Words, it can be simply said that, when an array A
`is created, its Ail) element will be tagged with i thereafter
`denoted/40h”), if the elements are produced in the logical
`order. in the ”production” of a unidimensional array, the
`iteration number can usually be directly interpreted as the
`index of the array element just produced by the iteration.
`Special token relabeling program graphs can be created
`to handle scatter and gather program constructs [27] (Fig.
`5(a)]. This figure shows that an inversion function F—I
`
`Allllllh ]
`}
`”4‘, m ”9
`‘ All!
`gm,”
`l‘""”
`
`M
`
`3mm
`\\
`
`lCtti
`l‘l
`no: 1&qu
`cm == um + .-ifF(l’)_i
`is)
`
`2
`
`'{n-ti'I'F'lit-‘ei
`'
`‘5.
`
`NI” ‘
`-’ ”j
`
`it"
`:3inme
`“'1ng A,”
`E) “4"”
`_r
`i
`{ll [it
`(bi
`ram gather operation. (b) Token Relabeling gather.
`
`H
`
`Fig. 5.
`
`This demonstrates that, without recourse to the calculation
`of F", the proper relabeling of the A elements has been
`effectively produced.
`This algorithm requires no intermediary storage, does
`not need array operations, and imposes smaller hardware
`and execution overhead. This relabeling approach elimi-
`nates a large portion of the overhead associated with the
`production and consumption ofarrayA. Pipelining between
`the source and the sink of a data structure is the goal of this
`unknown at compiletime would be needed to perform the
`relabeling of data-flow tokens. Such acalculation is nottruly
`necessary. Instead, we introduce (Fig. 5(bi)asequence gen-
`erator'vvhich producesthe Fiji's, tagged byfinn exchanger
`actor [called xi swaps the tag and the data value and pro—
`duces ,imm. Both streams (the A’s and finial are input to a
`special relabeling actorfi which only modifies the iteration
`portion of the tag. By the principles of the U—interpreter,
`only tokens which bear the same tag will be matched as
`proper inputs to the 5 actor. In other words, the mate of
`
`GAUDiO‘I': DATA-DRIVEN h-lULTlCOMPUT'EltS IN use
`
`1223
`
`Petitioner Microsoft Corporation - Ex. 1010, p. 1223
`Petitioner Microsoft Corporation - Ex. 1010, p. 1223
`—\
`
`

`

`their domain of application
`to design or compile time and
`refer to them as data—driven systenis._\fife thus init
`ially exam-
`ine multiprocessor systems where dat
`; a dependencies have
`been I'rozen at design time (systolic arrays}. We then con»
`sider programmable
`systolic arrays (the Wavefront Array
`Processor) and multiprocessors scheduled at c
`ompiie time
`by the use of data~flow program graphs (the ESL polycyclic
`processor). Finally, we study systems where th
`dencies provide scheduling information at
`Hughes Data-Flow Machine) and examine the '
`
`cessing problems, matrix operations, image processi
`In summary, a systolic array is simply a collection of inter-
`connected Processing
`Elements tPEs).
`porate as many processors as possible, t
`
`_ W ”rm”. me special actor f: is
`u ieiaoelmg actor which takes the A input and relabels it
`with the data carried by the token on the other input an".
`in other words, it outputs Ati‘tr'i)-,. Since i is a dummy van
`fable, and since Fis bijective, it can be said that, on a global
`
`point of view:
`
`AiFillis.
`*3 Alkljr lat-it-
`scheme, just as it was the idea behind the design of the
`l-structures. Hottvever, thetoken relabeling approach brings
`a better runtime memory management since toltens cor-
`_
`tents of the array still exist
`and must still be temporarily stored, they need not go
`through an additional storage as a data structure. Also, there
`is no need for "requests” for data as would be the case in
`an l~structure environment: When an I-structure is created,
`actors which need data from it must demand the element
`in question until it has arrived. This may introduce heavy
`overheads as unsatisfied requests must be queued in the
`structure itself. Garbage collection is automatically han-
`dled since when the “array token" is thatched, it
`is auto—
`matically removed from the arc. In other words, when it has
`been used, it is swallowed byapplying data-flow principles.
`U. High-Level Data-Flow Languages
`ln addition to the low~levei mechanisms of execution
`which were described earlier, special high—level data-flow
`languages have been designed for easier translation into
`data—[lowgraphs. Tobe sure. these high—level languages are
`not a necessity: the Texas lnstruments data—flow project [.31]
`upon Fortran programming through the use of a
`modified version of a vectorizing compi
`
`are VAL {Value Algorithmic Languages) for the MIT static
`data-flow project [37], [‘t], id llrvine Datailow} for the AMT
`tagged token data-flow architecture [4], LUCID [8], [30], etc.
`SISAL (Streams and Iterations in Single Assignment Lan—
`guage) has been designer! by McGraw and Skedzieiewski
`[38] and is intended as the definition of a "universal” lan~
`guage for the programming of future multiprocessors.
`Data—flow languages have also been defined for the spe—
`cific purpose of programming signal processing applica—
`tions. These include the SlGNAL language designed by
`Le Cuernic et all. [36]. The intent of the language is to pro—
`vide a tormal specrttcatton of signal processing problems
`
`
`
`Fig. 6. A linear systolic array.
`
`opposed to asynchronous languages such as CSP and
`Occam[101.5DFt5ynchronous Data Flow) is another formal
`description of signal processing algorithms based on data-
`driven principles of execution proposed by Lee and Mes-
`
`serschmitt [35].
`
`DATAvDR

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket