`
`PATENT APPLICATION
`TRANSMITTAL
`{Only {or new nenpmvisional applications under
`3'." CFR 1 .53ib11
`
`First Inventor
`
`Title
`
`Jon M. Huppenthal et at.
`
`"'I’iitlJLTl-ADAPTNE PROCESSING
`srsrems AND TECHNIQUES FOR
`ENHANClNG PARALLELiSM AND
`
`PERFORMANCE OF COMPUTATIONafiL.
`FUNCTIONS
`
`_
`
`llllllllllllllllllllllllllll2.},
`
`APPLICATION ELEMENTS
`
`Fee Transmittal Form
`[submit an original and a duplicate for tee processing}
`Applicant claims small entity status.
`See 3? CFR 1.27
`total pages _ ‘34__
`Specification [
`[preferred Arrangemenl set Ionh below]
`— Descriptive title ot the Invention
`— Cross References to Related Applications
`- Statement Regarding Fed sponsored Raid]
`- Reference to sequence listing. a table, or a
`computer program listing appendix
`. Background of the invention
`- Brief Summary of the invention
`- Brief Description of the Drawings
`— Detailed Description
`- Claimts)
`- Abstract of the Disciosure
`
`4.
`5.
`
`IE Drawing(s)[ total Sheets 20 ]
`Oath or Declaration [
`total pages _,3__
`lZl UNEXECUTED [original or copy)
`a.
`13. 1:] COpy from prior appl. (3? C.F.R. §1.63{d)}
`{for continuationfdlvisionnl with Box 13 completed)
`i. DDELETION OF INVENTOELSJ
`Signed statement attached deleting
`inventorts] named in prior application,
`see a? c.r-.R. §§ t 53mg; andt.33(bj.
`
`Assistant Commissioner for Patents
`Box Patent Application
`Washington. DC 20231
`
`G. I: Application Data Sheet. {See 3? CFR 1,?6]
`
`i'. E] CID—ROM or CD~R in duplicate, large table
`or Computer Program (Appendix)
`8. Nucleotide andior Amino Acid Sequence Submission
`(if applicable, all necessary)
`Computer Readable Forrn
`
`Spcciiication Sequence Listing on:
`i.
`[:l (ID—ROM or CD-R (2 copies): or
`ii. [:I paper
`Statements verifying identity of above copies
`ACCOMPANYING APPLICATION PARTS
`9. E] Assignment Papers (coversheefldocumenttsll
`10_ [:1
`3? CFR. 3.73m) Statement
`E]
`iner of
`{when there is an assignee)
`Attorney
`English Translation Docu merit
`
`iDS & Form PTOJSBEOSA
`
`CI Copies 01 IDS
`Citations
`
`11D
`12.13
`
` 10/31/02
`
`Preliminary Amendment
`13. El
`14.
`[2] Return Receipt Postcard (MPEP 503)
`15. 1:] Certified Copy ol Priority Documentls)
`16.
`[:1 Nonpublication Request Under 35 USC
`122(h}(2)(B)(i).Applicant must attach form
`PTOESBI'BS
`Other: Certificate of Mailin-
`
`1?.
`
`If a CONTINUING APPLICATION. check appropriate box. and supply the requisite information below and in a
`18.
`preliminary amendment, or in an Application Data Sheet under 37 CFR 1.76:
`
`{:1 Continuation El Divisional El Continuation-in-parHClP)
`Prior application information:
`Examiner:
`
`of prior apptication No.: _ i
`Group/Arr Um};
`59LCON—lawn”““79" QB_DIVlSlDNfl..APP§ only. The enllre disclosure of the [MIDI applicalinn_ lrorn which an mm or ueuarairon Is supplied under Box 5h. ‘5 CDI’ISJdQKEt'J a
`part or Inn disclosure at the occonmanying :mt-nuatiun rir dwisional application and is hereby inc
`Gilt-“Hated by relercnce The incorporat-on gt)" only he relied upon when a
`- w nor: has been inadvertent! onwtted Form the suornilleu a treatise one.
`
`Customer Number or Bar Code Label 25235
`
`or [I Correspondence addeS below
`
`19. CORRESPONDENCE ADDRESS
`
`
`
`
`
`\'\\(_,’5 - SD-It'l-INIDIS 56592 vl
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 1
`
`Petitioner Microsoft Corporation - EX. 1002, p. 1
`
`
`
` .
`
`PATENT
`EXPRESS MAIL NO. EV035493558US
`Attorney Docket No. SR0015
`Clienthatter No. 804040018
`IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
`
`In re Application of:
`
`Jon M. Huppcnthal and David E. Caliga
`
`Serial No. NEW
`
`Filed: Herewith
`
`For: MULTI—ADAPTIVE PROCESSING
`
`SYSTEMS AND TECHNIQUES FOR
`ENHANCING PARALLELISM AND
`PERFORMANCE OF
`COMPUTATIONAL FUNCTIONS
`
`
`
`Box PATENT APPLICATION
`Assistant Commissioner for Patents
`
`Washington, DC. 20231
`
`Sir:
`
`msesww
`
`The undersigned hereby certifies that the following documents:
`Utility Patent Application Transmittal;
`Utility Patent Application;
`Unexecuted Declaration for Utility Patent Application;
`20 sheets of drawings;
`Return postcard; and
`Certificate of Mailing By Express Mail
`.
`relating to the above application, were deposited as "Express Mail", Mailing Label
`No. EVO35493558US, with the United States Postal Service, addressed to Box PATENT
`AgPL5CATION, The Assistant Commissioner for Patents, Washington, D.C., 20231,
`i
`gflhg 7032- .
`
`
`
`bl O$fiw ¢mL
`Date
`
`Date
`
` _
`434’ @zo/icw 2,001;
`
`F 2..
`
`. _‘_‘. 4'
`
`ubida, Reg. No. 29,664
`William J.
`HOGAN & HARTSON LL?
`One Tabor Center
`
`1200 17th Street, Suite 1500
`Denver. Colorado 80202
`(719) 448-5909 Tel
`(303) 899—7333 Fax
`
`\\\C5 - 8040410018 . 55552 v]
`
`
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 2
`
`Petitioner Microsoft Corporation - EX. 1002, p. 2
`
`
`
`a? 5'” -,;..=
`"
`.iwlzfii iii-I... 4...:.
`"
`
`
`Attorney Docket No. SRC015
`Client NO. 80404.0018
`Express Mail No. EVO3549355805
`
`PATENT
`
`MULTIwADAPTIVE PROCESSING SYSTEMS AND TECHNIQUES FOR
`ENHANCING PARALLELISM AND PERFORMANCE OF COMPUTATIONAL
`FUNCTIONS
`
`CROSS REFERENCE TO RELATED PATENT APPLICATIONS
`
`The present
`
`invention is related to the subject
`
`matter of United States Patent Application Ser. No.
`
`09/755,744 filed January 5, 2001 for: “Multiprocessor
`
`Computer Architecture Incorporating a Plurality of
`
`Memory Algorithm Processors in the Memory Subsystem”
`
`and is further related to the subject matter of United
`
`States Patent No. 6,454,687 for: “System and Method
`
`for Accelerating Web Site Access and Processing
`
`Utilizing a Computer System Incorporating
`
`Reconfigurable Processors Operating Under a Single
`
`Operating System Image", all of which are assigned to
`
`SRC Computers,
`
`Inc., Colorado Springs, Colorado and
`
`the disclosures of which are herein specifically
`
`incorporated in their entirety by this reference-
`
`COPYRIGHT NOTICE/PERMISSION
`
`A portion of the disclosure of this patent
`
`document may contain material which is subject to
`
`copyright protection.
`
`The copyright owner has no
`
`objection to the facsimile reproduction by anyone of
`
`the patent document or the patent disclosure as it
`
`appears in the United States Patent and Trademark
`
`Office patent file or records, but otherwise, reserves
`
`all copyright rights whatsoever.
`
`The following notice
`
`applies to the software and data and described below,
`
`inclusive of the drawing figures where applicable:
`
`Copyright © 2000, SRC Computers,
`
`Inc.
`
`10
`
`15
`
`20
`
`25
`
`\\\C5 - 8040610018 - Self-E v1.
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 3
`
`Petitioner Microsoft Corporation - EX. 1002, p. 3
`
`
`
`m aa
`
`BACKGROUND OF THE INVENTION
`
`The present invention relates,
`
`in general,
`
`to the
`
`field of computing systems and techniques. More
`
`particularly,
`
`the present invention relates to multi—
`
`adaptive processing systems and techniques for
`
`enhancing parallelism and performance of computational
`functions.
`
`Currently, most
`
`large software applications
`
`achieve high performance operation through the use of
`
`parallel processing. This technique allows multiple
`
`processors to work simultaneously on the same problem
`to achieve a solution in a fraction of the time
`
`required for a single processor to accomplish the same
`
`result.
`
`The processors in use may be performing many
`
`copies of the same operation, or may be performing
`
`totally different operations, but
`
`in either case all
`
`prooessors are working simultaneously.
`
`The use of such parallel processing has led to
`
`the proliferation of both multi—processor boards and
`
`large scale clustered systems. However, as more and
`
`more performance is required,
`
`so is more paralleliSm,
`
`resulting in ever larger systems. Clusters exist
`
`today that have tens of thousands of processors and
`
`can occupy football fields of space.
`
`Systems of such
`
`a large physical size present many obvious downsides,
`
`including,
`
`among other factors, facility requirements,
`
`power, heat generation and reliability.
`SUMMARY OF THE INVENTION
`
`However,
`
`if a processor technology could be
`
`employed that offers orders of magnitude more
`
`parallelism per processor,
`
`these systems could be
`
`reduced in size by a comparable factor-
`
`Such a
`
`processor or processing element is possible through
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\Cfi
`
`* BOiO‘fUtIIH - 56156 V1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 4
`
`Petitioner Microsoft Corporation - EX. 1002, p. 4
`
`
`
`:3;
`'Er‘v- '13-: £3.11; aft: 12? W7.
`=4?
`mam-ml no... min 40“ Wu -.u.--4...:~
`
`_
`
`a»
`
`'_ -
`
`
`the use of a reconfigurable processor. Reconfigurable
`
`processors instantiate only the functional units
`
`needed to solve a particular application, and as a
`
`result, have available space to instantiate as many
`
`functional units as may be required to solve the
`
`problem up to the total capacity of the integrated
`
`circuit chips they employ.
`
`At present,
`
`reconfigurable processors, such as
`
`multi—adaptive processor elements {MAPm, a trademark
`
`of SRC Computers,
`
`Inc.) can achieve two to three
`
`orders of magnitude more parallelism and performance
`
`than state—of—the—art microprocessors.
`
`Through the
`
`advantageous application of adaptive processing
`
`techniques as disclosed herein,
`
`this type of
`
`reconfigurable processing parallelism may be employed
`
`in a variety of applications resulting in
`
`significantly higher performance than that which can
`
`now be achieved while using significantly smaller and
`
`less expensive computer systems.
`
`However,
`
`in addition to these benefits,
`
`there is
`
`an additional much less obvious one that can have even
`
`greater impact on certain applications and has only
`
`become available with the advent of multi—million gate
`
`reconfigurable chips. Performance gains are also
`
`realized by reconfigurable processors due to the much
`
`tighter coupling of the parallel functional units
`
`within each chip than can be accomplished in a
`
`microprocessor based computing system.
`
`In a multi—processor, microprocessor—based
`
`system, each processor is allocated but a relatively
`
`small portion of the total problem called a cell.
`
`However,
`
`to solve the total problem, results of one
`
`processor are often required by many adjacent cells
`
`because their cells interact at the boundary and
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\C5 ‘- EOQDGIDOIB — 56166 \rl
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 5
`
`Petitioner Microsoft Corporation - EX. 1002, p. 5
`
`
`
`
`
`upwards of six or more cells, all having to interact
`
`to compute results, would not be uncommon.
`
`Consequently,
`
`intermediate results must be passed
`
`around the system in order to complete the computation
`
`of the total problem. This, of necessity,
`
`involves
`
`numerous other chips and busses that run at much
`
`slower speeds than the microprocessor thus resulting
`
`in system performance often many orders of magnitude
`
`lower than the raw computation time.
`
`On the other hand,
`
`in the use of an adaptive
`
`processor—based system, since ten to one thousand
`
`times more computations can be performed within a
`
`single chip, any boundary data that is shared between
`
`these functional units need never leave a single
`
`integrated circuit chip. Therefore, data moving
`
`around the system, and its impact on reducing overall
`
`system performance, can also be reduced by two or
`
`three orders of magnitude. This will allow both
`
`significant improvements in performance in certain
`
`applications as well as enabling certain applications
`
`to be performed in a practical timeframe that could
`
`not previously be accomplished.
`
`Particularly disclosed herein is a method for
`
`data processing in a reconfigurable computing system
`
`comprising a plurality of functional units.
`
`The
`
`method comprises: defining a calculation for the
`
`reconfigurable computing system;
`
`instantiating at
`
`least two of the functional units to perform the
`
`calculation; utilizing a first of the functional units
`
`to operate upon a subsequent data dimension of the
`
`calculation and substantially Concurrently utilizing a
`second of the functional units to operate upon a
`-
`previous data dimension of the calculation.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\CS - BOdO-fllDOlB - 56166 It!
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 6
`
`Petitioner Microsoft Corporation - EX. 1002, p. 6
`
`
`
`
`
`Further disclosed herein is a method for data
`
`processing in a reconfigurable computing system
`
`comprising a plurality of functional units. The method
`
`comprises: defining a first systolic wall comprising
`
`rows of cells forming a subset of the plurality of
`
`functional units; computing a value at each of the
`
`cells in at
`
`least a first row of the first systolic
`
`wall; communicating the values between cells in the
`
`first row of the cells to produce updated values;
`
`10
`
`communicating the updated values to a second row of
`
`the first systolic wall; and substantially
`
`concurrently providing the updated values to a first
`
`row of a second systolic wall of
`
`rows of cells in the
`
`subset of the plurality of functional units.
`Also disclosed herein is a method for data
`
`15
`
`processing in a reconfigurable processing system which
`
`includes setting up a systolic processing form
`
`employing a speculative processing strategy.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`2O
`
`25
`
`The aforementioned and other features and objects
`
`of the present invention and the manner of attaining
`
`them will become more apparent and the invention
`
`itself will be best understood by reference to the
`
`following de5cription of a preferred embodiment
`
`taken
`
`in conjunction with the accompanying drawings,
`wherein:
`
`Fig.
`
`l is a simplified functional block diagram
`
`of typical clustered intereprocessor communications
`
`path in a conventional multieprocessor computing
`
`30
`
`system;
`
`.Fig.
`
`2 is a functional block diagram of an
`
`adaptive processor communications path illustrating
`
`the many functional units {“FU”)
`
`interconnected by
`
`\\\(.’5 - 5040430015 - 56166 v'l
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 7
`
`Petitioner Microsoft Corporation - EX. 1002, p. 7
`
`
`
`:
`
`EFF: f"!-
`
`reconfigurable routing resources within the adaptive
`processor chip;
`
`Fig.
`
`3A is a graph of the actual performance
`
`improvement versus the number of processors utilized
`
`and illustrating the deviation from perfect
`
`scalability of a particular application utilizing a
`
`conventional multi—processor computing system such as
`
`that illustrated in Fig. 1;
`
`Fig.
`
`3B is a corresponding graph of the actual
`
`performance improvement versus the number of
`
`processors utilized and illustrating the performance
`
`improvement over a conventional multi—processor
`
`computing system utilizing an adaptive processor—based
`
`computing system such as that illustrated in Fig. 2:
`
`Fig.
`
`4A is a simplified logic flowchart
`
`illustrating a conventional sequential processing
`
`operation in which nested Loops A and B are
`
`alternately active on different phases of the process;
`
`Fig. 48 is a comparative, simplified logic
`
`flowchart illustrating multi—dimensional processing in
`
`accordance with the technique of the present
`
`invention
`
`wherein multiple dimensions of data are processed by
`
`both Loops A and B such that the computing system
`
`logic is operative on every clock cycle;
`
`Fig.
`
`5A is illustrative of a general process for
`
`performing a representative multi—dimensional pipeline
`
`operation in the form of a seismic migration imaging
`
`function utilizing the parallelism available in the
`
`utilization of the adaptive processing techniques of
`
`the present
`
`invention;
`
`Fig. SE is a follow—on illustration of the
`
`computation phases emplOyed in implementing the
`
`exemplary seismic migration imaging function of the
`
`10
`
`15
`
`20
`
`25
`
`30
`
`preceding figure;
`
`\\\E5 . BuflndanIE - 56166 v]
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 8
`
`Petitioner Microsoft Corporation - EX. 1002, p. 8
`
`
`
`
`
`Fig.
`6A is a simplified logic flowchart for a
`particular seismic migration imaging application
`
`illustrative of the parallelism provided in the use of
`
`an adaptive processor—based computing system;
`
`Fig 63 illustrates the computational process
`
`which may be employed by a microprocessor in the
`
`execution of the seismic imaging application of the
`
`preceding figure;
`
`10
`
`15
`
`20
`
`Fig.
`
`6C illustrates the first step in the
`
`computational process which may be employed by an
`
`adaptive processor in the execution of the seismic
`
`imaging application of Fig.
`
`6A in which a first shot
`
`(81)
`
`is started;
`
`Fig.
`
`6D illustrates the second step in the same
`
`Computational process for the execution of the seismic
`
`imaging application of Fig.
`
`6A in which a second shot
`
`(S2)
`
`is started;
`
`Fig.
`
`6E illustrates the third step in the same
`
`computational process for the execution of the seismic
`
`imaging application of Fig.
`
`6A in which the operation
`
`on the first and second shots is continued through
`
`compute;
`
`Fig.
`
`6F illustrates the fourth step in the same
`
`computational process showing the subsequent operation
`
`25
`
`on shots 81 and 52;
`
`Fig.
`
`6G illustrates the fifth step in the same
`
`computational process followed by the continued
`
`downward prepagation of shots Si and 32 over all of
`
`the depth slices;
`
`30
`
`Fig.
`
`7A illustrates a process for performing a
`
`representative systolic wavefront operation in the
`
`form of a reservoir simulation function also utilizing
`
`the parallelism available in the utilization of the
`
`\\\C5 - 30110420018 - 56166 \fl
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 9
`
`Petitioner Microsoft Corporation - EX. 1002, p. 9
`
`
`
`adaptive processing techniques of the present
`invention;
`
`'Fig. 73 illustrates the general computation of
`
`fluid flow properties in the reservoir simulation of
`
`the preceding figure which are communicated to
`
`neighboring cells:
`
`Fig.
`
`7C illustrates the creation of a systolic
`
`wall of computation at Time Set
`
`1 which has been
`
`started for a vertical wall of cells and in which
`
`communication of values between adjacent rows in the
`
`vertical wall can OCCur without storing values to
`
`memory:
`
`Fig.
`
`7D is a follow on illustration of the
`
`creation of a systolic wall of computation at Time Set
`
`1 and Time Set 2 showing how a second vertical wall of
`
`cells is started after the computation for cells in
`
`the corresponding row of the first wall has been
`
`completed:
`
`Fig.
`
`8A illustrates yet another process for
`
`performing a representative systolic wavefront
`
`operation in the form of the systolic processing of
`
`bioinformatics also utilizing the parallelism
`
`available in the utilization of the adaptive
`
`processing techniques of the present
`
`invention;
`
`Fig. 88 illustrates a systolic wavefront
`
`processing operation which further incorporates a
`
`speculative processing strategy based upon an
`
`evaluation of the rate of Change of X3;
`
`Fig.
`
`8C is a further illustration of the systolic
`
`wavefront processing operation of the preceding figure
`
`incorporating speculative processing:
`
`Fig.
`
`9A illustrates still another process for
`
`performing a representative systolic wavefront
`
`operation in the form of structure codes calculating
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\CS - aoqouoon - 56165 v1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 10
`
`Petitioner Microsoft Corporation - EX. 1002, p. 10
`
`
`
`
`
`-'ME. (“1'Pusan—aw" as“ ”in ,..1|...-.‘..§\w «ma-2mg. .9- =9.1_ ..
`
`
`
`
`
`
`
`
`
`:7-9.“w «3' 17343“:.4 .1 '44- MJ“ ‘W‘m
`
`
`
`polynomials at grid intersections, again utilizing the
`
`parallelism available in the utilization of the
`
`adaptive processing techniques of the present
`
`invention;
`
`Fig. QB illustrates the computation start for a
`
`vertical wall of grid points at Time Set 1 for a
`
`polynomial evaluation performed on grid intersections
`wherein calculations between rows are done in a
`
`stochastic fashion using values from a previous row;
`and
`
`10
`
`9C is a further illustration of the
`Fig.
`polynomial evaluation performed on grid intersections
`
`of the preceding figure wherein a second wall is
`
`started after the cells in the corresponding row of
`
`the first wall have been completed.
`
`DESCRIPTION OF A REPRESENTATIVE EMBODIMENT
`
`This application incorporates by reference the
`
`entire disclosure of Caliga, D. et a1- “Delivering
`Acceleration: “The Potential for Increased HPC
`
`Application Performance Using Reconfigurable Logic”,
`8C2001, November 2001Ir ACM 1—58113—293—X/01/0011.
`
`With reference now to Fig.
`
`l,
`
`a simplified
`
`functional block diagram of typical clustered inter—
`
`processor communications path in a conventional multi-
`
`processor computing system 100 is shown.
`
`The computer
`
`system comprises a number of memory and input/output
`
`(“I/O” controller integrated circuits {“ICs”) 1020
`
`15
`
`20
`
`25
`
`through 102”,
`
`(e.g. “North Bridge”) 102 such as the
`
`P4X333/P4X400 devices available from VIA Technologies,
`
`30
`
`Inc.;
`
`the M1647 device available from Acer Labs,
`
`Inc.
`
`and the 824430X device available from Intel
`
`Corporation.
`
`The North Bridge IC 102 is coupled by
`
`means of a Front Side Bus
`
`(“F8B”)
`
`to one or more
`
`i\\CS — nomamma — $6166 v1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 11
`
`Petitioner Microsoft Corporation - EX.1002, p. 11
`
`
`
`
`
`10
`
`microprocessors 10400 though 10403 and lOdNO through
`
`104“; such as one of the Pentium® series of processors
`
`also available frOm Intel Corporation.
`
`The North Bridge ICs 1020 through 102" are coupled
`
`to respective blocks of memory 1060 through 106“ as
`
`well as to a corresponding I/O bridge element 1080
`
`through 108“.
`
`A network interface card (“NIC”)
`
`llOQ
`
`through 210“ couples the I/O bus of the respective I/O
`
`bridge 1080 through lOBN to a cluster bus coupled to a
`
`common clustering hub (or Ethernet Switch) 112.
`
`Since typically a maximum of four microprocessors
`
`104, each with two or four functional units, can
`
`reside on a Single Front Side Bus, any communication
`
`to more than four must pass over the Front Side Bus,
`
`inter-bridge bus,
`
`input/output
`
`(“I/O”) bus, cluster
`
`interconnect {e.g. an Ethernet clustering hub 112} and
`
`then back again to the receiving processor 104.
`
`The
`
`1/0 bus is typically an order of magnitude lower in
`
`bandwidth than the Front Side Bus, which means that
`
`any processing involving more than the four processors
`
`104 will be significantly throttled by the loose
`
`coupling caused by the interconnect. All of this is
`
`eliminated with a reconfigurable processor having
`
`hundreds or thousands of functional units per
`
`10
`
`15
`
`20
`
`25
`
`processor-
`
`With reference additionally now to Fig. 2, a
`
`functional block diagram of an adaptive processor 200
`
`communications path for implementing the technique of
`
`the present
`
`invention is shown.
`
`The adaptive
`
`30
`
`processor 200 includes an adaptive processor chip 202
`
`incorporates a
`
`large number of functional units (“F0")
`
`204 interconnected by reconfigurable routing
`
`resources.
`
`The adaptive processor chip 202 is coupled
`
`to a memory element 206 as well as an interconnect 208
`
`\\\CS - 80404.!0018 - 56166 v1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 12
`
`Petitioner Microsoft Corporation - EX. 1002, p. 12
`
`
`
`11
`
`and a number of additional adaptive processor chips
`210.
`
`As shown, each adaptive processor chip 202 can
`contain thousands of functional units 204 dedicated to
`
`the particular problem at hand.
`
`Interconnect between
`
`these functional units is created by reconfigurable
`
`routing resources inside each chip 202- As a result,
`
`the functional units 204 can share or exchange data at
`
`much higher data rates and lower latencies than a
`
`standard microprocessor 104
`
`(Fig. 1).
`
`In addition,
`
`the adaptive processor chips 202 can connect directly
`
`to the inter—processor interconnect 208 and do not
`
`require the data to be passed through multiple chips
`
`in a chipset
`
`in order to communicate. This is because
`
`the adaptive processor can implement whatever kind of
`
`interface is needed to accomplish this connection.
`
`with reference additionally now to Fig. 3A,
`
`a
`
`graph of the actual performance improvement versus the
`
`number of processors utilized in a conventional multi—
`
`processor computing system 100 (Fig.
`
`l}
`
`is shown.
`
`In
`
`this figure,
`
`the deviation from perfect scalability of
`
`a particular application is illustrated for such a
`
`system.
`
`With reference additionally now to Fig.
`
`3B,
`
`a
`
`corresponding graph of
`
`the actual performance
`
`imprOVement versus the number of processors utilized
`
`in an adaptive processor—based computing system 200
`
`(Fig. 2)
`
`is shown.
`
`In this figure,
`
`the performance
`
`improvement provided with an adaptive processor-based
`
`computing system 200 over that of a conventional
`
`multi—processor computing system 100 is illustrated.
`
`With reference additionally now to Fig. 4A,
`
`a
`
`simplified logic flowchart is provided illustrating a
`
`conventional sequential processing operation 400 in
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\CS — 80404.!0018 - 56166 v1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 13
`
`Petitioner Microsoft Corporation - EX. 1002, p. 13
`
`
`
`
`
`12
`
`which nested Loops A (first loop 402) and B
`
`(second
`
`loop 404) are alternately active on different phases
`
`of the process.
`
`As shown,
`
`the standard implementation of
`
`applications that have a set of nested loops 402,404
`
`is to complete the processing of the first_loop 402
`
`before proceeding to the second loop 404.
`
`The problem
`
`inherent
`
`in this approach, particularly when utilized
`
`in conjunction with field programmable gate arrays
`
`(“FPGAs”)
`
`is that all of the logic that has been
`
`instantiated is not being completely utilized.
`
`With reference additionally now to Fig. 4B,
`
`a
`
`comparative, simplified logic flowchart is shown
`
`illustrating a multi—dimensional process 410 in
`
`accordance with the technique of the present
`
`invention.
`
`The multiedimensional process 410 is
`
`effectuated such that multiple dimensions of data are
`
`processed by both Loops A (first loop 412} and B
`
`(second loop 414}
`
`such that the computing system logic
`
`is operative on every clock cycle.
`
`In contrast to the sequential processing
`
`operation 400 (Fig.
`
`4A)
`
`the solution to the problem of
`
`most effectively utiliZing available resources is to
`
`have an application evaluate a problem in a data flow
`
`sense. That is, it will “pass” a subsequent dimension
`
`of a given problem through the first loop 412 of logic
`
`concurrently with the previous dimension of data being
`
`processed through the second loop 414.
`
`In practice,
`
`a
`
`“dimension” of data can be: multiple vectors of a
`
`problem, multiple planes of a problem, multiple time
`
`steps in a problem and so forth-
`
`with reference additionally now to Fig.
`
`5A,
`
`a
`
`general process for performing a representative multi—
`
`dimensional pipeline operation is shown in the form of
`
`10
`
`15
`
`2O
`
`25
`
`3O
`
`\\\CS - BOHDAKDDKH -- 56166 v1
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 14
`
`Petitioner Microsoft Corporation - EX. 1002, p. 14
`
`
`
`13
`
`a seismic migration imaging function 500.
`
`The process
`
`500 can be adapted to utilize the parallelism
`
`available in the utilization of the adaptive
`
`processing techniques of the present
`
`invention in the
`
`form of a multi—adaptive processor
`
`(MAPW,
`
`a trademark
`
`of SRC Computers,
`
`Inc.r assignee of the present
`
`invention) STEPBd routine 502.
`
`The MAP STEP3d routine
`
`502 is operation to utilize velocity data 504, source
`
`data 506 and receiver data 508 to produce a resultant
`
`image 510 as will be more fully described hereinafter.
`
`With reference additionally now to Fig. SB,
`
`the
`
`MAP STEPBd routine 502 of the preceding figure is
`
`shown in the various computational phases of: MAPTRI_x
`
`520, MAPTRI_y 522, MAPTRI_d+ 524 and MAPTRl#d— 526.
`
`With reference additionally now to Fig.
`
`6A,
`
`a
`
`simplified logic flowchart for a particular seismic
`
`migration imaging application 600 is shown.
`
`The
`
`seismic migration imaging application 600 is
`
`illustrative of the parallelism provided in the use of
`
`an adaptive processor—based computing system 200 such
`
`as that shown in Fig. 2.
`
`The representative
`
`application 600 demonstrates a nested loop parallelism
`
`in the tri—diagonal solver and the same logic can be
`
`implemented for the multiple tri-diagonal solvers in
`
`the x, y, d+ and d— directions.
`
`The computational
`
`phases of: MAPTRI_x 520, MAPTRI_y 522, MAPTRI#d+ 524
`
`and MAPTRI_d— 526 are again illustrated.
`With reference additionally now to Fig. SE,
`
`a
`
`computational process 610 is shown which may be
`
`employed by a microprocessor {“mP")
`
`in the execution
`
`of the seismic imaging application 600 of the
`
`The process 610 includes the step
`preceding figure.
`612 of reading the source field [SIZOJ] and receiver
`
`field [RtZOJ] as well as the velocity field [VtZo)] at
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\CS — 80404;?0018 - 56166 It!
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 15
`
`Petitioner Microsoft Corporation - EX. 1002, p. 15
`
`
`
`-.J-L 15441 L2;— \L'.Ji; mil .fhl n»!— 145.1!
`
`II nil-y ‘lln—l‘ ”LLB-“AL. lib-I“ M
`
`14
`
`step 614. At step 616 values are computed for
`
`S(ZM},R(ZM) which step is followed by the phases
`
`MAPTRI_X 520 and MAPTRI_y 522. At step 618,
`
`the image
`
`of Z1” is computed. This is followed by the phases
`
`MAPTRI_d+ 524 and MAPTRI_d— 526 to produce the
`
`resultant image Z at step 620.
`
`The process 610 loops
`
`over the depth slices as indicated by reference number
`
`622 and loops over the shots as indicated by reference
`number 624.
`
`With reference additionally now to Fig. 6C,
`
`the
`
`first step in a computational process 650 in
`
`accordance with the technique of the present
`
`invention
`
`is shown in which a first shot
`
`(Si)
`
`is started.
`
`The
`
`process 650 may be employed by an adaptive processor
`
`(e.g.
`
`a MA?“ adaptive processor} as disclosed herein
`
`in the execution of the seismic imaging application
`
`600 of Fig. 6A.
`
`As
`
`indicated by the shaded block,
`
`the
`
`phase MAPTRI_x 520 is active.
`
`With reference additionally now to Fig.
`
`6D,
`
`the
`
`second step in the computational process 650 is shown
`
`at a point at which a second shot
`
`(82)
`
`is started.
`
`Again, as indicated by the shaded blocks,
`
`the phase
`
`the phase MAPTRIHy 522
`MAPTRI_x 520 is active for 82,
`is active for 81 and image 31m has been produced at
`
`step 618. As shown, adaptive processors in accordance
`
`with the disclosure of the present
`
`invention support
`
`computation pipelining in multiple dimensions and the
`
`parallelism in Z and shots is shown at step 612.
`
`With reference additionally now to Fig. 6E,
`
`the
`
`third step in the computational process 650 is shown
`
`in which the operation on the first and second shots
`
`is continued through compute. As
`
`indicated by the
`
`shaded blocks,
`
`the phase MAPTRI_d+ 524 is active for
`
`10
`
`15
`
`20
`
`25
`
`30
`
`\\\CS - 3010410018 - 56166 v]
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 16
`
`Petitioner Microsoft Corporation - EX. 1002, p. 16
`
`
`
`15
`
`51,
`
`the phase MAPTRI_y 522 is active for S2 and image
`
`31m has been produced at step 618.
`
`With reference additionally now to Fig. 6?,
`
`the
`
`fourth step in the computational process 650 is shown
`
`illustrating the subsequent operation on shots 81 and
`
`52.
`
`The phase MAPTRI_d+ 524 is active for S2,
`
`the
`
`phase MAPTRI_d— 526 is active for 81 and image Z has
`
`been produced at step 620.
`
`With reference additionally now to Fig.
`
`6G,
`
`the
`
`fifth step in the computational process 650 is shown
`
`as followed by the continued downward propagation of
`
`shots 81 and 52 over all of the depth slices.
`
`The
`
`phase MAPTRI_x 520 is active for 81,
`
`the phase
`
`MAPTRI_d— 526 is active for 82 and image Z has been
`
`produced at step 620.
`
`with reference additionally now to Fig. 7A,
`
`a
`
`process 700 for performing a representative systolic
`
`wavefront operation in the form of a reservoir
`
`simulation function is shown which utilizes the
`
`parallelism available in the adaptive processing
`
`techniques of the present
`
`inVention.
`
`The process 700
`
`includes a “k” loop 702, “j" loop 704 and “1" loop 706
`as shown.
`
`With reference additionally now to Fig. 7B,
`
`the
`
`general computation of fluid flow properties in the
`
`reservoir simulation process 700 of the preceding
`
`figure are illustrated as values are communicated
`
`between a group of neighboring cells 710.
`
`The group
`
`of neighboring cells 710 comprises,
`
`in the simplified
`
`illustration shown, first,
`
`second and third walls of
`
`cells 712, 714 and 716 respectively.
`
`Each of the
`
`walls of cells includes a corresponding number of
`
`first, second,
`
`third and fourth rows 718, 720, 722 and
`
`10
`
`15
`
`2O
`
`25
`
`3O
`
`724 respectively.
`
`\\\<2:5 — 8040120018 - 56166 n].
`
`Petitioner Microsoft Corporation - Ex. 1002, p. 17
`
`Petitioner Microsoft Corporation - EX. 1002, p. 17
`
`
`
`16
`
`As shown,
`
`the computation of fluid flow
`
`properties are communicated to neighboring cells 710
`
`and,
`
`importantly,
`
`this computation can be scheduled to
`
`eliminate the need for data storage.
`
`In accordance
`
`with the technique of the present invention,
`
`a set of
`
`cells can reside in an adaptive processor and the
`
`pipeline of computation can extend across multiple
`adaptive processors. Communication overhead between
`
`multiple adaptive processors may be advantageously
`
`minimized through the use of MA Fm adaptive processor
`
`chain ports as disclosed in 0.8- Patent No. 6,339,819
`
`issued on January 15, 2002 for: “Multiprocessor With
`
`Each Processor Element Accessing O