PTO/SB/05 (04-04)
Approved for use through 07/31/2006

UTILITY PATENT APPLICATION TRANSMITTAL
(Only for new nonprovisional applications under 37 CFR 1.53(b))

Attorney Docket No.: SRC015 CON
First Inventor: Jon M. Huppenthal et al.
Title: MULTI-ADAPTIVE PROCESSING SYSTEMS AND TECHNIQUES FOR ENHANCING PARALLELISM AND PERFORMANCE OF COMPUTATIONAL FUNCTIONS
Address to: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450

APPLICATION ELEMENTS

1. [ ] Fee Transmittal Form (submit an original and a duplicate for fee processing)
2. [ ] Applicant claims small entity status. See 37 CFR 1.27
3. [X] Specification [38 total pages] (preferred arrangement set forth below)
    - Descriptive title of the Invention
    - Cross References to Related Applications
    - Statement Regarding Fed sponsored R&D
    - Reference to sequence listing, a table, or a computer program listing appendix
    - Background of the Invention
    - Brief Summary of the Invention
    - Brief Description of the Drawings
    - Detailed Description
    - Claim(s)
    - Abstract of the Disclosure
4. [X] Drawing(s) [total sheets: 20]
5. [X] Oath or Declaration [total pages: 3]
    a. [ ] Newly executed (original or copy)
    b. [X] Copy from prior appl. (37 C.F.R. § 1.63(d)) (for continuation/divisional with Box 18 completed)
        i. [ ] DELETION OF INVENTOR(S): Signed statement attached deleting inventor(s) named in prior application, see 37 CFR 1.63(d)(2) and 1.33(b)
6. [ ] Application Data Sheet (See 37 CFR 1.76)
7. [ ] CD-ROM or CD-R in duplicate, large table or Computer Program (Appendix)
8. Nucleotide and/or Amino Acid Sequence Submission (if applicable, all necessary)
    a. [ ] Computer Readable Form
    b. [ ] Specification Sequence Listing on:
        i. [ ] CD-ROM or CD-R (2 copies); or
        ii. [ ] paper
    c. [ ] Statements verifying identity of above copies
9. [ ] Assignment Papers (cover sheet/document(s))
10. [ ] 37 CFR 3.73(b) Statement (when there is an assignee); [ ] Power of Attorney
11. [ ] English Translation Document (if applicable)
12. [ ] IDS & Form PTO/SB/08A; [ ] Copies of IDS citations
13. [ ] Preliminary Amendment
14. [ ] Return Receipt Postcard (MPEP 503)
15. [ ] Certified Copy of Priority Document(s)
16. [ ] Nonpublication Request Under 35 USC 122(b)(2)(B)(i). Applicant must attach form PTO/SB/35
17. [ ] Other: Certificate of Mailing by Express Mail

18. If a CONTINUING APPLICATION, check appropriate box, and supply the requisite information below and in a preliminary amendment, or in an Application Data Sheet under 37 CFR 1.76:
    [X] Continuation   [ ] Divisional   [ ] Continuation-in-part (CIP)   of prior application No. 10/285,318
    Prior application information: Examiner: Coleman, Eric; Group/Art Unit: 2183
    FOR CONTINUATION OR DIVISIONAL APPS only: The entire disclosure of the prior application, from which an oath or declaration is supplied under Box 5b, is considered a part of the disclosure of the accompanying continuation or divisional application and is hereby incorporated by reference. The incorporation can only be relied upon when a portion has been inadvertently omitted from the submitted application parts.

19. CORRESPONDENCE ADDRESS
    [X] Customer Number 25235   or   [ ] Correspondence address below

Name (Print/Type): Michael Martensen    Registration No.: 4_901

\\\CS - 080404/000018 - 89550 v1
Petitioner Microsoft Corporation - Ex. 1006, p. 1
Attorney Docket No. SRC015 CON
Client No. 80404 0018.001
EFS-Web

PATENT

MULTI-ADAPTIVE PROCESSING SYSTEMS AND TECHNIQUES FOR ENHANCING PARALLELISM AND PERFORMANCE OF COMPUTATIONAL FUNCTIONS

CROSS REFERENCE TO RELATED PATENT APPLICATIONS

The present application is a Continuation of U.S. Patent Application Serial No. 10/285,318 filed October 31, 2002, which is related to the subject matter of United States Patent Application Ser. No. 09/755,744 filed January 5, 2001 for: “Multiprocessor Computer Architecture Incorporating a Plurality of Memory Algorithm Processors in the Memory Subsystem” and is further related to the subject matter of United States Patent No. 6,434,687 for: “System and Method for Accelerating Web Site Access and Processing Utilizing a Computer System Incorporating Reconfigurable Processors Operating Under a Single Operating System Image”, all of which are assigned to SRC Computers, Inc., Colorado Springs, Colorado and the disclosures of which are herein specifically incorporated in their entirety by this reference.

COPYRIGHT NOTICE/PERMISSION

A portion of the disclosure of this patent document may contain material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the United States Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data described below, inclusive of the drawing figures where applicable: Copyright © 2000, SRC Computers, Inc.

BACKGROUND OF THE INVENTION

The present invention relates, in general, to the field of computing systems and techniques. More particularly, the present invention relates to multi-adaptive processing systems and techniques for enhancing parallelism and performance of computational functions.

Currently, most large software applications achieve high performance operation through the use of parallel processing. This technique allows multiple processors to work simultaneously on the same problem to achieve a solution in a fraction of the time required for a single processor to accomplish the same result. The processors in use may be performing many copies of the same operation, or may be performing totally different operations, but in either case all processors are working simultaneously.

The use of such parallel processing has led to the proliferation of both multi-processor boards and large scale clustered systems. However, as more and more performance is required, so is more parallelism, resulting in ever larger systems. Clusters exist today that have tens of thousands of processors and can occupy football fields of space. Systems of such a large physical size present many obvious downsides, including, among other factors, facility requirements, power, heat generation and reliability.

SUMMARY OF THE INVENTION

However, if a processor technology could be employed that offers orders of magnitude more parallelism per processor, these systems could be reduced in size by a comparable factor. Such a processor or processing element is possible through the use of a reconfigurable processor. Reconfigurable processors instantiate only the functional units needed to solve a particular application, and as a result, have available space to instantiate as many functional units as may be required to solve the problem up to the total capacity of the integrated circuit chips they employ.

At present, reconfigurable processors, such as multi-adaptive processor elements (MAP™, a trademark of SRC Computers, Inc.) can achieve two to three orders of magnitude more parallelism and performance than state-of-the-art microprocessors. Through the advantageous application of adaptive processing techniques as disclosed herein, this type of reconfigurable processing parallelism may be employed in a variety of applications resulting in significantly higher performance than that which can now be achieved while using significantly smaller and less expensive computer systems.

However, in addition to these benefits, there is an additional much less obvious one that can have even greater impact on certain applications and has only become available with the advent of multi-million gate reconfigurable chips. Performance gains are also realized by reconfigurable processors due to the much tighter coupling of the parallel functional units within each chip than can be accomplished in a microprocessor based computing system.

In a multi-processor, microprocessor-based system, each processor is allocated but a relatively small portion of the total problem called a cell. However, to solve the total problem, results of one processor are often required by many adjacent cells because their cells interact at the boundary, and upwards of six or more cells, all having to interact to compute results, would not be uncommon. Consequently, intermediate results must be passed around the system in order to complete the computation of the total problem. This, of necessity, involves numerous other chips and busses that run at much slower speeds than the microprocessor, thus resulting in system performance often many orders of magnitude lower than the raw computation time.

On the other hand, in the use of an adaptive processor-based system, since ten to one thousand times more computations can be performed within a single chip, any boundary data that is shared between these functional units need never leave a single integrated circuit chip. Therefore, data moving around the system, and its impact on reducing overall system performance, can also be reduced by two or three orders of magnitude. This will allow both significant improvements in performance in certain applications as well as enabling certain applications to be performed in a practical timeframe that could not previously be accomplished.

Particularly disclosed herein is a method for data processing in a reconfigurable computing system comprising a plurality of functional units. The method comprises: defining a calculation for the reconfigurable computing system; instantiating at least two of the functional units to perform the calculation; utilizing a first of the functional units to operate upon a subsequent data dimension of the calculation and substantially concurrently utilizing a second of the functional units to operate upon a previous data dimension of the calculation.
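
The overlap of functional units recited above can be modeled in software. The sketch below is a minimal Python model, not taken from the application: its two-stage structure and function names are illustrative stand-ins, with unit A consuming the subsequent data dimension while unit B finishes the previous one, so both stay busy.

```python
def pipeline_two_units(dimensions, unit_a, unit_b):
    """Model two instantiated functional units working concurrently:
    on each step, unit_a operates on the subsequent data dimension
    while unit_b operates on the previous dimension's intermediate
    result.  Python runs the steps sequentially; the interleaving
    mirrors the hardware overlap."""
    results = []
    in_flight = None  # unit_a output awaiting unit_b
    for dim in dimensions:
        a_out = unit_a(dim)                    # unit A: subsequent dimension
        if in_flight is not None:
            results.append(unit_b(in_flight))  # unit B: previous dimension
        in_flight = a_out
    if in_flight is not None:
        results.append(unit_b(in_flight))      # drain the final dimension
    return results
```

For example, with `unit_a = sum` and `unit_b` doubling its input, the two dimensions `[1, 2]` and `[3, 4]` yield `[6, 14]`.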

Further disclosed herein is a method for data processing in a reconfigurable computing system comprising a plurality of functional units. The method comprises: defining a first systolic wall comprising rows of cells forming a subset of the plurality of functional units; computing a value at each of the cells in at least a first row of the first systolic wall; communicating the values between cells in the first row of the cells to produce updated values; communicating the updated values to a second row of the first systolic wall; and substantially concurrently providing the updated values to a first row of a second systolic wall of rows of cells in the subset of the plurality of functional units.
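
The row-to-row flow of this method can be sketched as follows. This is a hypothetical Python model: the averaging update and the wall driver are illustrative stand-ins for the real per-cell computation.

```python
def update_row(row):
    """One row of cells: each cell combines its value with its
    neighbors' values (averaging stands in for the real function;
    edge cells reuse their own value)."""
    n = len(row)
    out = []
    for i in range(n):
        left = row[i - 1] if i > 0 else row[i]
        right = row[i + 1] if i < n - 1 else row[i]
        out.append((left + row[i] + right) / 3.0)
    return out

def run_wall(first_row, n_rows):
    """Drive one systolic wall: each row's updated values feed the
    next row of this wall and, in hardware, would concurrently seed
    the first row of a second wall."""
    rows = [list(first_row)]
    for _ in range(n_rows - 1):
        rows.append(update_row(rows[-1]))
    return rows
```

Starting from the row `[0, 3, 6]`, the second row computes `[1.0, 3.0, 5.0]`.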

Also disclosed herein is a method for data processing in a reconfigurable processing system which includes setting up a systolic processing form employing a speculative processing strategy.

BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned and other features and objects of the present invention and the manner of attaining them will become more apparent and the invention itself will be best understood by reference to the following description of a preferred embodiment taken in conjunction with the accompanying drawings, wherein:

Fig. 1 is a simplified functional block diagram of a typical clustered inter-processor communications path in a conventional multi-processor computing system;

Fig. 2 is a functional block diagram of an adaptive processor communications path illustrating the many functional units (“FU”) interconnected by reconfigurable routing resources within the adaptive processor chip;

Fig. 3A is a graph of the actual performance improvement versus the number of processors utilized and illustrating the deviation from perfect scalability of a particular application utilizing a conventional multi-processor computing system such as that illustrated in Fig. 1;

Fig. 3B is a corresponding graph of the actual performance improvement versus the number of processors utilized and illustrating the performance improvement over a conventional multi-processor computing system utilizing an adaptive processor-based computing system such as that illustrated in Fig. 2;

Fig. 4A is a simplified logic flowchart illustrating a conventional sequential processing operation in which nested Loops A and B are alternately active on different phases of the process;

Fig. 4B is a comparative, simplified logic flowchart illustrating multi-dimensional processing in accordance with the technique of the present invention wherein multiple dimensions of data are processed by both Loops A and B such that the computing system logic is operative on every clock cycle;

Fig. 5A is illustrative of a general process for performing a representative multi-dimensional pipeline operation in the form of a seismic migration imaging function utilizing the parallelism available in the utilization of the adaptive processing techniques of the present invention;

Fig. 5B is a follow-on illustration of the computation phases employed in implementing the exemplary seismic migration imaging function of the preceding figure;

Fig. 6A is a simplified logic flowchart for a particular seismic migration imaging application illustrative of the parallelism provided in the use of an adaptive processor-based computing system;

Fig. 6B illustrates the computational process which may be employed by a microprocessor in the execution of the seismic imaging application of the preceding figure;

Fig. 6C illustrates the first step in the computational process which may be employed by an adaptive processor in the execution of the seismic imaging application of Fig. 6A in which a first shot (S1) is started;

Fig. 6D illustrates the second step in the same computational process for the execution of the seismic imaging application of Fig. 6A in which a second shot (S2) is started;

Fig. 6E illustrates the third step in the same computational process for the execution of the seismic imaging application of Fig. 6A in which the operation on the first and second shots is continued through compute;

Fig. 6F illustrates the fourth step in the same computational process showing the subsequent operation on shots S1 and S2;

Fig. 6G illustrates the fifth step in the same computational process followed by the continued downward propagation of shots S1 and S2 over all of the depth slices;

Fig. 7A illustrates a process for performing a representative systolic wavefront operation in the form of a reservoir simulation function also utilizing the parallelism available in the utilization of the adaptive processing techniques of the present invention;

Fig. 7B illustrates the general computation of fluid flow properties in the reservoir simulation of the preceding figure which are communicated to neighboring cells;

Fig. 7C illustrates the creation of a systolic wall of computation at Time Set 1 which has been started for a vertical wall of cells and in which communication of values between adjacent rows in the vertical wall can occur without storing values to memory;

Fig. 7D is a follow-on illustration of the creation of a systolic wall of computation at Time Set 1 and Time Set 2 showing how a second vertical wall of cells is started after the computation for cells in the corresponding row of the first wall has been completed;

Fig. 8A illustrates yet another process for performing a representative systolic wavefront operation in the form of the systolic processing of bioinformatics also utilizing the parallelism available in the utilization of the adaptive processing techniques of the present invention;

Fig. 8B illustrates a systolic wavefront processing operation which further incorporates a speculative processing strategy based upon an evaluation of the rate of change of XB;

Fig. 8C is a further illustration of the systolic wavefront processing operation of the preceding figure incorporating speculative processing;

Fig. 9A illustrates still another process for performing a representative systolic wavefront operation in the form of structure codes calculating polynomials at grid intersections, again utilizing the parallelism available in the utilization of the adaptive processing techniques of the present invention;

Fig. 9B illustrates the computation start for a vertical wall of grid points at Time Set 1 for a polynomial evaluation performed on grid intersections wherein calculations between rows are done in a stochastic fashion using values from a previous row; and

Fig. 9C is a further illustration of the polynomial evaluation performed on grid intersections of the preceding figure wherein a second wall is started after the cells in the corresponding row of the first wall have been completed.

DESCRIPTION OF A REPRESENTATIVE EMBODIMENT

This application incorporates by reference the entire disclosure of Caliga, D. et al., “Delivering Acceleration: The Potential for Increased HPC Application Performance Using Reconfigurable Logic”, SC2001, November 2001, ACM 1-58113-293-X/01/0011.

With reference now to Fig. 1, a simplified functional block diagram of a typical clustered inter-processor communications path in a conventional multi-processor computing system 100 is shown. The computer system comprises a number of memory and input/output (“I/O”) controller integrated circuits (“ICs”) 1020 through 102N (e.g. “North Bridge” 102), such as the P4X333/P4X400 devices available from VIA Technologies, Inc.; the M1647 device available from Acer Labs, Inc. and the 82443BX device available from Intel Corporation. The North Bridge IC 102 is coupled by means of a Front Side Bus (“FSB”) to one or more microprocessors 10400 through 10403 and 104N0 through 104N3 such as one of the Pentium® series of processors also available from Intel Corporation.

The North Bridge ICs 1020 through 102N are coupled to respective blocks of memory 1060 through 106N as well as to a corresponding I/O bridge element 1080 through 108N. A network interface card (“NIC”) 1100 through 110N couples the I/O bus of the respective I/O bridge 1080 through 108N to a cluster bus coupled to a common clustering hub (or Ethernet Switch) 112.

Since typically a maximum of four microprocessors 104, each with two or four functional units, can reside on a single Front Side Bus, any communication to more than four must pass over the Front Side Bus, inter-bridge bus, input/output (“I/O”) bus, cluster interconnect (e.g. an Ethernet clustering hub 112) and then back again to the receiving processor 104. The I/O bus is typically an order of magnitude lower in bandwidth than the Front Side Bus, which means that any processing involving more than the four processors 104 will be significantly throttled by the loose coupling caused by the interconnect. All of this is eliminated with a reconfigurable processor having hundreds or thousands of functional units per processor.
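
The throttling described above can be illustrated with a toy transfer-time model. All rates and message sizes below are hypothetical, chosen only to show how the slowest bus in a long path dominates the exchange:

```python
def transfer_time(n_bytes, path_rates):
    """Total time for one message to traverse a chain of busses,
    each with its own rate in bytes per second."""
    return sum(n_bytes / rate for rate in path_rates)

# One hop on the Front Side Bus between co-located processors.
fsb_only = transfer_time(1e6, [3.2e9])

# The clustered path: FSB, inter-bridge bus, I/O bus, cluster link,
# then back up through the receiving node's busses.
clustered = transfer_time(
    1e6, [3.2e9, 1.0e9, 0.25e9, 0.125e9, 0.25e9, 1.0e9, 3.2e9])
```

With these illustrative numbers the clustered exchange is roughly sixty times slower than the intra-FSB one, which is the loose-coupling penalty the passage describes.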

With reference additionally now to Fig. 2, a functional block diagram of an adaptive processor 200 communications path for implementing the technique of the present invention is shown. The adaptive processor 200 includes an adaptive processor chip 202 incorporating a large number of functional units (“FU”) 204 interconnected by reconfigurable routing resources. The adaptive processor chip 202 is coupled to a memory element 206 as well as an interconnect 208 and a number of additional adaptive processor chips 210.

As shown, each adaptive processor chip 202 can contain thousands of functional units 204 dedicated to the particular problem at hand. Interconnect between these functional units is created by reconfigurable routing resources inside each chip 202. As a result, the functional units 204 can share or exchange data at much higher data rates and lower latencies than a standard microprocessor 104 (Fig. 1). In addition, the adaptive processor chips 202 can connect directly to the inter-processor interconnect 208 and do not require the data to be passed through multiple chips in a chipset in order to communicate. This is because the adaptive processor can implement whatever kind of interface is needed to accomplish this connection.

With reference additionally now to Fig. 3A, a graph of the actual performance improvement versus the number of processors utilized in a conventional multi-processor computing system 100 (Fig. 1) is shown. In this figure, the deviation from perfect scalability of a particular application is illustrated for such a system.

With reference additionally now to Fig. 3B, a corresponding graph of the actual performance improvement versus the number of processors utilized in an adaptive processor-based computing system 200 (Fig. 2) is shown. In this figure, the performance improvement provided with an adaptive processor-based computing system 200 over that of a conventional multi-processor computing system 100 is illustrated.

With reference additionally now to Fig. 4A, a simplified logic flowchart is provided illustrating a conventional sequential processing operation 400 in which nested Loops A (first loop 402) and B (second loop 404) are alternately active on different phases of the process. As shown, the standard implementation of applications that have a set of nested loops 402, 404 is to complete the processing of the first loop 402 before proceeding to the second loop 404. The problem inherent in this approach, particularly when utilized in conjunction with field programmable gate arrays (“FPGAs”), is that all of the logic that has been instantiated is not being completely utilized.

With reference additionally now to Fig. 4B, a comparative, simplified logic flowchart is shown illustrating a multi-dimensional process 410 in accordance with the technique of the present invention. The multi-dimensional process 410 is effectuated such that multiple dimensions of data are processed by both Loops A (first loop 412) and B (second loop 414) such that the computing system logic is operative on every clock cycle.

In contrast to the sequential processing operation 400 (Fig. 4A), the solution to the problem of most effectively utilizing available resources is to have an application evaluate a problem in a data flow sense. That is, it will “pass” a subsequent dimension of a given problem through the first loop 412 of logic concurrently with the previous dimension of data being processed through the second loop 414. In practice, a “dimension” of data can be: multiple vectors of a problem, multiple planes of a problem, multiple time steps in a problem and so forth.
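
A toy cycle-count model makes the payoff of keeping both loops busy concrete. This is hypothetical Python with illustrative loop latencies, not figures from the application:

```python
def cycles_sequential(n_dims, loop_a, loop_b):
    """Loops A and B run back-to-back for every data dimension,
    so one loop's logic always sits idle."""
    return n_dims * (loop_a + loop_b)

def cycles_pipelined(n_dims, loop_a, loop_b):
    """Loop A processes dimension k+1 while loop B processes
    dimension k: after the first dimension fills the pipeline,
    the slower loop sets the rate."""
    return loop_a + loop_b + (n_dims - 1) * max(loop_a, loop_b)
```

For four dimensions and ten-cycle loops, the sequential form needs 80 cycles while the pipelined form needs 50, and the gap widens as more dimensions are streamed through.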

With reference additionally now to Fig. 5A, a general process for performing a representative multi-dimensional pipeline operation is shown in the form of a seismic migration imaging function 500. The process 500 can be adapted to utilize the parallelism available in the utilization of the adaptive processing techniques of the present invention in the form of a multi-adaptive processor (MAP™, a trademark of SRC Computers, Inc., assignee of the present invention) STEPBd routine 502. The MAP STEPBd routine 502 is operative to utilize velocity data 504, source data 506 and receiver data 508 to produce a resultant image 510 as will be more fully described hereinafter.

With reference additionally now to Fig. 5B, the MAP STEPBd routine 502 of the preceding figure is shown in the various computational phases of: MAPTRI_x 520, MAPTRI_y 522, MAPTRI_d+ 524 and MAPTRI_d- 526.

With reference additionally now to Fig. 6A, a simplified logic flowchart for a particular seismic migration imaging application 600 is shown. The seismic migration imaging application 600 is illustrative of the parallelism provided in the use of an adaptive processor-based computing system 200 such as that shown in Fig. 2. The representative application 600 demonstrates a nested loop parallelism in the tri-diagonal solver and the same logic can be implemented for the multiple tri-diagonal solvers in the x, y, d+ and d- directions. The computational phases of: MAPTRI_x 520, MAPTRI_y 522, MAPTRI_d+ 524 and MAPTRI_d- 526 are again illustrated.

With reference additionally now to Fig. 6B, a computational process 610 is shown which may be employed by a microprocessor (“mP”) in the execution of the seismic imaging application 600 of the preceding figure. The process 610 includes the step 612 of reading the source field [S(Z0)] and receiver field [R(Z0)] as well as the velocity field [V(Z0)] at step 614. At step 616 values are computed for S(Zm), R(Zm), which step is followed by the phases MAPTRI_x 520 and MAPTRI_y 522. At step 618, the image at Zm is computed. This is followed by the phases MAPTRI_d+ 524 and MAPTRI_d- 526 to produce the resultant image Z at step 620. The process 610 loops over the depth slices as indicated by reference number 622 and loops over the shots as indicated by reference number 624.
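
The loop structure of process 610 can be sketched as follows. The phase callables, the field types and the multiplicative imaging step are illustrative assumptions standing in for the actual MAPTRI routines:

```python
def migrate(shots, n_depth_slices, phases):
    """Microprocessor-style migration loop (after Fig. 6B): for each
    shot, propagate the source/receiver fields down every depth
    slice and accumulate the resultant image Z."""
    image = 0.0
    for s, r, v in shots:                    # loop 624: over shots
        for _ in range(n_depth_slices):      # loop 622: over depth slices
            s, r = phases["tri_x"](s, r, v)  # MAPTRI_x phase
            s, r = phases["tri_y"](s, r, v)  # MAPTRI_y phase
            partial = s * r                  # image at Zm (toy correlation)
            partial = phases["tri_d+"](partial)
            partial = phases["tri_d-"](partial)
            image += partial                 # accumulate into image Z
    return image
```

With identity phases, one shot (s = 1.0, r = 2.0) and three depth slices, the accumulated image is 6.0.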

With reference additionally now to Fig. 6C, the first step in a computational process 650 in accordance with the technique of the present invention is shown in which a first shot (S1) is started. The process 650 may be employed by an adaptive processor (e.g. a MAP™ adaptive processor) as disclosed herein in the execution of the seismic imaging application 600 of Fig. 6A. As indicated by the shaded block, the phase MAPTRI_x 520 is active.

With reference additionally now to Fig. 6D, the second step in the computational process 650 is shown at a point at which a second shot (S2) is started. Again, as indicated by the shaded blocks, the phase MAPTRI_x 520 is active for S2, the phase MAPTRI_y 522 is active for S1 and the image at Zm has been produced at step 618. As shown, adaptive processors in accordance with the disclosure of the present invention support computation pipelining in multiple dimensions and the parallelism in Z and shots is shown at step 612.

With reference additionally now to Fig. 6E, the third step in the computational process 650 is shown in which the operation on the first and second shots is continued through compute. As indicated by the shaded blocks, the phase MAPTRI_d+ 524 is active for S1, the phase MAPTRI_y 522 is active for S2 and the image at Zm has been produced at step 618.

With reference additionally now to Fig. 6F, the fourth step in the computational process 650 is shown illustrating the subsequent operation on shots S1 and S2. The phase MAPTRI_d+ 524 is active for S2, the phase MAPTRI_d- 526 is active for S1 and image Z has been produced at step 620.

With reference additionally now to Fig. 6G, the fifth step in the computational process 650 is shown as followed by the continued downward propagation of shots S1 and S2 over all of the depth slices. The phase MAPTRI_x 520 is active for S1, the phase MAPTRI_d- 526 is active for S2 and image Z has been produced at step 620.

With reference additionally now to Fig. 7A, a process 700 for performing a representative systolic wavefront operation in the form of a reservoir simulation function is shown which utilizes the parallelism available in the adaptive processing techniques of the present invention. The process 700 includes a “k” loop 702, “j” loop 704 and “i” loop 706 as shown.

With reference additionally now to Fig. 7B, the general computation of fluid flow properties in the reservoir simulation process 700 of the preceding figure is illustrated as values are communicated between a group of neighboring cells 710. The group of neighboring cells 710 comprises, in the simplified illustration shown, first, second and third walls of cells 712, 714 and 716 respectively. Each of the walls of cells includes a corresponding number of first, second, third and fourth rows 718, 720, 722 and 724 respectively.

As shown, the computation of fluid flow properties is communicated to neighboring cells 710 and, importantly, this computation can be scheduled to eliminate the need for data storage. In accordance with the technique of the present invention, a set of cells can reside in an adaptive processor and the pipeline of computation can extend across multiple adaptive processors. Communication overhead between multiple adaptive processors may be advantageously minimized through the use of MAP™ adaptive processor chain ports as disclosed in U.S. Patent No. 6,339,819 issued on January 15, 2002 for: “Multiprocessor With Each Processor Element Accessing Operands in Loaded Input Buffer and Forwarding Results to FIFO Output Buffer”, assigned to SRC Computers, Inc., assignee of the present invention, the disclosure of which is herein specifically incorporated by this reference.

With reference additionally now to Fig. 7C, the creation of a systolic wall 712 of computation at Time Set 1 is shown. The systolic wall 712 has been started for a vertical wall of cells and communication of values between adjacent rows 718 through 724 in the vertical wall can occur without storing values to memory.
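
The staggering of walls across Time Sets 1 and 2 follows a simple wavefront schedule. The sketch below is hypothetical Python that records which (wall, row) pairs compute at each time step:

```python
def wavefront_schedule(n_walls, n_rows):
    """Row r of wall w computes at time w + r: each wall starts one
    step after its predecessor, so the active cells at any instant
    form a diagonal wavefront, and a finished row hands its values
    straight to the next row without a trip to memory."""
    steps = {}
    for w in range(n_walls):
        for r in range(n_rows):
            steps.setdefault(w + r, []).append((w, r))
    return steps
```

Two walls of three rows give `{0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(0, 2), (1, 1)], 3: [(1, 2)]}`: at time 1 the second row of the first wall and the first row of the second wall run concurrently, matching the handoff shown in Fig. 7D.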

With reference additionally now to Fig. 7D, a follow-on illustration of the creation of a systolic wall 712 of computation at Time Set 1 and a second systolic wall 714 at Time Set 2 is shown. In operation, a second vertical wall of cells is started after the computation for cells in the corresponding row of the first wall has been completed. Thus, for example, at time t0, the first row 718 of systolic wall 712 is completed and the results passed to the first row 718 of the second systolic wall 714. At time t1, the second row 720 of the first systolic wall 712 and the first row 718 of the second systolic wall 714 are computed. Thereafter, at time t2, the third row 722 of the first systolic wall 712 and the second row 720 of the second systolic wall