Approved for use through 07/31/2006

Attorney Docket No. SRC015 CON
First Inventor: Jon M. Huppenthal et al.

UTILITY PATENT APPLICATION TRANSMITTAL
(Only for new nonprovisional applications under 37 CFR 1.53(b))

Title: MULTI-ADAPTIVE PROCESSING SYSTEMS AND TECHNIQUES FOR ENHANCING
PARALLELISM AND PERFORMANCE OF COMPUTATIONAL FUNCTIONS

Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

APPLICATION ELEMENTS
1.  [X] Fee Transmittal Form (submit an original and a duplicate for fee processing)
2.  [ ] Applicant claims small entity status. See 37 CFR 1.27
3.  [X] Specification [ total pages 38 ]
        (preferred arrangement set forth below)
        - Descriptive title of the invention
        - Cross references to related applications
        - Statement regarding federally sponsored R&D
        - Reference to sequence listing, a table, or a computer program listing appendix
        - Background of the invention
        - Brief summary of the invention
        - Brief description of the drawings
        - Detailed description
        - Claim(s)
        - Abstract of the disclosure
4.  [X] Drawing(s) [ total sheets 20 ]
5.  Oath or Declaration [ total pages 3 ]
    a. [ ] Newly executed (original or copy)
    b. [X] Copy from prior appl. (37 C.F.R. § 1.63(d))
           (for continuation/divisional with Box 18 completed)
       [ ] DELETION OF INVENTOR(S): Signed statement attached deleting
           inventor(s) named in prior application, see 37 C.F.R. §§ 1.63(d)
6.  [X] Application Data Sheet. (See 37 CFR 1.76)
7.  [ ] CD-ROM or CD-R in duplicate, large table or Computer Program (Appendix)
8.  Nucleotide and/or Amino Acid Sequence Submission (if applicable, all necessary)
    a. [ ] Computer Readable Form
    b. [ ] Specification Sequence Listing on:
       i.  [ ] CD-ROM or CD-R (2 copies); or
       ii. [ ] paper
    c. [ ] Statements verifying identity of above copies

ACCOMPANYING APPLICATION PARTS
9.  [ ] Assignment Papers (cover sheet/document(s))
10. [X] Power of Attorney; 37 CFR 3.73(b) Statement (when there is an assignee)
11. [ ] English Translation Document
12. [X] IDS & Form PTO/SB/08A  [ ] Copies of IDS Citations
13. [ ] Preliminary Amendment
14. [ ] Return Receipt Postcard (MPEP 503)
15. [ ] Certified Copy of Priority Document(s)
16. [ ] Nonpublication Request Under 35 USC 122(b)(2)(B)(i). Applicant must
        attach form PTO/SB/35
17. [X] Other: Certificate of Mailing by Express Mail

18. If a CONTINUING APPLICATION, check appropriate box, and supply the
    requisite information below and in a preliminary amendment, or in an
    Application Data Sheet under 37 CFR 1.76:
    [X] Continuation   [ ] Divisional   [ ] Continuation-in-part (CIP)
    of prior application No.: 10/285,318
    Prior application information: Group/Art Unit: 2183  Examiner: Coleman, Eric
    FOR CONTINUATION OR DIVISIONAL APPS only: The entire disclosure of the
    prior application, from which an oath or declaration is supplied under
    Box 5b, is considered a part of the disclosure of the accompanying
    continuation or divisional application and is hereby incorporated by
    reference. The incorporation can only be relied upon when a portion has
    been inadvertently omitted from the submitted application parts.

19. CORRESPONDENCE ADDRESS
    [X] Customer Number 25235   or   [ ] Correspondence address below
    Name (Print/Type): Michael G. Martensen

\\CS - 080404/000018 - 89550 v1

Petitioner Microsoft Corporation - Ex. 1006, p. 1
`
`
`
PATENT
Attorney Docket No. SRC015 CON
Client No. 80404.0018.001
EFS-Web

MULTI-ADAPTIVE PROCESSING SYSTEMS AND TECHNIQUES FOR
ENHANCING PARALLELISM AND PERFORMANCE OF COMPUTATIONAL
FUNCTIONS
`
CROSS REFERENCE TO RELATED PATENT APPLICATIONS

The present application is a Continuation of U.S. Patent Application
Serial No. 10/285,318 filed October 31, 2002, which is related to the
subject matter of United States Patent Application Ser. No. 09/755,744
filed January 5, 2001 for: "Multiprocessor Computer Architecture
Incorporating a Plurality of Memory Algorithm Processors in the Memory
Subsystem" and is further related to the subject matter of United States
Patent No. 6,434,687 for: "System and Method for Accelerating Web Site
Access and Processing Utilizing a Computer System Incorporating
Reconfigurable Processors Operating Under a Single Operating System
Image", all of which are assigned to SRC Computers, Inc., Colorado
Springs, Colorado and the disclosures of which are herein specifically
incorporated in their entirety by this reference.
`
COPYRIGHT NOTICE/PERMISSION

A portion of the disclosure of this patent document may contain material
which is subject to copyright protection. The copyright owner has no
objection to the facsimile reproduction by anyone of the patent document
or the patent disclosure as it appears in the United States Patent and
Trademark Office patent file or records, but otherwise reserves all
copyright rights whatsoever. The following notice applies to the software
and data as described below, inclusive of the drawing figures where
applicable: Copyright © 2000, SRC Computers, Inc.
`
BACKGROUND OF THE INVENTION

The present invention relates, in general, to the field of computing
systems and techniques. More particularly, the present invention relates
to multi-adaptive processing systems and techniques for enhancing
parallelism and performance of computational functions.

Currently, most large software applications achieve high performance
operation through the use of parallel processing. This technique allows
multiple processors to work simultaneously on the same problem to achieve
a solution in a fraction of the time required for a single processor to
accomplish the same result. The processors in use may be performing many
copies of the same operation, or may be performing totally different
operations, but in either case all processors are working simultaneously.

The use of such parallel processing has led to the proliferation of both
multi-processor boards and large scale clustered systems. However, as
more and more performance is required, so is more parallelism, resulting
in ever larger systems. Clusters exist today that have tens of thousands
of processors and can occupy football fields of space. Systems of such a
large physical size present many obvious downsides, including, among
other factors, facility requirements, power, heat generation and
reliability.
`
`
`
SUMMARY OF THE INVENTION

However, if a processor technology could be employed that offers orders
of magnitude more parallelism per processor, these systems could be
reduced in size by a comparable factor. Such a processor or processing
element is possible through the use of a reconfigurable processor.
Reconfigurable processors instantiate only the functional units needed to
solve a particular application, and as a result, have available space to
instantiate as many functional units as may be required to solve the
problem up to the total capacity of the integrated circuit chips they
employ.

At present, reconfigurable processors, such as multi-adaptive processor
elements (MAP™, a trademark of SRC Computers, Inc.) can achieve two to
three orders of magnitude more parallelism and performance than
state-of-the-art microprocessors. Through the advantageous application of
adaptive processing techniques as disclosed herein, this type of
reconfigurable processing parallelism may be employed in a variety of
applications resulting in significantly higher performance than that
which can now be achieved while using significantly smaller and less
expensive computer systems.

However, in addition to these benefits, there is an additional much less
obvious one that can have even greater impact on certain applications and
has only become available with the advent of multi-million gate
reconfigurable chips. Performance gains are also realized by
reconfigurable processors due to the much tighter coupling of the
parallel functional units within each chip than can be accomplished in a
microprocessor based computing system.

In a multi-processor, microprocessor-based system, each processor is
allocated but a relatively small portion of the total problem called a
cell. However, to solve the total problem, results of one processor are
often required by many adjacent cells because their cells interact at the
boundary, and upwards of six or more cells, all having to interact to
compute results, would not be uncommon. Consequently, intermediate
results must be passed around the system in order to complete the
computation of the total problem. This, of necessity, involves numerous
other chips and busses that run at much slower speeds than the
microprocessor, thus resulting in system performance often many orders of
magnitude lower than the raw computation time.

On the other hand, in the use of an adaptive processor-based system,
since ten to one thousand times more computations can be performed within
a single chip, any boundary data that is shared between these functional
units need never leave a single integrated circuit chip. Therefore, data
moving around the system, and its impact on reducing overall system
performance, can also be reduced by two or three orders of magnitude.
This will allow both significant improvements in performance in certain
applications as well as enabling certain applications to be performed in
a practical timeframe that could not previously be accomplished.

Particularly disclosed herein is a method for data processing in a
reconfigurable computing system comprising a plurality of functional
units. The method comprises: defining a calculation for the
reconfigurable computing system; instantiating at least two of the
functional units to perform the calculation; utilizing a first of the
functional units to operate upon a subsequent data dimension of the
calculation and substantially concurrently utilizing a second of the
functional units to operate upon a previous data dimension of the
calculation.

Further disclosed herein is a method for data processing in a
reconfigurable computing system comprising a plurality of functional
units. The method comprises: defining a first systolic wall comprising
rows of cells forming a subset of the plurality of functional units;
computing a value at each of the cells in at least a first row of the
first systolic wall; communicating the values between cells in the first
row of the cells to produce updated values; communicating the updated
values to a second row of the first systolic wall; and substantially
concurrently providing the updated values to a first row of a second
systolic wall of rows of cells in the subset of the plurality of
functional units.

Also disclosed herein is a method for data processing in a reconfigurable
processing system which includes setting up a systolic processing form
employing a speculative processing strategy.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned and other features and objects of the present
invention and the manner of attaining them will become more apparent and
the invention itself will be best understood by reference to the
following description of a preferred embodiment taken in conjunction with
the accompanying drawings, wherein:

Fig. 1 is a simplified functional block diagram of a typical clustered
inter-processor communications path in a conventional multi-processor
computing system;

Fig. 2 is a functional block diagram of an adaptive processor
communications path illustrating the many functional units ("FU")
interconnected by reconfigurable routing resources within the adaptive
processor chip;

Fig. 3A is a graph of the actual performance improvement versus the
number of processors utilized and illustrating the deviation from perfect
scalability of a particular application utilizing a conventional
multi-processor computing system such as that illustrated in Fig. 1;

Fig. 3B is a corresponding graph of the actual performance improvement
versus the number of processors utilized and illustrating the performance
improvement over a conventional multi-processor computing system
utilizing an adaptive processor-based computing system such as that
illustrated in Fig. 2;

Fig. 4A is a simplified logic flowchart illustrating a conventional
sequential processing operation in which nested Loops A and B are
alternately active on different phases of the process;

Fig. 4B is a comparative, simplified logic flowchart illustrating
multi-dimensional processing in accordance with the technique of the
present invention wherein multiple dimensions of data are processed by
both Loops A and B such that the computing system logic is operative on
every clock cycle;
`
`
`
Fig. 5A is illustrative of a general process for performing a
representative multi-dimensional pipeline operation in the form of a
seismic migration imaging function utilizing the parallelism available in
the utilization of the adaptive processing techniques of the present
invention;

Fig. 5B is a follow-on illustration of the computation phases employed in
implementing the exemplary seismic migration imaging function of the
preceding figure;

Fig. 6A is a simplified logic flowchart for a particular seismic
migration imaging application illustrative of the parallelism provided in
the use of an adaptive processor-based computing system;

Fig. 6B illustrates the computational process which may be employed by a
microprocessor in the execution of the seismic imaging application of the
preceding figure;

Fig. 6C illustrates the first step in the computational process which may
be employed by an adaptive processor in the execution of the seismic
imaging application of Fig. 6A in which a first shot (S1) is started;

Fig. 6D illustrates the second step in the same computational process for
the execution of the seismic imaging application of Fig. 6A in which a
second shot (S2) is started;

Fig. 6E illustrates the third step in the same computational process for
the execution of the seismic imaging application of Fig. 6A in which the
operation on the first and second shots is continued through compute;
`
`
`
Fig. 6F illustrates the fourth step in the same computational process
showing the subsequent operation on shots S1 and S2;

Fig. 6G illustrates the fifth step in the same computational process
followed by the continued downward propagation of shots S1 and S2 over
all of the depth slices;

Fig. 7A illustrates a process for performing a representative systolic
wavefront operation in the form of a reservoir simulation function also
utilizing the parallelism available in the utilization of the adaptive
processing techniques of the present invention;

Fig. 7B illustrates the general computation of fluid flow properties in
the reservoir simulation of the preceding figure which are communicated
to neighboring cells;

Fig. 7C illustrates the creation of a systolic wall of computation at
Time Set 1 which has been started for a vertical wall of cells and in
which communication of values between adjacent rows in the vertical wall
can occur without storing values to memory;

Fig. 7D is a follow on illustration of the creation of a systolic wall of
computation at Time Set 1 and Time Set 2 showing how a second vertical
wall of cells is started after the computation for cells in the
corresponding row of the first wall has been completed;

Fig. 8A illustrates yet another process for performing a representative
systolic wavefront operation in the form of the systolic processing of
bioinformatics also utilizing the parallelism available in the
`
`
`
utilization of the adaptive processing techniques of the present
invention;

Fig. 8B illustrates a systolic wavefront processing operation which
further incorporates a speculative processing strategy based upon an
evaluation of the rate of change of XB;

Fig. 8C is a further illustration of the systolic wavefront processing
operation of the preceding figure incorporating speculative processing;

Fig. 9A illustrates still another process for performing a representative
systolic wavefront operation in the form of structure codes calculating
polynomials at grid intersections, again utilizing the parallelism
available in the utilization of the adaptive processing techniques of the
present invention;

Fig. 9B illustrates the computation start for a vertical wall of grid
points at Time Set 1 for a polynomial evaluation performed on grid
intersections wherein calculations between rows are done in a stochastic
fashion using values from a previous row; and

Fig. 9C is a further illustration of the polynomial evaluation performed
on grid intersections of the preceding figure wherein a second wall is
started after the cells in the corresponding row of the first wall have
been completed.

DESCRIPTION OF A REPRESENTATIVE EMBODIMENT

This application incorporates by reference the entire disclosure of
Caliga, D. et al., "Delivering Acceleration: The Potential for Increased
HPC Application Performance Using Reconfigurable Logic", SC2001, November
2001, ACM 1-58113-293-X/01/0011.
`
`
`
With reference now to Fig. 1, a simplified functional block diagram of a
typical clustered inter-processor communications path in a conventional
multi-processor computing system 100 is shown. The computer system
comprises a number of memory and input/output ("I/O") controller
integrated circuits ("ICs") 102₁ through 102ₙ (e.g. "North Bridge") such
as the P4X333/P4X400 devices available from VIA Technologies, Inc.; the
M1647 device available from Acer Labs, Inc. and the 82443BX device
available from Intel Corporation. The North Bridge IC 102 is coupled by
means of a Front Side Bus ("FSB") to one or more microprocessors 104₀₀
through 104₀₃ and 104ₙ₀ through 104ₙ₃ such as one of the Pentium® series
of processors also available from Intel Corporation.

The North Bridge ICs 102₁ through 102ₙ are coupled to respective blocks
of memory 106₁ through 106ₙ as well as to a corresponding I/O bridge
element 108₁ through 108ₙ. A network interface card ("NIC") 110₁ through
110ₙ couples the I/O bus of the respective I/O bridge 108₁ through 108ₙ
to a cluster bus coupled to a common clustering hub (or Ethernet switch)
112.

Since typically a maximum of four microprocessors 104, each with two or
four functional units, can reside on a single Front Side Bus, any
communication to more than four must pass over the Front Side Bus,
inter-bridge bus, input/output ("I/O") bus, cluster interconnect (e.g. an
Ethernet clustering hub 112) and then back again to the receiving
processor 104. The I/O bus is typically an order of magnitude lower in
bandwidth than the Front Side Bus, which means that any
`
`
`
processing involving more than the four processors 104 will be
significantly throttled by the loose coupling caused by the interconnect.
All of this is eliminated with a reconfigurable processor having hundreds
or thousands of functional units per processor.

With reference additionally now to Fig. 2, a functional block diagram of
an adaptive processor 200 communications path for implementing the
technique of the present invention is shown. The adaptive processor 200
includes an adaptive processor chip 202 incorporating a large number of
functional units ("FU") 204 interconnected by reconfigurable routing
resources. The adaptive processor chip 202 is coupled to a memory element
206 as well as an interconnect 208 and a number of additional adaptive
processor chips 210.

As shown, each adaptive processor chip 202 can contain thousands of
functional units 204 dedicated to the particular problem at hand.
Interconnect between these functional units is created by reconfigurable
routing resources inside each chip 202. As a result, the functional units
204 can share or exchange data at much higher data rates and lower
latencies than a standard microprocessor 104 (Fig. 1). In addition, the
adaptive processor chips 202 can connect directly to the inter-processor
interconnect 208 and do not require the data to be passed through
multiple chips in a chipset in order to communicate. This is because the
adaptive processor can implement whatever kind of interface is needed to
accomplish this connection.
`
`
`
With reference additionally now to Fig. 3A, a graph of the actual
performance improvement versus the number of processors utilized in a
conventional multi-processor computing system 100 (Fig. 1) is shown. In
this figure, the deviation from perfect scalability of a particular
application is illustrated for such a system.

With reference additionally now to Fig. 3B, a corresponding graph of the
actual performance improvement versus the number of processors utilized
in an adaptive processor-based computing system 200 (Fig. 2) is shown. In
this figure, the performance improvement provided with an adaptive
processor-based computing system 200 over that of a conventional
multi-processor computing system 100 is illustrated.

With reference additionally now to Fig. 4A, a simplified logic flowchart
is provided illustrating a conventional sequential processing operation
400 in which nested Loops A (first loop 402) and B (second loop 404) are
alternately active on different phases of the process. As shown, the
standard implementation of applications that have a set of nested loops
402, 404 is to complete the processing of the first loop 402 before
proceeding to the second loop 404. The problem inherent in this approach,
particularly when utilized in conjunction with field programmable gate
arrays ("FPGAs"), is that all of the logic that has been instantiated is
not being completely utilized.

With reference additionally now to Fig. 4B, a comparative, simplified
logic flowchart is shown illustrating a multi-dimensional process 410 in
accordance with the technique of the present invention. The
multi-dimensional process 410 is effectuated such
`
`
`
that multiple dimensions of data are processed by both Loops A (first
loop 412) and B (second loop 414) such that the computing system logic is
operative on every clock cycle.

In contrast to the sequential processing operation 400 (Fig. 4A), the
solution to the problem of most effectively utilizing available resources
is to have an application evaluate a problem in a data flow sense. That
is, it will "pass" a subsequent dimension of a given problem through the
first loop 412 of logic concurrently with the previous dimension of data
being processed through the second loop 414. In practice, a "dimension"
of data can be: multiple vectors of a problem, multiple planes of a
problem, multiple time steps in a problem and so forth.
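The data-flow evaluation described above can be sketched in software. The
following Python sketch is purely illustrative (the function names and toy
arithmetic are hypothetical, and in the invention the two loops are hardware
functional units operating truly concurrently, not sequential code): Loop A
works on the subsequent data dimension while Loop B works on the previous
one, so neither stage of logic sits idle between dimensions.

```python
# Two-stage pipeline sketch of Fig. 4B: loop_a and loop_b stand in for
# the instantiated Loop A and Loop B logic. This sequential model shows
# the *schedule* only; in hardware both stages run on every clock cycle.

def loop_a(dim):
    # Hypothetical first-loop transform on one data dimension.
    return [x + 1 for x in dim]

def loop_b(dim):
    # Hypothetical second-loop transform on Loop A's result.
    return [x * 2 for x in dim]

def pipelined(dimensions):
    """Pass dimension k+1 through Loop A while Loop B consumes the
    result of dimension k, instead of running Loop A to completion
    over all dimensions before starting Loop B."""
    results = []
    in_flight = None                 # Loop A output awaiting Loop B
    for dim in dimensions:
        a_out = loop_a(dim)          # stage 1: subsequent dimension
        if in_flight is not None:
            results.append(loop_b(in_flight))  # stage 2: previous dimension
        in_flight = a_out
    results.append(loop_b(in_flight))          # drain the pipeline
    return results

# Three "dimensions" of data (e.g. vectors, planes or time steps).
data = [[1, 2], [3, 4], [5, 6]]
print(pipelined(data))  # [[4, 6], [8, 10], [12, 14]]
```

Each pass of the loop body advances both stages once, mirroring how a new
dimension of data enters the first loop on the same cycle the previous
dimension enters the second.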
`
With reference additionally now to Fig. 5A, a general process for
performing a representative multi-dimensional pipeline operation is shown
in the form of a seismic migration imaging function 500. The process 500
can be adapted to utilize the parallelism available in the utilization of
the adaptive processing techniques of the present invention in the form
of a multi-adaptive processor (MAP™, a trademark of SRC Computers, Inc.,
assignee of the present invention) STEP3d routine 502. The MAP STEP3d
routine 502 is operational to utilize velocity data 504, source data 506
and receiver data 508 to produce a resultant image 510 as will be more
fully described hereinafter.

With reference additionally now to Fig. 5B, the MAP STEP3d routine 502 of
the preceding figure is shown in
`
`
`
the various computational phases of: MAPTRI_x 520, MAPTRI_y 522,
MAPTRI_d+ 524 and MAPTRI_d- 526.

With reference additionally now to Fig. 6A, a simplified logic flowchart
for a particular seismic migration imaging application 600 is shown. The
seismic migration imaging application 600 is illustrative of the
parallelism provided in the use of an adaptive processor-based computing
system 200 such as that shown in Fig. 2. The representative application
600 demonstrates a nested loop parallelism in the tri-diagonal solver and
the same logic can be implemented for the multiple tri-diagonal solvers
in the x, y, d+ and d- directions. The computational phases of: MAPTRI_x
520, MAPTRI_y 522, MAPTRI_d+ 524 and MAPTRI_d- 526 are again illustrated.

With reference additionally now to Fig. 6B, a computational process 610
is shown which may be employed by a microprocessor ("mP") in the
execution of the seismic imaging application 600 of the preceding figure.
The process 610 includes the step 612 of reading the source field [S(Z1)]
and receiver field [R(Z1)] as well as the velocity field [V(Z1)] at step
614. At step 616 values are computed for S(Znz), R(Znz), which step is
followed by the phases MAPTRI_x 520 and MAPTRI_y 522. At step 618, the
image Zi+1/2 is computed. This is followed by the phases MAPTRI_d+ 524
and MAPTRI_d- 526 to produce the resultant image Z at step 620. The
process 610 loops over the depth slices as indicated by reference number
622 and loops over the shots as indicated by reference number 624.
`
`
`
With reference additionally now to Fig. 6C, the first step in a
computational process 650 in accordance with the technique of the present
invention is shown in which a first shot (S1) is started. The process 650
may be employed by an adaptive processor (e.g. a MAP™ adaptive processor)
as disclosed herein in the execution of the seismic imaging application
600 of Fig. 6A. As indicated by the shaded block, the phase MAPTRI_x 520
is active.

With reference additionally now to Fig. 6D, the second step in the
computational process 650 is shown at a point at which a second shot (S2)
is started. Again, as indicated by the shaded blocks, the phase MAPTRI_x
520 is active for S2, the phase MAPTRI_y 522 is active for S1 and image
Zi+1/2 has been produced at step 618. As shown, adaptive processors in
accordance with the disclosure of the present invention support
computation pipelining in multiple dimensions, and the parallelism in Z
and shots is shown at step 612.

With reference additionally now to Fig. 6E, the third step in the
computational process 650 is shown in which the operation on the first
and second shots is continued through compute. As indicated by the shaded
blocks, the phase MAPTRI_d+ 524 is active for S1, the phase MAPTRI_y 522
is active for S2 and image Zi+1/2 has been produced at step 618.
`
`
`
With reference additionally now to Fig. 6F, the fourth step in the
computational process 650 is shown illustrating the subsequent operation
on shots S1 and S2. The phase MAPTRI_d+ 524 is active for S2, the phase
MAPTRI_d- 526 is active for S1 and image Z has been produced at step 620.

With reference additionally now to Fig. 6G, the fifth step in the
computational process 650 is shown as followed by the continued downward
propagation of shots S1 and S2 over all of the depth slices. The phase
MAPTRI_x 520 is active for S1, the phase MAPTRI_d- 526 is active for S2
and image Z has been produced at step 620.
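The five steps just described amount to a pipeline schedule in which, on
each step, every in-flight shot occupies a different MAPTRI phase. The
Python sketch below models that schedule only (the phase names are taken
from the figures; the scheduling function itself is an illustration, not
SRC's implementation): shot s enters the pipeline at step s, advances one
phase per step, and wraps to MAPTRI_x for the next depth slice, as in
Fig. 6G.

```python
# Pipeline schedule sketch for Figs. 6C-6G: two shots staggered by one
# step through the four MAPTRI computational phases.

PHASES = ["MAPTRI_x", "MAPTRI_y", "MAPTRI_d+", "MAPTRI_d-"]

def schedule(num_shots, num_steps):
    """Return, for each step, a dict mapping each active shot to the
    phase it occupies on that step."""
    steps = []
    for t in range(num_steps):
        active = {}
        for s in range(num_shots):
            phase_idx = t - s            # shot s enters the pipe at step s
            if phase_idx >= 0:
                # Wrap to MAPTRI_x for the next depth slice (Fig. 6G).
                active[f"S{s + 1}"] = PHASES[phase_idx % len(PHASES)]
        steps.append(active)
    return steps

for t, active in enumerate(schedule(num_shots=2, num_steps=5), start=1):
    print(f"step {t}: {active}")
```

Steps 1 through 5 of the printed schedule reproduce the phase occupancy
shown in Figs. 6C through 6G respectively: on step 5, S1 has wrapped back
to MAPTRI_x for the next depth slice while S2 finishes MAPTRI_d-.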
`
With reference additionally now to Fig. 7A, a process 700 for performing
a representative systolic wavefront operation in the form of a reservoir
simulation function is shown which utilizes the parallelism available in
the adaptive processing techniques of the present invention. The process
700 includes a "k" loop 702, "j" loop 704 and "i" loop 706 as shown.

With reference additionally now to Fig. 7B, the general computation of
fluid flow properties in the reservoir simulation process 700 of the
preceding figure is illustrated as values are communicated between a
group of neighboring cells 710. The group of neighboring cells 710
comprises, in the simplified illustration shown, first, second and third
walls of cells 712, 714 and 716 respectively. Each of the walls of cells
includes a corresponding number of first, second, third and fourth rows
718, 720, 722 and 724 respectively.

As shown, the computation of fluid flow properties is communicated to
neighboring cells 710 and, importantly, this computation can be scheduled
to
`
`
`
eliminate the need for data storage. In accordance with the technique of
the present invention, a set of cells can reside in an adaptive processor
and the pipeline of computation can extend across multiple adaptive
processors. Communication overhead between multiple adaptive processors
may be advantageously minimized through the use of MAP™ adaptive
processor chain ports as disclosed in U.S. Patent No. 6,339,819 issued on
January 15, 2002 for: "Multiprocessor With Each Processor Element
Accessing Operands in Loaded Input Buffer and Forwarding Results to FIFO
Output Buffer", assigned to SRC Computers, Inc., assignee of the present
invention, the disclosure of which is herein specifically incorporated by
this reference.

With reference additionally now to Fig. 7C, the creation of a systolic
wall 712 of computation at Time Set 1 is shown. The systolic wall 712 has
been started for a vertical wall of cells and communication of values
between adjacent rows 718 through 724 in the vertical wall can occur
without storing values to memory.

With reference additionally now to Fig. 7D, a follow on illustration of
the creation of a systolic wall 712 of computation at Time Set 1 and a
second systolic wall 714 at Time Set 2 is shown. In operation, a second
vertical wall of cells is started after the computation for cells in the
corresponding row of the first wall has been completed. Thus, for
example, at time t₀, the first row 718 of systolic wall 712 is completed
and the results passed to the first row 718 of the second systolic wall
714. At time t₁, the second row 720 of the first systolic wall 712 and
the first row
`
`
718 of the second systolic wall 714 are computed. Thereafter, at time t₂,
the third row 722 of the first systolic wall 712 and the second row 720
of the second systolic wall 714 are computed. The process continues in
this manner for all rows and all walls.
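The staggered schedule just described can be stated compactly: row r of
wall w is computed at time step t = w + r, so each wall trails its
predecessor by one step and row results flow directly between walls
without being stored to memory. The short Python sketch below is an
illustration of that timing relation only (the wall and row counts are
chosen to match the simplified three-wall, four-row illustration of
Fig. 7B), not of the fluid-flow computation itself.

```python
# Systolic wall schedule sketch for Figs. 7C-7D: row r of wall w is
# computed at time step t = w + r, so wall w+1 starts as soon as the
# corresponding row of wall w has completed and passed on its results.

NUM_WALLS, NUM_ROWS = 3, 4

def rows_computed_at(t):
    """All (wall, row) pairs computed on time step t."""
    return [(w, r)
            for w in range(NUM_WALLS)
            for r in range(NUM_ROWS)
            if w + r == t]

# The wavefront sweeps diagonally across walls and rows.
for t in range(NUM_WALLS + NUM_ROWS - 1):
    pairs = ", ".join(f"wall {w} row {r}" for w, r in rows_computed_at(t))
    print(f"t{t}: {pairs}")
```

At t₀ only the first row of the first wall is active; at t₁ the second
row of the first wall and the first row of the second wall compute
together, exactly as in the Fig. 7D walkthrough above.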
`
With reference additionally now to Fig. 8A, yet another process 800 for
performing a representative systolic wavefront operation is shown. The
process 800 is in the form of the systolic