`(10) Patent No.:
`a2) United States Patent
`Huppenthaletal.
`(45) Date of Patent:
`*Nov. 17, 2009
`
`
`US007620800B2
`
`(54) MULTI-ADAPTIVE PROCESSING SYSTEMS
`AND TECHNIQUES FOR ENHANCING
`PARALLELISM AND PERFORMANCE OF
`COMPUTATIONAL FUNCTIONS
`
`(75)
`
`Inventors: Jon M. Huppenthal, Colorado Springs,
`CO (US); David E. Caliga, Colorado
`Springs, CO (US)
`
`(73) Assignee: SRC Computers, Inc., Colorado
`Springs, CO (US)
`
`(*) Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 95 days.
`
`This patent is subject to a terminal dis-
`claimer.
`
`(21) Appl. No.: 11/733,064
`
`(22)
`
`Filed:
`
`Apr. 9, 2007
`
`(65)
`
`Prior Publication Data
`US2007/0204131 Al
`Aug. 30, 2007
`
`Related U.S. Application Data
`(63) Continuation of application No. 10/285,318, filed on
`Oct. 31, 2002, now Pat. No. 7,225,324.
`
`(51)
`
`Int. CL.
`GO6F 15/82
`(2006.01)
`712/226
`(52) US.CL
`(58) Field of ClassificationSearch— 712/226
`712/15, 19, 215
`See application file for complete search history.
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4.727.503 A
`4.763.294 A
`4,872,133 A
`4,962,381 A
`
`21988 McWhirter
`8/1988 Fong
`10/1989 Leeland
`10/1990 Helbig, Sr.
`
`5,020,059 A
`5,072,371 A
`5,230,057 A
`5,274,832 A
`5,471,627 A
`5,477,221 A
`
`5/1991 Gorin etal.
`12/1991 Benneretal.
`7/1993 Shidoet al.
`12/1993 Khan
`11/1995 Meanset al.
`12/1995 Chang et al.
`
`(Continued)
`FOREIGN PATENT DOCUMENTS
`
`JP
`
`59-206972
`
`11/1984
`
`.
`(Continued)
`OTHER PUBLICATIONS
`
`Gaudiot, Jean-Luc, Data-Driven Multicomputers in Digital Signal
`Processing, 1987, IEEE, Proceedings of the IEEE,vol. 75,No.9, pp.
`1220-1234.*
`
`(Continued)
`Primary Examiner—Eric Coleman
`(74) Attorney, Agent, or Firm—Michael C. Martensen;
`William J. Kubida; Hogan & Hartson LLP
`(57)
`ABSTRACT
`
`:
`:
`.
`.
`for
`Multi-adaptive processing systems and techniques
`enhancing parallelism and performance of computational
`functions are disclosed which can be employed in a myriad of
`applications including multi-dimensional pipeline computa-
`tions for seismic applications, search algorithms, information
`security, chemical and biological applications, filtering and
`the like as well as for systolic wavefront computations for
`fluid flow and structures analysis, bioinformatics etc. Some
`applications may also employ both the multi-dimensional
`pipeline and systolic wavefront methodologies disclosed.
`
`52 Claims, 20 Drawing Sheets
`
`
`
`ADAPTIVE PROCESSOR CHIP
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 1
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 1
`
`
`
`US 7,620,800 B2
`
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`5,509,134 A
`5,570,040 A
`5,640,586 A
`5,715,453 A
`5,737,766 A
`5,784,108 A
`5,802,290 A
`5,892,962 A
`5,903,771 A
`5,915,123 A
`5,953,502 A
`5,956,518 A
`5,966,534 A
`6,023,755 A
`6,052,773 A
`6,061,706 A
`6,076,152 A
`6,128,663 A
`6,192,439 BI
`6,215,898 Bl
`6,226,776 Bl
`6,289,440 Bl
`6,385,757 Bl
`6,704,816 Bl
`6,721,884 Bl
`
`4/1996 Fandrichetal.
`10/1996 Lytle etal.
`6/1997 Pechaneket al.
`2/1998 Stewart
`4/1998 Tan
`7/1998 Skaletzky et al.
`9/1998 Casselman
`4/1999 Cloutier
`5/1999 Sgroet al.
`6/1999 Mirsky etal.
`9/1999 Helbig,Sr.
`9/1999 DeHonetal.
`10/1999 Cookeetal.
`2/2000 Casselman
`4/2000 DeHonetal.
`5/2000 Gaiet al.
`6/2000 Huppenthal etal.
`10/2000 Thomas
`2/2001 Grunewaldet al.
`4/2001 Woodfill et al.
`5/2001 Panchuletal.
`9/2001 Casselman
`5/2002 Guptaetal.
`3/2004 Burke
`4/2004 De Oliveira Kastrup Pereiraet al.
`
`FOREIGN PATENT DOCUMENTS
`
`JP
`
`63-086079
`
`4/1988
`
`OTHER PUBLICATIONS
`
`Dennis,J. B., Data Flow Supercomputers, Nov. 1980, IEEE, Com-
`puter, pp. 48-56.*
`Qunnn M.J., et al., Data-Parallel Programming on Multicomputers,
`Sep. 1990, IEEE, pp. 69-76.*
`Trevleaven, P.C., et al., Data-Driven and Demand-Driven Computer
`Architecture, 1982, ACM, Computiing Surveys vol. 14, No. 1, pp.
`93-143.*
`Webster. M., Webster’s Ninth New Collegiate Dictionary, 1985, Mer-
`riam-Webster pub., p. 627.*
`Miyamori, Takashi, “REMARC: Reconfigurable Multimedia Array
`Coprocessor”, IEICE Transactions on Information and Systems,
`Information & Systems Society, Tokyo, JP, vol. E82-D, No. 2, Feb.
`1999, pp. 389-397, XP000821922.
`Gross Thomas,et al., “Compilation for a High-performanceSystolic
`Array”, Sigplan Notices USA, vol. 21, No. 7, Jul. 1986, pp. 27-38,
`XP0024 18625.
`Rauchwerger, Lawrence,et al., “The LRPD Test: Speculative Run-
`Time Parallelization of Loops with Privatization and Reduction
`Parallelization”, IEEE Transactions on Parallel and Distributed Sys-
`tems, IEEE Service Center, Los Alamitos, CA, vol. 10, No. 2, Feb.
`1999, pp. 160-180, XP0009083 18.
`Arnold Jeffrey M. et al., “The Splash 2 Processor and Applications”,
`Computer Design: VLSI in Computers and Processors, 1993, ICCD
`°93 Proceedings, 1993 IEEE International Conference on Cam-
`bridge, MA, Oct. 3-6, 1993, Los Alamitos, CA, IEEE Comput. Soc.,
`Oct. 3, 1993, pp. 482-485, XP010134571.
`Hwang,Kai, “Computer Architecture and Parallel Processing”, Data
`Flow Computers and VLSI Computations, 1985, McGraw Hill,
`Chapter 10, pp. 732-807, XP-002418655.
`Hartenstein, Reiner W., et al. “A Synthesis System for Bus-based
`Wavefront Array Architectures”, Proceedings, International Confer-
`ence on Application-Specific Systems, Architectures and Processors,
`1996, pp. 274-283, XP002132819.
`Alexander, Thomas,et al. “A Reconfigurable Approach To A Systolic
`Sorting Architecture”, ISCAS 89, May 8, 1989, pp. 1178-1182,
`XP010084477.
`Wu,Youfeng,etal. “Better Exploration ofRegion-Level Value Local-
`ity with Integrated Computation Reuse and Value Prediction”, Pro-
`ceedings of the 28th International Symposium on Computer Archi-
`tecture,
`ISCA 2001, Goteborg, Sweden, Jun. 30-Jul. 4, 2001,
`
`International Symposium on Computer Architecture, (ISCA), Los
`Alamitos, CA, IEEE Comp. Soc, US, Jun. 30, 2001, pp. 93-103,
`XP010552866.
`Babb, Jonathan,et al., “Parallelizing applications into silicon”, ©
`1999 TEEE.
`Deshpande, Deepali, et al., “Hybrid data/configuration caching for
`striped FPGAs” © 1999 TEEE.
`Purna, Karthikeya,et al., “Temporal partitioning and scheduling data
`flow graphsfor reconfigurable computers”, © 1999 IEEE, Publ. No,
`0018-9340/99, pp. 579-590.
`Gibbs, W. Wayt, “Blitzing bits”, © 1999 Scientific American Pre-
`sents, pp. 57-61.
`Gonzalez, Ricardo, “Configurable and extensible processors change
`system design”, Aug. 15-17, 1999, Hot Chips 11 Tutorials, pp. 135-
`146.
`Graham,Paul,et al., “FPGA-based sonar processing”, © 1998 ACM
`0-89791-978-5/98, pp. 201-208.
`Hasebe, A., et al., “Architecture of SIPS,a real time image processing
`system,” © 1988 IEEE, Publ. No. CH2603-9/88/0000/0621, pp. 621-
`630.
`Hammond,Lance, etal., “The Stanford Hydra CMP”, Aug. 15-17,
`1999 H ot Chips 11 Tutorials, pp. 23-31.
`Jean, Jack, et al., “Dynamic reconfiguration to support concurrent
`applications”, © 1999 IEEE, Publ. No. 0018-9340/99, pp. 591-602.
`Kastrup, Bernardo, et al., “Concise: a compiler-driven CPLD-based.
`instruction set accelerator”, © 1999 IEFE.
`Motomura, Masato,et al., “An embedded DRAM-FPGAchip with
`instantaneous logic reconfiguration”, © 1998 IEEE, Publ. No.
`0-8186-8900-5/98, pp. 264-266.
`McConnell, Ray, “Massively parallel computing on the Fuzion chip”,
`Aug. 15-17, 1999, Hot Chips 11 Tutorials, pp. 83-94.
`McShane,Erik, et al., “Functionally integrated systems on a chip:
`technologies, architectures, CAD tools, and. applications”, © 1998
`IEEE,Publ. No. 8-8186-8424-0/98, pp. 67-75.
`Rupp, Charley, et al., “The NAPA adaptive processing architecture”,
`© 1998 the Authors, pp. 1-10.
`Saito, Osamu, et al., “A 1M synapse self learnin g digital neural
`network chip”, ©0 1998 IEEE, Publ. No. 0-7803-4344/1/98, pp.
`94-95,
`
`Schott, Brian, et al., “Architectures for system-level applications of
`adaptive computing”, © 1999 IEEE.
`Mencer, Oskar, et al., “PAM-Blox: High Performance FPGA Design
`for Adaptive Computing”, © 1998 IEEE, Conference Paper, INSPEC
`Abstract Nos. B9611-1265B-044, C9811-5210-009.
`Miyamori, Takashi,et al., “A quantitative analysis of reconfigurable
`coprocessors for multimedia applications”, © 1998 IEEE, Confer-
`ence Paper, INSPEC Abstract Nos. B9811-1265F-011, C 9811-53 10-
`010.
`
`Agarwal, A., et al., “The Raw Compiler Project”, pp. 1-12, http://cag-
`www.lcs.mit.edu/raw, Proceedings of the Second SUIF Compiler
`Workshop, Aug. 21-23, 1997.
`Albaharna, Osama,etal., “On the viability of FPGA-basedintegrated.
`coprocessors”, © 1996 IEEE, Publ. No. 0-8186-7548-9/96, pp. 206-
`215.
`
`Amerson, Rick, et al., “Teramac—Configurable Custom Comput-
`ing”, © 1995 IEEE, Publ. No. 0-8186-7086-X/95, pp. 32-38.
`Barthel, Dominique Aug. 25-26, 1997, “PVP a Parallel Video
`coProcessor”, Hot Chips IX, pp. 203-210.
`Bertin, Patrice, et al., “Programmable active memories: a perfor-
`mance assessment”, © 1993 Massachusetts Institute of Technology,
`pp. 88-102.
`Bittner, Ray, et al., “Computing kernels implemented with a worm-
`hole RTR CCM”, © 1997 IEEE, Publ. No. 0-8186-8159-4/97, pp.
`98-105.
`
`Buell, D., et al. “Splash 2: FPGAs in a Custom Computing
`Machine—Chapter 1—Custom Computing Machines: An Introduc-
`tion”, pp. 1-11, http:/Awww.computer.org/espress/catalog/bp074 13/
`spls-ch1 html (originally believed published in J. of Supercomput-
`ing, vol. IX, 1995, pp. 219-230.
`Casselman, Steven, “Virtual Computing and The Virtual Com-
`puter”,© 1993 IEEE, Publ. No. 0-8186-3890-7193, pp. 43-48.
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 2
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 2
`
`
`
`US 7,620,800 B2
`Page 3
`
`Chan, Pak,et al., “Architectural tradeoffs in field-programmable-
`device-based computing systems”, © 1993 IEEE, Publ, No. 0-8186-
`3890-7/93, pp. 152-161.
`Clark, David, et al., “Supporting FPGA microprocessors through
`retargetable software tools”, © 1996 IEEE, Publ. No. 0-8186-7548-
`9/96, pp. 195-103.
`Cuccaro, Steven, et al., “The CM-2X: a hybrid CM-2/Xilink proto-
`type”, © 1993 IEEE, Publ. No. 0-8186-3890-7/93, pp. 121-130.
`Culbertson, W. Bruce, et al., “Exploring architectures for volume
`visualization on the Teramac custom computer’, © 1996 IEEE, Publ.
`No. 0-8186-7548-9/96, pp. 80-88.
`Culbertson, W. Bruce,et al., “Defect tolerance on the Teramac cus-
`tom computer”, © 1997 TEEE, Publ. No. 0-8186-8159-4/97, pp.
`116-123.
`Dehon, Andre, DPGA-Coupled microprocessors: commodity IC for
`the early 21° century’,© 1994 IEEE, Publ. No. 0-8186-5490-2/94,
`pp. 31-39.
`Dehon,A., et al., “Matrix a Reconfigurable Computing Device with
`Configurable Instruction Distribution”, Hot Chips IX, Aug. 25-26,
`1997, Stanford, California, MIT Artificial Intelligence Laboratory.
`Dhaussy, Philippe, et al., “Global control synthesis for an MIMD/
`FPGA machine”, © 1994 IEEE, Publ. No. 0-8186-5490-2/94, pp.
`72-81.
`Elliott, Duncan, et al., “Computational Ram: a memory-SIMD
`hybrid andits application to DSP”,© 1992 TEEE, Publ. No. 0-7803-
`0246-X/92, pp. 30.6.1-30.6.4.
`Fortes, Jose, et al., “Systolic arrays, a survey of seven projects”, ©
`1987 IEEE, Publ. No. 0018-9162/87/0700-0091, pp. 91-103.
`Gokhale, M., et al., “Processing in Memory: The Terasys Massively
`Parallel PIM Array” © Apr. 1995, IEEE, pp. 23-31.
`Gunther, Bernard,et al., “Assessing Document Relevance with Run-
`Time Reconfigurable Machines”, © 1996 IEEE, Publ. No. 0-8186-
`7548-9/96, pp. 10-17.
`Hagiwara, Hiroshi, et al., “A dynamically microprogrammable com-
`puter with low-level parallelism’, © 1980 IEEE, Publ. No. 0018-
`9340/80/07000-0577, pp. 577-594.
`Hartenstein, R. W., et al. “A General Approach in System Design
`Integrating Reconfigurable Accelerators,” http://xputers.informatik.
`uni-kl.de/papers/paper026-1-html, IEEE 1996 Conference, Austin,
`TX, Oct. 9-11, 1996.
`Hartenstein, Reiner, et al., “A reconfigurable data-driven ALU for
`Xputers”, © 1994 IEEE, Publ. No. 0-8186-5490-2/94, pp. 139-146.
`Hauser, John,et al.: “GARP: a MIPSprocessor with a reconfigurable
`co-processor”, © 1997 IEEE, Publ. No. 0-08186-8159-4/97, pp.
`12-21.
`hypercube,
`“A microprocessor-based
`al.,
`et
`John,
`Hayes,
`supercomputer”, © 1986 IEEE, Publ. No. 0272-1732/86/1000-0006,
`pp. 6-17.
`Herpel, H. -J., et al., “A Reconfigurable Computer for Embedded
`Control Applications”, © 1993 IEEE, Publ. No. 0-8186-3890-7/93,
`pp. 111-120.
`Hogl, H., et al., “Enable++: A second generation FPGA processor’,
`© 1995 IEEE, Publ. No. 0-8186-7086-X/95, pp. 45-53.
`King, William, et al., “Using MORRPH in an industrial machine
`vision system”. © 1996 IEEE, Publ. No. 08186-7548-9/96, pp.
`18-26.
`Manohar, Swaminathan, et al., “A pragmatic approach to systolic
`design”, © 1988 IEEE, Publ. No. CH2603-9/88/0000/0463, pp. 463-
`472.
`Mauduit, Nicolas,et al., “Lneuro 1.0: a piece of hardware LEGO for
`building neural network systems,” © 1992 IEEE, Publ. No. 1045-
`9227/92, pp. 414-422.
`Mirsky, Ethan A., “Coarse-Grain Reconfigurable Computing”, Mas-
`sachusetts Institute of Technology, Jun. 1996.
`Mirsky, Ethan, et al., “Matrix: A Reconfigurable Computing Archi-
`tecture with Configurable Instruction Distribution and Deployable
`Resources”, © 1996 IEEE, Publ. No. 0-8186-7548-9/96, pp. 157-
`166.
`Morley, Robert E., Jr., et al., “A Massively Parallel Systolic Array
`Processor System”,© 1988 IEEE, Publ. No. CH2603-9/88/0000/
`0217, pp. 217-225.
`Patterson, David,et al., “A case for Intelligent DRAM: IRAM”, Hot
`Chips VII, Aug. 19-20, 1996, pp. 75-94.
`
`Peterson, Janes, et al., “Scheduling and partitloning ANSI-C pro-
`grams onto multi-FPGA CCMarchitectures”, © 1996 IEEE, Publ.
`No. 0-8186-7548-9/96, pp. 178-187.
`Schmit, Herman, “Incremental reconfiguration for pipelined appli-
`cations,” © 1997 IEEE, Publ. No. 0-8186-8159-4/97, pp. 47-55.
`Sitkoff, Nathan, et al., “Implementing a Genetic Algorithm on a
`Parallel Custom Computing Machine”, Publ. No. 0-8186-7086-X/
`95, pp. 180-187.
`Stone, Harold, “‘A logic-in-memory computer”, © 1970 IEEE, IEEE
`Transactions on Computers, pp. 73-78, Jan. 1990.
`Tangen, Uwe, et al., “A parallel hardware evolvable computer
`POLYPextended abstract”, © 1997 IEEE, Publ. No. 0-8186-8159/
`4/97, pp. 238-239.
`Thornburg, Mike,et al., “Transformable Computers”, © 1994 IEEE,
`Publ. No. 0-8186-5602-6/94, pp. 674-679.
`Tomita, Shinji, et al., “A computer low-level parallelism QA-2”, ©
`1986 IEEE, Publ. No. 0-0384-7495/86/0000/0280, pp. 280-289.
`Trimberger, Steve,et al., “A time-multiplexed FPGA”, © 1997 IEEE,
`Publ. No. 0-8186-8159-4/97, pp. 22-28.
`Ueda, Hirotada,et al., “A multiprocessor system utilizing enhanced
`DSP’s for Image processing”, © 1988 IEEE, Publ. No. CH2603-9/
`88/0000/0611, pp. 611-620.
`Villasenor, John,et al., “Configurable computing”, © 1997 Scientific
`American, Jun. 1997.
`Wang, Quiang, et al., “Automated field-programmable compute
`accelerator design using partial evaluation”, © 1997 IEEE, Publ. No.
`0-8186-8159-4/97, pp. 145-154.
`W.H. Manglone-Smith and B.L. Hutchings. Configurable comput-
`ing: The Road Ahead. In Proceedings of the Reconfigurable Archi-
`tectures Workshop (RAW’97), pp. 81-96, 1997.
`Wirthlin, Michael, et al., “The Nano processor: a low resource
`reconfigurable processor”, © 1994 IEEE, Publ. No. 0-8186-5490-2/
`94, pp. 23-30.
`Wirthlin, Michael, et al., “A dynamic instruction set computer’, ©
`1995 IEEE, Publ. No. 0-8186-7086-X/95, pp. 99-107.
`Wittig, Ralph, et al., “One Chip: An FPGA processor with
`reconfigurable logic”, © 1996 IEEE, Publ. No. 0-8186-7548-9/96,
`pp. 126-135.
`Yamauchi, Tsukasa,et al., “SOP: A reconfigurable massively parallel
`system andits control-data flow based compiling method”, © 1996
`IEEE,Publ. No. 0-8186-7548-9/96, pp. 148-156.
`“Information Brief’, PCI Bus Technology, © IBM Personal Com-
`puter Company, 1997, pp. 1-3.
`Yun, Hyun-Kyu and Silverman,H. F.; “A distributed memory MIMD
`multi-computer with reconfigurable custom computing capabilities”,
`Brown University, Dec. 10-13, 1997, pp. 7-13.
`Hoover, Chris and Hart, David; “San Diego Supercomputer Center,
`Timelogic and Sun Validate Ultra-Fast Hidden Markov Model Analy-
`sis-One DeCypher-accelerated Sun Fire 6800 beats 2,600 CPUsrun-
`ning Linux-”, San Diego Supercomputer Center, http://www.sdsc.
`edu/Press/02/050802__markovmodel html, May 8, 2002, pp. 1-3.
`Caliga, David and Barker, David Peter, “Delivering Acceleration:
`The Potential for Increased HPC Application Performance Using
`Reconfigurable Logic”, SRC Computers, Inc., Nov. 2001, pp. 20.
`Hammes,J.P., Rinker, R. E., McClure, D. M., Bohm,A.P. W., Najjar,
`W. A, “The SA-C Compiler Dataflow Description”, Colorado State
`University, Jun. 21, 2001, pp. 1-25.
`Callahan, Timothy J. and Wawrzynek, John, “Adapting Software
`Pipelining for Reconfigurable Computing”, University of California
`at Berkeley, Nov. 17-19, 2000, pp. 8.
`Ratha, Nalini K., Jain, Anil K. and Rover, Diane T., “An FPGA-based
`Point Pattern Matching Processor with Application to Fingerprint
`Matching”, Michigan State University, Department ofComputer Sci-
`ence, pp. 8.
`Dehon, André, “Comparing Computing Machines”, University of
`California at Berkeley, Proceedings of SPIE vol. 3526, Nov. 2-3,
`1998, pp. 11.
`Vemuri, Ranga R. and Harr, Randolph E., “Configurable Computing:
`Technology and Applications”, University of Cincinnati and
`Synopsys Inc., IEEE, Apr. 2000, pp. 39-40.
`Demon,André, “The Density Advantage of Configurable Comput-
`ing”, California Institute of Technology, IEEE, Apr. 2000. pp. 41-49.
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 3
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 3
`
`
`
`US 7,620,800 B2
`
`Page 4
`
`Haynes, Simon D., Stone, John, Cheung, PeterY.K. and Luk, Wayne,
`“Video Image Processing with the Sonic Architecture”, Sony Broad-
`cast & Professional Europe, Imperial College, University of London,
`IEEE,Apr. 2000, pp. 50-57.
`Platzner , Marco, “Reconfigurable Accelerators for Combinatorial
`Problems”, Swiss Federal Institute of Technology (ETH) Zurich,
`IEEE,Apr. 2000, pp. 58-60.
`Callahan, Timothy J., Hauser, John R. and Wawrzynek, John, “The
`Garp Architecture and C Compiler”, University of California, Ber-
`keley, IEEE, Apr. 2000. pp. 62-69.
`Goldstein, Seth Copen, Schmit, Herman, Budiu , Mihai, Cadambi,
`Srihari, Moe, Matt
`and Taylor, R. Reed,
`“PipeRench: A
`Reconfigurable Architecture and Compiler”, Carnegie Mellon Uni-
`versity, IEEE, Apr. 2000, pp. 70-76.
`Muchnick, Steven S., “Advanced Compiler Design and Implementa-
`tion”, Morgan Kaufmann Publishers, pp. 217.
`to
`SA-C_
`Hammes,
`Jeffrey
`P., Dissertation
`“Compiling
`Reconfigurable Computing Systems”, Colorado State University,
`Department of Computer Science, Summer 2000, pp. 179.
`
`Automatic Target Recognition, Colorado State University & USAF,
`http://www.cs.colostate.edu/cameron/applications.html, pp. 1-3.
`Chodowiec, Pawel, Khuon, Po, Gaj, Kris, Fast Implementations of
`Secret-Key Block Ciphers Using Mixed Inner- and Outer-Round
`Pipelining, George Mason University, Feb. 11-13, 2001, pp. 9.
`Hastie, Neil, et al., “The Implementation of Hardware Subroutines on
`Field Programmable Gate Arrays”, XP010005485, Plessey Semicon-
`ductors, Tamerton Rd., Plymouth, Devon, England, IEEE, May 13,
`1990, Custom Integrated Circuits Conference, pp. 314. 1-4. *the
`whole document*.
`Harbaum,Till, et al., “Design of a Flexible Coprocessor Unit”, Insti-
`tute
`of Operating
`Systems
`and Computer Networks,
`XP000879556TU Braunschweig, Germany, Proceedings of the
`Euromicro Conference, Sep. 1999, pp. 335-342. *whole document*.
`Mathias P C; Patnaik L M: “Systolic Evaluation of Polynomial
`Expressions,” IEEE Transactions on Computers,vol. 39, No. 5, May
`1, 1990, pp. 653-665, XP000116659.
`
`* cited by examiner
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 4
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 4
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 1 of 20
`
`US 7,620,800 B2
`
`
`
`$N@3201SLNOYd
`
`
`
`Ngogna3alsLNOYS901
`
`poeyEyEe
`
`dn
`
`°rOL“FOLOOO£00
`
`vol
`
`NzoL(49dIMGHLYON)
`
`YATIOULNOD
`
`AYOWAW
`
`OlONY
`
`
`
`AHYHOWAWO/lONYAYOWSW
`
`YATIONLNOD
`
`91(A90INSHLYON)
`
`AYOWSAW
`
`O/|
`
`
`
`ADGA-YALNI
`
`
`BOLJ90IG-YSLNI80LOlt
`N0OLE
`
`UY40ldL‘big
`
`Chl
`
`ANHONIYALSN19 d901Na
`
`w3LSNIDysism9o
`
`
`
`(HOLIMSJANYSLH3)
`
`a9q1yg
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 5
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 5
`
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 2 of 20
`
`US 7,620,800 B2
`
`atic
`
`Z‘blyWz
`
`802
`
`LOANNODYALMI
`
`
`
`dIHDYOSSIIOUdJALLdvOV
`
`WNOLIGOY
`
`yossa00ud
`
`SdH
`
`aAUdvayANOWS3W
`c0¢90¢
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 6
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 6
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 3 of 20
`
`US 7,620,800 B2
`
`NOVOddV
`
`ALevIvas
`
`dVW
`
`SONVWYOINSd
`
`G3SV3SHONIee
`
`-ALIMavIvos
`
`1O034y43d
`
`_
`
`ALMAaVvIvos
`
`NOIWWONddY
`
`IN3SW3A0udW! JONVNOSeAd
`
`_
`
`—
`
`ALIMIEVIVOS
`
`NOILVONddv¥“
`
`Lo33u44d
`
`ALINEVTWOS
`
`geé‘big
`
`SYOSS3D0"8dON
`
`ve‘bly
`
`luvyoladd
`
`SYOSS390UdON
`Petitioner Microsoft Corporation - Ex. 1005, p. 7
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 7
`
`INSW3SA0uUdW! JONVYWYOsY 3d
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 4 of 20
`
`US 7,620,800 B2
`
`cASVHd
`
`PLY
`
`¥dO0O1
`
`ror|}
`Petitioner Microsoft Corporation - Ex. 1005, p. 8
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 8
`
`(eyepAwwnpoqAew)0NOISNSWIO
`
`NOSHYOMVd0O1-L3SVHd
`NOSHYOM8dOO1-tNOISNSAIG
`NOSHYOM§dOO1-cNOISN3WIO
`NOSHYOMVvdOO1-
`eeOS PEF OT TO SESS SSS S RS ETH ST ETE ESET HST TESST PeeSeeeeeeeeeee
`eteeeeeeeeeet
`»edALLOVNI.VWdOOT-@3SVHd
`»SALLOWNI.8dOOT-
`WAILIV.§dOO1-
`»dAILOV.¥dOO1-LASVHd
`eeeeeeTeT
`zor|}eeeeaeeeenemeeeeett
`gdoo)
`vdOO1
`
`VP‘biz
`
`UY40d
`
`Qo
`
`ov
`
`}NOISNAWIO
`
`8dQOO1
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 5 of 20
`
`US 7,620,800 B2
`
`Ju
`
`w
`
`Rod)
`gs
`
`w
`
`
`
`tL
`
`©<2
`
`
`
`.
`
`o=
`Ww
`
`|
`
`y
`
`|
`o82
`\:
`|
`| S65
`Oy
`i
`
`
`
`
`VELOCITY
`
`©
`B
`
`SOURCE
`
`co
`8
`
`
`RECEIVER
`Petitioner Microsoft Corporation - Ex. 1005, p. 9
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 9
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 6 of 20
`
`US 7,620,800 B2
`
`gs‘big
`
`WY
`
`
`
`S3ISVHdNOILViLNdANOD
`
`hy~‘
`
`“PTHIdvAWKWLdvN
`
`92Sccs
`
`+PIMIdVINX[HIidvwW
`
`™ozs
`
`Pedals
`
`ANILNOY
`
`dvi\c0S
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 10
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 10
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 7 of 20
`
`US 7,620,800 B2
`
`dw
`
`veg
`
`
`
`
`
`SLOHSY3SA0dOOT
`
`cc9
`
`Q7Sl4Y3AI9934y=u
`
`Q1al4d3DYNOS=S
`
`
`O2)u'CZ)savay
`
`
`
`
`
`$29I1SH1dadY3A0doo?
`
`
`
`ALIDOTAAZA("2AGVad
`
`2a2s
`
`
`
`HLS
`
`919
`
`0
`cS
`
`Z2zg
`
`8
`
`ves
`
`Pcs
`
`0c9
`
`g9‘big
`
`
`
`609
`
`v9‘big
`
`XLV
`
`+PIMLVW
`
`-PMLV
`
`0¢sS
`
`ces
`
`tS
`
`9¢S
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 11
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 11
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 8 of 20
`
`US 7,620,800 B2
`
` 618
`MAPTRI_d-
`
`READV(Z,,)
`
`
`{8
`
`S(Z,,).R(Z,.)
`o
`ja
`=i}
`lz
`jAt
`
`Io
`IN|
`mo)
`
`18
`|S)
`5|
`le
`iW]
`r)
`
`
`x||2| te
`g)
`3)
`(5
`~
`|O)
`|¥)
`1S
`
`Mlwo|& 0
`
`OQ
`—!
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 12
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 12
`
`14
`
`522
`
`“
`o
`
`wy
`son
`
`o—
`
`_d
`
`—,
`
`= e
`
`p)
`~ee”
`
`— 5 T
`
`L
`Y)
`ke
`Ye
`<L[
`t~
`WOW
`
`
`
`MAPSTEP1
`
`OO
`
`o >L
`
`L
`
`N F
`
`IMAGEZ,,
`
`a©<2 6
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 9 of 20
`
`US 7,620,800 B2
`
`dvW
`
`
`
`(ZS)ZLOHSLYVILSv9¢dals
`
`ccg
`
`c
`
`
`
`zsPZ)u'(°2)Saway
`
`
`
`
`
`SLOHSY¥3A0dOOT
`
`
`
`SSOMNSH1idaGYAAOdOO1
`
`
`
`
`
`ag‘big
`
`0¢c9
`
`9¢S5
`
`VLG919
`
`reat
`
`
`
`ceS-"LS-cSzs|]
`
`(J
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 13
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 13
`
`
`
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 10 of 20
`
`US 7,620,800 B2
`
`
`
`CONTINUES1ANDS2THROUGHCOMPUTE
`
`oS
`
`Ae,
`
`622
`
`Fig.6E 614
` LOOPOVERDEPTHSLICES LOOPOVER
`
`SHOTS
`
`READV(Z_,)
` MAPTRI_d-
`
`
`
`
`S(Z,,).R(Z,,)
`MAPTRI_x
`
`
`
`
`
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 14
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 14
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 11 of 20
`
`US 7,620,800 B2
`
`S
`
`e
`
` wo
`READVi(Z,,) STEP4
` S(Z,,).R(Z,,)
` LOOPOVERDEPTHSLICES LOOPOVER
`
`
`SHOTS
`
`dL
`
`L
`
`Oo
`
`;
`
`=
`to
`
`a
`Fe
`
`622
`
`
`MAPTRI_x
`
`624
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 15
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 15
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 12 of 20
`
`US 7,620,800 B2
`
`
`
`49NOWVO0d04dGYUVMNMOG3H1SANILNOD
`
`
`
`
`
`
`
`SSONSH1idaG3H1JO1tvY3BA02SGNVIS
`
`éc9
`
`
`
`
`
`SSONSHid3GY¥3SA0dO01
`
`669v9919
`
`8L9ZZSozsLS
`
`
`
`(2AavaHY(2)u'"z)s
`
`+PIYLdVW
`
`C3
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 16
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 16
`
`Gdalsres
`
`
`
`SLOHSY3A0dOO)]
`
`
`
`
`
`
`Nov. 17, 2009
`
`Sheet 13 of 20
`
`US 7,620,800 B2
`
`U.S. Patent
`
`Ore
`
`PLZ ele
`
`
`
`
`
`SJHLNIO3SAS3NTWA
`
`
`
`WOdt+3LADWOD
`
`
`
`$1139ONINOSHOISN
`
`at'sNd
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 17
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 17
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 14 of 20
`
`US 7,620,800 B2
`
`O~DL
`
`u
`
`TIMESET1
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 18
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 18
`
`
`
`Nov. 17, 2009
`
`Sheet 15 of 20
`
`US 7,620,800 B2
`
`U.S. Patent
`
`SWiL
`
`é13SSWIL L135
`
`4~\
`
`Bl2
`
`O¢eL
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 19
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 19
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 16 of 20
`
`US 7,620,800 B2
`
` rf[|af/Flag
`
`=[peeefPPS
`
`
`:mtg|Sac
`Cebit
`
`/
`ne Ve
`ae
`BS
`6/4 Bo
`Le
`
`V/
`He)
`O/H
`
`f.le[y
`id
`a Rebelese
`PpafetePetedyUalededePt
`=
`
`
`
`Lf
`
`2
`
`,
`
`ez
`
`ltNM
`
`
`
`
`°}-|-Pot pats[oo
`xfe|SPERSEEEELI
`
`eTEr
`[-|efele}-|-jelelele
`ablile
`:©
`
`
`
`OELETE
`INSERT
`Petitioner Microsoft Corporation - Ex. 1005, p. 20
`
`ipoctane/
`PORE
`
`:
`fy
`
`STATE
`
`STATE
`
`, 2
`|
`UW
`
`
`
`TIME
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 20
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 17 of 20
`
`US 7,620,800 B2
`
`
`
`3ZLAdGWOD3HLNIQ3SNSAMWA
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 21
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 21
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 18 of 20
`
`US 7,620,800 B2
`
`'SL9_4
`[Bax|
`
`i]4'tt
`
`ti]
`
`love98‘big0—eax={reaxSi1S31
`wuOs43d
`
`P{islaxflorea|fore|fo'e+i|‘o1g-—~"
`[L-lax
`tw’La|
`ebjax
`=(LelaxSi-1S31WHOaU3d[tr+ax|
`
`{‘oze
`[eseteb+|
`fun")+I}
`[Wax_][flax|two|teh|few
`
`ed
`
`SWI
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 22
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 22
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 19 of 20
`
`US 7,620,800 B2
`
`oS
`
`ai
`
`C
`
`=
`
`ok RAR
`FELe
`ak ARR
`TREN
`ma
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 23
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 23
`
`
`
`U.S. Patent
`
`Nov. 17, 2009
`
`Sheet 20 of 20
`
`US 7,620,800 B2
`
`Petitioner Microsoft Corporation - Ex. 1005, p. 24
`
`
`
`US 7,620,800 B2
`
`1
`MULTI-ADAPTIVE PROCESSING SYSTEMS
`AND TECHNIQUES FOR ENHANCING
`PARALLELISM AND PERFORMANCE OF
`COMPUTATIONAL FUNCTIONS
`
`CROSS REFERENCE TO RELATED PATENT
`APPLICATIONS
`
`The present application is a Continuation of U.S. patent
`application Ser. No. 10/285,318 filed Oct. 31, 2002 which is
`related to the subject matter of U.S. patent application Ser.
`No. 09/755,744filed Jan. 5, 2001 for: “Multiprocessor Com-
`puterArchitecture Incorporating a Plurality ofMemory Algo-
`rithm Processors in the Memory Subsystem”andis further
`related to the subject matter of U.S. Pat. No. 6,434,687 for:
`“System and Method for Accelerating Web Site Access and
`Processing Utilizing a Computer System Incorporating
`Reconfigurable Processors Operating Under a Single Oper-
`ating System Image’, all of which are assigned to SRC Com-
`puters, Inc., Colorado Springs, Colo. and the disclosures of
`whichare herein specifically incorporatedin their entirety by
`this reference.
`
`COPYRIGHT NOTICE/PERMISSION
`
`A portion of the disclosure of this patent document may
`contain material which is subject to copyright protection. The
`copyright owner has no objection to the facsimile reproduc-
`tion by anyoneofthe patent documentorthe patent disclosure
`as it appearsin the United States Patent and Trademark Office
`patent file or records, but otherwise, reserves all copyright
`rights whatsoever. The following notice applies to the soft-
`ware and data and described below,inclusive of the drawing
`figures where applicable: Copyright © 2000, SRC Comput-
`ers, Inc.
`
`BACKGROUND OF THE INVENTION
`
`20
`
`25
`
`30
`
`35
`
`2
`factor. Such a processor or processing element is possible
`through the use of a reconfigurable processor. Reconfigurable
`processors instantiate only the functional units needed to
`solve a particular application, and as a result, have available
`space to instantiate as many functional units as may be
`required to solve the problem up to the total capacity of the
`integrated circuit chips they employ.
`Atpresent, reconfigurable processors, such as multi-adap-
`tive processor elements (MAP™, a trademark of SRC Com-
`puters, Inc.) can achieve two to three orders of magnitude
`more parallelism and performance than state-of-the-art
`microprocessors. Through the advantageous application of
`adaptive processing techniquesas disclosed herein,this type
`ofreconfigurable processing parallelism may be employed in
`a variety of applications resulting in significantly higher per-
`formance than that which can now be achieved while using
`significantly smaller and less expensive computer systems.
`However, in addition to these benefits, there is an addi-
`tional much less obvious one that can have even greater
`impacton certain applications and has only becomeavailable
`with the advent of multi-million gate reconfigurable chips.
`Performance gains are also realized by reconfigurable pro-
`cessors due to the muchtighter coupling ofthe parallel func-
`tional units within each chip than can be accomplished in a
`microprocessor based computing system.
`In a multi-processor, microprocessor-based system, each
`processoris allocated buta relatively small portion ofthe total
`problem called a cell. However, to solve the total problem,
`results of one processor are often required by many adjacent
`cells becausetheir cells interact at the boundary and upwards
`of six or more cells, all having to interact to computeresults,
`would not be uncommon. Consequently, intermediate results
`must be passed around the system in order to complete the
`computation ofthe total problem. This, of necessity, involves
`numerous other chips and busses that run at much slower
`speeds than the microprocessorthus resulting in system per-
`formance often many orders ofmagnitude lowerthan the raw
`computation time.
`On the other hand, in the use of an adaptive processor-
`based system, since ten to one thousand times more compu-
`tations can be performed within a single chip, any boundary
`data that is shared between these functional units need never
`
`40
`
`45
`
`50
`
`The present invention relates, in general, to the field of
`computing systems and techniques. Moreparticularly, the
`present invention relates to multi-adaptive processing sys-
`tems and techniques for enhancing parallelism and perfor-
`mance of computational functions.
`leave a single integrated circuit chip. Therefore, data moving
`Currently, most large software applications achieve high
`around the system,and its impact on reducing overall system
`performanceoperation through the use ofparallel processing.
`performance, can also be reduced by twoorthree orders of
`This technique allows multiple processors to work simulta-
`magnitude. This will allow both significant improvements in
`neously on the same problem to achieve a solution in a frac-
`performancein certain applications as well as enabling cer-
`tion ofthe time required for a single processor to accomplish
`tain applications to be performedinapractical timeframethat
`the same result. The processors in use may be performing
`could not previously be accomplished.
`many copies of the same operation, or may be performing
`Particularly disclosed herein is a method for data process-
`totally different operations, but in either case all processors
`ing in a reconfigurable computing system comprising a plu-
`are working simultaneously.
`rality of functional units. The method comprises: defining a
`The use of such parallel processing has led to the prolif-
`calculation for the reconfigurable computing system; instan-
`eration of both multi-processor boards andlarge scale clus-
`tiating at least two of the functional units to perform the
`tered systems. However, as more and more performanceis
`calculation; utilizing a first of the functional units to operate
`required, so is more parallelism, resulting in ever larger sys-
`upon a subsequent data dimension of the calculation and
`tems. Clusters exist today that have tens of thousands of
`substantially concurrently utilizing a second ofthe functional
`processors and can occupy football fields of space. Systems
`units to operate upon a previous data dimensionof the calcu-
`lation.
`of such a large physical size present many obvious down-
`Further disclosed herein is a methodfor data processing in
`sides, including, among otherfactors, facility requirements,
`a reconfigurable computing system comprising a plurality of
`power, heat generation and reliability.
`functional units. The method comprises: definingafirst sys-
`SUMMARYOF THE INVENTION
`tolic wall comprising rows of cells forming a subset of the
`plurality of functional units; computing a valueat each of the
`cells in at least a first row ofthe first systolic wall; commu-
`nicating the values betweencellsin the first row of the cells to
`produce updated values; communicating the updated values
`
`However, ifa processor technology could be employed that
`offers orders of magnitude mo