`(12) United States Patent
`Huppenthal et al.
`(10) Patent No.: US 6,339,819 B1
`(45) Date of Patent: Jan. 15, 2002
`
`(54) MULTIPROCESSOR WITH EACH PROCESSOR ELEMENT ACCESSING OPERANDS IN LOADED INPUT BUFFER AND FORWARDING RESULTS TO FIFO OUTPUT BUFFER
`
`(75) Inventors: Jon M. Huppenthal; Paul A. Leskar, both of Colorado Springs, CO (US)
`(73) Assignee: SRC Computers, Inc., Colorado Springs, CO (US)
`( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/563,551
`(22) Filed: May 3, 2000
`Related U.S. Application Data
`(63) Continuation-in-part of application No. 09/481,902, filed on Jan. 12, 2000, now Pat. No. 6,247,110, which is a continuation of application No. 08/992,763, filed on Dec. 17, 1997, now Pat. No. 6,076,152.
`
`(51) Int. Cl.7: G06F 15/16
`(52) U.S. Cl.: 712/16; 326/39; 326/41; 712/37
`(58) Field of Search: 326/39, 41; 712/37
`
`Primary Examiner: Kenneth S. Kim
`(74) Attorney, Agent, or Firm: William J. Kubida; Kent A. Lembke; Hogan & Hartson LLP
`(57) ABSTRACT
`An enhanced memory algorithmic processor ("MAP") architecture for multiprocessor computer systems comprises an assembly that may comprise, for example, field programmable gate arrays ("FPGAs") functioning as the memory algorithmic processors. The MAP elements may further include an operand storage, intelligent address generation, on board function libraries, result storage and multiple input/output ("I/O") ports. The MAP elements are intended to augment, not necessarily replace, the high performance microprocessors in the system and, in a particular embodiment of the present invention, they may be connected through the memory subsystem of the computer system, resulting in their being very tightly coupled to the system as well as globally accessible from any processor in a multiprocessor computer system.
`47 Claims, 11 Drawing Sheets
`
`Front-page drawing: legible labels include From previous MAP, Chain port, Memory controller (FPGA), Chain, Switch, Processor port, Processor assembly, Read, Trunk
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2084, p. 1
`
`US. Patent
`
`Jan. 15, 2002
`
`Sheet 1 of 11
`
`US 6,339,819 B1
`
`Fig. 1: Processor 0 through Processor N (12_0 to 12_N), Memory Subsystem Bank 0 through Bank M (16_0 to 16_M) and MAP 0, MAP 1, ... (112_0, 112_1, ...) coupled to the memory interconnect fabric 14
`
`Fig. 2: application program decomposition
`
`Fig. 3: memory bank with bank control logic, memory array and MAP assembly
`
`Fig. 4: control block of the MAP assembly and its interconnection to the user FPGA
`
`Fig. 5: MAP elements associated with individual processor boards and coupled by chain ports
`
`Fig. 6: MAP element with on board memory and a control block providing common memory DMA
`
`Fig. 7: MAP element on board memory functioning as input buffer and output FIFO
`
`Fig. 8: detailed MAP element; legible labels include Chain Output and Memory Data
`
`Fig. 9: user array interconnect; legible labels include Input Data [00:64] and Chain Input
`
`Fig. 10: MAP elements associated with individual memory arrays; legible labels include Memory Controller, Switch and Processor Assembly
`
`Figs. 11A and 11B: input and output timing relative to the system clock (Sysclk)
`
`MULTIPROCESSOR WITH EACH PROCESSOR ELEMENT ACCESSING OPERANDS IN LOADED INPUT BUFFER AND FORWARDING RESULTS TO FIFO OUTPUT BUFFER
`CROSS REFERENCE TO RELATED PATENT APPLICATIONS
`
`The present invention is a continuation-in-part application of U.S. patent application Ser. No. 09/481,902 filed Jan. 12, 2000, now U.S. Pat. No. 6,247,110, which is a continuation of U.S. patent application Ser. No. 08/992,763 filed Dec. 17, 1997, now U.S. Pat. No. 6,076,152, for: "Multiprocessor Computer Architecture Incorporating a Plurality of Memory Algorithm Processors in the Memory Subsystem", assigned to SRC Computers, Inc., Colorado Springs, Colo., assignee of the present invention, the disclosures of which are herein specifically incorporated by this reference.
`BACKGROUND OF THE INVENTION
`
`The present invention relates, in general, to the field of computer architectures incorporating multiple processing elements. More particularly, the present invention relates to a multiprocessor computer architecture incorporating a number of memory algorithmic processors ("MAP") in the memory subsystem or closely coupled to the processing elements to significantly enhance overall system processing speed.
`As commodity microprocessors increase in capability, there is an ever-increasing push to use them in high performance multiprocessor systems capable of performing trillions of calculations per second at significantly lower cost than systems built from custom counterparts. However, many of these processors lack specific features common to systems in this category that employ much more expensive custom processors. One such feature is the ability to perform vector processing.
`In this form of processing, a data register or buffer is filled with operands forming what is called a vector. All of these operands are then passed one after the other through a functional unit capable of performing operations such as multiplication, and this functional unit outputs one result every clock cycle. Although this type of processing requires that the same operation be performed on all operands in the input vector, it is widely used because it exhibits much higher processing rates than the traditional scalar method of computation used in most microprocessors.
`Nevertheless, neither vector nor scalar processors perform very well when required to perform bit manipulation as is required, for example, in matrix arithmetic. One such function is a bit matrix multiply operation in which two matrices of different sizes are multiplied together to form a third matrix. Another shortfall of both vector and scalar processing is their inability to quickly perform pattern searches such as those used in a variety of pattern recognition programs.
`These deficiencies can be addressed by building a high performance computer that contains numbers of commodity microprocessors, to reduce the system cost, together with MAP elements developed by SRC Computers, Inc., assignee of the present invention, to provide the deficient functions at very low cost. The MAP architecture and specific features thereof are disclosed in the aforementioned patent applications, the disclosures of which are herein specifically incorporated by this reference.
`SUMMARY OF THE INVENTION
`The enhanced memory algorithmic processor architecture for multiprocessor computer systems of the present invention is an assembly that contains not only, for example, field programmable gate arrays functioning as the memory algorithmic processors, but also an operand storage, intelligent address generation, on board function libraries, result storage and multiple I/O ports. Like the original MAP architecture disclosed in the aforementioned patent applications, this architecture differs from other so-called "reconfigurable" computers in many ways.
`First, its function is intended to be altered every few seconds, distinguishing it from other systems with very long reconfiguration times that are primarily intended for a single function. Second, it contains dedicated hardware to provide for large data set operand storage (on the order of 16 Mbytes or more), allowing the MAP element to function autonomously from its host system once operands are loaded. Third, it contains dedicated data ports to allow, but not require, multiple MAP elements to be chained together to perform very large operations. As currently contemplated, typically 32 to 512 or more MAP sections can be connected in a single system.
`Further, the MAP element is intended to augment, not replace, the high performance microprocessors in the system. As such, in a particular embodiment of the present invention, it may be connected through the memory subsystem of the computer system, resulting in its being very tightly coupled to the system as well as globally accessible from any processor in the system. This technique was developed by SRC Computers, Inc. and distinguishes the MAP architecture from all other so-called "attached array processor" systems that may exist today. While such "attached array processor" systems may bear some superficial similarities to MAP based systems, they are entirely separate units connected to the host computer through relatively slow interconnects, resulting in lower system performance.
`The MAP architecture developed by SRC Computers, Inc. as defined in the aforementioned patent applications overcomes many of the limitations of such "attached array processor" systems. However, because of particular limitations in the exemplary architecture disclosed therein surrounding the attachment of input storage and chaining capabilities, certain vector processing functions may not have been implemented as optimally as relatively smaller algorithms.
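The vector processing referred to above is the mode described in the Background: operands fill a buffer and stream, one per clock, through a pipelined functional unit that emits one result per cycle once it is full. The following Python sketch is a hypothetical cycle-level model of that behavior, not SRC's hardware; `PIPE_DEPTH` and the function name are illustrative assumptions.

```python
from collections import deque
from typing import List, Sequence, Tuple

PIPE_DEPTH = 4  # hypothetical latency of the functional unit, in clock cycles


def vector_multiply(a: Sequence[int], b: Sequence[int],
                    depth: int = PIPE_DEPTH) -> Tuple[List[int], int]:
    """Stream operand pairs through a pipelined multiplier, one pair per clock.

    Returns (results, cycles). Once the pipeline is full, one result
    leaves the functional unit on every clock cycle.
    """
    if len(a) != len(b):
        raise ValueError("vector operands must have equal length")
    pipe = deque([None] * depth)  # products currently in flight
    results: List[int] = []
    cycles = 0
    i = 0
    while len(results) < len(a):
        # Issue the next operand pair (if any remain) into the pipeline.
        pipe.append(a[i] * b[i] if i < len(a) else None)
        i += 1
        # Whatever reaches the end of the pipeline is this cycle's result.
        out = pipe.popleft()
        if out is not None:
            results.append(out)
        cycles += 1
    return results, cycles


res, cyc = vector_multiply([1, 2, 3, 4, 5, 6, 7, 8], [2] * 8)
print(res, cyc)  # [2, 4, 6, 8, 10, 12, 14, 16] 12
```

With n operands and a pipeline d stages deep, the model finishes in n + d cycles, so the fill latency is amortized as vectors grow longer.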
`Through the addition of these and other features to the MAP architecture, a much more powerful multiprocessor computer system is provided. Moreover, while another feature of the MAP architecture as originally disclosed was its ability to perform direct memory access ("DMA") into the common memory of the system, the enhancements disclosed herein expand the potential utilization of this feature.
`Particularly disclosed herein is a Memory Algorithmic Processor ("MAP") assembly (or element) comprising reconfigurable field programmable gate array ("FPGA") circuitry, an intelligent address generator, input data buffers, output first-in, first-out ("FIFO") devices and ports to allow connection to a memory array and chaining of multiple MAP assemblies for the purpose of augmenting the capability of a microprocessor in a high performance computer.
`Further disclosed herein is a MAP assembly comprising an intelligent address generator capable of supporting a data gather function from its associated input buffer or common memory. The MAP assembly may also comprise circuitry to allow the reconfigurable elements to reprogram their on-board configuration read only memory ("ROM") devices to cause alterations in the functionality of the reconfigurable circuitry.
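A data gather of the kind attributed to the intelligent address generator reads operands from scattered locations and delivers them as a dense stream. The sketch below is a minimal Python model, assuming addresses of the form base + stride × index; the base/stride/index scheme and all names are hypothetical, since the patent does not specify the generator's interface.

```python
from typing import Iterator, List, Sequence


def gather_addresses(base: int, indices: Sequence[int], stride: int = 1) -> Iterator[int]:
    """Hypothetical address generator: one address (base + stride * index) per index."""
    for idx in indices:
        yield base + stride * idx


def gather(buffer: Sequence[int], base: int, indices: Sequence[int],
           stride: int = 1) -> List[int]:
    """Read the generated addresses from the input buffer into a dense operand list."""
    operands: List[int] = []
    for addr in gather_addresses(base, indices, stride):
        if not 0 <= addr < len(buffer):
            raise IndexError(f"address {addr} is outside the input buffer")
        operands.append(buffer[addr])
    return operands


input_buffer = list(range(100, 132))  # 32 words standing in for the input buffer
print(gather(input_buffer, base=4, indices=[0, 3, 1, 7], stride=2))  # [104, 110, 106, 118]
```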
`Still further disclosed herein is a MAP assembly comprising dedicated input and output ports for the purpose of allowing an effectively unlimited number of MAP elements to be chained together to accomplish a single function. The MAP assembly may also incorporate provisions to create a single MAP chain or multiple independent MAP chains automatically based on the contents of the reconfigurable circuitry.
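Chaining can be pictured as function composition across ports: each element consumes operands on its chain input and forwards results on its chain output to the next element. The following Python sketch models each MAP element as a generator stage; the `Stage` type and both functions are hypothetical illustrations, not the patent's configuration mechanism.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[int], int]  # hypothetical function configured into one MAP element


def map_element(stage: Stage, chain_in: Iterable[int]) -> Iterator[int]:
    """One element: read operands from its chain input, emit results on its chain output."""
    for operand in chain_in:
        yield stage(operand)


def build_chain(stages: List[Stage], source: Iterable[int]) -> Iterator[int]:
    """Wire each element's chain output to the next element's chain input."""
    stream: Iterable[int] = source
    for stage in stages:
        stream = map_element(stage, stream)
    return iter(stream)


chain = build_chain([lambda x: x + 1, lambda x: x * 3, lambda x: x - 2], range(4))
print(list(chain))  # [1, 4, 7, 10]
```

In this model, multiple independent chains would simply be separate `build_chain` calls over different sources.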
`Further disclosed herein is a MAP assembly comprising output FIFOs for the purpose of holding output data so that the MAP element does not stall in the event the processor reading these results is delayed due to outside factors such as workload or crossbar switch conflicts. The MAP assembly may further comprise relatively large dedicated input storage buffers to allow for optimization of operand transfer as well as to allow multiple accesses to an operand without requiring external processor intervention.
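The benefit of the output FIFO can be seen in a simple cycle-level model: the MAP element offers one result per cycle, while the processor that reads the results is sometimes unavailable. The Python sketch below counts the cycles in which the MAP element must stall for a given FIFO depth; the single-cycle timing, the `reader_ready` schedule and the push-before-pop ordering within a cycle are all simplifying assumptions.

```python
from collections import deque
from typing import List


def stall_cycles(n_results: int, reader_ready: List[bool], fifo_depth: int) -> int:
    """Count the cycles a MAP element stalls while producing n_results.

    Each cycle the element tries to push one result into its output FIFO and
    stalls if the FIFO is full. The reading processor then pops one entry if
    reader_ready[cycle] is True; cycles past the end of the list are ready.
    """
    if fifo_depth < 1:
        raise ValueError("FIFO depth must be at least 1")
    fifo: deque = deque()
    produced = stalls = cycle = 0
    while produced < n_results:
        # Producer side: push if there is room, otherwise stall this cycle.
        if len(fifo) < fifo_depth:
            fifo.append(produced)
            produced += 1
        else:
            stalls += 1
        # Consumer side: a delayed processor drains nothing; a ready one drains one entry.
        ready = reader_ready[cycle] if cycle < len(reader_ready) else True
        if ready and fifo:
            fifo.popleft()
        cycle += 1
    return stalls


busy = [False] * 4  # processor delayed for the first four cycles
print([stall_cycles(8, busy, d) for d in (1, 2, 8)])  # [4, 3, 0]
```

With the reader busy for the first four cycles, a depth-1 FIFO stalls the element for four cycles, depth 2 for three, and depth 8 for none.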
`Still further disclosed herein is a MAP assembly comprising a dedicated port for connection to an input buffer so that the MAP element can simultaneously receive operands via the chained input (chain) port and the input buffer. This allows the MAP element to perform mathematical processing at the maximum possible rate: it can accept operands via the chain port while accessing reference data in the input buffer (such as reciprocal look up tables), allowing it to perform operations such as division at the fastest possible rate.
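Division through a reciprocal look-up table replaces a divider with one table read and one multiply. The patent does not describe the table format, so the Python sketch below substitutes a common fixed-point variant: each entry stores a reciprocal rounded up to 32 fractional bits, which yields the exact integer quotient for 16-bit unsigned numerators and 8-bit divisors. The operand widths, `SHIFT`, `MAX_DIVISOR` and the table layout are assumptions.

```python
SHIFT = 32          # fraction bits in each table entry (assumed)
MAX_DIVISOR = 255   # table covers 8-bit divisors (assumed)

# Reciprocal table as it might be preloaded into the input buffer:
# RECIP[d] = ceil(2**SHIFT / d), computed exactly with integer arithmetic.
RECIP = [0] + [-(-(1 << SHIFT) // d) for d in range(1, MAX_DIVISOR + 1)]


def lut_divide(a: int, d: int) -> int:
    """Return a // d using one table read, one multiply and one shift.

    Exact for 0 <= a < 2**16 and 1 <= d <= MAX_DIVISOR, because
    a * (RECIP[d] * d - 2**SHIFT) < 2**16 * 2**8 < 2**SHIFT.
    """
    if not (0 <= a < (1 << 16) and 1 <= d <= MAX_DIVISOR):
        raise ValueError("operands outside the range the table supports")
    return (a * RECIP[d]) >> SHIFT


print(lut_divide(1000, 7), lut_divide(65535, 255))  # 142 257
```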
`Also further disclosed herein is a MAP assembly which may comprise connections to the memory subsystem of a high performance computer for the purpose of providing global access to it from all processors in a multiprocessor high performance computer system. The MAP assembly incorporates the capability to update multiple on board function ROMs under program control while in the system and may also include connections to the memory subsystem of a high performance computer utilizing DMA to accept commands from a microprocessor.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The aforementioned and other features and objects of the
`present
`invention and the manner of attaining them will
`become more apparent and the invention itself will be best
`understood by reference to the following description of a
`preferred embodiment taken in conjunction with the accom-
`panying drawings, wherein:
`FIG. 1 is a simplified, high level, functional block dia-
`gram of a multiprocessor computer architecture employing
`memory algorithmic processors (“MAP") in accordance
`with the disclosure of the aforementioned patent applica-
`tions in an alternative embodiment wherein direct memory
`access (“DMA”) techniques may be utilized to send com-
`mands to the MAP elements in addition to data;
`FIG. 2 is a simplified logical block diagram of a possible
`computer application program decomposition sequence for
`use in conjunction with a multiprocessor computer archi-
`tecture utilizing a number of MAP elements located, for
`example, in the computer system memory space, in accordance with a particular embodiment of the present invention;
`FIG. 3 is a more detailed functional block diagram of an
`exemplary individual one of the MAP elements of the
`preceding figures and illustrating the bank control
`logic,
`memory array and MAP assembly thereof;
`FIG. 4 is a more detailed functional block diagram of the control block of the MAP assembly of the preceding illustration, showing its interconnection to the user FPGA thereof in a particular embodiment;
`FIG. 5 is a functional block diagram of an alternative embodiment of the present invention wherein individual MAP elements are closely associated with individual processor boards and each of the MAP elements comprises independent chain ports for coupling the MAP elements directly to each other;
`FIG. 6 is a functional block diagram of an individual MAP element wherein each comprises on board memory and a control block providing common memory DMA capabilities;
`FIG. 7 is an additional functional block diagram of an
`individual MAP element illustrating the on board memory
`function as an input buffer and output FIFO portions thereof;
`FIG. 8 is a more detailed functional block diagram of an individual MAP element as illustrated in FIGS. 6 and 7;
`FIG. 9 is a user array interconnect diagram illustrating, for example, four user FPGAs interconnected through horizontal, vertical and diagonal buses to allow for expansion in designs that exceed the capacity of a single FPGA;
`FIG. 10 is a functional block diagram of another alternative embodiment of the present invention wherein individual MAP elements are closely associated with individual memory arrays and each of the MAP elements comprises independent chain ports for coupling the MAP elements directly to each other; and
`FIGS. 11A and 11B are timing diagrams respectively illustrating input and output timing in relationship to the system clock ("Sysclk") signal.
`DESCRIPTION OF A PREFERRED
`EMBODIMENT
`
`With reference now to FIG. 1, a multiprocessor computer 10 architecture in accordance with one embodiment of the present invention is shown. The multiprocessor computer 10 incorporates N processors 12_0 through 12_N which are bi-directionally coupled to a memory interconnect fabric 14. The memory interconnect fabric 14 is then also coupled to M memory banks comprising memory bank subsystems 16_0 (Bank 0) through 16_M (Bank M). N number of memory algorithmic processors ("MAP") 112_0 through 112_N are also coupled to the memory interconnect fabric 14 as will be more fully described hereinafter.
`With reference now to FIG. 2, a representative application program decomposition for a multiprocessor computer architecture 100 incorporating a plurality of memory algorithm processors in accordance with the present invention is shown. The computer architecture 100 is operative in response to user instructions and data which, in a coarse grained portion of the decomposition, are selectively directed to one of (for purposes of example only) four parallel regions 102_1 through 102_4 inclusive. The instructions and data output from each of the parallel regions 102_1 through 102_4 are respectively input to parallel regions segregated into data areas 104_1 through 104_4 and instruction areas 106_1 through 106_4. Data maintained in the data areas 104_1 through 104_4 and instructions maintained in the instruction areas 106_1 through 106_4 are then supplied to, for example, corresponding pairs of processors 108_1, 108_2 (P1 and P2); 108_3, 108_4 (P3 and P4); 108_5, 108_6 (P5 and P6); and 108_7, 108_8 (P7 and P8) as shown. At this point, the medium grained decomposition of the instructions and data has been accomplished.
`A fine grained decomposition, or parallelism, is effectuated by a further algorithmic decomposition wherein the output of each of the processors 108_1 through 108_8 is broken up, for example, into a number of fundamental algorithms 110_1A, 110_1B, 110_2A, 110_2B through 110_8B as shown. Each of the algorithms is then supplied to a corresponding one of the MAP elements 112_1A, 112_1B, 112_2A, 112_2B, through 112_8B, which may be located in the memory space of the computer architecture 100 for execution therein as will be more fully described hereinafter.
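The three levels of the FIG. 2 decomposition can be tabulated with a short Python sketch. It only enumerates reference numerals (regions 102, processors 108, algorithms 110, MAP elements 112) for the example of four regions, two processors per region and two algorithms (A and B) per processor; it is bookkeeping under those assumptions, not a scheduler, and the function name is hypothetical.

```python
from typing import List, Tuple


def decompose(n_regions: int = 4, procs_per_region: int = 2,
              algos: Tuple[str, ...] = ("A", "B")) -> List[Tuple[str, str, str, str]]:
    """Enumerate (region, processor, algorithm, MAP element) reference numerals."""
    rows = []
    p = 0
    for r in range(1, n_regions + 1):           # coarse grain: parallel regions 102_r
        for _ in range(procs_per_region):       # medium grain: processors 108_p
            p += 1
            for s in algos:                     # fine grain: algorithm 110_ps on MAP 112_ps
                rows.append((f"102_{r}", f"108_{p}", f"110_{p}{s}", f"112_{p}{s}"))
    return rows


plan = decompose()
print(len(plan), plan[0], plan[-1])
# 16 ('102_1', '108_1', '110_1A', '112_1A') ('102_4', '108_8', '110_8B', '112_8B')
```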
`With reference additionally now to FIG. 3, an exemplary
`implementation of a memory bank 120 in a MAP system
`computer architecture 100 of the present invention is shown
`for a representative one of the MAP elements 112 illustrated
`in the preceding figure. Each memory bank 120 includes a
`bank control logic block 122 bi-directionally coupled to the
`computer system trunk lines, for example, a 72 line bus 124.
`The bank control
`logic block 122 is coupled to a
`bi-directional data bus 126 (for example 256 lines) and
`supplies addresses on an address bus 128 (for example 17
`lines) for accessing data at specified locations within a
`memory array 130.
The data bus 126 and address bus 128 are also coupled to a MAP element 112. The MAP element 112 comprises a control block 132 coupled to the address bus 128. The control block 132 is also bi-directionally coupled to a user field programmable gate array ("FPGA") 134 by means of a number of signal lines 136. The user FPGA 134 is coupled directly to the data bus 126. In a particular embodiment, the FPGA 134 may be provided as a Lucent Technologies OR3T80 device.
The computer architecture 100 comprises a multiprocessor system employing uniform memory access across common shared memory with one or more MAP elements 112, which may be located in the memory subsystem, or memory space. As previously described, each MAP element 112 contains at least one relatively large FPGA 134 that is used as a reconfigurable functional unit. In addition, a control block 132 and a preprogrammed or dynamically programmable configuration ROM (as will be more fully described hereinafter) contain the information needed by the reconfigurable MAP element 112 to enable it to perform a specific algorithm. It is also possible for the user to directly download a new configuration into the FPGA 134 under program control, although in some instances this may consume a number of memory accesses and might result in an overall decrease in system performance if the algorithm was short-lived.
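The trade-off described above, where downloading a new configuration under program control costs memory accesses that a short-lived algorithm may never recover, can be expressed as a simple break-even check. This is a toy model, not the patent's method: all parameters are hypothetical and expressed in arbitrary time units (for instance, memory-access equivalents), the text gives no figures, and any overlap between reconfiguration and other work is ignored.

```python
import math


def break_even_invocations(reconfig_cost: float, t_software: float, t_fpga: float) -> float:
    """Number of invocations after which downloading a new FPGA configuration
    pays for itself. Returns math.inf if the FPGA is not faster per call."""
    saving = t_software - t_fpga   # time saved on each invocation
    if saving <= 0:
        return math.inf
    return reconfig_cost / saving


def should_reconfigure(reconfig_cost: float, t_software: float,
                       t_fpga: float, invocations: int) -> bool:
    """True when total time including the download is strictly lower than
    running the algorithm without the reconfigured FPGA."""
    return reconfig_cost + invocations * t_fpga < invocations * t_software


if __name__ == "__main__":
    print(break_even_invocations(10_000, 50, 10))    # cost amortized after 250 calls
    print(should_reconfigure(10_000, 50, 10, 100))   # short-lived: not worth it
    print(should_reconfigure(10_000, 50, 10, 1000))  # long-lived: worth it
```

In this model a "short-lived" algorithm is simply one invoked fewer times than the break-even count, which is the situation in which the text notes that a download might decrease overall performance.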
FPGAs have particular advantages in the application shown for several reasons. First, commercially available FPGAs now contain sufficient internal logic cells to perform meaningful computational functions. Secondly, they can operate at speeds comparable to microprocessors, which eliminates the need for speed matching buffers. Still further, the internal programmable routing resources of FPGAs are now extensive enough that meaningful algorithms can be programmed without the need to reassign the locations of the input/output ("I/O") pins.
By, for example, placing the MAP element 112 in the memory subsystem or memory space, it can be
