United States Patent [19]
`DeHon et al.
`
`[11] Patent Number: 5,956,518
`[45] Date of Patent: Sep. 21, 1999
`
`US005956518A
`
`[54] INTERMEDIATE-GRAIN RECONFIGURABLE PROCESSING DEVICE
`
`[75] Inventors: André DeHon; Ethan Mirsky, both of Cambridge; Thomas F. Knight, Jr., Belmont, all of Mass.
`
`Assignee: Massachusetts Institute of Technology, Cambridge, Mass.
`
`Appl. No.: 08/632,371
`
`Filed: Apr. 11, 1996
`
`Int. Cl.6: G06F 15/80
`U.S. Cl.: 395/800.15; 395/653
`Field of Search: 395/800.1, 800.15, 395/20051, 280, 653
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,597,041    6/1986   Guyer et al.          364/200
`4,748,585    5/1988   Chiarulli et al.      364/900
`4,754,412    6/1988   Deering               364/736
`4,858,113    8/1989   Saccardi              364/200
`4,870,302    9/1989   Freeman               307/465
`4,873,626   10/1989   Gifford               364/200
`5,020,059    5/1991   Gorin et al.          371/11.3
`5,233,539    8/1993   Agrawal et al.        364/489
`5,239,654    8/1993   Ing-Simmons et al.    395/800
`5,241,635    8/1993   Papadopoulos et al.   395/375
`5,265,207   11/1993   Zak                   395/200.31
`5,301,340    4/1994   Cook                  395/800
`5,305,462    4/1994   Grondalski            395/800.1
`5,336,950    8/1994   Popli et al.          307/465
`5,426,378    6/1995   Ong                   326/39
`5,457,644   10/1995   McCollum              364/716
`5,684,980   11/1997   Casselman             395/500
`
`OTHER PUBLICATIONS
`
`Takashi Miyamori et al., “A Quantitative Analysis of Recon-
`figurable Coprocessors for Multimedia Applications,” IEEE
`Symposium on Field—Programmable Custom Computing
`Machines Conference (FCCM98), Apr. 15—17, 1998.
`
`Charlé Rupp et al., “The Napa Adaptive Processing Archi-
`tecture”, IEEE Symposium on Field—Programmable Custom
`Computing Machines Conference (FCCM98), Apr. 15—17,
`1998, pp. 1—10.
`
`Stephen M. Scalera et al., The Design and Implementation
`of a Context Switching FPGA,
`IEEE Symposium on
`Field—Programmable Custom Computing .
`
`T. Bridges, “The GPA Machine: A Generally Partitionable
`MSIMD Architecture,” Third Symposium on the Frontier of
`Massively Parallel Computation Proceedings IEEE pp.
`196—203 (1990).
`
`P. Clarke, “Pilkington Preps Reconfigurable Video DSP,”
`News (Aug. 7, 1995).
`
`D.C. Chen, et al., “A Reconfigurable Multiprocessor IC for
`Rapid Prototyping of Algorithmic-Specific High-Speed
`DSP Data Paths,” IEEE Journal of Solid-State Circuits, vol.
`27 (12): 1895-1904 (Dec. 1992).
`
`A.K. Yeung, et al., “TA 6.3: A 2.4GOPS Data-Driven Recon-
`figurable Multiprocessor IC for DSP,” IEEE International
`Solid-State Circuits Conference, pp. 108-109, 346 (1995).
`
`(List continued on next page.)
`
`Primary Examiner—Eric Coleman
`Attorney, Agent, or Firm—Hamilton, Brook, Smith &
`Reynolds, PC.
`
`[57]
`
`ABSTRACT
`
`A programmable integrated circuit utilizes a large number of
`intermediate-grain processing elements which are multibit
`processing units arranged in a configurable mesh. The
`coarse-grain resources, such as memory and processing, are
`deployable in a way that takes advantage of the opportuni-
`ties for optimization present in given problems. To accom-
`plish this, the interconnect supports three different modes of
`operation: a static value in which a value set by the con-
`figuration data is provided to a functional unit, static source
`in which another functional unit serves as the value source,
`and a dynamic source mode in which the source is deter-
`mined by the value from another functional unit.
`
`31 Claims, 20 Drawing Sheets
`
`[Front-page drawing: 8-bit microprocessor configuration; remaining figure text not recoverable.]
`
`
`
`
`
`Petitioner Microsoft Corporation - Ex. 1019, p. 1
`
`
`J.E. Brewer, et al., “A Monolithic Processing Subsystem,”
`IEEE Transactions on Components, Packaging, and Manu-
`facturing Technology —Part B, vol. 17(3) :310—317 (Aug.
`1994).
`I. Gilbert, Chapter 11 —Mesh Multiprocessing, The Lincoln
`Laboratory Journal, 1(1) :11.1—11.18 (Spring 1988).
`Ira H. Gilbert, et al., “The Monolithic Synchronous Proces-
`sor,” Lincoln Laboratory, Massachusetts Institute of Tech-
`nology, Lexington, MA 02173 (No Date Given).
`G. Masera, et al., “A Microprogrammable Parallel Archi-
`tecture for DSP,” IEEE, pp. 824—827, (1991).
`L. Wang, et al., “Distributed Instruction Set Computer,”
`Proceedings of the 1988 International Conference on Par-
`allel Processing, vol. 1, pp. 426—429.
`M. Sowa, et al., “Parallel Execution on the Function—Parti-
`tioned Processor With Multiple Instruction Streams,” Sys-
`tems and Computers in Japan, 22(4) :22—27 (Nov. 1991).
`T. Alexander, et al., “A Reconfigurable Approach to a
`Systolic Sorting Architecture,” IEEE, pp.
`1178—1182
`(1989).
`Z. Blazek, et al., “Design of a Reconfigurable Parallel
`RISC—Machine,” North—Holland Microprocessing and
`Microprogramming 21, pp. 39—46, (1987).
`S. Morton, et al., The Dynamically Reconfigurable CAP
`Array Chip I, IEEE Journal of Solid—State Circuits, SC21
`(5) :820—826 (Oct. 1986).
`L. Snyder,
`“A Taxonomy of Synchronous Parallel
`Machines,” Proceedings of the 1988 International Confer-
`ence on Parallel Processing, pp. 281—285 (Aug. 1988).
`L. Snyder, “An Inquiry into the Benefits of Multigauge
`Parallel Computation,” Proceedings of the 1985 Interna-
`tional conference on Parallel Processing, pp. 488—492
`(Aug. 1985).
`A. DeHon, “DPGA Utilization and Application,” FPGA
`’96—ACM/SIGDA Fourth International Symposium on
`FPGAs, Monterey, CA (Feb. 11—13, 1996).
`D. Epstein, “Chromatic Raises the Multimedia Bar,” Micro-
`processor Report, pp. 23—27 (Oct. 23, 1995).
`J. Labrousse, et al., “Create—Life: A Modular Design
`Approach for High Performance ASIC’s,” IEEE CH2843,
`pp. 427—433 (Jan. 1990).
`M. Slater, “MicroUnity Lift Veil on MediaProcessor,”
`Microprocessor Report, pp. 11—18 (Oct. 23, 1995).
`E. Tau, et al., “A First Generation DPGA Implementation,”
`FPD ’95 —Third Canadian Workshop of Field—Program-
`mable Devices Montreal, Canada (May 29 —Jun. 1, 1995).
`
`S. Kartashev, et al., “A Multicomputer System With
`Dynamic Architecture,” IEEE Transactions on Computers,
`vol. C—28, No. 10, pp. 704—721 (Oct. 1979).
`D. Bursky, “Programmable Data Paths Speed Computa-
`tions,” Electronic Design, pp. 171—174 (May 1, 1995).
`V. Bove, Jr., et al., “Cheops: A Reconfigurable Data—Flow
`System for Video Processing,” IEEE Transactions on Cir-
`cuits and Systems for Video Technology, pp. 140—149
`(1995).
`M. Schaffner, “Processing by Data and Program Blocks,”
`Transactions on Computers, vol. C—27, No.
`11, pp.
`1015—1027 (Nov. 1978).
`J. Nickolls, “The Design of the MasPar MP—1: A Cost
`Effective Massively Parallel Computer,” IEEE CH2843, pp.
`25—28 (Jan. 1990).
`
`B. Narasimha, “Performance—Oriented, Fully Routable
`Dynamic Architecture for a Field Programmable Logic
`Device,” UCB/ERL M93/42, University of California, Ber-
`keley, pp. 1—21 (Jun. 1993).
`
`M. Bolotski, et al., “A 1024 Processor 8ns SIMD Array,”
`Advanced Research in VLSI 1995, pp. 1—13 (1995).
`
`D. Cherepacha, et al., “A Datapath Oriented Architecture for
`FPGAs,” Second International ACM/SIGDA Workshop on
`Field Programmable Gate Arrays ACM, pp. 1-11 (Feb.
`1994).
`
`D. Jones, et al., “A Time—Multiplexed FPGA Architecture
`for Logic Emulation,” Proceedings of the IEEE 1995 Cus-
`tom Integrated Circuits Conference, pp. 495—498 (May
`1995).
`
`G. Nutt, “Microprocessor Implementation of a Parallel Pro-
`cessor,” Proceedings of the Fourth Annual Symposium on
`Computer Architecture, pp. 147—152, (1977).
`
`W. Kim, “MasPar MP-2 PE Chip: A Totally Cool Hot Chip,”
`Proceedings of Hot Chips V pp. 1—5 (Mar. 29, 1993).
`
`E. Mirsky, et al., “Matrix: Coarse—Grain Reconfigurable
`Computing (Abstract)”, Published at the 5th Annual MIT
`Student Workshop on Scalable Computing, pp. 1—2 (Aug.
`1995) (Available on the Internet May 1, 1995).
`
`E. Mirsky, et al., “Matrix: A Reconfigurable Computing
`Architecture With Configurable Instruction Distribution and
`Deployable Resources,” Published at FCCM’96 —IEEE
`Symposium on FPGA’s for Custom Computing Machines,
`pp. 1—10 (Apr. 17—19, 1996).
`
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 1 of 20    5,956,518
`
`[Drawing sheet 1: figure text not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 2 of 20    5,956,518
`
`[Drawing sheet 2: figure text not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 3 of 20    5,956,518
`
`[Drawing sheet 3: figure text not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 4 of 20    5,956,518
`
`[FIG. 6: block diagram of the BFU core. Recoverable labels: Configuration Memory (CONTEXT, DATA, ADR, WE); Memory Block (DATA, MODE, WE, A_PORT, B_PORT, A_ADR, B_ADR); Memory Function Port; Network; ALU; ALU Function Port; Control Logic; Carry In; Carry Out.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 5 of 20    5,956,518
`
`[FIG. 8: remaining figure text on this sheet is not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 6 of 20    5,956,518
`
`[FIG. 9: network switch architecture. Recoverable labels: Level-2, Level-3 Network; Incoming Network Lines (L1, L2, L3); Floating Port 1; Floating Port 2; Switch 1; Switch 2; Network Drivers; L3 Control; C/R Logic; Control Context Select; Function; Control Bit; Level-1 Network.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 7 of 20    5,956,518
`
`[FIG. 11 and adjacent figure text. Recoverable labels: Control Byte; Control Bit; FPout; Configuration Word A; Configuration Word B; ADA, ADB, N1, N2.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 8 of 20    5,956,518
`
`[FIG. 12 and adjacent figure text. Recoverable labels: Local Output; Level-1; Level-2; Level-3; Control Byte; Control Bit; Configuration Word A; Configuration Word B; Level-1 North, NorthEast, East, South, SouthWest, NorthWest; W_Enable; S_Enable; E_Enable; BFU Output.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 9 of 20    5,956,518
`
`[FIG. 14, FIG. 15, FIG. 16. Recoverable labels: Enable; Reg Enable (Level-2 Only); Select; N1out; N2out; FP1out; FP2out; Broadcast Cycle; BFU Output; Control Context Select; Match?; to Level 1.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 10 of 20    5,956,518
`
`[Drawing sheet 10: figure text not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 11 of 20    5,956,518
`
`[Drawing sheet 11: figure text not recoverable from the text extraction.]
`
`
`
`
`
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 12 of 20    5,956,518
`
`[FIG. 22. Recoverable labels: I/O Ports.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 13 of 20    5,956,518
`
`[FIG. 23 and adjacent figure text. Recoverable labels: Data OE; I/O Output Data; I/O Input Data; I/O Bit 0 Output; I/O Bit 0 Input; I/O Bit 1 Output; I/O Bit 1 Input; I/O Register; I/O Pads; Enable; Data In; Data Out; Master CLK.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 14 of 20    5,956,518
`
`[FIG. 26 and adjacent figure text. Recoverable labels: PLA In Data; I/O Input Data; C/R Out; I/O Input Bits; PLA Out Data; C/R In; I/O Output Bits; I/O Output Enables; I/O Port; PLA; Output Multiplexor; Control; Output Selector; Col. 1 through Col. 5.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 15 of 20    5,956,518
`
`[Drawing sheet 15: figure text not recoverable from the text extraction.]
`
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 16 of 20    5,956,518
`
`[FIG. 29, FIG. 30. Recoverable labels: BFU Output; I/O Port 0 Input Data; I/O Port 1 Input Data; I/O Port 2 Input Data; PLA 0 Output Byte; PLA 1 Output Byte; PLA 2 Output Byte; Configuration Word.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 17 of 20    5,956,518
`
`[FIG. 31, FIG. 32. Recoverable labels: C/R for Control Lines; Network Inputs; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Select Word; Dynamic Source Switch; Configuration.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 18 of 20    5,956,518
`
`[FIG. 33, FIG. 36 and adjacent figure text. Recoverable labels: Control Lines; Level-3 Controller; Dynamic Control Switch; Level-3-1; Level-3-2; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Configuration Word; (16 bits output over 2 cycles).]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 19 of 20    5,956,518
`
`[Drawing sheet 19: figure text not recoverable from the text extraction.]
`
`

`

`U.S. Patent    Sep. 21, 1999    Sheet 20 of 20    5,956,518
`
`[Drawing sheet 20: figure text not recoverable from the text extraction.]
`
`

`

`INTERMEDIATE-GRAIN
`RECONFIGURABLE PROCESSING DEVICE
`
`GOVERNMENT FUNDING
`
`This invention was made with Government support under
`the Advanced Research and Projects Agency (ARPA) of the
`Department of Defense under Rome Labs Contract No.
`F30602-94-C-0252. The Government has certain rights in
`the invention.
`
`BACKGROUND OF THE INVENTION
`
`Continuing advances in semiconductor technology have
`greatly increased the amount of processing that can be
`performed by single-chip, general-purpose computing
`devices. The relatively slow increase in inter-chip commu-
`nication bandwidth requires that modern high performance
`devices use as much of the potential on-chip processing
`power as possible. This results in large, dense integrated
`circuit devices and a large design space of processing
`architectures.
`
`One way of viewing this design space is in terms of
`granularity. Designers have the option of building very large
`processing units, or many smaller ones, in the same space.
`Traditional architectures are either very coarse grain, such as
`microprocessors, or very fine grain, such as field program-
`mable gate arrays (FPGAs). Both architectures have advan-
`tages and disadvantages.
`Microprocessors incorporate very few large processing
`units that operate on wide data-words, and each unit is
`hardwired to perform defined instructions on these data-
`words. Usually each unit is optimized for a different set of
`instructions, such as integer and floating point, and the units
`are generally hardwired to operate in parallel. The hardwired
`nature of these units allows very rapid instructions. In fact,
`a great deal of area on modern microprocessor chips is
`dedicated to cache memories in order to support a very high
`rate of instruction issue. Thus, the devices efficiently handle
`very dynamic instruction streams.
`Very fine grain devices, such as FPGAs, incorporate a
`large number of very small processing elements. These
`elements are arranged in a configurable interconnect net-
`work. The configuration data used to define the functionality
`of the processing units and network can be thought of as a
`very large, semantically powerful, instruction word. Nearly
`any operation can be described and mapped to hardware.
`
`SUMMARY OF THE INVENTION
`
`Unfortunately, because microprocessors are highly opti-
`mized for simple, wide-word, dynamic instructions, they are
`relatively inefficient when performing other kinds of opera-
`tions. For example, many cycles are required to build up
`complex operations that are not part of the processor’s
`pre-selected instruction set. Also, when performing short-
`word operations, much of the processing unit is not being
`used, and when the instructions being issued are very
`regular, the large instruction caches are unnecessary. Thus,
`very coarse-grain microprocessors are not equipped to take
`the maximum advantage of these cases.
`The size of the “instruction word” creates a number of
`problems with fine-grain FPGA devices, however. Reload-
`ing new instructions takes a relatively long time, making
`dynamic instruction streams very difficult for these devices.
`Moreover, if the operation being performed is, in fact, a wide
`word operation, a great deal of this “instruction word” must
`be dedicated to re-describing the operation for each of the
`small processing elements. Thus, fine grain processing ele-
`ments are not well equipped to take advantage of a large
`number of common computing operations.
`The present invention utilizes a large number of
`intermediate-grain processing elements which are arranged
`in a configurable mesh. Thus, the regularity and rapid
`instruction issue features of coarse-grain units are exploited,
`but a reconfigurable or programmable interconnect allows
`these units to be connected in an application-specific man-
`ner. This means that coarse-grain resources, such as memory
`and processing, can be deployed in a way that takes advan-
`tage of the opportunities for optimization present in any
`given problem. In addition, configuration memories may be
`deployed to take advantage of application specific redun-
`dancy.
`In general according to one aspect, the invention features
`a programmable integrated circuit that comprises logic
`units that perform operations on data in response to instruc-
`tions and memories that store and retrieve addressed data. A
`configurable or programmable interconnect provides a mode
`of signal transmission between the logic units and memo-
`ries. Configuration control data defines data paths through
`the interconnect, which can be address inputs to memories,
`data inputs to memories and logic units, and instruction
`inputs to logic units. Thus, the interconnect is configurable
`to define an interdependent functionality of the functional
`units. A programmable configuration storage stores the
`configuration control data.
`Thus the present invention may be configured to operate
`according to a number of traditionally distinct computing
`architectures. For example, a centrally located functional
`unit may be assigned the role of arithmetic logic unit (ALU)
`with memories of surrounding functional units being con-
`figured to act as instruction caches, register files, and pro-
`gram counters. Wider data paths are accommodated by tying
`near-neighbor ALUs to each other. Wider instructions are
`achieved by configuring instruction memories of separate
`functional units as if they were a single memory. For a
`different problem, the same integrated circuit may be recon-
`figured to emulate a single instruction multiple data (SIMD)
`architecture. The logic units of rows of functional units are
`tied together to create wider data paths, and the rows
`perform separate serial tasks.
`In specific embodiments, functional units may provide at
`least part of the instructions to logic units of other functional
`units. Also, the configuration storage may hold multiple
`contexts of configuration control data for reconfiguration of
`the programmable interconnect.
`In other embodiments, the interconnect may support three
`different modes of operation: a static value mode, in which a value
`set by the configuration data is provided to a functional unit;
`a static source mode, in which another functional unit serves as the
`value source; and a dynamic source mode, in which the source is
`determined by the value from another functional unit.
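The three source modes can be pictured with a small Python model. This is only an illustrative sketch: the names `Mode`, `PortConfig`, and `resolve_port`, the list of unit outputs, and the use of the selector's output as a list index are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STATIC_VALUE = 1    # the configuration word itself is the input value
    STATIC_SOURCE = 2   # the configuration word names a fixed source unit
    DYNAMIC_SOURCE = 3  # another unit's output chooses the source at run time


@dataclass
class PortConfig:
    mode: Mode
    word: int = 0      # constant (static value) or source index (static source)
    selector: int = 0  # unit whose output picks the source (dynamic source)


def resolve_port(cfg: PortConfig, outputs: list[int]) -> int:
    """Return the 8-bit value seen at an input port for this cycle."""
    if cfg.mode is Mode.STATIC_VALUE:
        return cfg.word & 0xFF
    if cfg.mode is Mode.STATIC_SOURCE:
        return outputs[cfg.word]
    # Dynamic source: the selector unit's current output is used as an index
    # (hypothetical encoding, chosen only for this sketch).
    return outputs[outputs[cfg.selector] % len(outputs)]


# Current outputs of four hypothetical functional units.
outputs = [0x11, 0x22, 0x03, 0x44]
```

In this sketch a static-value port ignores the other units entirely, a static-source port always reads the same unit, and a dynamic-source port can read a different unit on every cycle as the selector's output changes.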
`
`In still other embodiments, each logic unit can also have
`programmable logic arrays on data paths between functional
`units which perform bit level logic operations. Additionally,
`reduction logic can be added that performs logic operations
`on the output of the logic units and passes a result to other
`functional units as control information. Network drivers are
`assigned to each unit to transmit received signals to other
`functional units. The sources of the signals received by the
`drivers may also be dynamic so that the sources are pro-
`grammable by other functional units.
`In general according to another aspect, the invention
`features an integrated reconfigurable computing device,
`which has functional units of multi-bit arithmetic logic units
`and memories. A configurable interconnect that connects the
`units includes function ports which determine the source of
`the instructions to the logic units. Network ports of the units
`are configurable by the functional units and determine the
`source of addresses to the memories and the source of data
`to the logic units and memories.
`In general according to still another aspect, the invention
`can also be characterized in the context of a method for
`organizing signal transmission within an array of functional
`units. Data read from the memories of functional units may
`be transmitted as instructions to the logic units of other
`functional units. Also, data read from logic units may be
`transmitted as addresses for the memories of other func-
`tional units. Finally, the data read from functional units can
`also be used as data inputs for the logic units of other
`functional units.
`
`In specific embodiments, the paths of the data and instruc-
`tions are dynamic in response to control from the functional
`units. More specifically, static values, values from other
`functional units, and values from sources may be transmitted
`between functional units.
`
`The above and other features of the invention including
`various novel details of construction and combinations of
`parts, and other advantages, will now be more particularly
`described with reference to the accompanying drawings and
`pointed out in the claims. It will be understood that the
`particular method and device embodying the invention are
`shown by way of illustration and not as a limitation of the
`invention. The principles and features of this invention may
`be employed in various and numerous embodiments without
`departing from the scope of the invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`In the accompanying drawings, reference characters refer
`to the same parts throughout the different views. The draw-
`ings are not necessarily to scale; emphasis has instead been
`placed upon illustrating the principles of the invention. Of
`the drawings:
`FIG. 1 shows a programmable integrated processing
`device of the present invention, which has been configured
`as an 8-bit microprocessor;
`FIG. 2 shows a SIMD processor configuration for the
`processing device according to the invention;
`FIG. 3 shows a 32-bit processor configuration for the
`processing device according to the invention;
`FIG. 4 shows a very long instruction word (VLIW)
`processor configuration for the processing device according
`to the invention;
`FIG. 5 shows a multiple instruction multiple data (MIMD)
`processor configuration for the processing device according
`to the invention;
`FIG. 6 is a block diagram showing the architecture of a
`basic functional unit (BFU) core of the present invention;
`FIG. 7 is a block diagram showing the inter-BFU con-
`nectivity provided by the level-1 network connections;
`FIG. 8 is a block diagram showing the BFU interconnec-
`tion provided by the level-2 network connections;
`FIG. 9 is a block diagram showing the network switch
`architecture for a BFU of the present invention;
`FIG. 10 is a block diagram illustrating the function switch
`architecture of the present invention;
`FIG. 11 is a block diagram showing the address/data and
`network switch architecture of the present invention;
`FIG. 12 is a block diagram illustrating the floating port
`architecture of the present invention;
`FIG. 13 is a block diagram showing the level-1 network
`drivers of the present invention;
`FIG. 14 shows the level-2 drivers of the present invention;
`FIG. 15 shows the level-3 drivers of the present invention;
`FIG. 16 shows BFU input registers of the present inven-
`tion;
`FIG. 17 shows the reduction logic in the BFU control
`architecture of the present invention;
`FIG. 18 is an example of multi-BFU reduction performed
`by the reduction logic of the present invention;
`FIG. 19 is a block diagram illustrating the operation of the
`distributed programmable logic array (PLA) associated with
`each BFU according to the invention;
`FIG. 20 is a block diagram showing the control logic for
`a single BFU;
`FIG. 21 shows an alternative embodiment of the configu-
`ration memory supporting multiple contexts;
`FIG. 22 is a block diagram of the configurable logic
`device of the present invention in the form of an integrated
`chip;
`FIG. 23 is a block diagram showing the input/output port
`architecture for the chip of the present invention;
`FIG. 24 is a block diagram showing the structure of an I/O
`register according to the invention;
`FIG. 25 is a block diagram of a programmable logic array
`for customizing the chip’s interface;
`FIG. 26 is a block diagram showing the movement of data
`from the BFU core off-chip according to the invention;
`FIG. 27 is a block diagram of a selector switch that
`chooses the core outputs to be driven on an output wire
`according to the invention;
`FIG. 28 is a block diagram showing a tri-state buffer used
`in the selector switch of FIG. 27;
`FIG. 29 is a block diagram illustrating how data enters the
`BFU core from off-chip;
`FIG. 30 is a block diagram showing the selector switch
`that selects among incoming data bytes from I/O ports and
`PLAs according to the invention;
`FIG. 31 is a block diagram of a C/R input architecture
`according to the invention;
`FIG. 32 is a block diagram showing the construction of
`the controller switches of the level-3 network lines accord-
`ing to the invention;
`FIG. 33 is a block diagram illustrating the dynamic
`control of the controller switches, which is shared between
`pairs of controllers at each column, according to the inven-
`tion;
`FIG. 34 shows the architecture of one of the dynamic
`control switches according to the invention;
`FIG. 35 is a block diagram showing the connectivity of
`BFUs in a systolic-type configuration according to the
`invention;
`FIG. 36 shows the configuration of the BFUs for a
`microcoded-type implementation for the convolution prob-
`lem according to the invention;
`FIG. 37 shows the organization of the BFUs for a VLIW,
`horizontal microcode-type implementation according to the
`invention; and
`FIG. 38 shows the organization of the BFUs for a VLIW/
`MSIMD-type implementation according to the invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`FIG. 1 shows a multi-bit microprocessor configuration of
`a reconfigurable processing device, which has been con-
`structed and programmed according to the principles of the
`present invention. A two-dimensional array of basic func-
`tional units (BFUs) 100 is located in a programmable interconnect
`101. Five of the BFUs 100 and the portion of the reconfig-
`urable interconnect connecting the BFUs have been config-
`ured to operate as a microprocessor 102.
`Each of the BFUs 100 preferably has addressable memory
`resources and logic resources, such as an 8-bit arithmetic
`logic unit (ALU). One of the BFUs 100, denoted ALU,
`utilizes its logic resources to perform the logic operations of
`the microprocessor 102 and utilizes its memory resources as
`a data store and/or extended register file. Another BFU
`operates as a function store F that controls the successive
`logic operations performed by the logic resources of the
`ALU. Two additional BFUs, A and B, operate as further
`instruction stores that control the addressing of the memory
`resources of the ALU. A final BFU, PC, operates as a
`program counter for the various instruction BFUs F, A, B.
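The division of labor among the five BFUs can be sketched as a small Python simulation. This is a hedged illustration only: the function name `run_microprocessor`, the opcode names, and the choice to write the result back over operand A are assumptions for the example, not details taken from the patent.

```python
def run_microprocessor(f_store, a_store, b_store, data, cycles):
    """Step a five-BFU processor: PC, F, A, B and ALU (roles follow FIG. 1)."""
    ops = {
        "add": lambda x, y: (x + y) & 0xFF,
        "sub": lambda x, y: (x - y) & 0xFF,
        "and": lambda x, y: x & y,
    }  # hypothetical opcodes; no encoding is given in this passage
    mem = list(data)          # memory resources of the ALU BFU
    pc = 0                    # PC BFU: program counter
    for _ in range(cycles):
        op = f_store[pc]      # F BFU: selects the ALU operation
        addr_a = a_store[pc]  # A BFU: address of the first operand
        addr_b = b_store[pc]  # B BFU: address of the second operand
        # Assumption: the 8-bit result overwrites operand A.
        mem[addr_a] = ops[op](mem[addr_a], mem[addr_b])
        pc = (pc + 1) % len(f_store)
    return mem


program_f = ["add", "add", "sub"]
program_a = [0, 0, 1]
program_b = [1, 2, 2]
result = run_microprocessor(program_f, program_a, program_b, [5, 7, 250], cycles=3)
```

Each list indexed by `pc` stands in for the memory of one instruction BFU, so the same program counter value addresses F, A, and B on every cycle.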
`As shown in FIG. 2, the same reconfigurable processing
`array, however, may be reprogrammed to function as a
`SIMD system, and as described below, this reconfiguration
`can occur on a cycle-by-cycle basis. The functions of the
`program counter PC and instruction stores A, B and F have
`been again assigned to different BFUs 100, but the ALU
`function has been replicated into 12 BFUs. Each of the
`ALUs is connected via the reconfigurable interconnect 101
`to operate on globally broadcast instructions from the
`instruction stores A, B, F. The same operations are
`performed by each of these ALUs, or common instructions may
`be broadcast on a row-by-row basis.
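
A rough way to picture the globally broadcast case is the following Python sketch, in which one instruction is applied to the local memory of each of 12 ALU BFUs. The function name `simd_step` and the write-back to the A address are assumptions for illustration only; the row-by-row broadcast variant is not modeled.

```python
MASK = 0xFF  # 8-bit datapath per ALU BFU


def simd_step(memories, func, a_addr, b_addr):
    """Apply one broadcast instruction to every ALU BFU's local memory."""
    for mem in memories:
        # Assumption: each ALU writes its result back to its own A address.
        mem[a_addr] = func(mem[a_addr], mem[b_addr]) & MASK


# Twelve ALU BFUs, each holding different data at address 0.
memories = [[i, 10] + [0] * 254 for i in range(12)]
simd_step(memories, lambda x, y: x + y, a_addr=0, b_addr=1)
```

Every ALU executes the same function on the same addresses, but on its own data, which is the defining property of a SIMD organization.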
`FIG. 3 shows how wider data paths can be constructed in
`the programmable device. This 32-bit microprocessor con-
`figured device has the same instruction stores A, B, F and
`program counter as described in connection with FIG. 1.
`Four BFUs, however, have been assigned an ALU operation,
`and the ALUs are chained together to act as a single 32-bit
`wide microprocessor in which the interconnect 101 supports
`carry-in and carry-out operations between the ALUs.
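
The carry chaining described for FIG. 3 can be sketched in Python as four 8-bit additions linked by carry-out and carry-in. The helper names `alu8_add` and `add32` are hypothetical, and the sketch models only addition, not the interconnect itself.

```python
def alu8_add(a, b, carry_in):
    """One 8-bit ALU slice: returns (8-bit sum, carry-out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add32(x, y):
    """Add two 32-bit values using four chained 8-bit ALU slices."""
    carry = 0
    result = 0
    for i in range(4):  # least significant byte first
        a = (x >> (8 * i)) & 0xFF
        b = (y >> (8 * i)) & 0xFF
        s, carry = alu8_add(a, b, carry)  # carry-out feeds the next slice
        result |= s << (8 * i)
    return result, carry
```

For example, `add32(0x0000FFFF, 1)` propagates a carry through the two low slices and yields `(0x00010000, 0)`.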
`FIG. 4 shows how the device can be configured to operate
`as a very long instruction word (VLIW) system. The various
`instruction stores A, B, F are defined to encompass multiple
`BFUs 100 to accommodate the desired instruction word
`width.
`
`FIG. 5 shows the configuration of the present system to
`operate as a multiple instruction multiple data (MIMD)
`system. The 8-bit microprocessor configuration 102 of FIG.
`1 is replicated into an adjacent set of BFUs to accommodate
`multiple, independent processing units within the same
`device. Of course, wider data paths could also be accom-
`modated by chaining ALUs of each processor 102 to each
`other.
`1. Basic Functional Unit Architecture
`
`FIG. 6 shows the moderately coarse grain, preferably
`8-bit, BFU core. Primarily, the BFU core has memory block
`110, basic ALU core 120, and configuration memory 105.
`The main memory block 110 is a 256 word × 8 bit wide
`memory, which is arranged to be used in either single or dual
`port modes. In dual port mode, the memory size is reduced
`to 128 words in order to be able to perform the two
`simultaneous read operations without increasing the read
`latency of the memory. The memory mode is controlled by
`control logic 114 accessed through a memory/mux function
`port 112, and the write enable can be controlled either
`through the memory/mux function port 112 or by the control
`logic 134 accessed through ALU function port 132. Control
`logic is hardwired and also controls the ALU functions.
`In single port mode, the memory 110 uses the A_ADR
`port for an address and outputs the selected value to both
`A_PORT and B_PORT. In dual port mode, the A_ADR
`port selects a value for A_PORT only, and the B_ADR port
`selects a value for the B_PORT.
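
The two read modes can be modeled with a short Python sketch. The class name `BfuMemory` is hypothetical, and the sketch assumes that dual port mode simply uses the low 7 address bits to reach 128 words; the text above states the reduced size but not how the words are mapped onto the storage.

```python
class BfuMemory:
    """Toy model of the 256 x 8 BFU memory and its two read modes."""

    def __init__(self, dual_port=False):
        self.dual_port = dual_port
        self.words = [0] * 256

    def read(self, a_adr, b_adr=0):
        """Return the values seen on the A and B output ports."""
        if not self.dual_port:
            # Single port: the A address selects one word for both ports.
            value = self.words[a_adr & 0xFF]
            return value, value
        # Dual port: two independent reads from a 128-word space.
        # Assumption: only the low 7 address bits are significant.
        return self.words[a_adr & 0x7F], self.words[b_adr & 0x7F]


single = BfuMemory()
single.words[5] = 42
dual = BfuMemory(dual_port=True)
dual.words[3], dual.words[9] = 7, 11
```

In this model `single.read(5)` returns the same word on both ports, while `dual.read(3, 9)` returns two independently addressed words.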
`In either
