`United States Patent [19]
`DeHon et al.
`
`[11] Patent Number: 5,956,518
`[45] Date of Patent: Sep. 21, 1999
`
`US005956518A
`
`[54] INTERMEDIATE-GRAIN RECONFIGURABLE PROCESSING DEVICE
`
`[75] Inventors: André DeHon; Ethan Mirsky, both of Cambridge; Thomas F. Knight, Jr., Belmont, all of Mass.
`
`Assignee: Massachusetts Institute of Technology, Cambridge, Mass.
`
`Appl. No.: 08/632,371
`
`Filed: Apr. 11, 1996
`
`Int. Cl.6: G06F 15/80
`U.S. Cl.: 395/800.15; 395/653
`Field of Search: 395/800.1, 800.15, 395/20051, 280, 653
`
`
`Primary Examiner: Eric Coleman
`Attorney, Agent, or Firm: Hamilton, Brook, Smith & Reynolds, P.C.
`
`[57] ABSTRACT
`
`A programmable integrated circuit utilizes a large number of intermediate-grain processing elements, which are multibit processing units arranged in a configurable mesh. The coarse-grain resources, such as memory and processing, are deployable in a way that takes advantage of the opportunities for optimization present in given problems. To accomplish this, the interconnect supports three different modes of operation: a static value mode, in which a value set by the configuration data is provided to a functional unit; a static source mode, in which another functional unit serves as the value source; and a dynamic source mode, in which the source is determined by the value from another functional unit.
`
`31 Claims, 20 Drawing Sheets
`
`[Representative drawing: 8 BIT MICROPROCESSOR]
`
`Petitioner Microsoft Corporation - Ex. 1019, p. 1
`
`
`
`
`
`[Sheet 1 of 20: drawing text not recoverable from extraction]
`
`
`
`[Sheet 2 of 20: drawing text not recoverable from extraction]
`
`
`
`[Sheet 3 of 20: drawing text not recoverable from extraction]
`
`
`
`[Sheet 4 of 20: FIG. 6 (architecture of a BFU core). Labels on the sheet: Configuration Memory (CONTEXT, DATA, ADR, WE); Memory Block (DATA, MODE, WE, A_ADR, B_ADR, A_PORT, B_PORT); Memory Function Port; ALU; ALU Function Port; Control Logic; Carry In; Carry Out]
`
`
`
`[Sheet 5 of 20: includes FIG. 8 (BFU interconnection provided by the level-2 network connections); remaining drawing text not recoverable from extraction]
`
`
`
`[Sheet 6 of 20: FIG. 9 (network switch architecture for a BFU). Labels on the sheet: Incoming Network Lines (L1, L2, L3); Level-1 Network; Level-2, Level-3 Network; Floating Port 1; Floating Port 2; Switch 1; Switch 2; Network Drivers; L3 Control; C/R Logic; Control Context Select; Function; Control Bit]
`
`
`
`[Sheet 7 of 20: includes FIG. 11 (address/data and network switch architecture). Labels on the sheet: Control Byte; Control Bit; FPout; Configuration Word A; Configuration Word B; ADA, ADB, N1, N2]
`
`
`
`[Sheet 8 of 20: includes FIG. 12 (floating port architecture). Labels on the sheet: Local Output; Level-1; Level-2; Level-3; Control Byte; Control Bit; Configuration Word A; Configuration Word B; Level-1 North, NorthEast, East, South, SouthWest, NorthWest; W_Enable; S_Enable; E_Enable; BFU Output]
`
`
`
`[Sheet 9 of 20: FIG. 14 (level-2 drivers), FIG. 15 (level-3 drivers), and FIG. 16 (BFU input registers). Labels on the sheet: Enable; Reg Enable (Level-2 Only); Select; N1out; N2out; FP1out; FP2out; Broadcast Cycle; BFU Output; Control; Context Select; Match?; to Level 1]
`
`
`
`[Sheet 10 of 20: drawing text not recoverable from extraction]
`
`
`
`[Sheet 11 of 20: drawing text not recoverable from extraction]
`
`
`
`
`
`
`
`
`[Sheet 12 of 20: FIG. 22 (configurable logic device in the form of an integrated chip). Labels on the sheet: I/O Ports]
`
`
`[Sheet 13 of 20: includes FIG. 23 (input/output port architecture). Labels on the sheet: Data OE; I/O Output Data; I/O Input Data; I/O Bit 0 Output; I/O Bit 0 Input; Bit 0 OE; I/O Bit 1 Output; I/O Bit 1 Input; I/O Register; I/O Pads; Enable; Data In; Data Out; Master CLK]
`
`
`
`[Sheet 14 of 20: includes FIG. 26 (movement of data from the BFU core off-chip). Labels on the sheet: PLA In Data; I/O Input Data; C/R Out; I/O Input Bits; PLA Out Data; C/R In; I/O Output Bits; I/O Output Enables; I/O Port; PLA; Output Multiplexor; Control; Output Selector; Col. 1 to Col. 5]
`
`
`
`[Sheet 15 of 20: drawing text not recoverable from extraction]
`
`
`
`
`[Sheet 16 of 20: FIG. 29 (data entering the BFU core from off-chip) and FIG. 30 (selector switch for incoming data bytes). Labels on the sheet: BFU Output; I/O Port 0, 1, and 2 Input Data; PLA 0, 1, and 2 Output Byte; Configuration Word]
`
`
`
`[Sheet 17 of 20: FIG. 31 (C/R input architecture) and FIG. 32 (controller switches of the level-3 network lines). Labels on the sheet: C/R for Control Lines; Network Inputs; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Select Word; Dynamic Source Switch; Configuration]
`
`
`
`[Sheet 18 of 20: includes FIG. 33 (dynamic control of the controller switches) and a figure extracted as FIG. 36. Labels on the sheet: Control Lines; Level-3 Controller; Dynamic Control Switch; Level-3-1 and Level-3-2 Control Lines; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Configuration Word; (16 bits output over 2 cycles)]
`
`
`
`[Sheet 19 of 20: drawing text not recoverable from extraction]
`
`
`
`[Sheet 20 of 20: drawing text not recoverable from extraction]
`
`
`
`INTERMEDIATE-GRAIN
`RECONFIGURABLE PROCESSING DEVICE
`
`GOVERNMENT FUNDING
`
`This invention was made with Government support under
`the Advanced Research and Projects Agency (ARPA) of the
`Department of Defense under Rome Labs Contract No.
`F30602-94-C-0252. The Government has certain rights in
`the invention.
`
`BACKGROUND OF THE INVENTION
`
`Continuing advances in semiconductor technology have greatly increased the amount of processing that can be performed by single-chip, general-purpose computing devices. The relatively slow increase in inter-chip communication bandwidth requires that modern high performance devices use as much of the potential on-chip processing power as possible. This results in large, dense integrated circuit devices and a large design space of processing architectures.
`
`One way of viewing this design space is in terms of granularity. Designers have the option of building very large processing units, or many smaller ones, in the same space. Traditional architectures are either very coarse grain, such as microprocessors, or very fine grain, such as field programmable gate arrays (FPGAs). Both architectures have advantages and disadvantages.
`
`Microprocessors incorporate very few large processing units that operate on wide data-words, and each unit is hardwired to perform defined instructions on these data-words. Usually each unit is optimized for a different set of instructions, such as integer and floating point, and the units are generally hardwired to operate in parallel. The hardwired nature of these units allows very rapid instructions. In fact, a great deal of area on modern microprocessor chips is dedicated to cache memories in order to support a very high rate of instruction issue. Thus, the devices efficiently handle very dynamic instruction streams.
`
`Very fine grain devices, such as FPGAs, incorporate a large number of very small processing elements. These elements are arranged in a configurable interconnect network. The configuration data used to define the functionality of the processing units and network can be thought of as a very large, semantically powerful, instruction word. Nearly any operation can be described and mapped to hardware.
`
`SUMMARY OF THE INVENTION
`
`Unfortunately, because microprocessors are highly optimized for simple, wide-word, dynamic instructions, they are relatively inefficient when performing other kinds of operations. For example, many cycles are required to build up complex operations that are not part of the processor’s pre-selected instruction set. Also, when performing short-word operations, much of the processing unit is not being used, and when the instructions being issued are very regular, the large instruction caches are unnecessary. Thus, very coarse-grain microprocessors are not equipped to take the maximum advantage of these cases.
`
`The size of the “instruction word” creates a number of problems with fine-grain FPGA devices, however. Reloading new instructions takes a relatively long time, making dynamic instruction streams very difficult for these devices. Moreover, if the operation being performed is, in fact, a wide word operation, a great deal of this “instruction word” must be dedicated to re-describing the operation for each of the small processing elements. Thus, fine grain processing elements are not well equipped to take advantage of a large number of common computing operations.
`
`The present invention utilizes a large number of intermediate-grain processing elements which are arranged in a configurable mesh. Thus, the regularity and rapid instruction issue features of coarse-grain units are exploited, but a reconfigurable or programmable interconnect allows these units to be connected in an application-specific manner. This means that coarse-grain resources, such as memory and processing, can be deployed in a way that takes advantage of the opportunities for optimization present in any given problem. In addition, configuration memories may be deployed to take advantage of application specific redundancy.
`
`In general, according to one aspect, the invention features a programmable integrated circuit that comprises logic units that perform operations on data in response to instructions and memories that store and retrieve addressed data. A configurable or programmable interconnect provides a mode of signal transmission between the logic units and memories. Configuration control data defines data paths through the interconnect, which can be address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units. Thus, the interconnect is configurable to define an interdependent functionality of the functional units. A programmable configuration storage stores the configuration control data.
`
`Thus the present invention may be configured to operate according to a number of traditionally distinct computing architectures. For example, a centrally located functional unit may be assigned the role of arithmetic logic unit (ALU) with memories of surrounding functional units being configured to act as instruction caches, register files, and program counters. Wider data paths are accommodated by tying near-neighbor ALUs to each other. Wider instructions are achieved by configuring instruction memories of separate functional units as if they were a single memory. For a different problem, the same integrated circuit may be reconfigured to emulate a single instruction multiple data (SIMD) architecture. The logic units of rows of functional units are tied together to create wider data paths, and the rows perform separate serial tasks.
`
`In specific embodiments, functional units may provide at least part of the instructions to logic units of other functional units. Also, the configuration storage may hold multiple contexts of configuration control data for reconfiguration of the programmable interconnect.
`
`In other embodiments, the interconnect may support three different modes of operation: a static value mode, in which a value set by the configuration data is provided to a functional unit; a static source mode, in which another functional unit serves as the value source; and a dynamic source mode, in which the source is determined by the value from another functional unit.
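
The three interconnect modes can be pictured with a small software model. This is only an illustrative sketch, not the circuit described here: the names `SourceMode`, `PortConfig`, and `resolve_port`, the dictionary of unit outputs, and the modulo that keeps the selector in range are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class SourceMode(Enum):
    STATIC_VALUE = "static_value"      # constant taken from configuration data
    STATIC_SOURCE = "static_source"    # fixed upstream unit named by configuration
    DYNAMIC_SOURCE = "dynamic_source"  # upstream unit picked at run time by another unit


@dataclass
class PortConfig:
    """Hypothetical configuration record for one input port of a functional unit."""
    mode: SourceMode
    value: int = 0                       # used by STATIC_VALUE
    source: Optional[str] = None         # used by STATIC_SOURCE
    selector: Optional[str] = None       # used by DYNAMIC_SOURCE: unit whose output picks the source
    candidates: Tuple[str, ...] = ()     # used by DYNAMIC_SOURCE: selectable source units


def resolve_port(cfg: PortConfig, outputs: Dict[str, int]) -> int:
    """Return the 8-bit value seen at the port, given the current unit outputs."""
    if cfg.mode is SourceMode.STATIC_VALUE:
        return cfg.value & 0xFF
    if cfg.mode is SourceMode.STATIC_SOURCE:
        return outputs[cfg.source] & 0xFF
    # DYNAMIC_SOURCE: another unit's output value chooses which unit to read.
    # The modulo is an assumption of this sketch, used only to stay in range.
    index = outputs[cfg.selector] % len(cfg.candidates)
    return outputs[cfg.candidates[index]] & 0xFF
```

In this model the first two modes are fixed entirely by configuration data, while the third lets a value computed at run time steer the data path.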
`
`In still other embodiments, each logic unit can also have programmable logic arrays on data paths between functional units which perform bit level logic operations. Additionally, reduction logic can be added that performs logic operations on the output of the logic units and passes a result to other functional units as control information. Network drivers are assigned to each unit to transmit received signals to other functional units. The sources of the signals received by the drivers may also be dynamic so that the sources are programmable by other functional units.
`
`In general, according to another aspect, the invention features an integrated reconfigurable computing device, which has functional units of multi-bit arithmetic logic units and memories. A configurable interconnect that connects the units includes function ports which determine the source of the instructions to the logic units. Network ports of the units are configurable by the functional units and determine the source of addresses to the memories and the source of data to the logic units and memories.
`
`In general, according to still another aspect, the invention can also be characterized in the context of a method for organizing signal transmission within an array of functional units. Data read from the memories of functional units may be transmitted as instructions to the logic units of other functional units. Also, data read from logic units may be transmitted as addresses for the memories of other functional units. Finally, the data read from functional units can also be used as data inputs for the logic units of other functional units.
`
`In specific embodiments, the paths of the data and instructions are dynamic in response to control from the functional units. More specifically, static values, values from other functional units, and values from sources may be transmitted between functional units.
`
`The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`In the accompanying drawings, reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis has instead been placed upon illustrating the principles of the invention. Of the drawings:
`FIG. 1 shows a programmable integrated processing device of the present invention, which has been configured as an 8-bit microprocessor;
`FIG. 2 shows a SIMD processor configuration for the processing device according to the invention;
`FIG. 3 shows a 32-bit processor configuration for the processing device according to the invention;
`FIG. 4 shows a very long instruction word (VLIW) processor configuration for the processing device according to the invention;
`FIG. 5 shows a multiple instruction multiple data (MIMD) processor configuration for the processing device according to the invention;
`FIG. 6 is a block diagram showing the architecture of a basic functional unit (BFU) core of the present invention;
`FIG. 7 is a block diagram showing the inter-BFU connectivity provided by the level-1 network connections;
`FIG. 8 is a block diagram showing the BFU interconnection provided by the level-2 network connections;
`FIG. 9 is a block diagram showing the network switch architecture for a BFU of the present invention;
`FIG. 10 is a block diagram illustrating the function switch architecture of the present invention;
`FIG. 11 is a block diagram showing the address/data and network switch architecture of the present invention;
`
`FIG. 12 is a block diagram illustrating the floating port architecture of the present invention;
`FIG. 13 is a block diagram showing the level-1 network drivers of the present invention;
`FIG. 14 shows the level-2 drivers of the present invention;
`FIG. 15 shows the level-3 drivers of the present invention;
`FIG. 16 shows BFU input registers of the present invention;
`FIG. 17 shows the reduction logic in the BFU control architecture of the present invention;
`FIG. 18 is an example of multi-BFU reduction performed by the reduction logic of the present invention;
`FIG. 19 is a block diagram illustrating the operation of the distributed programmable logic array (PLA) associated with each BFU according to the invention;
`FIG. 20 is a block diagram showing the control logic for a single BFU;
`FIG. 21 shows an alternative embodiment of the configuration memory supporting multiple contexts;
`FIG. 22 is a block diagram of the configurable logic device of the present invention in the form of an integrated chip;
`FIG. 23 is a block diagram showing the input/output port architecture for the chip of the present invention;
`FIG. 24 is a block diagram showing the structure of an I/O register according to the invention;
`FIG. 25 is a block diagram of a programmable logic array for customizing the chip’s interface;
`FIG. 26 is a block diagram showing the movement of data from the BFU core off-chip according to the invention;
`FIG. 27 is a block diagram of a selector switch that chooses the core outputs to be driven on an output wire according to the invention;
`FIG. 28 is a block diagram showing a tri-state buffer used in the selector switch of FIG. 27;
`FIG. 29 is a block diagram illustrating how data enters the BFU core from off-chip;
`FIG. 30 is a block diagram showing the selector switch that selects among incoming data bytes from I/O ports and PLAs according to the invention;
`FIG. 31 is a block diagram of a C/R input architecture according to the invention;
`FIG. 32 is a block diagram showing the construction of the controller switches of the level-3 network lines according to the invention;
`FIG. 33 is a block diagram illustrating the dynamic control of the controller switches, which is shared between pairs of controllers at each column, according to the invention;
`FIG. 34 shows the architecture of one of the dynamic control switches according to the invention;
`FIG. 35 is a block diagram showing the connectivity of BFUs in a systolic-type configuration according to the invention;
`FIG. 36 shows the configuration of the BFUs for a microcoded-type implementation for the convolution problem according to the invention;
`FIG. 37 shows the organization of the BFUs for a VLIW, horizontal microcode-type implementation according to the invention; and
`FIG. 38 shows the organization of the BFUs for a VLIW/MSIMD-type implementation according to the invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`FIG. 1 shows a multi-bit microprocessor configuration of a reconfigurable processing device, which has been constructed and programmed according to the principles of the present invention. A two-dimensional array of basic functional units (BFUs) 100 is located in a programmable interconnect 101. Five of the BFUs 100 and the portion of the reconfigurable interconnect connecting the BFUs have been configured to operate as a microprocessor 102.
`Each of the BFUs 100 preferably has addressable memory
`resources and logic resources, such as an 8-bit arithmetic
`logic unit (ALU). One of the BFUs 100, denoted ALU,
`utilizes its logic resources to perform the logic operations of
`the microprocessor 102 and utilizes its memory resources as
`a data store and/or extended register file. Another BFU
`operates as a function store F that controls the successive
`logic operations performed by the logic resources of the
`ALU. Two additional BFUs, A and B, operate as further
`instruction stores that control the addressing of the memory
`resources of the ALU. A final BFU, PC, operates as a
`program counter for the various instruction BFUs F, A, B.
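
The division of roles among the five BFUs can be illustrated with a short behavioral model. It is a sketch under stated assumptions rather than the patented hardware: the class name `Micro8`, the operation names in `OPS`, the wrap-around of the program counter, and the choice to return the ALU result instead of writing it back are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical 8-bit ALU operations; the real instruction encoding is not given here.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & 0xFF,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}


@dataclass
class Micro8:
    f_store: List[str]  # BFU "F": ALU operation for each program step
    a_store: List[int]  # BFU "A": address of the first operand in the ALU's memory
    b_store: List[int]  # BFU "B": address of the second operand in the ALU's memory
    data: List[int]     # memory resources of the BFU acting as the ALU
    pc: int = 0         # BFU "PC": index shared by the F, A, and B stores

    def step(self) -> int:
        """Run one cycle: PC indexes F, A, and B; the ALU BFU computes the result."""
        op = OPS[self.f_store[self.pc]]
        a = self.data[self.a_store[self.pc]]
        b = self.data[self.b_store[self.pc]]
        # Wrapping the program counter is an assumption of this sketch.
        self.pc = (self.pc + 1) % len(self.f_store)
        return op(a, b)
```

For example, `Micro8(["add", "xor"], [0, 1], [1, 2], [200, 100, 0x0F])` yields an 8-bit wrapped sum on its first step and an exclusive-or on its second.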
`As shown in FIG. 2, the same reconfigurable processing array, however, may be reprogrammed to function as a SIMD system, and as described below, this reconfiguration can occur on a cycle-by-cycle basis. The functions of the program counter PC and instruction stores A, B and F have been again assigned to different BFUs 100, but the ALU function has been replicated into 12 BFUs. Each of the ALUs is connected via the reconfigurable interconnect 101 to operate on globally broadcast instructions from the instruction stores A, B, F. These same operations are performed by each of these ALUs, or common instructions may be broadcast on a row-by-row basis.
`FIG. 3 shows how wider data paths can be constructed in the programmable device. This 32-bit microprocessor configured device has the same instruction stores A, B, F and program counter as described in connection with FIG. 1. Four BFUs, however, have been assigned an ALU operation, and the ALUs are chained together to act as a single 32-bit wide microprocessor in which the interconnect 101 supports carry-in and carry-out operations between the ALUs.
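
The carry chaining can be made concrete with a ripple-carry addition across four 8-bit slices. The sketch below assumes a simple add operation with the least significant byte handled first; the function names `alu8_add` and `add32_chained` are hypothetical and stand in for the chained BFU ALUs.

```python
from typing import Tuple


def alu8_add(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One 8-bit ALU slice: return (8-bit sum, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add32_chained(x: int, y: int) -> Tuple[int, int]:
    """Add two 32-bit values using four chained 8-bit slices.

    The carry out of each slice feeds the carry in of the next slice,
    which is the role played by the interconnect between the ALUs.
    """
    carry = 0
    result = 0
    for i in range(4):  # slice 0 handles the least significant byte
        a = (x >> (8 * i)) & 0xFF
        b = (y >> (8 * i)) & 0xFF
        s, carry = alu8_add(a, b, carry)
        result |= s << (8 * i)
    return result, carry
```

The final carry out of the most significant slice is returned alongside the 32-bit result so that overflow remains visible to the caller.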
`FIG. 4 shows how the device can be configured to operate as a very long instruction word (VLIW) system. The various instruction stores A, B, F are defined to encompass multiple BFUs 100 to accommodate the desired instruction word width.
`
`FIG. 5 shows the configuration of the present system to operate as a multiple instruction multiple data (MIMD) system. The 8-bit microprocessor configuration 102 of FIG. 1 is replicated into an adjacent set of BFUs to accommodate multiple, independent processing units within the same device. Of course, wider data paths could also be accommodated by chaining ALUs of each processor 102 to each other.
`1. Basic Functional Unit Architecture
`
`FIG. 6 shows the moderately coarse grain, preferably 8-bit, BFU core. Primarily, the BFU core has memory block 110, basic ALU core 120, and configuration memory 105. The main memory block 110 is a 256 word × 8 bit wide memory, which is arranged to be used in either single or dual port modes. In dual port mode, the memory size is reduced to 128 words in order to be able to perform the two simultaneous read operations without increasing the read latency of the memory. The memory mode is controlled by control logic 114 accessed through a memory/mux function port 112, and the write enable can be controlled either through the memory/mux function port 112 or by the control logic 134 accessed through ALU function port 132. Control logic is hardwired and also controls the ALU functions.
`In single port mode, the memory 110 uses the A_ADR port for an address and outputs the selected value to both A_PORT and B_PORT. In dual port mode, the A_ADR port selects a value for A_PORT only, and B_ADR port selects a value for the B_PORT.
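
The two memory modes can be sketched as a small behavioral model. It is illustrative only: the class name `MemoryBlock`, the wrapping of addresses to the current depth, and the single backing array are assumptions of the example, since the way the hardware arrives at 128 words in dual port mode is not described in this passage.

```python
from typing import List, Tuple


class MemoryBlock:
    """Behavioral model of the 256 word x 8 bit BFU memory in its two port modes."""

    WORDS = 256

    def __init__(self, dual_port: bool = False) -> None:
        self.dual_port = dual_port
        self.cells: List[int] = [0] * self.WORDS

    @property
    def depth(self) -> int:
        # Dual port mode halves the usable memory to 128 words.
        return 128 if self.dual_port else self.WORDS

    def write(self, addr: int, value: int) -> None:
        # Wrapping out-of-range addresses is an assumption of this sketch.
        self.cells[addr % self.depth] = value & 0xFF

    def read(self, a_adr: int, b_adr: int = 0) -> Tuple[int, int]:
        """Return the values presented on (A_PORT, B_PORT)."""
        a_value = self.cells[a_adr % self.depth]
        if not self.dual_port:
            # Single port mode: A_ADR selects one word that drives both ports.
            return a_value, a_value
        # Dual port mode: A_ADR drives A_PORT and B_ADR drives B_PORT.
        return a_value, self.cells[b_adr % self.depth]
```

In single port mode the `b_adr` argument is ignored, mirroring the statement that only the A_ADR port supplies an address in that mode.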
`In either