IPR2018-01606, No. 1019 Exhibit - US Patent 5,956,518 to DeHon et al. (P.T.A.B. Sep. 6, 2018)

`United States Patent [19]
`DeHon et al.
`
`[11] Patent Number: 5,956,518
`[45] Date of Patent: Sep. 21, 1999
`
`US005956518A
`
`[54] INTERMEDIATE-GRAIN RECONFIGURABLE PROCESSING DEVICE
`
`[75] Inventors: André DeHon; Ethan Mirsky, both of
`Cambridge; Thomas F. Knight, Jr., Belmont, all of Mass.
`
`[73] Assignee: Massachusetts Institute of Technology,
`Cambridge, Mass.
`
`[21] Appl. No.: 08/632,371
`
`[22] Filed: Apr. 11, 1996
`
`[51] Int. Cl.: G06F 15/80
`[52] U.S. Cl.: 395/800.15; 395/653
`[58] Field of Search: 395/800.1, 800.15, 200.51, 280, 653
`
`[56] References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,597,041  6/1986  Guyer et al.  364/200
`4,748,585  5/1988  Chiarulli et al.  364/900
`4,754,412  6/1988  Deering  364/736
`4,858,113  8/1989  Saccardi  364/200
`4,870,302  9/1989  Freeman  307/465
`4,873,626  10/1989  Gifford  364/200
`5,020,059  5/1991  Gorin et al.  371/113
`5,233,539  8/1993  Agrawal et al.  364/489
`5,239,654  8/1993  Ing-Simmons et al.  395/800
`5,241,635  8/1993  Papadopoulos et al.  395/375
`5,265,207  11/1993  Zak  395/200.31
`5,301,340  4/1994  Cook  395/800
`5,305,462  4/1994  Grondalski  395/800.1
`5,336,950  8/1994  Popli et al.  307/465
`5,426,378  6/1995  Ong  326/39
`5,457,644  10/1995  McCollum  364/716
`5,684,980  11/1997  Casselman  395/500
`
`OTHER PUBLICATIONS
`
`Takashi Miyamori et al., “A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications,” IEEE Symposium on Field-Programmable Custom Computing Machines Conference (FCCM98), Apr. 15-17, 1998.
`Charlé Rupp et al., “The Napa Adaptive Processing Architecture,” IEEE Symposium on Field-Programmable Custom Computing Machines Conference (FCCM98), Apr. 15-17, 1998, pp. 1-10.
`Stephen M. Scalera et al., “The Design and Implementation of a Context Switching FPGA,” IEEE Symposium on Field-Programmable Custom Computing.
`T. Bridges, “The GPA Machine: A Generally Partitionable MSIMD Architecture,” Third Symposium on the Frontier of Massively Parallel Computation Proceedings, IEEE, pp. 196-203 (1990).
`P. Clarke, “Pilington Preps Reconfigurable Video DSP,” News (Aug. 7, 1995).
`D.C. Chen, et al., “A Reconfigurable Multiprocess IC for Rapid Prototyping of Algorithmic-Specific High-Speed DSP Data Paths,” IEEE Journal of Solid-State Circuits, vol. 27(12):1895-1904 (Dec. 1992).
`A.K. Yeung, et al., “TA 6.3: A 2.4 GOPS Data-Driven Reconfigurable Multiprocessor IC for DSP,” IEEE International Solid-State Circuits Conference, pp. 108-109, 346 (1995).
`(List continued on next page.)
`
`Primary Examiner: Eric Coleman
`Attorney, Agent, or Firm: Hamilton, Brook, Smith & Reynolds, P.C.
`
`[57] ABSTRACT
`
`A programmable integrated circuit utilizes a large number of
`intermediate-grain processing elements which are multibit
`processing units arranged in a configurable mesh. The
`coarse-grain resources, such as memory and processing, are
`deployable in a way that takes advantage of the opportunities
`for optimization present in given problems. To accomplish
`this, the interconnect supports three different modes of
`operation: a static value mode, in which a value set by the
`configuration data is provided to a functional unit; a static
`source mode, in which another functional unit serves as the
`value source; and a dynamic source mode, in which the
`source is determined by the value from another functional
`unit.
`
`31 Claims, 20 Drawing Sheets
`
`[Front-page drawing: 8 BIT MICROPROCESSOR; remaining labels not legible in this copy]
`Petitioner Microsoft Corporation - Ex. 1019, p. 1

`OTHER PUBLICATIONS
`
`J.E. Brewer, et al., “A Monolithic Processing Subsystem,” IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part B, vol. 17(3):310-317 (Aug. 1994).
`I. Gilbert, Chapter 11, Mesh Multiprocessing, The Lincoln Laboratory Journal, 11):11.1-11.18 (Spring 1988).
`Ira H. Gilbert, et al., “The Monolithic Synchronous Processor,” Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA 02173 (No Date Given).
`G. Masera, et al., “A Microprogrammable Parallel Architecture for DSP,” IEEE, pp. 824-827 (1991).
`L. Wang, et al., “Distributed Instruction Set Computer,” Proceedings of the 1988 International Conference on Parallel Processing, vol. 1, pp. 426-429.
`M. Sowa, et al., “Parallel Execution on the Function-Partitioned Processor with Multiple Instruction Streams,” Systems and Computers in Japan, 22(4):22-27 (Nov. 1991).
`T. Alexander, et al., “A Reconfigurable Approach to a Systolic Sorting Architecture,” IEEE, pp. 1178-1182 (1989).
`Z. Blazek, et al., “Design of a Reconfigurable Parallel RISC-Machine,” North-Holland Microprocessing and Microprogramming 21, pp. 39-46 (1987).
`S. Morton, et al., “The Dynamically Reconfigurable CAP Array Chip I,” IEEE Journal of Solid-State Circuits, SC21(5):820-826 (Oct. 1986).
`L. Snyder, “A Taxonomy of Synchronous Parallel Machines,” Proceedings of the 1988 International Conference on Parallel Processing, pp. 281-285 (Aug. 1988).
`L. Snyder, “An Inquiry into the Benefits of Multigauge Parallel Computation,” Proceedings of the 1985 International Conference on Parallel Processing, pp. 488-492 (Aug. 1985).
`A. DeHon, “DPGA Utilization and Application,” FPGA ’96, ACM/SIGDA Fourth International Symposium on FPGAs, Monterey, CA (Feb. 11-13, 1996).
`D. Epstein, “Chromatic Raises the Multimedia Bar,” Microprocessor Report, pp. 23-27 (Oct. 23, 1995).
`J. Labrousse, et al., “Create-Life: A Modular Design Approach for High Performance ASIC’s,” IEEE CH2843, pp. 427-433 (Jan. 1990).
`M. Slater, “MicroUnity Lift Veil on MediaProcessor,” Microprocessor Report, pp. 11-18 (Oct. 23, 1995).
`E. Tau, et al., “A First Generation DPGA Implementation,” FPD ’95, Third Canadian Workshop on Field-Programmable Devices, Montreal, Canada (May 29-Jun. 1, 1995).
`S. Kartashev, et al., “A Multicomputer System with Dynamic Architecture,” IEEE Transactions on Computers, vol. C-28, No. 10, pp. 704-721 (Oct. 1979).
`D. Bursky, “Programmable Data Paths Speed Computations,” Electronic Design, pp. 171-174 (May 1, 1995).
`V. Bove, Jr., et al., “Cheops: A Reconfigurable Data-Flow System for Video Processing,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 140-149 (1995).
`M. Schaffner, “Processing by Data and Program Blocks,” Transactions on Computers, vol. C-27, No. 11, pp. 1015-1027 (Nov. 1978).
`J. Nickolls, “The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer,” IEEE CH2843, pp. 25-28 (Jan. 1990).
`B. Narasimha, “Performance-Oriented, Fully Routable Dynamic Architecture for a Field Programmable Logic Device,” UCB/ERL M93/42, University of California, Berkeley, pp. 1-21 (Jun. 1993).
`M. Bolotski, et al., “A 1024 Processor 8ns SIMD Array,” Advanced Research in VLSI 1995, pp. 1-13 (1995).
`D. Cherepacha, et al., “A Datapath Oriented Architecture for FPGAs,” Second International ACM/SIGDA Workshop on Field Programmable Gate Arrays, ACM, pp. 1-11 (Feb. 1994).
`D. Jones, et al., “A Time-Multiplexed FPGA Architecture for Logic Emulation,” Proceedings of the IEEE 1995 Custom Integrated Circuits Conference, pp. 495-498 (May 1995).
`G. Nutt, “Microprocessor Implementation of a Parallel Processor,” Proceedings of the Fourth Annual Symposium on Computer Architecture, pp. 147-152 (1977).
`W. Kim, “MasPar MP-2 PE Chip: A Totally Cool Hot Chip,” Proceedings of Hot Chips V, pp. 1-5 (Mar. 29, 1993).
`E. Mirsky, et al., “Matrix: Coarse-Grain Reconfigurable Computing (Abstract),” Published at the 5th Annual MIT Student Workshop on Scalable Computing, pp. 1-2 (Aug. 1995) (Available on the Internet May 1, 1995).
`E. Mirsky, et al., “Matrix: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources,” Published at FCCM ’96, IEEE Symposium on FPGA’s for Custom Computing Machines, pp. 1-10 (Apr. 17-19, 1996).
`

`[Drawing Sheet 1 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 2 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 3 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 4 of 20, FIG. 6. Legible labels: Configuration Memory (CONTEXT, ADR, DATA); Network A_PORT, B_PORT; Memory Block; Control Logic 134; ALU Function; Memory Function; MODE; Carry In; Carry Out; reference numerals 112, 120, 122, 124]

`[Drawing Sheet 5 of 20, FIG. 8: labels not legible in this copy]

`[Drawing Sheet 6 of 20, FIG. 9. Legible labels: Incoming Network Lines (L1, L2, L3); Level-2, Level-3 Network; Network Switch 1 (N1); Network Switch 2 (N2); Level 2,3 Network Drivers; Floating Port; L3 Control; Function (Fm); Control Context Select 202; Level-1 Network; Level-1 C/R Logic]

`[Drawing Sheet 7 of 20, FIG. 11 and a second figure whose number is not legible. Legible labels: Local Output; Level-1 8x8; Level-2; Level-3; Network Inputs 30x8; Control Byte 215; Control Bit 217; Configuration Word A; Configuration Word B; 160; 162; Registers on A,B Ports Only; Network Drivers (N1,N2) 136; FP out; ADA, ADB, N1, N2]

`[Drawing Sheet 8 of 20, FIG. 12 and a second figure whose number is not legible. Legible labels: Local Output; Level-1; Level-2; Level-3; Network Inputs 30x8; Control Byte 215; Control Bit 217; Configuration Word A; Configuration Word B; Level-1 North, NorthWest, South, SouthWest; N_Enable; W_Enable; S_Enable; 174]

`[Drawing Sheet 9 of 20, FIGS. 14-17. Legible labels: Enable; Reg Enable (Level-2 Only); Select; N1out; N2out; FP1out; FP2out; Broadcast Cycle 192; Control Context Select; BFU Output; 202; Match?; to Level 1]

`[Drawing Sheet 10 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 11 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 12 of 20, FIG. 22. Legible labels: I/O Ports]

`[Drawing Sheet 13 of 20, FIG. 23. Legible labels: Data OE; I/O Output Data; I/O Input Data; I/O Bit 0 Output; I/O Bit 0 Input; Bit 0 OE; I/O Bit 1 Output; I/O Bit 1 Input; Bit 1 OE; I/O Register; 32 Pads; Master CLK; Data In; reference numerals 310 and 320-342]

`[Drawing Sheet 14 of 20, FIGS. 25 and 26. Legible labels: PLA 360; PLA In Data; PLA Out Data 372; I/O Input Data; I/O Input Bits; I/O Output Bits 376; C/R In 374; C/R Out; OR Plane; I/O Ports 386; I/O Output Enables 378; Output Multiplexor 384; Control 388; Output Selector; Col. 0 to Col. 5]

`[Drawing Sheet 15 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 16 of 20, FIGS. 29 and 30. Legible labels: I/O Port 0, 1, 2 Input Data; PLA 0, 1, 2 Output Byte; Configuration Word]

`[Drawing Sheet 17 of 20, FIGS. 31 and 32. Legible labels: C/R Selector; Configuration Word; Network Inputs 10x4; 2x5; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Dynamic Source Switch]

`[Drawing Sheet 18 of 20, FIG. 33, FIG. 36 and a further figure whose number is not legible. Legible labels: Level-3 Controller; 460; 462; Dynamic Control Switch; Control Lines; Level-3_1; Level-3_2; Network Inputs 16x4; Level-1; Level-2; Level-3; I/O Inputs; PLA Outputs; Configuration Word; X_i (8 bit); y_j (16 bits output over 2 cycles); 522; 530; 532; 534]

`[Drawing Sheet 19 of 20: rotated drawing; labels not legible in this copy]

`[Drawing Sheet 20 of 20, FIG. 38. Legible labels: inputs x1 through x6; output y_i]

`INTERMEDIATE-GRAIN
`RECONFIGURABLE PROCESSING DEVICE
`
`GOVERNMENT FUNDING
`
`This invention was made with Government support under
`the Advanced Research and Projects Agency (ARPA) of the
`Department of Defense under Rome Labs Contract No.
`F30602-94-C-0252. The Government has certain rights in
`the invention.
`
`BACKGROUND OF THE INVENTION
`
`Continuing advances in semiconductor technology have
`greatly increased the amount of processing that can be
`performed by single-chip, general-purpose computing
`devices. The relatively slow increase in inter-chip commu-
`nication bandwidth requires that modern high performance
`devices use as much of the potential on-chip processing
`power as possible. This results in large, dense integrated
`circuit devices and a large design space of processing
`architectures.
`
`One way of viewing this design space is in terms of
`granularity. Designers have the option of building very large
`processing units, or many smaller ones, in the same space.
`Traditional architectures are either very coarse grain, such as
`microprocessors, or very fine grain, such as field program-
`mable gate arrays (FPGAs). Both architectures have advan-
`tages and disadvantages.
`Microprocessors incorporate very few large processing
`units that operate on wide data-words, and each unit is
`hardwired to perform defined instructions on these data-
`words. Usually each unit is optimized for a different set of
`instructions, such as integer and floating point, and the units
`are generally hardwired to operate in parallel. The hardwired
`nature of these units allows instructions to execute very
`rapidly. In fact, a great deal of area on modern micro-
`processor chips is dedicated to cache memories in order to
`support a very high rate of instruction issue. Thus, the
`devices efficiently handle very dynamic instruction streams.
`Very fine grain devices, such as FPGAs, incorporate a
`large number of very small processing elements. These
`elements are arranged in a configurable interconnect net-
`work. The configuration data used to define the functionality
`of the processing units and network can be thought of as a
`very large, semantically powerful, instruction word. Nearly
`any operation can be described and mapped to hardware.
`
`SUMMARY OF THE INVENTION
`
`Unfortunately, because microprocessors are highly opti-
`mized for simple, wide-word, dynamic instructions, they are
`relatively inefficient when performing other kinds of opera-
`tions. For example, many cycles are required to build up
`complex operations that are not part of the processor’s
`pre-selected instruction set. Also, when performing short-
`word operations, much of the processing unit is not being
`used, and when the instructions being issued are very
`regular, the large instruction caches are unnecessary. Thus,
`very coarse-grain microprocessors are not equipped to take
`the maximum advantage of these cases.
`The size of the “instruction word” creates a number of
`problems with fine-grain FPGA devices, however. Reloading
`new instructions takes a relatively long time, making
`dynamic instruction streams very difficult for these devices.
`Moreover, if the operation being performed is, in fact, a
`wide-word operation, a great deal of this “instruction word”
`must be dedicated to re-describing the operation for each of
`the small processing elements. Thus, fine-grain processing
`elements are not well equipped to take advantage of a large
`number of common computing operations.
`The present
`invention utilizes a large number of
`intermediate-grain processing elements which are arranged
`in a configurable mesh. Thus,
`the regularity and rapid
`instruction issue features of coarse-grain units are exploited,
`but a reconfigurable or programmable interconnect allows
`these units to be connected in an application-specific
`manner. This means that coarse-grain resources, such as memory
`and processing, can be deployed in a way that takes advan-
`tage of the opportunities for optimization present in any
`given problem. In addition, configuration memories may be
`deployed to take advantage of application specific redun-
`dancy.
`In general according to one aspect, the invention features
`a programmable integrated circuit that comprises logic
`units that perform operations on data in response to
`instructions and memories that store and retrieve addressed
`data. A configurable or programmable interconnect provides
`a mode of signal transmission between the logic units and
`memories. Configuration control data defines data paths
`through the interconnect, which can be address inputs to
`memories, data inputs to memories and logic units, and
`instruction inputs to logic units. Thus, the interconnect is
`configurable to define an interdependent functionality of
`the functional units. A programmable configuration storage
`stores the configuration control data.
`Thus the present invention may be configured to operate
`according to a number of traditionally distinct computing
`architectures. For example, a centrally located functional
`unit may be assigned the role of arithmetic logic unit (ALU)
`with memories of surrounding functional units being
`configured to act as instruction caches, register files, and
`program counters. Wider data paths are accommodated by tying
`near-neighbor ALUs to each other. Wider instructions are
`achieved by configuring instruction memories of separate
`functional units as if they were a single memory. For a
`different problem, the same integrated circuit may be recon-
`figured to emulate a single instruction multiple data (SIMD)
`architecture. The logic units of rows of functional units are
`tied together to create wider data paths, and the rows
`perform separate serial tasks.
`In specific embodiments, functional units may provide at
`least part of the instructions to logic units of other functional
`units. Also,
`the configuration storage may hold multiple
`contexts of configuration control data for reconfiguration of
`the programmable interconnect.
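To make the idea of multiple stored contexts concrete, the following is a minimal Python sketch, not the patented circuit. It assumes that each context is simply a table mapping an interconnect port to the unit that drives it, and that reconfiguring amounts to changing which stored table is active. The class name `ConfigurationStore` and all port and unit names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfigurationStore:
    """Hypothetical model: each context maps an interconnect port to its source unit."""

    contexts: list[dict[str, str]]
    active: int = 0

    def select(self, context: int) -> None:
        # Switching contexts only changes which stored table is live;
        # the other contexts are retained for later reuse.
        if not 0 <= context < len(self.contexts):
            raise IndexError(f"no context {context}")
        self.active = context

    def source_for(self, port: str) -> str:
        return self.contexts[self.active][port]


store = ConfigurationStore([
    {"alu0.a": "bfu1.mem", "alu0.b": "bfu2.mem"},  # context 0 (illustrative)
    {"alu0.a": "bfu3.alu", "alu0.b": "bfu1.mem"},  # context 1 (illustrative)
])
print(store.source_for("alu0.a"))  # bfu1.mem
store.select(1)
print(store.source_for("alu0.a"))  # bfu3.alu
```

Because every context stays resident, moving between configurations in this model costs only an index update, which is the property that makes frequent reconfiguration plausible.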
`In other embodiments, the interconnect may support three
`different modes of operation: a static value mode, in which
`a value set by the configuration data is provided to a
`functional unit; a static source mode, in which another
`functional unit serves as the value source; and, optionally,
`a dynamic source mode, in which the source is determined
`by the value from another functional unit.
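A behavioral sketch of the three source modes follows, written in Python under stated assumptions: a port's configuration is one of three hypothetical record types, every unit's current output is available in a dictionary, and in dynamic source mode the selecting unit's value is used as an index into a list of candidate sources, wrapped modulo the number of candidates. The text does not specify that encoding; it is an illustrative choice.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StaticValue:
    value: int  # constant supplied directly by the configuration data


@dataclass(frozen=True)
class StaticSource:
    unit: str  # fixed functional unit whose output drives the port


@dataclass(frozen=True)
class DynamicSource:
    selector: str  # unit whose current output chooses the source
    choices: tuple[str, ...]  # candidate source units (hypothetical encoding)


PortConfig = Union[StaticValue, StaticSource, DynamicSource]


def resolve(cfg: PortConfig, outputs: Mapping[str, int]) -> int:
    """Return the value a port receives this cycle, given every unit's output."""
    if isinstance(cfg, StaticValue):
        return cfg.value
    if isinstance(cfg, StaticSource):
        return outputs[cfg.unit]
    if isinstance(cfg, DynamicSource):
        # Assumption: the selector's value indexes the candidate list, wrapping around.
        index = outputs[cfg.selector] % len(cfg.choices)
        return outputs[cfg.choices[index]]
    raise TypeError(f"unknown port configuration: {cfg!r}")


outputs = {"u1": 10, "u2": 20, "sel": 1}
print(resolve(StaticValue(7), outputs))                      # 7
print(resolve(StaticSource("u1"), outputs))                  # 10
print(resolve(DynamicSource("sel", ("u1", "u2")), outputs))  # 20
```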
`
`In still other embodiments, each logic unit can also have
`programmable logic arrays on data paths between functional
`units which perform bit level logic operations. Additionally,
`reduction logic can be added that performs logic operations
`on the output of the logic units and passes a result to other
`functional units as control information. Network drivers are
`assigned to each unit to transmit received signals to other
`functional units. The sources of the signals received by the
`drivers may also be dynamic so that the sources are pro-
`grammable by other functional units.
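The reduction logic can be illustrated with a short Python sketch. It assumes 8-bit unit outputs and a hypothetical choice of three reductions (OR for a nonzero test, AND for an all-ones test, XOR for parity); the text says only that logic operations are performed on the logic-unit output to produce control information. The second function shows how such per-unit bits could be chained across several units to test a wider word, in the spirit of the multi-BFU reduction listed for FIG. 18.

```python
import operator
from functools import reduce

WIDTH = 8  # assumed 8-bit logic-unit output
ALL_ONES = (1 << WIDTH) - 1


def reduce_word(word: int, op: str) -> int:
    """Collapse one logic-unit output to a single control bit (hypothetical op set)."""
    word &= ALL_ONES
    if op == "or":   # 1 if any bit is set, i.e. the word is nonzero
        return int(word != 0)
    if op == "and":  # 1 only if every bit is set
        return int(word == ALL_ONES)
    if op == "xor":  # parity of the word
        return bin(word).count("1") & 1
    raise ValueError(f"unsupported reduction {op!r}")


def reduce_across(words, op: str) -> int:
    """Combine per-unit bits, e.g. a 32-bit zero test from four 8-bit units."""
    combine = {"or": operator.or_, "and": operator.and_, "xor": operator.xor}[op]
    return reduce(combine, (reduce_word(w, op) for w in words))


print(reduce_across([0x00, 0x01, 0x00, 0x00], "or"))  # 1: the 32-bit value is nonzero
```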
`In general according to another aspect, the invention
`features an integrated reconfigurable computing device,
`which has functional units of multi-bit arithmetic logic units
`and memories. A configurable interconnect that connects the
`units includes function ports which determine the source of
`the instructions to the logic units. Network ports of the units
`are configurable by the functional units and determine the
`source of addresses to the memories and the source of data
`to the logic units and memories.
`In general according to still another aspect, the invention
`can also be characterized in the context of a method for
`organizing signal transmission within an array of functional
`units. Data read from the memories of functional units may
`be transmitted as instructions to the logic units of other
`functional units. Also, data read from logic units may be
`transmitted as addresses for the memories of other func-
`tional units. Finally, the data read from functional units can
`also be used as data inputs for the logic units of other
`functional units.
`
`In specific embodiments, the paths of the data and
`instructions are dynamic in response to control from the
`functional units. More specifically, static values, values
`from other functional units, and values from sources
`selected dynamically by other functional units may be
`transmitted between functional units.
`
`The above and other features of the invention including
`various novel details of construction and combinations of
`parts, and other advantages, will now be more particularly
`described with reference to the accompanying drawings and
`pointed out in the claims. It will be understood that the
`particular method and device embodying the invention are
`shown by way of illustration and not as a limitation of the
`invention. The principles and features of this invention may
`be employed in various and numerous embodiments without
`departing from the scope of the invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`In the accompanying drawings, reference characters refer
`to the same parts throughout the different views. The draw-
`ings are not necessarily to scale; emphasis has instead been
`placed upon illustrating the principles of the invention. Of
`the drawings:
`FIG. 1 shows a programmable integrated processing
`device of the present invention, which has been configured
`as an 8-bit microprocessor;
`FIG. 2 shows a SIMD processor configuration for the
`processing device according to the invention;
`FIG. 3 shows a 32-bit processor configuration for the
`processing device according to the invention;
`FIG. 4 shows a very long instruction word (VLIW)
`processor configuration for the processing device according
`to the invention;
`FIG. 5 shows a multiple instruction multiple data (MIMD)
`processor configuration for the processing device according
`to the invention;
`FIG. 6 is a block diagram showing the architecture of a
`basic functional unit (BFU) core of the present invention;
`FIG. 7 is a block diagram showing the inter-BFU con-
`nectivity provided by the level-1 network connections;
`FIG. 8 is a block diagram showing the BFU interconnec-
`tion provided by the level-2 network connections;
`FIG. 9 is a block diagram showing the network switch
`architecture for a BFU of the present invention;
`FIG. 10 is a block diagram illustrating the function switch
`architecture of the present invention;
`FIG. 11 is a block diagram showing the address/data and
`network switch architecture of the present invention;
`FIG. 12 is a block diagram illustrating the floating port
`architecture of the present invention;
`FIG. 13 is a block diagram showing the level-1 network
`drivers of the present invention;
`FIG. 14 shows the level-2 drivers of the present invention;
`FIG. 15 shows the level-3 drivers of the present invention;
`FIG. 16 shows BFU input registers of the present inven-
`tion;
`FIG. 17 shows the reduction logic in the BFU control
`architecture of the present invention;
`FIG. 18 is an example of multi-BFU reduction performed
`by the reduction logic of the present invention;
`FIG. 19 is a block diagram illustrating the operation of the
`distributed programmable logic array (PLA) associated with
`each BFU according to the invention;
`FIG. 20 is a block diagram showing the control logic for
`a single BFU;
`FIG. 21 shows an alternative embodiment of the
`configuration memory supporting multiple contexts;
`FIG. 22 is a block diagram of the configurable logic
`device of the present invention in the form of an integrated
`chip;
`FIG. 23 is a block diagram showing the input/output port
`architecture for the chip of the present invention;
`FIG. 24 is a block diagram showing the structure of an I/O
`register according to the invention;
`FIG. 25 is a block diagram of a programmable logic array
`for customizing the chip’s interface;
`FIG. 26 is a block diagram showing the movement of data
`from the BFU core off-chip according to the invention;
`FIG. 27 is a block diagram of a selector switch that
`chooses the core outputs to be driven on an output wire
`according to the invention;
`FIG. 28 is a block diagram showing a tri-state buffer used
`in the selector switch of FIG. 27;
`FIG. 29 is a block diagram illustrating how data enters the
`BFU core from off-chip;
`FIG. 30 is a block diagram showing the selector switch
`that selects among incoming data bytes from I/O ports and
`PLAs according to the invention;
`FIG. 31 is a block diagram of a C/R input architecture
`according to the invention;
`FIG. 32 is a block diagram showing the construction of
`the controller switches of the level-3 network lines accord-
`ing to the invention;
`FIG. 33 is a block diagram illustrating the dynamic
`control of the controller switches, which is shared between
`pairs of controllers at each column, according to the inven-
`tion;
`FIG. 34 shows the architecture of one of the dynamic
`control switches according to the invention;
`FIG. 35 is a block diagram showing the connectivity of
`BFUs in a systolic-type configuration according to the
`invention;
`FIG. 36 shows the configuration of the BFUs for a
`microcoded-type implementation for the convolution prob-
`lem according to the invention;
`FIG. 37 shows the organization of the BFUs for a VLIW,
`horizontal microcode-type implementation according to the
`invention; and
`FIG. 38 shows the organization of the BFUs for a VLIW/
`MSIMD-type implementation according to the invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`FIG. 1 shows a multi-bit microprocessor configuration of
`a reconfigurable processing device, which has been con-
`structed and programmed according to the principles of the
`present invention. A two-dimensional array of basic functional
`units (BFUs) 100 is located in a programmable interconnect
`101. Five of the BFUs 100 and the portion of the reconfigurable
`interconnect connecting the BFUs have been configured to
`operate as a microprocessor 102.
`Each of the BFUs 100 preferably has addressable memory
`resources and logic resources, such as an 8-bit arithmetic
`logic unit (ALU). One of the BFUs 100, denoted ALU,
`utilizes its logic resources to perform the logic operations of
`the microprocessor 102 and utilizes its memory resources as
`a data store and/or extended register file. Another BFU
`operates as a function store F that controls the successive
`logic operations performed by the logic resources of the
`ALU. Two additional BFUs, A and B, operate as further
`instruction stores that control the addressing of the memory
`resources of the ALU. A final BFU, PC, operates as a
`program counter for the various instruction BFUs F, A, B.
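The division of labor in FIG. 1 can be sketched as a small Python simulation. This is an illustrative model only: the opcode set, the assumption that the result is written back to the register addressed by A, and the wrap-around program counter are all hypothetical. It shows how the F, A and B stores, indexed by the PC unit, steer a single 8-bit ALU unit whose memory serves as the register file.

```python
MASK = 0xFF                   # 8-bit datapath
ADD, SUB, AND, OR = range(4)  # hypothetical opcodes held in the F store


def alu(op: int, x: int, y: int) -> int:
    """The ALU unit's logic resources, selected by the F store each cycle."""
    if op == ADD:
        return (x + y) & MASK
    if op == SUB:
        return (x - y) & MASK
    if op == AND:
        return x & y
    if op == OR:
        return x | y
    raise ValueError(f"unknown opcode {op}")


def run(f_store, a_store, b_store, regfile, steps):
    """Step the five-unit microprocessor of FIG. 1 (illustrative model)."""
    pc = 0                                 # value held by the PC unit
    for _ in range(steps):
        op = f_store[pc]                   # F: chooses the ALU operation
        ra, rb = a_store[pc], b_store[pc]  # A, B: address the ALU unit's memory
        # Assumption: the result is written back to the register addressed by A.
        regfile[ra] = alu(op, regfile[ra], regfile[rb])
        pc = (pc + 1) % len(f_store)       # PC: advances through the instruction stores
    return regfile


regs = [0] * 256  # ALU unit's 256 x 8-bit memory used as a register file
regs[1], regs[2] = 200, 100
run([ADD, SUB], [1, 3], [2, 1], regs, steps=2)
print(regs[1], regs[3])  # 44 212
```

The first step computes 200 + 100 modulo 256 into register 1; the second subtracts that result from register 3, wrapping to 212.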
`As shown in FIG. 2, the same reconfigurable processing
`array, however, may be reprogrammed to function as a
`SIMD system, and, as described below, this reconfiguration
`can occur on a cycle-by-cycle basis. The functions of the
`program counter PC and instruction stores A, B and F have
`again been assigned to different BFUs 100, but the ALU
`function has been replicated into 12 BFUs. Each of the
`ALUs is connected via the reconfigurable interconnect 101
`to operate on globally broadcast instructions from the
`instruction stores A, B, F. The same operations are
`performed by each of these ALUs, or common instructions
`may be broadcast on a row-by-row basis.
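A minimal Python sketch of the SIMD arrangement of FIG. 2 follows, assuming two hypothetical 8-bit opcodes. One function applies a single broadcast instruction to every ALU lane; a second variant gives each row of ALUs its own common instruction. The lane and row layout is illustrative.

```python
MASK = 0xFF

# Hypothetical broadcast opcodes for 8-bit ALU lanes.
OPS = {
    "add": lambda x, y: (x + y) & MASK,
    "xor": lambda x, y: x ^ y,
}


def simd_step(op, a_lanes, b_lanes):
    """Apply one globally broadcast instruction to every ALU lane."""
    if len(a_lanes) != len(b_lanes):
        raise ValueError("each lane needs two operands")
    f = OPS[op]
    return [f(x, y) for x, y in zip(a_lanes, b_lanes)]


def row_step(row_ops, a_rows, b_rows):
    """Row-by-row variant: each row of ALUs shares its own common instruction."""
    return [simd_step(op, a, b) for op, a, b in zip(row_ops, a_rows, b_rows)]


# Twelve ALU lanes, as in FIG. 2, all executing the same broadcast "add".
print(simd_step("add", list(range(12)), [250] * 12))
# [250, 251, 252, 253, 254, 255, 0, 1, 2, 3, 4, 5]
```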
FIG. 3 shows how wider data paths can be constructed in the programmable device. In this 32-bit microprocessor configuration, the device has the same instruction stores A, B, F and program counter as described in connection with FIG. 1. Four BFUs, however, have been assigned an ALU operation, and the ALUs are chained together to act as a single 32-bit-wide microprocessor, in which the interconnect 101 supports carry-in and carry-out operations between the ALUs.

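The carry chaining of FIG. 3 is the familiar ripple-carry arrangement. A minimal Python sketch, assuming each ALU BFU acts as an 8-bit adder slice and that the interconnect simply routes each slice's carry-out to the next slice's carry-in (the function names are hypothetical):

```python
from typing import Tuple


def add8(x: int, y: int, carry_in: int) -> Tuple[int, int]:
    """One 8-bit ALU slice: returns (8-bit sum, carry-out)."""
    total = x + y + carry_in
    return total & 0xFF, total >> 8


def add32(a: int, b: int) -> Tuple[int, int]:
    """Four chained 8-bit slices; slice i handles bits 8*i through 8*i+7.

    The carry-out of each slice feeds the carry-in of the next, standing in
    for the carry path the interconnect provides between ALU BFUs.
    """
    result, carry = 0, 0
    for i in range(4):                    # least-significant slice first
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        s, carry = add8(x, y, carry)
        result |= s << (8 * i)
    return result, carry                  # carry-out of the top slice
```

Adding 0xFFFFFFFF and 1, for instance, ripples a carry through all four slices and returns a zero sum with a final carry-out of 1.
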
FIG. 4 shows how the device can be configured to operate as a very long instruction word (VLIW) system. The various instruction stores A, B, F are defined to encompass multiple BFUs 100 to accommodate the desired instruction word width.

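One way to picture an instruction store that spans several BFUs, as in FIG. 4, is that each BFU contributes one 8-bit word at the same address and the words are concatenated into a wider instruction. The sketch below assumes a little-endian byte order across BFUs and an equal-width field layout; both are illustrative assumptions, as are the helper names.

```python
from typing import List


def fetch_wide(store_bfus: List[List[int]], pc: int) -> int:
    """Read address pc from every BFU in a multi-BFU instruction store and
    concatenate the 8-bit words. BFU 0 supplies the least-significant byte
    (an assumed ordering)."""
    word = 0
    for i, mem in enumerate(store_bfus):
        word |= (mem[pc] & 0xFF) << (8 * i)
    return word


def split_fields(word: int, field_bits: int, n_fields: int) -> List[int]:
    """Slice a wide instruction word into n_fields equal-width fields,
    least-significant field first (hypothetical layout)."""
    mask = (1 << field_bits) - 1
    return [(word >> (field_bits * k)) & mask for k in range(n_fields)]
```

With three BFUs holding 0x21, 0x43 and 0x65 at the same address, fetch_wide returns the 24-bit word 0x654321, which split_fields(word, 8, 3) slices back into [0x21, 0x43, 0x65].
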
FIG. 5 shows the present system configured to operate as a multiple instruction multiple data (MIMD) system. The 8-bit microprocessor configuration 102 of FIG. 1 is replicated into an adjacent set of BFUs to accommodate multiple, independent processing units within the same device. Of course, wider data paths could also be accommodated by chaining together the ALUs of each processor 102.

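To contrast the MIMD arrangement of FIG. 5 with the SIMD broadcast of FIG. 2, the hypothetical Python model below gives each replicated processor 102 its own program, program counter, and data memory, so the instruction streams advance independently. The instruction format, the two-operation set, and the idle-at-end-of-program behavior are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int, int]  # (operation, address A, address B); hypothetical format

OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, y: (x + y) & 0xFF,
    "sub": lambda x, y: (x - y) & 0xFF,
}


@dataclass
class Processor:
    """One replicated processor 102 with a private program, PC and data memory."""
    program: List[Instr]
    data: List[int] = field(default_factory=lambda: [0] * 256)
    pc: int = 0

    def step(self) -> None:
        if self.pc >= len(self.program):
            return  # assumption: a processor idles once its program ends
        op, addr_a, addr_b = self.program[self.pc]
        self.data[addr_a] = OPS[op](self.data[addr_a], self.data[addr_b])
        self.pc += 1


def run_mimd(procs: List[Processor], cycles: int) -> None:
    """Each cycle, every processor advances its own instruction stream."""
    for _ in range(cycles):
        for proc in procs:
            proc.step()
```

Unlike the broadcast model, two processors here can execute different operations in the same cycle and finish at different times.
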
1. Basic Functional Unit Architecture

FIG. 6 shows the moderately coarse-grain, preferably 8-bit, BFU core. The BFU core primarily includes a memory block 110, a basic ALU core 120, and a configuration memory 105. The main memory block 110 is a 256-word × 8-bit memory, which is arranged to be used in either single-port or dual-port mode. In dual-port mode, the memory size is reduced to 128 words so that two simultaneous read operations can be performed without increasing the read latency of the memory. The memory mode is controlled by control logic 114 accessed through a Memory/Mux function port 112, and the write enable can be controlled either through the Memory/Mux function port 112 or by the control logic 134 accessed through an ALU function port 132. The control logic is hardwired and also controls the ALU functions.
In single-port mode, the memory 110 uses the A_ADR port for an address and outputs the selected value to both A_PORT and B_PORT. In dual-port mode, the A_ADR port selects a value for A_PORT only, and the B_ADR port selects a value for B_PORT.
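
The single-port and dual-port behavior of memory block 110 can be summarized in a small behavioral model. This Python sketch assumes that dual-port mode exposes the lower 128 words, that writes use a single address, and that the write-enable signal (whether sourced from Memory/Mux function port 112 or ALU control logic 134) arrives as a plain boolean; none of these details is specified in the text above, and the class and method names are hypothetical.

```python
from typing import List, Tuple


class BFUMemory:
    """Behavioral sketch of memory block 110 (256 x 8 bits single-port,
    128 words dual-port)."""

    def __init__(self) -> None:
        self.cells: List[int] = [0] * 256
        # Memory mode; in the patent it is selected via the Memory/Mux port.
        self.dual_port = False

    @property
    def size(self) -> int:
        return 128 if self.dual_port else 256

    def _check(self, adr: int) -> None:
        if not 0 <= adr < self.size:
            raise IndexError(f"address {adr} outside {self.size}-word memory")

    def read(self, a_adr: int, b_adr: int = 0) -> Tuple[int, int]:
        """Return the (A_PORT, B_PORT) values for one read cycle."""
        self._check(a_adr)
        if not self.dual_port:
            # Single-port mode: A_ADR selects one word, driven to both ports.
            value = self.cells[a_adr]
            return value, value
        # Dual-port mode: A_ADR feeds A_PORT and B_ADR feeds B_PORT.
        self._check(b_adr)
        return self.cells[a_adr], self.cells[b_adr]

    def write(self, adr: int, value: int, write_enable: bool) -> None:
        """Store an 8-bit value only when write enable is asserted."""
        if write_enable:
            self._check(adr)
            self.cells[adr] = value & 0xFF
```

In single-port mode the B_ADR argument is simply ignored, mirroring the description that A_ADR alone drives both output ports.
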
In either mode the read operation takes place during the first
