United States Patent [19]
Trimberger

[54] REPROGRAMMABLE INSTRUCTION SET ACCELERATOR

[76] Inventor: Stephen M. Trimberger, 1261 Chateau Dr., San Jose,
     Calif. 95120

[*] Notice: The term of this patent shall not extend beyond the
    expiration date of Pat. No. 5,737,631.

[21] Appl. No.: 417,337

[22] Filed: Apr. 5, 1995

[51] Int. Cl.⁶ .......... G06F 15/76; G06F 9/30
[52] U.S. Cl. ........... 395/800.37; 395/376
[58] Field of Search .... 395/430, 384, 395/385, 386, 387, 388, 389,
     800.37, 376; 326/39
`
[56] References Cited

U.S. PATENT DOCUMENTS

Re. 34,363    8/1993   Freeman ................. 307/465
5,109,503     4/1992   Cruickshank et al. ...... 395/500
5,301,344     4/1994   Kolchinsky .............. 395/800
5,321,845     6/1994   Sawase et al. ........... 395/800
5,336,950     8/1994   Popli et al. ............ 307/465
5,361,373    11/1994   Gilson .................. 395/800
5,386,518     1/1995   Reagle et al. ........... 395/325
5,430,734     7/1995   Gilson .................. 371/22.2
5,471,593    11/1995   Branigin ................ 395/375
5,511,173     4/1996   Yamaura et al. .......... 395/375
5,517,628     5/1996   Morrison et al. ......... 395/375
5,535,406     7/1996   Kolchinsky .............. 395/800
5,537,601     7/1996   Kimura et al. ........... 395/800
5,574,930    11/1996   Halverson, Jr. et al. ... 395/800
`OTHER PUBLICATIONS
`
DeHon, A., "DPGA-Coupled Microprocessors: Commodity ICs for the Early
21st Century", M.I.T. Transit Project, Transit Note #100, from internet
site http://www.ai.mit.edu/projects/transit/tn100/tn100.html, Jan. 29,
1994.
French, P. et al., "A Self-Reconfiguring Processor", Proceedings from
1993 Workshop on FPGAs for Custom Computing Machines, IEEE, pp. 50-59,
1993.

(List continued on next page.)

US005737631A
[11] Patent Number: 5,737,631
[45] Date of Patent: *Apr. 7, 1998
`
`Primary Examiner-Tod R. Swann
`Assistant Examiner-Conley B. King, Jr.
Attorney, Agent, or Firm-Mark A. Haynes; Adam H. Tachner; Jeanette S.
Harms
`
`[57]
`
`ABSTRACT
`
A microprocessor comprises a defined execution unit coupled to internal
buses of the processor for execution of a predefined set of
instructions, combined with a programmable execution unit coupled to
the internal buses for execution of a programmed instruction, providing
an on chip reprogrammable instruction set accelerator RISA. The
programmable execution unit may be made using a field programmable gate
array having a configuration store, and resources for accessing the
configuration store to program the programmable execution unit. An
instruction register is included in the data processor which holds a
current instruction for execution, and is coupled to an instruction
data path to supply the instruction to the defined instruction unit and
to the programmable instruction unit in parallel, through appropriate
decoding resources. A condition code register is coupled to instruction
fetching resources, and connected to receive condition codes from both
the defined execution unit and from the programmable execution unit.
The programmable execution unit includes logic to signal the
instruction fetching resources to provide a next instruction when
execution of the programmed instruction is done. Resources for
accessing the configuration store to program the programmable execution
unit are provided, which can utilize the internal buses of the data
processor or be completely independent of them.
`
`40 Claims, 3 Drawing Sheets
`
`
`Intel Exhibit 1037 - 1
`
`

`

`
Iseli, C. et al., "Spyder: A Reconfigurable VLIW Processor using
FPGAs", Proceedings from 1993 Workshop on FPGAs for Custom Computing
Machines, IEEE, pp. 17-24, 1993.
Casselman, S., "Virtual Computing and The Virtual Computer",
Proceedings from 1993 Workshop on FPGAs for Custom Computing Machines,
pp. 43-48, 1993.
Trimberger, S., "A Reprogrammable Gate Array and Applications",
Proceedings of the IEEE, pp. 1030-1041, Jul. 1993.
Hennessy, J. et al., Computer Architecture: A Quantitative Approach,
Chapter 5, Appendix E, 1990.
Box, B., "Field Programmable Gate Array Based Reconfigurable
Preprocessor", Apr. 10, 1994, IEEE, pp. 40-48.
DeHon, A., "DPGA-Coupled Microprocessors: Commodity ICs for the Early
21st Century", Apr. 10, 1994, IEEE, pp. 31-39.
Thorson, M., "General-Purpose Coprocessors", E-Mail Transcript, Jul. 3,
1992, 5 pages.
Razdan, R., "PRISC: Programmable Reduced Instruction Set Computers",
Doctor of Philosophy Thesis, May 1994, 116 pages.
Razdan, R.; Brace, K.; and Smith, M.; "PRISC Software Acceleration
Techniques", IEEE, May 1994, pp. 145-149.
Trimberger, S., "Field-Programmable Gate Array Technology", Design
Applications, Section 2.6, pp. 68-90, Copyright 1994.
Wirthlin, M., Hutchings, Brad, Gilson, K., "The Nano Processor: a Low
Resource Reconfigurable Processor", Apr. 10, 1994, IEEE, pp. 23-30.
`
`
`

`

U.S. Patent    Apr. 7, 1998    Sheet 1 of 3    5,737,631

[FIG. 1: block diagram of the data processing system. Labeled blocks:
INSTR MEMORY 12, DATA MEMORY 13, USER INTERFACE 14, DISK 15, OTHER
PROCESSING RESOURCES 16, and a host CPU made up of MP 20 (REG FILE 33,
FIXED E-UNIT, IR 26) and RISA 21 (PRGM E-UNIT 30, CONFIG STORE 31, path
35 TO CONFIG RESOURCES, DATA PORTS 36 and 37 to video, audio, isolated
memory, etc.).]
`
`

`

[FIG. 2 (Sheet 2 of 3): schematic of the integrated circuit
microprocessor. Labeled blocks: EXEC UNIT 100 (ports A, B, Y, I, CC),
RISA FPGA 120 (ports A, B, Y, I, CC, PD, CD), REG FILE 103, I/O REGS
104, PRIVATE REGS 140, FPGA WORK REGS, IAR 105 with INCR 106, CC REG
108, INST REG 111, DECODER 112 (OPCODE on line 114), EXTNL
DATA/INSTR/ADDR PORTS 115 to external memory, and EXTNL CONFIG AND DATA
I/O PORT.]
`
`

`

[FIG. 3 (Sheet 3 of 3): instruction format with fields D/P, OPCODE, A,
B, Y, IMMED. Example rows: D, ADD, R3, R4, R5, xxxx and P, FPGAOP, Rx,
Ry, Rz, xxxx.]

[FIG. 4: instruction format with fields OPCODE, A, B, Y, IMMED. Example
rows: ADD, R3, R4, R5, xxxx and FPGAOP, Rx, Ry, Rz, PGMD OPCODE.]

[FIG. 5: instruction format with fields FIXED OPCODE, A, B, Y, PGM
OPCODE, C, IMMED. Example row: AND, R3, R4, R5, FPGAOP, R22, xxxx.]
`
`

`

`REPROGRAMMABLE INSTRUCTION SET
`ACCELERATOR
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
The present invention relates to techniques to improve the speed of
microprocessors using reprogrammable hardware; and more particularly to
the use of reprogrammable execution units in parallel with predefined
execution units in a data processing system.
`2. Description of Related Art
General purpose computers based on current microprocessors have a
single predefined instruction set. The instruction set is devised to
improve the speed of a large number of typical applications, given a
limited amount of logic with which to implement the instructions.
General purpose processors include so called complex instruction set
computers (CISC), which have sophisticated instruction sets designed
for commercial settings, and reduced instruction set computers (RISC),
where the basic instruction set is kept to a minimum that gives good
performance over a broad range of applications.
In the interest of good overall performance for a given amount of
logic, general purpose processors using both CISC and RISC approaches
leave off instructions that may be beneficial for some problems.
Therefore, the left off instructions must be replaced with a sequence
of the predefined instructions, so that these special problems take
longer to solve. Some types of instructions that are commonly left off
general purpose processors include floating point arithmetic, graphics
manipulations, and bit field extraction used in encryption, data
compression and image processing.
If the need for a given instruction is great enough in some
applications, some users may benefit from a special purpose instruction
set accelerator. The instruction set accelerator intercepts the
instructions and interprets them in place of the general purpose
processor. This has been done with floating point arithmetic (for
example, the Intel 387 class of floating point processors) and for
graphics operations. This solution is only cost effective if many or
most computer users need the additional speed for those operations so
that the cost of developing the special purpose hardware accelerator
for a specific instruction is shared. Although most computer users can
use fast graphics, very few need, for example, fast encryption. Thus,
special purpose instruction set accelerators may not be developed for
the encryption algorithm, even though great improvements in performance
could be achieved.
Special purpose processors have been built to provide very high speed
solutions to compute intensive problems, such as encryption and image
processing. These processors replace not a single instruction, but
whole programs. Because only a few people need these special
processors, they are expensive but provide a huge improvement in
performance. Instructions to a special purpose processor are typically
sent by commands from the host general purpose processor, however, not
as instructions that the special purpose processor intercepts. Thus,
special interface software is required to access the special purpose
processor.
Computer users are also faced with the prospect of numerous instruction
set accelerators and special purpose processors in their computers, one
for each special application they may do. This adds to size, weight and
expense of computers. Further, even the commonly used special
operations are not needed always, so most of the hardware accelerators
will be unused at any one time.
An alternative technique for reconfiguring a general purpose processor
involves the use of writable microstores. One method of building a
general purpose processor implements instructions by emulating them
with microcode. Microcode comprises instructions that control flow
through a set of functional units in the microprocessor. Each
instruction on the general purpose processor is emulated by several
microinstructions. The general purpose processor has a microcontroller
that reads the microcode from the microstore, and uses the instruction
value to determine where in the microstore to execute and what to do to
perform the logic for the instruction.
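The microcoded emulation described above can be illustrated with a small sketch. It is not taken from the patent: the micro-operation names, the dictionary-based microstore and the register names are all assumptions chosen for illustration. It shows how the instruction value selects a routine in the microstore and how one instruction expands into several microinstructions.

```python
# Hypothetical micro-operations; each models one step through a functional unit.
def micro_load_a(state, instr):
    state["a"] = state["regs"][instr["src1"]]

def micro_load_b(state, instr):
    state["b"] = state["regs"][instr["src2"]]

def micro_alu_add(state, instr):
    state["y"] = state["a"] + state["b"]

def micro_alu_and(state, instr):
    state["y"] = state["a"] & state["b"]

def micro_writeback(state, instr):
    state["regs"][instr["dst"]] = state["y"]

# The microstore (assumed layout): the instruction value selects the routine.
MICROSTORE = {
    "ADD": [micro_load_a, micro_load_b, micro_alu_add, micro_writeback],
    "AND": [micro_load_a, micro_load_b, micro_alu_and, micro_writeback],
}

def run_instruction(state, instr):
    """Emulate one instruction; return how many microinstructions it took."""
    routine = MICROSTORE[instr["op"]]
    for micro_op in routine:
        micro_op(state, instr)
    return len(routine)

state = {"regs": {"R3": 6, "R4": 7, "R5": 0}}
steps = run_instruction(
    state, {"op": "ADD", "src1": "R3", "src2": "R4", "dst": "R5"}
)
```

In this model a writable microstore amounts to replacing entries of `MICROSTORE`, but every new instruction is still a sequence over the same fixed set of micro-operations.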
Typically, a manufacturer stores microcode in read only memory.
However, some microprogrammed computers have been built with a writable
microstore. In these machines, a user can write a program that emulates
a new instruction set. However, these systems require an embedded
microcontroller, and deal with a fixed set of functional units. Also,
the microinstruction fetching technique divides each instruction into a
number of instructions, rather than replacing a slow instruction with a
fast one. Thus, these systems have limited use for improving
performance of systems that need special purpose instructions.
One prior art approach to improving performance of general purpose
processors involves the use of field programmable gate array (FPGA)
logic configured as a co-processor attached to the same bus as the host
processor. See for example U.S. Pat. No. 5,361,373 to Gilson. This
approach involves capturing entire sub-routines, detecting when the
host CPU enters the captured sub-routine, and then taking over
execution of the programmed function in the FPGA hardware. When the
function completes, the FPGA returns control to the host CPU itself.
However, this approach requires complex coordination with the host CPU,
including maintaining CPU state and the like while the field
programmable co-processor executes the sub-routine. The cost of
overhead of the process, such as maintaining and restoring CPU state
via dump and un-dump operations on the CPU, limits application of the
field programmable co-processor to relatively complex sub-routines.
Such reprogrammable hardware accelerators, like dedicated special
purpose processors before them, are targeted at huge speed improvements
in large scale operations. Therefore, their applicability is limited.
They tend to be large and complicated, and only help with a limited
number of special problems. Further, interfacing to such devices is
non-standard, because they do not interpret instructions on the
microprocessor bus.
Accordingly, it is desirable to provide a technique for using
reprogrammable logic to accelerate special instructions for a general
purpose processor which is practical to implement, and provides a
significant performance improvement over prior art systems. This will
give a user the ability to reprogram the host processor such that it
includes an instruction set based on defined and programmed
instructions optimized for the user's particular applications.
`
`SUMMARY OF THE INVENTION
The present invention provides a technique providing a reprogrammable
instruction set accelerator (RISA). The RISA can be programmed with
reprogrammable logic to do small scale data manipulation and
computation, just like the instruction set accelerators currently in
use. Furthermore, the RISA can be tightly coupled to instruction and
data paths in parallel with the predefined execution units in
microprocessors. This provides fast and efficient execution of new
instructions and a significant improvement in performance.
The RISA provides the capability for users to program instructions that
may be difficult to do in the general purpose processor, and not in
wide enough use to warrant a hardware accelerator. Further, the
reprogrammable instruction set accelerator can be reprogrammed with
different instructions at different times, saving space and cost in the
computer.
Some examples of the kinds of operations that may be implemented using
the reprogrammable instruction set accelerator include the following:
bit rotation/field extraction, used in instruction emulation or
encryption or decryption;
on-bit counting;
polynomial evaluation;
spreadsheet resolution, such as an instruction that calculates each
cell in the spreadsheet to be resolved;
searching in a document;
spell-checking;
database access routines;
procedure invoke/return operations;
programming language interpreters;
emulation of another processor; and
context switching for multi-processing.
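To make the first two items in the list concrete, the following sketch shows what on-bit counting, field extraction and rotation cost when built from ordinary shift, mask and add steps. The function names, the 32-bit word size and the step accounting (three primitive operations per loop pass) are assumptions for illustration only; the patent does not give instruction counts.

```python
def count_on_bits_sequence(word):
    """Count set bits with shift/mask/add steps, as a fixed instruction set might."""
    count = 0
    steps = 0
    while word:
        count += word & 1  # one AND and one ADD
        word >>= 1         # one SHIFT
        steps += 3         # illustrative tally of primitive operations
    return count, steps

def extract_field(word, low_bit, width):
    """Return the `width`-bit field of `word` that starts at bit `low_bit`."""
    return (word >> low_bit) & ((1 << width) - 1)

def rotate_left32(word, amount):
    """Rotate a 32-bit word left by `amount` bits."""
    amount %= 32
    return ((word << amount) | (word >> (32 - amount))) & 0xFFFFFFFF
```

Each of these is a short sequence of predefined instructions in software; in the scheme described here, the same function would be configured into the programmable execution unit and invoked as one programmed instruction.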
The instruction set accelerator may be reprogrammed for each program
that runs on the computer, or a manufacturer may select a few
instructions and ship them stored with the computer. A program may
reprogram the reprogrammable instruction set accelerator several times
during the program to speed up different parts of the program. The
logic space in the reprogrammable instruction set accelerator may be
allocated by the computer system, and instruction sets swapped, using
operations similar to overlays, virtual memory or caching.
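One way to picture swapping instruction sets "similar to caching" is a least-recently-used policy over a fixed number of logic slots. The sketch below is an assumption for illustration: the patent does not name a replacement policy, and the class name `ConfigCache`, the slot count and the byte-string configurations are hypothetical.

```python
from collections import OrderedDict

class ConfigCache:
    """Model of RISA logic space holding a limited number of configurations."""

    def __init__(self, slots):
        self.slots = slots
        self.loaded = OrderedDict()  # instruction name -> configuration bits
        self.reloads = 0             # how many times logic was reprogrammed

    def use(self, name, library):
        """Return the configuration for `name`, loading it if not resident."""
        if name in self.loaded:
            self.loaded.move_to_end(name)      # mark as most recently used
            return self.loaded[name]
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)    # evict least recently used
        self.loaded[name] = library[name]
        self.reloads += 1
        return self.loaded[name]
```

A program that alternates between more programmed instructions than there are slots will reload often, which is the same trade-off overlays and virtual memory face.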
Accordingly, the present invention can be characterized as a data
processor which comprises a defined execution unit coupled to internal
buses of the processor for execution of a predefined set of
instructions, combined with a programmable execution unit coupled to
the internal buses for execution of a programmed instruction. The
programmable execution unit may comprise a field programmable gate
array having a configuration store, and resources for accessing the
configuration store to program the programmable execution unit.
In one aspect of the invention, an instruction register is included in
the data processor which holds a current instruction for execution, and
is coupled to an instruction data path to supply the instruction to the
defined execution unit and to the programmable instruction unit in
parallel, through appropriate decoding resources.
The processor may include instruction fetching resources and other
logic which are responsive to condition codes. A condition code
register is connected to receive condition codes from both the defined
execution unit and from the programmable execution unit.
In addition, because the programmable execution unit may be
reprogrammed after manufacture, the timing for execution of a
programmed instruction may not be well predictable. Thus, the
programmable execution unit includes logic to signal the instruction
fetching resources to provide a next instruction when execution of the
programmed instruction is done.
The programmable execution unit may comprise a configuration store, and
resources for accessing the configuration store to program the
programmable execution unit are provided, which according to one
alternative utilize the internal buses of the data processor. Thus, the
programmable execution unit may be reconfigured under control of the
defined execution unit. Alternatively, the programmable execution unit
may include a configuration port which is independent of the internal
buses of the data processing system, which allows access to the
configuration store for reprogramming the programmable execution unit
through a separate port.
The instructions according to one aspect of the invention will have a
pre-specified format, including an opcode field specifying an operation
by one of the defined and programmable execution units, and a plurality
of address fields specifying addresses of operand data and result data.
The pre-specified format according to one alternative may include a
defined/programmed flag, specifying a defined or programmed
instruction. Thus, the decoder will be responsive to the flag to enable
or disable the programmable execution unit for the purposes of access
to the internal buses and register files on the device. According to
another alternative, the pre-specified format may include an immediate
data field, such that programmed instructions use the opcode field to
identify the instruction as a programmed instruction, and the immediate
data field to identify a programmed operation. A third alternative
instruction format includes both an opcode for the defined execution
unit and an opcode for the programmable execution unit.
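The first two format alternatives can be sketched as bit-field decoders. The patent names the fields (flag, opcode, address fields, immediate) but, in the text available here, not their widths or encodings, so the 32-bit layouts, the reserved `FPGAOP` value and the helper names below are assumptions for illustration. The third alternative, with two opcodes in one word, is not shown.

```python
from collections import namedtuple

Decoded = namedtuple("Decoded", "unit operation a b y")

FPGAOP = 0x3F  # hypothetical opcode value reserved for programmed instructions

# Assumed field widths, most significant field first (32 bits in total).
FLAG_WIDTHS = (1, 6, 5, 5, 5, 10)  # D/P flag, opcode, A, B, Y, immediate
IMMED_WIDTHS = (6, 5, 5, 5, 11)    # opcode, A, B, Y, immediate

def pack(fields):
    """Pack (value, width) pairs into one word, most significant first."""
    word = 0
    for value, width in fields:
        assert 0 <= value < (1 << width)
        word = (word << width) | value
    return word

def unpack(word, widths):
    """Split a word into fields of the given widths, most significant first."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return list(reversed(values))

def decode_flag_format(word):
    """Flag alternative: one bit selects the defined or programmable unit."""
    flag, opcode, a, b, y, _immed = unpack(word, FLAG_WIDTHS)
    unit = "programmable" if flag else "defined"
    return Decoded(unit, opcode, a, b, y)

def decode_immediate_format(word):
    """Immediate alternative: FPGAOP marks the instruction as programmed and
    the immediate field identifies which programmed operation to run."""
    opcode, a, b, y, immed = unpack(word, IMMED_WIDTHS)
    if opcode == FPGAOP:
        return Decoded("programmable", immed, a, b, y)
    return Decoded("defined", opcode, a, b, y)
```

The flag alternative spends one bit on every instruction, while the immediate alternative spends one opcode value and reuses a field that defined register-to-register instructions may leave idle.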
Accordingly, the present invention provides a new method for executing
a computer program which includes a particular function. The method
includes providing a defined instruction execution unit and a
programmable instruction execution unit in parallel with the defined
instruction execution unit. The programmable instruction execution unit
is programmed to execute at least a portion of the particular function
in response to a programmed instruction. A sequence of instructions is
supplied including the defined instructions and the programmed
instruction. The defined instructions are executed in the sequence in
the defined instruction execution unit and the programmed instruction
is executed in the programmable instruction execution unit. The
programmable instruction execution unit can be reprogrammed when the
user changes from one application to the next using a configuration
port for the programmable instruction execution unit.
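The method steps above can be traced in a small software model: program the programmable unit, supply a mixed sequence, and route each instruction to the unit that implements it. This is only a sketch; the class `ProgrammableUnit`, the `POPCNT_SUM` instruction and the tuple-based program encoding are hypothetical names introduced for illustration.

```python
# Operations of the defined (fixed) execution unit in this model.
DEFINED_OPS = {
    "ADD": lambda a, b: a + b,
    "AND": lambda a, b: a & b,
}

class ProgrammableUnit:
    """Stand-in for the programmable execution unit and its configuration."""

    def __init__(self):
        self.configured = {}

    def program(self, name, function):
        # Models writing the configuration store with a new instruction.
        self.configured[name] = function

    def execute(self, name, a, b):
        return self.configured[name](a, b)

def run(program, regs, programmable_unit):
    """Execute (op, src_a, src_b, dst) tuples; return which unit ran each one."""
    trace = []
    for op, src_a, src_b, dst in program:
        a, b = regs[src_a], regs[src_b]
        if op in DEFINED_OPS:
            result, unit = DEFINED_OPS[op](a, b), "defined"
        else:
            result, unit = programmable_unit.execute(op, a, b), "programmable"
        regs[dst] = result  # both units share one write-back path in this model
        trace.append(unit)
    return trace
```

Changing applications corresponds to calling `program` again with a different function; the instruction sequence and register file are otherwise handled identically for both units.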
Thus, the reprogrammable instruction set accelerator may be programmed
through internal processor data paths or through separate, dedicated
programming paths initiated by an instruction from the general purpose,
defined execution unit.
Selecting the instructions to emulate can be done manually, by
inspecting the instructions or procedures to be executed, and crafting
new instructions to implement chosen functionalities. A compiler may be
used to automate the addition of programmed instructions to programs it
compiles to improve their speed.
Instruction selection may also be done automatically, by profiling the
program to see how frequently various procedures or lines of code are
used, then replacing them with programmed instructions. This requires
software to compile from the programming language to logic gates. This
capability now exists with high level synthesis from VHDL, VERILOG or
other languages for specifying logic. Given such capability, it is
simple to visualize a high level synthesis system that takes a high
level programming language such as "C" as input and generates logic for
the RISA.
Extraction of instructions may also be done on the fly, by profiling
analysis of the instructions to the general purpose processor, then
compiling from those instructions to logic in the reprogrammable
instruction set accelerator. This profiling may be done beforehand, or
may be done during execution. In the latter case, the computer learns
which instructions are frequently executed and optimizes them as it
runs.
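The profiling step can be sketched as counting how often each fixed-length run of instructions appears in an execution trace. The sliding-window approach, the function name `frequent_sequences` and the sample opcodes are assumptions for illustration; the patent does not specify a profiling algorithm, and compiling the winning sequence to logic is outside this sketch.

```python
from collections import Counter

def frequent_sequences(trace, length, top):
    """Return the `top` most common runs of `length` consecutive opcodes."""
    windows = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    return windows.most_common(top)

# Hypothetical opcode trace captured from the general purpose processor.
sample_trace = ["SHR", "AND", "ADD", "SHR", "AND", "ADD",
                "MUL", "SHR", "AND", "ADD"]
candidates = frequent_sequences(sample_trace, 3, 1)
```

Here the sequence `SHR, AND, ADD` occurs three times, so it would be the first candidate for replacement by a single programmed instruction.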
Alternatively, many instructions for the RISA can be defined in advance
and the compiler may use all or a subset of these additional
instructions when compiling. The needed instructions are loaded into
the RISA when the program runs.
Accordingly, the present invention provides a technique for improving
the performance and flexibility of general purpose processors based on
the use of reprogrammable logic techniques. The invention provides
greater performance improvements and more flexibility than prior art
attempts to optimize instruction execution in general purpose
processors.
Other aspects and advantages of the present invention can be seen upon
review of the drawings, the detailed description and the claims which
follow.
`
BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a simplified block diagram of a data processing system
utilizing the reprogrammable instruction set accelerator (RISA)
according to the present invention.
FIG. 2 is a schematic diagram of an integrated circuit microprocessor
including a defined execution unit and a reprogrammable execution unit
according to the present invention.
FIG. 3 illustrates one example instruction format for use according to
the present invention.
FIG. 4 illustrates an alternative example instruction format for use
according to the present invention.
FIG. 5 illustrates another alternative example instruction format for
use according to the present invention.
`
`DETAILED DESCRIPTION OF THE DRAWINGS
`
A detailed description of the preferred embodiments of the present
invention is provided with reference to the figures, in which FIG. 1
illustrates a data processing system using the reprogrammable
instruction set accelerator according to the present invention. As
shown in the figure, the data processing system includes a host CPU
generally 10 coupled to the system bus 11. Also coupled to the bus are
instruction memory 12, data memory 13, user interface 14, possibly a
disk drive 15, and other processing resources generally 16. The host
CPU is made up of the basic microprocessor (MP) components generally
20, and a reprogrammable instruction set accelerator (RISA) generally
21. The basic microprocessor includes internal buses 22 and 23, also
called buses A and B, respectively. A defined instruction execution
unit 24 (or ALU) is coupled to buses 22 and 23, and supplies a result
through a multiplexer 25 to bus 22. A register file 33 is accessible
across the buses 22 and 23. An instruction path schematically
represented by the instruction register IR 26 supplies an instruction
to the defined instruction execution unit 24. Also, the defined
instruction execution unit 24 generates condition codes which are
supplied through multiplexer 27 to a condition code register 28 which
is used by the processor as well known in the art in instruction
sequencing and the like.
In parallel with the general purpose microprocessor is the
reprogrammable instruction set accelerator RISA 21. The RISA 21
comprises a field programmable gate array 30, which includes a
configuration store 31. The field programmable gate array 30 is coupled
to the internal buses 22 and 23, and supplies a result through
multiplexer 25 to bus 22. The instruction path through instruction
register 26 supplies an instruction to the field programmable gate
array 30 in substantially the same manner and timing as it supplies
instructions to the defined execution unit 24. The field programmable
gate array 30 also supplies condition codes through multiplexer 27 to
the condition code register 28.
A configuration store 31 is coupled with the field programmable gate
array 30. The configuration store 31 may be accessible through a
dedicated port generally 35, or by means of the internal buses 22 and
23 for dynamically reprogramming the field programmable gate array 30.
In the embodiment illustrated in FIG. 1, the RISA 21 is implemented
using field programmable gate array logic. The field programmable gate
array logic may take a variety of forms, such as the dynamically
reconfigurable architecture described in our co-pending U.S. patent
entitled A PROGRAMMABLE LOGIC DEVICE WHICH STORES MORE THAN ONE
CONFIGURATION AND MEANS FOR SWITCHING CONFIGURATIONS, U.S. Pat. No.
5,426,378, issued Jun. 20, 1995, invented by Randy T. Ong, which was
owned at the time of invention and is currently owned by the same
assignee as the present application, and which is incorporated by
reference as if fully set forth herein. Alternative programmable logic
structures may be utilized. For instance, the RAM based configuration
store of typical FPGA designs may be replaced using reprogrammable
non-volatile stores such as EEPROM. Also, the configuration store may
be programmable during manufacture rather than dynamically. Thus, the
manufacturer may use more permanent programming technology such as
anti-fuses or the like to configure a new instruction into a previously
defined instruction set.
As shown in FIG. 1, the field programmable gate array is used as an
execution unit which is reprogrammable, and connected in parallel with
the defined execution unit 24. The system expects the field
programmable gate array 30 to return results in the same manner as the
standard defined instruction execution unit. The field programmable
gate array 30 in the embodiment shown uses the same write back path as
the defined execution unit. However, a separate write back path may be
used for the FPGA 30 if desired.
The RISA 21 includes an optional data port or ports 36, 37 dedicated
for use by the RISA 21. The data port 36 is coupled to the system
memory 12, 13 across bus 11. The data port 37 is coupled to an external
data source, such as a video data source, an audio data source or
memory isolated from the system bus 11.
The field programmable gate array 30 executes instructions that take an
amount of time which is not predictable prior to configuration, and do
not match well with the pipeline speed of the fixed execution unit 24.
Thus, logic is included to hold the processor during execution of a
programmed instruction, and to signal the processor when the programmed
instruction is complete.
In addition, one embodiment of the RISA 21 operates in an overlapped
fashion with the defined instruction execution unit 24 for some
operations, taking advantage of the parallel execution units in the
system for greater performance.
FIG. 2 provides a more detailed block diagram of an integrated circuit
microprocessor which includes the RISA according to the present
invention. Those skilled in the art will recognize the basic components
of the microprocessor. Thus the figure is intended to represent a wide
variety of microprocessor architectures, including complex instruction
set processors, reduced instruction set processors, and other data
processing architectures.
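The hold-and-signal behavior described for FIG. 1, where the processor is held during a programmed instruction and released when the programmable unit signals completion, can be modeled at the cycle level. This is a sketch under stated assumptions: defined instructions are taken to complete in one cycle, programmed latencies are supplied in a table, and the function and parameter names are hypothetical.

```python
def run_with_hold(program, latencies):
    """Return the cycle count at which each instruction has completed.

    `program` is a list of (unit, name) pairs. `latencies` maps programmed
    instruction names to their cycle counts, which in this model become
    known only once the programmable unit has been configured.
    """
    cycle = 0
    completion_cycles = []
    for unit, name in program:
        if unit == "defined":
            cycle += 1                   # assumed fixed one-cycle rate
        else:
            remaining = latencies[name]
            while remaining > 0:         # processor held; done not yet signaled
                cycle += 1
                remaining -= 1
        completion_cycles.append(cycle)  # done: next instruction may be fetched
    return completion_cycles
```

For example, a defined instruction, a three-cycle programmed instruction and another defined instruction complete at cycles 1, 4 and 5. The overlapped embodiment mentioned above would instead let the defined unit proceed during some of the held cycles, which this sketch does not model.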
Thus, the system includes an execution unit 100 which is optimized for
a predefined instruction set. The execution unit 100 is coupled to
internal buses 101 and 102 for receiving operand data at its ports A
and B, and supplying result data on its port Y back to one of the
internal buses, e.g. bus 101. A register file 103, and input/output
registers 104 are also coupled to the buses 101 and 102, and act as
sources for operands, and locations to store result data. The registers
accessible by the execution unit 100 include private registers 140
which are dedicated for use by the defined execution unit 100, and not
by the programmable execution unit (FPGA RISA 120 described below). In
an optional embodiment, the private registers 140 are directly coupled
with the defined execution unit 100 as indicated by line 141.
In another embodiment, internal bus 101 is broken into two sections at
point 150, thereby allowing independent operation of execution unit 100
and RISA 120. In yet another embodiment, internal bus 101 is broken
into two sections at point 151, thereby allowing independent outputs
with shared inputs.
Also coupled to the internal buses 101 and 102 is an instruction
address register 105, and other resources in the instruction address
fetching path. The instruction address register 105 includes basic
incrementing logic 106 for sequencing through a sequence of instruction
addresses for an instruction memory off chip. Also, an instruction
control state machine 107 is coupled to the instruction address
register 105 for managing the instruction stream as known in the art.
The instruction control state machine 107 is also coupled to other
components on the chip as appropriate.
A condition code register 108 supplies condition codes to the
instruction control state machine 107 involved in the instruction
sequencing decisions, and also, as indicated by arrow 109, to other
processing resources in the system as suits the particular
implementation. Condition codes are generated on line 110 by the
defined execution unit 100 as known in the art.
An instruction register 111 receives instructions generated in response
to instruction address register 105. Instructions are supplied from the
instruction register 111 to decoder resources 112. The decoder
resources 112 supply control signals, generally represented by arrow
113, throughout the device to control accessing of the register files,
bus timing, and other functions on the processor. The decoder also
generates an opcode on line 114 which is supplied at the instruction
input I on the defined execution unit 100.
External data, instruction, and address ports 115 are included as known
in the art, for managing flow of data, instructions and addresses into
and out of the chip. The external data/instruction/address ports 115
are coupled to the I/O registers 104, the instruction address register
105, and the instruction register 111.
According to the present invention, a reprogrammable instruction set
accelerator is included on the chip. Thus, a RISA field programmable
gate array (RISA FPGA) 120 is coupled to the internal buses 101 and
102, to receive operand data at ports A and B and supply result data at
port Y to and from the buses. The opcode from line 114 is supplied at
an ins
