Trimberger
`
[54] REPROGRAMMABLE INSTRUCTION SET
ACCELERATOR
`
[76] Inventor: Stephen M. Trimberger, 1261 Chateau
Dr., San Jose, Calif. 95120
`
[*] Notice: The term of this patent shall not extend
beyond the expiration date of Pat. No.
5,737,631.
`
[21] Appl. No.: 417,337

[22] Filed: Apr. 5, 1995
`
[51] Int. Cl.6 ........ G06F 15/76; G06F 9/30
[52] U.S. Cl. ......... 395/800.37; 395/376
[58] Field of Search .. 395/430, 384,
395/385, 386, 387, 388, 389, 800.37, 376;
326/39
`
[56] References Cited

U.S. PATENT DOCUMENTS

Re. 34,363    8/1993   Freeman ................. 307/465
5,109,503     4/1992   Cruickshank et al. ...... 395/500
5,301,344     4/1994   Kolchinsky .............. 395/800
5,321,845     6/1994   Sawase et al. ........... 395/800
5,336,950     8/1994   Popli et al. ............ 307/465
5,361,373    11/1994   Gilson .................. 395/800
5,386,518     1/1995   Reagle et al. ........... 395/325
5,430,734     7/1995   Gilson .................. 371/22.2
5,471,593    11/1995   Branigin ................ 395/375
5,511,173     4/1996   Yamaura et al. .......... 395/375
5,517,628     5/1996   Morrison et al. ......... 395/375
5,535,406     7/1996   Kolchinsky .............. 395/800
5,537,601     7/1996   Kimura et al. ........... 395/800
5,574,930    11/1996   Halverson, Jr. et al. ... 395/800
`
`OTHER PUBLICATIONS
`
DeHon, A., "DPGA-Coupled Microprocessors: Commodity
ICs for the Early 21st Century", M.I.T. Transit Project,
Transit Note #100, from internet site
http://www.ai.mit.edu/projects/transit/tn100/tn100.html, Jan. 29, 1994.
French, P. et al., "A Self-Reconfiguring Processor", Proceedings
from 1993 Workshop on FPGAs for Custom Computing
Machines, IEEE, pp. 50-59, 1993.

US005737631A
[11] Patent Number: 5,737,631
[45] Date of Patent: *Apr. 7, 1998
`
`(List continued on next page.)
`
Primary Examiner-Tod R. Swann
Assistant Examiner-Conley B. King, Jr.
Attorney, Agent, or Firm-Mark A. Haynes; Adam H.
Tachner; Jeanette S. Harms
`
`[57]
`
`ABSTRACT
`
A microprocessor comprises a defined execution unit
coupled to internal buses of the processor for execution of a
predefined set of instructions, combined with a programmable
execution unit coupled to the internal buses for
execution of a programmed instruction, providing an on-chip
reprogrammable instruction set accelerator (RISA). The
programmable execution unit may be made using a field
programmable gate array having a configuration store, and
resources for accessing the configuration store to program
the programmable execution unit. An instruction register is
included in the data processor which holds a current instruction
for execution, and is coupled to an instruction data path
to supply the instruction to the defined instruction unit and
to the programmable instruction unit in parallel, through
appropriate decoding resources. A condition code register is
coupled to instruction fetching resources, and connected to
receive condition codes from both the defined execution unit
and from the programmable execution unit. The programmable
execution unit includes logic to signal the instruction
fetching resources to provide a next instruction when execution
of the programmed instruction is done. Resources for
accessing the configuration store to program the programmable
execution unit are provided, which can utilize the
internal buses of the data processor or be completely
independent of them.
`
`40 Claims, 3 Drawing Sheets
`
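[Front-page drawing: excerpt of the microprocessor block diagram,
showing execution unit 100 and RISA FPGA 120 (ports A, B, Y, I, CC,
PD, CD), register file 103, I/O registers, private registers,
condition code register 108, and connections to external memory.]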
`
`Intel Exhibit 1037 - 1
`
`
`
`
Iseli, C. et al., "Spyder: A Reconfigurable VLIW Processor
using FPGAs", Proceedings from 1993 Workshop on
FPGAs for Custom Computing Machines, IEEE, pp. 17-24,
1993.
Casselman, S., "Virtual Computing and The Virtual Computer",
Proceedings from 1993 Workshop on FPGAs for
Custom Computing Machines, pp. 43-48, 1993.
Trimberger, S., "A Reprogrammable Gate Array and Applications",
Proceedings of the IEEE, pp. 1030-1041, Jul.
1993.
Hennessy, J. et al., Computer Architecture: A Quantitative
Approach, Chapter 5, Appendix E, 1990.
Box, B., "Field Programmable Gate Array Based Reconfigurable
Preprocessor", Apr. 10, 1994, IEEE, pp. 40-48.
DeHon, A., "DPGA-Coupled Microprocessors: Commodity
ICs for the Early 21st Century", Apr. 10, 1994, IEEE, pp.
31-39.
Thorson, M., "General-Purpose Coprocessors", E-Mail
Transcript, Jul. 3, 1992, 5 pages.
Razdan, R., "PRISC: Programmable Reduced Instruction
Set Computers", Doctor of Philosophy Thesis, May 1994,
116 pages.
Razdan, R.; Brace, K.; and Smith, M., "PRISC Software
Acceleration Techniques", IEEE, May 1994, pp. 145-149.
Trimberger, S., "Field-Programmable Gate Array Technology",
Design Applications, Section 2.6, pp. 68-90, Copyright
1994.
Wirthlin, M., Hutchings, Brad, Gilson, K., "The Nano
Processor: a Low Resource Reconfigurable Processor", Apr.
10, 1994, IEEE, pp. 23-30.
`
`
`
`
U.S. Patent    Apr. 7, 1998    Sheet 1 of 3    5,737,631

[FIG. 1: block diagram of the data processing system. Labeled
elements: system bus 11; INSTR MEMORY 12; DATA MEMORY 13; USER
INTERFACE 14; DISK 15; OTHER PROCESSING RESOURCES 16; MP 20; RISA 21;
internal buses 22 and 23; REG FILE 33; FIXED E-UNIT; PRGM E-UNIT 30;
CONFIG STORE 31; IR 26; condition code register 28; TO CONFIG
RESOURCES 35; DATA PORTS 36 and 37 (to video, audio, isolated
memory, etc.).]
`
`
`
`
[FIG. 2 (Sheet 2 of 3): schematic of the integrated circuit
microprocessor. Labeled elements: EXEC UNIT 100 (ports A, B, Y, I,
CC); internal buses 101 and 102; REG FILE 103; I/O REGS 104; IAR 105;
INCR 106; condition code register 108; INST REG 111; DECODER 112;
OPCODE line 114; EXTNL DATA/INSTR/ADDR PORTS 115 (to external
memory); RISA FPGA 120 (ports A, B, Y, I, CC, PD, CD); FPGA WORK
REGS; PRIVATE REGS 140; EXTNL CONFIG AND DATA I/O PORT (to external
memory); bus break points 150 and 151.]
`
`
`
`
[FIGS. 3-5 (Sheet 3 of 3): example instruction formats.]

FIG. 3
Fields:           D/P  OPCODE  A   B   Y   IMMED
Instruction 200:  D    ADD     R3  R4  R5  xxxx
Instruction 201:  P    FPGAOP  Rx  Ry  Rz  xxxx

FIG. 4
Fields:           OPCODE  A   B   Y   IMMED
Instruction 250:  ADD     R3  R4  R5  xxxx
Instruction 256:  FPGAOP  Rx  Ry  Rz  PGMD OPCODE

FIG. 5
Fields:           FIXED OPCODE  A   B   Y   PGM OPCODE  C    IMMED
Instruction 270:  AND           R3  R4  R5  FPGAOP      R22  xxxx
`
`
`
`
`REPROGRAMMABLE INSTRUCTION SET
`ACCELERATOR
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
The present invention relates to techniques to improve the
speed of microprocessors using reprogrammable hardware,
and more particularly to the use of reprogrammable execution
units in parallel with predefined execution units in a data
processing system.
`2. Description of Related Art
General purpose computers based on current microprocessors
have a single predefined instruction set. The instruction
set is devised to improve the speed of a large number of
typical applications, given a limited amount of logic with
which to implement the instructions. General purpose processors
include so-called complex instruction set computers
(CISC), which have sophisticated instruction sets designed
for commercial settings, and reduced instruction set computers
(RISC), where the basic instruction set is kept to a
minimum that gives good performance over a broad range of
applications.
In the interest of good overall performance for a given
amount of logic, general purpose processors using both
CISC and RISC approaches leave off instructions that may
be beneficial for some problems. The left-off instructions
must therefore be replaced with a sequence of the predefined
instructions, so that these special problems take
longer to solve. Some types of instructions that are commonly
left off general purpose processors include floating
point arithmetic, graphics manipulations, and bit field
extraction used in encryption, data compression and image
processing.
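To illustrate the point about left-off instructions, the following minimal sketch shows bit field extraction expressed as a sequence of generic shift and mask operations, the kind of predefined instructions a general purpose processor does provide. The function name and the bit numbering (bit 0 as least significant) are assumptions made for illustration and do not come from the patent.

```python
def extract_field(word: int, lsb: int, width: int) -> int:
    """Extract `width` bits of `word`, starting at bit `lsb` (bit 0 = LSB).

    Each line stands in for one predefined instruction that a processor
    without a dedicated bit-field instruction would have to issue.
    """
    shifted = word >> lsb        # step 1: shift the field down to bit 0
    mask = (1 << width) - 1      # step 2: build a mask of `width` one bits
    return shifted & mask        # step 3: discard the bits above the field
```

A dedicated instruction would collapse these three steps into one operation, which is the speedup an instruction set accelerator targets.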
If the need for a given instruction is great enough in some
applications, some users may benefit from a special purpose
instruction set accelerator. The instruction set accelerator
intercepts the instructions and interprets them in place of the
general purpose processor. This has been done with floating
point arithmetic (for example, the Intel 387 class of floating
point processors) and for graphics operations. This solution
is only cost effective if many or most computer users need
the additional speed for those operations, so that the cost of
developing the special purpose hardware accelerator for a
specific instruction is shared. Although most computer users
can use fast graphics, very few need, for example, fast
encryption. Thus, special purpose instruction set accelerators
may not be developed for the encryption algorithm,
even though great improvements in performance could be
achieved.
Special purpose processors have been built to provide
very high speed solutions to compute intensive problems,
such as encryption and image processing. These processors
replace not a single instruction, but whole programs.
Because only a few people need these special processors,
they are expensive but provide a huge improvement in
performance. Instructions to a special purpose processor are
typically sent by commands from the host general purpose
processor, however, not as instructions that the special
purpose processor intercepts. Thus, special interface software
is required to access the special purpose processor.
Computer users are also faced with the prospect of
numerous instruction set accelerators and special purpose
processors in their computers, one for each special application
they may run. This adds to the size, weight and expense of
computers. Further, even the commonly used special operations
are not always needed, so most of the hardware
accelerators will be unused at any one time.
An alternative technique for reconfiguring a general purpose
processor involves the use of writable microstores. One
method of building a general purpose processor implements
instructions by emulating them with microcode. Microcode
comprises instructions that control flow through a set of
functional units in the microprocessor. Each instruction on
the general purpose processor is emulated by several
microinstructions. The general purpose processor has a
microcontroller that reads the microcode from the microstore, and
uses the instruction value to determine where in the microstore
to execute and what to do to perform the logic for the
instruction.
Typically, a manufacturer stores microcode in read only
memory. However, some microprogrammed computers have
been built with a writable microstore. In these machines, a
user can write a program that emulates a new instruction set.
However, these systems require an embedded
microcontroller, and deal with a fixed set of functional units.
Also, the microinstruction fetching technique divides each
instruction into a number of instructions, rather than replacing
a slow instruction with a fast one. Thus, these systems
have limited use for improving performance of systems that
need special purpose instructions.
One prior art approach to improving performance of
general purpose processors involves the use of field programmable
gate array (FPGA) logic configured as a
co-processor attached to the same bus as the host processor.
See for example U.S. Pat. No. 5,361,373 to Gilson. This
approach involves capturing entire sub-routines, detecting
when the host CPU enters the captured sub-routine, and then
taking over execution of the programmed function in the
FPGA hardware. When the function completes, the FPGA
returns control to the host CPU itself. However, this
approach requires complex coordination with the host CPU,
including maintaining CPU state and the like while the field
programmable co-processor executes the sub-routine. The
cost of overhead of the process, such as maintaining and
restoring CPU state via dump and un-dump operations on
the CPU, limits application of the field programmable
co-processor to relatively complex sub-routines.
Such reprogrammable hardware accelerators, like dedicated
special purpose processors before them, are targeted at
huge speed improvements in large scale operations.
Therefore, their applicability is limited. They tend to be
large and complicated, and only help with a limited number
of special problems. Further, interfacing to such devices is
non-standard, because they do not interpret instructions on
the microprocessor bus.
Accordingly, it is desirable to provide a technique for
using reprogrammable logic to accelerate special instructions
for a general purpose processor which is practical to
implement, and provides a significant performance improvement
over prior art systems. This will give a user the ability to
reprogram the host processor such that it includes
an instruction set based on defined and programmed instructions
optimized for the user's particular applications.
`
`
`SUMMARY OF THE INVENTION
The present invention provides a technique for providing a
reprogrammable instruction set accelerator (RISA). The
RISA can be programmed with reprogrammable logic to do
small scale data manipulation and computation, just like the
instruction set accelerators currently in use. Furthermore, the
RISA can be tightly coupled to instruction and data paths in
parallel with the predefined execution units in microprocessors.
This provides fast and efficient execution of new
instructions and a significant improvement in performance.
The RISA provides the capability for users to program
instructions that may be difficult to do in the general purpose
processor, and not in wide enough use to warrant a hardware
accelerator. Further, the reprogrammable instruction set
accelerator can be reprogrammed with different instructions
at different times, saving space and cost in the computer.
Some examples of the kinds of operations that may be
implemented using the reprogrammable instruction set
accelerator include the following:
bit rotation/field extraction, used in instruction emulation
or encryption or decryption;
on-bit counting;
polynomial evaluation;
spreadsheet resolution, such as an instruction that calculates
each cell in the spreadsheet to be resolved;
searching in a document;
spell-checking;
database access routines;
procedure invoke/return operations;
programming language interpreters;
emulation of another processor; and
context switching for multi-processing.
The instruction set accelerator may be reprogrammed for
each program that runs on the computer, or a manufacturer
may select a few instructions and ship them stored with the
computer. A program may reprogram the reprogrammable
instruction set accelerator several times during the program
to speed up different parts of the program. The logic space
in the reprogrammable instruction set accelerator may be
allocated by the computer system, and instruction sets
swapped, using operations similar to overlays, virtual
memory or caching.
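The caching analogy can be made concrete with a small sketch. It assumes, purely for illustration, that the logic space is divided into a fixed number of equal slots and that the least recently used configuration is evicted first. The patent does not prescribe a replacement policy, and the class and method names here are hypothetical.

```python
from collections import OrderedDict


class ConfigCache:
    """Toy model of RISA logic space managed like a cache (LRU policy assumed)."""

    def __init__(self, slots: int):
        self.slots = slots            # number of configurations that fit at once
        self.loaded = OrderedDict()   # oldest entry first, most recent last
        self.reloads = 0              # how many times a configuration was loaded

    def use(self, name: str) -> bool:
        """Request a programmed instruction; return True if already loaded."""
        if name in self.loaded:
            self.loaded.move_to_end(name)      # mark as most recently used
            return True
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)    # evict the least recently used
        self.loaded[name] = True               # stands in for reprogramming
        self.reloads += 1
        return False
```

Each miss models one reprogramming of the accelerator, so the `reloads` counter gives a rough sense of how much configuration traffic a given instruction mix would cause.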
Accordingly, the present invention can be characterized as
a data processor which comprises a defined execution unit
coupled to internal buses of the processor for execution of a
predefined set of instructions, combined with a programmable
execution unit coupled to the internal buses for
execution of a programmed instruction. The programmable
execution unit may comprise a field programmable gate
array having a configuration store, and resources for accessing
the configuration store to program the programmable
execution unit.
In one aspect of the invention, an instruction register is
included in the data processor which holds a current instruction
for execution, and is coupled to an instruction data path
to supply the instruction to the defined execution unit and to
the programmable instruction unit in parallel, through
appropriate decoding resources.
`The processor may include instruction fetching resources
`and other logic which are responsive to condition codes. A
`condition code register is connected to receive condition
`codes from both the defined execution unit and from the
`programmable execution unit.
In addition, because the programmable execution unit
may be reprogrammed after manufacture, the timing for
execution of a programmed instruction may not be well
predictable. Thus, the programmable execution unit includes
logic to signal the instruction fetching resources to provide
a next instruction when execution of the programmed
instruction is done.
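The completion signaling described above can be sketched as a cycle-counting model. It assumes, for illustration only, that every defined instruction takes one cycle and that each programmed instruction has a latency known only after configuration. The function name, the instruction names, and the latency table are hypothetical.

```python
def run(program, latency):
    """Count cycles for a list of (kind, name) instructions.

    kind is "defined" or "programmed"; `latency` maps each programmed
    instruction name to its cycle count (assumed to be at least 1).
    """
    cycles = 0
    for kind, name in program:
        if kind == "defined":
            cycles += 1               # assumption: one cycle per defined instruction
            continue
        remaining = latency[name]
        done = False
        while not done:               # instruction fetch is held while done is low
            cycles += 1
            remaining -= 1
            done = remaining <= 0     # the unit raises "done" on its last cycle
    return cycles
```

The fetch logic never needs to know the latency in advance; it only waits for the done indication, which is what lets the same processor host programmed instructions of different lengths.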
The programmable execution unit may comprise a configuration
store, and resources for accessing the configuration
store to program the programmable execution unit are
provided, which according to one alternative utilize the
internal buses of the data processor. Thus, the programmable
execution unit may be reconfigured under control of the
defined execution unit. Alternatively, the programmable
execution unit may include a configuration port which is
independent of the internal buses of the data processing
system, which allows access to the configuration store for
reprogramming the programmable execution unit through a
separate port.
The instructions according to one aspect of the invention
will have a pre-specified format, including an opcode field
specifying an operation by one of the defined and programmable
execution units, and a plurality of address fields specifying
addresses of operand data and result data. The pre-specified
format according to one alternative may include a
defined/programmed flag, specifying a defined or programmed
instruction. Thus, the decoder will be responsive
to the flag to enable or disable the programmable execution
unit for the purposes of access to the internal buses and
register files on the device. According to another alternative,
the pre-specified format may include an immediate data
field, such that programmed instructions use the opcode field
to identify the instruction as a programmed instruction, and
the immediate data field to identify a programmed operation.
A third alternative instruction format includes both an
opcode for the defined execution unit and an opcode for the
programmable execution unit.
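As an illustration of the first alternative (the defined/programmed flag), the sketch below packs and unpacks a 32-bit instruction word. The field widths (1-bit flag, 6-bit opcode, three 5-bit register addresses, 10-bit immediate) are assumptions chosen only so that the fields fill 32 bits; the patent does not specify widths, and all names here are hypothetical.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    programmed: bool   # defined/programmed flag (True = programmable unit)
    opcode: int        # operation selector
    a: int             # address of first operand register
    b: int             # address of second operand register
    y: int             # address of result register
    immed: int         # immediate data


def encode(programmed: bool, opcode: int, a: int, b: int, y: int, immed: int = 0) -> int:
    """Pack the fields into one word using the assumed layout."""
    return ((int(programmed) << 31) | ((opcode & 0x3F) << 25) |
            ((a & 0x1F) << 20) | ((b & 0x1F) << 15) |
            ((y & 0x1F) << 10) | (immed & 0x3FF))


def decode(word: int) -> Decoded:
    """Split a word back into its fields."""
    return Decoded(
        programmed=bool((word >> 31) & 0x1),
        opcode=(word >> 25) & 0x3F,
        a=(word >> 20) & 0x1F,
        b=(word >> 15) & 0x1F,
        y=(word >> 10) & 0x1F,
        immed=word & 0x3FF,
    )


def target_unit(word: int) -> str:
    """Pick the execution unit that the decoder would enable for this word."""
    return "programmable" if decode(word).programmed else "defined"
```

Because only the flag bit steers the instruction, both units can share the same operand and result address fields, which matches the parallel connection to the internal buses described in the text.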
Accordingly, the present invention provides a new
method for executing a computer program which includes a
particular function. The method includes providing a defined
instruction execution unit and a programmable instruction
execution unit in parallel with the defined instruction execution
unit. The programmable instruction execution unit is
programmed to execute at least a portion of the particular
function in response to a programmed instruction. A
sequence of instructions is supplied including the defined
instructions and the programmed instruction. The defined
instructions are executed in the sequence in the defined
instruction execution unit and the programmed instruction is
executed in the programmable instruction execution unit.
The programmable instruction execution unit can be reprogrammed
when the user changes from one application to the
next using a configuration port for the programmable
instruction execution unit.
Thus, the reprogrammable instruction set accelerator may
be programmed through internal processor data paths or
through separate, dedicated programming paths initiated by an
instruction from the general purpose, defined execution unit.
Selecting the instructions to emulate can be done manually,
by inspecting the instructions or procedures to be executed,
and crafting new instructions to implement chosen functionalities.
A compiler may be used to automate the addition of
programmed instructions to programs it compiles to improve
their speed.
Instruction selection may also be done automatically, by
profiling the program to see how frequently various procedures
or lines of code are used, then replacing them with
programmed instructions. This requires software to compile
from the programming language to logic gates. This capability
now exists with high level synthesis from VHDL, VERILOG
or other languages for specifying logic. Given such
capability, it is simple to visualize a high level synthesis
system that takes a high level programming language such
as "C" as input and generates logic for the RISA.
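The profiling step can be sketched as a frequency count over an instruction trace. This is a minimal illustration, assuming the trace is available as a list of opcode names and that candidates are fixed-length runs of consecutive instructions. The function name and trace format are hypothetical, and the synthesis of logic from the selected sequence is outside the scope of the sketch.

```python
from collections import Counter


def hot_sequences(trace, length=2, top=3):
    """Return the `top` most frequent runs of `length` consecutive opcodes."""
    windows = (tuple(trace[i:i + length])
               for i in range(len(trace) - length + 1))
    return Counter(windows).most_common(top)
```

For a trace such as `["SHR", "AND", "ADD", "SHR", "AND", "SUB", "SHR", "AND"]`, the pair `("SHR", "AND")` occurs three times and would be the first candidate to replace with a single programmed instruction.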
Extraction of instructions may also be done on the fly, by
profiling analysis of the instructions to the general purpose
processor, then compiling from those instructions to logic in
the reprogrammable instruction set accelerator. This profiling
may be done beforehand, or may be done during
execution. In the latter case, the computer learns which
instructions are frequently executed and optimizes them as
it runs.
Alternatively, many instructions for the RISA can be
defined in advance and the compiler may use all or a subset
of these additional instructions when compiling. The needed
instructions are loaded into the RISA when the program
runs.
Accordingly, the present invention provides a technique
for improving the performance and flexibility of general
purpose processors based on the use of reprogrammable
logic techniques. The invention provides greater performance
improvements and more flexibility than prior art
attempts to optimize instruction execution in general purpose
processors.
Other aspects and advantages of the present invention can be
seen upon review of the drawings, the detailed description
and the claims which follow.
`
BRIEF DESCRIPTION OF THE FIGURES
`
FIG. 1 is a simplified block diagram of a data processing
system utilizing the reprogrammable instruction set accelerator
(RISA) according to the present invention.
FIG. 2 is a schematic diagram of an integrated circuit
microprocessor including a defined execution unit and a
reprogrammable execution unit according to the present
invention.
FIG. 3 illustrates one example instruction format for use
according to the present invention.
FIG. 4 illustrates an alternative example instruction format
for use according to the present invention.
FIG. 5 illustrates another alternative example instruction
format for use according to the present invention.
`
`DETAILED DESCRIPTION OF THE DRAWINGS
`
A detailed description of the preferred embodiments of
the present invention is provided with reference to the
figures, in which FIG. 1 illustrates a data processing system
using the reprogrammable instruction set accelerator according
to the present invention. As shown in the figure, the data
processing system includes a host CPU generally 10 coupled
to the system bus 11. Also coupled to the bus are instruction
memory 12, data memory 13, user interface 14, possibly
a disk drive 15, and other processing resources generally 16.
The host CPU is made up of the basic microprocessor (MP)
components generally 20, and a reprogrammable instruction
set accelerator (RISA) generally 21. The basic microprocessor
includes internal buses 22 and 23, also called buses A and
B, respectively. A defined instruction execution unit 24 (or
ALU) is coupled to buses 22 and 23, and supplies a result
through a multiplexer 25 to bus 22. A register file 33 is
accessible across the buses 22 and 23. An instruction path
schematically represented by the instruction register IR 26
supplies an instruction to the defined instruction execution
unit 24. Also, the defined instruction execution unit 24
generates condition codes which are supplied through multiplexer
27 to a condition code register 28, which is used by
the processor, as well known in the art, in instruction sequencing
and the like.
In parallel with the general purpose microprocessor is the
reprogrammable instruction set accelerator RISA 21. The
RISA 21 comprises a field programmable gate array 30,
which includes a configuration store 31. The field programmable
gate array 30 is coupled to the internal buses 22 and
23, and supplies a result through multiplexer 25 to bus 22.
The instruction path through instruction register 26 supplies
an instruction to the field programmable gate array 30 in
substantially the same manner and timing as it supplies
instructions to the defined execution unit 24. The field
programmable gate array 30 also supplies condition codes
through multiplexer 27 to the condition code register 28.
A configuration store 31 is coupled with the field programmable
gate array 30. The configuration store 31 may be
accessible through a dedicated port generally 35, or by
means of the internal buses 22 and 23, for dynamically
reprogramming the field programmable gate array 30.
In the embodiment illustrated in FIG. 1, the RISA 21 is
implemented using field programmable gate array logic. The
field programmable gate array logic may take a variety of
forms, such as the dynamically reconfigurable architecture
described in our co-pending U.S. patent entitled A
PROGRAMMABLE LOGIC DEVICE WHICH STORES
MORE THAN ONE CONFIGURATION AND MEANS
FOR SWITCHING CONFIGURATIONS, U.S. Pat. No.
5,426,378, issued Jun. 20, 1995, invented by Randy T. Ong,
which was owned at the time of invention and is currently
owned by the same assignee as the present application, and
which is incorporated by reference as if fully set forth
herein. Alternative programmable logic structures may be
utilized. For instance, the RAM based configuration store of
typical FPGA designs may be replaced using reprogrammable
non-volatile stores such as EEPROM. Also, the
configuration store may be programmable during manufacture
rather than dynamically. Thus, the manufacturer may
use more permanent programming technology, such as
anti-fuses or the like, to configure a new instruction into a
previously defined instruction set.
As shown in FIG. 1, the field programmable gate array is
used as an execution unit which is reprogrammable, and
connected in parallel with the defined execution unit 24. The
system expects the field programmable gate array 30 to
return results in the same manner as the standard defined
instruction execution unit. The field programmable gate
array 30 in the embodiment shown uses the same write back
path as the defined execution unit. However, a separate write
back path may be used for the FPGA 30 if desired.
The RISA 21 includes an optional data port or ports 36,
37 dedicated for use by the RISA 21. The data port 36 is
coupled to the system memory 12, 13 across bus 11. The
data port 37 is coupled to an external data source, such as a
video data source, an audio data source or memory isolated
from the system bus 11.
The field programmable gate array 30 executes instructions
that take an amount of time which is not predictable
prior to configuration, and do not match well with the
pipeline speed of the fixed execution unit 24. Thus, logic is
included to hold the processor during execution of a programmed
instruction, and to signal the processor when the
programmed instruction is complete.
In addition, one embodiment of the RISA 21 operates in
an overlapped fashion with the defined instruction execution
unit 24 for some operations, taking advantage of the parallel
execution units in the system for greater performance.
FIG. 2 provides a more detailed block diagram of an
integrated circuit microprocessor which includes the RISA
according to the present invention. Those skilled in the art
will recognize the basic components of the microprocessor.
Thus the figure is intended to represent a wide variety of
microprocessor architectures, including complex instruction
set processors, reduced instruction set processors, and other
data processing architectures.
Thus, the system includes an execution unit 100 which is
optimized for a predefined instruction set. The execution
unit 100 is coupled to internal buses 101 and 102 for
receiving operand data at its ports A and B, and supplying
result data on its port Y back to one of the internal buses, e.g.
bus 101. A register file 103 and input/output registers 104
are also coupled to the buses 101 and 102, and act as sources
for operands, and locations to store result data. The registers
accessible by the execution unit 100 include private registers
140 which are dedicated for use by the defined execution
unit 100, and not by the programmable execution unit
(FPGA RISA 120 described below). In an optional
embodiment, the private registers 140 are directly coupled
with the defined execution unit 100 as indicated by line 141.
In another embodiment, internal bus 101 is broken into
two sections at point 150, thereby allowing independent
operation of execution unit 100 and RISA 120. In yet
another embodiment, internal bus 101 is broken into two
sections at point 151, thereby allowing independent outputs
with shared inputs.
Also coupled to the internal buses 101 and 102 is an
instruction address register 105, and other resources in the
instruction address fetching path. The instruction address
register 105 includes basic incrementing logic 106 for
sequencing through a sequence of instruction addresses for
an instruction memory off chip. Also, an instruction control
state machine 107 is coupled to the instruction address
register 105 for managing the instruction stream as known in
the art. The instruction control state machine 107 is also
coupled to other components on the chip as appropriate.
A condition code register 108 supplies condition codes to
the instruction control state machine 107 involved in the
instruction sequencing decisions, and also, as indicated by
arrow 109, to other processing resources in the system as
suits the particular implementation. Condition codes are
generated on line 110 by the defined execution unit 100 as
known in the art.
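The role of condition codes in sequencing can be illustrated with a small model. It assumes a 32-bit datapath, two condition codes (zero and negative), and a single branch-if-zero decision; none of these specifics are stated in the patent, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def execute(opcode: str, a: int, b: int):
    """Model an execution unit: operands on A and B, result on Y, plus codes."""
    if opcode == "ADD":
        y = (a + b) & MASK
    elif opcode == "SUB":
        y = (a - b) & MASK
    elif opcode == "AND":
        y = a & b & MASK
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    cc = {"zero": y == 0, "negative": bool(y >> 31)}   # assumed code set
    return y, cc


def next_address(pc: int, cc: dict, branch_if_zero_target=None) -> int:
    """Model the sequencing decision made from the condition code register."""
    if branch_if_zero_target is not None and cc["zero"]:
        return branch_if_zero_target   # branch taken
    return pc + 1                      # otherwise simple increment
```

The sequencer only inspects the stored codes, not the unit that produced them, so a second execution unit that drives the same condition code register can influence instruction sequencing in the same way.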
An instruction register 111 receives instructions generated
in response to instruction address register 105. Instructions
are supplied from the instruction register 111 to decoder
resources 112. The decoder resources 112 supply control
signals, generally represented by arrow 113, throughout the
device to control accessing of the register files, bus timing,
and other functions on the processor. The decoder also
generates an opcode on line 114 which is supplied at the
instruction input I on the defined execution unit 100.
External data, instruction, and address ports 115 are
included as known in the art, for managing flow of data,
instructions and addresses into and out of the chip. The
external data/instruction/address ports 115 are coupled to the
I/O registers 104, the instruction address register 105, and
the instruction register 111.
According to the present invention, a reprogrammable
instruction set accelerator is included on the chip. Thus, a
RISA field programmable gate array (RISA FPGA) 120 is
coupled to the internal buses 101 and 102, to receive
operand data at ports A and B and supply result data at port
Y to and from the buses. The opcode from line 114 is
`supplied at an ins