Taylor

[11] Patent Number: 5,652,875
[45] Date of Patent: Jul. 29, 1997

US005652875A

[54] IMPLEMENTATION OF A SELECTED INSTRUCTION SET CPU IN PROGRAMMABLE HARDWARE

[75] Inventor: Brad Taylor, Oakland, Calif.

[73] Assignee: Giga Operations Corporation, Berkeley, Calif.

[21] Appl. No.: 449,563

[22] Filed: May 24, 1995

Related U.S. Application Data

Continuation of Ser. No. 127,859, Sep. 27, 1993, abandoned.

[51] Int. Cl.6 ........ G06F 9/00; G06F 15/177
[52] U.S. Cl. ........ 395/500; 364/229.4; 364/232.8; 364/240; 364/281.3; 364/262.1
[58] Field of Search ........ 395/375, 800, 500

[56] References Cited

U.S. PATENT DOCUMENTS

4,037,094   7/1977  Vandierendonck ........ 364/716
4,250,545   2/1981  Blahut et al. ......... 395/375
4,422,141  12/1983  Shoji ................. 395/375
5,042,004   8/1991  Agrawal et al. ........ 395/275
5,175,824  12/1992  Soderberg et al. ...... 395/325
5,208,490   5/1993  Yetter ................ 307/452

Primary Examiner: Kevin J. Teska
Assistant Examiner: Ayni Mohamed
Attorney, Agent, or Firm: Adam H. Tachner; Crosby, Heafey, Roach & May

[57] ABSTRACT

A method of designing a CPU for implementation in a configurable hardware device by identifying a series of operations in a logic scheme which are suitable for implementation in the device, identifying an executable function and any needed parameters in the logic scheme, identifying the logic flow in the scheme, providing for at least two connected system resources to implement the logic scheme, selecting an op code, and providing a way to implement the various components needed to call and execute the function according to the logic scheme. A useful op code may invoke a system resource, implement the logic scheme, pass a parameter to the function, or invoke the function. The configurable hardware system can function as a CPU, using logic resources including a next address RAM, one or more registers, a function execution controller, and one or more busses for passing signals and data between the components and functions.

8 Claims, 1 Drawing Sheet
`
[Front-page drawing (FIG. 1): instruction multiplexer 10 (INST MUX) with CALL/CALL_INST line 11 and NEXT INST; next address RAM 20 (line 12, condition line 21); instruction address bus 70 (INST_ADDR); function execution controller 30 (FUN CALLS, RET, WAIT); register address RAM 40 (REG ADDR; RAD, RWR, RRD); registers 60 (REGS); I/O bus 80; function circuits (FUNS, FUN 1 31) with FUN RETURN.]
`EX 1023
`IPR of Pat. No. 6,892,304
`
`
`
IMPLEMENTATION OF A SELECTED
INSTRUCTION SET CPU IN
PROGRAMMABLE HARDWARE
`
This is a continuation of application Ser. No. 08/127,859, filed on Sep. 27, 1993, now abandoned, entitled IMPLEMENTATION OF A SELECTED INSTRUCTION SET CPU IN PROGRAMMABLE HARDWARE.
`
`FIELD OF THE INVENTION
`
This invention relates to a method of implementing a selected instruction set central processing unit in a programmable hardware device and a system with such a CPU.
`
`BACKGROUND OF THE INVENTION
`
Most computational or process control tasks can be subdivided into a series of relatively simple steps or decisions. An engineer can analyze a task and design a logic scheme which can be implemented in a custom-designed circuit such as an ASIC or in a general purpose CPU such as an 8086 or Z-80. In a traditional implementation, a general purpose CPU can perform a wide variety of functions but must operate in a synchronous manner, processing only one step at a time. An ASIC can perform a variety of functions simultaneously and to a large extent asynchronously, but an ASIC can only implement the scheme for which it was designed. Thus an ASIC provides high speed but only for a specific functionality, and a general purpose CPU provides great flexibility but at limited speed.
A central processing unit or “CPU”, as used in this disclosure, is taken to mean a von Neumann machine. A minimum von Neumann machine can read input, write output which is dependent on the input, and includes both invert and add functions. These principal components and functions provide the basis for some very sophisticated devices, but each is still a CPU.
A logic scheme can also be implemented in a programmable logic device (PLD) such as a field programmable gate array (FPGA). PLDs are available from several manufacturers, including Xilinx, AT&T, Altera, Atmel and others. In general, a PLD can operate much faster than a general purpose CPU and can handle asynchronous processes. A PLD is rarely as fast as an ASIC but does allow changes to the logic scheme.
The field of configurable hardware is well known and has been the subject of intense engineering development for the last few years. See in general the background section of co-pending, commonly-assigned U.S. patent application Ser. No. 07/972,933, filed Nov. 5, 1992, now abandoned, entitled “SYSTEM FOR COMPILING ALGORITHMIC LANGUAGE SOURCE CODE INTO HARDWARE,” which is incorporated in full herein by reference.
Previous implementations of PLDs, however, generally have been used only for a specific logic scheme which is changed only if the logic needs to be redesigned. Until now, PLDs have been used to implement logical functions without using op codes, since providing logical resources to interpret and execute op codes takes up precious resources. A typical PLD application is as a monitor controller on a video board. As new monitors are released on the market or as the board designer develops an improved algorithm, the vendor may release revised configuration software for the PLD. This software can be distributed through traditional channels such as downloading from a bulletin board or asking the user to visit a distributor to have the new configuration loaded. Such revisions are relatively infrequent, possibly months or years apart.
An alternative way to implement logic functions is to load functions in hardware only as needed. This is particularly useful when certain functions are needed only rarely and only a limited amount of hardware resources is available. Thus the engineer would prefer to use the available resources for the most frequently used functions. Using techniques well known in caching schemes, the engineer can provide a variety of functions, each in a form ready to load as a configuration file, and load or unload functions as needed.
Until now, engineers have only begun to consider the possibility of “caching” functions as hardware configurations. The present method provides a way to implement a wide variety of functions in programmable hardware devices.
`
SUMMARY OF THE INVENTION
`
This invention provides a method and system for designing and implementing a selected instruction set (SISC) CPU in programmable hardware. The method involves identifying a series of operations in a logic scheme which are suitable for implementation in the device, identifying an executable function and any needed parameters in the logic scheme, identifying the logic flow in the scheme, providing for at least two connected system resources to implement the logic scheme, selecting an op code, and providing a way to implement the various components needed to call and execute the function according to the logic scheme. A useful op code may invoke a system resource, implement the logic scheme, pass a parameter to the function, or invoke the function. The configurable hardware system can function as a CPU, using logic resources including a next address RAM, one or more registers, a function execution controller, and one or more busses for passing signals and data between the components and functions.
This CPU can initiate additional functions which may be quite complex. In general, the architecture can be used to implement a variety of related functions and provide a compact, fast and economical system.
One object of the invention is to provide a method of analyzing a logic scheme and implementing it in configurable hardware as a CPU which processes op codes to initiate and coordinate any necessary functions.
Another object of the invention is to provide a specific instruction set CPU.
Yet another object of the invention is to provide a system to process multiple programs in the specific instruction set CPU.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 illustrates a preferred configuration of a downloadable CPU.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
Many computational or process control tasks fall into one of two classes: inherently sequential or inherently parallel. A sequential task, such as controlling a DRAM, follows a linear thread and executes only one step at a time.
An inherently parallel task, such as inverting an image or searching certain databases, may involve manipulating a number (often a very large number) of data elements in more or less the same way. Such a process can often be implemented as a sequential process, but since the results of manipulating one set of data elements and manipulating another set of data elements are independent, the manipulations also can be done simultaneously. A parallel task is multithreaded, and multiple threads can be launched simultaneously. A thread, in turn, may launch additional threads.
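As a small illustration of why such a task is parallel, the C sketch below inverts 8-bit pixels. It is not taken from the patent; the helper `invert_range` and the 8-bit grayscale buffer are assumptions chosen for clarity. Because each output pixel depends only on the matching input pixel, inverting disjoint ranges in any order (or, in hardware, at the same time) produces the same image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invert one 8-bit pixel; the result depends on this pixel alone. */
static uint8_t invert_pixel(uint8_t p)
{
    return (uint8_t)(255u - p);
}

/* Hypothetical helper: invert img[lo] .. img[hi - 1] in place.
   No pixel reads any other pixel, so disjoint ranges may be processed
   in any order, or simultaneously, with identical results. */
void invert_range(uint8_t *img, size_t lo, size_t hi)
{
    assert(img != NULL && lo <= hi);
    for (size_t i = lo; i < hi; i++)
        img[i] = invert_pixel(img[i]);
}
```

Running `invert_range` over the whole buffer, or over its two halves in either order, leaves identical contents; that independence is what allows the halves to be handed to separate threads or function circuits.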
`Referring to FIG. 1, a simple CPU can be implemented
`with instruction multiplexer 10, address RAM 20, function
`execution controller 30, and register address RAM 40, all
`connected by instruction address bus 70. Each component is
`discussed below in more detail.
In a preferred implementation each of 20, 30, 40, and 60 is a register, RAM or other memory portion. A programmable logic device such as a Xilinx 4003 can easily be configured to include one or more such memory components. Since the CPU is designed to implement a specific logic scheme, each portion can be scaled as needed. For example, the available (or needed) address space for one implementation may be fully addressable using only 2 bits, in which case instruction address bus 70 and the output of address RAM 20 can be two bits wide, but a different implementation might require a larger address, in which case the address RAM output and the bus must be correspondingly larger.
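The sizing rule in the paragraph above can be sketched as follows: the width of instruction address bus 70, and of the address RAM output, is the number of bits needed to index every instruction slot. This is a minimal C sketch; the name `addr_bits` is hypothetical and not from the patent.

```c
#include <assert.h>

/* Hypothetical helper: smallest width w such that 2^w >= n_instructions,
   i.e. the bus/RAM-output width needed to address that many slots. */
unsigned addr_bits(unsigned n_instructions)
{
    assert(n_instructions > 0 && n_instructions <= (1u << 31));
    unsigned w = 0;
    /* Grow w until 2^w slots cover every instruction. */
    while ((1u << w) < n_instructions)
        w++;
    return w;
}
```

With the I/O ping program's eleven instructions (0 through 10), `addr_bits(11)` gives 4, which matches the 4-bit NAD field described for that program; four instructions would need only 2 bits, as in the example above.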
In a preferred implementation, address RAM 20 is used to hold the address of the next instruction to be executed. This provides a simple mechanism to execute a sequential program and allows for complex branching on conditions. The size of address RAM 20 can be selected to accommodate the largest address needed in a specific implementation. One means of handling a branch on condition is to provide a primary instruction store and, wherever a branch can occur, an alternate instruction store, so that a branch on condition can be taken by simply evaluating the condition and selecting one instruction store if the condition is true and the other instruction store if the condition is false.
Function execution controller 30 can be configured with any number of bits, allocating one bit position for each of n functions to be executed under control of the CPU. The exact number of functions needed for any specific implementation will, in general, vary, and the width of function execution controller 30 may be set to an integer value corresponding to the number of needed functions.
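Because each function has its own bit, selecting several functions at once is just a matter of setting several bits. A minimal C sketch of that idea follows; the names `launch_functions` and `func_circuit` and the 32-bit limit are assumptions, and calling C function pointers one after another only stands in for hardware call lines that would all fire together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FUNCS 32u   /* assumption: controller modelled as a 32-bit word */

typedef void (*func_circuit)(void);

/* Model of function execution controller 30: bit i of `ctrl` drives the
   call line of function i. Hardware would start every selected function in
   the same cycle; this loop only shows which ones are started.
   Returns the number of call lines asserted. */
unsigned launch_functions(uint32_t ctrl, func_circuit funcs[], unsigned n_funcs)
{
    assert(n_funcs <= MAX_FUNCS);
    unsigned asserted = 0;
    for (unsigned i = 0; i < n_funcs; i++) {
        if (ctrl & (UINT32_C(1) << i)) {
            if (funcs[i] != NULL)
                funcs[i]();
            asserted++;
        }
    }
    return asserted;
}
```

For example, a controller word of 0x21 asserts the call lines of functions 0 and 5 in a single step.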
Separate circuits are provided for function 1 31, function 2 (not shown), and so on through function n (not shown). Each function may be partially or, preferably, completely implemented in configurable logic of a PLD. Each function may be independently connected to registers 60 or a constant store (not shown) as needed. Alternatively, some or all functions and appropriate resources can be connected together through a bus such as I/O bus 80. Thus each function can access any needed parameters, e.g. from registers, or global constants, e.g. from a constant store. When appropriate, a function can be connected to an external signal such as an external bus or an interrupt line. In a preferred implementation, for every function that must return a value or indicate completion, a “return” signal is connected to multi-input OR gate 35.
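The OR gate gives the sequencer a single "some function has returned" signal no matter how many functions exist. The C sketch below models that, together with one plausible use of it: holding the instruction address while a WAIT is pending. The function names and the stall-in-place behaviour are assumptions, not taken from the patent's figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Multi-input OR gate 35: bit i of `return_lines` is the "return" signal
   of function i. The output is true once any such signal is raised. */
bool or_gate_35(uint32_t return_lines)
{
    return return_lines != 0;
}

/* Hypothetical sequencer step: when the current instruction has its WAIT
   bit set, hold the current address until OR gate 35 reports a return;
   otherwise move on to the next-address (NAD) value. */
uint8_t sequencer_step(uint8_t current, uint8_t nad, bool wait, uint32_t return_lines)
{
    if (wait && !or_gate_35(return_lines))
        return current;   /* stall at the current address */
    return nad;
}
```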
Register address RAM 40 is preferably a look-up table with pointers to actual registers 60. In general, any register can be accessed directly, thereby providing a store of local variables. By comparison, in the “C” software language, it is common for local variables to be stored in registers or held in a stack. These registers provide a convenient means for passing parameters when invoking or returning from a function.
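The indirection through register address RAM 40 can be modelled in a few lines of C. This is a sketch under stated assumptions: four look-up entries (matching a 2-bit RAD field), eight physical registers, and 16-bit register contents are illustrative choices, and `reg_file`, `reg_read`, and `reg_write` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define N_RAD  4   /* assumption: 2-bit RAD field gives 4 LUT entries */
#define N_REGS 8   /* assumption: number of physical registers 60 */

typedef struct {
    uint8_t  lut[N_RAD];    /* register address RAM 40: RAD -> register index */
    uint16_t regs[N_REGS];  /* registers 60, holding local variables */
} reg_file;

/* Read the register that LUT entry `rad` points to. */
uint16_t reg_read(const reg_file *rf, unsigned rad)
{
    assert(rad < N_RAD && rf->lut[rad] < N_REGS);
    return rf->regs[rf->lut[rad]];
}

/* Write the register that LUT entry `rad` points to. */
void reg_write(reg_file *rf, unsigned rad, uint16_t value)
{
    assert(rad < N_RAD && rf->lut[rad] < N_REGS);
    rf->regs[rf->lut[rad]] = value;
}
```

Re-pointing a LUT entry changes which physical register an instruction's RAD value names without changing the instruction itself.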
`
`4
Instruction multiplexer 10 may be connected to external devices through lines 11, which may be a system bus, e.g. an ISA bus. Instruction multiplexer 10 as shown is also connected to address RAM 20 through line 12. This allows ready implementation of branch commands such as:

    If (condition)
    Then execute next instruction
    Else skip to following instruction

If evaluation of a function (not shown) requires a skip, this condition can be signalled on condition line 21 and forwarded from address RAM 20 over line 12 to instruction multiplexer 10, which then discards the pending instruction and provides the next instruction in sequence.
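In other words, the If/Then/Else above reduces to choosing between the pending instruction and the one after it. A minimal C sketch, with the name `issue_address` and the wrap-around at the end of the instruction store as assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of instruction multiplexer 10 handling a skip.
   `pending` is the address of the instruction already fetched. If the skip
   condition arrives (condition line 21, forwarded over line 12), the pending
   instruction is discarded and the following one is issued instead.
   Wrapping to address 0 at the end of the store is an assumption. */
unsigned issue_address(unsigned pending, bool skip, unsigned n_instructions)
{
    assert(n_instructions > 0 && pending < n_instructions);
    if (skip)
        return (pending + 1) % n_instructions;
    return pending;
}
```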
Using this general structure, a wide variety of functions can be implemented. One particular advantage of this type of structure is that the general CPU sequencer structure can be used with a great many functions. This is particularly useful, for example, for controlling nested “for” or “while” loops. The specific functions called by the loop may vary widely, but the present structure provides a mechanism for starting, controlling, and returning from such functions.
Another particular advantage is that it is relatively easy to design new op codes in software but relatively time consuming to design and configure the illustrated CPU. Since many functions are invoked with a single bit over a single line, it is easy to provide a large number of functions in configurable hardware and connect or disconnect them to correspond to a current set of op codes. Using downloadable, reconfigurable hardware such as a Xilinx 4003, a new set of software op codes can be loaded in microseconds, while it requires several milliseconds to completely reconfigure the CPU device.
Thus, if a large program would benefit from two or more CPUs of this general design, but one implementation requires 14 op codes and calls 7 functions and another implementation requires only 3 op codes but calls 12 functions, it is easier to design one CPU that can handle 14 op codes (probably designed for 16) and between 12 and 19 function call lines (depending on the specific nature of the functions and any possible overlap) and then load only the op codes needed to operate the CPU during a corresponding phase of the operation of the part.
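The arithmetic behind those figures can be written out directly: the op code capacity is the larger requirement rounded up to a power of two (14 becomes 16), and the number of function call lines lies between the larger of the two function sets (12, if every function of the smaller set is shared) and their sum (19, if none are shared). A C sketch, where the names `cpu_size` and `size_shared_cpu` are hypothetical:

```c
#include <assert.h>

typedef struct {
    unsigned opcode_slots;   /* op code capacity, rounded up to a power of two */
    unsigned min_fun_lines;  /* if one function set is contained in the other */
    unsigned max_fun_lines;  /* if the two function sets share nothing */
} cpu_size;

static unsigned round_up_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Size one CPU that can serve two programs (a and b). */
cpu_size size_shared_cpu(unsigned ops_a, unsigned funs_a,
                         unsigned ops_b, unsigned funs_b)
{
    cpu_size s;
    unsigned ops = ops_a > ops_b ? ops_a : ops_b;
    assert(ops <= (1u << 31));
    s.opcode_slots  = round_up_pow2(ops);
    s.min_fun_lines = funs_a > funs_b ? funs_a : funs_b;
    s.max_fun_lines = funs_a + funs_b;
    return s;
}
```

`size_shared_cpu(14, 7, 3, 12)` yields 16 op code slots and between 12 and 19 function lines, the figures given in the paragraph above.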
A fairly compact CPU can be designed using fewer than 16 instructions. If additional instructions are needed, a larger CPU can be implemented using the principles disclosed herein. Alternatively, an appropriate portion of the configuration can be stored and loaded as necessary.
`The system, and method of designing such a system, can
`be better understood by considering the following.
The basic implementation of a CPU provides logical resources, suitably connected, to be able to execute a series of instructions. A designer must select what instructions can be executed by the CPU and provide for execution of each instruction. In a preferred implementation, instructions include an op code, and instructions are stored in memory which may be, but need not be, part of the CPU. In general, the actual function is implemented by additional circuitry, which is not shown in the figure.
A very simple CPU may be implemented with even fewer components than the system shown in FIG. 1. The specific implementation is based on the specific functions that must be executed. A trivial function might be
`
`10
`
`15
`
`25
`
`35
`
`45
`
`55
`
`Register___1=x
`
`65
`
`or, slightly more complicated,
`
`x=fu(x).
`
`4
`
`
`
`5,652,875
`
`5
In general, an instruction set can be constructed with an arbitrary number of bit fields. A variety of op codes can be implemented, but they preferably center on program control functions such as low level reads and writes, execution control, branch on conditional and such. Since multiple CPUs can be implemented and linked, each CPU can be optimized to execute a specific set of logic instructions, and only the op codes needed to execute those instructions need be provided. In general, any computational functions are implemented separately and do not need an op code.
Listing 1 includes actual C language code that was implemented in the CPU illustrated in FIG. 1. Reference to that listing may assist the reader in understanding the present invention. The listing includes two separate code sections: 1) a simple input/output “ping” and 2) a bit blt tile CPU. For each CPU, the listing includes variable declarations, public functions, CPU functions, code for each function called by the CPU (“low level functions”) and code for the actual CPU (“host()”), plus extensive comments describing the opcode and bit fields and a table showing a control program to be executed.
In the “I/O Ping” implementation, op codes include:
NAD  A bit field for the next instruction to execute (4 bits)
RET  Return from function (1 bit)
WAIT  Wait (suspend other execution) until called function returns (1 bit)
RAD  Register address (preferably a look-up table entry, e.g. 2 bits)
RRD  Read register (load content of selected register onto system I/O bus 80) (1 bit)
RWR  Write register (load selected register from system I/O bus 80) (1 bit)
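To make the field list concrete, the C sketch below packs and unpacks an I/O ping instruction word. The widths follow the list above and the function bits cover FUN0 through FUN5 as in the listing's table; the bit positions and the 16-bit word are assumptions chosen for illustration, since no bit order is given. Instruction 7 of the host program (NAD 8, RAD 2, RRD 1, FUN0 1) is used as the worked value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing, low bit first: NAD[3:0] RET[4] WAIT[5] RAD[7:6]
   RRD[8] RWR[9] FUN[15:10] (FUN0 in bit 10 ... FUN5 in bit 15).
   The widths follow the field list; the positions are illustrative. */
typedef struct {
    unsigned nad, ret, wait, rad, rrd, rwr, fun;
} io_ping_inst;

#define NAD_SHIFT  0
#define RET_SHIFT  4
#define WAIT_SHIFT 5
#define RAD_SHIFT  6
#define RRD_SHIFT  8
#define RWR_SHIFT  9
#define FUN_SHIFT  10

uint16_t encode(io_ping_inst i)
{
    assert(i.nad < 16 && i.ret < 2 && i.wait < 2 && i.rad < 4 &&
           i.rrd < 2 && i.rwr < 2 && i.fun < 64);
    return (uint16_t)(i.nad << NAD_SHIFT | i.ret << RET_SHIFT |
                      i.wait << WAIT_SHIFT | i.rad << RAD_SHIFT |
                      i.rrd << RRD_SHIFT | i.rwr << RWR_SHIFT |
                      i.fun << FUN_SHIFT);
}

io_ping_inst decode(uint16_t w)
{
    io_ping_inst i;
    i.nad  = (w >> NAD_SHIFT)  & 0xFu;
    i.ret  = (w >> RET_SHIFT)  & 0x1u;
    i.wait = (w >> WAIT_SHIFT) & 0x1u;
    i.rad  = (w >> RAD_SHIFT)  & 0x3u;
    i.rrd  = (w >> RRD_SHIFT)  & 0x1u;
    i.rwr  = (w >> RWR_SHIFT)  & 0x1u;
    i.fun  = (w >> FUN_SHIFT)  & 0x3Fu;
    return i;
}
```

With this assumed layout instruction 7 packs to 0x588; a different field order would give a different word but carry the same information.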
In the preferred implementation, functions are launched by setting a bit in the op code corresponding to that function. In the I/O ping example, function_0 is “ld_radd()” and function_5 is “dram_wr()”. The comments under “OPCODE FIELDS” show the instruction words that will be compiled from the native C code; compare the code for “host()”. For example, the function “host write” begins with instruction 7 and completes with instruction 10. The “C” code for instruction 7 is

    ld_radd(hy);

and the corresponding instruction is:

    Inst  NAD  RET  WT  RAD  RRD  RWR  Fn0  Fn1  Fn2  Fn3
       7    8    0   0    2    1    0    1    0    0    0

which corresponds to:

    select register 2 (entry in LUT in 40)
    read the contents of the register and load onto I/O bus 80
    execute function 0 (which in turn is defined under low level functions)
    and then execute instruction 8.

Instruction 8, which follows, launches two simultaneous functions: ld_dram_row() {Fun 1} and another call to ld_radd(hx) {function 0, but now selecting register 1}. If appropriate, each function could be launched simultaneously.
The bit blt CPU uses the same basic structure, but includes more functions and more complex instructions. The listing for the bit blt tile CPU adds an opcode

    CAD  Conditional instruction (4 bits)

and implements 14 functions for execution on demand.
When a traditional C program is compiled for operation on a conventional CPU such as a 486 or a 68040, the compiler generates code that places function calls on a stack in memory. To pass parameters to a function, the parameters are loaded on a stack where they can be popped by the function. The calling function then waits for a “return” from the called function before resuming operation.
The present invention allows a similar scheme to be implemented in hardware. However, the hardware implementation is more powerful than the traditional software implementation, since the traditional software runs on hardware that can in general handle only one instruction per execution cycle (although some pipelining in faster CPUs provides some speed benefit). With the present hardware implementation, any number of functions can be launched on a single execution cycle. The only limitation is that the programmer (or compiler) must be careful to identify any data dependencies between function calls that would necessitate operation in a certain order or relative time frame.
Extensive parallelism can be achieved using the present implementation, but the programmer (or compiler) must also identify any competing calls to shared system resources and add or exert program flow control to reduce or eliminate inconsistent attempts to simultaneously access system resources. One system resource of particular interest is a system bus, such as I/O bus 80 in FIG. 1. In general, only one device or function should write data to the bus at any point in time. However, it is possible and sometimes desirable for more than one device or function to read data from the bus at any particular time. This might be useful, for example, to load identical constants into multiple functions, especially for initializing. If implemented properly, there should be few, preferably no, attempts by different devices or functions to seize control of the bus simultaneously.
The method and system described above can be very useful in a more comprehensive hardware implementation of algorithmic language code. An original source code program in a language such as “C” can be compiled to identify functionalities appropriate for implementation in the system described above and then loaded as a configuration in a PLD. See, in general, the methods and systems described in co-pending, commonly-assigned U.S. patent application Ser. No. 07/972,933, filed Nov. 5, 1992, now abandoned, entitled “SYSTEM FOR COMPILING ALGORITHMIC LANGUAGE SOURCE CODE INTO HARDWARE,” which is incorporated in full herein by reference.
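The single-writer rule for I/O bus 80 described earlier in this section lends itself to a simple per-cycle check that a compiler or a simulation might apply. In this C sketch, which is an assumption rather than part of the disclosed system, bit i of the argument says that device or function i drives the bus in a given cycle; readers are not counted because any number may read at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count set bits (portable, no compiler builtins). */
static unsigned count_bits(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* True if at most one device or function writes to the bus this cycle.
   Readers need no check: several may read the same value simultaneously. */
bool bus_cycle_ok(uint32_t writers)
{
    return count_bits(writers) <= 1;
}
```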
The present system is particularly useful when used in conjunction with an automatic compiler that analyzes a given input source code to first identify functions or portions of functions that can be usefully implemented using the present system. In particular, most “for” or “while” loops are likely candidates for implementation as a specific instruction set CPU. As described in detail above, the management of the loop is generally suitable for the new CPU, while the actual functions executed in the loops can be implemented separately using a variety of known techniques. Other inherently sequential operations may also be suitable for implementation using the present system.
`
`65
`
`5
`
`
`
`5,652,875
`
`7
Such a compiler should identify a loop or series of sequential operations, then collect information about each function called within that code segment and any parameters required by each function. If the number of functions is not large, preferably fewer than about 16, then the compiler should identify a suitable set of op codes for passing any needed parameters to each function and for interlocking functions, as needed. Each function plus the specific instruction set CPU can be compiled and implemented in an appropriate section of programmable hardware, together with any needed connections.
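The size test in that procedure can be sketched in C. Everything here is hypothetical: the `loop_info` record stands in for whatever a real compiler would collect, and the threshold encodes "preferably fewer than about 16" as a strict bound of 16 distinct functions. Repeated calls to the same function count once, since one call line serves them all.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CALLS 64   /* assumption: capacity of the per-loop call list */
#define FUN_LIMIT 16   /* "preferably fewer than about 16" functions */

/* Hypothetical summary a compiler pass might collect for one loop body. */
typedef struct {
    const char *called[MAX_CALLS];  /* names of functions called, in order */
    unsigned    n_calls;
} loop_info;

/* Count distinct function names; repeated calls share one call line. */
static unsigned distinct_functions(const loop_info *li)
{
    unsigned n = 0;
    for (unsigned i = 0; i < li->n_calls; i++) {
        bool seen = false;
        for (unsigned j = 0; j < i && !seen; j++)
            seen = (strcmp(li->called[i], li->called[j]) == 0);
        if (!seen)
            n++;
    }
    return n;
}

/* True if the loop is small enough to map onto a specific instruction set CPU. */
bool suits_sisc_cpu(const loop_info *li)
{
    assert(li != NULL && li->n_calls <= MAX_CALLS);
    return distinct_functions(li) < FUN_LIMIT;
}
```

A loop body that calls ld_radd twice plus ld_dram_row and dram_wr counts as three functions, well under the limit.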
`
`8
`A general description of the system and method of using
`the present invention as well as a preferred embodiment of
`the present invention has been set forth above. One skilled
`in the art will recognize and be able to practice additional
`variations in the methods described and variations on the
`device described which fall within the teachings of this
`invention. The spirit and scope of the invention should be
`limited only as set forth in the claims which follow.
`
Listing 1

PGA
part: "4003PC84-6";

/* IO PINS */
int    pio mbus_data={15,16,17,18, 19,20,23,24, 25,26,27,28, 36,37,38,41};
int4   pio mbus_nfun;
int4   pio mbus_fun={78,57,35,39};
bit    pin fast mbus_ok=40;
bit    mbus_select=71;
bit    mbus_nok;
bit    mbus_master;
bit    mbus_slave;
bit    dram_slave;
void   mbus_return();
bit    reg_sel;
int10  uctr radd;
int8   pio dout dram_data={77,70,69,68,67,66,65,62};
int10  pin fast dram_addr={79,80,81,82,83,84, 3, 4, 5, 6};
bit    ras,cas,wr,rd;
bit    pin cas_;
bit    pin ras_,wr_,rd_;
loc ras_= 9;
loc cas_=14;
loc wr_ = 8;
loc rd_ = 7;
bit    nclk mas;
bit    pin test0=72;
bit    pin test1=51;
bit    pin test2=E;
int8   pio lbus_data={61,60,59,58, 56,50,49,48};
int4   pio lbus_fun ={47,46,45,44};
`/****************
`mt
`mfun grad4_rd_reg
`void
`mfun grad4_wr_reg
`void
`mfun grad4_wr_addr
`void
`mfun grad4_wr_dram
`int
`mfun grad4 rd dram
`void
`mfun grad4_refresh
`void
`mfun grad4_slave
`void
`mfun gpu_go_tile
`void
`mfun gpu_wr_add.r
`I
`CPUS
`void
`cpu host(int hosLd);
`
`#*******#**=I*************/
`)=0xf;
`(int al
`)=0xe;
`(int :10 ,d0
`(int mod,alo,ahi)=0xb;
`(int data
`)=0xa;
`(
`)=0x9;
`(
`)=0x4;
`(int id
`)=0x1;
`(
`)=0x7;
`(int wraO,wra1,wra2,wra3)=0x8;
`1'
`
`FUNC'I'IONS********************** ****#**/
`
`/*******#**# *****
`void main();
`void
`post host_add();
`void
`post h0st_1'ef();
`void
`post host_wr (int h0st_wr_d);
`void
`post host _rd 0;
`void
`post go_tile 0;
`/**********#*$****
`void
`post ld_1add(intl0 radd_d);
`void
`post ld_dram_1ow();
`void
`post ld_dranLc0l();
`
`post dram_wr (intlO dram_wr_d);
`void
`xfun dram_end();
`int
`void post refO;
`void
`post ld_ctr (intlO ld_ctr_d);
`intlO
`post xfun rd_cn'();
`
`********$***I
`
`6
`
`
`
`9
`
`5 ,652,875
`
`-continued
`
`10
`
`Listing 1
`
xfun inc_reg(int10 inc_reg_d);
int10
post inc_ctr();
void
post ctr_pos();
bit
post lbus_rd();
int8
post lbus_wr (int8 lbus_wr_d);
void
/****************************************/
int10 uctr nctr;
int4 lfun;
/* HOST CPU */
/*
OPCODE FIELDS
NAD=next instruction
RAD=register address
RRD=register is read onto bus when instruction is called
RWR=register written from bus when function returns
WAIT=wait for called function to return
RET =return to caller
FUN0=call ld_radd();
FUN1=call ld_dram_row();
FUN2=call ld_dram_col();
FUN5=call dram_wr();

INST NAD :RET :WAIT :RAD :RRD :RWR :FUN0 :FUN1 :FUN2 :FUN3 :FUN4 :FUN5
                                    radd   row   col   end    rd    wr
   0   0    0     0    0    0    0     0     0     0     0     0     0
   1   2    0     0    1    0    1     0     0     0     0     0     0
   2   0    1     0    2    0    1     0     0     0     0     0     0
   3   4    0     0    2    1    0     1     0     0     0     0     0
   4   5    0     0    1    1    0     1     1     0     0     0     0
   5   6    0     1    0    0    1     0     0     1     0     1     0
   6   0    1     1    1    0    1     0     0     0     1     0     0
   7   8    0     0    2    1    0     1     0     0     0     0     0
   8   9    0     0    1    1    0     1     1     0     0     0     0
   9  10    0     0    0    0    0     0     0     1     0     0     1
  10   0    1     1    1    0    1     0     0     0     1     0     0
*/
/* HOST CPU */
host()
{
/* allocate a register file of 3 regs */
int hd,hx,hy;
host_add:
    hx=host_d;                          /* inst 1 */
    return;         :hy=host_d;         /* inst 2 */
host_rd:
    ld_radd(hy);                        /* inst 3 */
    ld_radd(hx);    :ld_dram_row();     /* inst 4 */
    hd=dram_rd();   :ld_dram_col();     /* inst 5 */
    return;         :hx=dram_end();     /* inst 6 */
host_wr:
    ld_radd(hy);                        /* inst 7 */
    ld_radd(hx);    :ld_dram_row();     /* inst 8 */
    dram_wr(hd);    :ld_dram_col();     /* inst 9 */
    return;         :hx=dram_end();     /* inst 10 */
}
/* BIT BLT TILE CPU */
/*
OPCODE FIELDS
NAD=next instruction
CAD=conditional instruction
RAD=register address
RRD=register is read onto bus when instruction is called
RWR=register written from bus when function returns
WAIT=wait for called function to return
RET =return to caller
FUN0 =ld_ctr();
FUN1 =rd_ctr();
FUN2 =ctr_pos();
FUN3 =ld_radd();
FUN4 =ld_dram_row();
FUN5 =dram_rd();
FUN6 =inc_ctr();
FUN7 =lbus_wr();
FUN8 =ld_dram_col();
FUN9 =dram_wr();
FUN10=lbus_rd();
FUN11=dram_end();
FUN12=inc_reg();
FUN13=i11C_IBgO;
INST NAD :CAD :RET :WAIT :RAD :RRD :RWR :FUN0 :FUN1 :FUN2 :FUN3 :FUN4 :FUN5 ...
`
`7
`
`
`
`Listing 1
`
`11
`
`5,652,875
`
`-continued
`
`OPHOOOOOOOOl-‘OHOO
`
`OQOOOOOOQOOOOOOO
`
`OOOOOOOOOOOOOOOO
`
`OOOOOOOOQOOOQOOO
`
`12
`
`OOOOOOOOOQOOOOOO
`
`OOOOOOOOOOOOOOOO
`
`oooooooooooboooo
`
ld_dram_col();

:return;

/* LOW LEVEL FUNCTIONS */
inc_reg()

return(nctr);

ctr_pos()

who

return(nctr.9);
`
`8
`
`
`
`13
`
`5,652,875
`
`-continucd
`
`14
`
`Listing 1
`
/****************************************/

if(mbus_slave)

: return(host_regs);
}
if( regJd)

: return(tile_regs);
`
`9
`
`
`
`15
`
`5,652,875
`
`-continued
`
`16
`
`Listing 1
`
`}
`}
`grad4_wr_dram()
`if(mbus_slave)
`
`}
`grad4_rd__dram()
`if(mbus_slave)
`retum(host_regs);
`host_rd();
`
`}
`sradLwfreshO
`
`/ ___ FlX LOCATION OF REGISTERS
`10c host_.rad ={ 0xl2,0xl2, 0x13,0xl3 };
`10c host_regs={ 0x22,0x22, 0x23,0x23, 0x24,0x24, 0x25,0x25, 0x26,0x26,
`0x27,0x27, 0x28,0x28, 0x29,0x29 };
`
`loc tile_regF{ 0x42,0x42, 0x43,0x43, 0x44,0x44, 0x45,0x45, Ox46,0x46};
`1* 100 radd
`= 0x36; */
`weight xio=99;
`weight mbus_data=99;
`
`,
`
`DEN'D
`
`What is claimed is:
1. A method of designing a CPU for implementation in a configurable hardware device, said method comprising:
providing a logic scheme,
providing a configurable hardware device,
identifying a plurality of operations in a logic scheme which are suitable for implementation in said configurable hardware device,
identifying an executable function in said logic scheme,
identifying a parameter, if any, required for said executable function,
identifying the logic flow in said scheme and providing for at least two connected, configurable system resources, each of selected size, to implement said logic scheme,
selecting an op code of configurable, selected size to control said executable function, and
providing a way to implement in configurable hardware:
each of said connected, configurable system resources,
a means to provide any needed parameter to said executable function,
a means to pass said op code to said system resources, and
a means to call said executable function, where said op code is selected from the group consisting of
an op code to invoke at least one of said system resources and implement said logic scheme,
an op code to pass a parameter to said executable function, and
an op code to invoke said executable function.
2. The method of claim 1 wherein said plurality of operations is inherently sequential.
3. The method of claim 1 wherein said plurality of operations is inherently parallel.
4. The method of claim 1 further comprising providing a bus line to pass said parameter.
5. The method of claim 4 further comprising identifying the logic flow in said scheme and providing for at least two connected system resources to implement said logic scheme.
6. The method of claim 1 further comprising executing said function.
7. The method of claim 1 wherein said plurality of operations includes a first and a second of said operations capable of executing simultaneously.
8. The method of claim 1 wherein said plurality of operations includes a plurality of said operations capable of executing simultaneously.
`
`*****
`
`10