United States Patent [19]
Taylor

US005652875A
[11] Patent Number: 5,652,875
[45] Date of Patent: Jul. 29, 1997

[54] IMPLEMENTATION OF A SELECTED INSTRUCTION SET CPU IN PROGRAMMABLE HARDWARE

[75] Inventor: Brad Taylor, Oakland, Calif.

[73] Assignee: Giga Operations Corporation, Berkeley, Calif.

[21] Appl. No.: 449,563
[22] Filed: May 24, 1995

Related U.S. Application Data
Continuation of Ser. No. 08/127,859, filed Sep. 27, 1993, now abandoned.

[51] Int. Cl.6: G06F 9/00; G06F 15/177
[52] U.S. Cl.: 395/500; 364/229.4; 364/232.8; 364/240; 364/281.3; 364/262.1; 364/DIG. 1
[58] Field of Search: 395/375, 800, 500

[56] References Cited
U.S. PATENT DOCUMENTS
4,037,094    7/1977   Vandierendonck     364/716
4,250,545    2/1981   Blahut et al.      395/375
4,422,141   12/1983   Shoji              395/375
5,042,004    8/1991   Agrawal et al.     395/275
5,175,824   12/1992   Soderberg et al.   395/325
5,208,490    5/1993   Yetter             307/452

Primary Examiner: Kevin J. Teska
Assistant Examiner: Ayni Mohamed
Attorney, Agent, or Firm: Adam H. Tachner; Crosby, Heafey, Roach & May
[57] ABSTRACT

A method of designing a CPU for implementation in a configurable hardware device by identifying a series of operations in a logic scheme which are suitable for implementation in the device, identifying an executable function and any needed parameters in the logic scheme, identifying the logic flow in the scheme, providing for at least two connected system resources to implement the logic scheme, selecting an op code, and providing a way to implement the various components needed to call and execute the function according to the logic scheme. A useful op code may invoke a system resource, implement the logic scheme, pass a parameter to the function, or invoke the function. The configurable hardware system can function as a CPU, using logic resources including a next address RAM, one or more registers, a function execution controller, and one or more busses for passing signals and data between the components and functions.

8 Claims, 1 Drawing Sheet

[FIG. 1: block diagram of the downloadable CPU. Labels recoverable from the drawing: CALL, CALL_INST and NEXT INST inputs (lines 11, 12) to the instruction multiplexer (INST MUX) 10; address RAM 20; instruction address bus (INST_ADDR) 70; register address RAM 40 with RAD, RWR, RRD and REG ADDR signals; function execution controller 30 with FUN CALLS, FUN RETURN, RET and WAIT signals; registers (REGS) 60; I/O BUS; functions (FUNS), including FUN 1 (31).]

`EX 1023
`IPR of Pat. No. 6,892,304
IMPLEMENTATION OF A SELECTED INSTRUCTION SET CPU IN PROGRAMMABLE HARDWARE

This is a continuation of application Ser. No. 08/127,859, filed on Sep. 27, 1993, now abandoned, entitled IMPLEMENTATION OF A SELECTED INSTRUCTION SET CPU IN PROGRAMMABLE HARDWARE.

FIELD OF THE INVENTION

This invention relates to a method of implementing a selected instruction set central processing unit in a programmable hardware device and a system with such a CPU.

BACKGROUND OF THE INVENTION

Most computational or process control tasks can be subdivided into a series of relatively simple steps or decisions. An engineer can analyze a task and design a logic scheme which can be implemented in a custom-designed circuit such as an ASIC or in a general purpose CPU such as an 8086 or Z-80. In a traditional implementation, a general purpose CPU can perform a wide variety of functions but must operate in a synchronous manner, processing only one step at a time. An ASIC can perform a variety of functions simultaneously and to a large extent asynchronously, but an ASIC can only implement the scheme for which it was designed. Thus an ASIC provides high speed but only for a specific functionality, and a general purpose CPU provides great flexibility but at limited speed.

A central processing unit or "CPU", as used in this disclosure, is taken to mean a von Neumann machine. A minimum von Neumann machine can read input, write output which is dependent on the input, and includes both invert and add functions. These principal components and functions provide the basis for some very sophisticated devices but each is still a CPU.

A logic scheme can also be implemented in a programmable logic device (PLD) such as a field programmable gate array (FPGA). PLDs are available from several manufacturers, including Xilinx, AT&T, Altera, Atmel and others. In general, a PLD can operate much faster than a general purpose CPU and can handle asynchronous processes. A PLD is rarely as fast as an ASIC but does allow changes to the logic scheme.

The field of configurable hardware is well known and has been the subject of intense engineering development for the last few years. See in general the background section of co-pending, commonly-assigned U.S. patent application Ser. No. 07/972,933, filed Nov. 5, 1992, and now abandoned, entitled "SYSTEM FOR COMPILING ALGORITHMIC LANGUAGE SOURCE CODE INTO HARDWARE," which is incorporated in full herein by reference.

Previous implementations of PLDs, however, generally have been used only for a specific logic scheme which is changed only if the logic needs to be redesigned. Up until now, PLDs have been used to implement logical functions without using op codes, since providing logical resources to interpret and execute op codes takes up precious resources.

A typical PLD application is as a monitor controller on a video board. As new monitors are released on the market or as the board designer develops an improved algorithm, the vendor may release revised configuration software for the PLD. This software can be distributed through traditional channels such as downloading from a bulletin board or asking the user to visit a distributor to have the new configuration loaded. Such revisions are relatively infrequent, possibly months or years apart.

An alternative way to implement logic functions is to load functions in hardware only as needed. This is particularly useful when certain functions are needed only rarely and only a limited amount of hardware resources are available. Thus the engineer would prefer to use the available resources for the most frequently used functions. Using techniques well known in cacheing schemes, the engineer can provide a variety of functions, each in a form ready to load as a configuration file, and load or unload functions as needed.

Until this time, engineers have only begun to consider the possibility of "cacheing" functions as hardware configurations. The present method provides a way to implement a wide variety of functions in programmable hardware devices.

SUMMARY OF THE INVENTION

This invention provides a method and system for designing and implementing a selected instruction set (SISC) CPU in programmable hardware. The method involves identifying a series of operations in a logic scheme which are suitable for implementation in the device, identifying an executable function and any needed parameters in the logic scheme, identifying the logic flow in the scheme, providing for at least two connected system resources to implement the logic scheme, selecting an op code, and providing a way to implement the various components needed to call and execute the function according to the logic scheme. A useful op code may invoke a system resource, implement the logic scheme, pass a parameter to the function, or invoke the function. The configurable hardware system can function as a CPU, using logic resources including a next address RAM, one or more registers, a function execution controller, and one or more busses for passing signals and data between the components and functions.

This CPU can initiate additional functions which may be quite complex. In general, the architecture can be used to implement a variety of related functions and provide a compact, fast and economical system.

One object of the invention is to provide a method of analyzing a logic scheme and implementing it in configurable hardware as a CPU which processes op codes to initiate and coordinate any necessary functions.

Another object of the invention is to provide a specific instruction set CPU.

Yet another object of the invention is to provide a system to process multiple programs in the specific instruction set CPU.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a preferred configuration of a downloadable CPU.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Many computational or process control tasks fall into one of two classes: inherently sequential or inherently parallel. A sequential task, such as controlling a DRAM, follows a linear thread and executes only one step at a time.

An inherently parallel task, such as inverting an image or searching certain databases, may involve manipulating a number (often a very large number) of data elements in more or less the same way. Such a process can often be implemented as a sequential process, but since the results of manipulating one set of data elements and manipulating another set of data elements are independent, the manipulations also can be done simultaneously. A parallel task is multithreaded and multiple threads can be launched simultaneously. A thread, in turn, may launch additional threads.

Referring to FIG. 1, a simple CPU can be implemented with instruction multiplexer 10, address RAM 20, function execution controller 30, and register address RAM 40, all connected by instruction address bus 70. Each component is discussed below in more detail.

In a preferred implementation each of 20, 30, 40, and 60 is a register, RAM or other memory portion. A programmable logic device such as a Xilinx 4003 can easily be configured to include one or more such memory components. Since the CPU is designed to implement a specific logic scheme, each portion can be scaled as needed. For example, the available (or needed) address space for one implementation may be fully addressable using only 2 bits, in which case instruction address bus 70 and the output of address RAM 20 can be two bits wide, but a different implementation might require a larger address, in which case the address RAM output and the bus must be correspondingly larger.

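As a rough illustration of this scaling, the width of the instruction address bus follows from the number of instruction locations that must be reachable. The sketch below is in C; the helper name `address_bits` and the 65,536-location upper limit are assumptions made for the example, not part of the disclosed design.

```c
#include <assert.h>

/* Hypothetical helper: number of address bits needed so that every one of
 * n_instructions locations can be addressed. */
unsigned address_bits(unsigned n_instructions)
{
    /* The upper bound is an arbitrary limit chosen for this sketch. */
    assert(n_instructions >= 1 && n_instructions <= 65536u);

    unsigned bits = 0;
    /* Widen the address until 2^bits covers the whole address space. */
    while ((1u << bits) < n_instructions)
        bits++;

    /* A single-entry store still gets one address wire in this sketch. */
    return bits ? bits : 1;
}
```

For a four-location program this yields the two-bit bus mentioned above; for the eleven instructions (0 through 10) of the I/O ping program in Listing 1 it yields four bits.
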
In a preferred implementation, address RAM 20 is used to hold the address of the next instruction to be executed. This provides a simple mechanism to execute a sequential program and allows for complex branching on conditions. The size of address RAM 20 can be selected to accommodate the largest address needed in a specific implementation. One means of handling a branch on condition is to provide a primary instruction store and, wherever a branch can occur, an alternate instruction store, so that a branch on condition can be taken by simply evaluating the condition and selecting one instruction store if the condition is true and the other instruction store if the condition is false.

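The two-store branching scheme just described can be sketched in C as follows. This is only an illustrative model: the type `next_addr_ram`, the program size `N_INST`, and the choice that a true condition selects the alternate store are all assumptions, since the text does not say which store corresponds to which outcome.

```c
#include <assert.h>
#include <stdbool.h>

#define N_INST 4   /* illustrative program size (assumption) */

/* Two next-address stores, indexed by the current instruction number. */
typedef struct {
    unsigned char primary[N_INST];    /* used when the condition is false */
    unsigned char alternate[N_INST];  /* used when the condition is true  */
} next_addr_ram;

/* Select the next instruction address by evaluating the condition and
 * reading the corresponding store. */
unsigned next_instruction(const next_addr_ram *ram, unsigned current,
                          bool condition)
{
    assert(current < N_INST);
    return condition ? ram->alternate[current] : ram->primary[current];
}
```

Where no branch can occur at an instruction, both stores would simply hold the same next address, so the condition has no effect there.
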
Function execution controller 30 can be configured with any number of bits, allocating one bit position for each of n functions to be executed under control of the CPU. The exact number of functions needed for any specific implementation will, in general, vary and the size of function execution controller 30 may take an integer value corresponding to the number of needed functions.

Separate circuits are provided for function 1 31, function 2 (not shown), and so on through function n (not shown). Each function may be partially or, preferably, completely implemented in configurable logic of a PLD. Each function may be independently connected to registers 60 or a constant store (not shown) as needed. Alternatively, some or all functions and appropriate resources can be connected together through a bus such as I/O bus 80. Thus each function can access any needed parameters, e.g. from registers, or global constants, e.g. from a constant store. When appropriate, a function can be connected to an external signal such as an external bus or an interrupt line. In a preferred implementation, for every function that must return a value or indicate completion, a "return" signal is connected to multi-input OR gate 35.

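A minimal software model of the one-bit-per-function call field and the OR'ed return lines might look like the following C sketch. The names `call_mask`, `any_return` and `may_advance`, the six-function limit, and the way the OR output gates the sequencer are assumptions for illustration; the hardware itself is a set of wires and a gate, not function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_FUNCS 6u   /* illustrative: six function lines (assumption) */

/* Build the call field: one bit per function to be launched together. */
uint32_t call_mask(const unsigned *ids, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        assert(ids[i] < N_FUNCS);
        mask |= (uint32_t)1 << ids[i];
    }
    return mask;
}

/* Model of the multi-input OR gate: high when any return line is high. */
bool any_return(uint32_t return_lines)
{
    return return_lines != 0;
}

/* Assumed use of the OR output: the sequencer steps on unless it has been
 * told to wait and no function has signalled its return yet. */
bool may_advance(bool wait_bit, uint32_t return_lines)
{
    return !wait_bit || any_return(return_lines);
}
```

Because each function owns its own bit, launching several functions at once is just a matter of setting several bits in the same word.
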
Register address RAM 40 is preferably a look-up table with pointers to actual registers 60. In general, any register can be accessed directly, thereby providing a store of local variables. By comparison, in the "C" software language, it is common to have a stack of local variables, which can be stored in registers or held in a stack. These registers provide a convenient means for passing parameters when invoking or returning from a function.

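The look-up-table indirection can be pictured with a small C sketch. The structure `reg_file`, the sizes `N_REGS` and `LUT_SIZE`, and the accessor names are hypothetical and chosen only to mirror the description (a small register address selecting, through a table, one of a few physical registers).

```c
#include <assert.h>

#define N_REGS   3   /* illustrative number of physical registers */
#define LUT_SIZE 4   /* entries reachable with a 2-bit register address */

typedef struct {
    int regs[N_REGS];             /* the actual registers */
    unsigned char lut[LUT_SIZE];  /* register address -> register index */
} reg_file;

/* Read the register selected through the look-up table. */
int reg_read(const reg_file *rf, unsigned rad)
{
    assert(rad < LUT_SIZE && rf->lut[rad] < N_REGS);
    return rf->regs[rf->lut[rad]];
}

/* Write the register selected through the look-up table. */
void reg_write(reg_file *rf, unsigned rad, int value)
{
    assert(rad < LUT_SIZE && rf->lut[rad] < N_REGS);
    rf->regs[rf->lut[rad]] = value;
}
```

Changing a table entry re-targets a register address without touching the instructions that use it, which is one plausible reason for preferring a table over hard-wired register numbers.
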

Instruction multiplexer 10 may be connected to external devices through lines 11, which may be a system bus, e.g. an ISA bus. Instruction multiplexer 10 as shown is also connected to address RAM 20 through line 12. This allows ready implementation of branch commands such as:

    If (condition)
    Then execute next instruction
    Else skip to following instruction

If evaluation of a function (not shown) requires a skip, this condition can be signalled on condition line 21 and forwarded from address RAM 20 over line 12 to instruction multiplexer 10, which then discards the pending instruction and provides the next instruction in sequence.

Using this general structure a wide variety of functions can be implemented. One particular advantage of this type of structure is that the general CPU sequencer structure can be used with a great many functions. This is particularly useful, for example, for controlling nested "for" or "while" loops. The specific functions called by the loop may vary widely, but the present structure provides a mechanism for starting, controlling, and returning from such functions.

Another particular advantage is that it is relatively easy to design new op codes in software but relatively time consuming to design and configure the illustrated CPU. Since many functions are invoked with a single bit over a single line, it is easy to provide a large number of functions in configurable hardware and connect or disconnect them to correspond to a current set of op codes. Using downloadable, reconfigurable hardware such as a Xilinx 4003, a new set of software op codes can be loaded in microseconds while it requires several milliseconds to completely reconfigure the CPU device.

Thus, if a large program would benefit from two or more CPUs of this general design but one implementation requires 14 op codes and calls 7 functions and another implementation requires only 3 op codes but calls 12 functions, it is easier to design one CPU that can handle 14 op codes (probably designed for 16) and between 12 and 19 function call lines (depending on the specific nature of the functions and any possible overlap) and then load only the op codes needed to operate the CPU during a corresponding phase of the operation of the part.

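The arithmetic behind the "14 op codes and between 12 and 19 function call lines" figure can be reproduced with a short C sketch. The types `cpu_need` and `cpu_plan` and the function `merge_needs` are hypothetical; the rule they encode (op codes are loaded per phase, so the larger count suffices, while function lines range from full overlap to no overlap) is an interpretation of the paragraph above.

```c
#include <assert.h>

typedef struct { unsigned op_codes, functions; } cpu_need;
typedef struct { unsigned op_codes, min_fun_lines, max_fun_lines; } cpu_plan;

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Size one shared CPU for two implementations that run in different phases. */
cpu_plan merge_needs(cpu_need a, cpu_need b)
{
    assert(a.op_codes > 0 && b.op_codes > 0);

    cpu_plan p;
    /* Only one op code set is loaded at a time: the larger set fixes the size. */
    p.op_codes = max_u(a.op_codes, b.op_codes);
    /* Complete overlap of functions: the larger function count is enough. */
    p.min_fun_lines = max_u(a.functions, b.functions);
    /* No overlap at all: every function needs its own call line. */
    p.max_fun_lines = a.functions + b.functions;
    return p;
}
```

With the numbers from the text (14 op codes and 7 functions, 3 op codes and 12 functions) this gives 14 op codes and a range of 12 to 19 function call lines.
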
A fairly compact CPU can be designed using fewer than 16 instructions. If additional instructions are needed, a larger CPU can be implemented using the principles disclosed herein. Alternatively, an appropriate portion of the configuration can be stored and loaded as necessary.

The system, and method of designing such a system, can be better understood by considering the following.

The basic implementation of a CPU provides logical resources, suitably connected, to be able to execute a series of instructions. A designer must select what instructions can be executed by the CPU and provide for execution of each instruction. In a preferred implementation, instructions include an op code and instructions are stored in memory which may be but need not be part of the CPU. In general, the actual function is implemented by additional circuitry, which is not shown in the figure.

A very simple CPU may be implemented with even fewer components than the system shown in FIG. 1. The specific implementation is based on the specific functions that must be executed. A trivial function might be

    Register_1=x

or, slightly more complicated,

    x=fn(x).

In general, an instruction set can be constructed with an arbitrary number of bit fields. A variety of op codes can be implemented, but preferably center on program control functions such as low level reads and writes, execution control, branch on conditional and such. Since multiple CPUs can be implemented and linked, each CPU can be optimized to execute a specific set of logic instructions and only the op codes needed to execute those instructions need be provided. In general, any computational functions are implemented separately and do not need an op code.

Listing 1 includes actual C language code that was implemented in the CPU illustrated in FIG. 1. Reference to that listing may assist the reader in understanding the present invention. The listing includes two separate code sections: 1) a simple input/output "ping" and 2) a bit blt tile CPU. For each CPU, the listing includes variable declarations, public functions, CPU functions, code for each function called by the CPU ("low level functions") and code for the actual CPU ("host()") plus extensive comments describing the opcode and bit fields and a table showing a control program to be executed.

In the "I/O Ping" implementation, op codes include:

    NAD   A bit field for the next instruction to execute (4 bits)
    RET   Return from function (1 bit)
    WAIT  Wait (suspend other execution) until called function returns (1 bit)
    RAD   Register address (preferably a look up table entry, e.g. 2 bits)
    RRD   Read register (load content of selected register on system I/O bus 80) (1 bit)
    RWR   Write register (load selected register from system I/O bus 80) (1 bit)
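
To make the field widths listed above concrete, the following C sketch packs them into a single 16-bit word together with six function call bits. The bit positions, the struct `ping_inst`, and the function names are assumptions for illustration; the patent gives the field widths but not a physical bit order. The example values are those of instruction 7 of the host write routine (NAD 8, RAD 2, RRD 1, function 0 called).

```c
#include <assert.h>
#include <stdint.h>

/* One decoded instruction of the I/O ping CPU (field widths from the text;
 * the 6-bit fun field holds one call bit per function FUN0..FUN5). */
typedef struct {
    unsigned nad;   /* 4 bits: next instruction address */
    unsigned ret;   /* 1 bit : return from function */
    unsigned wait;  /* 1 bit : wait for called function to return */
    unsigned rad;   /* 2 bits: register address (look-up table entry) */
    unsigned rrd;   /* 1 bit : read register onto the I/O bus */
    unsigned rwr;   /* 1 bit : write register from the I/O bus */
    unsigned fun;   /* 6 bits: function call lines */
} ping_inst;

/* Pack the fields into a word; the bit order here is an assumed layout. */
uint16_t ping_encode(ping_inst in)
{
    assert(in.nad < 16 && in.ret < 2 && in.wait < 2 && in.rad < 4);
    assert(in.rrd < 2 && in.rwr < 2 && in.fun < 64);
    return (uint16_t)(in.nad | (in.ret << 4) | (in.wait << 5) |
                      (in.rad << 6) | (in.rrd << 8) | (in.rwr << 9) |
                      (in.fun << 10));
}

/* Unpack a word produced by ping_encode(). */
ping_inst ping_decode(uint16_t w)
{
    ping_inst out;
    out.nad  = w & 0xFu;
    out.ret  = (w >> 4) & 1u;
    out.wait = (w >> 5) & 1u;
    out.rad  = (w >> 6) & 3u;
    out.rrd  = (w >> 8) & 1u;
    out.rwr  = (w >> 9) & 1u;
    out.fun  = (w >> 10) & 0x3Fu;
    return out;
}
```

Explicit shifts and masks are used instead of C bit-fields because the layout of bit-fields is implementation-defined, which would make the packed word differ between compilers.
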

In the preferred implementation, functions are launched by setting a bit in the op code corresponding to that function. In the I/O ping example, function_0 is "ld_radd()" and function_5 is "dram_wr()". The comments under "OPCODE FIELDS" show the instruction words that will be compiled from the native C code. Compare the code for "host()". For example, the function "host write" begins with instruction 7 and completes with instruction 10. The "C" code for instruction 7 is

    ld_radd(hy);

and the corresponding instruction is:

    Inst  NAD  RET  WT  RAD  RRD  RWR  Fn0  Fn1  Fn2  Fn3
    7     8    0    0   2    1    0    1    0    0    0

which corresponds to

    select register 2 (entry in LUT in 40)
    read the contents of the register and load onto I/O bus 80
    execute function 0 (which in turn is defined under low level functions)
    and then execute instruction 8.

Instruction 8, which follows, launches two simultaneous functions: ld_dram_row() {Fun 1} and another call to ld_radd(hx) {function 0, but now selecting register 1}. If appropriate, each function could be launched simultaneously.

The bit blt CPU uses the same basic structure, but includes more functions and more complex instructions. The listing for the bit blt tile CPU adds an opcode

    CAD   Conditional instruction (4 bits)

and implements 14 functions for execution on demand.

When a traditional C program is compiled for operation on a conventional CPU such as a 486 or a 68040, the compiler generates code that places function calls on a stack in memory. To pass parameters to a function, the parameters are loaded on a stack where they can be popped by the function. The calling function then waits for a "return" from the called function before resuming operation.

The present invention allows a similar scheme to be implemented in hardware. However, the hardware implementation is more powerful than the traditional software implementation, since the traditional software runs on hardware that can in general handle only one instruction per execution cycle (although some pipelining in faster CPUs provides some speed benefit). With the present hardware implementation, any number of functions can be launched on a single execution cycle. The only limitation is that the programmer (or compiler) must be careful to identify any data dependencies between function calls that would necessitate operation in a certain order or relative time frame.

Extensive parallelism can be achieved using the present implementation, but the programmer (or compiler) must also identify any competing calls to shared system resources and add or exert program flow control to reduce or eliminate inconsistent attempts to simultaneously access system resources. One system resource of particular interest is a system bus, such as the I/O bus 80 in FIG. 1. In general, only one device or function should write data to the bus at any point in time. However, it is possible and sometimes desirable for more than one device or function to read data from the bus at any particular time. This might be useful, for example, to load identical constants into multiple functions, especially for initializing. If implemented properly, there should be few, preferably no, attempts by different devices or functions to seize control of the bus simultaneously.

The method and system described above can be very useful in a more comprehensive hardware implementation of algorithmic language code. An original source code program in a language such as "C" can be compiled to identify functionalities appropriate for implementation in the system described above and then loaded as a configuration in a PLD. See, in general, the methods and systems described in co-pending, commonly-assigned U.S. patent application Ser. No. 07/972,933, filed Nov. 5, 1992, and now abandoned, entitled "SYSTEM FOR COMPILING ALGORITHMIC LANGUAGE SOURCE CODE INTO HARDWARE," which is incorporated in full herein by reference.

The present system is particularly useful when used in conjunction with an automatic compiler that analyzes a given input source code to first identify functions or portions of functions that can be usefully implemented using the present system. In particular, most "for" or "while" loops are likely candidates for implementation as a specific instruction set CPU. As described in detail above, the management of the loop is generally suitable for the new CPU while the actual functions executed in the loops can be implemented separately using a variety of known techniques. Other inherently sequential operations may also be suitable for implementation using the present system.
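
The shared-bus discipline discussed earlier in this description (at most one writer at a time, any number of readers) is the kind of check a programmer or compiler could apply before launching several functions in the same cycle. The C sketch below is one hypothetical way to express it; the function name `launch_is_bus_safe` and the per-function `writes_bus` table are assumptions, not part of the disclosed system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a set of functions that would be launched together.
 * writes_bus[i] is true when function i drives the shared bus;
 * call_bits has bit i set when function i is launched this cycle.
 * Readers are not restricted, so only writers are counted. */
bool launch_is_bus_safe(const bool *writes_bus, unsigned n_funs,
                        uint32_t call_bits)
{
    assert(n_funs <= 32);

    unsigned writers = 0;
    for (unsigned i = 0; i < n_funs; i++) {
        if (((call_bits >> i) & 1u) && writes_bus[i])
            writers++;
    }
    /* Safe when no more than one launched function writes the bus. */
    return writers <= 1;
}
```

A launch that fails this check would need added flow control, for example splitting the calls across two instructions.
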

Such a compiler should identify a loop or series of sequential operations, then collect information about each function called within that code segment, and any parameters required by each function. If the number of functions is not large, preferably less than about 16, then the compiler should identify a suitable set of op codes for passing any needed parameters to each function and for interlocking functions, as needed. Each function plus the specific instruction set CPU can be compiled and implemented in an appropriate section of programmable hardware, together with any needed connections.

A general description of the system and method of using the present invention as well as a preferred embodiment of the present invention has been set forth above. One skilled in the art will recognize and be able to practice additional variations in the methods described and variations on the device described which fall within the teachings of this invention. The spirit and scope of the invention should be limited only as set forth in the claims which follow.

`Listing 1
`
`PGA
`part: "4003PC84-6";
`
`pio
`pio
`pin
`
`I
`int
`int4
`int4
`bit
`bit
`bit
`bit
`bit
`bit
`void
`bit
`i11t10
`int8
`intlO
`bit
`bit
`bit
`
`/
`I0 PINS
`pio mbus_data:{15,16,17,18, 19,20,23,24, 5,26,27,28, 36,37,38,41};
`mbus_nfun;
`mbus_flm={78,57,35,39};
`fast mbus_ok=40;
`mbus_select=71;
`mbus_nok;
`mbus_master,
`mbus_slave;
`dram_slave;
`mbus_return();
`reg_se1;
`uctr radd;
`pio dout dmm_data={77,70,69,68,67,66,65,62};
`pin fast dranL_add1:{79,80,8l,82,83,84, 3, 4, 5, 6};
`ras ,cas ,wr ,rd ;
`pin cas_;
`pin ras_,wr_,\'d_;
`loc ras_= 9;
`Ice cas_=14;
`loc w1'_ = 8;
`loc rd_ = 7;
`nclk mas;
`bit
`pin test0=72;
`bit
`pin test1:51;
`bit
`pin test2=E;
`bit
`int8 pio 1bus_data={61,60,59,58, 56,50,49,48};
`int4 pio lbus_fun ={47,46,45,44};
`/****************
`mt
`mfun grad4_rd_reg
`void
`mfun grad4_wr_reg
`void
`mfun grad4_wr_addr
`void
`mfun grad4_wr_dram
`int
`mfun grad4 rd dram
`void
`mfun grad4_refresh
`void
`mfun grad4_slave
`void
`mfun gpu_go_tile
`void
`mfun gpu_wr_add.r
`I
`CPUS
`void
`cpu host(int hosLd);
`
`#*******#**=I*************/
`)=0xf;
`(int al
`)=0xe;
`(int :10 ,d0
`(int mod,alo,ahi)=0xb;
`(int data
`)=0xa;
`(
`)=0x9;
`(
`)=0x4;
`(int id
`)=0x1;
`(
`)=0x7;
`(int wraO,wra1,wra2,wra3)=0x8;
`1'
`
`FUNC'I'IONS********************** ****#**/
`
`/*******#**# *****
`void main();
`void
`post host_add();
`void
`post h0st_1'ef();
`void
`post host_wr (int h0st_wr_d);
`void
`post host _rd 0;
`void
`post go_tile 0;
`/**********#*$****
`void
`post ld_1add(intl0 radd_d);
`void
`post ld_dram_1ow();
`void
`post ld_dranLc0l();
`
`post dram_wr (intlO dram_wr_d);
`void
`xfun dram_end();
`int
`void post refO;
`void
`post ld_ctr (intlO ld_ctr_d);
`intlO
`post xfun rd_cn'();
`
`********$***I
`
`
`xfun inc_rcg(int10 inc__reg_d);
`intlO
`post inc_cn'();
`void
`post CiI_.P°50;
`bit
`post 1bus_1'd O;
`im8
`post lbus_wr (int8 lbus_wr_d);
`void
`/*******************$*
`*******#****=I***********l
`intlO
`uctr nctr,
`int4
`lfun;
/* HOST CPU */
/*
OPCODE FIELDS
NAD  = next instruction
RAD  = register address
RRD  = register is read onto bus when instruction is called
RWR  = register written from bus when function returns
WAIT = wait for called function to return
RET  = return to caller
FUN0 = call ld_radd();
FUN1 = call ld_dram_row();
FUN2 = call ld_dram_col();
FUN5 = call dram_wr();

                                   radd  row   col   end   rd    wr
INST  NAD  RET  WAIT  RAD  RRD  RWR  FUN0  FUN1  FUN2  FUN3  FUN4  FUN5
0     0    0    0     0    0    0    0     0     0     0     0     0
1     2    0    0     1    0    1    0     0     0     0     0     0
2     0    1    0     2    0    1    0     0     0     0     0     0
3     4    0    0     2    1    0    1     0     0     0     0     0
4     5    0    0     1    1    0    1     1     0     0     0     0
5     6    0    1     0    0    1    0     0     1     0     1     0
6     0    1    1     1    0    1    0     0     0     1     0     0
7     8    0    0     2    1    0    1     0     0     0     0     0
8     9    0    0     1    1    0    1     1     0     0     0     0
9     10   0    0     0    0    0    0     1     0     0     0     1
10    0    1    1     1    0    1    0     0     0     1     0     0
*/
/* HOST CPU */
host()
{
    /* allocate a register file of 3 regs */
    int hd,hx,hy;
host_add:
    hx=host_d;                          /* inst 1 */
    return;        :hy=host_d;          /* inst 2 */
host_rd:
    ld_radd(hy);                        /* inst 3 */
    ld_radd(hx);   :ld_dram_row();      /* inst 4 */
    hd=dram_rd();  :ld_dram_col();      /* inst 5 */
    return;        :hx=dram_end();      /* inst 6 */
host_wr:
    ld_radd(hy);                        /* inst 7 */
    ld_radd(hx);   :ld_dram_row();      /* inst 8 */
    dram_wr(hd);   :ld_dram_col();      /* inst 9 */
    return;        :hx=dram_end();      /* inst 10 */
}
/* BIT BLT TILE CPU */
/*
OPCODE FIELDS
NAD   = next instruction
CAD   = conditional instruction
RAD   = register address
RRD   = register is read onto bus when instruction is called
RWR   = register written from bus when function returns
WAIT  = wait for called function to return
RET   = return to caller
FUN0  = ld_ctr();
FUN1  = rd_ctr();
FUN2  = ctr_pos();
FUN3  = ld_radd();
FUN4  = ld_dram_row();
FUN5  = dram_rd();
FUN6  = inc_ctr();
FUN7  = lbus_wr();
FUN8  = ld_dram_col();
FUN9  = dram_wr();
FUN10 = lbus_rd();
FUN11 = dram_end();
FUN12 = inc_reg();
FUN13 = inc_reg();

INST NAD :CAD :RET :WAIT :RAD :RRD :RWR :FUN0 :FUN1 :FUN2 :FUN3 :FUN4 :FUN5 ...
[The rows of this instruction table are illegible in the source scan.]
*/
`
`1d_dram_col();
`
`:return;
`
`/= Low LEVEL FUNCTIONS :*/
`inc_xeg()
`
`retum(nctr);
`
`'I-POSO
`
`who
`
`return(nclr.9);
`
`
`/**#******$Ik*******lll**
`
`************************l
`
if(mbus_slave)

: return(host_regs);
}
if( regJd)

: return(tile_regs);
`
`
`}
`}
grad4_wr_dram()
if(mbus_slave)

}
grad4_rd_dram()
if(mbus_slave)
return(host_regs);
host_rd();

}
grad4_refresh()

/* FIX LOCATION OF REGISTERS */
loc host_rad ={ 0x12,0x12, 0x13,0x13 };
loc host_regs={ 0x22,0x22, 0x23,0x23, 0x24,0x24, 0x25,0x25, 0x26,0x26,
0x27,0x27, 0x28,0x28, 0x29,0x29 };

loc tile_regs={ 0x42,0x42, 0x43,0x43, 0x44,0x44, 0x45,0x45, 0x46,0x46 };
/* loc radd = 0x36; */
weight xio=99;
weight mbus_data=99;
`
`,
`
`DEN'D
`
What is claimed is:

1. A method of designing a CPU for implementation in a configurable hardware device, said method comprising:
    providing a logic scheme,
    providing a configurable hardware device,
    identifying a plurality of operations in a logic scheme which are suitable for implementation in said configurable hardware device,
    identifying an executable function in said logic scheme,
    identifying a parameter, if any, required for said executable function,
    identifying the logic flow in said scheme and providing for at least two connected, configurable system resources, each of selected size, to implement said logic scheme,
    selecting an op code of configurable, selected size to control said executable function, and
    providing a way to implement in configurable hardware:
        each of said connected, configurable system resources,
        a means to provide any needed parameter to said executable function,
        a means to pass said op code to said system resources, and
        a means to call said executable function, where said op code is selected from the group consisting of
            an op code to invoke at least one of said system resources and implement said logic scheme,
            an op code to pass a parameter to said executable function, and
            an op code to invoke said executable function.

2. The method of claim 1 wherein said plurality of operations is inherently sequential.

3. The method of claim 1 wherein said plurality of operations is inherently parallel.

4. The method of claim 1 further comprising providing a bus line to pass said parameter.

5. The method of claim 4 further comprising identifying the logic flow in said scheme and providing for at least two connected system resources to implement said logic scheme.

6. The method of claim 1 further comprising executing said function.

7. The method of claim 1 wherein said plurality of operations includes a first and a second of said operations capable of executing simultaneously.

8. The method of claim 1 wherein said plurality of operations includes a plurality of said operations capable of executing simultaneously.