(19) United States
(12) Patent Application Publication          (10) Pub. No.: US 2001/0014937 A1
     Huppenthal et al.                       (43) Pub. Date: Aug. 16, 2001
(54) MULTIPROCESSOR COMPUTER ARCHITECTURE INCORPORATING A PLURALITY OF MEMORY ALGORITHM PROCESSORS IN THE MEMORY SUBSYSTEM

(76) Inventors: Jon M. Huppenthal, Colorado Springs, CO (US); Paul A. Leskar, Colorado Springs, CO (US)

Correspondence Address:
HOGAN & HARTSON LLP
ONE TABOR CENTER, SUITE 1500
1200 SEVENTEENTH ST
DENVER, CO 80202 (US)
(21) Appl. No.: 09/755,744

(22) Filed: Jan. 5, 2001

Related U.S. Application Data

(60) Division of application No. 09/481,902, filed on Jan. 12, 2000, now Pat. No. 6,247,110, which is a continuation of application No. 08/992,763, filed on Dec. 17, 1997, now Pat. No. 6,076,152.
Publication Classification

(51) Int. Cl. .......................................... G06F 15/80
(52) U.S. Cl. ............................................... 712/15

(57) ABSTRACT
A multiprocessor computer architecture incorporating a plurality of programmable hardware memory algorithm processors (“MAP”) in the memory subsystem. The MAP may comprise one or more field programmable gate arrays (“FPGAs”) which function to perform identified algorithms in conjunction with, and tightly coupled to, a microprocessor, and each MAP is globally accessible by all of the system processors for the purpose of executing user definable algorithms. A circuit within the MAP signals when the last operand has completed its flow, thereby allowing a given process to be interrupted and thereafter restarted. Through the use of read only memory (“ROM”) located adjacent the FPGA, a user program may use a single command to select one of several possible pre-loaded algorithms, thereby decreasing system reconfiguration time. A computer system memory structure including the MAP disclosed herein may function in normal or direct memory access (“DMA”) modes of operation and, in the latter mode, one device may feed results directly to another, thereby allowing pipelining or parallelizing execution of a user defined algorithm. The system of the present invention also provides a user programmable performance monitoring capability and utilizes parallelizer software to automatically detect parallel regions of user applications containing algorithms that can be executed in the programmable hardware.
[Representative drawing: decomposition of user instructions and data through coarse grained decomposition into parallel regions 1-4, medium grained decomposition among processors, and fine grained parallelism mapping algorithms to MAPs in the memory space.]
[Sheet 1 of 4 - FIG. 1: N processors (Processor 0 through Processor N) bi-directionally coupled through a memory interconnect fabric to memory subsystem banks (Bank 0 through Bank M).]
[Sheet 2 of 4 - FIG. 2: application program decomposition sequence, from user instructions and data through coarse grained decomposition (parallel regions), medium grained decomposition (processors), and fine grained parallelism (algorithms mapped to MAPs in the memory space).]
[Sheet 3 of 4 - FIG. 3: memory bank 120 coupled to the system trunk lines, showing the bank control logic, the memory array and the MAP assembly with its control block.]
[Sheet 4 of 4 - FIG. 4: MAP control block 132 detail, showing the command decoder, status registers, pipeline counter, equality comparator, 256-bit parallel-to-serial converter for the user configuration pattern, configuration ROMs and configuration mux supplying a configuration file to the user FPGA.]
MULTIPROCESSOR COMPUTER ARCHITECTURE INCORPORATING A PLURALITY OF MEMORY ALGORITHM PROCESSORS IN THE MEMORY SUBSYSTEM

BACKGROUND OF THE INVENTION
[0001] The present invention relates, in general, to the field of computer architectures incorporating multiple processing elements. More particularly, the present invention relates to a multiprocessor computer architecture incorporating a number of memory algorithm processors in the memory subsystem to significantly enhance overall system processing speed.

[0002] All general purpose computers are based on circuits that have some form of processing element. These may take the form of microprocessor chips or could be a collection of smaller chips coupled together to form a processor. In any case, these processors are designed to execute programs that are defined by a set of program steps. The fact that these steps, or commands, can be rearranged to create different end results using the same computer hardware is key to the computer's flexibility. Unfortunately, this flexibility dictates that the hardware then be designed to handle a variety of possible functions, which results in generally slower operation than would be the case were it able to be designed to handle only one particular function. On the other hand, a single function computer is inherently not a particularly versatile computer.

[0003] Recently, several groups have begun to experiment with creating a processor out of circuits that are electrically reconfigurable. This would allow the processor to execute a small set of functions more quickly and then be electrically reconfigured to execute a different small set. While this accelerates some program execution speeds, there are many functions that cannot be implemented well in this type of system due to the circuit densities that can be achieved in reconfigurable integrated circuits, such as 64-bit floating point math. In addition, all of these systems are presently intended to contain processors that operate alone. In high performance systems, this is not the case. Hundreds or even tens of thousands of processors are often used to solve a single problem in a timely manner. This introduces numerous issues that such reconfigurable computers cannot handle, such as sharing of a single copy of the operating system. In addition, a large system constructed from this type of custom hardware would naturally be very expensive to produce.
[0004] In response to these shortcomings, SRC Computers, Inc., Colorado Springs, Colo., assignee of the present invention, has developed a Memory Algorithm Processor (“MAP”) multiprocessor computer architecture that utilizes very high performance microprocessors in conjunction with user reconfigurable hardware elements. These reconfigurable elements, referred to as MAPs, are globally accessible by all processors in the systems. In addition, the manufacturing cost and design time of a particular multiprocessor computer system is relatively low inasmuch as it can be built using industry standard, commodity integrated circuits and, in a preferred embodiment, each MAP may comprise a Field Programmable Gate Array (“FPGA”) operating as a reconfigurable functional unit.

SUMMARY OF THE INVENTION
[0005] Particularly disclosed herein is the utilization of one or more FPGAs to perform user defined algorithms in conjunction with, and tightly coupled to, a microprocessor. More particularly, in a multiprocessor computer system, the FPGAs are globally accessible by all of the system processors for the purpose of executing user definable algorithms.

[0006] In a particular implementation of the present invention disclosed herein, a circuit is provided either within, or in conjunction with, the FPGAs which signals, by means of a control bit, when the last operand has completed its flow through the MAP, thereby allowing a given process to be interrupted and thereafter restarted. In a still more specific implementation, one or more read only memory (“ROM”) integrated circuit chips may be coupled adjacent the FPGA to allow a user program to use a single command to select one of several possible algorithms preloaded in the ROM, thereby decreasing system reconfiguration time.

[0007] Still further provided is a computer system memory structure which includes one or more FPGAs for the purpose of using normal memory access protocol to access it as well as being capable of direct memory access (“DMA”) operation. In a multiprocessor computer system, FPGAs configured with DMA capability enable one device to feed results directly to another, thereby allowing pipelining or parallelizing execution of a user defined algorithm located in the reconfigurable hardware. The system and method of the present invention also provide a user programmable performance monitoring capability and utilize parallelizer software to automatically detect parallel regions of user applications containing algorithms that can be executed in programmable hardware.

[0008] Broadly, what is disclosed herein is a computer including at least one data processor for operating on user data in accordance with program instructions. The computer includes at least one memory array presenting a data and address bus and comprises a memory algorithm processor associated with the memory array and coupled to the data and address buses. The memory algorithm processor is configurable to perform at least one identified algorithm on an operand received from a write operation to the memory array.

[0009] Also disclosed herein is a multiprocessor computer including a first plurality of data processors for operating on user data in accordance with program instructions and a second plurality of memory arrays, each presenting a data and address bus. The computer comprises a memory algorithm processor associated with at least one of the second plurality of memory arrays and coupled to the data and address bus thereof. The memory algorithm processor is configurable to perform at least one identified algorithm on an operand received from a write operation to the associated one of the second plurality of memory arrays.
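The write-triggered behavior recited in the two preceding paragraphs can be illustrated with a short software model. The following Python sketch is purely illustrative and is not part of the disclosure; the class name, the assumed operand and result addresses, and the squaring "algorithm" are hypothetical placeholders chosen only to show an identified algorithm being applied to operands received from write operations, with results retrieved through ordinary reads.

```python
# Minimal software model of a memory-mapped algorithm processor.
# Hypothetical address map: writes to OPERAND_ADDR feed the algorithm,
# reads from RESULT_ADDR return processed results.

OPERAND_ADDR = 0x1000   # assumed operand window (illustrative only)
RESULT_ADDR = 0x2000    # assumed result window (illustrative only)


class MemoryAlgorithmProcessorModel:
    """Toy model: applies one identified algorithm to each operand
    received from a write operation to its memory array."""

    def __init__(self, algorithm):
        self.algorithm = algorithm      # the single configured algorithm
        self.results = []               # results awaiting read-back

    def write(self, address, value):
        if address == OPERAND_ADDR:     # operand arrives via a memory write
            self.results.append(self.algorithm(value))

    def read(self, address):
        if address == RESULT_ADDR and self.results:
            return self.results.pop(0)  # results retrieved via memory reads
        return 0


# Example: configure the model with a squaring "algorithm" and drive it
# with plain memory-style accesses.
map_model = MemoryAlgorithmProcessorModel(lambda x: x * x)
for operand in (2, 3, 4):
    map_model.write(OPERAND_ADDR, operand)
print([map_model.read(RESULT_ADDR) for _ in range(3)])   # [4, 9, 16]
```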
BRIEF DESCRIPTION OF THE DRAWINGS

[0010] The aforementioned and other features and objects of the present invention and the manner of attaining them will become more apparent and the invention itself will be best understood by reference to the following description of a preferred embodiment taken in conjunction with the accompanying drawings, wherein:

[0011] FIG. 1 is a simplified, high level, functional block diagram of a standard multiprocessor computer architecture;
[0012] FIG. 2 is a simplified logical block diagram of a possible computer application program decomposition sequence for use in conjunction with a multiprocessor computer architecture utilizing a number of memory algorithm processors (“MAPs”) in accordance with the present invention;

[0013] FIG. 3 is a more detailed functional block diagram of an individual one of the MAPs of the preceding figure and illustrating the bank control logic, memory array and MAP assembly thereof; and

[0014] FIG. 4 is a more detailed functional block diagram of the control block of the MAP assembly of the preceding illustration illustrating its interconnection to the user FPGA thereof.

DESCRIPTION OF A PREFERRED EMBODIMENT

[0015] With reference now to FIG. 1, a conventional multiprocessor computer 10 architecture is shown. The multiprocessor computer 10 incorporates N processors 12_0 through 12_N which are bi-directionally coupled to a memory interconnect fabric 14. The memory interconnect fabric 14 is then also coupled to M memory banks comprising memory bank subsystems 16_0 (Bank 0) through 16_M (Bank M).
[0016] With reference now to FIG. 2, a representative application program decomposition for a multiprocessor computer architecture 100 incorporating a plurality of memory algorithm processors in accordance with the present invention is shown. The computer architecture 100 is operative in response to user instructions and data which, in a coarse grained portion of the decomposition, are selectively directed to one of (for purposes of example only) four parallel regions 102_1 through 102_4 inclusive. The instructions and data output from each of the parallel regions 102_1 through 102_4 are respectively input to parallel regions segregated into data areas 104_1 through 104_4 and instruction areas 106_1 through 106_4. Data maintained in the data areas 104_1 through 104_4 and instructions maintained in the instruction areas 106_1 through 106_4 are then supplied to, for example, corresponding pairs of processors 108_1, 108_2 (P1 and P2); 108_3, 108_4 (P3 and P4); 108_5, 108_6 (P5 and P6); and 108_7, 108_8 (P7 and P8) as shown. At this point, the medium grained decomposition of the instructions and data has been accomplished.

[0017] A fine grained decomposition, or parallelism, is effectuated by a further algorithmic decomposition wherein the output of each of the processors 108_1 through 108_8 is broken up, for example, into a number of fundamental algorithms 110 as shown. Each of the algorithms 110 is then supplied to a corresponding one of the MAPs 112 in the memory space of the computer architecture 100 for execution therein as will be more fully described hereinafter.
[0018] With reference additionally now to FIG. 3, a preferred implementation of a memory bank 120 in a MAP system computer architecture 100 of the present invention is shown for a representative one of the MAPs 112 illustrated in the preceding figure. Each memory bank 120 includes a bank control logic block 122 bi-directionally coupled to the computer system trunk lines, for example, a 72 line bus 124. The bank control logic block 122 is coupled to a bi-directional data bus 126 (for example 256 lines) and supplies addresses on an address bus 128 (for example 17 lines) for accessing data at specified locations within a memory array 130.

[0019] The data bus 126 and address bus 128 are also coupled to a MAP assembly 112. The MAP assembly 112 comprises a control block 132 coupled to the address bus 128. The control block 132 is also bi-directionally coupled to a user field programmable gate array (“FPGA”) 134 by means of a number of signal lines 136. The user FPGA 134 is coupled directly to the data bus 126. In a particular embodiment, the FPGA 134 may be provided as a Lucent Technologies OR3T80 device.
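For reference, the connectivity just described can be collected into a simple structural model. The Python sketch below merely restates the elements and example bus widths given above (trunk lines 124, data bus 126, address bus 128, control block 132, user FPGA 134); the dataclass representation itself is an assumption made only for illustration.

```python
# Illustrative structural model of the memory bank described above (FIG. 3).
# Bus widths follow the example figures in the text.
from dataclasses import dataclass


@dataclass
class Bus:
    name: str
    width: int          # number of signal lines


@dataclass
class MapAssembly:
    control_block: str = "control block 132"
    user_fpga: str = "user FPGA 134 (e.g., a Lucent OR3T80)"


@dataclass
class MemoryBank:
    trunk: Bus          # system trunk lines into bank control logic 122
    data: Bus           # bi-directional data bus 126
    address: Bus        # address bus 128
    map_assembly: MapAssembly


bank_120 = MemoryBank(
    trunk=Bus("trunk lines 124", 72),
    data=Bus("data bus 126", 256),
    address=Bus("address bus 128", 17),
    map_assembly=MapAssembly(),
)
print(bank_120.data.width, bank_120.address.width)   # 256 17
```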
[0020] The computer architecture 100 comprises a multiprocessor system employing uniform memory access across common shared memory with one or more MAPs 112 located in the memory subsystem, or memory space. As previously described, each MAP 112 contains at least one relatively large FPGA 134 that is used as a reconfigurable functional unit. In addition, a control block 132 and a preprogrammed or dynamically programmable configuration read-only memory (“ROM”, as will be more fully described hereinafter) contain the information needed by the reconfigurable MAP assembly 112 to enable it to perform a specific algorithm. It is also possible for the user to directly download a new configuration into the FPGA 134 under program control, although in some instances this may consume a number of memory accesses and might result in an overall decrease in system performance if the algorithm was short-lived.
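The two configuration paths described here, selecting a pre-loaded ROM image with a single command versus downloading a new configuration under program control, differ mainly in the number of memory accesses they consume. The sketch below is a rough illustrative comparison only; the ROM contents, bitstream size and bus width are assumed values, not figures from the disclosure.

```python
# Sketch contrasting the two configuration paths described above.

ROM_ALGORITHMS = {0: "FFT", 1: "FIR filter", 2: "64-cycle multiply"}  # assumed contents


def configure_from_rom(selection_command: int) -> int:
    """A single memory-mapped command selects a pre-loaded algorithm."""
    assert selection_command in ROM_ALGORITHMS
    return 1                                  # one access


def configure_by_download(bitstream_bits: int, bus_width: int = 64) -> int:
    """Streaming a whole configuration costs many memory accesses."""
    return -(-bitstream_bits // bus_width)    # ceiling division


print(configure_from_rom(2))                  # 1
print(configure_by_download(1_000_000))       # 15625 accesses for an assumed 1 Mbit image
```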
[0021] FPGAs have particular advantages in the application shown for several reasons. First, commercially available, off-the-shelf FPGAs now contain sufficient internal logic cells to perform meaningful computational functions. Secondly, they can operate at speeds comparable to microprocessors, which eliminates the need for speed matching buffers. Still further, the internal programmable routing resources of FPGAs are now extensive enough that meaningful algorithms can now be programmed without the need to reassign the locations of the input/output (“I/O”) pins.

[0022] By placing the MAP 112 in the memory subsystem or memory space, it can be readily accessed through the use of memory read and write commands, which allows the use of a variety of standard operating systems. In contrast, other conventional implementations propose placement of any reconfigurable logic in or near the processor. This is much less effective in a multiprocessor environment because only one processor has rapid access to it. Consequently, reconfigurable logic must be placed by every processor in a multiprocessor system, which increases the overall system cost. In addition, the MAP 112 can access the memory array 130 itself, referred to as Direct Memory Access (“DMA”), allowing it to execute tasks independently and asynchronously of the processor. In comparison, were it placed near the processor, it would have to compete with the processors for system routing resources in order to access memory, which deleteriously impacts processor performance. Because the MAP 112 has DMA capability (allowing it to write to memory), and because it receives its operands via writes to memory, it is possible to allow a MAP 112 to feed results to another MAP 112. This is a very powerful feature that allows for very extensive pipelining and parallelizing of large tasks, which permits them to complete faster.

[0023] Many of the algorithms that may be implemented will receive an operand and require many clock cycles to produce a result. One such example may be a multiplication that takes 64 clock cycles. This same multiplication may also need to be performed on thousands of operands. In this situation, the incoming operands would be presented sequentially so that, while the first operand requires 64 clock cycles to produce results at the output, the second operand, arriving one clock cycle later at the input, will show results one clock cycle later at the output. Thus, after an initial delay of 64 clock cycles, new output data will appear on every consecutive clock cycle until the results of the last operand appear. This is called “pipelining”.
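The timing relationship described above, an initial latency equal to the pipeline depth followed by one new result on every clock cycle, can be written down directly. The following sketch is only a numerical illustration of that relationship and is not part of the disclosed hardware.

```python
# Illustrative timing model for the pipelining example above:
# a 64-cycle algorithm fed one operand per clock cycle.

def pipelined_cycles(num_operands: int, pipeline_depth: int = 64) -> int:
    """Total clock cycles until the last result appears."""
    if num_operands == 0:
        return 0
    # First result after 'pipeline_depth' cycles, then one result per cycle.
    return pipeline_depth + (num_operands - 1)


# 1000 operands through a 64-deep pipeline finish in 1063 cycles, versus
# 64000 cycles if each multiplication ran to completion before the next began.
print(pipelined_cycles(1000))          # 1063
print(1000 * 64)                       # 64000
```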
[0024] In a multiprocessor system, it is quite common for the operating system to stop a processor in the middle of a task, reassign it to a higher priority task, and then return it, or another, to complete the initial task. When this is combined with a pipelined algorithm, a problem arises (if the processor stops issuing operands in the middle of a list and stops accepting results) with respect to operands already issued but not yet through the pipeline. To handle this issue, a solution involving the combination of software and hardware is disclosed herein.
[0025] To make use of any type of conventional reconfigurable hardware, the programmer could embed the necessary commands in his application program code. The drawback to this approach is that a program would then have to be tailored to be specific to the MAP hardware. The system of the present invention eliminates this problem. Multiprocessor computers often use software called parallelizers. The purpose of this software is to analyze the user's application code and determine how best to split it up among the processors. The present invention provides significant advantages over a conventional parallelizer and enables it to recognize portions of the user code that represent algorithms that exist in MAPs 112 for that system and to then treat the MAP 112 as another computing element. The parallelizer then automatically generates the necessary code to utilize the MAP 112. This allows the user to write the algorithm directly in his code, allowing it to be more portable and reducing the knowledge of the system hardware that he has to have to utilize the MAP 112.

[0026] With reference additionally now to FIG. 4, a block diagram of the MAP control block 132 is shown in greater detail. The control block 132 is coupled to receive a number of command bits (for example, 17) from the address bus 128 at a command decoder 150. The command decoder 150 then supplies a number of register control bits to a group of status registers 152 on an eight bit bus 154. The command decoder 150 also supplies a single bit last operand flag on line 156 to a pipeline counter 158. The pipeline counter 158 supplies an eight bit output to an equality comparator 160 on bus 162. The equality comparator 160 also receives an eight bit signal from the FPGA 134 on bus 136 indicative of the pipeline depth. When the equality comparator determines that the pipeline is empty, it provides a single bit pipeline empty flag on line 164 for input to the status registers 152. The status registers 152 are also coupled to receive an eight bit status signal from the FPGA 134 on bus 136 and produce a sixty four bit status word output on bus 166 in response to the signals on buses 136 and 154 and line 164.

[0027] The command decoder 150 also supplies a five bit control signal to a configuration multiplexer (“MUX”) 170 as shown. The configuration mux 170 receives a single bit output of a 256 bit parallel-to-serial converter 172 on line 176. The inputs of the 256 bit parallel-to-serial converter 172 are coupled to a 256 bit user configuration pattern bus 174. The configuration mux 170 also receives sixteen single bit inputs from the configuration ROMs (illustrated as ROM 182) on bus 178 and provides a single bit configuration file signal on line 180 to the user FPGA 134 as selected by the control signals from the command decoder 150 on the bus 168.

[0028] In operation, when a processor 108 is halted by the operating system, the operating system will issue a last operand command to the MAP 112 through the use of command bits embedded in the address field on bus 128. This command is recognized by the command decoder 150 of the control block 132 and it initiates a hardware pipeline counter 158. When the algorithm was initially loaded into the FPGA 134, several output bits connected to the control block 132 were configured to display a binary representation of the number of clock cycles required to get through its pipeline (i.e. pipeline “depth”) on bus 136 input to the equality comparator 160. After receiving the last operand command, the pipeline counter 158 in the control block 132 counts clock cycles until its count equals the pipeline depth for that particular algorithm. At that point, the equality comparator 160 in the control block 132 de-asserts a busy bit on line 164 in an internal group of status registers 152. After issuing the last operand signal, the processor 108 will repeatedly read the status registers 152 and accept any output data on bus 166. When the busy flag is de-asserted, the task can be stopped and the MAP 112 utilized for a different task. It should be noted that it is also possible to leave the MAP 112 configured, transfer the program to a different processor 108 and restart the task where it left off.
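The interaction described in the preceding two paragraphs, a last operand command starting the pipeline counter 158, the equality comparator 160 matching its count against the pipeline depth reported by the FPGA 134, and the busy bit in the status registers 152 then being de-asserted for the polling processor, can be summarized in a small behavioral model. The Python sketch below is an illustrative approximation only; the class and method names are assumptions and the model ignores the actual bus-level encoding.

```python
# Behavioral sketch of the pipeline-drain sequence described above.
# It models the roles of the pipeline counter 158, the equality
# comparator 160 and the busy bit in the status registers 152.

class ControlBlockModel:
    def __init__(self, pipeline_depth: int):
        self.pipeline_depth = pipeline_depth  # reported by the FPGA on bus 136
        self.counter = 0                      # pipeline counter 158
        self.busy = False
        self.draining = False

    def last_operand_command(self):
        """Command decoder recognizes the last operand command."""
        self.draining = True
        self.busy = True
        self.counter = 0

    def clock(self):
        """One clock cycle; the comparator clears busy once the pipeline empties."""
        if self.draining:
            self.counter += 1
            if self.counter == self.pipeline_depth:   # equality comparator 160
                self.busy = False
                self.draining = False

    def status(self) -> dict:
        """Processor repeatedly reads the status registers."""
        return {"busy": self.busy}


ctrl = ControlBlockModel(pipeline_depth=64)
ctrl.last_operand_command()
cycles = 0
while ctrl.status()["busy"]:          # operating system polls until drained
    ctrl.clock()
    cycles += 1
print(cycles)                          # 64 -- the pipeline has emptied
```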
[0029] In order to evaluate the effectiveness of the use of the MAP 112 in a given application, some form of feedback to the user is required. Therefore, the MAP 112 may be equipped with internal registers in the control block 132 that allow it to monitor efficiency related factors such as the number of input operands versus output data, the number of idle cycles over time and the number of system monitor interrupts received over time. One of the advantages that the MAP 112 has is that, because of its reconfigurable nature, the actual function and type of function that are monitored can also change as the algorithm changes. This provides the user with an almost infinite number of possible monitored factors without having to monitor all factors all of the time.
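A software analogue of the monitoring registers described above might look like the following sketch. The specific register set and the derived utilization figure are assumptions made for illustration; the disclosure names only the kinds of factors that may be monitored.

```python
# Illustrative sketch of the kind of efficiency counters described above.
from dataclasses import dataclass


@dataclass
class MapPerformanceMonitor:
    input_operands: int = 0
    output_results: int = 0
    idle_cycles: int = 0
    monitor_interrupts: int = 0

    def utilization_hint(self, total_cycles: int) -> float:
        """Fraction of cycles the MAP was not idle (an assumed, illustrative metric)."""
        return 1.0 - self.idle_cycles / total_cycles if total_cycles else 0.0


mon = MapPerformanceMonitor(input_operands=1000, output_results=1000,
                            idle_cycles=200, monitor_interrupts=3)
print(mon.utilization_hint(total_cycles=1263))   # approximately 0.84
```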
[0030] While there have been described above the principles of the present invention in conjunction with a specific multiprocessor architecture, it is to be clearly understood that the foregoing description is made only by way of example and not as a limitation to the scope of the invention. Particularly, it is recognized that the teachings of the foregoing disclosure will suggest other modifications to those persons skilled in the relevant art. Such modifications may involve other features which are already known per se and which may be used instead of, or in addition to, features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure herein also includes any novel feature or any novel combination of features disclosed either explicitly or implicitly or any generalization or modification thereof which would be apparent to persons skilled in the relevant art, whether or not such relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as confronted by the present invention. The applicants hereby reserve the right to formulate new claims to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
What is claimed is:

1. A computer including at least one data processor for operating on user data in accordance with program instructions, said computer further including at least one memory array presenting a data and address bus, said computer comprising:

a memory algorithm processor associated with said memory array and coupled to said data and address buses, said memory algorithm processor being configurable to perform at least one identified algorithm on an operand received from a write operation to said memory array.

2. The computer of claim 1 wherein said memory algorithm processor comprises a field programmable gate array.

3. The computer of claim 1 wherein said memory algorithm processor is operative to access said memory array independently of said processor.

4. The computer of claim 1 wherein said at least one identified algorithm is preprogrammed into said memory algorithm processor.

5. The computer of claim 4 wherein said at least one identified algorithm is preprogrammed into a memory device associated with said memory algorithm processor.

6. The computer of claim 5 wherein said memory device comprises at least one read only memory device.
7. The computer of claim 1 further comprising a first plurality of said data processors and a second plurality of said memory arrays, each of said memory arrays comprising an associated memory algorithm processor.

8. The computer of claim 7 wherein a memory algorithm processor associated with a first one of said second plurality of said memory arrays is operative to pass a result of a processed operand to another memory algorithm processor associated with a second one of said second plurality of said memory arrays.

9. The computer of claim 1 wherein said memory algorithm processor further comprises:

a control block including a command decoder coupled to said address bus and a counter coupled to said command decoder, said command decoder for providing a last operand flag to said counter in response to a last operand command from an operating system of said at least one processor.

10. The computer of claim 9 wherein said memory algorithm processor further comprises:

an equality comparator coupled to receive a pipeline depth signal and an output of said counter for providing a pipeline empty flag to at least one status register.

11. The computer of claim 10 wherein said status register is coupled to said command decoder to receive a register control signal and a status signal to provide a status word output signal.
12. A multiprocessor computer including a first plurality of data processors for operating on user data in accordance with program instructions and a second plurality of memory arrays, each presenting a data and address bus, said computer comprising:

a memory algorithm processor associated with at least one of said second plurality of memory arrays and coupled to said data and address bus thereof, said memory algorithm processor being configurable to perform at least one identified algorithm on an operand received from a write operation to said associated one of said second plurality of memory arrays.

13. The multiprocessor computer of claim 12 wherein said memory algorithm processor associated with one of said second plurality of memory arrays is accessible by more than one of said first plurality of data processors.

14. The multiprocessor computer of claim 13 wherein said memory algorithm processor associated with one of said second plurality of memory arrays is accessible by all of said first plurality of data processors.

15. The multiprocessor computer of claim 12 wherein said memory algorithm processor comprises:

a control block operative to provide a last operand flag in response to a last operand having been processed in said memory algorithm processor.
16. The multiprocessor computer of claim 12 further comprising:

at least one memory device associated with said memory algorithm processor for storing a number of pre-loaded algorithms.

17. The multiprocessor computer of claim 16 wherein said at least one memory device is responsive to a predetermined command to enable a selected one of said number of pre-loaded algorithms to be implemented by said memory algorithm processor.

18. The multiprocessor computer of claim 16 wherein said at least one memory device comprises at least one read only memory device.

19. The multiprocessor computer of claim 12 wherein said memory algorithm processor comprises a field programmable gate array.

20. The multiprocessor computer of claim 12 wherein said memory algorithm processor is accessible through normal memory access protocol.

21. The multiprocessor computer of claim 12 wherein said memory algorithm processor has direct memory access capability to said associated one of said second plurality of memory arrays.

22. The multiprocessor computer of claim 12 wherein a memory algorithm processor associated with a first one of said second plurality of said memory arrays is operative to pass a result of a processed operand to another memory algorithm processor associated with a second one of said second plurality of said memory arrays.

23. The multiprocessor computer of claim 12 wherein said computer is operative to automatically detect parallel regions of application program code that are capable of being executed in said memory algorithm processor.

24. The multiprocessor computer of claim 23 wherein said memory algorithm processor is configurable by said application program code.