Homayoun
`
`Reference 18
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2130, p. 1
`
`

`

`US007856545B2
`
`(12) United States Patent
`Casselman
`
`(10) Patent No.: US 7,856,545 B2
`(45) Date of Patent: Dec. 21, 2010
`
`(54) FPGA CO-PROCESSOR FOR ACCELERATED
`COMPUTATION
`
`2005/0027970 A1 2/2005 Arnold
`2008/0028187 A1 1/2008 Casselman et al.
`
`(75) Inventor: Steven Casselman, Sunnyvale, CA (US)
`
`(73) Assignee: DRC Computer Corporation,
`Sunnyvale, CA (US)
`
`(*) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 727 days.
`
`(21) Appl. No.: 11/829,801
`(22) Filed: Jul. 27, 2007
`
`(65) Prior Publication Data
`US 2008/0028186 A1 Jan. 31, 2008
`
`Related U.S. Application Data
`(60) Provisional application No. 60/820,730, filed on Jul.
`28, 2006.
`
`(51) Int. Cl.
`G06F 5/00 (2006.01)
`G06F 5/76 (2006.01)
`(52) U.S. Cl. ........................................................ 712/34
`(58) Field of Classification Search .................... 712/34
`See application file for complete search history.
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`6,961,841 B2 * 11/2005 Huppenthal et al. ........... 712/34
`7,210,022 B2 * 4/2007 Jungck et al. ................. 712/34
`7,386,704 B2 * 6/2008 Schulz et al. ................. 712/15
`2004/0250046 A1 12/2004 Gonzalez
`
`
`
`FOREIGN PATENT DOCUMENTS
`WO PCT/US2007/074660 7/2008
`WO PCT/US2007/074661 7/2008
`WO WO 2008/014493 A3 10/2008
`
`OTHER PUBLICATIONS
`Blume et al.; Integration of High-Performance ASICs into
`Reconfigurable Systems Providing Additional Multimedia Functionality;
`2000; IEEE.*
`
`(Continued)
`Primary Examiner: Eddie P Chan
`Assistant Examiner: Corey Faherty
`(74) Attorney, Agent, or Firm: The Webostad Firm
`
`(57) ABSTRACT
`A co-processor module for accelerating computational performance
`includes a Field Programmable Gate Array
`(“FPGA”) and a Programmable Logic Device (“PLD”)
`coupled to the FPGA and configured to control start-up
`configuration of the FPGA. A non-volatile memory is coupled to
`the PLD and configured to store a start-up bitstream for the
`start-up configuration of the FPGA. A mechanical and electrical
`interface is for being plugged into a microprocessor
`socket of a motherboard for direct communication with at
`least one microprocessor capable of being coupled to the
`motherboard. After completion of a start-up cycle, the FPGA
`is configured for direct communication with the at least one
`microprocessor via a microprocessor bus to which the
`microprocessor socket is coupled.
`
`14 Claims, 6 Drawing Sheets
`
`
`

`

`US 7,856,545 B2
`Page 2
`
`OTHER PUBLICATIONS
`Fawcett: Taking Advantage of Reconfigurable Logic; 1994; IEEE.*
`Blume et al.; Integration of High-Performance ASICs Into
`Reconfigurable Systems Providing Additional Multimedia Functionality;
`2000; IEEE.
`Letter from Jody Bishop to Steve Casselman dated Nov. 26, 2008.
`Letter from Jody Bishop to Steve Casselman dated Dec. 16, 2008.
`Maya Gokhale & Paul S. Graham, Reconfigurable Computing:
`Accelerating Computation with Field-Programmable Gate Arrays,
`2003, pp. 4-5, The Netherlands, Springer Pub.
`
`Jeffrey M. Arnold, Duncan A. Buell, Dzung T. Hoang, Daniel V.
`Pryor, Nabeel Shirazi, Mark R. Thistle, The Splash 2 Processor and
`Applications, 1993, pp. 282-285, IEEE.
`XSA Board V1.1, V1.2 User Manual, Jun. 23, 2005, XESS Corporation.
`XSA-50 Spartan-2 Prototyping Board with 2.5V, 50,000-gate FPGA,
`1998-2008, XESS, from http://www.xess.com/prod027.php3.
`
`* cited by examiner
`
`
`

`

`[Sheet 1 of 6: FIG. 1]
`
`

`

`[Sheet 2 of 6: FIG. 2. Block labels: SRAM, PLD, FPGA, FLASH, DRAM, Processor]
`
`

`

`[Sheet 3 of 6: FIG. 3. Block labels: Dynamic RAM Interface 302, Hypertransport Interface 301, User Logic 306, DMA and Arbitration 304, Static RAM Interface 303]
`
`

`

`[Sheet 4 of 6: FIG. 4]
`
`

`

`[Sheet 5 of 6: FIG. 5, flowchart]
`501: Main processor writes new bitstream to module
`502: Module saves bitstream in SRAM or DRAM
`503: Main processor gives bitstream address and "reprogram" command to module
`506: Decision between full reprogram and partial reprogram
`504 (full reprogram): PLD reprograms FPGA using configuration pins
`505 (partial reprogram): Internal memory interface reads bitstream and passes it to internal configuration logic
`
`

`

`[Sheet 6 of 6: FIG. 6, flowchart]
`601: Translate C description to HDL
`602: Generate constraints
`603: Synthesize HDL to FPGA netlist
`604: Combine with wrapper logic
`605: Place & route FPGA
`606: Generate FPGA bitstream
`
`

`

`FPGA CO-PROCESSOR FOR ACCELERATED
`COMPUTATION
`
`This application claims benefit to U.S. provisional patent
`application No. 60/820,730, entitled “FPGA Co-Processor
`for Accelerated Computation,” filed Jul. 28, 2006, which is
`herein incorporated by reference in its entirety.
`
`FIELD
`
`One or more embodiments generally relate to accelerators
`and, more particularly, to a co-processor module including a
`Field Programmable Gate Array (“FPGA”).
`
`BACKGROUND
`
`Co-processors have often been used to accelerate computational
`performance. For example, early microprocessors
`were unable to include floating-point computation circuitry
`due to chip area limitations. Doing floating-point computations
`in software is extremely slow, so this circuitry was often
`placed in a second chip which was activated whenever a
`floating-point computation was required. As chip technology
`improved, the microprocessor chip and the floating-point
`co-processor chip were combined together.
`A similar situation occurs today with specialized computational
`algorithms. Standard microprocessors do not include
`circuitry for performing these algorithms because they are
`often specific to only a few users. By using an FPGA
`(field-programmable gate array) as a co-processor, an algorithm
`can be designed and programmed into hardware to build a
`circuit that is unique for each application, resulting in a
`significant acceleration of the desired computation.
`
`SUMMARY
`
`One or more embodiments generally relate to accelerators
`and, more particularly, to a co-processor module including a
`Field Programmable Gate Array (“FPGA”).
`A co-processor module for accelerating computational
`performance includes a Field Programmable Gate Array
`(“FPGA”) and a Programmable Logic Device (“PLD”)
`coupled to the FPGA and configured to control start-up
`configuration of the FPGA. A non-volatile memory is coupled to
`the PLD and configured to store a start-up bitstream for the
`start-up configuration of the FPGA. A mechanical and electrical
`interface is for being plugged into a microprocessor
`socket of a motherboard for direct communication with at
`least one microprocessor capable of being coupled to the
`motherboard. After completion of a start-up cycle, the FPGA
`is configured for direct communication with the at least one
`microprocessor via a microprocessor bus to which the
`microprocessor socket is coupled.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Accompanying drawing(s) show exemplary embodiment(s)
`in accordance with one or more embodiments; however,
`the accompanying drawing(s) should not be taken to
`limit the invention to the embodiment(s) shown, but are for
`explanation and understanding only.
`FIG. 1 is a diagram of an exemplary co-processor module
`which may be coupled to a motherboard with two processor
`sockets, according to one embodiment.
`FIG. 2 is a block diagram of an exemplary co-processor
`module, including major components and busses, according
`to one embodiment.
`FIG. 3 is a block diagram of an exemplary layout of internal
`functions of the co-processor FPGA, according to one
`embodiment.
`FIG. 4 is a diagram of an exemplary expanded co-processor
`module with a daughter card containing additional logic functions,
`according to one embodiment.
`FIG. 5 is a flowchart showing a method for partially or fully
`reprogramming a co-processor module from SRAM, according
`to one embodiment.
`FIG. 6 is a flowchart showing a method for creating a
`co-processor configuration to accelerate a specific algorithm,
`according to one embodiment.
`
`DETAILED DESCRIPTION
`
`In the following description, numerous specific details are
`set forth to provide a more thorough description of the specific
`embodiments of the invention. It should be apparent,
`however, to one skilled in the art, that the invention may be
`practiced without all the specific details given below. In other
`instances, well-known features have not been described in
`detail so as not to obscure the invention. For ease of illustration,
`the same number labels are used in different diagrams to
`refer to the same items; however, in alternative embodiments
`the items may be different. Furthermore, although particular
`integrated circuit parts are described herein for purposes of
`clarity by way of example, it should be understood that the
`scope of the description is not limited to these particular
`numerical examples as other integrated circuit parts may be
`used.
`A multi-processor system consists of several processing
`chips connected to each other by high-speed busses. By
`replacing one or more of these processor chips by
`application-specific co-processors, it is often possible to obtain a
`significant acceleration in computational speed. Each co-processor
`sits in the motherboard socket designed for a standard
`processor and makes use of motherboard resources.
`According to one embodiment, the co-processor FPGA is
`located on a module which plugs into a standard microprocessor
`socket. Motherboards are commonly available which
`have multiple microprocessor sockets, allowing one or more
`standard microprocessors to co-exist with one or more
`co-processor modules. Thus, no changes to the motherboard or
`other system hardware are required, making it easy to build
`co-processor systems. The co-processor has access to motherboard
`resources including large amounts of memory. These
`resources need not be duplicated on the co-processor module,
`reducing the cost, size and power requirements for the
`co-processor. The co-processor is connected to the main processor
`by one or more high-speed low-latency busses. Many
`algorithms require frequent communication between the
`main microprocessor and the co-processor, making this interface
`a factor in achieving high performance.
`According to another embodiment, to accelerate computational
`algorithms, a co-processor module is included which
`plugs into a standard microprocessor socket on a motherboard
`and communicates with the microprocessor by one or
`more high-speed, low-latency busses. The co-processor has
`access to motherboard resources through the microprocessor
`socket. The co-processor includes an FPGA which is reconfigurable
`and may be loaded with a new configuration pattern
`suitable for a different algorithm under control of the
`microprocessor. The configuration pattern is developed using a set
`of software tools. The co-processor module capabilities may
`be extended by adding additional piggyback cards.
`Another embodiment is an accelerator module, including
`an FPGA and a Programmable Logic Device (“PLD”)
`coupled to the FPGA and configured to control start-up
`configuration of the FPGA. A non-volatile memory is coupled to
`the PLD and configured to store a start-up bitstream for the
`start-up configuration of the FPGA. A mechanical and electrical
`interface is configured for being plugged into a
`microprocessor socket of a motherboard for direct communication
`with at least one microprocessor capable of being coupled to
`the motherboard. After completion of a start-up cycle, the
`FPGA is configured for direct communication with the at
`least one microprocessor via a microprocessor bus to which
`the microprocessor socket is coupled.
`Another embodiment generally is an accelerator system,
`comprising a first motherboard having accelerator modules
`and a second motherboard having at least one microprocessor.
`Each of the accelerator modules includes an FPGA and a
`Programmable Logic Device (“PLD”) coupled to the FPGA
`and configured to control start-up configuration of the FPGA.
`A non-volatile memory is coupled to the PLD and configured
`to store a start-up bitstream for the start-up configuration of
`the FPGA. A mechanical and electrical interface is configured
`for being plugged into a microprocessor socket of the first
`motherboard for direct communication as between the
`accelerator modules. The microprocessor socket is coupled to a
`microprocessor bus for the direct communication between
`the accelerator modules.
`Yet another embodiment generally is a method for
`co-processing. An accelerator module is coupled to a
`microprocessor bus, the accelerator module including a Field
`Programmable Gate Array (“FPGA”). A microprocessor bus interface
`bitstream is loaded into the FPGA to program programmable
`logic thereof. Data is transferred to first memory of the
`accelerator module via a microprocessor bus using a
`microprocessor bus interface instantiated in the FPGA responsive to the
`microprocessor bus interface bitstream. A default configuration
`bitstream stored in the first memory is instantiated in the
`FPGA to configure the FPGA to have the microprocessor bus
`interface with sufficient functionality to be recognized by a
`microprocessor coupled to the microprocessor bus.
`Still yet another embodiment generally is another method
`for co-processing. An accelerator module, which includes a
`Field Programmable Gate Array (“FPGA”) and first memory,
`is coupled to a microprocessor bus. The first memory has a
`default configuration bitstream stored therein. The default
`configuration bitstream is loaded into the FPGA to program
`programmable logic thereof. The default configuration
`bitstream includes a microprocessor bus interface. The FPGA is
`configured with the default configuration bitstream with
`sufficient functionality to be recognized by a microprocessor
`coupled to the microprocessor bus.
`Referring to FIG. 1, a multiprocessor motherboard 10 is
`shown containing two processor chips 100 and 101 and
`DRAM modules 104 and 105. In one embodiment, the
`processor chips are Opteron microprocessors available from
`Advanced Micro Devices (AMD), although processors
`available from other companies such as Intel could also be used. A
`typical motherboard also contains many other components
`which are omitted here for clarity. In one embodiment, the
`K8SRE (S2891) motherboard from Tyan Computer Corporation
`is used, although many other suitable motherboards are
`available from this and other vendors. Motherboards are
`available with various numbers of processor chips 100, 101.
`Typically, a motherboard contains between one and eight
`processor chips. In one embodiment, a motherboard with
`sockets for at least two processor chips is required. One or
`more processor chips 100, 101 are removed and replaced with
`co-processor modules 200. If the motherboard contains more
`than two processor chips, several of them may be replaced
`with co-processor modules 200, provided that at least one
`processor chip remains on the motherboard.
`It is also possible to build high performance computing
`systems with multiple motherboards interconnected by
`high-speed busses. In such a system, some of the motherboards
`may contain only co-processor modules while other motherboards
`contain only processor chips or a mixture of processor
`chips and co-processor modules. In such a multi-board system,
`there must be at least one processor chip in order to
`communicate with one or more co-processor modules.
`Returning now to FIG. 1, processor chips 100, 101 are
`attached to motherboard 10 using sockets 102, 103 which
`allow them to be easily removed. Co-processor module 200
`has the same mechanical and electrical interface via circuit
`board 299 and pins 298 as processor chips 100, 101, allowing
`easy replacement with minimal or no changes to motherboard
`10. Motherboard 10 also contains memory modules 104
`which are normally coupled for communication with a
`processor chip 100 plugged in socket 102. Memory modules 105
`are similarly coupled for communication with a processor
`chip 101 plugged in socket 103. When processor chip 100 is
`replaced by co-processor 200, co-processor 200 has access to
`memory modules 104.
`Referring now to FIG. 2, a block diagram of co-processor
`module 200 is shown in more detail, along with its connections
`to motherboard 10. Co-processor module 200 contains
`FPGA (field-programmable gate array) 201, SRAM (static
`random access memory) 202, PLD (programmable logic
`device) 203 and flash memory 204, along with other components
`such as resistors, capacitors, buffers and oscillators
`which have been omitted for clarity. In one embodiment,
`FPGA 201 is an XC4VLX60FF668 available from Xilinx
`corporation although there are numerous FPGAs available
`from Xilinx and other vendors such as Altera which would
`also be suitable. SRAM 202 may be an IDT71T75602S2OBG
`from Integrated Device Technology corporation, PLD 203
`may be an EPM7256BUC169 from Altera corporation and
`flash memory 204 may be a TC58FVM5T2AXB65 from
`Toshiba corporation, according to one embodiment. In each
`case, there are numerous alternative components which could
`be used instead. FPGA 201 is connected through bus 211 and
`socket 102 to the motherboard memory module 104. It is also
`connected through bus 210 and socket 102 to the remaining
`motherboard processor chip 101. In one embodiment, bus
`210 is a hypertransport bus. The hypertransport bus has high
`bandwidth and low latency characteristics, for example with
`respect to availability to processor 101, although other busses
`such as PCI, PCI Express or RapidIO could be used instead
`with the appropriate motherboard components. The hypertransport
`bus, which is a point-to-point bus, also forms a
`direct connection between processor 101 and co-processor
`module 200 without passing through any intermediate chips
`or busses. This direct connection greatly improves throughput
`and latency when transferring data to the co-processor.
`FPGA 201 also connects to SRAM 202 and PLD 203 via
`bus 214. PLD 203 additionally connects to flash memory 204
`via bus 213 and to FPGA 201 via programming signals 212.
`Referring now to FIG. 3, the internal logic of FPGA 201 is
`described. An FPGA is a device which may be programmed to
`perform various logical functions. FPGA 201 is reprogrammable
`so it may perform a first set of logical functions, then,
`after reprogramming, a second set of logical functions. This
`allows different algorithms to be programmed depending on
`the needs of a particular customer or application. The logical
`function of FPGA 201 is divided into two portions.
`Customer-specific algorithms are programmed into the user logic
`section 306 of FPGA 201. In addition to user logic 306, the
`FPGA includes a set of interface or support functions 300. In
`one embodiment, these support functions 300 are: a
`hypertransport interface 301, a DDR (double data-rate) DRAM
`(dynamic random-access memory) interface 302, a static
`RAM (random access memory) interface 303 and a DMA and
`arbitration function 304. These support functions 300 are
`connected to user logic 306 by standard wrapper interface
`305. The wrapper interface 305 is designed to present a
`consistent view of support functions 300 so additional functions
`may be added or functions may be changed internally without
`the need to change user logic 306. The user logic portion of
`FPGA 201 may also be reprogrammed to represent different
`algorithms while the support functions 300 continue to operate.
`This is necessary since many functions such as
`hypertransport interface 301 and DDR memory interface 302 cannot
`be interrupted without a long restart procedure.
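The role of wrapper interface 305 can be illustrated with a minimal Python sketch. It is a software analogy only, not the patent's hardware design: the class and method names (`SramBackend`, `DramBackend`, `Wrapper`, `read`, `write`) are hypothetical and do not appear in the source. The point it shows is that user logic depends only on the wrapper, so a support function can change internally without any change to user logic.

```python
class SramBackend:
    """Hypothetical stand-in for static RAM interface 303."""
    def __init__(self):
        self._cells = {}

    def load(self, addr):
        return self._cells.get(addr, 0)

    def store(self, addr, value):
        self._cells[addr] = value


class DramBackend:
    """Hypothetical stand-in for DRAM interface 302, with different internals."""
    def __init__(self, size=1024):
        self._cells = [0] * size

    def load(self, addr):
        return self._cells[addr]

    def store(self, addr, value):
        self._cells[addr] = value


class Wrapper:
    """Plays the role of wrapper interface 305: one stable view of memory."""
    def __init__(self, backend):
        self._backend = backend

    def read(self, addr):
        return self._backend.load(addr)

    def write(self, addr, value):
        self._backend.store(addr, value)


def user_logic(wrapper):
    """Stand-in for user logic 306: it only ever touches the wrapper."""
    wrapper.write(0, 20)
    wrapper.write(1, 22)
    return wrapper.read(0) + wrapper.read(1)


# The same user logic runs unchanged on either backend.
result_sram = user_logic(Wrapper(SramBackend()))
result_dram = user_logic(Wrapper(DramBackend()))
```

Swapping `SramBackend` for `DramBackend` changes nothing in `user_logic`, which is the property the wrapper is described as providing.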
`The physical size of module 200 is limited because of the
`need to fit into socket 102 without interfering with other
`components which may exist on motherboard 10. At the same
`time, it is desirable to be able to expand the functionality of
`module 200 to support various applications. Expanded
`functionality may include, for example, additional memory or
`additional hypertransport interfaces. FIG. 4 shows how module
`200 may be expanded by adding a daughter card 400
`which includes additional components. The daughter card
`400 is attached to module 200 by connectors 401, 402.
`Referring now to FIG. 5, the process of configuring FPGA
`201 on module 200 is described with renewed reference to
`FIGS. 1-3. When power is initially supplied or the processor
`reset signal is applied, FPGA 201 is programmed automatically
`from flash memory 204. FPGA 201 may also be reprogrammed
`automatically from flash memory 204 if it ceases to
`operate due to various conditions. Monitor logic is built into
`FPGA 201 and PLD 203 which checks for correct operation
`of FPGA 201 and initiates reprogramming if it senses a fault
`condition. The programming and reprogramming processes
`are controlled by PLD 203. Xilinx and others supply logic
`circuits and detailed instructions for programming an FPGA
`from a flash memory. In order to initially program flash
`memory 204, a configuration pattern is loaded into FPGA 201
`using a JTAG connector on module 200. This configuration
`pattern is sufficient to operate hypertransport interface 301.
`Hypertransport interface 301 is then used to transfer data to
`flash memory 204 under control of PLD 203. Flash memory
`204 normally contains a default FPGA configuration for support
`functions 300 that is sufficient to operate the hypertransport
`interface 301, memory interfaces 302, 303 and DMA and
`arbitration function 304 but does not include configuration
`information for user logic 306. PLD 203 is initially configured
`using a JTAG (Joint Test Action Group standard 1149.1)
`connector on module 200. Alternatively, flash memory 204
`and PLD 203 may be initially loaded with a default configuration
`before being soldered onto module 200. Flash memory
`204 and PLD 203 may be reloaded while FPGA 201 is operating,
`by transferring new data over hypertransport interface
`301. Flash memory 204 is intended to provide semi-permanent
`storage for the default FPGA configuration and is
`changed infrequently. PLD 203 provides basic support functions
`for module 200 and is also changed infrequently.
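The start-up and fault-recovery behavior described above can be sketched as a small software model. This is an illustrative assumption, not the PLD's actual logic: the `Module` class, its attributes and the `fpga_healthy` flag are hypothetical names, and programming is modeled as a plain copy from flash.

```python
class Module:
    """Toy model of module 200: PLD-controlled programming from flash."""

    def __init__(self, flash_bitstream):
        self.flash = flash_bitstream   # default configuration in flash memory 204
        self.fpga_config = None        # None models an unprogrammed FPGA 201
        self.program_count = 0         # how many times the FPGA was programmed

    def _program_from_flash(self):
        # PLD 203 controls programming; here it is modeled as a copy.
        self.fpga_config = self.flash
        self.program_count += 1

    def power_on_or_reset(self):
        # Power-up or processor reset triggers automatic programming.
        self._program_from_flash()

    def monitor(self, fpga_healthy):
        # Monitor logic: reprogram only when a fault condition is sensed.
        if not fpga_healthy:
            self.fpga_config = None
            self._program_from_flash()


m = Module(b"default-support-functions")
m.power_on_or_reset()
m.monitor(fpga_healthy=True)    # healthy: nothing happens
m.monitor(fpga_healthy=False)   # fault: reprogram from flash
```

In this model the FPGA is programmed once at start-up and once more after the sensed fault, and a healthy check leaves the configuration untouched.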
`Once the default configuration pattern (bitstream) is loaded
`into FPGA 201, module 200 becomes visible over the hypertransport
`bus to a main processor 101 in the system. At 501,
`the main processor transfers a new configuration pattern over
`hypertransport bus 210 for writing to FPGA 201 of module
`200. This new configuration pattern typically contains a user
`logic function 306 and may also contain new definitions for
`support functions 300. At 502, FPGA 201 of module 200
`saves the new configuration pattern into either SRAM or
`DRAM using the memory interfaces 302 or 303. If full reconfiguration
`of FPGA 201 is planned, the configuration pattern
`must be saved into SRAM. DRAM cannot be used for full
`reconfiguration because the configuration data would be lost
`when DRAM interface 302 ceases to operate during the configuration
`process. SRAM may be controlled using PLD 203
`instead of SRAM interface 303 in FPGA 201 so the configuration
`data is retained while FPGA 201 is reprogrammed. The
`processes 501 and 502 may operate concurrently since the
`amount of data required to configure FPGA 201 may be very
`large. At 503, main processor 101 uses the hypertransport bus
`to send FPGA 201 of module 200 the address of the configuration
`pattern in SRAM or DRAM, along with a command to
`reprogram itself. A decision 506 is then made whether to do
`full or partial reconfiguration.
`During partial reconfiguration, support functions 300
`remain active and only enough data must be transferred over
`hypertransport bus 210 to configure user logic 306. This
`allows partial reconfiguration to be much faster than full
`reconfiguration, making partial reconfiguration the preferred
`alternative in most situations. Data for partial reconfiguration
`may be saved in either DRAM or SRAM. When module 200
`is used to accelerate computational algorithms, frequent
`reconfiguration is often necessary and reconfiguration time
`becomes a limiting factor in determining the amount of
`acceleration that may be obtained. Partial reconfiguration at 505
`involves FPGA 201 loading the reconfiguration data, where
`an internal memory interface of FPGA 201 is used to read a
`bitstream and pass it to user logic 306. After loading is complete,
`new logic functions specified by the new configuration
`become active and may be used.
`If full reconfiguration is desired, at 504 of FIG. 5 PLD 203
`takes over control of SRAM 202, erases FPGA 201 and
`transfers a complete new configuration pattern to FPGA 201.
`This is similar to initial programming except that the configuration
`data comes from SRAM 202 instead of flash memory
`204.
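The reprogramming flow of FIG. 5, including the rule that a full reconfiguration needs the bitstream in SRAM, can be summarized in a short Python sketch. The function name `plan_reprogram` and its string-based step list are hypothetical conveniences for illustration; only the step numbers and the SRAM/DRAM rule come from the description above.

```python
def plan_reprogram(full: bool, storage: str) -> list[str]:
    """Return the ordered FIG. 5 steps for one reprogramming request."""
    if storage not in ("SRAM", "DRAM"):
        raise ValueError("storage must be 'SRAM' or 'DRAM'")
    if full and storage == "DRAM":
        # DRAM interface 302 stops operating during full reconfiguration,
        # so a bitstream held in DRAM would be lost.
        raise ValueError("full reconfiguration requires the bitstream in SRAM")

    steps = [
        "501: main processor writes new bitstream to module",
        f"502: module saves bitstream in {storage}",
        "503: main processor sends bitstream address and reprogram command",
    ]
    # Decision 506: full or partial reconfiguration.
    if full:
        steps.append("504: PLD reprograms FPGA from SRAM using configuration pins")
    else:
        steps.append("505: internal memory interface reads bitstream "
                     "and reconfigures user logic")
    return steps


partial_plan = plan_reprogram(full=False, storage="DRAM")
full_plan = plan_reprogram(full=True, storage="SRAM")
```

Calling `plan_reprogram(full=True, storage="DRAM")` raises `ValueError`, mirroring the constraint that DRAM cannot hold the data for a full reconfiguration.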
`With additional reference to FIG. 6, the process of generating
`user logic 306 is described. Co-processor module 200
`may accelerate computational algorithms. These algorithms
`are typically described in a computer language such as C.
`Unfortunately, the C language is designed to execute on a
`sequential processor such as the Opteron from AMD or the
`Pentium from Intel. Using an FPGA co-processor directly to
`execute an algorithm described in the C language would offer
`little or no acceleration since it would not utilize the primary
`advantages of the co-processor. The primary advantages of an
`FPGA co-processor compared to a sequential processor are a
`vast amount of parallelism and a potentially much higher
`memory bandwidth. In order to use the FPGA efficiently, the
`initial C description must be translated into a hardware
`description language (“HDL”), such as VHDL or Verilog.
`This is shown in 601 of FIG. 6. Tools are available from
`companies such as Celoxica that do this translation. Additionally,
`there are variations of the C language such as UPC
`(unified parallel C) in which some parallelism is made visible
`to the user. These dialects of C may be translated more
`efficiently into FPGA co-processors.
`At 602, constraints are generated for the user design. These
`include both physical and timing constraints. Physical constraints
`are necessary to ensure that user logic 306 connects
`correctly and does not conflict with support functions 300.
`Timing constraints determine the operating speed of user
`logic 306 and prevent other potential timing problems such as
`race conditions.
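One way to picture the "does not conflict with support functions" part of the physical constraints is as a region-overlap check. The Python sketch below is a simplified assumption: it models placement areas as axis-aligned rectangles on a grid, and the names `Region`, `overlaps`, `check_physical_constraints` and the example coordinates are all hypothetical. Real FPGA constraint files and vendor tools work differently in detail.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Axis-aligned placement area; x1 and y1 are exclusive bounds."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps(a: Region, b: Region) -> bool:
    # Two rectangles overlap when they intersect on both axes.
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def check_physical_constraints(user: Region, reserved: dict) -> list:
    """Return names of reserved support-function areas the user area hits."""
    return [name for name, region in reserved.items() if overlaps(user, region)]


# Hypothetical fixed placements for two of the support functions 300.
reserved = {
    "hypertransport_301": Region(0, 0, 10, 40),
    "dram_302": Region(10, 0, 40, 8),
}

ok_conflicts = check_physical_constraints(Region(10, 8, 40, 40), reserved)
bad_conflicts = check_physical_constraints(Region(5, 8, 40, 40), reserved)
```

Here the first user area touches neither reserved area, while the second extends into the columns reserved for the hypertransport interface and is reported as a conflict.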
`
`At 603, user logic 306 is synthesized. Synthesis converts
`the design from an HDL description to a netlist of FPGA
`primitives. The Xilinx tool XST may be used.
`At 604, the user logic 306 is combined with the pre-designed
`support functions 300. The support functions 300, as
`well as wrapper interface 305 associated therewith, have a
`pre-assigned fixed placement so they may be combined with
`arbitrary user logic without affecting operation of support
`functions 300. Sections of the support functions 300 are very
`sensitive to timing and correct operation could not be guaranteed
`without fixing the placement.
`At 605, the design for instantiation in user logic 306 is
`placed and routed. Placement and routing is performed by the
`appropriate FPGA software tools. These are available from
`the FPGA vendor. Constraints generated at 602 guide the
`place and route 605 as well as synthesis 603 to ensure that the
`desired speed and functionality are achieved.
`At 606 a full or partial configuration pattern (or bitstream)
`for the FPGA is generated. This may be performed by a tool
`supplied by the FPGA vendor. The bitstream is then ready for
`download into co-processor FPGA 201.
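The six steps of FIG. 6 form a linear pipeline in which each step consumes the previous step's output. The Python sketch below only models that ordering. The names `STEPS`, `run_flow` and `fake_step` and the `algorithm.c` file name are hypothetical; a real flow would invoke a C-to-HDL translator and the FPGA vendor's synthesis, place-and-route and bitstream tools at the corresponding steps.

```python
# Step numbers and descriptions follow FIG. 6.
STEPS = [
    (601, "translate C description to HDL"),
    (602, "generate constraints"),
    (603, "synthesize HDL to FPGA netlist"),
    (604, "combine with wrapper logic"),
    (605, "place and route FPGA"),
    (606, "generate FPGA bitstream"),
]


def run_flow(run_step, source="algorithm.c"):
    """Feed the source through every step in order and log what ran."""
    artifact = source
    log = []
    for number, description in STEPS:
        artifact = run_step(number, artifact)
        log.append((number, description))
    return artifact, log


def fake_step(number, artifact):
    # Placeholder for a real tool invocation: just tags the artifact.
    return f"{artifact}>{number}"


bitstream, log = run_flow(fake_step)
```

With the placeholder step, the final artifact records the order in which the steps ran, which makes it easy to see that the bitstream at 606 depends on every earlier stage.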
`While the foregoing describes exemplary embodiment(s)
`in accordance with one or more embodiments, other and
`further embodiment(s) in accordance with the one or more
`embodiments may be devised without departing from the
`scope thereof, which is determined by the claim(s) that follow
`and equivalents thereof. Claim(s) listing steps do not imply
`any order of the steps. Trademarks are the property of their
`respective owners.
`What is claimed is:
`1. A method for co-processing, comprising:
`coupling an accelerator module to a microprocessor bus,
`the accelerator module including a Field Programmable
`Gate Array (“FPGA”);
`loading a microprocessor bus interface bitstream into the
`FPGA to program programmable logic thereof;
`transferring data to first memory of the accelerator module
`via a microprocessor bus using a microprocessor bus
`interface instantiated in the FPGA responsive to the
`microprocessor bus interface bitstream;
`instantiating a default configuration bitstream stored in the
`first memory in the FPGA to configure the FPGA to have
`the microprocessor bus interface with sufficient functionality
`to be recognized by a microprocessor coupled
`to the microprocessor bus; and
`communicating under control of the microprocessor a configuration
`pattern to second memory of the accelerator
`module using a first memory interface instantiated in the
`FPGA responsive to the instantiating of the default configuration
`bitstream.
`2. The method according to claim 1, wherein the loading is
`via a JTAG interface of the FPGA.
`3. The method according to claim 2, wherein the loading
`and the transferring are under control of a Programmable
`Logic Device (“PLD”), the PLD being included as part of the
`accelerator module.
`
`4. The method according to claim 1, further comprising
`sending control information from the microprocessor to the
`FPGA to indicate location of the configuration pattern in the
`second memory for instantiation in user programmable logic
`of the FPGA.
`5. The method according to claim 4, wherein the control
`information is for partial reconfiguration of the user programmable
`logic of the FPGA.
`6. The method according to claim 5, wherein the first
`memory is flash memory; and wherein the second memory is
`either Static Random Access Memory (“SRAM”) or
`Dynamic Random Access Memory (“DRAM”).
`7. The method according to claim 1, wherein the microprocessor
`bus interface bitstream and the default configuration
`bitstream are instantiated in the FPGA with pre-assigned
`fixed placement.
`8. A method for co-processing, comprising:
`coupling an accelerator module to a microprocessor bus,
`the accelerator module including a Field Programmable
`Gate Array (“FPGA”) and first memory, the first
`memory having a default configuration bitstream stored
`therein;
`loading the default configuration bitstream into the FPGA
`to program programmable logic thereof, the default configuration
`bitstream including a microprocessor bus
`interface; and
`configuring the FPGA with the default configuration bitstream
`with sufficient functionality to be recogniz
