PATENT OWNER DIRECTSTREAM, LLC
EX. 2168, p. 1
GUEST EDITORS’ INTRODUCTION
High-performance reconfigurable computers have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.
High-Performance Reconfigurable Computing
Duncan Buell, University of South Carolina
Tarek El-Ghazawi, George Washington University
Kris Gaj, George Mason University
Volodymyr Kindratenko, University of Illinois at Urbana-Champaign
High-performance reconfigurable computers (HPRCs)1,2 based on conventional processors and field-programmable gate arrays (FPGAs)3 have been gaining the attention of the high-performance computing community in the past few years.4 These synergistic systems have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.

HPRCs, also known as reconfigurable supercomputers, have shown orders-of-magnitude improvement in performance, power, size, and cost over conventional high-performance computers (HPCs) in some compute-intensive integer applications. However, they still have not achieved high performance gains in most general scientific applications. Programming HPRCs is still not straightforward and, depending on the programming tool, can range from designing hardware to software programming that requires substantial hardware knowledge.

The development of HPRCs has made substantial progress in the past several years, and nearly all major high-performance computing vendors now have HPRC product lines. This reflects a clear belief that HPRCs
0018-9162/07/$25.00 © 2007 IEEE
Published by the IEEE Computer Society
March 2007
have tremendous potential and that resolving all remaining issues is just a matter of time.

This special issue will shed some light on the state of the field of high-performance reconfigurable computing.
WHAT ARE HIGH-PERFORMANCE RECONFIGURABLE COMPUTERS?

HPRCs are parallel computing systems that contain multiple microprocessors and multiple FPGAs. In current settings, the design uses FPGAs as coprocessors that are deployed to execute the small portion of the application that takes most of the time—under the 10-90 rule, the 10 percent of code that takes 90 percent of the execution time. FPGAs can certainly accomplish this when computations lend themselves to implementation in hardware, subject to the limitations of the current FPGA chip architectures and the overall system data transfer constraints.
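The 10-90 rule also caps the end-to-end gain: by Amdahl's law, even an infinitely fast FPGA kernel leaves the unaccelerated 10 percent on the microprocessor. A short Python sketch (an illustration, not from the article) makes the arithmetic concrete.

```python
# Amdahl's-law view of the 10-90 rule: accelerating only the hot
# fraction of an application bounds the overall speedup, no matter
# how fast the offloaded FPGA kernel runs.

def overall_speedup(hot_fraction, kernel_speedup):
    """Overall speedup when only hot_fraction of the runtime is
    accelerated by a factor of kernel_speedup."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / kernel_speedup)

# 90 percent of the time is spent in the offloaded kernel.
print(overall_speedup(0.9, 50))   # ~8.5x overall from a 50x kernel
print(overall_speedup(0.9, 1e9))  # approaches the 10x ceiling set by the CPU-side 10 percent
```

Even a 50x kernel speedup yields under 9x overall, which is why data-transfer overheads and the software side of the application matter so much in HPRC systems.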
In theory, any hardware reconfigurable devices that change their configurations under the control of a program can replace the FPGAs to satisfy the same key concepts behind this class of architectures. FPGAs, however, are the currently available technology that provides the most desirable level of hardware reconfigurability. Xilinx, followed by Altera, dominates the FPGA market, but new startups are also beginning to enter this market.

FPGAs are based on SRAM, but they vary in structure. Figure A in the “FPGA Architecture” sidebar shows an FPGA’s internal structure based on the Xilinx architecture style. The configurable logic block (CLB) is the basic building block for creating logic. It includes RAM used as a lookup table and flip-flops for buffering, as well as multiplexers and carry logic. A side-by-side 2D array of switching matrices for programmable routing connects the 2D array of CLBs.
PROGRESS IN SYSTEM HARDWARE AND PROGRAMMING SOFTWARE

During the past few years, many hardware systems have begun to resemble parallel computers. When such systems originally appeared, they were not designed to be scalable—they were merely a single board of one or more FPGA devices connected to a single board of one or more microprocessors via the microprocessor bus or the memory interface.

The recent SRC-6 and SRC-7 parallel architectures from SRC Computers use a crossbar switch that can be stacked for further scalability. In addition, traditional high-performance computing vendors—specifically, Silicon Graphics Inc. (SGI), Cray, and Linux Networx—have incorporated FPGAs into their parallel architectures. In addition to the SRC-7, models of such HPC systems include the SGI RASC RC100 and the Cray XD1 and XT4. The Linux Networx work focuses on the design of the acceleration boards and on coupling them with PC nodes for constructing clusters.

On the software side, SRC Computers provides a semi-integrated solution that addresses the hardware (FPGA) and software (microprocessor) sides of the application separately. The hardware side is expressed using Carte C or Carte Fortran as a separate function, compiled separately and linked to the compiled C (or Fortran) software side to form one application.

Other hardware vendors use a third-party software tool, such as Impulse C, Handel-C, Mitrion C, or DSPlogic’s RC Toolbox. However, these tools handle only the FPGA side of the application, and each machine has its own application interface to call those functions. At present, Mitrion C and Handel-C support the SGI RASC, while Mitrion C, Impulse C, and RC Toolbox support the Cray XD1. Only a library-based parallel tool such as the message-passing interface can handle scaling an application beyond one node in a parallel system.
RESEARCH CHALLENGES AND THE EVOLVING HPRC COMMUNITY

FPGAs were first introduced as glue logic and eventually became popular in embedded systems. When FPGAs were applied to computing, they were introduced as a back-end processing engine that plugs into a CPU bus. The CPU in this case did not participate in the computation, but only served as the front end (host) to facilitate working with the FPGA.

The limitations of each of these scenarios left many issues that have not been explored, yet they are of great importance to HPRC and the scientific applications it targets. These issues include the need for programming tools that address the overall parallel architecture. Such tools must be able to exploit the synergism between hardware and software execution and should be able to understand and exploit the multiple granularities and localities in such architectures.

The need for parallel and reconfigurable performance profiling and debugging tools also must be addressed. With the multiplicity of resources, operating system support and middleware layers are needed to shield users from having to deal with the hardware’s intricate details. Further, application-portability issues should be thoroughly investigated. In addition, new chip architectures that can address the floating-point requirements of scientific applications should be explored. Portable libraries that can support scientific applications must be
FPGA Architecture

Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented field-programmable gate arrays in the mid-1980s.1 Other current FPGA vendors include Altera (www.altera.com), Actel (www.actel.com), Lattice Semiconductor (www.latticesemi.com), and Atmel (www.atmel.com).

As Figure A shows, an FPGA is a semiconductor device consisting of programmable logic elements, interconnects, and input/output (I/O) blocks (IOBs)—all runtime user-configurable—that allow implementing complex digital circuits. The IOBs form a ring around the outer edge of the microchip; each IOB provides individually selectable I/O access to one of the I/O pins on the exterior of the FPGA package. A rectangular array of logic blocks lies inside the IOB ring.

A typical FPGA logic block consists of a four-input lookup table (LUT) and a flip-flop. Modern FPGA devices also include higher-level functionality embedded into the silicon, such as generic DSP blocks, high-speed IOBs, embedded memories, and embedded processors. Programmable interconnect wiring is implemented so that it’s possible to connect logic blocks to logic blocks and IOBs to logic blocks arbitrarily.
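The LUT idea can be illustrated in software: a four-input LUT is simply 16 configuration bits addressed by the four inputs, so any four-input Boolean function fits in one table. The Python sketch below is a toy model (the names are illustrative, not any vendor's), here configured to compute a 4-input XOR.

```python
# A 4-input LUT modeled as a 16-entry truth table: the inputs form an
# address into the configuration bits, so changing the bits changes the
# logic function without changing the "hardware".

def make_lut4(truth_table):
    """Return a function computing the 4-input logic function whose
    output bits are given by truth_table, indexed by (d<<3)|(c<<2)|(b<<1)|a."""
    assert len(truth_table) == 16
    def lut(a, b, c, d):
        return truth_table[(d << 3) | (c << 2) | (b << 1) | a]
    return lut

# Configuration bits for XOR: output 1 when an odd number of inputs is 1.
xor_bits = [bin(i).count("1") & 1 for i in range(16)]
xor4 = make_lut4(xor_bits)

print(xor4(1, 0, 0, 0))  # 1
print(xor4(1, 1, 0, 0))  # 0
```

Loading different configuration bits into the same table yields AND, OR, or any other four-input function, which is the essence of LUT-based reconfigurability.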
A slice (using Xilinx terminology) or adaptive logic module (using Altera terminology), which contains a small set of basic building blocks—for example, two LUTs, two flip-flops, and some control logic—is the basic unit area when determining an FPGA-based design’s size. Configurable logic blocks (CLBs) consist of multiple slices. Modern FPGAs consist of tens of thousands of CLBs and a programmable interconnection network arranged in a rectangular grid.

Unlike a standard application-specific integrated circuit that performs a single specific function for a chip’s lifetime, an FPGA chip can be reprogrammed to perform a different function in a matter of microseconds. Typically, either source code written in a hardware description language, such as VHDL or Verilog, or a schematic design provides the functionality that an FPGA assumes at runtime.

As Figure B shows, in the first step, a synthesis process generates a technology-mapped netlist. A map, place, and route process then fits the netlist to the actual FPGA architecture. The process generates a bitstream—the final binary configuration file—that can be used to reconfigure the FPGA. Timing analysis, simulation, and other verification methodologies can validate the map, place, and route results.
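The stages just described can be sketched as a simple pipeline; the Python below is a toy model (all function names are hypothetical, not a real tool's API) that captures only the ordering of the stages, with each stage consuming the previous stage's output.

```python
# Toy model of the FPGA design flow: synthesis produces a netlist,
# map/place/route fits it to the device, and the result is a bitstream
# used to configure the FPGA. Names are illustrative only.

def synthesize(hdl_source):
    # Synthesis: HDL source -> technology-mapped netlist.
    return {"stage": "netlist", "source": hdl_source}

def map_place_route(netlist):
    # Implementation: fit the netlist to the target device's CLBs and routing.
    assert netlist["stage"] == "netlist"
    return {"stage": "placed_and_routed", "netlist": netlist}

def generate_bitstream(implementation):
    # Emit the final binary configuration file.
    assert implementation["stage"] == "placed_and_routed"
    return b"\x00bitstream"

design = "module adder(input a, b, output s); assign s = a ^ b; endmodule"
bitstream = generate_bitstream(map_place_route(synthesize(design)))
print(isinstance(bitstream, bytes))  # True
```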
Figure A. FPGA internal structure based on the Xilinx architecture style. An FPGA can be described as “islands” of (reconfigurable) logic in a “sea” of (reconfigurable) connectors.
Figure B. Typical FPGA design flow: algorithm, HDL implementation, synthesis (producing a netlist), implementation (map, place, and route), and bitstream generation, with functional, postsynthesis, and timing simulation for verification.
Reference

1. S.M. Trimberger, ed., Field-Programmable Gate Array Technology, Kluwer Academic, 1994.
sought, and the need for more closely integrated microprocessor and FPGA architectures to facilitate the data-intensive hardware/software interactions should be further studied.

As researchers pursue developments to meet a wide range of HPRC requirements, the failure to incorporate standardization into some of these efforts would be detrimental. It would be particularly useful if academia, industry, and government worked together to create a community that can approach these problems with the full intellectual intensity they deserve, subject to the needs of the end users and the experience of the implementers. Some of this community-forming has already been observed. On the one hand, OpenFPGA (www.openfpga.org) has recently been formed as a consortium that mainly pursues standardization. On the other, the NSF has recently granted the University of Florida and George Washington University an award for an Industry/University Center for High-Performance Reconfigurable Computing (http://chrec.ufl.edu). The center includes more than 20 industry and government members who will guide the university research projects.
IN THIS ISSUE

We have selected five articles for this special issue that represent the latest trends and developments in the HPRC field. The first two cover particularly important topics: a C-to-FPGA compiler and a library framework for code portability across different RC platforms. The third article describes an extensive collection of FPGA software development patterns, and the last two describe HPRC applications.

In “Trident: From High-Level Language to Hardware Circuitry,” Justin Tripp, Maya Gokhale, and Kristopher Peterson describe an effort undertaken at the Los Alamos National Laboratory to build Trident, a high-level-language to hardware-description-language compiler that translates C language programs to FPGA hardware circuits. While several such compilers are commercially available, Trident’s unique characteristics include its open source availability, open framework, ability to use custom floating-point libraries, and ability to retarget to new FPGA board architectures. The authors enumerate the compiler framework’s building blocks and provide some results obtained on the Cray XD1 platform.

“V-Force: An Extensible Framework for Reconfigurable Computing” by Miriam Leeser and her colleagues and students from Northeastern University and the College of the Holy Cross outlines their efforts to implement the Vforce framework. Based on the object-oriented VSIPL++ standard, Vforce encapsulates hardware-specific implementations behind a standard API, thus insulating application-level code from hardware-specific details. As a result, as long as the third-party hardware-specific implementation is available, the same application code can run on different reconfigurable computer architectures with no change. The authors include examples of applications and results from using Vforce for application development.
In “Achieving High Performance with FPGA-Based Computing,” Martin Herbordt and his students from Boston University share a valuable collection of FPGA software design patterns. The authors start with an observation that the performance of HPC applications accelerated with FPGA coprocessors is “unusually sensitive” to the quality of the implementation. They examine reasons for such a “sensitivity,” list numerous methods and techniques to avoid generating “implementational heat,” and provide a few application examples that greatly benefit from the uncovered design patterns.

“Sparse Matrix Computations on Reconfigurable Hardware,” by Gerald Morris and Viktor Prasanna, describes implementations of conjugate gradient and Jacobi sparse matrix solvers. In “Using FPGA Devices to Accelerate Biomolecular Simulations,” Sadaf Alam and her colleagues from the Oak Ridge National Laboratory and SRC Computers describe an effort to port a production supercomputing application, a molecular dynamics code called Amber, to a reconfigurable supercomputer platform. Although the speedups obtained while porting these applications—highly optimized for conventional microprocessors—to an SRC-6 reconfigurable computer are not spectacular, these articles accurately capture the overall trend.

Reconfigurable supercomputing has demonstrated its potential to accelerate computationally demanding applications and is rapidly entering the mainstream HPC world.
High-performance reconfigurable computing has demonstrated its potential to accelerate demanding computational applications. Much, however, must be done before this technology becomes a mainstream computing paradigm. The articles in this issue highlight a small subset of challenging problems that must be addressed. We encourage you to get involved with HPRC and contribute to this newly developing field. ■
References

1. D.A. Buell, J.M. Arnold, and W.J. Kleinfelder, eds., Splash 2: FPGAs in a Custom Computing Machine, IEEE CS Press, 1996.
2. M.B. Gokhale and P.S. Graham, Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays, Springer, 2005.
3. S.M. Trimberger, ed., Field-Programmable Gate Array Technology, Kluwer Academic, 1994.
4. T. El-Ghazawi et al., “Reconfigurable Supercomputing Tutorial,” Int’l Conf. High-Performance Computing, Networking, Storage and Analysis (SC06); http://sc06.supercomputing.org/schedule/event_detail.php?evid=5072.
Tarek El-Ghazawi is a professor in the Department of Electrical and Computer Engineering at the George Washington University, Washington, D.C. El-Ghazawi received a PhD in electrical and computer engineering from New Mexico State University. Contact him at tarek@gwu.edu.

Kris Gaj is an associate professor in the Department of Electrical and Computer Engineering at George Mason University, Fairfax, Virginia. Gaj received a PhD in electrical engineering from Warsaw University of Technology, Poland. Contact him at kgaj@gmu.edu.

Duncan Buell is a professor in the Department of Computer Science and Engineering at the University of South Carolina, Columbia. Buell received a PhD in mathematics from the University of Illinois at Chicago. Contact him at buell@sc.edu.

Volodymyr Kindratenko is a senior research scientist at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana. He received a DSc in analytical chemistry from the University of Antwerp, Belgium. Contact him at kindr@ncsa.uiuc.edu.