throbber
The Splash 2 Processor and Applications
`
`Jeffrey M. Arnold, Duncan A. Buell, Dzung T. Hoang
`Daniel V. Pryor, Nabeel Shirazi, Mark R. Thistle
`IDA Supercomputing Research Center
`17100 Science Drive
`
`Bowie, MD 20715
`
`Abstract
`
`Splash 2’ is an attached parallel processor in which
`the computing elements are user programmable FPGA
`devices. The architecture ofSplash 2 is designed to ac-
`celerate the solution of problems which ezhibit at least
`modest amounts of temporal or data parallelism. Ap-
`plications are deoelopecl by writing descriptions of al-
`gorithms in VHDL, which are then iteratively refined
`and debugged within a simulator. Once an application
`is determined to be functionally correct in simulation.
`it is compiled to a gate list and optimized by logic syn-
`thesis. The gate list is then mapped onto the FPGA
`architecture by automatic placement and routing tools
`to form a loadable FPGA object module. A Clangaage
`library and a symbolic debugger comprise the execution
`environment. The Splash 9 system has been shown to
`he efiectioe on a variety of applications, including text
`searching, sequence analysis, and image processing.
`
`1
`
`Introduction
`
`In recent years the emergence of reconfigurable
`Field Programmable Gate Arrays (FPGAs)[3l[7} has
`given rise to a new form of computing in which the ar-
`chitecture of a computer may evolve over time, chang—
`ing to fit the needs of each application it executes.
`With such a computer, the programmer may tailor the
`architecture to fit the problem solution, rather than
`vice versa. Several FPGA based architecture have
`been proposed and built recently. These machines all
`have very different programming models, but each re-
`lies on the use of reconfigurable hardware to customize
`the architecture to particular applications. (Seem for
`a more extensive bibliography.)
`in
`Splash 2[2]
`is an attached parallel processor
`which the computing engines are user programmable
`FPGA devices. The architecture of Splash 2 is de-
`signed to accelerate the solution of problems which
`
`1063-6401993 $08.00 9 1993 $333
`
`43?-
`
`exhibit at. least modest amounts of temporal paral-
`lelism (pipelining) or data parallelism (single instruo
`tion multiple data stream). High [/0 bandwidth
`makes Splash 2 particularly useful for problems with
`large data sets or continuous data streams. Finally,
`because the architecture may be shaped to fit
`the
`problem, Splash 2 is well suited to problems in which
`the size of the fundamental data objects do not match
`a conventional architecture (eg. streams of character
`text}.
`
`2 Splash 2 Architecture
`
`The Splash 2 system consists of a Sun Sparcsta—
`tion host, an interface board, and from one to sixteen
`Splash array boards, as shown in figure 1. The in—
`terface board contains a programmable system clock
`and provides DMA access to the host memory through
`two user programmable FPGA devices, XL and KR,
`which are used to process incoming and outgoing data
`streams.
`
`Each array board contains 16 processing elements
`{PEs}, X1 through X16, arranged in a linear array and
`fully connected via a 16x16 crossbar switch which is
`regulated by the 17th control element, X0. Each of the
`17 computing elements consists of a Xilinx X04010
`FPGA and 256K 16 bit words of memory, which is
`also directly addressable by the host. Each processing
`element has 36 bit. iii-directional data paths to its left
`neighbor,
`to its right neighbor, and to the crossbar
`switch. The input to the array is provided by XL
`through the SIMD bus to X1 of the first board and to
`X0 of every board. Multiple array boards are linked
`together by extending the linear data path from the
`right side of one board to the left side of the next. The
`last. board in the chain is connected to the “Bus, and
`thence to KR. on the interface board.
`
`(cid:51)(cid:72)(cid:87)(cid:76)(cid:87)(cid:76)(cid:82)(cid:81)(cid:72)(cid:85)(cid:3)(cid:48)(cid:76)(cid:70)(cid:85)(cid:82)(cid:86)(cid:82)(cid:73)(cid:87)(cid:3)(cid:38)(cid:82)(cid:85)(cid:83)(cid:82)(cid:85)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3)(cid:16)(cid:3)(cid:40)(cid:91)(cid:17)(cid:3)(cid:20)(cid:19)(cid:25)(cid:19)(cid:15)(cid:3)(cid:83)(cid:17)(cid:3)(cid:23)(cid:27)(cid:21)
`Petitioner Microsoft Corporation - EX. 1060, p. 482
`
`

`

`Splash Boards
`
`
`
`
`
`flii-
`--
`annnnmn
`
`
`
`Figure 1. The architecture of the Splash 2 system.
`
`The crossbar switch is capable of storing up to eight
`separate, dynamically selectable configurations, with
`each configuration specifying a different set of con-
`nections among the 16 ports. The configuration in
`use in any given clock cycle is selected by the con-
`trol element, X0, which shares the 36 bit connection
`to the crossbar with X16. X0 may therefore receive
`broadcast data from the host on the SIMD bus and
`rebroadcast it through the crossbar to each of the 16
`PEs.
`
`The Sparcstation host performs a variety of con—
`trol functions for Splash 2. The host is responsible
`for downloading the configuration information to the
`FPGA computing elements, XD—Xlfi, the interface el~
`ements, XL and XR, and the crossbar switch. There
`are a variety of both synchronous and asynchronous
`mechanisms with which the host may communicate
`with the Splash system, including direct access to the
`computing element memories and the ability to start
`and stop the system clock.
`
`As a research vehicle Splash 2 was designed to
`support a variety of programming models,
`includ-
`ing a single—instruction multiple—data stream {SIMD}
`
`model, a one dimensional pipelined systolic modelI
`and several higher dimensional systolic models. SIMD
`applications may be programmed on Splash 2 by using
`the X0 element and the crossbar switch on each board
`
`to broadcast instructions and data to all processing el—
`ements simultaneously. The instruction stream is sent
`from the host to the X0 chip on each board via the
`SIMD Bus. The X0 elements may in turn decode this
`instruction stream, possibly by fetching microcode
`from their external memories. This decoded instruc-
`
`tion is then broadcast through the crossbar switch to
`each of the 16 processing elements on the board. Each
`processing element is programmed with one or more
`identical SIMD computing engines. These SIMD en»
`gines simultaneously receive and execute the decoded
`instruction, performing nearest neighbor communica-
`tion through the linear data path. Global synchro-
`nization is supported through an AND ,1“ OR reduction
`network.
`
`One dimensional systolic applications may be
`mapped onto Splash 2 by using the linear data path
`to form a continuous pipeline from the host1 through
`the array, and back to the host. The crossbar switch
`
`483
`
`(cid:51)(cid:72)(cid:87)(cid:76)(cid:87)(cid:76)(cid:82)(cid:81)(cid:72)(cid:85)(cid:3)(cid:48)(cid:76)(cid:70)(cid:85)(cid:82)(cid:86)(cid:82)(cid:73)(cid:87)(cid:3)(cid:38)(cid:82)(cid:85)(cid:83)(cid:82)(cid:85)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3)(cid:16)(cid:3)(cid:40)(cid:91)(cid:17)(cid:3)(cid:20)(cid:19)(cid:25)(cid:19)(cid:15)(cid:3)(cid:83)(cid:17)(cid:3)(cid:23)(cid:27)(cid:22)
`Petitioner Microsoft Corporation - Ex. 1060, p. 483
`
`

`

`may be configured to bypass certain PEs during some
`or all of the processing time. Many higher dimen-
`sional applications can be mapped onto Splash by us-
`ing both the linear data path and the dynamic selec-
`tion capability of the crossbar switch to build higher
`dimensional topologies, such as trees, hypercubes, and
`toroids.
`
`3 Programming Splash 2
`
`The programming environment for Splash 2 is
`based upon the VHSIC Hardware Description Lan—
`guage (VHDL)[5], simulation, and logic synthesis. Ap—
`plications for Splash are developed by writing be-
`havioral descriptions of algorithms in VHDL which
`are then iteratively refined and debugged within the
`Splash 2 simulator. During the course of this iter-
`ation, the VHDL implementation is manually parti—
`tioned into a set of Splash computing element pro-
`grams. Once the implementation is determined to be
`functionally correct in simulation it is compiled into a
`gate list and optimized by logic synthesis techniques.
`The gate list is then mapped onto the F PGA architec—
`ture by automatic placement and routing tools to form
`a loadable FPGA object module. Static timing analy~
`sis tools are applied to the object module to determine
`the maximum operating frequency and the set of crit-
`ical paths. This information may be used to manually
`optimize the design. Finally, a C language interface
`library and a symbolic debugger form the execution
`environment.
`
`4 Applications
`
`4.1 Text Searching
`
`Splash 2 has been programmed by the last three
`authors to detect key words from a dictionary in a
`stream of text [6].
`In a preproceasiug step, a dic-
`tionary is streamed past a hash function. The hash
`function computes an address from each word in the
`dictionary and sets a presence bit in a memory table.
`During the processing, the text is streamed through
`the hash function, an address computed {rom words,
`and the corresponding memory location examined. A
`set presence bit indicates a potential key word hit.
`The individual probabilities of false hits multiply if
`multiple (assumed to be) independent hash functions
`are used, so the overall probability of a false hit can
`be made as small as one likes.
`
`The interface board chip XL unpacks four-character
`(32—bit)
`input words in two two-character “super-
`bytee” which are broadcast on the SIMD bus to X0
`on the Splash 2 array boards. X0 uses its attached
`memory as a translation table to convert upper case
`letters to lower case, and white space characters to a
`simple end—of-word marker. This translation table can
`be used to accommodate multiple alphabets or other
`special processing requirements.
`X0 sends the translated superbyte to X] over the
`crossbar. Xilinx chips Xl-lei run the same program
`with di'fl'erent hash functions receiving data from the
`linear data path, computing an updated hash function
`(difierent for each chip), and performing a lookup into
`memory if the processed superbyte contained an end—
`of-word marker.
`If the lookup is done, the presence
`bit is ANDed to the incoming accumulated presence
`bits and passed onto the next chip in the linear array.
`The final determination of a hit is made by XR on
`the interface board, which reports the existence and
`position in the file of the hit.
`The memories attached to Xl-Xlfi each are 223 bits
`in size. The English language has about 213 words
`(ignoring case—sensitivity).
`In the simplistic scheme
`adopted here, the set of keywords is doubled in size
`to accommodate end-of-word markers in either the
`
`odd or even byte positions. The full English language
`would thus hash into a Splash 2 memory that is about
`If 8 full, and 16 independent hash functions would pro-
`duce false hits with a probability of 1 in 2“, which is
`certainly low enough for nearly any application. A
`“real” dictionary or keyword list could be expected
`to be much smaller (the Unix spell-checking program
`uses a dictionary of about 30,000 words);
`the corre-
`sponding substantial reduction in the probability of
`false hits will be more than make up for the fact that
`text is in fact not “random”.
`
`The program as written for Splash 2 runs at 25
`MHz, processing two bytes of text per clock, for an
`effective text throughput rate of 50 Mbytes per second.
`
`4.2 Macromolecule Sequence Compari-
`son
`
`Splash 2 has been programmed by the third au—
`thor to perform DNA and protein sequence compar-
`isons [4]. The well—known dynamic programming al—
`gorithm is used to compute the edit-distance between
`two strings of characters. The edit distances for all
`prefixes ofa source sequence 5' = 8183...8n and a tar-
`get sequence T = haul,“ form a matrix. With 5.-
`labelling row 1' and t;
`labelling column j of the ma—
`trix,
`the entry in the (£,j) cell of the matrix is the
`
`(cid:51)(cid:72)(cid:87)(cid:76)(cid:87)(cid:76)(cid:82)(cid:81)(cid:72)(cid:85)(cid:3)(cid:48)(cid:76)(cid:70)(cid:85)(cid:82)(cid:86)(cid:82)(cid:73)(cid:87)(cid:3)(cid:38)(cid:82)(cid:85)(cid:83)(cid:82)(cid:85)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3)(cid:16)(cid:3)(cid:40)(cid:91)(cid:17)(cid:3)(cid:20)(cid:19)(cid:25)(cid:19)(cid:15)(cid:3)(cid:83)(cid:17)(cid:3)(cid:23)(cid:27)(cid:23)
`Petitioner Microsoft Corporation - EX. 1060, p. 484
`
`

`

`E. Keith, W. Kleinfelder, D. Kopetaky, A. Kopser,
`J. Kuehn, S. Lucas, M. Mascsgni, M. McCarty, R.
`Minnich, M. Moran, F. More, L. Podrazik, C. Reese,
`J. Schlesinger, P. Schneck, B. Schott, D. Smitley, D.
`Sweely, C. Tscharner. and K. Wallgren.
`
`References
`
`[1} J. M. Arnold, “The Splash 2 software enViron-
`merit,” Proc., IEEE Workshop on FPGAs for Gus-
`tom Computing Machines, 1993I to appear.
`
`[2] J. M. Arnold, D. A. Buell, and E. G. Davis,
`“Splash 2", Proc, 41!: Annual ACM Symp. on Pan-
`oHel Algorithms and Architectures, pp. 316-322,
`1992.
`
`[3] S. D. Brown. R. J. Francis, J. Rose, and Z.
`G. Vraneeic, Field—Programmable Gate Arrays,
`Kluwer Academic Publishers, 1992.
`
`[4] D. Hoang, “Searching genetic databases on Splash
`2”, Prom, IEEE Workshop on FPGAs for Custom
`Computing Machines. 1993, to appear.
`
`{5} IEEE Standard VHDL Language Reference Man-
`ual, IEEE Std 1076—4987, New York, NY, 1988.
`
`[6} D. Pryor, M. Thistle, and N. Shirasi, “Text search—
`ing on Splash 2“, Proc., IEEE Workshop on FP-
`GAs for Custom Computing Machines, 1993, to
`appear.
`
`[7] The X04000 Data Book, Xilinx Inc., San Jose, CA,
`1992.
`
`to I}; plus the cost of
`
`edit-distance from S.- : 31....” to T,- = t3...t,'. This
`distance is the smaller of three quantities:
`1) The distance from SF; to T,- plus the “cost” of
`appending s,- to 33.1.
`2) The distance from S.-
`appending t,- to Cit--1.
`3) The distance from 8,4 to 13.; plus the cost of
`substituting t,- for 5;.
`For DNA sequences, the costs in 1) and 2) are usu-
`ally taken to be 1 and the cost in 3) to be 0 if t; = s.-
`and 2 if not.
`The locality of the information needed for edit dis-
`tance computation permits a systolic method to be
`used which is also highly parallel because all the cells
`on a given antidiagonal can be updated simultane»
`ously.
`In Splash 2 implementation of this algorithm
`for DNA comparisons, 14 copies of a cell-update pro-
`cessing element fit on a single Xilinx chip for a total
`of 224 processing elements per Splash 2 board. This
`implementation can process 12 million characters per
`second. In a. protein sequence version, 8 million char-
`acters per second can be processed.
`
`4. 3 Edge Detection
`
`The high 1,10 bandwidth and the pipelined nature
`of Splash make it well suited to many image process-
`ing applications. One such application which has been
`implemented is a 3x3 gradient operator edge detector.
`The input image is streamed into the array at video
`rates (up to 16 MPixels/sec). For the duration of one
`scan line, the input stream is written to one PE mem-
`ory while the previous two scan lines are read from two
`other memories. The input scan line together with the
`two previous scan lines are used to compute the X and
`Y partial gradients. Another PE memory is used as a.
`table lookup to compute the magnitude and the angle
`of the gradient. At the end of each scan line, the three
`scan line bufl'ers exchange roles so that the second pre-
`vious buffer is over written with the new scan line, and
`the most recently written buffer becomes the previous.
`This is accomplished by rotating among three cross-
`bar configurations at the end of each scan line. An
`“image” consisting of the angle and magnitude of the
`gradient at each pixel is streamed out at the input
`data rate with a. latency of three scan line times.
`
`Acknowledgements
`
`those who have contributed
`We acknowledge all
`to the Splash 2 project, including D. Burns, N. Go—
`letti, S. Cuccaro, B. Fross, M. Gokhale, W. Holmes,
`
`(cid:51)(cid:72)(cid:87)(cid:76)(cid:87)(cid:76)(cid:82)(cid:81)(cid:72)(cid:85)(cid:3)(cid:48)(cid:76)(cid:70)(cid:85)(cid:82)(cid:86)(cid:82)(cid:73)(cid:87)(cid:3)(cid:38)(cid:82)(cid:85)(cid:83)(cid:82)(cid:85)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3)(cid:16)(cid:3)(cid:40)(cid:91)(cid:17)(cid:3)(cid:20)(cid:19)(cid:25)(cid:19)(cid:15)(cid:3)(cid:83)(cid:17)(cid:3)(cid:23)(cid:27)(cid:24)
`Petitioner Microsoft Corporation - Ex. 1060, p. 485
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket