throbber
FPGA Implementation of Systolic Sequence Alignment
`Dzung T. Hoang✁
`
`Department of Computer Science
`Brown University, Providence, RI 02912, USA
`email: dth@cs.brown.edu
`
`Daniel P. Lopresti✂
`
`Matsushita Information Technology Laboratory
`182 Nassau Street, Princeton, NJ 08542-7072, USA
`email: dpl@mitl.com
`
`Abstract
`
`This paper describes an implementation of a novel sys-
`tolic array for sequence alignment on the SPLASH re-
`configurable logic array. The systolic array operates in
`two phases. In the first phase, a sequence comparison
`array due to Lopresti [2] is used to compute a matrix
`of distances which is stored in local RAM. In the sec-
`ond phase, the stored distances are used by the alignment
`array to produce a binary encoding of the sequence align-
`ment. Preliminary benchmarks show that the SPLASH
`implementation performs several orders of magnitude
`faster than implementation on supercomputers.
`
`1 Introduction
`
`The work presented in this paper was begun during
`one co-author’s summer internship at the National Can-
`cer Institute’s Laboratory of Mathematical Biology in
`Fredrick, Maryland. The goal was to develop genetic
`sequence analysis algorithms for the SPLASH reconfig-
`urable logic array [3]. A systolic sequence compari-
`son algorithm that computes the edit distance between
`a pair of sequences had already been implemented on
`SPLASH [4]. Certain applications of interest to biol-
`ogists at the laboratory, such as multiple alignment of
`genetic (DNA and RNA) sequences, however, require
`more than just the edit distance: a more informative
`analysis of the similarity, or homology, of the sequences
`in the form of an alignment is required. In this paper,
`we describe an implementation of a systolic algorithm
`
`and afterwards by an NSF Graduate Fellowship.
`
`✄ A longer version of this paper can be found in [1].
`☎ Supported during Summer 1991 by an NIH Summer Internship
`✆ Supported by NSF grant MIP-9020570.
`
`for computing sequence alignments on SPLASH. Prior
`to our work, we know of no systolic array for computing
`sequence alignments.
`
`1.1 Sequence Comparison and Alignment
`
`Given a source sequence ✝✟✞✡✠ 1✠ 2 ☛☞☛☞☛ ✠✍✌
`, the edit distance between ✝
`sequence ✎✡✞✑✏ 1✏ 2 ☛☞☛✒☛ ✏✔✓
`and✎
`to✎
`
`and a target
`
`is defined to be the minimum cost of transforming
`through a series of the following edit operations:
`deleting a character, inserting a character, and substitut-
`ing one character for another1.
`In some applications, such as approximate multiple
`sequence comparison [5] and protein folding [6], in ad-
`dition to the edit distance, we need to know the series
`of edit operations that leads to a minimum cost transfor-
`mation. A standard way to represent the transformation
`is with an alignment. In an alignment, the characters
`of the source and target sequences are arranged in a
`matrix with two rows. The source sequence, possibly
`’, is placed in the first
`row. Similarly, the characters of the target sequence
`are placed in the second row. The matrix is analyzed
`indicates dele-
`in-
`
`with embedded null characters, ‘✕
`column-wise. A column containing ✖✒✗✘✚✙
`tion of the character ✛ ; a column containing ✖ ✘✜ ✙
`dicates insertion of the character ✢ ; and a column ✖✣✗✜ ✙
`indicates substitution of ✢
`for✛
`ment: ✖✥✤ ✦ ✤ ✧ ★ ✤ ✘ ✦ ✦★ ✦ ✘ ✧ ★ ✤ ✤ ✦ ✧ ✙ . For a given
`
`. A column consisting of
`two nulls is not allowed. Here is an example of an align-
`
`cost function, there may be more than one minimum-
`cost alignment. The alignment algorithm presented here
`
`1A different set of edit operations may be defined to suit a particular
`application. For example, in text processing, a swap of two adjacent
`characters may be considered an edit operation. However, a different
`algorithm than presented here may be required to accomodate these
`additional edit operations.
`
`Petitioner Microsoft Corporation - Ex. 1039, p. 1
`
`1
`
`✝
`

`

`SPLASH is a reconfigurable logic array developed at the
`Supercomputer Research Center (SRC) as a coprocessor
`card for the Sun VME bus. The SPLASH board con-
`tains 32 Xilinx XC3090 field-programmable gate arrays
`(FPGA) [9] with local connections to 32 1M-bit (128K
`by 8) static RAM chips. The FPGA’s are connected lin-
`early in a ring with input coming from a 32-bit FIFO
`queue connected to chip 0 and output going to a 32-bit
`FIFO queue connected to chip 31. A RAM chip is con-
`nected between each pair of adjacent FPGA chips and
`can be accessed by either FPGA. The data path connect-
`ing the FIFO’s to the array consists of 36 unidirectional
`lines, 32 for data and 4 for control signals. Adjacent
`FPGA’s, except for chips 0 and 31, are joined by a 68-bit
`programmable bidirectional bus, which shares connec-
`tions to the local RAM. Chips 0 and 31 are connected
`with a 35-bit data path. This “wrap-around” connection
`allows data flow through the array in either direction.
`At the heart of the SPLASH board are the Xilinx
`XC3090 FPGA’s. Each FPGA contains 320 configurable
`
`logic blocks (CLB’s) arranged in a 20❏ 16 grid and sur-
`
`rounded by 144 input/output blocks (IOB’s). The 144
`IOB’s surrounding each XC3090 FPGA provides con-
`nections to the control bus and programmable intercon-
`nections between adjacent chips and local RAM. Each
`IOB can be configured as either an input port, an output
`port, or a bidirectional input/output port, with optional
`latch or flip-flop operation. The programmability of the
`IOB’s allows for flexibility in the interchip connections.
`For example, when the local RAM is not needed, it
`can be disabled and the IOB’s connected to the RAM’s
`address and data lines can be used for communication
`between adjacent FPGA’s.
`The reader is referred to [3] for a more complete de-
`scription of SPLASH.
`Using the FPGA technology in SPLASH, we were
`able to rapidly prototype the systolic array without hav-
`ing to construct any additional hardware. This approach
`also has advantages over software-only simulations in
`that it allowed us to detect and correct race conditions
`present in early prototypes.
`
`3 Systolic Array
`Alignment
`
`for Sequence
`
`Our systolic array for sequence alignment operates in
`two phases. In the first phase, the systolic array oper-
`ates in sequence comparison mode to compute entries in
`the dynamic programming table and store them in local
`RAM. In the second phase, the stored table is used to
`construct an alignment with a marker passing systolic
`
`Petitioner Microsoft Corporation - Ex. 1039, p. 2
`
`2
`
`✁✂ ✄✄✄ ☎☎✆✂✂ ✄☎☎✆✄ ✄ ✂ ✄☎☎✆ ☎☎✆ ✂ ✄✄ ☎☎✆ ✂ ✄ ✂ ✄☎☎✆ ✄ ✄ ✁✄✄✄✄✄✄✄✄ ✂✂✂✂✂✂✂✂☎☎✆ ✄☎☎✆ ✄✂☎☎✆ ✄✂☎☎✆ ✄✂☎☎✆ ✄✂✁ ✄✂☎☎✆✂ ✄✄✄✁ ✄✂☎☎✆ ✄✂☎☎✆ ✄✂☎☎✆ ☎☎✆ ✂ ✂ ✂✂☎☎✆ ✄✂☎☎✆✂✂✄✂☎☎✆✝☎☎✆✄✂ ✄☎☎✆☎☎✆ ✄☎☎✆ ✂ ✄ ✄ ✁ ✂✂ ✄☎☎✆ ✄ ✁ ✂ ✂✞ ✂ ✂ ✄ ✂ ✄☎☎✆ ☎☎✆✂ ✄☎☎✆ ✁ ✂ ✂ ✂ ✄
`
`5
`
`6
`
`7
`
`6
`
`5
`
`6
`
`7
`
`4
`
`5
`
`6
`
`5
`
`4
`
`5
`
`6
`
`5
`
`6
`
`5
`
`6
`
`7
`
`8
`
`4
`
`5
`
`6
`
`7
`
`5
`
`6
`
`5
`
`6
`
`4
`
`5
`
`6
`
`7
`
`T G C T A A G C
`
`Figure 1: Dynamic programming table with minimiza-
`tion pointers
`
`computes one such alignment.
`
`1.2 Dynamic Programming
`
`The edit distance can be computed sequentially with a
`well-known dynamic programming algorithm [7,8] in
`be the source se-
`
`✟✡✠☞☛✍✌✏✎
`time. Let ✝ ✞ ✠ 1✠ 2✠ 3 ☛☞☛✒☛ ✠✍✌
`quence,✎ ✞ ✏ 1✏ 2✏ 3 ☛☞☛✒☛ ✏✔✓ be the target sequence, and✑✓✒✕✔✖
`be the edit distance between the subsequences✠ 1✠ 2 ☛☞☛☞☛ ✠ ✒
`and ✏ 1✏ 2 ☛☞☛✒☛ ✏✖ . Then
`
`✑ 0✔ 0 0✗✑✘✒✕✔ 0
`1✰✲✱✳✰ ☛ ✗✑ 0✔✖ ✞ ✑ 0✔✖✴✙ 1✚✵✛ ✒ ✓ ✪✶★✸✷✫✹✺✮ ✗
`✞ ✑✘✒✕✙ 1✔ 0✚✜✛✣✢✥✤✧✦✩★✫✪✭✬✯✮ ✗
`1✰✼✻✡✰ ✌ ✗
`1✾ ✹✾❀❂ ❃❄❅ ✑❆✒✕✙ 1✔✖ ✚✼✛ ✢✭✤✶✦✩★❇✪ ✬✮ ✗✑ ✒✕✔✖❈✙ 1✚✼✛ ✒ ✓ ✪✧★✯✷✹✮ ✗✑❆✒✕✙ 1✔✖✴✙ 1✚✜✛ ✪☞❉❆❊☞★✫✪ ✬✔✷✹✮✣❋
`✑✘✒✽✔✖ ✞
`1✾ ✬✾❀✿✏❁
`Here✛●✢✭✤✶✦✩★❇✪✭✬❍✮ is the cost of deleting✠✣✒ ,✛ ✒ ✓ ✪✧★✯✷✫✹☞✮ is the cost
`of inserting ✏✖ , and✛●✪■❉❆❊☞★✫✪ ✬✔✷✹ ✮
`✏✖
`for ✠ ✒ .
`
`and
`
`min
`
`is the cost of substituting
`
`An alignment can be constructed by creating pointers
`to indicate the minimization choices when evaluating
`the dynamic programming recurrence. An example dy-
`namic programming table augmented with pointers is
`shown in Figure 1. By tracing a path from the lower-
`right corner to the upper-left corner, we can construct
`an alignment in reverse. The bold pointers in Figure 1
`show the path that corresponds to the alignment given in
`a previous example.
`
`2 SPLASH Reconfigurable Logic
`Array
`
`A G A C T A G G
`
`0
`
`1
`
`2
`
`3
`
`4
`
`1
`
`2
`
`3
`
`4
`
`5
`
`2
`
`3
`
`2
`
`3
`
`4
`
`3
`
`4
`
`3
`
`4
`
`5
`
`4
`
`5
`
`4
`
`3
`
`4
`
`5
`
`4
`
`5
`
`4
`
`3
`
`4
`
`6
`
`5
`
`6
`
`5
`
`4
`
`3
`
`7
`
`6
`
`5
`
`6
`
`5
`
`4
`
`8
`
`7
`
`6
`
`7
`
`6
`
`5
`
`✞
`

`

`Target
`
` 2
`
`✁ ✁✂✂ ✡ 0
`
`✗✘ ✘ ✘ ✘✘ ✘ ✘✘ ✘✗✗✗✗✗ ✗ ✗✗ ✗✗✗ ✗✗✗✙ ✙ ✙✛✚ ✜✜✜✢
`✁✂ ✂✑✂✒✂✂✑✂✒✂✁ ✁ ✁✂✂✂✂ ✓✔✁ ✓✓✂✕✂✕✂✂✕✂✕✂ ✖ ✂ ✁✁ ✁✂✕✂✕✂ ✖ ✁ ✖ ✁ ✖ ✁ ✖ ✁ ✂✑✂✒✂✑ ✓ ✙ 1✔✓ ✙ 1
`✑ ✓ ✔✓ ✙ 2
`✑ ✓ ✙ 2✔✓✑ ✓ ✔✓ ✙ 1
`✑ ✓ ✙ 1✔✓✑ ✓ ✔✓
`
`0
`
`0
`
`0
`
`0
`
`1
`
`1
`
`0
`
`0
`
`0
`
`0
`
`Reversed
`Target
`Reversed
`Source
`Marker
`Stream
`Marker Flag
`
`☛☞✂
`
`Character
`
`Traveling Distance
`
`Stored Distance
`
`✌✌✍✂
`✁✂ ✁✂✂✄✂☎✂✂☎✂☎✂✂ ✁✆ 3 ✝ 4 ✞ 5 ✞ 6 7 ✆ 8
`✁✟✠ ✡ 2
`✎✎✏✝ 1
`✡ 1
`✡ 0
`✡ 2
`✁ ✁✂✂✡ 1
` 2
`✞ 1
` 8 7 ✞ 6 ✝ 5 ✆ 4 ✞ 3
`
`Source
`
`Figure 4: Systolic array for generating alignment
`
`Figure 2: Systolic array for sequence comparison
`
`✄To RAM
`✄MemDstOut
`
`Finite State
`Machine
`
`✄✄✄ ✄✄✄✁✂ ✁✂ ✁✂✁✂
`
`Match
`SrcNull TgtNull
`
`Character
`Comparator
`
`SrcDstIn
`TgtDstOut
`
`SrcChrIn
`TgtChrOut
`
`SrcDstOut
`TgtDstIn
`
`SrcChrOut
`TgtChrIn
`
`Figure 3: Block diagram of the sequence comparison PE
`
`array.
`Since the data path to local RAM is 8-bits wide,
`for convenience, eight processing elements (PE’s) were
`placed on each FPGA chip, except for X0 and X31,
`where only four PE’s were placed to leave room for
`I/O logic. This puts a total of 248 PE’s on SPLASH,
`allowing for alignment of sequences up to 123 in length.
`
`3.1 Phase One: Dynamic Programming
`
`The dynamic programming recurrence can be mapped
`onto a linear systolic array that computes a single antidi-
`agonal of the dynamic programming table at each step,
`with each PE in the array computing the distances along
`one diagonal. The resulting systolic array (Figure 2)
`and its implementation on SPLASH is described in [4].
`The array is modified to save the dynamic programming
`table in local RAM. The first phase ends just after the
`, has been computed.
`Figure 3 shows a block diagram of a sequence com-
`parison PE. Each PE is implemented in 13 CLB’s, eight
`for the character comparator and five for the finite state
`machine.
`
`edit distance,✑ ✌ ✔✓
`
`3
`
`3.2 Phase Two: Marker Passing
`
`The pointer traceback procedure for constructing an
`alignment, as described earlier, is performed systoli-
`cally in the second phase. We can think of the trace-
`back as a marker passing process in which the marker
`hops alongs a path created by the minimization pointers.
`Using the same antidiagonal mapping of the dynamic
`programming table to PE’s as in phase one, we seek
`to move the marker from the lower-right corner of the
`table to the upper-left. Following a horizontal pointer
`would correspond to moving the marker left one PE.
`Similarly, following a vertical pointer corresponds to
`moving the marker right one PE. Finally, following a di-
`agonal pointer corresponds to keeping the marker in the
`same PE. Where there are multiple pointers, one is arbi-
`trarily chosen. In phase one, the minimization pointers
`were never actually computed. However, we can de-
`
`duce the pointers originating from position✠✱✶✗■✻ ✎ given
`✑✘✒✕✔✖ ,✑❆✒✕✙ 1✔✖ ,✑✘✒✕✙ 1✔✖❈✙ 1, ✠●✒ , and ✏✕✖ . Therefore, by reading
`
`back the distances saved in local RAM and streaming
`the source and target sequences backwards through the
`array, the minimization pointers, and thus the movement
`of the marker, can be computed. The algorithm out-
`lined above is realized by the systolic alignment array
`diagrammed in Figure 4.
`The sequence alignment PE is diagrammed in Fig-
`ure 5. The sequence alignment array uses the same
`character comparator in the sequence comparison array.
`The additional finite state machine is implemented in
`eight CLB’s, bringing the total number of CLB’s per PE
`for both phases to 21.
`Since at any step, the marker can move at most one
`PE from its current position, the marker can be regis-
`tered on a systolic stream that moves across two PE’s
`at each step. The output of the marker stream encodes
`the movement of the marker. Two consecutive 1’s in-
`dicate that the marker moved right. Two 0’s between
`successive 1’s indicate that the marker moved left. A
`pattern of 10101 indicates that the marker did not move.
`The binary pattern exiting the marker stream can be de-
`coded into a series of edit operations by a simple finite
`
`Petitioner Microsoft Corporation - Ex. 1039, p. 3
`
`✁
`

`

`1
`
`Finite State
`Machine
`
`✗ ✗✑ ✏ ✒✕✙ 1
`✑ ✏ ✒✑ ✏ ✒✂✁
`✂ ✂
`✁ ✁
`✁✂ ✁✂
`✁ ✁
`✄✄✄ ✄✄✄✁✂ ✁✂
`
`Match
`SrcNull TgtNull
`
`Character
`Comparator
`
`MarkerStrmIn
`
`SeedOut
`
`SeedInLeft
`
`MarkerOutLeft
`
`MarkerInLeft
`
`SrcChrIn
`TgtChrOut
`
`Figure 5: Block diagram of alignment PE
`
`state automaton that counts the number of 0’s between
`successive 1’s.
`
`4 Benchmarks
`
`For timing, we performed 10,000 alignments of 100-
`long sequences on SPLASH. It took 0.50 seconds to
`initialize the SPLASH array and 3.2 seconds to run the
`alignments. Normalizing for 100 alignments gives 0.032
`seconds. For comparison, the benchmarks for 100 com-
`parisons of 100-long sequences found in [4] are summa-
`rized in Figure 6. We have not completed benchmarking
`sequence alignment on conventional computers and use
`these results for preliminary comparison. Computing
`an alignment would require additional processing and
`therefore take additional time in most implementations.
`Even including initialization time, the SPLASH imple-
`mentation performs at least an order of magnitude better
`than implementations on commercial supercomputers,
`which, as tested, compute only the edit distance.
`
`5 Conclusion
`
`A systolic array for sequence alignment is presented and
`its implementation on SPLASH is described. Prelimi-
`nary benchmarks show that the SPLASH implementa-
`tion is several orders of magnitude faster than implemen-
`tations on supercomputers costing many times more.
`
`MarkerStrmOut
`SeedInRight
`
`SeedOut
`MarkerInRight
`
`MarkerOutRight
`
`SrcChrOut
`TgtChrIn
`
`System
`SPLASH
`P-NAC
`Multiflow Trace
`Sun SPARCstation 1
`Cray 2
`Convex C1
`DEC VAX 8600
`Sun 3/140
`DEC VAX 11/785
`
`Time
`0.020 s
`0.91 s
`3.7 s
`5.8 s
`6.5 s
`8.9 s
`31 s
`48 s
`54 s
`
`Speed-Up
`2,700
`60
`14
`9.3
`8.3
`6.0
`1.7
`1.1
`1.0
`
`Figure 6: Benchmarks of 100 comparisons of 100-long
`sequences [4]
`
`References
`
`[1] D. T. Hoang, “A Systolic Array for the Sequence
`Alignment Problem," Brown University, Providence,
`RI, Technical Report CS-92-22, 1992.
`
`[2] R. J. Lipton and D. P. Lopresti, “A Systolic Array
`for Rapid String Comparison," in 1985 Chapel Hill
`ConferenceonVLSI, H. Fuchs, Ed. Rockville, MD:
`Computer Science Press, pp. 363–376, 1985.
`
`[3] M. Gokhale, W. Holmes, A. Kopser, S. Lucas, R.
`Minnich, D. Sweely and D. Lopresti, “Building and
`Using a Highly Parallel Programmable Logic Array,"
`Computer, 24, no. 1, pp. 81–89, January 1991.
`
`[4] D. P. Lopresti, “Rapid Implementation of a Genetic
`Sequence Comparator Using Field-Programmable
`Logic Arrays," presented at Advanced Research in
`VLSI Conference, Santa Cruz, March 1991, Invited
`paper.
`
`[5] B. A. Shapiro, “An Algorithm for Comparing Mul-
`tiple RNA Secondary Structures," Comput. Applic.
`Biosci., 4, no. 3, pp. 387–393, 1988.
`
`[6] H. Margalit, B. A. Shapiro, A. B. Oppenheim and
`J. V. M. Jr., “Detection of Common Motifs in RNA
`Secondary Structures," Nucleic Acids Research, 17,
`no. 12, pp. 4829–4845, 1989.
`
`[7] S. B. Needleman and C. D. Wunsch, “A General
`Method Applicable to the Search for Similarities in
`the Amino-Acid Sequence of Two Proteins," Journal
`ofMolecular Biology, 48, pp. 443–453, 1970.
`
`[8] R. A. Wagner and M. J. Fischer, “The String-to-String
`Correction Problem," J.. Assn. Comput. Mach., 1,
`pp. 168–173, 1974.
`
`[9] Xilinx, Inc., The Programmable Gate Array Data
`Book. San Jose, CA, 1991.
`
`Petitioner Microsoft Corporation - Ex. 1039, p. 4
`
`4
`
`✗✗
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket