throbber
From: ISMB-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 269
`
`

`

`dedicated to the scanning of GSDB. This algorithm
`runs on a new kind of massively parallel and low cost
`(ie the price of a. workstation.) computer now avail—
`able; the reconfigurable hardware systems. It is based
`on a. bit level operation model that enables the imple—
`ment-or to fit the hardware to the problem rather than
`distorting the problem to fit the computer.
`The main idea that has guided our work is : if we sig-
`nificantly increase the speed of an algorithm for molec-
`ular genetics we also increase the quality of the re
`sults produced by this algorithm.
`Indeed decreasing
`the computation time needed by any algorithm by two
`or three orders of magnitudes enables the statistical
`validation by Monte Carlo like methods, for instance.
`
`This paper is organized as follows. The first section
`introduces the types of algorithms used for scanning
`the GSDBS, then proposes a generic pattern including
`those from FAST, BLAST and FLASH.
`ln the second
`section. after a brief introduction of the reconfigurable
`hardware concept, we show how an exhaustive search
`based on the pattern previously described can be effi-
`ciently implemented. Finally, comparative results be-
`tween other molecular genetic dedicated systems and
`our own are presented and discussed.
`
`Scanning Algorithmic
`A great variety of algorithms have been written to com—
`pare or align genomic sequences. However, a small
`number of mechanisms are at the heart of most of these
`
`algorithms.
`We have restricted our interest to the search for in-
`
`formation in the genetic data base. To simplify, this
`comes to comparing a sequence or part of a sequence
`to all the sequences of the data base.
`We have examined how to implement the common
`basis of these algorithmics on a reconfigurable hard-
`ware. As the biologists have associated a semantic to
`their current methods, one of the goals of our proposal
`is to speed up their methods without breaking up the
`semantics.
`
`The two principles of sequence comparison are the
`search for consensus patterns and the alignment by dy-
`namic programming. These methods consist in ofl'ering
`aligned sequences in such a way that the anchor points
`overlap each other. The anchoring points are set either
`by the biologist or by the pattern matching algorithm.
`
`Dynamic Programming
`Dynamic programming dedicated to molecular genet—
`ics was rediscovered at the begining of the Seventies
`by Needlman and W'unsh, then improved by Sellers,
`Gotoli, Smith and Waterman. (Needlmann 85 Wunsch
`1970; Sellers 1974; Gotoh 1982; 'Watermann 1984)
`Dynamic programming computation is extremely
`time consuming,
`therefore many dedicated machine
`projects have arisen. (Gokhale of at.
`199]; Chow et
`at 1991; White et at. 1991}
`
`270
`
`[SM 3—94
`
`We have evaluated an implementation of the N eedle
`man and Wunseh algorithm on our hardware. We find
`that a. scanning speed of only 3 million nucleotides per
`second is poSsible. Moreover, the computation only
`takes place on a band stretching along the main diag—
`onal of the matrix. This adds up to an under optimal
`computation heuristic on the whole of the matrix.
`On its own, dynamic programming remains too com—
`puting power consuming to search efficiently in large
`data bases, even with a speed up from specialized sys
`terns. This is even truer as several rounds are necessary
`to validate the results. Therefore. another kind of al-
`gorithm that deals better with the trade ofl‘s between
`scanning speed, flexibility and cost (in development
`time and finance) must be used for a specialized ma-
`chine. The pattern matching algorithms fulfills these
`requirements.
`
`Pattern Matching
`
`FAST, BLAST and FLASH are three examples of pattern
`matching algorithms, they are significant because of
`the extent to which they have been distributed. The al-
`gorithmic choice on which they were based were guided
`by computing time considerations, in terms of length
`of the data base, but also the nature of patterns, their
`number and length. They illustrate the state of the art
`on how to obtain the best performances on sequential
`machines. They establish a hierarchy in the structure
`of the words searched for. The differences between al—
`
`gorithms lie mainly in the way they look for a fragment,
`as well as the law of acceptance or rejection.
`These three algorithms are based upon the splitting
`up of the pattern searched for into small segments. In
`the case of FLASH this is even done in the base.
`To summarize the characteristics of these three al-
`
`gorithms they allow:
`
`I A comparison of a sequence against a base of se—
`quences.
`
`I A splitting up of the sequence searched for into
`smaller pieces. They are continuous, of size two for
`FAST, four for BLAST and discontinuous for FLASH.
`
`I A search based upon lookup tables.
`
`I rl'he filtering by the means of a minimum score of
`the sequences possessing the most fragments.
`
`Califano and Rigoutsos‘s paper (Califano Se Rigout-
`505 1993) provides an excellent statistical evaluation of
`the BLAST and FAST pattern definition, compared to
`their own in FLASH. One of the interesting points dis-
`played by Califano and Rigoutsos is that the increase
`in sensitivity of the method is related to the nature of
`the fragments, this respects our intuition.
`Besides.
`it is better to use discontinuous segments
`to characterize the information that is contained in the
`
`sequence and thus optimize the search.
`The term of segment has become inappropriate be-
`cause inserting spaces in the fragments has made them
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 270
`Petitioner Microsoft Corporation - EX. 1042, p. 270
`
`

`

`become more complicated, therefore we prefer to speak
`about a motif. The motifs are still very simple in the
`case of FLASH but we intend to complexify them.
`
`The Generic Motif
`
`In what follows we summarize the most important
`characteristics of a pattern matching algorithm.
`
`I The scanning of several hundreds or even thousands
`of motifs. Moreover, a degree of flexibility in the
`description of the motif is required to stretch the
`algorithm’s research field. This implies the motifs
`must not only be words but rather signatures of their
`belonging to a class of words.
`
`I Sufficient velocity. This implies a few seconds to scan
`current bases, as in the near future the sequencing
`aims at entering whole genomes that would increase
`the volume to several billions of nucleotides (3.5 bil-
`lion of nucleotides for the entire human genome).
`
`The motif isn’t described any more from an alpha-
`bet of nucleotides or amino acids even extended with
`
`joker characters, but from a list of authorized letters at
`a position. A motif becomes an array of lists of letters.
`The motifs must also be rather short, but this isn’t
`critical. Indeed, Califano and Rigontsos have demon—
`strated that the motif must have a length around 10
`for the search on DNA and 5 for the proteins, to obtain
`a scanning with a good sensitivity.
`In order to take into account these considerations.
`the motifs are looked for with vectors of binary sub~
`stitution. The motifs are formed by a set of Binary
`Substitution Vectors (BSVs). A st is simply the list
`of authorized letters at the position of the vector. A
`motif of length N on an alphabet of L letters is made
`up by a list of N BSV ofL bits.
`
` Alphabet
`
`l
`0
`0
`
`(l
`1
`0
`
`l
`l
`l
`
`1
`0
`0
`
`0
`0
`0
`
`0
`0
`1
`
`T
`G
`C
`
`l
`0
`0
`1
`
`0
`0
`1
`0
`
`This motif of length 8 on the DNA alphabet can recognize
`words like AGTGCTAA or CGTGATAC and so on...
`
`This motif structure has several advantages. On one
`hand it encompasses the FAST, BLAST and FLASH mo~
`tifs and at the same time it maintains a certain sim—
`
`plicity and a direct access to the underlying seman-
`tic. This makes the production of motifs by a program
`quite easy. On the other hand, these motifs can be
`used as anchor points for multiple alignment (Gracy,
`Chiche, & Sallantin 1992), searching by profile, or for
`producing examples for learning systems. The latter
`exploits the possibilities of systematic exploring with
`motifs of the size of a data base to find consensus fields
`
`in a set of sequences. (Mephu N guifo & Sallantin 1992)
`To adress the problem of sensitivity of the search, let
`us first notice that it depends only of the discriminant
`power of the motif. We have set the maximum length
`
`Indeed the most specific
`of a motif to be 16 sts.
`motif (ie with only one letter at each position) 1will be
`found, in average, once in a random sequence of length
`416 x 4.109, which is greater than the size ofthe human
`genome. Thus the selectivity of the motif can be tuned
`from a probability of one (ie the motif match with any
`sequence in the data base) to a probability less than
`(data base length)‘1. Furthermore the selectivity can
`be tuned independentely of the efficiency of the search.
`
`AER? Algorithm and Its Implementation
`The A2132 algorithm aims at searching exhaustively
`on the whole of a GSDE, for the situation of words or
`short sequences. The motifs looked for, described in
`the previous section, can be established either by the
`user, or automatically by a. software.
`
`Introduction to Reconfigurable Hardware
`
`Switch Matrix
`
`e-IZIEIIIIIZIEEIZ
`
`
`
`IOBfl--s-I:l|:ll:ll:|l__ll_|l__lmf“'ll_ll:
`
`Figure I: View of part of Xilinx FPGA architecture
`
`The reconfigurable hardware concept, as well as its
`implementation with Field Programmable Gate Ar—
`rays (FPGAs) was introduced by different teams at the
`end of the Eighties. Vuillernin, Lopresti and Kean de—
`scribed various systems based on this concept. (Bertin,
`Roncin,
`Sc Vuillemin 1989; Gokhale et at.
`1991;
`Gray 5:, Kean 1989) In fact, this concept had been
`latent in the literature since the Seventies. One of
`
`the first to have expressed it was Schafiner (Schaffner
`1978) in 1972. However, it is only with the arrival
`of the SHAM based FPGA in 1985 by Xilinx that it
`became possible to implement this concept. The char-
`acteristics of reconfigurable hardware are, a great de-
`velopment facility, together with a flexibility that is
`only encountered in programmable systems. The level
`of performances reached by a reconfigurable hardware
`are those of a dedicated system. The FPGA is mainly
`used because of its aptitude for rapid prototyping.
`
`Lemoine
`
`271
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 271
`
`

`

`ACK
`
`
`
`match-mismatch lines
`
`pipeline bus : DNA 2bits, Protein 4 or 5 bits
`
`Figure 2: Scheme of a single motif detector
`
`Basically a FPGA is a grid of small memories 16x1
`bits interconnected by a network across the whole grid.
`These small memories are lookup tables that can im—
`plement any logic functions of four variables.
`In the
`FPGA that we used,
`the lookup tables are grouped
`by two with two additional 1 bit registers and form a
`CLB (Configurable Logic Block) the basic element of
`the Xilinx FPGA. The reader must refer to (Fagin &
`Watt 1992) for a comparison between supercomputer,
`ASIC2 and FPGA based hardware.
`
`The Experimental Platform
`
`The experimental platform is constitued by a DEC
`workstation and a co—processing card attached to it.
`This card contains 22 Xilinx FPGAs and 4 MBytcs of
`SRAM. 16 FPGAS are connected in a QD array with 10—
`cal interconnection and memory buses. Therefore 5120
`3 binary functions, equivalent to 1 bit arithmetic and
`logic unit, can be evaluated‘ at each cycle. in the ar-
`ray. The maximum bandwidth between this array and
`the memory is 320 MBytes/s with an access time from
`SRAM to FPGAs of Wm. The hardware is connected
`
`to a Decstation 5000/2-10 by a bus TurboC-hannel at
`100 MBytes/s.
`On this UNIX workstation the design is developed
`in C++ and translated through Xilinx tools in a bit-
`stream configuration ready to be loaded on the FP-
`GAs. The bitstream configuration can be considered
`as a unique nanoinstruction of 1.4 Mcgabits width that
`loads itselfin less than 50 ms on the card.(Touati 1992')
`
`The implementation on a Reconfigurable
`Hardware
`
`Let us now examine how the detection of a given motif
`is translated into the hardware. The key point is to fit
`a Binary Substitution Vector with a lookup table. Then
`
`2Application Specific integrated Circuit
`316 x 320 : 16 FPGA, 320 CLBs per FPGA
`
`272
`
`ISM 3—94
`
`a 4x1 (20):!) bit lookup table implements a nucleotide
`[amino acid) BSV. This is due to the number of bits
`necessary to code the nucleotide or amino acid alpha—
`bet. As we have already pointed out one GLB contains
`two 16x1 bit lookup tables. One CLB can implement
`two nucleotide BSV or one amino acid as" by bringing
`together each of their lookup tables of 16 bits. A motif
`is made up of a certain number of RSV therefore a mo-
`tif detector is made up of several CLBs whose outputs
`are combined by an AND gate. See figure 2.
`In fact.
`many optimisations can be made in order to increase
`either the speed or the number of motif detectors. But
`we don't intend to bother the reader with too many
`electronics details of the implementation.
`The data base is injected nucleotide by nucleotide
`sequentially and is pipelincd in the FPGA in such a
`way that a new word can be presented at each cycle
`to the motif detector. The pipeline can be seen as a
`window that moves along the sequence. Once a motif
`has been recognized. a signal is sent to the best. station.
`The number associated with the motifand its position
`in the data base are the relevant informations sent to
`the host station.
`
`The whole design can be considered as a sieve in
`which each hole fits a motif. The number of motifs that
`
`can be looked for simultaneously on the card is 512. In
`the case of amino acids the motifs are up to 8 BSV longl
`whereas the nucleotide motifs are up to 16 35v long.
`The maximum Speed in this case is constraint by the
`FPGA’s intrinsic performance and is around 50ns per
`cycle.
`One must point out that the motifs are coded in hard
`in the FPGA design. This is not a. handicap since only
`the lookup tables must be changed. From afunctional
`point of view this comes to breaking up the whole mo-
`tif detector into elementary functions. For a given set
`of motifs that need to be matched. the only changes
`involved are those of the functions that code for the
`
`substitution vectors. The overall function decompo—
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 272
`Petitioner Microsoft Corporation - EX. 1042, p. 272
`
`

`

`sition is not changed. This can be done by writing
`straight into the bitstream configuration of the FP—
`GAs as the motifs are generated. The figure 3 presents
`the two phases of this process as it can be seen by the
`end user.
`
`If the search does not require more than 256 mo-
`tifs the design can run at the highest speed supported
`by the 1/0. A new character from the GSDE can be
`injected every 40 us into it. A flow of 25 million nu—
`cleotides per second is achieved. The parallelism in
`this case is massive, 4k comparisons per cycle which
`represents 100 Giga. operations—1. To compare these
`results with a sequential machine, a program obtain-
`ing exactly the same results has been implemented on
`a DEC station. The most favorable situation for the
`sequential machine has been considered, that is the
`comparison where there is no matching at any posi-
`tion. This reduces the parallelism really useful to 250
`comparaisons per cycles. Eight instructions are car-
`ried out on the workstation to make a comparison and
`2000 instructions are necessary to execute what the
`hardware does in one cycle. Therefore, under the opti—
`mistic assumption of one instruction per cycle, a min-
`imum speed up of 2000 times is reached.
`In fact, on
`the basis of realistic data (ie not only mismatches) the
`speed of the reconfigurable hardware is more than 5000
`times that of the workstation.
`
`
`
`Figure 3: The two phases of a data base scanning
`
`Related Study and Discussion
`
`We present our dedicated hardware and software as
`well as four other specialized systems. Two were re-
`alized with full custom A310 and the two others with
`FPGA. For a fair comparison the reader has to take
`into account that the SPLASH 2 project uses F PGA
`of a more recent technology than the 3 others and than
`our hardware.
`In fact, we cannot compare these dif-
`ferent systems because each of them uses a particular
`algorithm and there is no common quality benchmark
`available for these 4 systems. The four systems imple—
`ment different algorithm with the same goal; search—
`
`ing for a sequence in a data base.
`
`I BISP : The BISP system consists of a Motorola
`68020, SRAM and EEPROM memories and 48 BISP
`chips in the full configuration. The BISP chips inte—
`grate 400000 transistors, 75% is implemented with
`full-custom logic, and runs at 12.5 Mhs for an 1 pm
`CMOS process. It implements 16 cells of systolic ar-
`ray conceived for dynamic programing. (Chow ct al.
`1991)
`
`I BioSCAN : The BioSCAN chips. 650000 transis-
`tors, integrates 800 cells of a systolic array. It runs
`at 32 Mhz but needs 16 cycles to execute one step
`of the systolic algorithm. A 1.2 pm CMOS process
`is used for the design technology. Only one chip is
`required for the scanning. (White et at 1991)
`
`I SPLASH l : is a ring of 32 X03090 interconnected
`with a hoststation through two FIFOs. 744 cells are
`implemented in a one dimensional mesh that runs at
`1 Mhz. (Gokhale et at. 1991)
`I SPLASH 2 : This new version of SPLASH looks
`like an SIMD supercomputer and is based on the
`new generation of FPGA from Xilinx. The perfor-
`mance indicated in the table 1 are for one board; 17
`FPGAs interconnected to their nearest neighbour in
`a ring fashion or through a crossbar. The algorithms
`are the same as SPLASH l; a kind of dynamic pro-
`grammation. (Hoang 1993)
`
`In order to take into account these differences we
`
`compared the five systems on the same basis. That is
`the time it. would take each system to search for a 500
`nucleotide DNA sequence in a data base. In table 1 one
`can see the number of chips required to implement the
`heart of the algorithms, the scanning speed of a data
`base in mega nucleotides per seconds, and the speedup
`over a workstation claimed by the authors.
`From this table it stands out that the reconfigurable
`hardware concept associated with our generic motif
`reaches and even over—takes the performances of a cus-
`tom ASIC. Moreover the comparison between [1ng
`and SPLASH shmvs that our implementation makes a
`better deal with the trade off between area efficiency
`and speed.
`
`Let us now examine the performances of FLASH in
`terms of scanning speed. Califano and Rigoutsos (Cal-
`ifano 35 Rigoutsos 1993) report they require 24 seconds
`to match a 100 nucleotides DNA sequence against the
`Genbank. For a similar Search with BLAST they report
`a time of 5 minutes. Our system does the same job in
`only 4 seconds.
`To compare our result to FLASH we have to take
`into consideration another parameter, the additional
`cost of a dedicated hardware to the cost of a worksta-
`tion versus the cost of a. workstation alone. However
`
`FLASH compiles the data base in a huge lookup table.
`For example in Genbank, 106 nucleotides are compiled
`
`Lemoine
`
`273
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 273
`Petitioner Microsoft Corporation - EX. 1042, p. 273
`
`

`

`
`
`_Systemname
`
`N b chips
`required
`
`Scanning
`speed
`
`Speed up
`over a. WS
`
`
`
`32 XCSOQO
`
`l
`
`10000 (vi)
`
`A2 R2
`250 motifs
`
`A2 R2
`500 motifs
`
`16 xcaeso
`
`5mm [vi]
`
`16 XC3090
`
`20M one
`
`.3“
`
`i) SparcStation I: 12.5 Mips. ii] Sun 4/310: 8 Mips.
`iii) Sparc lofsoox: so Mips vi) DECstation 5000,1240: 4t}
`Mips
`
`Table 1: Performances of Systems Dedicated to Molec-
`ular Genetics
`
`in a 9 Gigabytes lookup table that does not fit in the
`main memory of a workstation. The user of FLASH
`then needs an extra 9 Gigabytes of external storage
`and would require even more in the future. This is
`because the lookup table for the whole human genome
`will require around 0.5 ’I‘erabytes of disk storage. The
`price of that external storage is higher than the price
`for our hardware (equivalent to the price of a worksta~
`tion). Moreover the performances ofintegrated circuits
`continue to improve dramatically and the performance
`of the 1/0 subsystem has not kept pace. (Patt 1994}
`
`Algorithmic versus Dedicated Hardware One
`question remains : how far can hardware solutions go
`without new advances in the computational complex-
`ity of algorithms '3'
`In (Vuillernin 1993) Jean Vuillemin introduced the
`main equations one should bear in mind.
`
`P = G x F where G is the number of gates which
`one can effectively fit. within a unit area by the
`end of the year y, F is the frequency at which one
`can reliably operate such gates and P the comput—
`ing power of the resultant system. By the end of
`the next technology year yr : y + l, the comput—
`ing power increases according to G" : 03 G and
`F' = or! F. The last 25 years the overall-increase
`ing factor of the computing power has remained
`equal to two and by choosing t1 : (is, = (1;. a good
`approximation, we find 0' = {3/5.
`
`Let us now apply the above equations to :13}?2 and
`its implementation. The FPGA used are in 1.2;“?! 1990
`CMOS technology and the board was realized and lin—
`ished in 1992. On this experimental platform A”?
`operates at 25Mhz and uses 5000 time. So by the year
`
`2’74
`
`lSMB—94
`
`2000 with a 1998 FPGA technology we will implement
`A2182 on a or?” x G x 2.105 CLBs and operate at
`as x F m 150 Mhz.
`
`But all of the hardwares will not completely take
`advantage of this technology increase in performances.
`Jean Vuillemin demonstrated that reconfigurable hard—
`ware based system effectively grow by a factor two each
`year whereas general purpose microprocessors grow
`only by a factor of two each three years. This is because
`the. peak performance of a microprocessor is ultimately
`limitated by the frequency of the clock no matter the
`amount of gates4 that were used to make the micro—
`processor.
`
`The question is now whether current algorithms can
`exploit the whole hardware’s increase in computing
`power.
`
`If we focus on data base scanning we can see two
`kinds of parallelism (Miller, N adkarni, & Carriero
`1991). The first one is obvious, since the data base is
`composed of a huge amount of independent5 sequences
`we can divide the initial data base in many smaller data
`bases and search independantly in each of these data
`base. The other kind of parallelism can be found in the
`pattern matching algorithm itself. AER2 is an exam-
`ple of highly parallelized pattern matching algorithm.
`If we add the flexibility of reconfigurable hardware we
`can balance between two ways of using the increase in
`gates by the year 2000 :
`- Increasin
`only the number of motifs scanned :
`250 X 0:1 x 10000 motifs at a scanning speed of
`150Mnuc.s_1.
`
`I Increasing only the scanning speed of the data base
`for the actual numbers of motifs [250). Therefore the
`data base is split in ale m 40 data bases resulting
`in a 40 times 150 l't’lnuc.‘1 (ie 0 GIGAnucs—1 or
`scanning the entire human genome in half a second)
`To conclude, under the asumption that no break-
`through in integrate circuit technology will take place,
`a. compu ting power improvement of two orders of mag
`nitudc must be gained by the year 2000, thanks to the
`reconfigurable hardware. Whereas only an optimistic
`'5 times improvement should be gained by general pur-
`pose workstation in the saute time.
`
`Conclusion and Future Work
`
`We demonstrate that the couple formed by A2112 and
`reconfigurable hard ware system achieves very high per—
`formances. 1ndeed, a genomic Sequence data base of 1
`billion nucleotides is scanned in less than a minute.
`
`The following statements summarize the advantages
`of both 212R2 and reconfigurable system:
`I Efficiency, due to the massive parallelism achiev-
`able without any restriction on the quality of the
`results produced.
`
`the number of gates only control the ratio (average 3'
`peak) performances.
`5for the pattern matching algorithm
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 274
`Petitioner Microsoft Corporation - EX. 1042, p. 274
`
`

`

`I Generality, because the motifs structures include
`patterns previously used by FAST, BLAST and FLASH
`algorithms.
`
`I Simplicity, thanks to clear semantics, the motif can
`be generated by hand or automatically by a pro—
`gram.
`
`I Scalability, the increase by a factor two each three
`year in speed and a factor of four each three years
`in density of the new generation of FPGAs gives one
`matter to think about the future use of reconfig-
`urable hardware.
`
`I Flexibility, the motif detector can be seen as a.
`building block for more sophisticated pattern match-
`ing algorithms.
`
`One restriction remains, the quality of the result is
`directly connected to the motifs given to :1sz and
`therefore brought back to the user the responsability
`of the quality of the search.
`We think that the Speed sustained by our systems
`opens a new way for high level genomic analysis com-
`putation. Indeed, Artificial Intelligence algorithms be-
`cause of their nature need a huge amount of computing
`power. Usually the main drawback of such algorithms
`is the latency of data. base request. With A2 R2 we offer
`a solution to this problem as well as the possibility of
`exhaustive exploration of the current and future data
`bases. Moreover, the acceleration of the time consum-
`ing subroutine of pattern matching methods allows a
`better control of the result. Statistical methods, like
`Monte Carlo, can be used to validate knowledges ob-
`tained by learning methods in AI systems.
`We are now concentrating our efforts on speeding
`up the main part of AI algorithms produced by our
`team. The first candidate is the Galois lattice algo-
`rithms used in the LEGAL system. (Mephu N guifo Sc
`Sallantin 1992) And also implement in hardware qual-
`ity analyse of results produces by :4ng and other al—
`gorithms.
`
`Acknowledgements
`
`A great thanks to P. Bertin, C. Boegner, P. Boucard,
`II. Touati and J. Vuillemin for helping us along the
`way, and to the anonymous referees for their valuable
`suggestions.
`
`References
`
`Bertin, P.; Boneln, D.; and Vuillemin, J. 1989. Introduc-
`tion to programmable active memories. In Systolic Array
`Processors. J. McCanny et a]. 300—309.
`Califano, A., and Rigoutsos, I. 1993. FLASH: A fast look-
`up algorithm for string homology.
`In Pmcc. of the First
`Int. Conf. on Intelligent Systems for Molecular Biology.
`56—64. AAAI.
`
`Chow, E. T.; Hunkapiller, T.; Peterson, .1. Cu, and Zim-
`merman, B. A. 1991. Biological information signal proces-
`sor. In Proceedings of the Int. Conf. on Aplication Specific
`Array Processor, 144—160.
`
`Fagin, B., and Watt, .1. G. 1992. FPGA and rapid proto-
`typing technology use in a. special purpose computer for
`molecular genetics.
`In Proee. of the Int. Conf. on Com-
`puter Design, 496—501. IEEE.
`1985. Align—
`Feng, D., Johnson, M.; and Doolittle, R.
`ing amino acid sequences: comparison of commonly used
`methods. J. of Molecular Evolution 21:112—125.
`Gokhale, M.; Holmes, W.; Kasper, 21.; Lucas, 8.; Minnich,
`R.; Sweely, D.; and Lopresti, D.
`1991. Building and
`using a highly parallel programmable logic array.
`IEEE
`Computer 24(1]:81—89.
`Gotoh, 0. 1982. An improved algorithm for matching
`biological sequences. J. of Molecular Biology 162:705—708.
`Gracy, J.; Chiche, L.; and Sallantin, J. 1992. A modular
`learning environnment for protein modeling. In Procc. of
`the First Int. Conf. on Intelligent Systems for Molecular
`Biology, 145—153. AAAI.
`Gray, J. P.. and Kean, T. A. 1989. Configurable hardware:
`A new paradigm for computation.
`In Proceedings of the
`1989 Deeenniol Cultech Con}. C. L. Seits. 279—295.
`Hoang, D. T. 1993. Searching genetic database on splash
`2. In Proc. of the lVorkshop on FPGAJ as Custom Com-
`puting Machines, 185—191. IEEE.
`Lemoine, E. 1993. Reconfigurable hardware for molecular
`biology computing systems.
`In Procc. of Int. Conf. on
`Application-Specific Array Processors, 184—187.
`Mephu Nguifo, E., and Sallantin,
`.l. 1992. Prediction of
`primate splice junction gene sequences with a cooperative
`knowledge acquisition system. In Procc. of the Firsl Int.
`Conf. on Intelligent Systems for Molecular Biology. 292—
`300. AAAI.
`
`Miller, P. L.; Nadkarni, P. M.; and Carriero, N. M. 1991.
`Parallel computation and PASTA: confronting the prob-
`lem of parallel database search for a fast sequence com-
`parison algorithm. CABIOS T(5):71-—73.
`Needlmann, S., and Wunsch, C. 1970. A general method
`applicable to the search for sirnillarities in the amino acid
`sequence of two proteins. J. of Molecular Biology 48:443—
`453.
`
`Patt, Y. N. 1994. The I/O subsystem : a candidate for
`improvements. IEEE Computer 27(3):]5—16.
`Schaifner. M.
`1978. Processing by data and program
`blocks.
`IEEE Transactions on Computers 27(11):“115—
`1028.
`
`Sellers, D. 197-1. 0n the theory an computation of evo-
`lutionnary distances. SLAJI. J. Appl. Math. 26(4):?87—
`793.
`
`Touati. H. 1992. PerlelDC: a C++ library for the simu-
`lation and generation of DECPeRLe-l designs. Technical
`Report TN4, Paris Research Laboratory.
`Vuillemin. J. E. 1993. On computing power. Lecture Notes
`on Computer Sciences 782:69—86.
`Watermann, M. 1984. General methods of sequence com—
`parison. Bull. .Moth. Biol. 46:473-500.
`White, C. T.; Singh, R. K.; Reintjes, P. B.; Lampe, J.; Er-
`ickson, B. W.; Dettloif, W. D.; Chi, V. L.; and Altschnl,
`S. F. 1991. BioSCAN: A VLSI—based system for biase—
`quence analysis. In Proceedings of the IEEE Int. Conf. on
`Computer Design.
`
`Imoine
`
`275
`
`Petitioner Microsoft Corporation - Ex. 1042, p. 275
`Petitioner Microsoft Corporation - EX. 1042, p. 275
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket