(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY(PCT)
`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`4 April 2002 (04.04.2002)
`
`
`
`PCT
`
`(10) International Publication Number
`WO 02/27638 Al
`
`(51)
`
`International Patent Classification’:
`
`GO06F 19/00
`
`(21) International Application Number:
`
`=PCT/NO01/00394
`
`(74) Agent: MIDTTUN,Gisle; Bryns Zacco as, P.O. Box 765,
`Sentrum, N-0106 Oslo (NO).
`
`(22) International Filing Date:
`27 September2001 (27.09.2001)
`
`(81) Designated States (national): AE, AG, AL, AM, AT, AU,
`AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
`CZ, DE, DK, DM, DZ, EC, EL, US, FI, GB, GD, GE, GH,
`GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
`LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, Mw,
`MX, MZ, NO, NZ, PH, PL, PT, RO, RU, SD, SE, SG,SI,
`SK, SL, ‘TJ, 1M, TR, VT, TZ, UA, UG, US, UZ, VN, YU,
`(30) Priority Data:
`ZA, ZW.
`20004869 28 September 2000 (28.09.2000)=NO
`
`
`(25) Filing Language:
`
`(26) Publication Language:
`
`English
`
`English
`
`(71) Applicant (for all designated States except US): SEE-
`BERG, Erling, Christen [NO/NO]; Borgestadveien 25B,
`N-0875 Oslo (NO).
`
`(71)
`(72)
`
`Applicant and
`Inventor; ROGNES, Torbjern [NO/NO]; Motzfeldts
`gate 16, N-0187 Oslo (NO).
`
`(84) Designated States (regional): ARIPO patent (GH, GM,
`KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, 7W), Eurasian
`patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
`patent (AT, BE, CH, CY, DE, DK,ES, FI, FR, GB, GR,IE,
`TT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
`CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD,
`TG).
`
`[Continued on next page]
`
`(54) Title: DETERMINATION OF OPTIMAL LOCAL SEQUENCE ALIGNMENTSIMILARITY SCORE
`
`query sequence.
`
`
`
`
`
`
` aouanbaseseqeyep
`
`Sequence alignment and sequence database
`(57) Abstract;
`similarity searching are among the most important and challeng-
`ing task in bio informatics, and are used for several purposes,
`including protein function prediction. An efficient parallelisation
`of the Smith-Waterman sequence alignment algorithm using
`parallel processing in the form of SIMD (Single-Instruction,
`Multiple-Data) technology is presented. The method has been
`implementation using the MMX (MultiMedia eXtensions) and
`SSE (Streaming SIMD Extensions) technology that is embedded
`in Intel’s latest microprocessors, but
`the method can also be
`implemented using similar technology cxisting in other modern
`microprocessors. Near eight-fold speed-up relative to the fastest
`previously an optimised eight-way parallel processing approach
`achicved know non-parallel Smith-Watcrman implementation on
`the same hardware. A speed of about 200 million cell updates per
`second has been obtained on a single Intel Pentium IT 500MHz
`microprocessor.
`
`query sequence
`
` aouanbes
`
`
`
`
`
`aseqejep
`
`
`WO02/27638Al
`
`

`

`WO 02/27638 Ad
`
`—_IMMIIMTIAVTITTAT TITIAN TAMIA
`
`Declaration under Rule 4.17:
`
`of inventorship (Rule 4.17(iv)) for US only
`
`Published:
`
`with international search report
`
`For two-letter codes and other abbreviations, refer to the "Guid-
`ance Notes on Codes andAbbreviations" appearing at the begin-
`ning ofeach regular issue ofthe PCT Gazette.
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`Determination of optimal local sequence alignment similarity score
`
`Field of invention
`
`The present invention relates to the comparison of biological sequences and, more
`specifically, the invention relates to a method, a computer readable device and an
`electronic device for determination of optimal local sequence similarity score in
`accordance with claims 1,8 and 16, respectively.
`
`Backgroundof invention
`
`The rapidly increasing amounts of genetic sequence information available represent a
`constant challenge to developers of hardware and software database searching and
`handling. The size of the GenBank/EMBL/DDBJnucleotide database is now doubling
`at least every 15 months (Benson et al. 2000). The rapid expansion of the genetic
`sequence information is probably exceeding the growth in computing poweravailable at
`a constant cost, in spite of the fact that computing resources also have been increasing
`exponentially for many years. If this trend continues, increasingly longer time or
`increasingly more expensive computers will be needed to search the entire database.
`Searching databases for sequences similar to a given sequenceis one of the most
`fundamental and important tools for predicting structural and functional properties of
`uncharacterised proteins. The availability of good tools for performing these searchesis
`hence important. When looking for sequences in a database similar to a given query
`sequence, the search programs compute an alignment score for every sequencein the
`database. This score represents the degree of similarity between the query and database
`sequence. The score is calculated from the alignment of the two sequences, and is based
`on a substitution score matrix and a gap penalty function. A dynamic programming
`algorithm for computing the optimal local alignment score wasfirst described by Smith
`and Waterman (1981), improved by Gotoh (1982) for linear gap penalty functions, and
`optimised by Green (1993).
`Database searches using the optimal algorithm are unfortunately quite slow on
`ordinary computers, so many heuristic alternatives have been developed, such as
`FASTA (Pearson and Lipman, 1988) and BLAST(Altschul et al., 1990; Altschulet al.,
`1997). These methods have reduced the running time by a factor of up to 40 compared
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`to the best-known Smith-Waterman implementation on non-parallel general-purpose
`computers, however, at the expense of sensitivity. Because of the loss of sensitivity,
`somedistantly related sequences might not be detected in a search using the heuristic
`algorithms.
`Due to the demand for both fast and sensitive searches, much effort has been
`made to produce fast implementations of the Smith-Waterman method. Several special-
`purpose hardware solutions have been developed withparallel processing capabilities
`(Hughey, 1996), such as Paracel’s GeneMatcher, Compugen’s Bioccelerator and
`TimeLogic’s DeCypher. These machines are able to process more than 2 000 million
`matrix cells per second, and can be expanded to reach much higher speeds. However,
`such machinesare very expensive and cannot readily be exploited by ordinary users.
`Some hardware implementations of the Smith-Waterman algorithm are described in
`patent publications, for instance US patents 5553272, 5632041, 5706498, 5964860 and
`6112288.
`
`A more general form of parallel processing capability is available using Single-
`Instruction Multiple-Data (SIMD) technology. A SIMD computeris able to perform the
`same operation (logical, arithmetic or other) on several independent data sources in
`parallel. It is possible to exploit this by dividing wide registers into smaller units in the
`form of micro parallelism (also known as SIMD withina register - SWAR). However,
`modern microprocessors have added special registers and instructions to make the
`SIMD technologyeasier to use. With the introduction of the Pentium MMX
`(MultiMedia eXtensions) microprocessor in 1997, Intel made computing with SIMD
`technology available in a general-purpose microprocessor in the most widely used.
`computer architecture — the industry standard PC. The technologyis also available in
`the Pentium II and has been extended in the Pentium HI under the name of SSE
`
`(Streaming SIMD Extensions) (Intel, 1999). Further extension of this technology has
`been announcedfor the Pentium 4 processor (also known as Willamette) under the
`name SSE2 (Streaming SIMD extensions 2) (Intel 2000). The MMX/SSE/SSE2
`instruction sets include arithmetic (add, subtract, multiply, min, max, average,
`compare), logical (and, or, xor, not) and other instructions (shift, pack, unpack) that may
`operate on integer or floating-point numbers. This technologyis primarily designed for
`speeding up digital signal processing applicationslike sound, images and video, but
`seemssuitable also for genetic sequence comparisons. Several other microprocessors
`with SIMD technology are or will be made available in the near future, as shown in
`table 1 (Dubey, 1998).
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`Table 1. Examples of microprocessors with SEMD technology
`
`
`Manufacturer
`AMD
`
`Microprocessor
`K6/K6-2/K6-IIT
`
`Nameof technology
`MMX / 3DNow!
`
`Compag (Digital)
`Hewlett Packard (HP)
`HP / Intel
`Intel
`
`Athlon / Duron
`
`Extended MMX / 3DNow!
`
`MVI (Motion Video Instruction)
`Alpha
`MAX(-2) (Multimedia Acceleration eXtensions)
`PA-RISC
`SSE (Streaming SIMD Extensions) ?
`Itanium (Merced)
`Pentium MMX/IT MMX (MultiMedia eXtensions)
`Pentium II
`SSE (Streaming SIMD Extensions)
`Pentium 4
`SSE2 (Streaming SIMD Extensions 2)
`
`Motorola
`
`PowerPC G4
`
`Velocity Engine (AltiVec)
`
`SGI
`
`Sun
`
`MIPS
`
`SPARC
`
`MDMX (MIPSDigital Media eXtensions
`
`VIS (Visual Instruction Set)
`
`10
`
`1s
`
`20
`
`Several investigators have used SIMD technology to speed up the Smith-
`Waterman algorithm, but the increase in speed relative to the best non-parallel
`implementations have been limited.
`The general dynamic programming algorithm for optimal local alignment score
`computation wasinitially described by Smith and Waterman (1981).
`Gotoh (1982) described an implementation of this algorithm with affined gap
`penalties, where the gap penalty for a gap of size k is equal to q+rk, where q is the gap
`open penalty and ris the gap extension penalty. Undertheserestrictions the running
`time of the algorithm wasreducedto be proportional to the product of the lengths of the
`
`two sequences.
`Green (1993) wrote the SWAT program and applied some optimisations to the
`algorithm of Gotoh to achieve.a speed-up of a factor of about tworelative to a
`straightforward implementation. The SWAT-optimisations have also been incorporated
`into the SSEARCHprogram of Pearson (1991).
`The Smith-Waterman algorithm has been implemented for several different
`SIMD computers. Sturrock and Collins (1993) implemented the Smith-Waterman
`algorithm for the MasPar family of parallel computers, in a program called MPsrch.
`This solution achieved a speed of up to 130 million matrix cells per second on a MasPar
`MP-1 computer with 4096 CPUsand up to 1 500 million matrix cells per second on a
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`MasPar MP-2 with 16384 CPUs. Brutlag et al. (1993) also implemented the Smith-
`Waterman algorithm on the MasPar computers in a program called BLAZE.
`Alpern et al. (1995) presented several ways to speed up the Smith-Waterman
`algorithm including a parallel implementation utilising micro parallelism by dividing
`the 64-bit wide Z-buffer registers of the Intel Paragon i860 processors into 4 parts. With
`this approach they could compare the query sequence with four different database
`sequences simultaneously. They achieved more than a fivefold speedup over a
`conventional implementation.
`Wozniak (1997) presented a way to implement the Smith-Watermanalgorithm
`using the VIS (Visual Instruction Set) technology of Sun UltraSPARC microprocessors.
`This implementation reached a speed of over 18 million matrix cells per second on a
`167MHz UltraSPARC microprocessor. According to Wozniak (1997), this represents a
`speedup of a factor of about 2 relative to the same algorithm implemented with integer
`instructions on the same machine.
`
`Taylor (1998 and 1999) applied the MMX technologyto the Smith-Waterman
`algorithm and achieved a speed of 6.6 million cell updates per second on an Intel
`Pentium II 500 MHz microprocessor.
`Sturrock and Collins (2000) have implemented the Smith-Waterman algorithm
`using SIMD on Alpha microprocessors. Howeverno details of their method has been
`published. As far as we can see from the web service available at
`hittp://www.ebi.ac.uk/MPsrch/, they have achieved a speed of about 53 million cell
`updates per second using affine gap penalties. It is unknown exactly what computerthis
`system is running on.
`Recently, Barton et al. (2000) employed MMX technology to speed up their
`SCANPS implementation of the Smith-Waterman algorithm. They claim a speed of 71
`million cell updates per second on a Intel Pentium HI 650 MHz microprocessor. Only a
`poster abstract without any details of their implementation is currently available.
`
`Disclosure of the Invention
`
`It is described an efficient parallelisation related to a method for computing the optimal
`local sequence alignmentscore of two sequences. Increased speed ofthe overall
`computation is achieved by performing several operationsin parallel.
`
`The invention enables Smith-Waterman based searches to be performed at a much
`higher speed than previously possible on general-purpose computers.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`Using the MMX and SSEtechnologyin an Intel Pentium III 500MHz
`microprocessor, a speed of about 200 million cell updates per second was achieved
`when comparing a protein with the sequencesin a protein database. As far as known,
`this is so far the fastest implementation of the Smith-Waterman algorithm on a single-
`microprocessor general-purpose computer. Relative to the commonly used SSEARCH
`program, which is consider as a reference implementation, it represents a speedup of
`about eight. It is believed that an implementation of the present invention on the
`forthcoming Intel Pentium 4 processor running at 1.4 GHz will obtain a speed of more
`than 1000 million cell updates per second. These speeds approach or equal the speed of
`expensive dedicated hardware solutions for performing essentially the same
`calculations.
`
`Using the present invention, high speed is achieved on commonlyavailable and
`inexpensive hardware, thus significantly reducing the cost and/or computation time of
`performing sequence alignment and database searching using an optimal local
`alignment dynamic programming approach.
`
`The present invention is based on the principle that all calculations are performed using
`vectors that are parallel to the query sequence (“query-based vectors’’) (see figure 2b).
`This is in contrast to the traditional approach in which the calculations are performed
`using vectors that are parallel to the minor diagonal in the matrix (“diagonal vectors’’)
`(see figure 2a). In both cases, the vectors contain two or more elements that represent
`different cells in the alignment matrix. The values in the elements of the vectors
`represent score values, e.g. the h-, e- or f-variables in the recurrence relations described
`later.
`
`The advantage ofthe traditional approach is that the calculations of the
`individual elements of the vectors are completely independent(see figure 3a). Hence the
`use of this approach may seem obvious. The traditional approach is described in detail
`by Hughey (1996), Wozniak (1997) and Taylor (1998, 1999), and seems to be almost
`universal to all parallel implementations of the Smith-Waterman alignment procedure
`that are known in detail. However, there are several disadvantages with this approach,
`most notably the complexity of forming the vector of substitution score values. This
`problem is especially noticeable at high degrees ofparallelism. The query-based vector
`approach has not been described earlier, in spite of numerous attempts to parallelise the
`Smith-Waterman algorithm.It is believed that the present invention is sufficiently novel
`and different from the current “state of the art”.
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`The disadvantage of the query-based vector approach is the dependence between the
`individual vector elements in some of the calculations (see figure 3b). However, by
`exploiting a vector generalisation of the principles used in the SWAT-optimisations,
`these dependencies will only affect a small fraction of the calculations, and hence not
`have a major impact on the performance. The advantage of the query-based approachis
`the greatly simplified loading of the vector of substitution score values from memory
`when using a query sequenceprofile (also known as a query-specific score matrix).
`
`The present invention method uses a query profile, which is a matrix of scores for all
`combination of query positions and possible sequence symbols. The scores are arranged
`in memory in such a manner that a vector representing the scores for matching a single
`database sequence symbol with a consecutive range of query sequence symbols can
`easily be loaded from memory into the microprocessorusing a single instruction. This is
`achieved by first storing the scores for matching the first possible database sequence
`symbol with each of the symbols of the query sequence, followed by the scores for
`matching the second possible database sequence symbol with each query position, and
`so on.
`
`The scores in the query profile are all biased by a fixed amount(e.g. 4, in case of the
`BLOSUM62 matrix) so that all values become non-negative, and stored as an unsigned
`integer. This allows subsequent calculations to be performed using unsigned arithmetic.
`The constantbias is later subiracted using an unsigned integer subtraction operation.
`
`Vector elements are represented by few bits (e.g. 8), and may thus represent only a
`narrow range of scores. When the total vector size is limited to a specific numberofbits
`(e.g. 64), narrow vector elements allows the vector to be divided into more elements
`(e.g. 8) than if wider elements were used. This allows more concurrent calculations to
`take place, further increasing speed.
`
`In order to make the narrow score range useful even in cases where the scores are larger
`than what can be represented by a vector element, we employsaturated arithmetic to
`detect overflow in the score calculations. If overflow is detected, the entire alignment
`score is subsequently recomputed using an implementation with a wider score range,
`e.g. 16 bits. Because such high scoresarerelatively rare, the performance impactis
`small.
`
`15
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`Score computations are performed using unsigned values. Unsigned score values allows
`the widest possible score range. This reduces the number of recomputations necessary
`as described above.
`
`The computations are performed using saturated unsigned arithmetic. This allows easy
`clipping of negative results of subtractions at zero, an operation that is frequently
`performedin the calculations.
`
`The method, the computer readable device and the electronic device have their
`respective characteristic features as stated in the attached claims 1, 8 and 16
`respectively, and further embodiments thereof are disclosed in claims 2-7, 9-15 and 17-
`22, respectively.
`
`Brief Descriptions of Drawings
`
`Figure 1 illustrates computational dependencies in the Smith-Waterman alignment
`matrix, the arrows indicate dependencies between matrix cells that are involved in the
`computations of the h-, e- and f-values in each cell of the alignment matrix.
`
`20
`
`25
`
`30
`
`35
`
`Figure 2 illustrates vector arrangements in SIMDimplementations of the Smith-
`Waterman algorithm, each column represent one symbolof the database sequence, and
`each row represents on symbol of the query sequence, where
`
`a)
`
`represent traditional approach with vectors parallel to the minor diagonal in the
`matrix, and
`
`b)
`
`represents the novel approach with vectors parallel to the query sequence.
`
`Figure 3 illustrates vector calculation dependencies, the arrows indicates dependencies
`between matrix cells that are involved in the computationsofthe h-, e- and f-values in
`the elements of the vector using (a) the traditional and (b) the novel approach.
`
`Figure 4a illustrates pseudo-code for the new approach, the pseudo-code being just a
`detailed example showing how our method may be implemented in a computer
`
`program.
`
`Figure 4b illustrates a flow diagram of the pseudo-codein fig. 4a
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`Table 2: The symbols and phrases used in the pseudo-code.
`
`Phrase / statement / symbol
`
`BYTE a
`
`INTEGER a
`
`BYTE A[m]
`
`value
`
`integer values
`
`BYTE A[m][n]
`
`VECTOR A
`
`Il
`

`
`o
`
`
`
`
`{
`
`is represented by a byte value
`Indicates that A is a two-dimensionalarray (matrix), where
`each element is represented by a byte value
`Indicates that A is a vector of containing eight elements, each
`represented by a byte value
`
`The variable a is assigned the value of expression b
`
`The vector variable A is assigned the value of expression B.
`
`Can be implemented using the MOVQ (move quadword)
`instruction on processors supporting MMX technology.
`
`
`Parenthesises indicated expressions that take precedence in the
`order of computation.
`
`
`L
`
`The FUNCTION and RETURN statements indicate the
`FUNCTIONa(b,c)
`
`
`beginning and ending of a function a taking b and c as
`parameters and returning the resulting value c.
`
`|FORi=aTObDO
`
`
`This FOR-statement indicates that the statements enclosed by
`
`
`the brackets should be repeated (b-a+1) times, with a loop
`statements
`index variable i, taking consecutive integer values a., at+1, at+2,
`
`
`
`}
`..» b-1, b.
`
`
`IF expression THEN
`This IF-statement indicates that the if-statements shall be
`
`{
`executed if the expression is true, and that the else-statements
`shall be executed if the expression is false.
`
`
`{
`
`
`else-statements
`
`Brackets enclosing the statements of a for-loop, if-statements
`
`and else-statements
`
`An expression representing the sum of the values of a and b
`
`(A+B)-C
`
`RETURN c
`
`if-statements
`
`$ E
`
`LSE
`
`
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`An expression representing the value of b subtracted from the
`value ofa
`
`
`
`
`
`Indicates scalar integer multiplication
`
`
`
`Indicates scalar integer division
`
`A scalar expression representing the largest of its arguments
`
`
`
`A vector operation taking A and B as arguments and
` representing a new vector where each element is equal to the
`
` pair wise sum of the corresponding elements of vector A and B.
`
` The computations are performed using unsigned saturated
`
` arithmetic, meaning that if any resulting elementis larger than
`
` the highest representative number, that element is replaced by
`
` the largest representative integer.
`
`
`cj = min(aj + b;, 255)
`Can be implemented using the PADDUSB(packedaddition of
`
`unsigned saturated bytes) instruction on processors supporting
`
`MMxX technology.
`
`A vector operation taking A and B as arguments and resulting
` In a new vector where each element is equal to the pair wise
`
` difference between the corresponding elements of vector A and
` B, when subtracting each value of B from the value of A.
`
` The computations are performed using unsigned saturated
`
` arithmetic, meaning that if the difference is negative, zero
`
` replaces the result.
`
` cj = max(aj — bj, 0)
`
`
`Can be implemented using the PSUBUSB (packed subtract of
`
`unsignedsaturated bytes) instruction on processors supporting
`MMX technology.
`
`A vector operation taking A and B as arguments and
` representing a new vector where each elementis equal to the
`
` larger of two corresponding elements of A and B.
`
` cj = max(aj, bi)
`
` Can be implemented using the PMAXUB (packed maximum
` unsigned byte) instruction processors supporting SSE
`technology, or using the PSUBUSBinstruction followed by the
`PADDUSB(packed subtract/add unsigned saturated bytes)
`instruction on
`
`processors supporting only MMX technology.
`
`
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`10
`
`A SHIFT b
`
`
`
`
`AORB
`
`Afi]
`
`SHIFTis a vector operation taking the vector A andthe scalar b
`as arguments and resulting in a new vector where the elements
`of vector A is shifted a numberofpositions. Element i of vector
`A is placed at position it+b in the new vector. The remaining
`
`elements of the resulting vector are assigned a value ofzero.
`
`Thus:
`
`IF ( i>=b) AND (i<b+x))
`
`{ cj =ajy }
`
`ELSE
`
`{ G=0 }
`
`The sign of b hence indicates the direction of the shift, while
`
`the magnitude of b indicates the numberofpositions the
`
`elements shall be shifted.
`
`On Intel Pentiumprocessors and otherlittle-endian
`
`microprocessors, a positive b value correspondsto a left shift,
`
`while a negative b-value correspondsto a right shift. On big-
`
`endian microprocessors the shift direction is reversed.
`
`Can be implemented using the PSLLQ and PSRLQ (packed
`shift left/right logical quadword) instructions on processors ©
`
`
`
` supporting MMX technology.
`
`Bit wise OR of all bits in vector A and B.
`
`
`
`
`Used in combination with the SHIFT-operation to combine
`elements from two vectors into a new vector.
`
`An expression representing element number1 of vectoror array
`A
`
`position b to c of the array A.
`
`An expression representing a vector consisting of elements with
`values a, b, c, d, e, f, g, and h.
`
`Preferred Embodiment of the Invention
`
`5
`
`The preferred embodimentwill be described with reference to the drawings. For each
`sequence in the database, a sequence similarity-searching program computes an
`alignment score that represent the degree of similarity between the query sequence and
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`11
`
`the database sequence. By also taking the length and composition of the query and
`database sequences into account, a statistical parameter can be computed and used to
`rank the database sequencesin order of similarity to the query sequence. The raw
`alignment score is based on a substitution score matrix, (e.g. BLOSUM62), representing
`the similarity between two symbols, and an affined gap penalty function based on a gap
`open and a gap extension penalty.
`In the following description vectors of 8 elements are used with 8 bits each,
`totalling 64 bits. However, this is only for the ease of description. Many other
`combinations of vector and element sizes are possible and can easily be generalised
`from the description below. Any numberof vector elements larger than one, and any
`numberof bits for each element is possible. However, an element size of 8 bits, and a
`vector size of 8, 16 or 32 elements are probably the most useful.
`In the following description the methodis illustrated with a possible
`implementation using Intel’s microprocessors and their MMX and SSE technology.
`However, this is just for the ease of description. An implementation using other
`microprocessors and other SIMD technologyis also possible.
`
`The Smith-Waterman algorithm
`
`To compute the optimal local alignment score, the dynamic programmingalgorithm by
`Smith and Waterman (1981), as enhanced by Gotoh (1982), is used. Given a query
`sequence A of length m, a database sequence B of length m, a substitution score matrix
`Z, a gap open penalty g and a gap extension penalty r, the optimal local alignment score
`¢ can be computed by the following recursion relations:
`
`e,, = maxie,jo h,,pit qj-r
`
`fy = mifo, ohia,, -a}—1marr[iLBLibe,,.f,,.0}
`t="max, }
`
`Here, e,, and f,, represent the maximum local alignment score involving thefirst 7
`symbols ofA and the first7 symbols ofB, and ending with a gap in sequence B or A,
`respectively. The overall maximum local alignment score involving the first i symbols
`ofA andthe firstj symbols ofB, is represented by h,,. The recursions should be
`calculated with 7 going from 1 to m andj from 1 to 1, starting with @,,= Fiz =h,; =0
`for all i= 0 or j =0. The order of computation of the values in the alignment matrix is
`
`20
`
`25
`
`30
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`12
`
`strict because the value of any cell cannot be computed before the valueofall cells to
`the left and above it has been computed, as shownby the data interdependence graph in
`figure 1. An implementation of the algorithm as described by Gotoh (1982) has a
`running time proportional to mn.
`
`Parallelisation
`
`The Smith-Waterman algorithm can be madeparallel on twoscales. It is fairly easy to
`distribute the processing of each of the database sequences on a numberof independent
`processors in a symmetric multiprocessing (SMP) machine. On a lowerscale, however,
`distributing the work involved within a single database sequenceis a bit more
`complicated. Figure 1 showsthe data interdependencein the alignment matrix. The final
`value, h, of any cell in the matrix cannot be computed before the value ofall cells to the
`left and aboveit has been computed. But the calculations of the values of diagonally
`arranged cells parallel to the minor diagonal (see figure 2a and 3a) are independent and
`can be done simultaneously in a parallel implementation. This fact has been utilised in
`earlier SIMD implementations (Hughey, 1996; Wozniak, 1997).
`
`The inventive approach
`
`The main features of the implementation accordingto the invention, are:
`
`e Vectors parallel to the query sequence
`e Vector generalisation of the SWAT-optimisations
`e
`8-way parallel processing with 8-bit values
`e Query sequence profiles
`e Unsigned arithmetics
`
`e
`
`Saturated arithmetics
`
`e General code optimisations
`
`These concepts are described in detail below.In orderto illustrate and exemplify the
`description, pseudo-code for the present method is shown in figure 4.
`
`In the pseudo-code, the method accordingto the invention through figures 4a and 4b,is
`illustrated using vectors of 8 elements, howeverthe present invented method is general
`and can be implemented with vectors of any numberof elements.
`
`10
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`13
`
`The pseudo-code assumesthat the query sequence length (m) is a multiple of the vector
`size, 8. This can be achieved by padding the query sequence and query score profile.
`—
`
`All vector indices start at zero as is usual in programming languages (notone,as is
`usual in ordinary mathematics notation).
`
`Uppercaseletters are generally used to represent vector or array variables or operations,
`while lowercase letters are generally used to represent scalar variables or operations.
`
`The S-matrix is an x times n query-specific score matrix representing the score for
`substituting any of the x different possible database sequence symbols with the query
`symbolat any of the n query positions. In general, x just represents the size of the
`alphabet from which the sequence symbols belong to. For amino acid sequences, x is
`typically 20 (representing the 20 natural aminoacids) or slightly larger (to include also
`ambiguous and other symbols). For nucleotide sequences, x is typically 4 (representing
`the 4 nucleotides adenine, cytosine, guanine and thymine/uracil), or larger (to include
`also ambiguous and other symbols). The S-matrix is usually precomputed from a query
`sequence and a substitution score matrix, but may also represent a general queryprofile,
`which may be based on scores calculated from a multiple alignmentof a protein
`sequence family.
`
`The H-vectors holds the h-values, representing the optimal local alignment score for a
`given matrix cell. The H-vectors represent a part of the HH-array. The E-vector holds
`the e-values, representing scores from alignments involving a database sequence gap.
`The E-vectors represent a part of the EE-array. The F-vector holds the f-values, and
`temporary f-values, representing scores from alignments involving a query sequence
`
`gap.
`
`In the LASTFvector, only a single element (at index 0) is used. It represents the
`potential score from previous rounds involving a gap in the query sequence. The actual
`value used has the gap opening penalty added,in order to simplify later calculations.
`
`Vectors parallel to the query sequence
`
`Despite the loss of independence between the computations of each of the vector
`elements, it was decided to use vectors of cells parallel to the query sequence (as shown
`
`10
`
`20
`
`25
`
`30
`
`35
`
`

`

`WO 02/27638
`
`PCT/NO01/00394
`
`14
`
`in figure 2b and 3b), instead of vectors ofcells parallel to the minor diagonal in the
`matrix (as shown in figure 2a and 3a). The advantage of this approach is the much-
`simplified and faster loading of the vector of substitution scores from memory. The
`disadvantage is that data dependencies within the vector related to the f-value
`computations (see figure 3b) must be handled. Eight cells are processed simultaneously
`along each column as indicated in figure 2b. Vectors are used to represent the h-, e- and
`f-values in the recurrence relations in groups of eight consecutive cells parallel to the
`query sequence. Vectors are also used to represent zero, q, r, qtr and other constants.
`Using vector processing, the value of eight h-, e- or f-values may be computed in
`parallel (the h, e and f as described in the equations on page 11).
`
`Vector generalisation of the SWAT-optimisations
`
`Asalready indicated, we have to take into account that each elementin the vectoris
`dependent on the element aboveit, as shown in figure 3b, because of the possible
`introduction of gaps in the query sequence. We employ a vector generalisation of the
`SWAT-optimisations and other optimisations to make the principle of query-parallel
`vectors efficient. The conceptis illustrated in the p

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.

We are unable to display this document.

PTO Denying Access

Refresh this Document
Go to the Docket