throbber
PROTEIN SEQUENCE
`AND STRUCTURE
`COMPARISON ON
`MASSIVELY
`PARALLEL COMPUTERS
`
`Robert Jones
`
`THINKING MACHINES CORPORATION
`CAMBRIDGE, MASSACHUSETTS 02142
`
`Summary
`Sequence comparison has become a standard tool in
`the analysis of newly determined protein sequences.
`As the database of known sequence grows not only
`does the cost of database searching increase but so
`too does the demand for that service. These factors
`conflict directly with the desire to use the most sensi-
`tive methods available. The use of massively parallel
`computers for database searching provides a solution
`to this problem and is helping in the development of
`new methods for both sequence and structure com-
`parison.
`
`138
`
`Introduction
`At the heart of computational molecular biology lie
`methods for comparing macromolecules. By compar-
`ing the sequences of regions of DNA one can locate
`potential genes and their regulatory signals. By com-
`paring the sequences of proteins, the function of a
`novel sequence can be suggested and domains that are
`important for structure and function can be discovered.
`The comparison of the three-dimensional structures of
`proteins can help identify common folding domains
`and may lead to improved methods of structure
`prediction.
`The power of these comparative methods is a direct
`result of the actions of natural selection. All the proteins
`that we observe today have evolved from ancestral
`forms, with natural selection favoring those most suited
`for the present environment. In the regions of a protein
`that perform a valuable function, evolution is tightly
`constrained with respect to the mutations that are ac-
`ceptable. Less important parts of the molecule are less
`strongly constrained, and related proteins are likely to
`exhibit diversity in these regions. Proteins that perform
`the same function but are from distantly related species
`often contain clearly delimited domains of conserved
`sequence that are a strong indication of functional
`importance.
`A striking example of the power of the conserva-
`tion of sequence domains between distantly related pro-
`teins, and of the power of sequence comparison in es-
`tablishing such relationships, involves the gene involved
`in the disease cystic fibrosis. At the time of its isolation
`the function of this gene was unknown. After the DNA
`sequence was determined, the translated protein se-
`quence was compared with the database of all known
`proteins. Striking similarities were found to a large and
`diverse family of proteins involved in solute transport
`across cell membranes (Riordan et al., 1989). Figure 1
`shows part of the alignment of the protein sequence
`with three bacterial proteins. On the basis of sequence
`similarity it was proposed that the human protein was
`involved in membrane transport and contained a bind-
`ing site for ATP. Further work has revealed that the
`protein is involved in chloride ion transport, and that
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 138
`
`

`

`knowledge is being used to develop therapies with
`which to treat the disease.
`
`Sequence Comparison
`Figure 1 also serves to illustrate one of the difficulties of
`sequence comparison in that the task is not simply one
`of string matching. Amino acids resemble each other to
`varying degrees according to the physical and chemical
`properties of their side chains. Based on some similarity
`metric that includes this information, sequence compar-
`ison algorithms must find the best alignment between
`two sequences, matching up positions to achieve the
`best alignment score and introducing gaps into the se-
`quences if this would improve the score. This task can
`be described as a problem in optimization, for which the
`most sensitive method available is based on the tech-
`nique of dynamic programming.
`The most versatile algorithm that has emerged in
`this area is that of Smith and Waterman (1981) which
`identifies the most conserved segments within a pair of
`sequences. If the two sequences are closely related, then
`these regions are likely to cover the entire length of the
`molecules. If they are only distantly related, then the
`algorithm will identify the best region within otherwise
`very different sequences. When one wants to find dis-
`tant relationships between proteins it is exactly this be-
`havior that is desired. There are many variants of the
`basic method that identify the best N suboptimal align-
`ments and produce alignments of multiple sequences
`(Taylor, 1987; Waterman and Eggert, 1987).
`For practical purposes the sensitivity and versatility
`of dynamic programming must be viewed in the con-
`text of its relatively high computational cost. The com-
`parison of two sequences can be accomplished by any
`workstation, but the thorough study of a sequence re-
`quires that it be compared with all other known se-
`quences. Release 67 of the GenBank DNA sequence
`database contains 43,903 sequences, representing
`55,169,276 nucleotides. Even with fast serial machines a
`database search using dynamic programming will take
`several hours. This problem is destined to get worse, as
`the size of the database is growing exponentially, as
`shown in Figure 2. With the initiation of projects to map
`
`Fir. 1. Sequence
`conservation between
`the protein involved in
`cystic fibrosis and three
`bacterial transport
`proteins By convention the 20
`possible amino acids are represented
`as characters of the alphabet Amino
`acids resemble each other to varying
`degrees as a result of the physical
`and chemical properties of their side
`chains. The most conserved positions
`in the alignment are shown in
`boldface.
`
`Fig. 2. Growth of the
`GenBank DNA sequence
`database
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 139
`
`

`

`and sequence a number of genomes, most notably the
`Human Genome Project, this pattern of growth is likely
`to be sustained.
`Several approaches have been followed to improve
`performance while attempting to retain the sensitivity
`of the original method. Software based on the search
`technique of hashing (FASTP) is widely used, and re-
`cently a faster method that employs a discrete state fi-
`nite automaton search (BLAST) has been developed
`(Pearson and Lipman, 1988; Altschul et al., 1990). Both
`approaches have contributed greatly to the field but fail
`to achieve the best possible sensitivity and may miss im-
`portant similarities between distantly related proteins.
`Very high performance may be possible with cus-
`tom VLSI devices that embody a specific algorithm,
`such as those commonly found in graphics worksta-
`tions. Special-purpose hardware for sequence compar-
`ison has been designed and is under active develop-
`ment (Lipton and Lopresti, 1985; T. Hunkapiller, M. S.
`Waterman, R. Jones, J. Peterson, and E. Chow, unpub-
`lished results). Such devices are inevitably less flexible,
`in terms of the underlying algorithm, than a piece of
`software that may be changed at will to reflect improve-
`ments in the method. Here is where general-purpose
`supercomputers can play a major role, achieving high
`performance but retaining the flexibility of software.
`
`Sequence Comparison on Massively
`Parallel Computers
`As described above, the method of choice for sequence
`comparison is based on dynamic programming (Smith
`and Waterman, 1981). In this algorithm, for two se-
`quences of length M and N, a matrix of M-by-N ele-
`ments must be computed using the recurrence relation
`shown in Figure 3. The term s(aj,bj) is the similarity
`between amino acid i of sequence a and amino acid j of
`sequence b. The term w is the cost of inserting a gap into
`one or the other of the sequences. The recurrence re-
`lation lays down the order in which elements of the
`matrix must be computed, in that the values for ele-
`ments (i - 1, j - 1), (i, j - 1), and (i - 1, J) must al-
`ready be known before element (ij) can be evaluated.
`The score of the best subalignment is simply the max-
`imum score H;,j over the entire matrix (Fig. 4). The
`
`alignment itself is found by tracing back from the loca-
`tion of that maximum until the score falls to zero.
`On massively parallel architectures the dependency
`that is inherent in the recurrence relation can be ex-
`ploited to allow many elements to be computed in par-
`allel. As shown in Figure 5, all elements that lie on an
`antidiagonal can be computed in parallel provided all
`those that lie above and to the left have already been
`computed. This is exploited in the implementation of
`this algorithm on the Connection Machine CM-2, a
`massively parallel computer with a SIMD architecture
`(Thinking Machines Corporation, 1991). In this partic-
`ular application the processors of the machine are con-
`figured as a one-dimensional array that computes an
`entire antidiagonal at a time. To compute the entire
`matrix of M-by-N elements an array of M processors
`must compute M + N - 1 consecutive antidiagonals.
`In the context of the matrix one can visualize the array
`of processors sweeping across the matrix from left to
`right. Each processor is responsible for computing one
`row of the matrix, as shown in Figure 6 (Lander,
`Mesirov, and Taylor, 1988; Jones et al., 1990).
`An alternative implementation of the algorithm on
`the AMT Distributed Array Processor (DAP) computes
`one row of the matrix in parallel, as opposed to an
`antidiagonal (Collins, Coulson, and Lyall, 1987; Collins
`and Reddaway, 1990). This has twice the processor uti-
`lization of the antidiagonal form and therefore achieves
`higher performance. Its disadvantage lies in its require-
`ment for a restricted form of gap penalty, in which a
`single cost is charged for a gap regardless of length.
`The antidiagonal form allows the cost to rise as the
`length of the gap increases. In particular, it permits
`the most general case of affine gap penalties that ap-
`pear to mimic most accurately the distribution of gaps
`in well-studied alignments.
`On a 65,536 processor Connection Machine CM-2,
`the &dquo;antidiagonal&dquo; implementation of the algorithm can
`compute approximately 100 million matrix entries per
`second whereas the &dquo;row&dquo; implementation can achieve
`approximately 150 million matrix entries per second.
`The difference in processor utilization between the two
`approaches and the type of communications operations
`involved account for the performance figures. Nicholas
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 140
`
`

`

`and associates (1991) have used a Connection Machine
`CM-2 to compute the scoring matrices and a CRAY
`Y-MP, linked to the CM-2 via a high bandwidth com-
`munications channel, to perform the tracebacks and
`alignments.
`A very different approach is possible with coarse-
`grain parallel architectures such as the Intel iPSC/860
`and the Connection Machine CM-5. Each node has a
`considerable amount of local memory and can run a
`totally separate program from what its neighbors are
`running. This has been exploited for database search-
`ing by simply placing a different section of the database
`in the memory of every node. A separate comparison is
`performed in each node, with the results being merged
`on completion of the task. This approach involves little
`change to existing serial codes and hardly any interpro-
`cessor communications. It has yielded performance of
`12 million matrix entries per second on a 32-node Intel
`iPSC/860 (Deshpande, Richards, and Pearson, 1991)
`and 33 million matrix entries per second on a 32-node
`CM-5 (R. Jones, unpublished results).
`
`Sequence Comparison with
`Multiple Alignments
`There are numerous examples in which biologists have
`demonstrated a close structural or functional relation-
`ship between two proteins but the sequence compari-
`son, as described above, has failed to detect any signif-
`icant similarities. With further biological knowledge
`about the proteins involved, one might be able to bias
`the alignment process in some way such that the simi-
`larity was revealed. In the general case, however, no
`such information is available.
`Some information of this form is generated auto-
`matically whenever two related protein sequences are
`aligned. From examination of the alignment it is clear
`that some positions are conserved and some differ.
`Given that the proteins have been subject to evolution,
`it is reasonable to assume that the conserved positions
`are more important to structure and function than the
`variable ones. These positions should receive a higher
`weight in the alignment process. Multiple sequence
`alignments may be used in comparisons with individual
`sequences using a method known as &dquo;profile analysis&dquo;
`
`Fig. 3. The recurrence
`relation used in the
`Smith-Waterman
`algorithm M.i,}) is the score in
`element (il), s(aM is the similarity
`between two amino acids, and w is
`the cost of introducing a gap into the
`alignment
`
`Fig. 4. Calculation of
`the scoring matrix
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 141
`
`

`

`(Gribskov, MacLachlan, and Eisenberg, 1987). Implicit
`in this approach is that conserved positions receive a
`higher weight and play a greater role in the alignment
`process.
`The sequence comparison software on the CM-2
`has been adapted to use alignments in this manner. Eric
`Lander and the author have used this approach to re-
`peatedly search the sequence databases using a multiple
`sequence alignment (R. Jones, E. Lander, unpublished
`results). On every cycle the best matching database se-
`quence is added to the multiple alignment. This ap-
`proach shows promise in extending the sensitivity of
`database searches in those cases where an initial align-
`ment of at least two sequences can be established. Be-
`cause it involves repeated database searches this ap-
`proach is not feasible for serial computers. This high-
`lights the role of supercomputers in advancing not only
`the performance of existing software but also the
`sensitivity of these methods through more extensive
`computation.
`
`Protein Structure Comparison
`The determination of the full three-dimensional struc-
`ture of a protein at atomic resolution provides biologists
`with a wealth of information. Such detailed knowledge,
`however, requires a great deal of work, and as a result
`there are approximately 600 structures known com-
`pared to approximately 30,000 protein sequences. Even
`with dramatic technological advances this disparity is
`not likely to change much in the near future. The ability
`to predict structure directly from sequence has there-
`fore become a major research goal for many groups,
`and the success of this effort would revolutionize the
`field of molecular biology. The basic approach of struc-
`ture prediction is to examine known structures, identify
`conserved structural motifs, and attempt to find se-
`quence patterns that correlate with them. The compar-
`ison of three-dimensional structures is fundamental to
`these efforts, but until recently such comparisons have
`been performed using simple techniques that lack sen-
`sitivity and are not suited for database searching.
`Most of these methods rely on the rotation and
`translation of one structure to minimize the root mean
`square difference between atomic coordinates (Mat-
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 142
`
`Fig. 5. All elements of
`a given antidiagonal can
`be computed in parallel
`
`Fig. 6. On each
`iteration an entire
`antidiagonal is
`computed in parallel
`
`

`

`hews and Rossman, 1985). Extensions that performed
`this on limited regions of the two structures allowed
`weaker matches to be found, but these lack the ability to
`introduce gaps into the alignment (Rossman and Argos,
`1976; Remington and Matthews, 1978). In the context
`of two structures a gap may represent two loops of dif-
`ferent length such that one or more alpha carbon atoms
`have no equivalent atoms in the other structure.
`The algorithms of sequence comparison have sev-
`eral properties that are desirable in the alignment of
`structures. They involve a measure of how similar two
`positions are, and they include a penalty for introduc-
`ing gaps into an alignment. On the basis of these pa-
`rameters the algorithms will produce the best subalign-
`ment of the two sequences, Sequence comparison uses a
`measure of the physical and chemical similarity of
`amino acids to identify equivalent positions in the two
`sequences. Within a three-dimensional structure each
`amino acid is represented by a number of atoms. This is
`simplified by considering only the a-carbon atom of
`each amino acid that contributes to the peptide back-
`bone. To compare structures in the same manner as
`sequences it is necessary to develop a concise measure
`of the structural similarity of two a-carbon atoms. An
`elegant similarity measure of this form has been devel-
`oped by Taylor and Orengo (1989a, 1989b). Other
`groups have developed related approaches based in
`part on sequence comparison algorithms (Zuker and
`Somorjai, 1989; Sati and Blundell, 1990).
`The approach of Taylor and Orengo is to &dquo;view&dquo;
`the rest of the protein from the local frame of reference
`of a single- a-carbon atom. This view can be represented
`as a set of vectors through space to all the other a-car-
`bon3, as shown in Figure 7. To compare two a-carbon
`äWffi8, one then compares two arrays of the interatomic
`vectors. These arrays may themselves be thought of as
`two sequences of three element vectors, and one can
`apply conventional sequence comparison algorithms to
`find the highest scoring alignment of the two arrays. At
`this level, the similarity measure used is the reciprocal
`of the difference between two vectors. The score be-
`tween two atoms is the sum of this measure over the
`best matching region of the vector arrays.
`Clearly this procedure involves a large amount of
`
`&dquo;On every cycle the best matching
`database sequence is added to the
`multiple alignment This approach
`shows promise in extending the
`sensitivity of database searches in
`those cases where an initial align-
`ment of at least two sequences can
`be established Because it involves
`repeated database searches this
`approach is not feasible for serial
`computers.-
`
`Fig. 7. The structural
`environment of an atom
`is described by the set
`of vectors between it
`and all other atoms in
`the same protein
`
`143
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 143
`
`

`

`-Often the more distant relationships
`provide the most interesting biologi-
`cal insight but it is precisely these
`that may be missed by an approxi-
`mate method. The performance of
`massively parallel computers makes
`the use of our most sensitive meth-
`ods feasible once again.&dquo;
`
`Fig. 8. The most similar
`substructure between
`the proteases of virus
`HIV1 and bread mold
`
`Fig. 9 The sequence
`segments that
`correspond to the
`structural alignment This
`includes the conserved sequence
`DTG that plays an important role in
`the active sites of aspartic proteases.
`
`computation. Not only do we compute a &dquo;top-level&dquo;
`structure comparison matrix to produce the final align-
`ment, but for every element (2*, 1) in that matrix we com-
`pute an additional comparison matrix of the same di-
`mensions. As the top-level matrix is of order N2, the
`overall procedure becomes a substantial task of order
`N4. This effectively precludes searching structural da-
`tabases on serial machines. Approaches that speed up
`this computation by avoiding substantial parts of the
`computation have been developed, but these inevitably
`sacrifice sensitivity to some degree (Orengo and Taylor,
`1990).
`By using the massively parallel implementations of
`sequence comparison described above, Nomi Harris
`and the author have been able to achieve performance
`with the full algorithm that is of order N3 on the CM-2
`(R. Jones and N. Harris, unpublished results). As with
`sequence comparison, the use of a massively parallel
`machine enables high performance while retaining use
`of the most sensitive method available. Computing the
`similarity between all pairs of a-carbon atoms takes
`most of the time. With appropriate layout of the data on
`the CM-2, we are able to compute the similarity between
`a single a-carbon atom in one protein and all atoms in
`the other protein in parallel. Once all the pairwise sim-
`ilarities have been calculated, it becomes a simple matter
`to perform the top-level comparison and generate the
`alignment.
`The output of the program is in the form of a se-
`quence alignment that indicates which a-carbon atoms
`in the two structures are equivalent. For purposes of
`visualization the alignment is used to perform a least
`squares fit of one structure onto the other. Although
`the rotations and translations are applied to the entire
`protein, only atoms that are included in the alignment
`are used in the evaluation of the least squared fit. In
`addition, the similarity between individual pairs of at-
`oms is used to weight the more similar positions more
`heavily. Although this form of display becomes confus-
`ing for alignments that contain large gaps, the ap-
`proach serves very well in most cases. Figure 8 shows an
`alignment of two protease structures from the virus
`HIV 1 and from common bread mold (R/Mze~y chinen-
`sis). The most similar substructure is outlined here in
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 144
`
`

`

`boldface. Figure 9 shows the sequences of the two sub-
`structures. The most similar substructure is found to
`contain the active sites of both enzymes, including the
`conserved sequence DTG. Examples like this provide
`strong support for the basic tenet of comparative meth-
`ods, namely that functionally important domains within
`proteins are conserved through evolution.
`As described above, sequence comparison tools are
`available for a variety of purposes. We are currently
`developing an analogous set of tools for structure com-
`parison that will permit database searching and multi-
`ple alignment. A major use for these tools will be to
`identify conserved structural motifs in a more rigorous
`manner than has yet been possible and to study rela-
`tionships between their sequence and structure.
`
`Conclusions
`A variety of sensitive sequence comparison tools exist
`on serial computers for studying small numbers of se-
`quences, but many of these become impractical when
`the task is scaled up to involve thousands of sequences.
`The development of fast approximate methods for se-
`rial machines allows the tasks to be achieved, but at the
`cost of some sensitivity. Often the more distant relation-
`ships provide the most interesting biological insight, but
`it is precisely these that may be missed by an approxi-
`mate method. The performance of massively parallel
`computers makes the use of our most sensitive methods
`feasible once again. Given the exponential increase in
`the size of sequence databases it is critical that compu-
`tational biology should exploit the leading edge of com-
`puter hardware and software in order to keep pace with
`the data.
`In particular, over the next few years a need will
`arise for a substantial coordinated effort to support da-
`tabases of the relationships between sequences in addi-
`tion to the sequences themselves. New sequences will be
`entering the databases at the rate of perhaps thousands
`each day, and every one will have to be compared to
`existing sequence families to determine its relation-
`ships. These tasks pose substantial challenges both in
`terms of the algorithms we use for comparisons and in
`terms of the computing infrastructure we develop in
`support of molecular biologists.
`
`Perhaps the most exciting prospect for the next few
`years is that the distinction between sequence and struc-
`ture as two separate fields of study is beginning to blur.
`The new algorithms for structure comparison have
`united the methodology of the two areas, and their use
`should assist in the accurate prediction of structure
`from sequence.
`
`ACKNOWLEDGMENT
`
`Connection Machine
`Model CM-2 and CM-5
`are registered trademarks
`of Thinking Machines
`Corporation. Distributed
`Array Processor (DAP) is
`a registered trademark of
`Active Memory Technolo-
`gies. Intel iPSC/860 is a
`registered trademark of
`Intel Supercomputing
`Systems Division. Gen-
`Bank is a registered
`trademark of the U.S.
`Department of Health
`and Human Services.
`
`BIOGRAPHY
`
`Robert Jones received his
`B.A. from Pembroke Col-
`lege, Oxford, in 1981 and
`his D.Phil. from the Uni-
`versity of Sussex in 1985.
`He has held postdoctoral
`research associate posi-
`tions in the Department
`of Molecular Genetics and
`Cell Biology at the Uni-
`versity of Chicago and in
`the Department of Math-
`ematics at the University
`of Southern California.
`He is currently a scientist
`with Thinking Machines
`Corporation in Cam-
`bridge, Massachusetts,
`where he works on a
`number of problems in
`computational molecular
`biology.
`
`SUBJECT AREA
`EDITOR
`
`Dan Sulzbach
`
`REFERENCES
`
`Altschul, S. F., Gish, W.,
`Miller, W., Myers, E. W.,
`and Lipman, D. J. 1990.
`Basic local alignment
`search tool. J. Mol. Biol.
`215:403-410.
`Collins, J. F., Coulson,
`A. W. F., and Lyall, A.
`1987. Protein and nucleic
`acid sequence database
`searching: a suitable case
`for parallel processing.
`Computer J. 30:420-424.
`Collins, J. F., and
`Reddaway, S. F. 1990.
`High efficiency database
`searching: use of the dis-
`tributed array processor.
`In Computers and DNA.
`Reading, Massachusetts:
`Addison-Wesley, pp. 85-
`91.
`Deshpande, A. S.,
`Richards, D. S., and
`Pearson, W. R. 1991. A
`platform for biological
`sequence comparison on
`parallel computers. Com-
`puter Applications in the
`Biological Sciences 7:237-
`247.
`Gribskov, M.,
`MacLachlan, A. D., and
`Eisenberg, D. 1987. Pro-
`file analysis: detection of
`distantly related proteins.
`Proc. Natl. Acad. Sci. USA
`84:4355-4358.
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 145
`
`

`

`Jones, R., Taylor, W.,
`Zhang, X., Mesirov, J. P.,
`and Lander, E. 1990. Pro-
`tein sequence comparison
`on the Connection Ma-
`chine CM-2. In Computers
`and DNA. Reading,
`Massachusetts: Addison-
`Wesley, pp. 99-107.
`Lander, E., Mesirov, J.,
`and Taylor, W. 1988. Pro-
`tein sequence comparison on
`a data parallel computer.
`1988 Conference on
`Parallel Processing.
`Philadelphia: Penn. State
`Press.
`Lipton, R.J., and
`Lopresti, D. 1985. A sys-
`tolic array for rapid string
`comparison, edited by H.
`Fuchs. Rockville,
`Maryland: Computer
`Science Press.
`
`Matthews, B. M., and
`Rossman, M. G. 1985.
`Comparison of protein
`structures. Meth. Enzymol.
`115:397-420.
`
`Nicholas, H., Giras, G.,
`Hartonas-Garmhausen,
`V., Kopko, M., Maher, C.,
`
`and Ropelwski, A. 1991.
`Distributing the compari-
`son of DNA and protein
`sequences across hetero-
`geneous supercomputers.
`In Proceedings of Supercom-
`puting ’91. New York: As-
`sociation of Computing
`Machinery, pp. 139-146.
`Orengo, C. A., and
`Taylor, W. R. 1990. A
`rapid method of protein
`structure alignment. J.
`Theoret. Biol. 147:517-551.
`Pearson, W. R., and
`Lipman, D. 1988. Im-
`proved tools for biological
`sequence comparison. Sci-
`ence 85:2444-2448.
`
`Remington, S. J., and
`Matthews, B. W. 1978. A
`general method to assess
`similarity in protein struc-
`tures, with applications to
`T4 bacteriophage
`lysozyme. Proc. Natl. Acad.
`Sci. USA 75:2180-2184.
`Riordan, J. R., Rommens,
`J. M., Kerem, B., Alon,
`N., Rozmahel, R.,
`Grzelczak, Z., Zielenski, J.,
`Lok, S., Plavsic, N., Chou,
`
`J-L., Drumm, M. L.,
`Iannuzzi, M. C., Collins,
`F. S., and Tsui, L-C.
`1989. Identification of the
`cystic fibrosis gene: clon-
`ing and characterization
`of complementary DNA.
`Science 245:1066-1073.
`Rossman, M. G., and
`Argos, P. 1976. Exploring
`structural homology of
`proteins. J. Mol. Biol.
`105:75-96.
`Sali, A., and Blundell,
`T. L. 1990. Definition of
`the general topological
`equivalence in protein
`structures—a procedure
`involving comparison of
`properties and relation-
`ships through simulated
`annealing and dynamic-
`programming. J. Mol.
`Biol. 212:403.
`
`Smith, T. F., and
`Waterman, M. S. 1981.
`Identification of common
`molecular subsequences.
`J. Mol. Biol. 147:195-197.
`Taylor, W. R. 1987. Mul-
`tiple sequence alignment
`by a pairwise algorithm.
`
`Computer Applications in the
`Biological Sciences 3:81-87.
`Taylor, W. R., and
`Orengo, C. A. 1989a. A
`holistic approach to pro-
`tein structure alignment.
`Protein Engineering 2:505-
`519.
`Taylor, W. R., and
`Orengo, C. A. 1989b.
`Protein structure align-
`ment. J. Mol. Biol. 208:1-
`22.
`Thinking Machines Cor-
`poration. 1991. Connec-
`tion Machine Model
`CM-2 Technical Sum-
`mary. Cambridge,
`Massachusetts: Thinking
`Machines Corporation.
`Waterman, M. S., and
`Eggert, M. E. 1987. A
`new algorithm for best
`subsequence alignments
`with application to tRNA-
`rRNA comparisons. J.
`Mol. Biol. 197:723-728.
`Zuker, M., and Somorjai,
`R. L. 1989. The align-
`ment of protein structures
`in three dimensions. Bull.
`Math. Biol. 51:55.
`
`Petitioner Microsoft Corporation - Ex. 1040, p. 146
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket