`Computational genetics: finding protein function by
`nonhomology methods
`Edward M Marcotte
`
During the past year, computational methods have been
developed that use the rapidly accumulating genomic data to
discover protein function. The methods rely on properties shared
by functionally related proteins other than sequence or structural
similarity. Instead, these ‘nonhomology’ methods analyze
patterns such as domain fusion, conserved gene position and
gene co-inheritance and coexpression to identify protein–protein
relationships. The methods can identify functions for proteins
that are without characterized homologs and have been applied
to genome-wide predictions of protein function.
`
`Addresses
`Molecular Biology Institute, UCLA-DOE Laboratory of Structural
`Biology and Molecular Medicine, University of California Los Angeles,
`PO Box 951570, Los Angeles, CA 90095-1570, USA and Protein
`Pathways Inc., 1145 Gayley Avenue, Ste 304, Los Angeles, CA
`90024, USA; e-mail: marcotte@mbi.ucla.edu
`
`Current Opinion in Structural Biology 2000, 10:359–365
`
`0959-440X/00/$ — see front matter
`© 2000 Elsevier Science Ltd. All rights reserved.
`
`Abbreviations
COGS  clusters of orthologs
EST   expressed sequence tag
`
`Introduction
`Biologists are in a delightful quandary. Thousands of
`potential genes are being discovered in the various
`genome sequencing projects, including those encoding
`many new families of proteins. Often, these proteins are
`evolutionarily conserved, but are of unknown function.
`This poses a fundamental problem to biologists: how can
`we discover the functions of these thousands of unknown
`proteins quickly and efficiently? Even more ambitious
`than knowing their specific biochemical functions, can we
`discover their broader functions — the cellular context,
`such as pathways and complexes, in which they operate?
`
`As difficult as this goal is, significant progress has been
`made in the past year both experimentally, by conducting
`genome-wide experiments measuring, for example,
`mRNA expression [1] or biochemical activity [2•], and
`computationally, by developing new analyses that work on
`fundamentally different principles from homology- or
`structure-based methods.
`
`This in silico progress stemmed from the realization that
`genomes contain considerable information about the func-
`tions of and relationships between genes and proteins.
`This functional information is encoded in forms such as
`patterns of gene fusion, conservation of gene position, pat-
`terns of gene co-inheritance and other sorts of
`evolutionary information. Such patterns are revealed by
comparisons of multiple genomes, making these analyses
only recently tractable. Also, additional data, such as gene
`coexpression measurements, provide analogous informa-
`tion within single organisms.
`
`The power of these new methods is that they produce net-
`works of functionally related proteins, even when the
`proteins have never been characterized. Protein function is
`defined by these methods in terms of context, that is,
`which cellular pathways or complexes the protein partici-
`pates in, rather than by suggesting a specific biochemical
`activity. However, in cases in which some of the proteins
`have a known function, their function can be extended to
`the most intimately linked uncharacterized proteins.
`Thus, the methods can be used both to find functional
`relationships and to assign general protein function.
`
`This results in an approach to finding protein function that
`is strikingly different from directly comparing amino acid
`sequences, although sequence comparisons are the basic
`tool used in many of the methods. The functional informa-
`tion discovered also differs from what might be learned
`either from direct sequence comparisons or from structural
`analyses, giving three relatively independent and comple-
`mentary routes to protein function, as shown in Figure 1.
`This review will discuss the main ideas behind nonhomol-
`ogy methods, the newest route to protein function.
`
`Evolution (some homology required)
`Several nonhomology methods take advantage of genetic
`variations among organisms to find protein function. The
`domain fusion method [3•] finds functionally related proteins
`by analyzing patterns of domain fusion. As illustrated in
`Figure 2, proteins found separately in one organism can
`often be found fused into a single polypeptide chain in
`another organism. That the separated proteins have a func-
`tional relationship can be inferred from knowledge of the
`fused protein, named the Rosetta stone protein for its abil-
`ity to reveal the relationship among its component parts.
`
`In many cases, the proteins linked by such a domain fusion
`event may even physically interact, especially in the case
`of protein pairs that have been filtered for false-positive-
`producing ‘promiscuous domains’ [3•] and in cases of
`high-scoring sequence matches to the Rosetta stone pro-
`tein [4•]. An example of this is the two Escherichia coli
`gyrase subunits GyrA and GyrB, which are found as fused
`homologs in yeast topoisomerase II [3•]. Such relationships
`are also common among separated eukaryotic proteins
`found fused in a prokaryotic Rosetta stone protein
`(EM Marcotte, unpublished data).
`
Petitioner Microsoft Corporation - Ex. 1037, p. 359

Figure 1

[Diagram: three routes lead from a protein sequence to FUNCTION.
Homology: BLAST and Smith–Waterman, then trees and clustering.
Nonhomology: domain fusions, phylogenetic profiles, correlated
expression and conserved gene position. Structure: threading and
fold recognition, then structural motifs and evolutionary tracing.]

One can take several computational routes to discovering the function
of a protein. On the left-hand route, the protein sequence is compared
directly with other protein sequences [44,45]. Characterized sequence
homologs or phylogenetic analyses (as in [17–19]) may suggest
functional information. On the right-hand route, the protein sequence
may be tested for compatibility with known three-dimensional protein
structures [46]. Knowledge of the structure may then suggest
functional information (e.g. as in [47,48]). Along the middle route are
nonhomology methods. Sequence and structural homology reveal
proteins of identical or equivalent function, whereas nonhomology
methods identify interacting proteins, proteins with related functions or
proteins operating in the same cellular context. Nonhomology methods
return a network of relationships among proteins functionally linked to
the query protein and function is both defined and inferred by this
network of related proteins.

In its simplest form, this analysis can be implemented [3•] by
searching a large sequence database for homologs of a query
protein A. Hits in this search will include direct homologs of the
query protein (A′) and potential Rosetta stone fusion proteins
(A–B). Each hit is then used as a query to search the genome of A,
and functionally related B proteins will be found in this second
search. Along with B proteins, hits in this search will include A,
homologs of A (A′) and very distant homologs of A (A′′) [5], but
the B proteins can be identified by their lack of homology to A or
by their homology to regions of A–B different from those
homologous to A. Such an analysis recently proved useful in
identifying a functional relationship between CHORD-containing
proteins and Sgt1, proteins important for plant disease signaling
and nematode development [6].
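The two-step search described above can be sketched in a few lines. This is only an illustration under strong assumptions: the homology searches are replaced by a small list of invented, precomputed alignments, and every name here (`Hit`, `overlaps`, `find_linked_proteins`, the protein labels) is hypothetical rather than taken from [3•]. A partner B is reported when it matches a region of a candidate fusion protein that does not overlap the region matched by the query A, and when B does not itself align with A.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str    # protein used as the search query
    subject: str  # database protein it matched
    start: int    # first aligned residue on the subject
    end: int      # last aligned residue on the subject


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits cover overlapping regions of the same subject."""
    return a.subject == b.subject and a.start <= b.end and b.start <= a.end


def find_linked_proteins(query, hits):
    """Return {(partner, rosetta_stone)} pairs linked to `query` by a fusion."""
    # Anything that aligns directly with the query is a homolog (A'), not a partner.
    homologs = {h.subject for h in hits if h.query == query}
    homologs |= {h.query for h in hits if h.subject == query}
    links = set()
    for a_hit in hits:
        if a_hit.query != query:
            continue
        for b_hit in hits:
            same_fusion = b_hit.subject == a_hit.subject
            if not same_fusion or b_hit.query == query:
                continue
            # B must match a different region of A-B and must not resemble A.
            if b_hit.query in homologs or overlaps(a_hit, b_hit):
                continue
            links.add((b_hit.query, a_hit.subject))
    return links


# Toy alignments (invented): A and B each match a different part of AB.
toy_hits = [
    Hit("A", "AB", 1, 300),
    Hit("A", "A_prime", 1, 290),
    Hit("A_prime", "A", 1, 290),
    Hit("A_prime", "AB", 5, 295),
    Hit("B", "AB", 320, 600),
]
linked = find_linked_proteins("A", toy_hits)
```

In this toy set, `linked` contains only the pair of B with the fusion protein AB; the homolog A_prime is excluded because it aligns with A directly. A real implementation would also need the filtering for promiscuous domains mentioned earlier, which this sketch omits.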
`
Figure 2

[Diagram with panels (a) and (b): genes A, B, C, D and F on the
chromosomes of organisms 1, 2 and 3. Labels in the figure: domain
fusion; conserved gene position; correlated mRNA expression (mRNA of
C and D coexpressed in many different cellular conditions); known
metabolic pathway; protein E.]

An example of deriving protein–protein relationships by nonhomology
methods. Genes (labeled white boxes) are shown on the
chromosomes (thick horizontal lines) of three different organisms. (a) It
can be inferred that the proteins encoded by genes A–D are
functionally related through patterns such as the conserved gene
positions of B and C in organisms 1 and 3, the fusion of A and B into
A–B in organism 2 and the coexpression of the mRNAs of C and D in
organism 1. These results can be represented as a network of
functional relationships, as shown in (b). If, for example, the function of
B was unknown, it might be inferred from the functions of proteins A
and C. The computational linkages may be supplemented by any
experimentally observed interactions or known protein–protein
relationships, such as those described in [35,38–40].

In a related fashion, two proteins can be inferred to be
functionally related if their genes are repeatedly found as
neighbors on the chromosomes of different organisms [7•,8,9•], as
shown in Figure 2. This conservation of relative gene position
presumably derives from the organization of prokaryotic genes into
operons in which each protein encoded by the operon performs a
closely related task, such as the proteins of the lactose
system [10] or proteins involved in iron uptake [11]. To find
operons directly would require the identification of promoters and
regulatory elements; however, for operons containing evolutionarily
conserved genes, large portions of the operons can often be
reconstructed automatically simply by identifying pairs of
conserved gene neighbors [9•].
`
`As with domain fusion analysis, this approach can often
`identify interacting proteins [7•]. How well the approach
`will extend to eukaryotic genes remains to be seen, as
`eukaryotes generally lack operons. Examples of function-
`ally related eukaryotic gene neighbors do exist, however,
`such as in the TCL1 locus [12] or the cadherin pro-
`teins [13], so the technique may be useful. The quality of
`the functional relationships identified by this method is
`exceptional, but the coverage is unfortunately low because
`of the dual requirement of identifying orthologs in anoth-
`er genome and then finding those orthologs that are
`adjacent on the chromosome.
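A minimal sketch of the conserved-neighbor idea follows. It assumes that genes have already been mapped to ortholog families, so that each genome is just an ordered list of family labels; it treats chromosomes as linear and ignores strand and intergenic distance. The function name `conserved_neighbor_pairs` and the toy gene orders are invented for illustration and are not the procedure of [7•] or [9•].

```python
from collections import Counter


def conserved_neighbor_pairs(genomes, min_genomes=2):
    """Count gene-family pairs that are adjacent in at least `min_genomes` genomes."""
    counts = Counter()
    for order in genomes.values():
        # Collect each adjacent pair once per genome, ignoring orientation.
        adjacent = {
            frozenset(pair)
            for pair in zip(order, order[1:])
            if pair[0] != pair[1]
        }
        counts.update(adjacent)
    return {
        tuple(sorted(pair)): n
        for pair, n in counts.items()
        if n >= min_genomes
    }


# Toy gene orders (invented), loosely echoing the B/C example of Figure 2.
toy_genomes = {
    "organism_1": ["A", "C", "B", "D"],
    "organism_2": ["AB", "D", "F", "C"],
    "organism_3": ["B", "C", "E", "D"],
}
conserved = conserved_neighbor_pairs(toy_genomes)
```

Here only the B and C families are neighbors in two genomes, so `conserved` holds that single pair with a count of 2. The low coverage noted above shows up directly: a pair is missed whenever either ortholog is absent or the two genes have drifted apart.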
`
`
`A third nonhomology method works on the premise that
`proteins that operate together in the cell are often inherit-
`ed in a correlated fashion [14•]. That this is a reasonable
`assumption follows from the fact that proteins rarely work
`alone and many pathways or complexes are crippled by the
`loss of individual components. Thus, any organism that
`requires the complex or pathway carries the genes for most
`or all of its components; any organism lacking the complex
`or pathway often lacks all of the component genes. The co-
`inherited proteins can be identified in an automated
`fashion by comparing their phylogenetic profiles, strings that
`encode the presence or absence of sequence homologs in
`known genomes, as shown for a few proteins in Figure 3.
`
`Each phylogenetic profile is analogous to an abstract rep-
`resentation of an evolutionary tree; matching phylogenetic
`profiles therefore identifies proteins with similar patterns
`of inheritance. Note that no homology is required among
`the proteins with similar phylogenetic profiles; the pro-
`teins are co-inherited and, when many genomes (n > 10)
are analyzed, usually functionally related, as in the examples
shown in Figure 3. The involvement of the uncharacterized SmpB
protein family in protein synthesis, predicted by phylogenetic
profiles [14•], was recently confirmed by Karzai et al. [15].
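Profile matching can be illustrated with binary strings, one character per genome ('1' for a homolog present, '0' for absent). The sketch below is an assumption-laden simplification: the profiles are invented, the names `hamming` and `linked_by_profile` are hypothetical, and similarity is measured by a plain mismatch count rather than by any statistic used in [14•].

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two equal-length profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_by_profile(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most `max_mismatches` genomes."""
    links = []
    for (name1, p1), (name2, p2) in combinations(sorted(profiles.items()), 2):
        if hamming(p1, p2) <= max_mismatches:
            links.append((name1, name2))
    return links


# Toy profiles over six hypothetical genomes (invented data).
toy_profiles = {
    "P1": "110101",
    "P2": "110101",
    "P3": "011010",
    "P4": "110111",
}
exact_links = linked_by_profile(toy_profiles)
near_links = linked_by_profile(toy_profiles, max_mismatches=1)
```

P1 and P2 share an identical pattern of inheritance and are linked even though nothing in the code compares their sequences to each other. Allowing one mismatch also pulls in P4. With only six genomes many unrelated proteins would match by chance, which is why the text stresses using many genomes.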
`
`The differential genome analysis method of Huynen
`et al. [16] also takes advantage of gene presence and
`absence to associate phenotypes with genes: a list is pre-
`pared of genes shared among organisms that also share a
`given phenotype. This list of genes is filtered by removing
`the genes that occur in organisms lacking the phenotype.
`The remaining genes are correspondingly enriched for
`those that confer the phenotype.
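The filtering step just described is essentially set arithmetic, as the following sketch shows. The gene sets, organism names and the function `candidate_phenotype_genes` are invented for illustration, and genes are assumed to be already mapped to shared family identifiers; this is not the implementation of [16].

```python
def candidate_phenotype_genes(gene_sets, has_phenotype):
    """Genes shared by all organisms with a phenotype and absent from all others."""
    with_trait = [g for org, g in gene_sets.items() if has_phenotype[org]]
    without_trait = [g for org, g in gene_sets.items() if not has_phenotype[org]]
    if not with_trait:
        return set()
    shared = set.intersection(*with_trait)    # present in every organism with the trait
    excluded = set().union(*without_trait)    # present in any organism lacking it
    return shared - excluded


# Toy gene content (invented data).
toy_gene_sets = {
    "org_a": {"g1", "g2", "g3", "g4"},
    "org_b": {"g1", "g2", "g4", "g5"},
    "org_c": {"g1", "g5", "g6"},
}
toy_phenotype = {"org_a": True, "org_b": True, "org_c": False}
candidates = candidate_phenotype_genes(toy_gene_sets, toy_phenotype)
```

In this example g1 is shared by the two organisms with the phenotype but is also found in the organism lacking it, so only g2 and g4 remain as candidates.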
`
`Homology (and evolution)
The distinction between homology and nonhomology methods can be
blurred, as even direct sequence comparisons are enhanced by
taking advantage of evolutionary variations. For example,
Lichtarge et al. [17] showed that functional sites on proteins
could be identified by analyzing amino acids conserved at
different branching depths in phylogenetic trees of protein
homologs. Likewise, variations among protein homologs found by
clustering the proteins in phylogenetic trees often reveal subtle
specialization in protein function. Recent examples of this
analysis include the MutS protein family [18] and proteins
conserved between worm and yeast [19].

Figure 3

[Matrix of phylogenetic profiles. Columns (24 genomes): cele, tpal,
tmar, syne, rpxx, pyro, paer, paby, mtub, mthe, mpne, mjan, mgen,
hpy9, hpyl, hinf, ecol, ctra, cpne, bsub, bbur, aqua, aful, aero.
Rows (yeast proteins): MSH1, ~PMS1, MLH1, PMS1, MSH5, MSH3, MSH6,
MSH2, MLH1, YPL207W, RPS4B, RS4E, EIF2A, RNAP27, R37A, RPL43B,
RPS6A, RS6, RS33, RS33, RL31A, R27A, ADE13, PUR2, PURA. Row groups:
DNA repair, ribosomal, purine metabolism.]

Phylogenetic profiles [14•] for three groups
of yeast proteins (ribosomal proteins and
proteins involved in DNA repair and purine
metabolism) sharing similar co-inheritance
patterns. Each row is a graphical
representation of a protein phylogenetic
profile, with elements colored according to
whether a homolog is absent (white box) or
present (colored box) in each of
24 genomes (columns). When homology is
present, the elements are shaded on a
gradient from light gray (low homology) to
black (strong homology). In this case,
homologs are considered absent when no
BLAST hits [44] are found with expectation
(E) values $< 1 \times 10^{-5}$. When homologs are
present, the profile receives a score
($-1/\log E$) that describes the degree of
sequence similarity with the best match in
that genome. Note that an uncharacterized
protein (YPL207W) clusters with the
ribosomal proteins and can now be
assigned a function in protein synthesis.
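The scoring rule in the Figure 3 caption can be written out as a small function. Two points are assumptions rather than statements from the caption: the logarithm is taken as base 10, and an E value reported as exactly zero is mapped to the limiting score of 0. The names `E_CUTOFF` and `profile_score` are hypothetical, and absence is represented by `None` because the caption does not give a numerical value for it.

```python
import math

E_CUTOFF = 1e-5  # threshold quoted in the Figure 3 caption


def profile_score(best_e_value):
    """Score one genome: None if no homolog passes the cutoff, else -1/log10(E)."""
    if best_e_value is None or best_e_value >= E_CUTOFF:
        return None  # homolog considered absent
    if best_e_value == 0.0:
        # Assumption: use the limit of -1/log10(E) as E approaches 0.
        return 0.0
    return -1.0 / math.log10(best_e_value)


strong = profile_score(1e-100)  # very significant match
weak = profile_score(1e-6)      # barely passes the cutoff
absent = profile_score(1e-3)    # fails the cutoff
```

Under these assumptions, stronger matches give scores closer to 0 and matches near the cutoff give scores closer to 0.2, so the score runs in the opposite direction to sequence similarity.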
`
`A different aspect of evolutionary information is used in
`the calculation of COGS (clusters of orthologs) [20], in
`which proteins from different organisms are grouped
`together in such a way as to maximize their functional
equivalence. COGS are generated by identifying orthologs or
equivalent proteins among different organisms. Orthologs can be
defined operationally as the
`symmetric top-scoring protein sequences in a sequence
`homology search. That is, a query sequence from
`genome 1 has an ortholog in genome 2 if searching the
`query versus genome 2 turns up the ortholog as the best
`match and searching the ortholog versus genome 1 turns
`up the query protein as the best match.
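The operational definition of an ortholog given above (a symmetric best match) can be sketched as follows. The similarity scores are invented, the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and ties are broken arbitrarily by input order; the construction of COGS in [20] involves more than this pairwise test.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject in the other genome."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(g1_vs_g2, g2_vs_g1):
    """Return (genome 1 protein, genome 2 protein) pairs that are mutual best matches."""
    forward = best_hits(g1_vs_g2)
    backward = best_hits(g2_vs_g1)
    return sorted((q, s) for q, s in forward.items() if backward.get(s) == q)


# Toy similarity scores (invented), keyed by (query, subject).
toy_g1_vs_g2 = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 120, ("a3", "b1"): 80}
toy_g2_vs_g1 = {("b1", "a1"): 245, ("b1", "a3"): 75, ("b2", "a2"): 118, ("b2", "a1"): 88}
orthologs = reciprocal_best_hits(toy_g1_vs_g2, toy_g2_vs_g1)
```

Protein a3 finds b1 as its best match, but b1 prefers a1 in the reverse search, so a3 is left without an ortholog. This is the sense in which the test relies on the absence of better homologs in each genome.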
`
`COGS effectively cluster functionally equivalent pro-
`teins because of the power of orthology, which dictates
`that not only are sequences homologous, but also that
`they are the best homologs regardless of search direction.
`This symmetric homology detection relies on the
`absence of better homologs from each genome and,
`therefore, incorporates both evolutionary information and
`sequence matching. Phylogenetic profiles can be con-
`structed from orthologs, rather than from best homologs,
`and searched for exact matches at the COGS web site
`(http://www.ncbi.nlm.nih.gov/COG/) [20].
`
No homology required (made up for with extra data)
`Each of the methods discussed above requires that a query
`protein have some sequence homologs in the database,
`even though direct sequence homology with these pro-
`teins may not be the basis for the analysis. This
`requirement is lifted for analyses of other genomic data,
`however, such as analysis of correlated mRNA expression lev-
`els, reviewed in [21,22]. Therefore, these techniques can
`find relationships among proteins that are absolutely
`unique. The premise of all expression clustering methods
`is, as in phylogenetic profiles, that proteins rarely work
`alone, but are often expressed at the same time or place as
`functionally related proteins. By varying the conditions
`that cells are grown in or by choosing different cell types or
`cells from different tissues, enough variation in gene
`expression can be observed to identify coexpressing genes.
`
`Such clustering requires additional data beyond genome
`sequences, to date relying on measurements of cellular
`mRNA levels by DNA microarrays, as in [23], serial analy-
`sis of gene expression (SAGE) libraries [24] or expressed
`sequence tag (EST) libraries [25]. Underlying the cluster-
`ing of genes by their mRNA coexpression levels is the
`assumption that coexpressed genes will generally be func-
`tionally related if enough different conditions have been
`tested. Such clustering performs well for strongly coex-
`pressed genes, such as ribosomal subunits, and poorly for
other gene groups. It requires fairly large sets of data, such
as more than 70 DNA chip measurements of yeast mRNA
`levels [26••,27••] or hundreds of human EST libraries from
`different tissues and cells [28•].
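One simple way to identify coexpressed genes, offered here only as a sketch, is to compute the Pearson correlation between expression vectors and keep the pairs above a threshold. The expression values, gene names, the threshold of 0.9 and the functions `pearson` and `coexpressed_pairs` are all assumptions for illustration; the cited studies use their own clustering procedures and far more than five conditions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, min_r=0.9):
    """Return gene pairs whose expression correlates at or above `min_r`."""
    pairs = []
    for (g1, x), (g2, y) in combinations(sorted(expression.items()), 2):
        if pearson(x, y) >= min_r:
            pairs.append((g1, g2))
    return pairs


# Toy mRNA levels across five conditions (invented data).
toy_expression = {
    "geneC": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneD": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneX": [5.0, 1.0, 4.0, 2.0, 3.0],
}
coexpressed = coexpressed_pairs(toy_expression)
```

Only geneC and geneD rise and fall together across the toy conditions, so they form the single reported pair.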
`
`In a manner analogous to analyzing gene co-inheritance
`or mRNA expression patterns, an organism’s proteins can
`probably be clustered effectively by their own protein
`coexpression patterns under varying growth conditions.
`For protein coexpression analysis, one directly measures
`the functional species (proteins) and it is likely that the
`clusters calculated on this basis will be more powerful for
`protein function assignment than mRNA expression
`clustering, especially given that protein and mRNA lev-
`els are often surprisingly uncorrelated [29•]. Protein
`expression patterns have been measured directly by
`mass spectrometry of protein mixtures [30] and by vari-
`ous two-dimensional gel electrophoresis techniques,
`with the proteins on the gel identified by amino acid
`content [31] or mass spectrometry [32•,33•]. The tech-
`niques are labor intensive and have not, just yet,
`produced a sufficient dataset for coexpression analyses.
`This is likely to change in the very near future, given the
`current emphasis on genome-wide and proteome-wide
`analyses. Protein expression patterns have also been
`measured, not as a function of growth conditions, but
spatially, as β-galactosidase fusions in Xenopus
embryos [34•], allowing functionally related proteins to
`be grouped by their spatial coexpression patterns.
`
`Building a genome-wide network of proteins
`The methods described above are easily applied on a
`genome-wide scale, combining results from each method to
`build a network of the functional relationships among an
`organism’s proteins. Such a network was calculated recently
`for yeast proteins [27••], identifying 93,750 functional links
`among 4701 of the 6217 proteins in yeast. A subset of this
`network is drawn in Figure 4, showing the amazing com-
`plexity of the connections generated by these methods.
`Perhaps even more surprising is the high degree of connec-
`tivity among the proteins, attributable in part to homology
`and false-positive predictions, but still observable even in
`entirely experimentally derived networks, such as the con-
`nected set of 542 proteins linked according to 727
`experimentally observed protein–protein interactions from
`the Database of Interacting Proteins (inset, Figure 4) [3•,35].
`These studies reinforce the idea that proteins rarely work in
`isolation, but are instead linked into an interconnected net-
`work of physical interactions and functional relationships.
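Combining the methods into one network, and reading function from network neighbors, might look like the sketch below. The links reproduce the hypothetical A–D example from the Figure 2 caption, while the pathway labels, method names and the functions `build_network` and `neighbor_functions` are invented; the scoring and filtering used for the yeast network in [27••] are not modeled.

```python
from collections import defaultdict


def build_network(links_by_method):
    """Merge per-method link lists into an adjacency map annotated with evidence."""
    network = defaultdict(dict)
    for method, links in links_by_method.items():
        for a, b in links:
            network[a].setdefault(b, set()).add(method)
            network[b].setdefault(a, set()).add(method)
    return network


def neighbor_functions(network, protein, known_functions):
    """Count the known functions found among a protein's direct neighbors."""
    counts = defaultdict(int)
    for neighbor in network.get(protein, {}):
        if neighbor in known_functions:
            counts[known_functions[neighbor]] += 1
    return dict(counts)


# Links taken from the Figure 2 example; function labels are invented.
toy_links = {
    "domain_fusion": [("A", "B")],
    "conserved_gene_position": [("B", "C")],
    "coexpression": [("C", "D")],
}
toy_known = {"A": "pathway X", "C": "pathway X", "D": "pathway Y"}
toy_network = build_network(toy_links)
votes_for_b = neighbor_functions(toy_network, "B", toy_known)
```

Protein B has no annotation of its own, but both of its neighbors carry the same label, so `votes_for_b` suggests that B operates in the same context. Keeping the set of supporting methods on each edge makes it possible to weight links found by more than one method.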
`
`Why is this computational genetics?
`Unlike sequence homology and inferences from protein
`structure, nonhomology methods reveal protein function in
`the same manner that experimental geneticists do: by defin-
`ing the context that the protein operates in. Function is then
`determined from the pathway neighbors of a protein. For
`this reason, we might consider nonhomology methods to be
`computational genetics, a bioinformatics analysis that proceeds
`in a fashion analogous to experimental genetics.
`
`Figure 4
`
`Current Opinion in Structural Biology
`
The network of 12,012 functional relationships among 2240 proteins
from yeast generated from protein phylogenetic profiles, showing links
that occur with an expectation (E) value $< 1 \times 10^{-3}$. Each vertex
represents a protein and each line represents a functional link,
modeled as springs to position functionally related proteins close
together in space [27••]. In this case, the phylogenetic profiles are
calculated for n = 24 genomes and the E value of a link from protein A
to protein B is calculated as $p(A) \cdot V(n, d_{AB}) \cdot N \cdot C$, where $p(A)$ is the
probability of observing the profile of protein A, $V(n, d_{AB})$ is the volume
of an n-dimensional hypersphere centered on A of radius $d_{AB}$, N is the
number of proteins with informative vectors and C is a scale factor. For
comparison, the inset shows an experimentally derived network of
protein–protein interactions from the Database of Interacting
Proteins [35], courtesy of I Xenarios.
`
`In fact, the method of phylogenetic profiles [14•] is an
`exact computational equivalent of the experimental
`genetics approach of mapping a mutant gene’s phenotype
`to the gene. When we compare one organism with anoth-
`er, we can generalize each organism as having a collection
`of mutations, gene knockouts and extra genes relative to
`the other organisms. By grouping genes with similar phy-
`logenetic profiles, we are mapping genes that produce
`shared phenotypes (the genes are expressed or absent in
`the same sets of organisms) and are essentially perform-
`ing a standard genetic mapping. Of course, the
`experiment is performed computationally and in a mas-
`sively parallel form, but it is essentially the same analysis
`as in experimental genetics.
`
`Conclusions
`This past year has seen an explosion of new experimental
`and computational tools to identify protein function,
`including the development of ‘nonhomology’ computa-
`tional methods. These methods take advantage of the
`many properties shared among functionally related pro-
`teins, such as patterns of domain fusion, evolutionary
`co-inheritance, conservation of relative gene position and
`correlated expression patterns. Such analyses, building on
`existing genomic sequence and expression data, allow the
`assignment of preliminary protein function on a genome-
`wide scale. Even more exciting is the potential for
`increasing the power of the methods as more genome
sequences and expression libraries accumulate; for example,
the number of possible phylogenetic profile vectors
and, therefore, the potential to separate unrelated proteins
grows on the order of $2^n$ for n genomes. An important goal
`will be working out proper statistical evaluations of results
`from each of the methods.
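As a rough worked illustration of this growth, assuming purely binary presence or absence profiles (the numbers below are simple arithmetic, not results from the cited studies):

```latex
\[
  2^{10} = 1024, \qquad 2^{24} = 16\,777\,216
\]
```

With the 24 genomes used for Figure 4, the number of possible binary profiles already exceeds the 6217 yeast proteins by a factor of roughly 2700, so unrelated proteins become increasingly unlikely to share a profile by chance.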
`
`In the next year, we can expect these techniques to be
`integrated, for each genome, with homology- and struc-
`ture-derived protein functions, as well as with known
`experimental data, as researchers are beginning to extract
`experiments from the scientific literature into computer-
`analyzable databases, such as for protein–protein
`interactions [35,36], functional relationships derived from
`the co-occurrence of gene names in articles [37], metabol-
`ic pathways [38,39] and general gene function [40–42].
`Beyond even these, we can expect many new types of
`data, such as the recent genome-wide gene disruption phe-
`notypic studies of yeast [43••] and protein expression
`datasets, that should really open up the power of these
`methods and allow researchers to finely map many of the
`functions and relationships among the genes so tantaliz-
`ingly revealed in each newly sequenced genome.
`
`Acknowledgements
`This work was supported by a Department of Energy/Oak Ridge Institute
`for Science and Education Hollaender Distinguished Postdoctoral
`Fellowship and grants from the DOE. The author would like to thank
`David Eisenberg, Matteo Pellegrini, Michael Thompson, Todd Yeates and
`Ioannis Xenarios for support and fruitful scientific collaboration.
`
`References and recommended reading
`Papers of particular interest, published within the annual period of review,
`have been highlighted as:
`• of special interest
`•• of outstanding interest
1. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT,
Pergamenschikov A, Williams CF, Zhu SX, Lee JC et al.: Distinctive
gene expression patterns in human mammary epithelial cells
and breast cancers. Proc Natl Acad Sci USA 1999,
96:9212-9217.

2. Martzen MR, McCraith SM, Spinelli SL, Torres FM, Field S,
•
Grayhack EJ, Phizicky EM: A biochemical genomics approach for
identifying genes by the activity of their products. Science 1999,
286:1153-1155.
The genome-wide expression and purification of yeast proteins for associat-
ing biochemical activities with specific yeast proteins is described.

3. Marcotte EM, Pellegrini M, Ng H-L, Rice DW, Yeates TO, Eisenberg D:
•
Detecting protein function and protein–protein interactions from
genome sequences. Science 1999, 285:751-753.
This paper shows that domain fusions can be used to predict functionally
related and physically interacting proteins.

4. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein
•
interaction maps for complete genomes based on gene fusion
events. Nature 1999, 402:86-90.
The authors developed a scoring scheme for domain fusion analysis that
accurately predicts interacting proteins.

5. Park J, Teichmann SA, Hubbard T, Chothia C: Intermediate
sequences increase the detection of homology between
sequences. J Mol Biol 1997, 273:249-254.
`
6. Shirasu K, Lahaye T, Tan M-W, Zhou F, Azevedo C, Schulze-Lefert P:
A novel class of eukaryotic zinc-binding proteins is required for
disease resistance signaling in barley and development in
C. elegans. Cell 1999, 99:355-366.
7. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order:
•
a fingerprint of proteins that physically interact. Trends Biochem
Sci 1998, 23:324-328.
The authors demonstrated that genes that are found as neighbors in multi-
ple organisms often encode interacting proteins and suggested using con-
served gene order to predict protein function.
8. Tamames J, Casari G, Ouzounis C, Valencia A: Conserved clusters
of functionally related genes in two bacterial genomes. J Mol Evol
1997, 44:66-73.
`9. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N: The use
`•
`of gene clusters to infer functional coupling. Proc Natl Acad Sci
`USA 1999, 96:2896-2901.
`The authors describe a method that systematizes the analysis of the conser-
`vation of relative gene position among organisms for finding functionally
`related genes.
`10. Jacob F, Monod J: Genetic regulatory mechanisms in the synthe-
`sis of proteins. J Mol Biol 1961, 3:318-356.
`11. Laird AJ, Young IG: Tn5 mutagenesis of the enterochelin gene
`cluster of Escherichia coli. Gene 1980, 11:359-366.
`
`12. Hallas C, Pekarsky Y, Itoyama T, Varnum J, Bichi R, Rothstein JL,
`Croce CM: Genomic analysis of human and mouse TCL1 loci
`reveals a complex of tightly clustered genes. Proc Natl Acad Sci
`USA 1999, 96:14418-14423.
`13. Wu Q, Maniatis T: A striking organization of a large family of human
`neural cadherin-like cell adhesion genes. Cell 1999, 97:779-790.
`
`14. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO:
`•
`Assigning protein functions by comparative genome analysis: protein
`phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96:4285-4288.
`This paper shows that functional relationships among proteins can be iden-
`tified by analyzing protein co-inheritance.
`15. Karzai AW, Susskind MM, Sauer RT: SmpB, a unique RNA-binding
`protein essential for the peptide-tagging activity of SsrA (tmRNA).
`EMBO J 1999, 18:3793-3799.
`16. Huynen M, Dandekar T, Bork P: Differential genome analysis
`applied to the species-specific features of Helicobacter pylori.
`FEBS Lett 1998, 426:1-5.
17. Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method
defines binding surfaces common to protein families. J Mol Biol
1996, 257:342-358.
`18. Eisen JA: A phylogenomic study of the MutS family of proteins.
`Nucleic Acids Res 1998, 26:4291-4300.
`
`19. Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS,
`Harris MA, Dolinski K, Mohr S, Smith T et al.: Comparison of the
`complete protein sets of worm and yeast: orthology and
`divergence. Science 1998, 282:2022-2028.
`20. Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on
`protein families. Science 1997, 278:631-637.
`21. Zhang MQ: Large-scale gene expression data analysis: a new
`challenge to computational biologists. Genome Res 1999, 9:681-688.
`22. Brown PO, Botstein D: Exploring the new world of the genome
`with DNA microarrays. Nat Genet 1999, 21:33-37.
`
23. Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C,
Hwang SY, Brown PO, Davis RW: Yeast microarrays for genome
wide parallel genetic and gene expression analysis. Proc Natl
Acad Sci USA 1997, 94:13057-13062.
`24. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of
`gene expression. Science 1995, 270:484-487.
`
`25. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH,
`Xiao H, Merril CR, Wu A, Olde B, Moreno RF et al.: Complementary
`DNA sequencing: expressed sequence tags and human genome
`project. Science 1991, 252:1651-1656.
`26. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and
`••
`display of genome-wide expression patterns. Proc Natl Acad Sci
`USA 1998, 95:14863-14868.
`The authors recognized that functionally related genes could be grouped by
`their mRNA expression patterns and introduced a visual presentation that
`has now become common in the field.
`
`27. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D:
`••
`A combined algorithm for genome-wide prediction of protein
`function. Nature 1999, 402:83-86.
`A synthesis of nonhomology methods allowing genome-wide prediction of
`protein function in yeast.
`28. Walker MG, Volkmuth W, Klingler TM: Pharmaceutical target
`•
`discovery using Guilt-by-Association: schizophrenia and
`Parkinson’s disease genes. In Proceedings of the 7th International
`Conference on Intelligent Systems for Molecular Biology: 1999
`August 6–10; Heidelberg, Germany. Edited by Lengauer T,
`Schneider R, Bork P, Brutlag D, Glasgow J, Mewes H-W, Zimmer R.
`Menlo Park, California: AAAI Press; 1999:282-286.
`A method for discovering functionally related genes by analysis of their co-
`occurrence in expressed sequence tag libraries is described.
`29. Gygi SP, Rochon Y, Franza BR, Aebersold R: Correlation between
`•
`protein and mRNA abundance in yeast. Mol Cell Biol 1999,
`19:1720-1730.
`This paper provides the interesting observation that protein levels and mRNA
`levels are often poorly correlated, often varying by up to 30-fold in a test of
`about 300 sequences, providing the promise that protein coexpression may
`surpass mRNA coexpression for clustering genes of related function.
`30. Ducret A, Oostveen IV, Eng JK, Yates JR, Aebersold R: High
`throughput protein characterization by automated reverse-phase
`chromatography/electrospray tandem mass spectrometry. Protein
`Sci 1998, 7:706-719.
`
`31. Garrels JI, Futcher B, Kobayashi R, Latter GI, Schwender B, Volpe T,
`Warner JR, McLaughlin CS: Protein identification for a
`Saccharomyces cerevisiae protein database. Electrophoresis 1994,
`15:1466-1486.
`
`32. Neubauer G, King A, Rappsilber J, Calvio C, Watson M, Ajuh P,
`•
`Sleeman J, Lamond A, Mann M: Mass spectrometry and EST-
`database searching allows characterization of the multi-protein
`spliceosome complex. Nat Genet 1998, 20:46-50.
`A model example of the rapid identification of components of protein
`complexes.
`
33. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R:
•
Quantitative analysis of complex protein mixtures using isotope-
coded affinity tags. Nat Biotech 1999, 17:994-999.
`A potentially important paper describing the accurate, high-throughput mea-
`surement of protein expression patterns.
`
`34. Gawantka V, Pollet N, Delius H, Vingron M, Pfister R, Nitsch R,
`•
`Blumenstock C, Niehrs C: Gene expression screening in Xenopus
`identifies molecular pathways, predicts gene function and
`provides a global view of embryonic patterning. Mech Dev 1998,
`77:95-141.
`This paper demonstrates the analysis of the tissue expression patterns of
`1765 cDNAs by in situ hybridization in Xenopus embryos; each gene’s
`expression pattern is photographed and cataloged in a database, allowing
`the clustering of genes with similar spatial expression patterns.
`
`35. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM,
`Eisenberg D: DIP: the Database of Interacting Proteins. Nucleic
`Acids Res 2000, 28:289-291.
`36. Blaschke C, Andrade MA, Ouzounis C, Valencia A: Automatic
`extraction of biological information from scientific text:
`protein–protein interactions. In Proceedings of the 7th International
`Conference on Intelligent Systems for Molecular
