Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes

Elliott H. Margulies*, NISC Comparative Sequencing Program*†‡, Valerie V. B. Maduro*, Pamela J. Thomas†, Jeffery P. Tomkins§, Chris T. Amemiya¶, Meizhong Luo‖, and Eric D. Green*†**

*Genome Technology Branch and †NISC, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892; §Clemson University Genomics Institute, Department of Genetics and Biochemistry and Life Science Studies, Clemson University, Clemson, SC 29634; ¶Benaroya Research Institute at Virginia Mason, Seattle, WA 98101; and ‖Arizona Genomics Institute, Department of Plant Sciences, University of Arizona, Tucson, AZ 85721

Communicated by Francis S. Collins, National Institutes of Health, Bethesda, MD, November 18, 2004 (received for review August 30, 2004)

Sequencing and comparative analyses of genomes from multiple vertebrates are providing insights about the genetic basis for biological diversity. To date, these efforts largely have focused on eutherian mammals, chicken, and fish. In this article, we describe the generation and study of genomic sequences from noneutherian mammals, a group of species occupying unusual phylogenetic positions. A large sequence data set (totaling >5 Mb) was generated for the same orthologous region in the genomes of three marsupials (North American opossum, South American opossum, and Australian tammar wallaby) and one monotreme (platypus). These ancient mammalian genomes are characterized by unusual architectural features with respect to G + C and repeat content, as well as compression relative to human. Approximately 14% and 34% of the human sequence forms alignments with the orthologous sequence from platypus and the marsupials, respectively; these numbers are distinctly lower than those observed with nonprimate eutherian mammals (45–70%). The alignable sequences between human and each marsupial species are not completely overlapping (only 80% common to all three species), nor are the platypus-alignable sequences completely contained within the marsupial-alignable sequences. Phylogenetic analysis of synonymous coding positions reveals that platypus has a notably long branch length, with the human–platypus substitution rate being on average 55% greater than that seen with human–marsupial pairs. Finally, analyses of the major mammalian lineages reveal distinct patterns with respect to the common presence of evolutionarily conserved vertebrate sequences. Our results confirm that genomic sequence from noneutherian mammals can contribute uniquely to unraveling the functional and evolutionary histories of the mammalian genome.

comparative genomics | genome sequencing | genome analysis | phylogenetics | mammalian evolution

Freely available online through the PNAS open access option.

Abbreviations: NISC, National Institutes of Health Intramural Sequencing Center; mya, million years ago; N.A., North American; S.A., South American; BAC, bacterial artificial chromosome; TBA, THREADED BLOCKSET ALIGNER; MCS, multispecies conserved sequence; 4D, 4-fold degenerate; SINEs, short interspersed nucleotide elements; LINEs, long interspersed nucleotide elements.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database [accession nos. AC127465, AC129065, AC129066, AC129885, AC142561, AC144364, AC144365, AC144600, AC144690, AC144691, AC144755, and AC144756 (N.A. opossum); AC147869, AC147870, AC147871, AC147872, AC147873, AC147874, AC148151, and AC148214 (S.A. opossum); AC127464, AC129882, AC129883, AC129884, AC130185, AC138553, AC144363, AC144689, AC144753, AC144754, AC144788, AC146535, and AC146754 (platypus); and AC145041, AC145042, AC145183, AC145184, AC145249, AC145250, AC145407, AC145408, AC145409, and AC145841 (wallaby)]. See Table 3, which is published as supporting information on the PNAS web site, for specific versions of all GenBank accession nos. used in this study.

‡National Institutes of Health Intramural Sequencing Center (NISC) Comparative Sequencing Program: Leadership provided by Robert W. Blakesley, Gerard G. Bouffard, Nancy F. Hansen, Baishali Maskeri, and Jennifer C. McDowell.

**To whom correspondence should be addressed at: National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50, Room 5222, Bethesda, MD 20892. E-mail: egreen@nhgri.nih.gov.

© 2005 by The National Academy of Sciences of the USA

3354–3359 | PNAS | March 1, 2005 | vol. 102 | no. 9 | www.pnas.org/cgi/doi/10.1073/pnas.0408539102

GENETICS

SEQUENOM EXHIBIT 1091 | Sequenom v. Stanford | IPR2013-00390

Comparisons of genome sequences from evolutionarily diverse species are central to decoding the functions of vertebrate genomes (1). Of particular interest is the use of highly diverged species for detecting and characterizing sequences under purifying selection (2). Large-scale sequence comparisons have been reported for eutherian (commonly referred to as “placental”) mammals (3) and fish (4), with the most detailed studies to date emphasizing human–rodent comparisons (5, 6). We previously described our efforts to sequence the same orthologous regions from large collections of vertebrates (7, 8) and to perform multispecies sequence comparisons (9). These analyses have helped to refine phylogenetic relationships (7), to gain insight into the mutational process (10, 11), and to reveal differences between eutherian mammals and other vertebrates (e.g., birds and fish) with respect to their utility for detecting highly conserved regions in the human genome (9). However, these studies also demonstrate that for comparative sequence analyses, the optimal phylogenetic distances among species vary, depending on the question(s) being addressed [with the distance between humans and eutherian mammals sometimes being too close, and that between humans and birds (or fish) sometimes being too far].

Within this large phylogenetic gap between eutherian mammals and birds reside the marsupials and monotremes (12, 13). These metatherian and prototherian mammals diverged from the lineage leading to eutherian mammals before the eutherian radiation, an estimated 185 and 200 million years ago (mya), respectively (14). These divergence dates, as well as the origins of prototherian mammals relative to metatherian mammals, remain a source of scientific debate, in part because of insufficient molecular data (13, 15–17). Until recently, very little marsupial or monotreme DNA sequence was available in public databases. Although comparative studies involving small amounts of genomic sequence from a marsupial species [the stripe-faced dunnart (Sminthopsis macroura)] have been described (18), no comparisons involving large, contiguous blocks of marsupial or monotreme sequence have been reported to date.

In this article, we present the results of comparative sequence analyses involving >5 Mb of sequence from four noneutherian mammals. Specifically, we describe the features of their genomes, provide insights into their phylogenetic relationships, and reveal similarities and differences among mammalian lineages with respect to the presence of evolutionarily conserved vertebrate sequences.

Materials and Methods

Genomic Sequence Data Set. Genomic segments orthologous to a 1.9-Mb region on human chromosome 7q31.3, encompassing the cystic fibrosis transmembrane conductance regulator (CFTR) gene (referred to as the “greater CFTR region”), were isolated from the North American (N.A.) opossum, South American (S.A.) opossum, Australian tammar wallaby, and duckbilled platypus, and the segments were subjected to shotgun sequencing, as detailed in the supporting information, which is published on the PNAS web site. Sequences from an additional 23 vertebrates were generated and used for comparative analyses; the sequence data [including a listing of individual GenBank records for each bacterial artificial chromosome (BAC), assimilated and annotated sequences for each species, and multispecies sequence alignments (see below)] are available in the supporting information and at www.nisc.nih.gov/data.

Table 1. General characteristics of comparative sequence data set

Species | No. sequenced BACs | No. sequencing gaps* | No. mapping gaps† | Total nonredundant sequence, Mb | Corresponding amount of human sequence,‡ Mb
N.A. opossum | 12 | 3 | 3 | 1.63 | 1.36
S.A. opossum | 8 | 7 | 7 | 1.17 | 1.19
Wallaby | 10 | 5 | 5 | 1.35 | 1.18
Platypus | 13 | 0 | 0 | 1.26 | 1.65

*Gaps reflecting missing sequence in the assembly of shotgun sequence data from an individual BAC; these are typically 100 bp or less. See the supplement in ref. 7 for details.
†Gaps reflecting the lack of BAC coverage across an interval. See the supplement in ref. 7 for details.
‡The amount of human sequence in or between pair-wise alignments for the covered portions of each species’ sequence; this value includes an estimate of sequence that might be proximal to the first and distal to the last alignment (utilizing the estimated degree of compression relative to human for that species).

Repeat Identification. Repetitive elements in noneutherian mammalian sequences were identified by using a RECON-based approach (19), as described in the supporting information. Importantly, this approach was tuned so that, when applied to the human sequence, it detected repetitive elements at high specificity (99.8%) at the cost of a lower sensitivity (63%). In turn, the identified repeats were used with REPEATMASKER (July 13, 2002; www.repeatmasker.org) and the standard REPEATMASKER mammalian repeat libraries to detect and mask all repetitive sequences. This process involved adding the repeats identified in the noneutherian mammalian sequences to the standard artiodactyl repeat library and then running REPEATMASKER with the -cow option.

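The specificity and sensitivity figures above compare de novo repeat calls with a reference annotation of the human sequence. The paper does not spell out how the two measures were counted. The sketch below assumes a per-base comparison in which specificity means the fraction of non-repeat bases left unmasked, and it represents calls as half-open intervals. The function names are hypothetical and belong to neither RECON nor RepeatMasker.

```python
"""Base-level accuracy of a repeat caller against a reference repeat annotation.

Hypothetical sketch: intervals are half-open (start, end) pairs on one sequence;
nothing here is part of RECON or RepeatMasker.
"""
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def interval_mask(intervals: Iterable[Interval], length: int) -> List[bool]:
    """Mark every position covered by at least one interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(end, length)):
            mask[i] = True
    return mask


def sensitivity_specificity(predicted: Iterable[Interval],
                            reference: Iterable[Interval],
                            length: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP), counted per base."""
    pred = interval_mask(predicted, length)
    ref = interval_mask(reference, length)
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)
    tn = sum(1 for p, r in zip(pred, ref) if not p and not r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy example: a 1,000-bp sequence with one 100-bp reference repeat.
sens, spec = sensitivity_specificity([(0, 60), (150, 152)], [(0, 100)], 1000)
print(f"sensitivity={sens:.3f} specificity={spec:.4f}")  # sensitivity=0.600 specificity=0.9978
```

A caller tuned this way favors leaving repeats unmasked over masking unique sequence. That trade-off matches the high-specificity, lower-sensitivity setting described above.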
Generation and Characterization of Sequence Alignments. A multisequence alignment of the assembled sequences from 27 vertebrates was generated by using the THREADED BLOCKSET ALIGNER (TBA) (20). The resulting alignment then was “projected” onto the human reference sequence for subsequent analyses (see the supporting information for details). A portion of the sequenced interval (541 kb distributed across nine distinct regions; see the supporting information) was selected where there was complete sequence coverage in a subset of species (chimpanzee, cat, cow, mouse, wallaby, N.A. opossum, S.A. opossum, platypus, and chicken). For each human–species pair-wise combination, the number of human-referenced positions of TBA-aligned bases was determined; these data then were used to calculate the amount of human sequence in alignments for each human–species combination.

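Projecting an alignment onto human means assigning each aligned column a human coordinate and ignoring columns that are insertions relative to human. The sketch below is a minimal illustration of that idea. It assumes a hypothetical in-memory block format (a human start coordinate plus gapped rows keyed by species) rather than a parsed TBA/MAF file. It counts the human positions at which a given species has an aligned base.

```python
"""Project alignment blocks onto human coordinates and measure the alignable fraction.

Minimal sketch: the Block layout is a hypothetical in-memory record, not a TBA/MAF parser.
"""
from typing import Dict, List, Set, Tuple

Block = Tuple[int, Dict[str, str]]  # (0-based human start, {species: gapped row})


def aligned_human_positions(blocks: List[Block], species: str,
                            reference: str = "human") -> Set[int]:
    """Human positions at which `species` contributes an aligned (non-gap) base."""
    positions: Set[int] = set()
    for human_start, rows in blocks:
        ref_row = rows[reference]
        other = rows.get(species)  # the species may be absent from a block
        pos = human_start
        for col, ref_char in enumerate(ref_row):
            if ref_char == "-":
                continue  # insertion relative to human: no human coordinate
            if other is not None and other[col] != "-":
                positions.add(pos)
            pos += 1
    return positions


def fraction_aligned(blocks: List[Block], species: str, human_length: int) -> float:
    """Fraction of the human interval covered by aligned bases of `species`."""
    return len(aligned_human_positions(blocks, species)) / human_length


blocks = [
    (0, {"human": "ACGT-A", "opossum": "AC-TTA"}),
    (10, {"human": "GGCC", "platypus": "GG--"}),
]
print(sorted(aligned_human_positions(blocks, "opossum")))  # [0, 1, 3, 4]
print(fraction_aligned(blocks, "platypus", 20))            # 0.1
```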
Estimating Phylogenetic Branch Lengths. A “virtual” multisequence alignment consisting solely of synonymous [4-fold degenerate (4D)] coding positions was generated by using the human-referenced annotations. Sites that fell within sequence gaps or that were no longer synonymous (because of changes in the first two bases) were treated as missing data. Substitution rates were estimated from this multisequence alignment by maximum likelihood with the PHAST package (21). A generally accepted tree topology for the analyzed species was used (7, 22). The most general reversible substitution model (REV) was used, and no molecular clock was assumed. Errors associated with the resulting branch length calculations were estimated by bootstrapping (both nonparametric and parametric methods; see the supporting information), with the tree topology fixed.

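The construction of the 4D-site alignment can be sketched as follows. The input is hypothetical: codon-split, human-referenced sequences per species. The sketch reads "no longer synonymous" as meaning that a species' first two codon bases differ from human's, which is our interpretation and not necessarily the authors' exact rule. It only builds the input alignment. The maximum-likelihood fit itself was done with PHAST.

```python
"""Build a 'virtual' alignment of 4-fold degenerate (4D) sites from codon-aligned sequences.

Minimal sketch under stated assumptions: inputs are already split into human-referenced
codons; a site is missing ('*') if the codon has a gap or N, or if its first two bases
differ from human's (our reading of "no longer synonymous").
"""
from typing import Dict, List

# Standard genetic code: codons starting with these two bases are 4-fold degenerate
# at the third position (Ala GCN, Gly GGN, Pro CCN, Thr ACN, Val GTN,
# Leu CTN, Arg CGN, Ser TCN).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}


def fourfold_alignment(codons: Dict[str, List[str]],
                       reference: str = "human") -> Dict[str, str]:
    """Return one string per species holding its base at each human 4D site."""
    ref_codons = [c.upper() for c in codons[reference]]
    columns: Dict[str, List[str]] = {sp: [] for sp in codons}
    for i, ref_codon in enumerate(ref_codons):
        prefix = ref_codon[:2]
        if prefix not in FOURFOLD_PREFIXES or "-" in ref_codon or "N" in ref_codon:
            continue  # not a usable 4D site in the human reference
        for sp, sp_codons in codons.items():
            codon = sp_codons[i].upper()
            if "-" in codon or "N" in codon or codon[:2] != prefix:
                columns[sp].append("*")  # treated as missing data
            else:
                columns[sp].append(codon[2])
    return {sp: "".join(bases) for sp, bases in columns.items()}


example = {
    "human":   ["GCA", "ATG", "CTG", "GGT"],
    "opossum": ["GCG", "ATG", "CAG", "GG-"],
}
print(fourfold_alignment(example))  # {'human': 'AGT', 'opossum': 'G**'}
```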
Examining Lineage Specificity of Multispecies Conserved Sequences (MCSs). MCSs were identified by using the multisequence alignment generated with sequences from 27 vertebrate species (8). A portion of the sequenced interval (571 kb distributed across seven separate regions; see the supporting information) was selected where there was complete sequence coverage in a subset of species (cat, dog, cow, pig, rat, mouse, N.A. opossum, wallaby, and platypus). Note that this limited data set is distinct from the one above used for characterizing the multisequence alignments. Each of the nine species’ sequences was analyzed for the presence of the above-identified MCSs; specifically, each MCS in the relevant interval was scored as being present or absent based on BLASTZ analysis (see the supporting information).

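Scoring each MCS as present or absent reduces to an interval-overlap test in human coordinates. The sketch below assumes that each species' BLASTZ alignments have already been converted into half-open human-coordinate intervals (a hypothetical input format). It uses a brute-force overlap check, which is adequate for illustration. A sorted sweep or interval tree would scale better.

```python
"""Score each MCS as present/absent per species by alignment overlap.

Minimal sketch: alignments are assumed to be pre-reduced to half-open
human-coordinate intervals per species (hypothetical input format).
"""
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in human coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def score_presence(mcss: List[Interval],
                   alignments: Dict[str, List[Interval]]) -> Dict[Interval, FrozenSet[str]]:
    """Map each MCS to the set of species with an overlapping alignment."""
    return {
        mcs: frozenset(sp for sp, blocks in alignments.items()
                       if any(overlaps(mcs, blk) for blk in blocks))
        for mcs in mcss
    }


mcss = [(100, 150), (400, 420)]
alignments = {
    "mouse": [(90, 120)],
    "wallaby": [(130, 200), (410, 415)],
    "platypus": [(150, 300)],  # starts at 150, so it does not overlap [100, 150)
}
for mcs, species in score_presence(mcss, alignments).items():
    print(mcs, sorted(species))
```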
Results

Comparative Sequence Data Set. We generated large blocks of high-quality sequence from three marsupial species (N.A. opossum, S.A. opossum, and wallaby) and one monotreme species (platypus). All sequences correspond to genomic segments orthologous to the greater CFTR region on human chromosome 7q31.3 (7), with 1.17–1.63 Mb of nonredundant sequence generated from each species (Table 1). Based on comparisons with available genome-wide human (23), mouse (5), and rat (6) sequence, the greater CFTR region is close to average with respect to general genomic properties (e.g., repeat content, G + C content, fraction of coding sequence, and synonymous substitution rate). The resulting sequences from the four noneutherian mammals were analyzed individually and also compared with corresponding sequences from 23 additional vertebrates (7, 8).

Genomic Architecture. Analysis of the orthologous genes in this region reveals no gross differences in content, order, orientation, or intron–exon structure between human and the noneutherian mammals (there are two instances of a missing exon within noneutherian sequence, but these appear to be due to gaps in sequence coverage; data not shown). However, examination of several architectural features of each species’ sequence uncovered a number of differences. For example, among the noneutherian mammals, the size of this genomic region differs from that of the orthologous human region by as much as 24% (Table 2). Specifically, evidence of both genome compression (e.g., 24% in platypus) and expansion (e.g., 17% and 15% in N.A. opossum and wallaby, respectively) is seen; these findings are generally consistent with previous estimates of genome sizes (refs. 24 and 25; also see www.genomesize.com).

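The compression and expansion percentages follow directly from the relative-size ratios in Table 2, which divide each species' sequence length by the corresponding amount of human sequence. A minimal sketch of that arithmetic, using the Table 2 values for the noneutherian mammals, is shown below. The function names are illustrative only.

```python
"""Convert relative-size ratios into percent expansion or compression versus human.

Minimal sketch using the relative-size values reported in Table 2.
"""
RELATIVE_SIZE = {  # species length / corresponding human length (Table 2)
    "N.A. opossum": 1.17,
    "S.A. opossum": 0.99,
    "Wallaby": 1.15,
    "Platypus": 0.76,
}


def relative_size(species_bp: float, human_bp: float) -> float:
    """Ratio of a species' sequence length to the corresponding human length."""
    return species_bp / human_bp


def percent_change(ratio: float) -> float:
    """Positive values indicate expansion, negative values compression."""
    return (ratio - 1.0) * 100.0


for name, ratio in RELATIVE_SIZE.items():
    change = percent_change(ratio)
    kind = "expansion" if change > 0 else "compression"
    print(f"{name}: {abs(change):.0f}% {kind}")

largest = max(abs(percent_change(r)) for r in RELATIVE_SIZE.values())
print(f"largest deviation from human: {largest:.0f}%")  # 24%
```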
Table 2. Architectural features of different species’ sequences

Species | G + C content*: total | G + C content*: nonrepetitive sites | G + C content*: synonymous 4D sites | Relative size† | Percentage repetitive‡
Human | 0.384 | 0.369 | 0.432 | NA | 40.3
Cat | 0.383 | 0.372 | 0.434 | 0.95 | 36.4
Pig | 0.377 | 0.366 | 0.455 | 0.92 | 31.9
Mouse | 0.401 | 0.391 | 0.479 | 0.90 | 32.6
N.A. opossum | 0.358 | 0.358 | 0.415 | 1.17 | 43.2
S.A. opossum | 0.358 | 0.358 | 0.380 | 0.99 | 34.2
Wallaby | 0.373 | 0.374 | 0.412 | 1.15 | 37.0
Platypus | 0.459 | 0.457 | 0.642 | 0.76 | 44.9
Chicken | 0.412 | 0.407 | 0.423 | 0.44 | 6.0
Fugu | 0.486 | 0.485 | 0.721 | 0.16 | 2.3

The noneutherian mammals in this table are N.A. opossum, S.A. opossum, wallaby, and platypus.
*Fraction of G + C bases in the entire sequence (total), in the nonrepetitive portion of the sequence (i.e., sequence not masked by REPEATMASKER), and at synonymous 4D sites (third codon positions at which any of the four bases encodes the same amino acid).
†Ratio of sequence length in each species to the amount of corresponding human sequence (as defined in Table 1).
‡Percentage of sequence masked by REPEATMASKER.

The asserted correlation between genome size and repeat content (4, 26) prompted us to investigate the amount and composition of repetitive elements within each species’ sequence. Because repetitive sequences in noneutherian mammals have not been fully characterized, this analysis first required assembling repeat libraries for each marsupial and monotreme species (see Materials and Methods). Fig. 1 summarizes the content and types of repeats in each species’ sequence, with data from several other vertebrates provided for comparison. Note the considerable variation in total repeat content among these species and the lack of correlation with genome size (relative to human; see Table 2). Specifically, the orthologous platypus genomic region is smaller than the human region yet contains a larger proportion of repetitive sequences; conversely, the wallaby genomic region is larger than the human region yet contains a smaller proportion of repetitive sequences. Another finding is the relatively large proportion of short interspersed nucleotide elements (SINEs) in the platypus sequence (27, 28), markedly different from other vertebrate sequences. The latter is consistent with the PCR-based identification of an abundant SINE repeat within monotreme genomes (J. A. M. Graves and P. J. Kirby, personal communication).

Fig. 1. Comparison of the content and types of repetitive elements among different species’ sequences. Sequences from the orthologous regions of the indicated species’ genomes were analyzed by REPEATMASKER, allowing detection and quantification of the indicated types of repetitive elements. The data for the noneutherian mammals are highlighted for emphasis. SINEs, short interspersed nucleotide elements; LINEs, long interspersed nucleotide elements.

The overall G + C content is similar among the three marsupial sequences (35.8–37.3%; see Table 2) and slightly lower than that of the orthologous human genomic region (38.4%). In contrast, the overall G + C content of the platypus sequence is notably high (45.9%), closer to that of the orthologous Fugu genomic region (48.6%). A similarly high G + C content for platypus is seen at nonrepetitive sites and at synonymous 4D sites (see Table 2). Examining the distribution of G + C content in 1-kb windows across the noneutherian sequences reveals the same general trends (see the supporting information).

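The G + C measures in Table 2 and the 1-kb window analysis can be sketched as follows. The sketch assumes that repeats were masked by replacing bases with "N" (RepeatMasker's default masking). Non-ACGT characters are excluded from G + C fractions, and windows are non-overlapping with any partial tail dropped. Window handling in the supporting information may differ, and the function names are hypothetical.

```python
"""G + C content of total, nonrepetitive, and windowed sequence.

Minimal sketch, assuming repeats are masked as 'N'; non-ACGT characters are ignored.
"""
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else float("nan")


def percent_masked(original: str, masked: str) -> float:
    """Percentage of positions masked to 'N' that were not 'N' originally."""
    if len(original) != len(masked):
        raise ValueError("sequences must be the same length")
    hits = sum(1 for o, m in zip(original.upper(), masked.upper()) if m == "N" and o != "N")
    return 100.0 * hits / len(original)


def windowed_gc(seq: str, window: int = 1000) -> List[float]:
    """G + C fraction in consecutive non-overlapping windows (partial tail dropped)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq) - window + 1, window)]


original = "GC" * 500 + "AT" * 500    # 2-kb toy sequence
masked = "N" * 200 + original[200:]   # first 200 bp treated as a repeat
print(gc_fraction(original))              # total G + C: 0.5
print(round(gc_fraction(masked), 3))      # nonrepetitive G + C: 0.444
print(percent_masked(original, masked))   # 10.0
print(windowed_gc(original))              # [1.0, 0.0]
```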
Multispecies Sequence Comparisons. Analyses of a multisequence alignment generated by using data from 27 vertebrates revealed notable patterns of sequence conservation. For example, the fraction of the human sequence forming alignments with nonprimate eutherian mammals is typically 45–70% (Fig. 2A) (7); these alignments include both neutrally evolving and functionally constrained portions of the sequence. This fraction of alignable sequence is substantially lower for the noneutherian mammals (14–34%), with the decrease mostly reflecting fewer alignments within nonannotated regions (i.e., sequences not thought to be genes or repeats). A substantially larger amount of noneutherian sequence could be aligned to the human sequence by generating a true multisequence alignment with the program TBA (20) as opposed to simple pair-wise alignments (Fig. 2A, purple bars). In the case of eutherian mammals (where no such difference is seen), both pair-wise and multisequence alignments are thought to contain virtually all neutrally evolving sequence (5). With the noneutherian mammals, however, the dramatic difference likely reflects a larger amount of neutrally evolving sequence within the multisequence alignment; it remains to be determined whether this accounts for all neutrally evolving sequence.

We examined more closely the relationships among the human-alignable portions of each species’ sequence, focusing our analyses on a 571-kb portion of the region with complete sequence coverage in a representative subset of species (see Materials and Methods). Although each marsupial sequence individually aligns with ≈34% of the human sequence, only 27% of the human sequence aligns with all three marsupial sequences, indicating that the human-alignable portions of each marsupial sequence are not completely overlapping. Similarly, whereas the platypus sequence aligns with ≈14% of the human sequence, only 11% of the human sequence aligns with all four noneutherian sequences, indicating that 21% of the human sequence that aligns with the platypus sequence (≈3 of those 14 percentage points) is distinct from that aligning to all three marsupial sequences. These results demonstrate that the human-alignable sequence from more distantly related species is not fully contained within that from more closely related species. This finding also was observed with additional combinations of species (i.e., cat and mouse, but not cow; see the supporting information). These nonoverlapping alignable sequences may represent neutrally evolving, lineage-specific insertions and deletions.

Fig. 2. Patterns of sequence conservation among different vertebrates. (A) The fraction of human sequence forming alignments with sequences from each of the indicated species is shown, broken down into four annotation categories. The additional alignable sequence (indicated in purple; see text for details) found exclusively in the TBA-generated multisequence alignment (20) falls largely within nonexonic regions. For data that include a larger set of vertebrates, see the supporting information. (B) The relationships between the fraction of human sequence aligned and estimated branch length from human (calculated as substitutions per site) are shown for the indicated vertebrate species.

To better understand the phylogenetic relationships among the noneutherian mammals, as well as their relationships to other vertebrate species, we calculated the substitution rates at synonymous coding positions within the multisequence alignment. These rates then were used to scale the branch lengths of the phylogenetic tree depicted in Fig. 3; note that the total branch lengths between human and each species also are indicated (with all possible pair-wise branch lengths provided as supporting information). The synonymous substitution rate (per site) between the two opossum species is 0.09, whereas that between wallaby and either opossum species is 0.18. These rates are similar to those observed with primate–primate comparisons. Interestingly, platypus has a notably long branch length, with the platypus–marsupial substitution rate averaging 0.85. Also note that the human–platypus substitution rate is 55% higher (on average) than that for all human–marsupial pairs, providing further evidence for the considerable divergence of monotremes relative to both the marsupial and eutherian mammals (16). The synonymous substitution rates we calculated for the mouse and rat sequences are similar to the genome-wide estimates (5, 6), whereas that for the chicken sequence is substantially lower than the genome-wide estimate (29). The latter is likely attributable to differences in the methods and assumptions used and/or characteristics of the respective data sets (i.e., pair-wise whole-genome analyses vs. multisequence targeted analyses).

Fig. 3. Phylogenetic tree of vertebrate species. By using the generated 27-species multisequence alignment, branch lengths were calculated based on analysis of synonymous coding positions. The branch lengths (as substitutions per synonymous site) between human and each species are listed (with additional pair-wise branch lengths provided in the supporting information). The last common ancestor among the catarrhine primates (A) is estimated at 25 mya (36, 37), between the rodents and primates (B) at 75 mya (5, 6), between eutherians and metatherians (C) at 185 mya (14), between monotremes and therians (D) at 200 mya (14), and between mammals and birds (E) at 310 mya (13).

These findings reinforce the distinct phylogenetic positions of marsupials and monotremes within the vertebrate and mammalian radiations (12, 13). In addition, simultaneously examining the alignment and branch-length properties of each species’ sequence compared with human (Fig. 2B) reveals a clear grouping of the marsupials at an intermediate position between the eutherian mammals and birds, consistent with their purported phylogenetic relationships. In contrast, the grouping of platypus with chicken in this analysis is surprising given the large evolutionary distance thought to separate these species (30, 31).

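The branch lengths above were estimated by maximum likelihood under the REV model with PHAST. As a much simpler illustration of the same quantity (expected substitutions per synonymous site) and of the nonparametric bootstrap mentioned in Materials and Methods, the sketch below computes a pairwise Jukes–Cantor (JC69) distance from two 4D-site strings. It then resamples site columns with replacement. This swaps in JC69 for REV and a single pairwise distance for a full tree fit, so it would not reproduce Fig. 3. JC69 also assumes equal base frequencies, an even stronger assumption than REV's for the G + C-rich platypus sequence. The function names are hypothetical.

```python
"""Pairwise distance at 4D sites with a nonparametric bootstrap over sites.

Simplified sketch: uses the JC69 correction, not the REV model fitted with PHAST,
and estimates one pairwise distance rather than a tree.
"""
import math
import random
from typing import List, Sequence, Tuple

MISSING = set("*-N")


def usable_sites(a: str, b: str) -> List[Tuple[str, str]]:
    """Paired bases at columns where neither sequence has missing data."""
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x not in MISSING and y not in MISSING]


def jc69(sites: Sequence[Tuple[str, str]]) -> float:
    """Jukes-Cantor distance in substitutions per site; inf when saturated."""
    if not sites:
        raise ValueError("no usable sites")
    p = sum(1 for x, y in sites if x != y) / len(sites)
    arg = 1.0 - 4.0 * p / 3.0
    return -0.75 * math.log(arg) if arg > 0 else math.inf


def bootstrap_distance(a: str, b: str, replicates: int = 500, seed: int = 1):
    """Point estimate plus bootstrap standard error from resampled site columns."""
    sites = usable_sites(a, b)
    estimate = jc69(sites)
    rng = random.Random(seed)
    reps = [jc69(rng.choices(sites, k=len(sites))) for _ in range(replicates)]
    finite = [d for d in reps if math.isfinite(d)]
    if len(finite) < 2:
        return estimate, math.nan
    mean = sum(finite) / len(finite)
    se = math.sqrt(sum((d - mean) ** 2 for d in finite) / (len(finite) - 1))
    return estimate, se


human = "ACGTACGTAC" * 10
other = "ACGTACGTAA" * 10  # 10% of sites differ
d, se = bootstrap_distance(human, other)
print(f"JC69 distance = {d:.4f} +/- {se:.4f}")
```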
Presence of Evolutionarily Conserved Sequences in Different Lineages. The unique genomic properties of marsupials and monotremes make their sequences of particular interest for identifying and characterizing the small portion of the mammalian genome under purifying selection (5, 32, 33). We previously described an approach for using sequences from multiple vertebrates to detect evolutionarily conserved sequences in the human genome (called MCSs) and demonstrated that different species’ sequences vary greatly in their relative contribution to the identification of MCSs (7–9).

Given the diverse representation of mammalian species in our sequence data set, especially with the inclusion of metatherian and prototherian sequences, we next investigated the presence of MCSs among the different mammalian lineages. For this analysis, we studied a set of 418 MCSs falling within a 571-kb portion of the targeted genomic region where there was complete sequence coverage from cat and dog (carnivores), cow and pig (artiodactyls), rat and mouse (rodents), N.A. opossum and wallaby (marsupials), and platypus (monotreme). Note that S.A. opossum sequence was not included in this analysis, so that each lineage would be represented by two species (except monotremes, for which only one species was available). The presence or absence of each of the 418 MCSs in each species’ sequence was determined based on whether there was a human–species sequence alignment that overlapped that MCS in the human sequence (note that virtually all such alignments reflect high levels of sequence identity). Although virtually all 58 MCSs overlapping coding regions and 46 MCSs overlapping UTRs are present in all species, the remaining noncoding MCSs show interesting patterns of conservation (Fig. 4; also see the supporting information for additional details).

Just over one-half (52%) of the human-referenced noncoding MCSs are present in all nine nonhuman mammals analyzed. These regions thus represent the most anciently constrained sequences in the mammalian lineage. An additional 3.8% of the MCSs are present in all mammals except one or both rodents; this could be due to the known high deletion rate in the rodent lineage (5) or to imprecision of current MCS-detection methods. A further 17% of MCSs are present in all mammals except monotremes, and 2% are present in all mammals except monotremes and both rodents. The other major combinations are MCSs in all mammals except N.A. opossum (4.5%), in all mammals except N.A. opossum and platypus (4.5%), and in all eutherian mammals (4.0%). Together, these data provide evidence for lineage specificity with respect to the presence of evolutionarily conserved sequences in the human genome.

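Once each MCS has a set of species in which it is present, the percentages above are a tally of presence patterns. The sketch below shows that tally. The species keys and category rules are hypothetical and cover only a few of the combinations named in the text. The complete breakdown is in the supporting information.

```python
"""Tally lineage presence/absence patterns of noncoding MCSs as percentages.

Minimal sketch: species keys and category rules are hypothetical and mirror only
some of the combinations described in the text.
"""
from collections import Counter
from typing import Dict, FrozenSet, Hashable

ALL_MAMMALS = frozenset({"cat", "dog", "cow", "pig", "rat", "mouse",
                         "na_opossum", "wallaby", "platypus"})
RODENTS = frozenset({"rat", "mouse"})
EUTHERIANS = frozenset({"cat", "dog", "cow", "pig", "rat", "mouse"})


def category(present: FrozenSet[str]) -> str:
    """Assign one presence pattern to a coarse lineage category."""
    absent = ALL_MAMMALS - present
    if not absent:
        return "all mammals"
    if absent <= RODENTS:
        return "all except one or both rodents"
    if absent == {"platypus"}:
        return "all except monotreme"
    if absent == {"platypus"} | RODENTS:
        return "all except monotreme and both rodents"
    if present == EUTHERIANS:
        return "eutherians only"
    return "other"


def tally(presence: Dict[Hashable, FrozenSet[str]]) -> Dict[str, float]:
    """Percentage of MCSs falling into each category."""
    counts = Counter(category(p) for p in presence.values())
    return {name: 100.0 * n / len(presence) for name, n in counts.items()}


presence = {
    "mcs1": ALL_MAMMALS,
    "mcs2": ALL_MAMMALS - {"rat"},
    "mcs3": ALL_MAMMALS - {"platypus"},
    "mcs4": EUTHERIANS,
}
print(tally(presence))
```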
Fig. 4. Lineage specificity of MCSs. The proportion of nonexonic MCSs found in the sequences of species in each category is indicated. Note that virtually all MCSs overlapping known exonic sequences are present in all mammals (data not shown). All Mammals: cat, dog, cow, pig, rat, mouse, N.A. opossum, wallaby, and platypus; Eutherian: cat, dog, cow, pig, rat, and mouse; Marsupials: N.A. opossum and wallaby; and Other: species combinations containing <2% of the analyzed MCSs (see the supporting information for the complete data set). Hashed areas of “All Mammals” reflect portions lacking one or both rodents, and hashed portions of “Eutherian + Marsupials” reflect portions lacking both rodents.

Discussion

Phylogenetic diversity is an important component of comparative genomic studies (8, 34). To date, the comparative sequencing of mammalian genomes largely has involved species within the eutherian radiation, each contributing relatively short branch lengths. Although short branch lengths allow for accurate sequence alignments, many species’ sequences then are needed (so that enough neutral substitutions accumulate across the tree) to identify those bases under purifying selection. The more diverged metatherian and prototherian mammals contribute longer branch lengths, making their sequences particularly valuable for identifying genomic regions under purifying selection, while still allowing for reliable alignments to the human sequence. The latter has been challenging with nonmammalian vertebrates, such as chicken and fish (W. Miller, personal communication).

Here, we report the large-scale generation and comparative study of genome sequences from noneutherian mammals. This initial in-depth glimpse revealed several intriguing properties of these species’ genomes. The platypus genome, at least for the region studied, shows (i) ≈25% compression relative to the human genome; (ii) an unusually high G + C content for a mammal; (iii) a disproportionately high fraction of SINEs among its repetitive sequences; (iv) a notably low fraction of human-alignable sequence (14% compared with 34% for marsupials); and (v) a markedly long branch length revealed by phylogenetic analyses. Interestingly, these last two properties of platypus are quite similar to those of chicken (see Fig. 2B), despite the large difference in their evolutionary distances from human [estimated at 200 versus 310 mya, respectively (12–14)]. Although the long branch length for platypus is intriguing, it was calculated by using the reversible substitution model (REV), which assumes similar nucleotide composition among the analyzed sequences. Because this is not the case for platypus (Table 2), and because the synonymous 4D sites analyzed in this study might not be entirely neutrally evolving, caution should be used in making strong claims about the phylogenetic position of monotremes based on our data. Finally, the observed compression of the platypus genome (relative to human) cannot be explained fully by differences in gene or repeat content. The evolutionary events that led to this relative compression are not obvious from the analyses performed here; however, more detailed examination of larger data sets of platypus sequence, with particular emphasis on cataloging repetitive versus nonrepetitive sequences and searching for evidence of insertions and deletions, should shed light on this issue.

We were able to align a greater amount of sequence by using a multisequence alignment tool [TBA (20)] than by using simpler pair-wise alignment methods. Importantly, this enhancement was most evident with the sequences from the noneutherian mammals, which showed roughly a 2-fold increase in the fraction of human-alignable sequence (purple portion of bars in Fig. 2A). Similar improvements likely would enhance comparative sequence analyses involving more distantly related, nonmammalian vertebrates (e.g., birds, reptiles, and fish). At the same time, the observed increase in alignability in part reflected the large number of species’ sequences being studied (a total of 27); the minimal number and phylogenetic characteristics of mammalian species required for such enhanced alignments remain to be established.

Analyses of the multisequence alignment revealed that the 14% of the human sequence that aligns with the platypus sequence is not completely contained within the larger fraction of the human sequence that aligns with all three marsupial sequences. Similar situations were encountered among the similarly diverged marsupials as well as other combinations of eutherians and nonmammals (see the supporting information). Although there is a general trend for alignments of more diverged sequences to be contained within the alignments of more closely related sequences, significant exceptions emerge that may point to lineage-specific aspects of genome evolution.

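Whether one species' human-alignable positions are nested within another's is a set question. The sketch below assumes that each species' alignable human positions are available as a set of integer coordinates, such as the output of a projection onto human. It reports the fraction shared by all marsupials, the fraction shared by all four noneutherians, and the share of platypus-alignable positions falling outside the marsupial intersection. The last two lines repeat the same arithmetic on the percentages reported in Results.

```python
"""Check whether platypus-alignable human positions are nested in the marsupial ones.

Minimal sketch: each species' alignable human positions are given as a set of ints.
"""
from typing import Dict, Iterable, Set


def overlap_summary(aligned: Dict[str, Set[int]], marsupials: Iterable[str],
                    monotreme: str, human_length: int) -> Dict[str, float]:
    """Shared fractions of the human interval, plus the non-nested monotreme share."""
    marsupials = list(marsupials)
    common = set.intersection(*(aligned[sp] for sp in marsupials))
    mono = aligned[monotreme]
    return {
        "all_marsupials": len(common) / human_length,
        "all_noneutherians": len(common & mono) / human_length,
        # share of monotreme-alignable positions NOT aligned in every marsupial
        "monotreme_outside_common": len(mono - common) / len(mono) if mono else float("nan"),
    }


toy = {"na_opossum": {1, 2, 3, 4}, "sa_opossum": {2, 3, 4, 5},
       "wallaby": {3, 4, 6}, "platypus": {3, 7}}
print(overlap_summary(toy, ["na_opossum", "sa_opossum", "wallaby"], "platypus", 10))
# {'all_marsupials': 0.2, 'all_noneutherians': 0.1, 'monotreme_outside_common': 0.5}

# Same arithmetic with the reported percentages (approximate, as in the text):
print(round(27 / 34, 2))         # 0.79, i.e., about 80% shared by all three marsupials
print(round((14 - 11) / 14, 2))  # 0.21
```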
Our studies confirm that sequences from noneutherian mammals will play an important role in identifying evolutionarily conserved regions of the human genome, which is important for establishing a comprehensive catalog of all functional genomic

1. Nobrega, M. A. & Pennacchio, L. A. (2004) J. Physiol. 554, 31–39.
2. Cooper, G. M. & Sidow, A. (2003) Curr. Opin. Genet. Dev. 13, 604–610.
3. Ureta-Vidal, A., Ettwiller, L. & Birney, E. (2003) Nat. Rev. Genet. 4, 251–262.
4. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.-M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., et al. (2002) Science 297, 1301–1310.
5. International Mouse Genome Sequencing Consortium (2002) Nature 420, 520–562.
6. Rat Genome Sequencing Project Consortium (2004) Nature 428, 493–521.
7. Thomas, J. W., Touchman, J. W., Blakesley, R. W., Bouffard, G. G., Beckstrom-Sternberg, S. M., Margulies, E. H., Blanchette, M., Siepel, A. C., Thomas, P. J., McDowell, J. C., et al. (2003) Nature 424,
